Procedure: To develop an index model, use prior knowledge (i.e., prior empirical evidence or expert domain knowledge) to prepare a list of predictor variables. There is no limit on the number of predictor variables that can be used. Specify for each variable whether it has a positive or negative influence on the outcome variable, or define the variable in such a way that it has a positive influence.
Alternatively and more simply, score 1 for a positive influence and 0 otherwise. (When there are only two choices, one can also pick the option that dominates.) Finally, add the scores to calculate the value of the index. The higher the index value, the more favorable the expected outcome: for example, the better a job candidate will perform on the job, the more a country’s economy will grow, or the more profitable a movie will be.
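The scoring and summing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `index_score`, the candidate names, and the three variables are hypothetical and are not taken from any of the cited studies.

```python
from typing import Mapping


def index_score(ratings: Mapping[str, int]) -> int:
    """Sum binary ratings: 1 = favorable for the outcome, 0 = otherwise."""
    for name, value in ratings.items():
        if value not in (0, 1):
            raise ValueError(f"rating for {name!r} must be 0 or 1, got {value!r}")
    return sum(ratings.values())


# Hypothetical ratings for two job candidates (invented for illustration).
candidates = {
    "candidate_a": {
        "relevant_experience": 1,
        "strong_references": 1,
        "passed_work_sample": 0,
    },
    "candidate_b": {
        "relevant_experience": 1,
        "strong_references": 0,
        "passed_work_sample": 0,
    },
}

# One index value per alternative; a higher value means a more favorable forecast.
scores = {name: index_score(ratings) for name, ratings in candidates.items()}

# Rank the alternatives from highest to lowest index value.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Because every variable carries the same weight, adding a newly identified variable only requires one more entry per alternative; nothing has to be re-estimated.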
Conditions: The index method is suitable for situations in which there are many important variables, much prior knowledge about causal relationships, and little need for precision in the estimates of the relationships. These conditions are common in selection problems, such as choosing among political candidates, job candidates, sites for retail outlets, marriage partners, or advertisements. Forecasts can be made about the relative performance of alternatives. Where sufficient historical data are available on a quantitative dependent variable and causal variable values can be assessed, a model estimated by simple linear regression against index scores can be used to produce quantitative forecasts (e.g., the percentage vote-share of candidates in an election).
The index method is useful if there are many important variables, good prior knowledge on the direction of the effect, and the values of causal variables can be assessed at least subjectively (e.g., as zero or one). The method is especially useful if a large number of causal variables are important, and valid and reliable quantitative data are scarce relative to the number of variables. The method easily accommodates new causal variables should they arise. Its primary disadvantage is that it is difficult to estimate effect sizes.
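Where historical outcomes are available, the regression step mentioned under Conditions could look roughly like the sketch below: the outcome is regressed on the index score alone, and the fitted line converts a new index value into a quantitative forecast. The function name `fit_simple_regression` and all numbers are invented for illustration; they are not data from any cited study.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("index scores must vary across cases")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical history: index scores of past cases and their observed outcomes
# (for example, vote shares in percent). These values are made up.
index_scores = [2, 4, 6, 8]
outcomes = [45.0, 49.0, 51.0, 55.0]

intercept, slope = fit_simple_regression(index_scores, outcomes)

# Quantitative forecast for a new case with a hypothetical index score of 5.
forecast = intercept + slope * 5
print(intercept, slope, forecast)
```

Only two parameters are estimated regardless of how many variables enter the index, which is why the approach remains usable when data are scarce relative to the number of variables.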
Early use: Burgess (1939) described the use of the index method for predicting whether individuals paroled from prison would succeed. Based on a list of 25 factors, each rated either “favorable” (+1) or “unfavorable” (0), an index score was calculated for each individual to determine the chance of successful parole. This approach was questioned because Burgess (1939) did not assess the relative importance of the variables and gave no consideration to their magnitude (i.e., how favorable the ratings were). However, when addressing these concerns, Gough (1962) did not obtain more accurate parole predictions.
Advantages: Because the index method does not estimate weights from the data, sample size is not an issue. The index method places no limit on the number of variables that can be incorporated in the model, and different variables can be used when forecasting new events. These are important advantages because they allow the use of all cumulative knowledge in a domain. Index models can be viewed as “knowledge models”.
Performance: Armstrong and Graefe (2010) used an index of 59 variables to capture biographical information about candidates in U.S. presidential elections. In each election, the candidate with the higher “bio-index” score was predicted to win. This approach correctly predicted 27 of the 29 elections from 1896 to 2008. By using simple linear regression to relate the index scores to the popular vote, the “bio-index” model yielded more accurate out-of-sample forecasts than seven econometric models for the last four elections from 1996 to 2008.
Related work: The index method is based on the idea of unit weighting. Many empirical studies have analyzed the relative performance of unit weighting and multiple regression. Unlike applications of the index method, however, these studies compare the two approaches using the same variables and the same data set.
Einhorn & Hogarth (1975) compared unit weighting and multiple regression for selection problems. For such problems, unit weighting outperformed regression when the sample was small and when the number of predictor variables and the inter-correlation among them were high. Empirical studies support this finding. In analyzing published data in the domain of applied psychology, Schmidt (1971) found regression to be less accurate than unit weighting. In a review of the literature, Armstrong (1985, p. 230) found regression to be slightly more accurate in three studies (on academic performance, personnel selection, and medicine) but less accurate in five (three on academic performance, and one each on personnel selection and psychology). Czerlinski et al. (1999) compared the methods for 20 prediction problems (including psychological, economic, environmental, biological, and health problems), in which the number of variables varied between 3 and 19. Most of these examples were taken from statistical textbooks, where they had been used to demonstrate the application of multiple regression analysis. The authors reported that unit weighting produced more accurate out-of-sample forecasts than multiple regression.
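For readers unfamiliar with unit weighting on continuous predictors, the following is a minimal sketch of one common formulation: each predictor is standardized, given a weight of +1 or -1 according to its assumed direction of effect, and the results are summed. The function names, the two predictors, and their values are hypothetical, and the cited studies may have implemented unit weighting differently.

```python
from statistics import mean, pstdev
from typing import List, Mapping, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Convert raw values to z-scores (population standard deviation)."""
    center, spread = mean(values), pstdev(values)
    if spread == 0:
        raise ValueError("a constant predictor cannot be standardized")
    return [(v - center) / spread for v in values]


def unit_weighted_scores(
    predictors: Mapping[str, Sequence[float]],
    directions: Mapping[str, int],
) -> List[float]:
    """Sum standardized predictors, each weighted +1 or -1 by assumed direction."""
    if not predictors:
        raise ValueError("at least one predictor is required")
    n_cases = len(next(iter(predictors.values())))
    totals = [0.0] * n_cases
    for name, values in predictors.items():
        if directions[name] not in (1, -1):
            raise ValueError(f"direction for {name!r} must be +1 or -1")
        if len(values) != n_cases:
            raise ValueError("all predictors must cover the same cases")
        for i, z in enumerate(standardize(values)):
            totals[i] += directions[name] * z
    return totals


# Hypothetical data for three cases; the directions come from prior knowledge,
# not from estimation on the data.
predictors = {"test_score": [1.0, 2.0, 3.0], "absences": [3.0, 2.0, 1.0]}
directions = {"test_score": 1, "absences": -1}

composite = unit_weighted_scores(predictors, directions)
print(composite)
```

Multiple regression would instead estimate a separate weight for each predictor from the sample, which is where small samples and highly inter-correlated predictors can hurt out-of-sample accuracy.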
Armstrong, J. S. (1985). Long-range forecasting: From crystal ball to computer, New York: John Wiley.
Armstrong, J. S. & Graefe, A. (in press). Predicting elections from biographical information about candidates, Journal of Business Research.
Burgess, E. W. (1939). Predicting success or failure in marriage, New York: Prentice-Hall.
Czerlinski, J., Gigerenzer, G. & Goldstein, D. G. (1999). How good are simple heuristics? In: G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make us smart, Oxford University Press, pp. 97-118.
Einhorn, H. J. & Hogarth, R. M. (1975). Unit weighting schemes for decision-making, Organizational Behavior & Human Performance, 13, 171-192.
Gough, H. G. (1962). Clinical versus statistical prediction in psychology. In: L. Postman (Ed.), Psychology in the making. New York: Knopf, pp. 526-584.
Schmidt, F. L. (1971). The relative efficiency of regression and simple unit predictor weights in applied differential psychology, Educational and Psychological Measurement, 31, 699-714.