Exam results:
The range was from 26% to 74% with an average of 52%.
Three students were over 70%, three were below 32%, and the others were evenly spread over the range.

The classes were interactive and fun. The student who scored 26% had gone out of her way early in the course to say how interested she was and how hard she was working.




Self-administered Forecasting Exams

There are two purposes to the self-administered exams. First, you can use them to guide your own learning. Second, if you are teaching a course that addresses any of these areas, you can use the exams to help students learn the relevant material and to grade them on how much they learned. These two uses are discussed here:

Self-directed learning program

The self-administered tests allow you to conduct your own learning program. The preparation should put you in a good position to learn about important evidence-based findings related to forecasting. Many of these findings are not intuitively obvious.


One way to prepare for the self-administered exams is to first study the recommended preparation materials for a given topic. Then find a learning partner. Each of you would then complete the test. 

Your partner would grade your exam and provide feedback as to what percentage of the material you have mastered. This allows you to see if you understand the material well enough to explain it to another person.

An alternative approach is to take the exam prior to reading the preparation materials. Grade your exam, then read the materials. This approach is frustrating, but it motivates people to relieve the frustration by studying the relevant parts of the readings.

Still another approach is to read the questions and then try to memorize the answers. This is the low-frustration approach. On the negative side, this type of learning will not stick with you for long.

Exams to be used in courses; or, “Steal this exam!”

Instructors can assign the self-administered exams to students as learning tasks. Interestingly, they can then use the exact same questions on an end-of-course exam. How can this be? Isn’t it akin to stealing the exam?


In 2010, Scott Armstrong prepared a battery of 130 open-ended questions for a course on forecasting. The material related to the questions was discussed in the lectures and in related readings, and students were urged to apply the findings in their projects. The students received the questions well in advance of the final exam. They were advised to work with a learning partner. The exam consisted only of questions from this battery. Now get this: the answers were also provided.

The questions all relate to evidence-based findings, and they go beyond everyday knowledge. Thus, someone who had not studied the material would score around zero. So what do you think were the average exam grade and the range of scores in this class of 11 students?

The exam results are shown at the top of this page.

Questions:

_____________________________________________________________

Judgmental Bootstrapping

Extrapolation Methods

Intentions and Expectations

Combining

Evaluating Forecasting Methods

Prepared by J. Scott Armstrong

April 3, 2010

  1. You would like to select forecasting software. How might you assess commercial software programs for forecasting, other than reading reviews and asking for references? List in order of importance.

a) request full disclosure of the methods

b) examine to what extent the packages adhere to the forecasting principles

c) request results and full disclosure about previous testing of the methods

d) test for reliability by using the same methods from different packages to forecast a given set of data


  2. What error measure would you use for:

a) selecting the most accurate method?

Median Relative Absolute Error (MdRAE)

b) identifying the series that involve the most important errors?

Mean Absolute Error (MAE), considered along with the cost of errors, and perhaps the Mean Error with signs retained (to assess bias)

c) estimating prediction intervals?

For series with constant elasticities (common for economic series), assume the errors are symmetric and estimate prediction intervals using a log-log model. Then take anti-logs to express the prediction intervals in real units.
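
To make part (a) concrete, here is a minimal Python sketch of how the MdRAE might be computed. It assumes, as is conventional, that the benchmark is the naive (no-change) forecast; the data and the function name are illustrative.

    import numpy as np

    def mdrae(actuals, forecasts, naive_forecasts):
        # Relative Absolute Error: each method's absolute error divided by
        # the absolute error of the naive benchmark for the same case.
        actuals = np.asarray(actuals, dtype=float)
        rae = np.abs(actuals - forecasts) / np.abs(actuals - naive_forecasts)
        # The median resists the outliers that distort mean-based measures.
        return np.median(rae)

    # Illustrative one-step-ahead forecasts for three series:
    actuals = [100.0, 52.0, 8.0]
    method  = [96.0, 50.0, 9.0]    # forecasts from the method being evaluated
    naive   = [90.0, 55.0, 6.0]    # no-change benchmark forecasts
    print(mdrae(actuals, method, naive))   # 0.5: half the naive error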

  3. When should you use R-square (or r) as a measure of predictive power?

• Never for time series.

• For cross-sectional data, r may be appropriate as a rough measure of predictability, especially for a hold-out sample.

  4. When should you use statistical significance? And why?

Never. There are many better ways to estimate uncertainty and no experimental evidence has been provided to show that it leads to better decisions.

  5. When are ex post (conditional) tests useful?

When testing whether the effects of certain policies were accurately predicted.

  6. When should you use Root Mean Square Error (RMSE) in forecasting?

Never. Experimental studies show that it is unreliable, and it is difficult to see how it relates to decision making.

  7. A forecasting software firm claims that their methods give short-term sales forecasts that are accurate within 3% of the true value. Would these be accurate forecasts?

If a firm made such a claim, I would eliminate them immediately. The key question is how well other methods do in the same situation. In addition, one must specify what was being forecast and what the time horizons were. You would need assurance that the forecasts were ex ante, that they were replicated, that they were conducted by an independent third party, and that any potential sources of bias were revealed. In some cases one might turn to benchmark errors, but few of these have been published (see the Practitioners’ Page of forprin.com), and it is difficult to match your problem to the benchmarks.

  8. You have developed a model to forecast the batting averages of a set of baseball players. You estimated the model on a sample of 90 players. The client tells you that he heard that measures of fit are not indicative of true accuracy levels. He asks you to provide out-of-sample forecasts. How would you do that? How many out-of-sample forecasts would you use? Do you know the name of this procedure?

You can get 90 out-of-sample forecasts by excluding one observation, developing the model on the remaining 89 observations, and then predicting the held-out observation. Replace that observation, remove another as the hold-out, and continue until each observation has served as the hold-out. This is known as the jackknife procedure.
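
Here is a minimal Python sketch of this jackknife procedure. The simple linear model and the batting-average data are stand-in assumptions; any forecasting model could be substituted.

    import numpy as np

    def jackknife_forecasts(x, y):
        # Hold out each observation in turn, fit the model on the remaining
        # n-1 points, and forecast the held-out point ex ante.
        n = len(y)
        preds = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i                  # all observations but i
            slope, intercept = np.polyfit(x[keep], y[keep], 1)  # stand-in model
            preds[i] = slope * x[i] + intercept
        return preds

    # Hypothetical data: 90 players' past averages vs. subsequent averages.
    rng = np.random.default_rng(0)
    past = rng.uniform(0.200, 0.350, 90)
    later = 0.5 * past + 0.135 + rng.normal(0.0, 0.020, 90)
    preds = jackknife_forecasts(past, later)
    print("Mean absolute out-of-sample error:", np.abs(later - preds).mean())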

For further study, see “Evaluating Forecasting Methods” in J. S. Armstrong (2001), Principles of Forecasting.

Prepared by J. Scott Armstrong

April 4, 2010

  1. You would like to select forecasting software. How might you assess commercial software programs for forecasting, other than reading reviews and asking for references? List in order of importance.

  2. What error measure would you use for:

a) selecting the most accurate method?

b) identifying the series that involve the most important errors?

c) estimating prediction intervals?

  3. When should you use R-square (or r) as a measure of predictive power?

  4. When should you use statistical significance? And why?

  5. When are ex post (conditional) tests useful?

  6. When should you use Root Mean Square Error (RMSE) in forecasting?

  7. A forecasting software firm claims that their methods give short-term sales forecasts that are accurate within 3% of the true value. Would these be accurate forecasts?

  8. You have developed a model to forecast the batting averages of a set of baseball players. You estimated the model on a sample of 90 players. The client tells you that he heard that measures of fit are not indicative of true accuracy levels. He asks you to provide out-of-sample forecasts. How would you do that? How many out-of-sample forecasts would you use? Do you know the name of this procedure?

For further study, see “Evaluating Forecasting Methods” in J. S. Armstrong (2001), Principles of Forecasting.


Answers to these questions are given in the section above.

Prepared by J. Scott Armstrong: April 3, 2010

 

  1. When should you exclude historical data from a time series extrapolation? 

Only when you have strong evidence that conditions then differed substantially from the current situation. For example, the way in which the data were collected may have been altered substantially, or there might have been substantial changes in the definitions. You can substitute some type of average for the excluded observations.

 

  2. When is it inappropriate to use seasonal factors?

• If you cannot develop a good rationale that the seasonal fluctuations have a causal explanation (e.g., as with stock market data)

• If you lack sufficient data (especially important when the causal forces are weak). You would typically need many years of data to get useful seasonal factors. 

 

  3. How would you estimate seasonal factors when you have only a few years of volatile data and some expectation that the behavior is seasonal (as with grass seed or snow shovels)?

I would use damped seasonal factors. (See the Miller-Williams freeware at forecastingprinciples.com.)
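
The sketch below illustrates the general idea behind damping seasonal factors, namely shrinking them toward 1.0 when they are poorly estimated. It is not the Miller-Williams algorithm itself, and the factors and the damping weight are illustrative.

    import numpy as np

    def damp_seasonal_factors(factors, damping=0.5):
        # Shrink multiplicative seasonal factors toward 1.0. A damping of
        # 1.0 keeps the raw factors; 0.0 removes seasonality entirely.
        factors = np.asarray(factors, dtype=float)
        damped = 1.0 + damping * (factors - 1.0)
        return damped / damped.mean()   # renormalize to average 1.0

    # Volatile quarterly factors estimated from only a few years of data:
    print(damp_seasonal_factors([1.60, 1.10, 0.70, 0.60]))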

 

  4. Extrapolation errors might occur because you do not have a good estimate of the current level, say, as with sales for an item. How would you reduce errors due to a poor estimate of the level?

Use alternative ways to estimate the level (e.g., regression, exponential smoothing, naive, judgmental), and then combine them using equal weights unless you have strong evidence that one is more accurate.
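
A minimal sketch of such equal-weights combining; the four level estimates are hypothetical.

    import numpy as np

    # Hypothetical alternative estimates of the current sales level:
    estimates = {
        "regression": 1040.0,
        "exponential smoothing": 980.0,
        "naive (last value)": 1010.0,
        "judgmental": 1100.0,
    }
    # Equal weights, absent strong evidence that one source is more accurate:
    print(np.mean(list(estimates.values())))   # 1032.5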

 

  5. What procedures should you use to extrapolate a trend when there is uncertainty about the trend estimate?

• Use a damped trend (or combine the trend forecast with a naive, no-change forecast); see the sketch after this list.

• Find estimates of other time series subject to similar causal forces and obtain trends expressed in percentage terms. Combine these with the percentage trend in the series of interest.
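
A minimal sketch of the damped-trend idea from the first bullet: the trend’s contribution shrinks each period, so long-horizon forecasts revert toward a no-change forecast. The level, trend, and damping values are illustrative.

    def damped_trend_forecasts(level, trend, phi=0.9, horizons=5):
        # Standard damped-trend form: the h-step forecast adds the trend
        # weighted by phi + phi**2 + ... + phi**h, so with phi < 1 the
        # trend's influence fades as the horizon lengthens.
        forecasts = []
        weight = 0.0
        for h in range(1, horizons + 1):
            weight += phi ** h
            forecasts.append(level + weight * trend)
        return forecasts

    print(damped_trend_forecasts(level=100.0, trend=2.0))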

  

  6. When are nonlinear methods useful for extrapolation?

Use only when the expected behavior is known to follow a non-linear function. For example, logged data are typically used to represent economic data that grow in percentage terms.
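
For instance, a straight line fitted to logged data and then anti-logged produces forecasts that grow in percentage terms; the series below is illustrative.

    import numpy as np

    # A hypothetical series growing at roughly 5% per year:
    years = np.arange(10)
    sales = 100.0 * 1.05 ** years

    # Fit a straight line to the logged data, then take anti-logs:
    slope, intercept = np.polyfit(years, np.log(sales), 1)
    print(np.exp(intercept + slope * 12))   # about 179.6 = 100 * 1.05**12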

 

  7. When is it appropriate to use cycles for annual data?

When there are well-defined events taking place at known times and the events have a strong impact on the series (e.g., the Summer Olympics occur once every four years).

 

  8. How should you estimate uncertainty when using extrapolation models?

Use empirically estimated error measures: simulate the actual forecasting situation by using the data to make ex ante forecasts and then comparing the forecasts with the actual values. Do this for each forecast horizon by successive updating. For example, make one- to five-year-ahead forecasts starting in year t, calculate the errors for t+1 through t+5, then add the observation for t+1 to the database and make forecasts again. You can use these empirically estimated uncertainty levels to answer such questions as what the typical error is for four-year-ahead forecasts.
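
A minimal Python sketch of this successive-updating scheme. The naive (no-change) forecast stands in for whatever extrapolation model you would actually evaluate, and the series is simulated.

    import numpy as np

    def errors_by_horizon(series, max_h=5, min_train=10):
        # From each origin t, make ex ante forecasts for t+1 ... t+max_h,
        # record absolute errors by horizon, then move the origin forward
        # one period and repeat.
        series = np.asarray(series, dtype=float)
        errors = {h: [] for h in range(1, max_h + 1)}
        for t in range(min_train, len(series) - 1):
            forecast = series[t]            # naive: the last observed value
            for h in range(1, max_h + 1):
                if t + h < len(series):
                    errors[h].append(abs(series[t + h] - forecast))
        return {h: np.mean(e) for h, e in errors.items() if e}

    rng = np.random.default_rng(1)
    y = np.cumsum(rng.normal(1.0, 5.0, 40))   # a hypothetical annual series
    print(errors_by_horizon(y))   # errors typically grow with the horizon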

 

  9. What is a contrary series and how should it be extrapolated?

A contrary series is one in which the historical trend is opposite in direction to the expectation of domain experts. The causal forces should be identified prior to examining the data. Do not forecast trends from the data. A no-change forecast is often sufficient.

 

  10. Under what conditions are extrapolation methods useful?

• Many forecasts are needed, so cost is a factor.

• No substantial changes are expected in the trend.

• The historical trend is long.

• The historical data are reliable and valid.

 

  11. A client decided to use an exponential smoothing program that he found through a Google search. He asked you to explain alpha and beta.

Alpha is the weight placed on the most recent observation when updating the level of a time series. So if alpha were 0.4, 40% of the weight would go to the most recent observation and the remaining 60% to the prior smoothed level. Beta plays the same role in updating the trend estimate. Thus, for a volatile series, you would use a lower alpha and a lower beta, so that no single observation can pull the estimates around too much.
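
The sketch below shows where alpha and beta enter in standard trend-corrected (Holt’s) exponential smoothing; the client’s particular program may parameterize the method somewhat differently.

    def holt_smoothing(series, alpha=0.4, beta=0.2):
        # Alpha weights the newest observation in the level update;
        # beta weights the newest change in the trend update.
        level, trend = series[0], series[1] - series[0]
        for y in series[2:]:
            prev_level = level
            level = alpha * y + (1 - alpha) * (level + trend)
            trend = beta * (level - prev_level) + (1 - beta) * trend
        return level + trend   # the one-step-ahead forecast

    print(holt_smoothing([10.0, 12.0, 13.0, 15.0, 18.0]))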