
Thursday, March 09, 2006

Predictive models: How well do they work?

"Prediction is very hard, especially about the future." -usually attributed to Niels Bohr
From the Wall Street Journal, the Numbers Guy:
Ahead of the Oscars, an economics professor, at the request of Weekend Journal, processed data about this year's films nominated for best picture through his statistical model and predicted with 97.4% certainty that "Brokeback Mountain" would win. Oops. Last year, the professor tuned his model until it correctly predicted 18 of the previous 20 best-picture awards; then it predicted that "The Aviator" would win; "Million Dollar Baby" won instead. Sometimes models tuned to prior results don't have great predictive powers.
That extra bit was tucked in at the very end of the column.

Decisionmakers are always discussing models, and whether a model that represents a given set of data can be used to predict future events. Usually, as that paragraph shows, it can't.

The underlying problem with most models is that we (scientists, that is) don't know the physical processes and mathematical equations that govern events. When we do know them, we can (usually!) do a good job representing the phenomena that lead to the events we want to predict. Often, though, we simply don't know what the correct model is. So we pick something we like, and that choice is usually driven by personal prejudice.

Few scientists admit to that prejudice, but it's there nonetheless. Ask a chaologist about models and he'll tell you why a chaotic model is best and how the world is governed by non-linear equations. Ask a cellular automatist and he'll extol the virtues of cellular automata as capturing the very thoughts of God exactly and therefore the perfect model of any of God's creations.

These models all have their place. It's just that the place is usually not prediction.

Here's another example from MATLAB. It's part of their demo for the software.

This figure is a curve fit (cubic, that is, a third-degree polynomial) to census data from 1900 to 1990.

The predicted value for 2000 is 280 million people. A cubic does a pretty good job of fitting the data. A higher-order polynomial would fit the data even better.


This plot is what happens with an 8th-order polynomial fit. It fits the given data well, but it "predicts" a population of -70 million for the year 2000. That's negative 70 million.
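If you'd like to try something similar without MATLAB, here is a minimal sketch in plain Python. It is not the MATLAB demo itself. The population figures are rounded official census counts that I am assuming for illustration, and the helper names (`scale`, `solve`, `polyfit`, `polyval`, `sse`) are my own, so the extrapolated numbers will probably not match the 280 and -70 quoted above. The sketch fits polynomials of degree 3 and 8 by least squares, using exact fractions to sidestep round-off, and then evaluates each fit at the year 2000.

```python
from fractions import Fraction

# Assumed data: approximate U.S. census counts in millions, 1900-1990.
# Rounded official figures, NOT the exact dataset from the MATLAB demo.
YEARS = list(range(1900, 2000, 10))
POP = ["76.2", "92.2", "106.0", "123.2", "132.2",
       "151.3", "179.3", "203.3", "226.5", "248.7"]


def scale(year):
    """Map the years 1900..1990 onto the interval -1..1."""
    return Fraction(year - 1945, 45)


def solve(a, b):
    """Solve a square linear system by Gauss-Jordan elimination (exact)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def polyfit(xs, ys, degree):
    """Least-squares coefficients (lowest power first) via normal equations."""
    powers = [[x ** k for k in range(degree + 1)] for x in xs]
    ata = [[sum(p[i] * p[j] for p in powers) for j in range(degree + 1)]
           for i in range(degree + 1)]
    aty = [sum(p[i] * y for p, y in zip(powers, ys))
           for i in range(degree + 1)]
    return solve(ata, aty)


def polyval(coeffs, x):
    """Evaluate a polynomial given its coefficients, lowest power first."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def sse(coeffs, xs, ys):
    """Sum of squared errors of the fit on the points it was fitted to."""
    return sum((polyval(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))


XS = [scale(y) for y in YEARS]
YS = [Fraction(p) for p in POP]

for degree in (3, 8):
    coeffs = polyfit(XS, YS, degree)
    fit_error = float(sse(coeffs, XS, YS))
    prediction = float(polyval(coeffs, scale(2000)))
    print(f"degree {degree}: squared error on 1900-1990 = {fit_error:.4f}, "
          f"'prediction' for 2000 = {prediction:.1f} million")
```

The squared error on the known data can only go down as the degree goes up, since every cubic is also an 8th-degree polynomial with some zero coefficients. That shrinking error says nothing about what the curve does one step outside the data, and that is the whole point.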

Need I say more?
