The US midterms are next week, and polling aggregators have already released their forecasts. In short, they combine polls with “fundamentals” such as presidential approval and recent partisanship to estimate who will win each race: the House, the Senate, and the governorships. Probabilistic forecasts of this kind took a pummeling after their high-profile failure in the 2016 election: the major prognosticators all gave Hillary Clinton a very high chance of winning, at three quarters or above. But she lost. What happened?
Poll dancing
Before getting to the issues with forecasting itself, let’s start with what went wrong with the polls in 2016 (and, to a much lesser extent, in 2020).
The first issue is undecided voters - people who, at the time they were polled, didn’t know who they would vote for. About 15% of voters didn’t make up their minds until late in the campaign, and Clinton’s big polling leads rested on the assumption that undecideds would break evenly (as they had for a long time), when in reality they broke heavily for Trump. The same thing happened in Argentina’s 2019 presidential primaries, when polls showed a tight race between the incumbent and his challenger, but the result was a 16-point blowout against the government.
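To see how much a late break matters, here’s a minimal sketch with made-up numbers (no real poll involved): a candidate up 46-42 with 12% undecided wins if the undecideds split evenly, but loses if they break 70/30 the other way.

```python
# Hypothetical poll: leader 46%, trailer 42%, 12% undecided.
leader, trailer, undecided = 0.46, 0.42, 0.12

def final_margin(split_to_leader):
    """Leader-minus-trailer margin once the undecideds allocate."""
    return (leader + undecided * split_to_leader) - (
        trailer + undecided * (1 - split_to_leader)
    )

print(final_margin(0.50))  # ~ +0.04: an even break, the leader wins by 4
print(final_margin(0.30))  # ~ -0.008: a 70/30 break against, the leader loses
```

A 4-point lead survives anything close to an even split, but not a lopsided one - and lopsided is exactly what happened in 2016.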
Another issue is nonresponse bias: some people just don’t respond to polls. If there are systematic differences between those who answer and those who don’t, then the polls don’t represent the electorate, only the part of the electorate willing to say who they’re voting for - so, if one candidate’s supporters are systematically less likely to admit their support, every poll will be skewed in the same direction. There isn’t much evidence of a “shy Trump” effect, but the election was close enough that even a small one could have made a difference. A related issue is that pollsters divide people into all respondents and likely voters, depending on whether they say they’re going to vote, and the latter group was much more pro-Clinton - so, once again, systematic differences in likelihood of voting (plus changes in behavior, but we’ll come back to that) may have made the polls too bullish on Hillary.
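The mechanics are easy to see with made-up numbers: if the electorate is split 50/50 but one side’s supporters are somewhat less likely to pick up the phone, the topline skews without a single respondent lying.

```python
# Hypothetical electorate: an exact 50/50 split between candidates A and B.
share_a, share_b = 0.50, 0.50

# Assumed response rates: B's supporters answer pollsters 20% less often.
resp_a, resp_b = 0.10, 0.08

# A's share among those who actually respond:
measured_a = share_a * resp_a / (share_a * resp_a + share_b * resp_b)
print(round(measured_a, 3))  # 0.556: A polls at ~56% of a dead-even race
```

That’s an 11-point polling margin out of thin air, which is why pollsters obsess over who is and isn’t answering.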
The last issue is education: highly educated voters picked Clinton, and the rest picked Trump. Because pollsters hadn’t paid much attention to education weighting - it hadn’t been a major issue in the past - they completely missed that low-education voters, particularly in the Midwest, were flipping hard to the Republicans. The effect of education had been less than 1% in the previous four elections, but it quadrupled in 2016. This was especially pronounced in state-level polling.
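Weighting is the standard fix, and a stylized sketch shows why (all shares here are invented, not 2016 data): if college graduates are over-represented in the raw sample, the unweighted average overstates their candidate until you reweight to the electorate’s actual education mix.

```python
# Hypothetical sample: (share of sample, share supporting the Democrat) per group.
sample = {
    "college":     (0.60, 0.60),  # over-represented and pro-Democrat
    "non_college": (0.40, 0.40),
}
# Assumed true electorate composition:
population = {"college": 0.40, "non_college": 0.60}

unweighted = sum(size * dem for size, dem in sample.values())
weighted = sum(population[g] * dem for g, (_, dem) in sample.items())

print(round(unweighted, 3))  # 0.52: the raw sample shows the Democrat ahead
print(round(weighted, 3))    # 0.48: education weighting flips the race
```

The catch, of course, is that you can only reweight along dimensions you’ve decided to care about - and before 2016, education usually wasn’t one of them.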
This would suggest that polling’s errors are fixable, to some extent, and polls were fairly accurate in 2018 - but the error was roughly the same in 2020. Technical nitpicks aside, the big problem was, ultimately, that polling errors of around 4 points are normal, so a lead of that size doesn’t mean you have it in the bag. And 2020 had its own super weird dynamics, such as barely anyone calling that Hispanic voters would swing hard to the right. So is there something more systematic going on?
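To put a number on that, a quick back-of-the-envelope calculation (the 4-point error figure is my assumption, in the ballpark of historical polling misses, not anyone’s official estimate):

```python
from statistics import NormalDist

lead = 4.0      # polling lead, in points
error_sd = 4.0  # assumed standard deviation of the polling error, in points

# Probability the true margin is positive, if errors are roughly normal:
p_win = 1 - NormalDist(mu=lead, sigma=error_sd).cdf(0.0)
print(round(p_win, 3))  # 0.841: the "sure thing" loses about one time in six
```

An 84% chance sounds decisive, but one-in-six events happen all the time.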
Lucas County or Lucas, Robert
… the features which lead to success in short-term forecasting are unrelated to quantitative (…) evaluation, (…) the major econometric models are (well) designed to perform the former task only, and that simulations using these models can, in principle, provide no useful information as to the actual consequences of alternative economic policies.

Lucas (1976), “Econometric Policy Evaluation: A Critique”
Back in the 1970s, there was a big trend of using models built on simple extrapolations of past data to settle economic policy questions. For instance, inflation was a major issue, and economists wondered whether reducing it was worth the cost. To answer that question, they extrapolated from previous data and concluded that bringing inflation down by 1 point (as in, from 5% to 4%) required giving up 10% of GDP, plus a large increase in unemployment - so they decided it wasn’t worth it. But when the Volcker Shock actually brought inflation down in the late 70s and early 80s, it did take a painful recession, yet GDP did not fall by anything like the implied 80% (which would have been twice as severe as the Great Depression). So what gave?
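For the record, the arithmetic behind that 80% figure, using only the numbers quoted above (the size of the Volcker disinflation is approximate):

```python
gdp_cost_per_point = 0.10  # extrapolated cost: 10% of GDP per point of disinflation
points_reduced = 8         # inflation fell by roughly eight points under Volcker
print(gdp_cost_per_point * points_reduced)  # 0.8: an implied 80% collapse in GDP
```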
Quite simply, the problem was how the forecasts themselves were made. There is a key distinction to understand here: a forecast is an extrapolation of past trends onto future events, while a prediction is a conditional assertion based on analysis of the phenomenon itself. Saying “there has never been a rapid increase in rates without a recession” is a forecast; saying “if you raise rates, there will be a recession” is a prediction. The difference is pretty simple: a forecast assumes constant incentives and constant responses to them, while a prediction revolves around changes in best-response behavior.
One of the best teams in the NBA’s Eastern Conference is the Golden State Warriors, and the worst is the Charlotte Hornets. If the two teams played tomorrow, it would be fairly easy to say who would win - even for me, a complete ignoramus about basketball[1]. But would it be possible to answer that question using the same data if, right before the match, the NBA announced that running around while carrying the ball was now allowed? I’d say no - each team would be in a situation for which there is no data, so you’d have to go off a deeper understanding of how each team works and the skills of its players.
What this example indicates is that historical patterns of human behavior often depend on the rules of the game in which people are participating. Since much human behavior is purposeful, it makes sense to expect that it will change to take advantage of changes in the rules. This principle is so familiar to fans of football and other sports that it hardly bears mentioning. However, the principle very much deserves mentioning in the context of economic policy because here it has been routinely ignored — and with some devastating results. Adherents of the theory of rational expectations believe, in fact, that no less than the field of macroeconomics must be reconstructed in order to take account of this principle of human behavior.
Sargent (1980), “Rational Expectations and the Reconstruction of Macroeconomics”
The answer to why the 70s models implied Great Depression-tier downturns were required to subdue moderate inflation was pretty simple: they assumed static patterns of behavior carried over from a time when inflation was lower. Disinflation was less costly than forecast because Volcker did not reduce inflation by mechanically turning some dial on the money spigot - he did it by convincing everyone that he was personally going to “spill blood, lots of blood, other people’s blood” to bring inflation down, which got them to adjust their behavior back to a low-expectations world. This is also why the Great Depression began to end when changes to the gold standard were announced: people stopped expecting the economy to stay in the dumps with falling prices, and started expecting prices to go up instead, which incentivized spending.
Now, to clarify, rational expectations doesn’t mean that people are perfectly intelligent, or that they literally have all information and process it via complicated mathematical functions. Rather, it means that the aggregate of individuals functions as if each individual acted that way, because most people respond to incentives at least generally “correctly” - avoiding bad outcomes and leaning towards good ones. The main consequence of this view is that there is no “free money” to be made by exploiting information that everyone has - someone would have done it already. There are no $100 bills lying on the sidewalk.
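To make the Volcker story concrete, here’s a toy model - entirely illustrative, my own numbers, not anything from Lucas or Sargent - in which the cost of disinflation depends only on whether expectations are backward-looking or anchored by a credible policy:

```python
# Toy accelerationist Phillips curve: inflation = expected inflation - SLOPE * gap.
# Every parameter here is made up purely for illustration.
TARGET, START, SLOPE = 2.0, 10.0, 0.5  # target inflation, initial inflation, slope

def gap_needed(expected_pi):
    """Unemployment gap required to hit the inflation target this period."""
    return (expected_pi - TARGET) / SLOPE

# Backward-looking expectations: everyone expects last year's 10% to persist.
print(gap_needed(expected_pi=START))   # 16.0 points of slack - a brutal recession

# Credible regime change: expectations jump straight to the target.
print(gap_needed(expected_pi=TARGET))  # 0.0 - the same disinflation, almost free
```

Real disinflations land somewhere in between - Volcker’s was painful but nowhere near the forecast - and the gap between the two cases is exactly the value of credibility.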
So the limits of traditional economic models are clear: if the rules of the game change, then forecasts made with data from when the old rules applied are not very informative, since the incentivized behaviors are different and therefore the best responses to any given datapoint are different as well. The limits of forecasting are set by your knowledge of the thing you’re forecasting: if you need to make predictions, you need to be an expert on the subject.
This has pretty obvious implications for election forecasting: to get the results right, you not only need to know a lot about polling, biases, and samples, but also about how potential events could shift voter intentions. For instance, if voters who care a lot about crime lean towards one party, a national story that brings attention to crime could change voters’ priorities, helping that party. But anticipating this is really hard, and it’s hard to infer from polling alone - will inflation or immigration be the bigger story? How will proposed bills change the calculus? - especially when certain groups of voters are much less likely to express their preferences.
Conclusion
The limits of statistical forecasts are the limits of forecasting itself: namely, that exogenous changes in patterns of behavior cannot be captured by models trained and built on data generated by previous behaviors. Of course, since such shifts are relatively rare, models remain useful - but a systematic change in the behavior of voters in a certain group will bias the polls in one direction, with consequences for the overall accuracy of the model.
Sources
Polls
Nate Cohn, “A 2016 Review: Why Key State Polls Were Wrong About Trump”, The New York Times, 2017
Nate Silver, “The Death Of Polling Is Greatly Exaggerated”, FiveThirtyEight, 2021
RatEx
Previous post about rationality in economics
Lucas (1976), “Econometric Policy Evaluation: A Critique”
Sargent (1980), “Rational Expectations and the Reconstruction of Macroeconomics”
[1] Actually I played basketball for four years in high school, but I was by far the worst player in my year. Also, don’t say they’re in different conferences. I don’t care.