### From Poll Margin to Wins: Polls as Predictors

#### Charles Franklin | November 6, 2006

The usual way to look at poll accuracy is to subtract the poll result from the vote result. But an alternative is to look at how the probability that a candidate wins depends on the margin they have in the pre-election polls. Since American elections are "winner-take-all" within districts, this is a good way of looking at the practical power of polls to predict winners.

After all, a statistician would say that a poll predicting 51% for a loser who actually got 49% was better than a poll predicting 51% for a winner who got 55%. That's right from one point of view, but not from the perspective of picking the winner correctly. Here I take a look at the latter view of what is important.

The data are from all statewide polls for Senate, Governor or President from 2000 and 2002.

The figure above plots results by poll margin. The x-axis shows the Dem minus Rep margin in the polls. The y-axis plots the percent of races the Dem ACTUALLY won for each margin we saw in the polls. So imagine I take all polls that found a 5-point lead for the Dem. The y-axis plots the proportion of those polls with a 5-point lead in which the Dem actually DID win. I do this separately for each type of race: Governor, Senate and President. The dots show there is a lot of variation, but the pattern of points, and the black trend line through the data, show how the predictive accuracy varies over margins from -30 to +30.
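For readers who want to reproduce the basic tabulation, here is a minimal sketch of the grouping step described above: collect polls by margin and compute the share in which the Dem won. The records below are made up for illustration and are not the 2000 and 2002 data, and the function name `win_share_by_margin` is hypothetical. The sketch also keeps the number of polls behind each point, since a share based on two polls deserves less trust than one based on twenty.

```python
from collections import defaultdict

# Hypothetical records: (poll margin in points, Dem minus Rep; did the Dem win?)
polls = [
    (5, True), (5, False), (5, True),
    (-3, False), (-3, True),
    (10, True), (10, True),
    (0, True), (0, False),
]

def win_share_by_margin(records):
    """Return {margin: (share of polls where the Dem won, number of polls)}."""
    tallies = defaultdict(lambda: [0, 0])  # margin -> [Dem wins, total polls]
    for margin, dem_won in records:
        tallies[margin][0] += int(dem_won)
        tallies[margin][1] += 1
    return {m: (wins / total, total) for m, (wins, total) in sorted(tallies.items())}

shares = win_share_by_margin(polls)
for margin, (share, n) in shares.items():
    print(f"margin {margin:+3d}: Dem won {share:.2f} of {n} polls")
```

Running the same function once per office (Governor, Senate, President) would give the three sets of points, assuming each record also carried the type of race.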

One interesting feature is that a margin of zero (a tied poll) produces a 50-50 split in wins with remarkable accuracy. There is nothing I did statistically to force the black trend line to go through the "crosshairs" at the (0, .5) point in the graph, but it comes awfully close. So a tied poll really does predict a coin-flip outcome.

The probability of a win rises or falls rapidly as the polls move away from a margin of zero. By the time we see a 10 point lead in the poll for the Dem, about 90% of the Dems win. When we see a 10 point margin for the Rep, about 90% of Reps win. That symmetry is also not something I forced with the statistics-- it represents the simple and symmetric pattern in the data.

More practically, it means that polls rarely miss the winner with a 10 point lead, but they DO miss it 10% of the time.

A 5 point lead, on the other hand, turns out to be right only about 60-65% of the time. So bet on a candidate with a 5 point lead, but don't give odds. And for 1 or 2 point leads (as in some of our closer races tomorrow) the polls are only barely better than 50% right in picking the winner. That should be a sobering thought to those enthused by a narrow lead in the polls. Quite a few of those "leaders" will lose. Of course, an equal proportion of those trailing in the polls will win.

So read the polls; they are a lot better than nothing. But don't take that 2 point lead to the bank. Doing so is a failure to appreciate the practical consequences of the margin of error.
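To put a rough number on the margin of error point, here is a sketch of what sampling error alone would imply for a lead of a given size. Everything in it is an assumption rather than something taken from the data: a simple random sample, a two-candidate race, a normal approximation, no prior information about the race, and a hypothetical sample size of 600. The function name `lead_probability` is made up for this example.

```python
import math

def lead_probability(margin, n):
    """Approximate chance the poll leader is truly ahead, sampling error only.

    margin: poll lead as a proportion (0.05 means 5 points).
    n: assumed number of respondents (hypothetical, not from the post's data).
    Assumes a simple random sample, a two-candidate race, and a normal
    approximation with no prior information about the race.
    """
    # Standard error of the difference between two shares that sum to 1.
    se = math.sqrt((1.0 - margin ** 2) / n)
    z = margin / se
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for points in (1, 2, 5, 10):
    p = lead_probability(points / 100.0, n=600)
    print(f"{points:2d} point lead, n=600: {p:.2f}")
```

Under these assumptions the figures for 5 and 10 point leads come out higher than the 60-65% and 90% seen in the data, which would suggest that sampling error is not the whole story. That reading depends entirely on the assumed sample size and the simplifications above.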

## Comments

Am I correct in assuming that candidate status--incumbent vs. challenger vs. open--isn't accounted for?

That would make for a more interesting graph, methinks.

Posted on November 6, 2006 2:09 PM

Wow, this is great. Am I correct that polls for President have the greatest accuracy, while for Senator have the least? What does it look like if you remove the presidential data?

Posted on November 6, 2006 2:18 PM

When computing the trend line, did you weight by the # of races each point represents? For example, one of the points at p=0.5 may represent only two races, where one went Democratic and one went Republican, while another point at p=0.8 may represent 10 races, where 8 went Democratic and 2 went Republican. The trend line should then pass much closer to the p=0.8 point than the p=0.5 point. Also, without some visual representation of the # of races each point represents, trying to get a feel for the data by "eyeballing" the chart is useless.

Posted on November 6, 2006 2:25 PM

To be a bit nitpicky, doesn't the graph show that if a Democrat is 1% behind in the polls, they have a 50% chance of winning?

And if this is true, isn't this odd? I've been under the impression that Democrats usually need to lead by 1 to 2 points in pre-election polling to have a 50% shot at winning, because Democrats turn out less than they say they will compared to Republicans. (For example, I thought this was part of the explanation of why the generic ballot question always overstates how well Dems will do.)

I'd be interested to hear (a) are each of the polls in the graph the final pre-election poll for that race? and (b) are they of "likely voters"?

Posted on November 6, 2006 2:30 PM

Am I correct in assuming that the analysis treated each poll separately, in some kind of logistic or probit analysis, and the results were just rolled up for graphing purposes?

How fresh does a poll have to be to be counted? I'm seeing some Senate races with double-digit leads in the poll going the 'wrong' way - is that polls which were stale by the time of the election? (A September poll in an election in November, for example.) Do you use the last poll from a given source, or all polls of that race from, say, Zogby? You could get a decent success measure by looking at the standard errors for these on a firm-by-firm basis.

It really shouldn't happen very often that a poll 'misses' outside of the margin of error. (+/-) 5 points in the last week should be pretty solid with a fair poll, and +/- 10 well nigh determinative.

Posted on November 6, 2006 3:53 PM

Have you considered doing a split of the data into "races won" vs. "races lost" and then plotting a KS curve (or better yet a ROC)?

Posted on November 6, 2006 3:57 PM

That is very beautiful. Thank you.

Posted on November 6, 2006 6:13 PM

Since the word bet was mentioned I'll jump in.

This is great analysis but the conclusion was reversed, in terms of real-world application, if we're actually talking about wagering. Find a candidate with a 5 point poll edge who is not favored. Or favored by less than the 60-65%, which would equate to slightly less than 1/2. You can't bet on those candidates without giving odds, unless you did it earlier in the cycle. On Tradesports and elsewhere those margins inevitably are bloated in the 4-5 point range, more like 75-85% expectancy.

The idea is to bet on candidates who you expect to lose, or have no opinion, with a comfort level you are taking value prices and over the long run it pays off. Similarly, in sports I literally make hundreds of bets during the year based on pure value, a suddenly out of whack money line allowing let's say +180 (10 to win 18) when the standard and sensible line is +150. The Colts last night over New England is an example. Notice I identify a game that won:)

The all-or-nothing focus reminds me of one obscure and largely unknown wagering opportunity. It's called action points. Instead of paying off on the bottom line winner or loser, a base number is assigned. Then you play a dollar figure, let's say $50 per point, and the payoff or penalty is determined by the final margin in relation to the base. So a pick'em game with a 10 point result nets $500. Or costs $500. Originally there was no cap but now places might limit it to 20 points up or down. It rewards handicapping strength, and lessens the luck factor regarding the inevitable toss-up outcomes.

In 2000 one offshore outlet, now out of business, offered action points on politics. My friends and I did very well isolating states with historical or trending partisanship that was not fully captured by the base number. For instance, taking Bush in Georgia and Gore in California and New Jersey. Clinton over Lazio in the New York senate race. The one we missed and regretted was Bush in West Virginia. The base number was low and history, of course, said Democratic advantage but the 2000 local issues favored Bush. He won by more than 6%.

The offshore outlets were remarkably petrified in offering political odds this cycle. Bodog.com put up the Connecticut senate race at pick'em immediately post primary and that was a horrible line, otherwise very little available.

For reference purposes, tonight on election eve the Tradesports number is 80% likelihood of Democrats taking the House, with the over/under basically +24 seats.

Posted on November 6, 2006 9:34 PM

Probably a naive question: you wrote that, for example: "By the time we see a 10 point lead in the poll for the Dem, about 90% of the Dems win."

What if we see, e.g., 8 different polls all showing a 10 point lead for one candidate?

Posted on November 7, 2006 12:04 AM

I know you gentlemen are busy, but at some point this graph either needs to be fixed or withdrawn.

The graph is meaningless because we don't know and can't see how many polls each of the points represents. Any individual point could represent one poll, or it could represent 100.

One reader asked if the accuracy of polls for presidential elections is greater than the accuracy for senate elections. That would be a reasonable conclusion, if not for the fatal flaw in this graph. It is possible that the outliers in the senate polls represent only a handful of polls, while the center points represent hundreds of polls. If that is the case, then the conclusion about the accuracy of senate polls would be completely wrong.

It is also possible that the points on the left side of the graph represent only a handful of polls, while the points on the right side represent hundreds of polls. If that is the case, then the correct conclusion is that Democrats must be about 5 points ahead in the polls in order to have a 50-50 chance of winning. From this chart, we don't know if that is the case or not.

In conclusion, this chart is meaningless and possibly misleading. Please either fix it or retract it.

Posted on November 7, 2006 10:30 AM

Another crucial piece of information is missing: are these all polls reported in the final days of a campaign, final week, or might they include polls taken well ahead of the election? If the former, the results seem at odds with typical levels of accuracy of pre-election polls; if the latter, then it's of little relevance to today's election...

Posted on November 7, 2006 12:48 PM

