November 06, 2006
From Poll Margin to Wins: Polls as Predictors
The usual way to look at poll accuracy is to subtract the poll result from the vote result. But an alternative is to look at how the probability that a candidate wins depends on the margin they have in the pre-election polls. Since American elections are "winner-take-all" within districts, this is a good way of looking at the practical power of polls to predict winners.
After all, a statistician would say that a poll predicting 51% for a loser who actually got 49% was better than a poll predicting 51% for a winner who got 55%. That's right from one point of view, but not from the perspective of picking winners correctly. Here I take a look at the latter view of what is important.
The data are from all statewide polls for Senate, Governor or President from 2000 and 2002.
The figure above plots results by poll margin. The x-axis shows the Dem minus Rep margin in the polls. The y-axis plots the percent of races the Dem ACTUALLY won for each margin we saw in the polls. So imagine I take all polls that found a 5-point lead for the Dem. The y-axis plots the proportion of those polls with a 5-point lead in which the Dem actually DID win. I do this separately for each type of race: Gov, Sen and Pres. The dots show there is a lot of variation, but the pattern of points and the black trend line through the data show how the predictive accuracy varies over margins from -30 to +30.
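To make the construction of the y-axis concrete, here is a minimal sketch of the grouping step. The records are made up for illustration and are not the data behind the figure; the function name `win_share_by_margin` is likewise just a placeholder.

```python
from collections import defaultdict

# Hypothetical (poll margin in points, did the Dem win?) records.
# These are illustrative values only, NOT the data used in the post.
polls = [
    (5, True), (5, True), (5, False), (5, True),
    (0, True), (0, False),
    (-3, False), (-3, True),
]

def win_share_by_margin(records):
    """Map each Dem-minus-Rep poll margin to the share of those polls in which the Dem won."""
    counts = defaultdict(lambda: [0, 0])  # margin -> [dem_wins, n_polls]
    for margin, dem_won in records:
        counts[margin][0] += int(dem_won)
        counts[margin][1] += 1
    # Sort by margin so the result reads left to right like the x-axis.
    return {m: wins / n for m, (wins, n) in sorted(counts.items())}

shares = win_share_by_margin(polls)
for margin, share in shares.items():
    print(f"margin {margin:+d}: Dem won {share:.0%} of polls")
```

Each dot in a plot like the one described would correspond to one entry of `shares`, computed separately for each type of race.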
One interesting feature is that a margin of zero (a tied poll) produces a 50-50 split in wins with remarkable accuracy. There is nothing I did statistically to force the black trend line to go through the "crosshairs" at the (0, .5) point in the graph, but it comes awfully close. So a tied poll really does predict a coin-flip outcome.
The probability of a win rises or falls rapidly as the polls move away from a margin of zero. By the time we see a 10 point lead in the poll for the Dem, about 90% of the Dems win. When we see a 10 point margin for the Rep, about 90% of Reps win. That symmetry is also not something I forced with the statistics-- it represents the simple and symmetric pattern in the data.
More practically, it means that polls rarely miss the winner with a 10 point lead, but they DO miss it 10% of the time.
A 5 point lead, on the other hand, turns out to be right only about 60-65% of the time. So bet on a candidate with a 5 point lead, but don't give odds. And for 1 or 2 point leads (as in some of our closer races tomorrow) the polls are only barely better than 50% right in picking the winner. That should be a sobering thought to those enthused by a narrow lead in the polls. Quite a few of those "leaders" will lose. Of course, an equal proportion of those trailing in the polls will win.
So read the polls; they are a lot better than nothing. But don't take that 2 point lead to the bank. Doing so is a failure to appreciate the practical consequences of the margin of error.
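For a rough sense of why a 2 point lead is fragile, here is a sketch that looks at sampling error alone. It is not the method behind the figure, which is based on actual outcomes. It assumes a simple random sample, that every respondent picks one of the two candidates, and a normal approximation; the sample size of 600 is an assumed value.

```python
import math

def lead_standard_error(lead, n):
    """Sampling standard error of the lead (p1 - p2), assuming p1 + p2 = 1."""
    return math.sqrt((1 - lead ** 2) / n)

def prob_leader_ahead(lead, n):
    """Normal-approximation chance that the poll leader is truly ahead (sampling error only)."""
    z = lead / lead_standard_error(lead, n)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n = 600  # assumed sample size, for illustration
for lead in (0.02, 0.05, 0.10):
    print(f"{lead:.0%} lead, n={n}: {prob_leader_ahead(lead, n):.2f}")
```

Real polls also carry error from timing, turnout modeling and nonresponse, so numbers from a sampling-only calculation like this are likely to be optimistic compared with the track record described above.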
-- Charles Franklin
November 06, 2006 in Interpreting Polls
Comments
carrie:
Wow, this is great. Am I correct that polls for President have the greatest accuracy, while those for Senator have the least? What does it look like if you remove the presidential data?
Posted on November 6, 2006 2:18 PM
Alan:
When computing the trend line, did you weight by the # of races each point represents? For example, one of the points at p=0.5 may represent only two races, where one went Democratic and one went Republican, while another point at p=0.8 may represent 10 races, where 8 went Democratic and 2 went Republican. The trend line should then pass much closer to the p=0.8 point than the p=0.5 point. Also, without some visual representation of the # of races each point represents, trying to get a feel for the data by "eyeballing" the chart is useless.
Posted on November 6, 2006 2:25 PM
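The weighting question above can be illustrated with the comment's own two points. This is only a sketch of why weights matter; it says nothing about how the trend line in the post was actually fit.

```python
# (observed Dem win share, number of races behind the point),
# taken from the example in the comment above.
points = [(0.5, 2), (0.8, 10)]

# Treat every point equally, regardless of how many races it summarizes.
unweighted = sum(p for p, _ in points) / len(points)

# Weight each point by the number of races it represents.
weighted = sum(p * n for p, n in points) / sum(n for _, n in points)

print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

The weighted figure is simply the pooled win rate (9 Dem wins in 12 races), which sits much closer to the point backed by 10 races.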
To be a bit nitpicky, doesn't the graph show that if a Democrat is 1% behind in the polls, they have a 50% chance of winning?
And if this is true, isn't this odd? I've been under the impression that Democrats usually need to lead by 1 to 2 points in pre-election polling to have a 50% shot at winning, because Democrats turn out less than they say they will compared to Republicans. (For example, I thought this was part of the explanation of why the generic ballot question always overstates how well Dems will do.)
I'd be interested to hear (a) are each of the polls in the graph the final pre-election poll for that race? and (b) are they of "likely voters"?
Posted on November 6, 2006 2:30 PM
rvman:
Am I correct in assuming that the analysis treated each poll separately, in some kind of logistic or probit analysis, and the results were just rolled up for graphing purposes?
How fresh does a poll have to be to be counted? I'm seeing some Senate races with double-digit leads in the poll going the 'wrong' way - are those polls that were stale by the time of the election? (A September poll in an election in November, for example.) Do you use the last poll from a given source, or all polls of that race from, say, Zogby? You could get a decent success measure by looking at the standard errors for these on a firm-by-firm basis.
It really shouldn't happen very often that a poll 'misses' outside of the margin of error. (+/-) 5 points in the last week should be pretty solid with a fair poll, and +/- 10 well nigh determinative.
Posted on November 6, 2006 3:53 PM
Have you considered breaking the data out by "races won" vs. "races lost" and plotting a KS curve (or better yet a ROC)?
Posted on November 6, 2006 4:00 PM
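For readers unfamiliar with the ROC idea, here is a minimal sketch of the single-number summary usually reported with it, the area under the curve. The margins are invented for illustration, and `roc_auc` is a hand-rolled helper rather than a library call.

```python
def roc_auc(margins_won, margins_lost):
    """Chance that a randomly chosen 'Dem won' poll shows a larger Dem margin
    than a randomly chosen 'Dem lost' poll, counting ties as one half."""
    score = 0.0
    for w in margins_won:
        for l in margins_lost:
            if w > l:
                score += 1.0
            elif w == l:
                score += 0.5
    return score / (len(margins_won) * len(margins_lost))

# Hypothetical Dem-minus-Rep poll margins, split by the actual outcome.
won = [8, 5, 2, -1]
lost = [-6, -2, 1, 2]

print(roc_auc(won, lost))
```

A value of 0.5 would mean the poll margin carries no information about the winner, and 1.0 would mean the two groups are perfectly separated.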
Elizabeth Liddle:
That is very beautiful. Thank you.
Posted on November 6, 2006 6:13 PM
Gary Kilbride:
Since the word bet was mentioned I'll jump in.
This is great analysis but the conclusion was reversed, in terms of real-world application, if we're actually talking about wagering. Find a candidate with a 5 point poll edge who is not favored. Or favored by less than the 60-65%, which would equate to slightly less than 1/2. You can't bet on those candidates without giving odds, unless you did it earlier in the cycle. On Tradesports and elsewhere those margins inevitably are bloated in the 4-5 point range, more like 75-85% expectancy.
The idea is to bet on candidates who you expect to lose, or have no opinion, with a comfort level you are taking value prices and over the long run it pays off. Similarly, in sports I literally make hundreds of bets during the year based on pure value, a suddenly out of whack money line allowing let's say +180 (10 to win 18) when the standard and sensible line is +150. The Colts last night over New England is an example. Notice I identify a game that won:)
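The "value price" idea can be put in numbers using the +180 and +150 lines mentioned above. This is a sketch that assumes the +150 line reflects the true chance of winning, which is an assumption for illustration, and it handles positive money lines only.

```python
def breakeven_prob(moneyline):
    """Win probability at which a bet at a positive American money line breaks even."""
    return 100 / (100 + moneyline)

def expected_profit(stake, moneyline, win_prob):
    """Average profit per bet at a positive money line, given a win probability."""
    win_amount = stake * moneyline / 100  # e.g. 10 staked at +180 wins 18
    return win_prob * win_amount - (1 - win_prob) * stake

fair_prob = breakeven_prob(150)      # chance implied by the "sensible" line
offered_prob = breakeven_prob(180)   # chance implied by the out-of-whack line
profit = expected_profit(10, 180, fair_prob)

print(f"fair: {fair_prob:.3f}, offered: {offered_prob:.3f}, "
      f"expected profit on a 10 stake: {profit:.2f}")
```

Under that assumption the bet loses more often than it wins, yet it still shows a positive average profit, which is the point of betting on value rather than on expected winners.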
The all-or-nothing focus reminds me of one obscure and largely unknown wagering opportunity. It's called action points. Instead of paying off on the bottom line winner or loser, a base number is assigned. Then you play a dollar figure, let's say $50 per point, and the payoff or penalty is determined by the final margin in relation to the base. So a pick'em game with a 10 point result nets $500. Or costs $500. Originally there was no cap but now places might limit it to 20 points up or down. It rewards handicapping strength, and lessens the luck factor regarding the inevitable toss-up outcomes.
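The payoff rule described above is simple enough to write down. A minimal sketch, with the function and parameter names chosen here for illustration:

```python
def action_points_payoff(final_margin, base, dollars_per_point, cap=None):
    """Payoff for an action-points bet.

    final_margin: result from the bettor's side (negative if that side lost).
    base: the assigned base number (0 for a pick'em).
    cap: optional limit on the points counted, up or down.
    """
    points = final_margin - base
    if cap is not None:
        points = max(-cap, min(cap, points))
    return points * dollars_per_point

print(action_points_payoff(10, 0, 50))           # pick'em, won by 10
print(action_points_payoff(-10, 0, 50))          # pick'em, lost by 10
print(action_points_payoff(30, 0, 50, cap=20))   # capped at 20 points
```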
In 2000 one offshore outlet, now out of business, offered action points on politics. My friends and I did very well isolating states with historical or trending partisanship that was not fully captured by the base number. For instance, taking Bush in Georgia and Gore in California and New Jersey. Clinton over Lazio in the New York senate race. The one we missed and regretted was Bush in West Virginia. The base number was low and history, of course, said Democratic advantage but the 2000 local issues favored Bush. He won by more than 6%.
The offshore outlets were remarkably petrified in offering political odds this cycle. Bodog.com put up the Connecticut senate race at pick'em immediately post primary and that was a horrible line, otherwise very little available.
For reference purposes, tonight on election eve the Tradesports number is 80% likelihood of Democrats taking the House, with the over/under basically +24 seats.
Posted on November 6, 2006 9:34 PM
Sholom:
Probably a naive question: you wrote that, for example: "By the time we see a 10 point lead in the poll for the Dem, about 90% of the Dems win."
What if we see, e.g., 8 different polls all showing a 10 point lead for one candidate?
Posted on November 7, 2006 12:04 AM
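One way to think about the question above is through sampling error alone. The sketch below assumes the 8 polls are independent simple random samples of 600 respondents each, with every respondent choosing one of two candidates; all of those are assumptions for illustration, not facts about the polls in the post.

```python
import math

def lead_standard_error(lead, n):
    """Sampling standard error of the lead (p1 - p2) in one poll, assuming p1 + p2 = 1."""
    return math.sqrt((1 - lead ** 2) / n)

def averaged_lead_standard_error(lead, n, k):
    """Standard error of the average lead across k independent polls of size n."""
    return lead_standard_error(lead, n) / math.sqrt(k)

single = lead_standard_error(0.10, 600)
averaged = averaged_lead_standard_error(0.10, 600, 8)

print(f"one poll: +/- {single:.3f}, average of 8: +/- {averaged:.3f}")
```

Under these assumptions, agreement among 8 polls shrinks the sampling error by a factor of the square root of 8. If the polls share a common bias, for example in how they identify likely voters, averaging may not remove it, so the real gain could be smaller.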
Alan:
I know you gentlemen are busy, but at some point this graph either needs to be fixed or withdrawn.
The graph is meaningless because we don't know and can't see how many polls each of the points represents. Any individual point could represent one poll, or it could represent 100.
One reader asked if the accuracy of polls for presidential elections is greater than the accuracy for senate elections. That would be a reasonable conclusion, if not for the fatal flaw in this graph. It is possible that the outliers in the senate polls represent only a handful of polls, while the center points represent hundreds of polls. If that is the case, then the conclusion about the accuracy of senate polls would be completely wrong.
It is also possible that the points on the left side of the graph represent only a handful of polls, while the points on the right side represent hundreds of polls. If that is the case, then the correct conclusion is that Democrats must be about 5 points ahead in the polls in order to have a 50-50 chance of winning. From this chart, we don't know if that is the case or not.
In conclusion, this chart is meaningless and possibly misleading. Please either fix it or retract it.
Posted on November 7, 2006 10:30 AM
Plutzer:
Another crucial piece of information is missing: are these all polls reported in the final days of a campaign, final week, or might they include polls taken well ahead of the election? If the former, the results seem at odds with typical levels of accuracy of pre-election polls; if the latter, then it's of little relevance to today's election...
Posted on November 7, 2006 12:48 PM
Geek, Esq.:
Am I correct in assuming that candidate status--incumbent vs. challenger vs. open--isn't accounted for?
That would make for a more interesting graph, methinks.
Posted on November 6, 2006 2:09 PM