

NCPP's Report on Pollster Performance

Topics: Accuracy, Pollsters

Yesterday, the National Council on Public Polls (NCPP) posted its biennial review of poll performance as a three-part set of PDF documents: tables scoring the final polls from each pollster at both the national and statewide levels, and a top-line analysis (full disclosure: we provided NCPP with a database of the general election polls we logged here at Pollster.com, although we had no involvement in their analysis).

Historically, NCPP has focused their analysis on the national polls. Here are their main conclusions on the performance of the national polls in 2008:

In terms of Candidate Error, the average is less than one percentage point (0.9), whether the pollster chose to allocate undecided voters at the end or not. That is the same as the 0.9 percentage point error reported by NCPP for this analysis in 2004. It is slightly less than the 1.1 percentage point average in 2000. In 2008, estimated errors ranged from 0.1 to 2.4 percentage points.

Thus, despite widely discussed concerns such as the growing size of the cell-phone-only population (and, this year, the possibility of a repeat of the Bradley/Wilder effect), there was no change in the average poll error.

NCPP is a consortium of media pollsters, and as such concentrates on evaluating the performance of the polls (plural) rather than on rating or ranking individual pollsters or methodologies. So while the report has some useful data for making year-to-year, industry-wide comparisons, it will likely frustrate those trying to find a "best" or "worst" pollster.

That said, any thorough effort to rank the pollsters, separating "good" from "bad" on the basis of the accuracy of the last poll, is bound to frustrate for reasons the NCPP report identifies in an easily overlooked, next-to-last paragraph:

No method of judging the error works perfectly. Other evaluations of poll performance based on other methods may produce different conclusions.

The NCPP report includes two methods of measuring poll error that differ slightly from the eight measures first proposed in 1948 by the renowned Harvard statistician Frederick Mosteller in his chapter of the Social Science Research Council's report on the polling failures of that year (and still used by many who score pollster error). The NCPP measures also differ from the odds-ratio scoring proposed three years ago in the pages of Public Opinion Quarterly by Martin, Traugott and Kennedy. I have looked at state-level pollster error using some of these methods, and can confirm that different methods can and do produce different rankings for 2008.
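To make the distinction concrete, here is a rough sketch of two of the better-known Mosteller-style measures: average absolute error per candidate, and error on the margin between the top two candidates. The function names and the poll numbers are hypothetical illustrations, not NCPP's or Mosteller's exact computations.

```python
# Illustrative sketch of two Mosteller-style accuracy measures.
# The numbers below are hypothetical -- not NCPP's actual data.

def candidate_error(poll, result):
    """Roughly Mosteller's Measure 5: mean absolute error, in
    percentage points, across the candidates being scored."""
    return sum(abs(poll[c] - result[c]) for c in result) / len(result)

def margin_error(poll, result, a, b):
    """Roughly Mosteller's Measure 3: error on the margin
    separating the top two candidates."""
    return abs((poll[a] - poll[b]) - (result[a] - result[b]))

# Hypothetical final poll vs. official result (percentage points).
poll = {"Obama": 51.0, "McCain": 44.0}
result = {"Obama": 52.9, "McCain": 45.6}

print(candidate_error(poll, result))  # mean of 1.9 and 1.6 -> 1.75
print(margin_error(poll, result, "Obama", "McCain"))  # |7.0 - 7.3| -> 0.3
```

Note how a pollster can look quite accurate on the margin while showing a larger per-candidate error, or vice versa, which is one reason different scoring methods rank pollsters differently.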

The reason is that four factors can affect the size of the error scores, especially when aggregated for any given pollster, and these are not comparable across organizations:

1) The number of polls conducted - Generally, if we average errors across multiple polls, those who do more polls should show lower average errors by the logic of regression to the mean. Any one poll can produce a large error by chance, but as we average more and more surveys, the average errors should generally be lower (there are exceptions).

2) The number of interviews for any given poll - More interviews should mean less random error, and different pollsters use different sample sizes. The sample sizes for some individual pollsters can also vary widely from state to state. So if we aggregate errors across pollsters, some will do better simply because their sample sizes are bigger.

3) How the scoring handles or interprets the "undecided" category - In general elections, "undecided" is not a choice on the ballot, so any reported undecided is an error, in a sense. What complicates the analysis is that some pollsters allocate undecided voters on their final poll and some do not. Some error scores effectively ignore the undecided (either by allocating or by focusing on the margins separating the candidates), while some scores penalize pollsters that leave undecideds unallocated. This issue remains a matter of considerable, unresolved debate among pollsters.

4) The lag between the dates of interviewing and the election -- A longer delay between the field dates and election day creates a greater potential for error due to last minute shifts in voter preferences. Those that field late have an inherent advantage over those that conclude earlier, although the size of any such advantage in any given election is debatable and hard to evaluate. And ignoring all the polls that came before "the last poll" opens the possibility of a misleading measure, especially when polls do seem to converge around a common mean on the last round of polls (at least they did in 2008, see our posts here, here, here and here).

All of these are reasons why we have been cautious (so far) in producing a "best-to-worst" ranking of individual pollsters for Pollster.com. A few weeks ago, Mark Lindeman and I ranked pollsters based on their statewide surveys using 12 different scores and time frames (don't bother searching, as we have not yet posted these online).** Even when we narrowed the list to the 15 or so organizations that produced at least five "final" poll results in statewide contests, we found seven different pollsters ranking 1st or 2nd at least once, five ranking lowest or second-lowest at least once, and three that ended up in both categories (best and worst) at least once. And none of these rankings controlled for the number of polls conducted or the sample sizes used, that is, none ranked each pollster against the standard of how well it should have done.

The NCPP report takes a first stab at that sort of analysis by comparing what they call candidate estimate error to one half of the margin of error. "A total of 53 of the state polls," the report tells us, "or 12.8 percent had results that fell outside of the sampling margin of error for that survey."*** Given that the margin of error is based on a 95% level of statistical confidence, if the surveys (and these comparisons) were perfect, we would expect only 5% of the results to fall outside the margin of error. Caveat: they arrive at this statistic by calculating the error on the margin predicted by the poll, dividing that number by two (to get an estimate of the error for each candidate) and comparing it to the reported margin of error for that poll.
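As I read the report's description, the check works like this (my paraphrase of their method, with hypothetical poll numbers for illustration):

```python
def outside_moe(poll_margin, actual_margin, reported_moe):
    """Paraphrase of the NCPP-style check described above: take the
    error on the poll's predicted margin, halve it to approximate the
    per-candidate error, and compare that to the reported margin of
    error. Returns True if the poll falls outside its own MOE."""
    candidate_error = abs(poll_margin - actual_margin) / 2.0
    return candidate_error > reported_moe

# Hypothetical poll showing Obama +4 with a +/-3 MOE; actual result +7.
print(outside_moe(4.0, 7.0, 3.0))   # per-candidate error 1.5 -> False
# Hypothetical poll showing Obama +14 with a +/-3 MOE; actual result +7.
print(outside_moe(14.0, 7.0, 3.0))  # per-candidate error 3.5 -> True
```

Applied across all the state polls, counting the share of True results gives the 12.8 percent figure the report cites, against the roughly 5 percent we would expect from sampling error alone.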

Do some pollsters do better than others when judged by that standard? I will try to assess that in my next post.

**I haven't posted those scores, mostly because the endless number of tables adds up to no obvious conclusion. I'm willing to post those tables, in all their glorious and confusing detail, if readers demand it. But I would much rather try to find ways to evaluate pollsters that attempt to control for the four factors listed above. As always, readers' suggestions are welcome.

***When I wrote this post, the links on the NCPP web site pointed to earlier drafts of the tables and analysis that were not based on the final results in each state.  As a result, the original version of my entry quoted an earlier computation of the percentage of polls falling outside the sampling margin of error, which I have now corrected. 



I'm not sure it's obvious that you should be controlling for #2. If a pollster tends to do polls with larger samples, perhaps they should in fact be ranked as more trustworthy?

I think (but I'm not certain) that Nate Silver subtracts the expected variance from the observed variance to produce a "pollster-introduced error" number.
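That idea, whatever Silver's actual formula may be, can be sketched as follows: compare the variance actually observed in a pollster's errors with the variance you would expect from sampling alone, and attribute the excess to the pollster. Everything here, including the function name and the sample numbers, is a hypothetical illustration.

```python
import math

def pollster_introduced_error(errors, sample_sizes, share=0.5):
    """Illustrative sketch (not Nate Silver's actual method): observed
    mean squared per-candidate error (in points) minus the average
    sampling variance expected under simple random sampling, returned
    as a root-mean-square 'excess' error."""
    observed = sum(e * e for e in errors) / len(errors)
    # Sampling variance of a candidate share, in percentage points squared.
    expected = sum(100.0**2 * share * (1 - share) / n
                   for n in sample_sizes) / len(sample_sizes)
    excess = max(observed - expected, 0.0)
    return math.sqrt(excess)

# Hypothetical pollster: per-candidate errors of 2, 3, and 1 points
# on polls of 800, 1000, and 600 respondents.
print(pollster_introduced_error([2.0, 3.0, 1.0], [800, 1000, 600]))
```

The appeal of this approach is that it answers the commenter's objection directly: a pollster is judged only on error beyond what its own sample sizes predict.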



I'm more concerned with the house effect trend over time, not on the last poll. It's clear that some pollsters adjust their methods late in an election cycle in order to come in line with the average of polls.

Maybe it would be interesting to do the house-effects measurement on polls from around October 4th, scored against the actual election result. The race obviously changed over those final weeks, so those polls should be off; but seeing which pollsters moved the most between then and the last poll, and whether they moved towards the consensus in general, would be telling. That might help to expose the fudge.

I also noticed that Rasmussen seemed to move away from the consensus late in the cycle in certain states that the McCain campaign claimed were in play and vital to win, i.e. Pennsylvania, Florida and Colorado. This trend started in October, exactly when the McCain campaign leaked its big pickup-and-hold strategy with PA at the top. Mix this with their propensity to write headlines favorable to Republicans and to test poll questions that read like conservative talking points.

I did some quick math just after the election and it seemed like Rasmussen had about a 2 point Republican skew in their final polls just like they did in 2006.

I fear that like news, polling is becoming a mouthpiece for partisanship and propaganda. Those in the industry need to fight back at these dishonest organizations and discredit them, especially since some of them are trying to steal credit.


Mark in LA:

"I fear that like news, polling is becoming a mouthpiece for partisanship and propaganda."

If their polls don't match up with the final results, their credibility will suffer, and there ain't much left to market if you are a pollster with no credibility. Even Fox understands this when it comes to polling.



Let me start with a little full disclosure on my part. Politically, I am on the right side of the aisle. I tend to think of myself more as conservative than Republican. While I had 20 some hours of math in college, statistics is definitely not my strong suit. Those who write the articles on this site, and even most who post comments, are way better at that sort of thing than I am. So call me an interested observer.

I have to agree with Mr. Brambster's conclusion. In my admittedly amateur opinion, it looks like many polling organizations are engaged in outright propaganda.

Let's assume that after all the votes were counted the actual margin between the two major candidates was 7%. I looked at numerous web sites, and found no two with the same number. In general, they run from 6.7% to 7.2%. I believe the NCPP analysis used 6.7%. The exact number is not really that important.

In my opinion, Sen. Obama widened his lead in the final weeks of the campaign to arrive at that final margin of about 7%. From what I saw on the news, if any of it is to be believed, the Democrats out-GOTV'ed the Republicans this time. And, supposedly, it wasn't close. Another thing that I believe contributed to Sen. Obama widening his lead in the last few days was the infomercial. Let's give credit where it is due. That was extremely well done.

Putting all of the above together, it looks to me like for most of October the Obama lead had to be somewhere near or below the 7% final margin. The trend line on this site shows both candidates picking up support as the undecideds came around, but no significant change in the gap between the two.

Which brings me around, finally, to "house effect" or even outright propaganda. While I can't explain it in technical terms as well as most others on this site, I am aware of MOE, sample size, and other contributing factors. The numbers came from this web site. I started to list a bunch of them, but why? If you are really interested, I just told you where you can find them.

Rep bias:

Rasmussen does indeed look to have a Rep bias. I would not argue with 2-3 points. It was never a 3 point race a week before the election. While they may typically be within MOE, when they are off it is always in favor of Reps. Others that appear to have similar numbers include GWU, ARG, and IBD/TIPP.

Dem bias:

For ABC/Post I would make the same argument as for Rasmussen, except in the Dems' favor. Same for Gallup and Marist.

BIG Dem bias:

CBS/Times consistently had a double digit Dem lead. Obvious bias, more than any 2-3 points. Their poll releases should have all ended with "I'm Barack Obama and I approve this message." Similar results for Newsweek, Pew, and except for the last week or so the Daily Kos.

No bias:

Numbers typically look like they are within MOE and tend to randomly vary.
CNN (which surprised me, I thought they were a bunch of libs)
Democracy Corps
FOX (Another surprise, I thought they were right of me)
NBC/WSJ (If FOX's numbers are OK, these guys are too)

Balance the scales and it looks like aggregate numbers are biased in the Dems favor.

Again, I don't claim these results to be any kind of statistical analysis. Even I know the number of polls by each pollster is too small a sample for that. I'm just giving it the old "if it walks like a duck and quacks" test. Feel free to disagree.


Mark Lindeman:

BigMike: Nothing wrong with trying to look closely at the results without any high-tech statistical analysis. I don't think you've really supported your hypothesis about "outright propaganda," however.

Leip currently gives Obama a 7.24% margin, and the pollster.com mashup (default sensitivity) had Obama +7.6%, so if there is an aggregate bias, it is subtle. But individual pollsters can have house effects for any number of reasons. Personally, I generally opt out of the accusations of deliberate bias in either direction.

Just to linger on one of your examples, Pew had two huge Obama numbers (+14, +15), and then their final poll had Obama +6. Respecting Pew's staff and work as I do, I'm inclined to conclude that it was pretty hard to get the likely voter screen right in advance this year. (Some would say that actually, the pollsters work hard to get the screens wrong, because massive vote suppression or fraud reduces Democratic vote share, and the pollsters want to match the tampered returns as closely as possible.)

