

Murray: Are Nate Silver's Pollster Ratings 'Done Right'?

Topics: AAPOR, AAPOR Transparency Initiative, Fivethirtyeight, Nate Silver, Patrick Murray, Poll Accuracy, Polling Errors, Transparency

Patrick Murray is director of the Monmouth University Polling Institute.

The motto of Nate Silver's website, www.fivethirtyeight.com, is "Politics Done Right." Questions have been raised about whether his latest round of pollster ratings lives up to that claim.

After Mark Blumenthal noted errors and omissions in the data used to arrive at Research 2000's rating, I asked to examine Monmouth University's poll data. I found a number of errors in the 17 poll entries he attributes to us - including six polls that were actually conducted by another pollster before our partnership with the Gannett New Jersey newspapers started, one eligible poll that was omitted, one incorrect candidate margin, and even two incorrect election results that affected the error scores of four polls. [Nate emailed that he will correct these errors in his update later this summer.]

In the case of prolific pollsters, like Research 2000, these errors may not have a major impact on the ratings. But just one or two database errors could significantly affect the ratings of pollsters with relatively limited track records - such as the 157 (out of 262) organizations with fewer than 5 polls to their credit. Some observers have called on Nate to demonstrate transparency in his own methods by releasing that database. Nate has refused to do this (with a somewhat dubious justification), but at least he now has a process for pollsters to verify their own data.

Basic errors in the database are certainly a problem, but the issue that has really generated buzz in the polling community is his new "transparency bonus." This is based on the premise that pollsters who were members of the National Council on Public Polls or had committed to the American Association for Public Opinion Research (AAPOR) Transparency Initiative as of June 1, 2010 exhibit superior polling performance. These pollsters are awarded a very sizable "transparency bonus" in the latest ratings.

Others have remarked on the apparent arbitrariness of this "transparency bonus" cutoff date. Many, if not most, pollsters who signed onto the initiative by June 1, 2010 were either involved in the planning or attended the AAPOR national conference in May. A general call to support the initiative did not go out until June 7.

Nate claims that, regardless of how a pollster made it onto the list, these pollsters are simply better at election forecasting, and he provides the results of a regression analysis as evidence. The problem is that the transparency variable fails to meet the conventional threshold for statistical significance (p<.05). In fact, of the three variables in his equation - transparent, partisan, and Internet polls - only partisan polling shows a significant relationship. Yet his Pollster Introduced Error (PIE) calculation rewards "transparent" polls and penalizes Internet polls, but leaves partisan polls untouched. Moreover, his model explains only 3% of the total variance in pollster raw scores (i.e., polling error).
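For readers who want to see the mechanics of such a regression, here is a minimal sketch in Python. All of the data is synthetic - the dummy variables, effect sizes, and noise level are invented for illustration and are not Nate's actual database - but the structure (regressing raw error scores on transparent/partisan/Internet dummies and checking R-squared) mirrors the analysis described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 262  # number of pollsters in the ratings

# Invented dummy predictors: transparent, partisan, and Internet flags
transparent = (rng.random(n) < 0.10).astype(float)
partisan = (rng.random(n) < 0.15).astype(float)
internet = (rng.random(n) < 0.05).astype(float)

# Invented raw error scores: only the partisan flag is given a real effect,
# buried in a lot of noise
raw_score = 0.5 + 0.9 * partisan + rng.normal(0.0, 2.0, n)

# Ordinary least squares: raw_score ~ transparent + partisan + internet
X = np.column_stack([np.ones(n), transparent, partisan, internet])
beta, _, _, _ = np.linalg.lstsq(X, raw_score, rcond=None)

# R^2: share of the variance in raw scores explained by the model
fitted = X @ beta
ss_res = ((raw_score - fitted) ** 2).sum()
ss_tot = ((raw_score - raw_score.mean()) ** 2).sum()
r_squared = 1.0 - ss_res / ss_tot
names = ["intercept", "transparent", "partisan", "internet"]
print("coefficients:", dict(zip(names, beta.round(2))))
print(f"R^2 = {r_squared:.3f}")
```

With predictors this weak relative to the noise, the R-squared comes out small - the point being that a model can "use" a variable without that variable explaining much of anything.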

I decided to run some ANOVA tests on the effect of the transparency variable on pollster raw scores for the full list of pollsters as well as sub-groups at various levels of polling output (e.g. pollsters with more than 10 polls, pollsters with only 1 or 2 polls, etc.). The F values for these tests range from only 1.2 to 3.6 under each condition, and none are significant at p<.05. In other words, there may be more that separates pollsters within the two groups (transparent versus non-transparent) than there is between the two groups.
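A one-way ANOVA of this kind is straightforward to reproduce. The sketch below implements the F statistic directly; the two groups of scores are synthetic draws whose means loosely echo the group means reported in the article's means analysis, with an invented spread, so the resulting F value is illustrative only.

```python
import numpy as np

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of 1-D samples:
    mean square between groups over mean square within groups."""
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Synthetic raw error scores for "transparent" vs. other pollsters;
# the group means are borrowed from the article, the spread is invented
rng = np.random.default_rng(1)
transparent_scores = rng.normal(-0.63, 3.0, 27)
other_scores = rng.normal(0.68, 3.0, 235)
f_stat = one_way_anova_f([transparent_scores, other_scores])
print(f"F = {f_stat:.2f}")
```

When within-group spread is large relative to the gap between group means, the F statistic stays small - exactly the pattern the article reports.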

I also ran a simple means analysis. The average error among all pollsters is +.54 (positive error is bad, negative is good). Among "transparent" pollsters, the average score is -.63 (se=.23), while among other pollsters it is +.68 (se=.28). A potential difference, to be sure.

I then isolated the more prolific pollsters - the 63 organizations with at least 10 polls. Among this group, the 19 "transparent" pollsters have an average error score of -.32 (se=.23) and the other 44 pollsters average +.03 (se=.17). The difference is now less stark.

On the flip side, organizations with fewer than 10 polls to their credit have an average error score of -1.38 (se=.73) if they are "transparent" - all 8 of them - and a mean of +.83 (se=.28) if they are not. That's a much larger difference. Could it be that the real contributing factor to pollster performance is the number of polls conducted over time?
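As a rough cross-check of these three comparisons, one can turn each pair of means and standard errors into an approximate z statistic for the difference. This is a back-of-the-envelope calculation that assumes independent groups and near-normal sampling distributions, not a substitute for a proper test:

```python
import math

def z_for_difference(mean_a, se_a, mean_b, se_b):
    """Approximate z statistic for the gap between two group means,
    treating the groups as independent and combining their standard errors."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Means and standard errors as quoted in the article
z_all = z_for_difference(-0.63, 0.23, 0.68, 0.28)       # all pollsters
z_prolific = z_for_difference(-0.32, 0.23, 0.03, 0.17)  # 10 or more polls
z_sparse = z_for_difference(-1.38, 0.73, 0.83, 0.28)    # fewer than 10 polls
print(f"all: z={z_all:.2f}  prolific: z={z_prolific:.2f}  sparse: z={z_sparse:.2f}")
```

The prolific-pollster gap produces a much smaller z than the sparse-pollster gap, which is consistent with the article's suggestion that polling volume, not transparency, may be doing the work.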

Consider that 70% of "transparent" pollsters on Nate's list have 10 or more polls to their credit, but only 19% of the "non-transparent" organizations have been as prolific. In effect, "non-transparent" pollsters are penalized for being grouped with a large number of colleagues who have only a handful of polls to their name - i.e., pollsters who are prone to greater error.

To assess the tangible effect of the transparency bonus (or non-transparency penalty) on pollster ratings, I re-ran Nate's PIE calculation using a level playing field for all 262 pollsters on the list to rank order them. [I set the group mean error to +.50, which is approximately the mean error among all pollsters.] Comparing the relative pollster ranking between his and my lists produced some intriguing results. The vast majority of pollster ranks (175) did not change by more than 10 spots on the table. On its face, this first finding raises questions about the meaningfulness of the transparency bonus.

Another 67 pollsters moved between 11 to 40 ranks between the two lists, 11 shifted by 41 to 100 spots, and 9 pollsters gained more than 100 spots in the rankings, solely due to the transparency bonus. Of this last group, only 2 of the 9 had more than 15 polls recorded in the database. This raises the question of whether these pollsters are being judged on their own merits or riding others' coattails, as it were.
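The rank-shift comparison described above is easy to mechanize. The sketch below buckets pollsters by how far they moved between two rankings; the pollster names and ranks here are a toy example, not the actual 262-pollster lists.

```python
def rank_shifts(ranks_a, ranks_b):
    """Given two dicts mapping pollster -> rank, count how many pollsters
    moved by <=10, 11-40, 41-100, or >100 spots between the two lists."""
    buckets = {"<=10": 0, "11-40": 0, "41-100": 0, ">100": 0}
    for pollster, rank_a in ranks_a.items():
        move = abs(rank_a - ranks_b[pollster])
        if move <= 10:
            buckets["<=10"] += 1
        elif move <= 40:
            buckets["11-40"] += 1
        elif move <= 100:
            buckets["41-100"] += 1
        else:
            buckets[">100"] += 1
    return buckets

# Toy usage with hypothetical pollsters
with_bonus = {"Pollster A": 1, "Pollster B": 2, "Pollster C": 3}
level_field = {"Pollster A": 5, "Pollster B": 200, "Pollster C": 50}
print(rank_shifts(with_bonus, level_field))
```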

Nate says that the main purpose of his project is not to rate pollsters' past performance but to determine probable accuracy going forward. The complexity of his approach boggles the mind - his methodology statement contains about 4,800 words including 18 footnotes. It's all a bit dazzling, but in reality it seems like he's making three left turns to go right.

Other poll aggregators use less elaborate methods - including straightforward means - and have been just as, or even more, accurate with their election models (see here and here). I wonder if, with the addition of this transparency score, Nate has taken one left turn too many.



Great article. I'm glad that someone smarter than me about polling analysis could put into words what I have been trying to say about Nate's methodology. As pointed out here, that "transparency bonus" appears to be much too powerful, with nothing to justify it.

I still have seen no justification for the "transparency bonus" based on actual data. It appears to be a "gut feeling factor". Nate seems to be of the opinion that being a member of the AAPOR club will lead to more accurate polls in the future. That's opinion - not fact. He shouldn't be weighting polls based on his opinion.


Field Marshal:

Mark, are there any plans for pollster.com or nationaljournal to do their own polling organization rankings?


Harry Enten:

I cannot speak for Mark or the Pollster.com banner, but I can say that I (along with two others) have been working on a system that does evaluate "pollster accuracy" in the broad sense... The purpose of such a scheme is not to grade pollsters, but to create a more accurate snapshot of where the election stands.



I think the public has given Obama a rotten deal. There is never any rhyme or reason for why his approval goes down. It's as if he fails to make the greatest speech of his life, and suddenly people think he isn't doing a good job.

Remind me never to say I approve of any Republican president in my lifetime the next time we have a crisis. America rallied behind George W. when we were attacked. Nobody said, "Why didn't Bush prevent the terror attacks?" With Obama we have had 2 possible terror attacks, one of them from a US citizen in NYC, and people raise hell and act as if he has practically been behind the attacks himself.

The center-right folks never give Obama the benefit of the doubt the way the center-left tried to give George W. the benefit of the doubt in his first two years. George H. W. Bush had an oil spill in Alaska that took 20 years to get settled. I never remember people complaining about him. He actually didn't want offshore drilling, and people continued to support him.

I think it is sad what our nation is coming to. I hope and pray it isn't racially based. When the righties on this web site think we are trying to win votes by implying the GOP is racist, that simply isn't true. I would give anything to live in a society where race and politics didn't play such a big role. It is something I hope isn't true.

I don't want to generalize, but there are some clear examples. What Steve King of Iowa said - that "Obama cares for Black people first and everyone else next" - was an outright lie. Obama gets criticized every day for not spending enough time in African American inner cities. I hear they think he is out of touch with them. I mean, his home city has an appalling rate of children gunned down, but Obama leaves it up to the state and city to take care of enforcing the law there.

Obama has spent most of his time as president with White people: in Europe, on the Gulf, and with our Senate, which has one African American member. Limbaugh and others who think Obama is a reverse racist are simply scumbags and liars, whom unfortunately many Americans believe like a religion.

I can only hope things will get better. It is truly a sad state of affairs what our nation has become.



Any system that tries to predict future performance instead of just describing past performance is going to make larger adjustments to the ratings of organizations that have conducted only a small number of polls. The issue can't go away: Was that single great prediction just a matter of luck or was it really a sign of skill?

In the end, the proof is in the pudding. The quality of the rating system must be judged by how well it predicts the accuracy of pollsters in the NEXT election. Of course, it is very likely that the quality of Nate's rating system will itself look better for prolific pollsters than for "boutique" pollsters - at which time he will investigate the nature of the intangibles that made the good ones better. Thus, I expect the system will improve over time. Certainly Mark's suggestion to let pollsters review their own data has made it better already.



@Harry Enten, who wrote: I cannot speak for Mark or the Pollster.com banner, but I can say that I (along with two others) have been working on a system that does evaluate "pollster accuracy" in the broad sense... The purpose of such a scheme is not to grade pollsters, but to create a more accurate snapshot of where the election stands.

In case you are new to the game, this is exactly the purpose of Nate's pollster ratings and has been so since 2008. Welcome to the effort.

I think this is a move in the right direction toward greater accountability by the pollsters. Mark Blumenthal has been talking about this for several years, and the efforts by AAPOR have been a positive step. Pollster itself has published quite a few comments/articles on the "outlying" Rasmussen polls. But they haven't taken a larger step of weighting or adjusting polls for accuracy when they calculate their long-range trendlines. I wish they would.


Harry Enten:

Google my name... You'll see if I'm "new to the game". http://poughies.blogspot.com/2008/09/obama-set-to-rout-mccain.html

The fact of the matter is that Nate's system seems to have a dual purpose... One is to predict future performance, and the other is to review past performance.

As Patrick Murray points out, it is somewhat difficult to tell what exactly is going on here.

The purpose of our system will be SOLELY to give people a better idea of where the election stands right now... We're not interested in ranking pollsters and throwing the rankings out there, because any measure of accuracy is difficult at best and impossible at worst. If nothing else, it is quite arbitrary.

More later.



@Patrick Murray: Maybe you missed Nate's column from June 16th in which he said that he has already been receiving corrections and additions to his database from pollsters and has invited all pollsters to participate in this process.

I think this is a positive result of doing rating systems. My guess is that it will also increase the number of polls in his database, since there is no single compilation of such polls (or the main results that are used in calculating accuracy), either at Pollster or any place else. And what is published or available in newspapers or at various websites sometimes has errors and misprints. So asking the pollsters to check their own data is a good idea.

An ideal situation would be if, in response to Mark's efforts via AAPOR or just more directly via Pollster, all researchers had a "fully disclosed" common database of polls, instead of each analyst trying to pull them together from a dozen different sources, some "public" (i.e., published) and some "private" (or at least behind a paywall).

"Full disclosure" would involve following the AAPOR standards that Mark has talked about. We are far from this situation right now. And as a result we ended up with the fiasco of the likely fraudulent Strategic Vision, Inc. polling.



"Openness", "full disclosure", AAPOR, blah, blah, blah. There are different polls for different purposes. If a pollster chooses to manipulate a poll, he will find a way to manipulate it - openness or not.

Will AAPOR decide whether polls should be of all adults, registered voters, or likely voters? Will they create a standard set of party-identification figures so that all pollsters weight their polls the same way?

Will AAPOR determine the proper method for measuring approval/disapproval? How do you even determine accuracy on an approval poll?

Nate Silver has taken something simple and created a Rube Goldberg machine to magically make the raw data better than it is.

I hope pollster.com doesn't start weighting polls. The reason I like the charts on this site is that if you spend enough time with them, you find which pollsters are consistent and dependable by slicing and dicing the results.



hoosier_gary - As I mentioned in response to a prior comment, apparently he justifies his decision to provide the "bonus" by some analysis of his data showing that members (as of June 1) were more likely to remain good if they were good, and to improve if they had been poor. (And please don't aggrandize your own opinion, as what this article says about the "bonus" is not how you're trying to spin it.)

It would be nice if Silver released that analysis, since it'd be an interesting result of itself, but at the very least there's a claim of justification there.


I don't at all understand why the author of the above article finds the complexity to be detrimental. Occam's Razor does favor simpler models when they perform equally well, but we still need to show that the models do perform equally well. (And I don't see how he judges the simpler models to have been "just as, or even more, accurate" when this one hasn't even been put into use yet, and that still says nothing about precision.) It would be nice if Silver did the work to show us that simpler models aren't as predictive (by, say, releasing an array of "hindcasts" for 2008 by training on pre-2006 data), but other people can certainly also try to show that simpler models work just as well by doing the same thing, which they really haven't (yet).

I do think there's starting to be an argument that "transparency" is actually just a proxy for some other metric that would be more intuitive, and that it'd be better to figure out what that might be. I hope Silver looks into suggestions like replacing the binary categorization of "transparency" with some fit based on how prolific the pollster is over all time, or how large the companies are, or how many interviews they conduct per year, or the like, and regressing the pollster's expected PIE on those fits. It surely sounds promising.

