Pollster.com


Column on NCPP/AAPOR Effect & Answering Silver

Topics: Accuracy, David Shor, Fivethirtyeight, Nate Silver, Poll Accuracy

My column for this week follows up on last week's topic from a different angle: Nate Silver's intriguing finding that, as a group, pollsters that are members of the National Council on Public Polls (NCPP) or that endorsed the worthy Transparency Initiative of the American Association for Public Opinion Research (AAPOR) appear to be more accurate in forecasting election outcomes than other pollsters. While I'd like to see more evidence on this issue, it is definitely a topic worth further exploration. I hope you click through and read it all.

And yes, this has been the fourth or fifth item from me on Silver's ratings in a week, with two more from guest contributors, so it's time for me to move on to other subjects. However, since Nate responded yesterday, I want to clarify two things about my post on Friday:

First, he quarrels with my characterization of his effort to rate polls "from so many different types of elections spanning so many years into a single scoring and ranking system" as an "Everest-like challenge":

Building the pollster ratings was not an Everest-like challenge. It was more like scaling some minor peak in the Adirondacks: certainly a hike intended for experienced climbers, but nothing all that prohibitive. I'd guess that, from start to finish, the pollster ratings required something like 100 hours of work. That's a pretty major project, but it's not as massive as our Presidential forecasting engine, or PECOTA, or the Soccer Power Index, all of which took literally months to develop, or even something like the neighborhoods project I worked on with New York magazine, which took a team of about ten of us several weeks to put together. Nor is anything about the pollster ratings especially proprietary. For the most part, we're using data sources that are publicly available to anyone with an Internet connection, and which are either free or cheap. And then we're applying some relatively basic, B.A.-level regression analysis. Every step is explained very thoroughly.

The point of my admittedly imperfect Everest metaphor was not that Silver has attempted something that requires a massive investment of time, money or physical endurance, but rather that the underlying idea is ambitious: using a series of regression models to combine polls from 10 years and a wide variety of elections, from local primaries to national presidential general elections, fielded as far back as three weeks before each election, with statistical controls meant to level the playing field so that all pollsters are treated fairly.

I am not an expert in statistical modeling, but when I ask those who are, they keep telling me the same things: Nate's scoring system is based on about four different regression models (only one of which he has shared), and he provides neither standard errors for the scores (so we can better understand their level of precision) nor the results of sensitivity testing (to show what happens when he varies the assumptions slightly -- do the results change a little or a lot?). If there is "nothing especially proprietary" about the models, then I don't understand the reluctance to share these details.
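To make the request for standard errors concrete, here is a minimal sketch of one common way to attach a standard error to an average-error score: the bootstrap. It is not Silver's method, which has not been published in full. The function name `bootstrap_standard_error` and the error values are hypothetical, chosen only for illustration.

```python
import random
import statistics

def bootstrap_standard_error(errors, resamples=2000, seed=0):
    """Estimate the standard error of a mean-error score by resampling.

    `errors` is a list of per-poll errors (in points) for one pollster.
    Each resample draws len(errors) values with replacement and records
    the mean; the spread of those means approximates the standard error.
    """
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.choices(errors, k=len(errors)))
        for _ in range(resamples)
    ]
    return statistics.stdev(means)

# Hypothetical per-poll errors for a pollster with only five scored polls.
hypothetical_errors = [1.0, 4.0, 2.5, 0.5, 3.0]
score = statistics.mean(hypothetical_errors)
se = bootstrap_standard_error(hypothetical_errors)
print(f"score = {score:.2f} points, bootstrap SE = {se:.2f} points")
```

A score reported alongside its standard error lets a reader judge whether two pollsters' scores differ by more than the uncertainty in either one. Rerunning the same calculation under slightly different scoring assumptions would be a simple form of the sensitivity testing described above.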

Second, I will concede that my headline on Friday's post -- "Rating Pollster Accuracy: How Useful?" -- was an attempt to be both pithy and polite that may have implied too broad a dismissal of the notion of rating pollster accuracy. I do see value in such efforts, as I tried to explain up front, especially as a means of assessing polling methods generally and new technologies in particular. SurveyUSA, for example, has invested much effort into their own pollster scorecards over the years as a means of demonstrating the accuracy of their automated survey methodology in forecasting election outcomes. That sort of analysis is highly "useful."

And I also agree, as Berwood Yost and Chris Borick wrote in their guest contribution last week, that individual pollster ratings offer the promise of "helping the public determine the relative effectiveness of polls in predicting election outcomes [that] can be compared to Consumer Reports." The reason why I have found past efforts to score individual pollsters not very useful toward that end is that it's difficult to disentangle pollster-specific accuracy from the loud noise of random sampling error, especially when we have only a handful of polls to score. And as I wrote in December 2008, very small changes in the assumptions made for scoring accuracy in 2008 produced big differences in the resulting rankings. Except for identifying the occasional clunker, efforts to rate the most prolific pollsters usually produce little or no statistically meaningful differentiation. So in that sense, they have not proven very "useful."
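The sampling-noise problem can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real pollster: every simulated pollster is equally skilled (no house bias), each poll samples 600 respondents in a 50-50 race, and sampling error follows a normal approximation. The function names and the defaults are hypothetical.

```python
import math
import random
import statistics

def poll_error_points(sample_size, rng, true_share=0.5):
    """Absolute error, in points, of one poll under pure sampling noise.

    Assumes a normal approximation to the sampling distribution of a
    candidate's share; no pollster-specific bias is modeled.
    """
    se = math.sqrt(true_share * (1 - true_share) / sample_size)
    return abs(rng.gauss(0.0, se)) * 100

def score_spread(num_polls, sample_size=600, trials=500, seed=0):
    """Standard deviation of average-error scores across identical pollsters.

    Each trial is one pollster scored on `num_polls` polls. Because all
    pollsters are equally accurate by construction, any spread in their
    scores is due to chance alone.
    """
    rng = random.Random(seed)
    scores = [
        statistics.mean(poll_error_points(sample_size, rng) for _ in range(num_polls))
        for _ in range(trials)
    ]
    return statistics.pstdev(scores)

print("spread with 5 polls per pollster: ", round(score_spread(5), 2))
print("spread with 50 polls per pollster:", round(score_spread(50), 2))
```

Under these assumptions, the scores of identical pollsters vary much more when each is judged on a handful of polls than when each is judged on dozens, which is why rankings built on small samples can reshuffle with minor changes in method.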

That said, as Nate argues, the challenges may be surmountable. I'm confident that other smart statisticians will produce competing ways of assessing pollster performance, and when they do, we will link to and discuss them. David Shor's effort, posted just last night, is an example with great promise.

Update: John Sides weighs in on the usefulness of pollster ratings.

 

Comments
Huda:

Interesting assessment; who knew I'd understand stats beyond my uni studies. Personally, I think polling accuracy is only important insofar as it gauges the public's sentiments and opinions on issues that directly affect them, and of course during election season.

quote: "individual pollster ratings offer the promise of 'helping the public determine the relative effectiveness of polls in predicting election outcomes [that] can be compared to Consumer Reports.'"

What if some of these individual polls are more interested in creating and driving a particular political narrative, or in shaping a public opinion that does not reflect the majority of the public, or reality for that matter?

Consumer Reports is completely different from a given pollster who does phone or internet interviews at a given time, sampling people based on their political affiliation. I would argue that CR is based on reporting and results from its in-house testing laboratory, not something so temporal that it changes as other conditions change: the economy, war, security, etc.

____________________

Patrick Murray:

Thanks Mark. David Shor's analysis promises to be most compelling -- and definitely "useful" in evaluating pollsters' work.

____________________


