SurveyUSA Report Cards: A Correction

Topics: 2008 , ARG , IVR , Mason-Dixon , Pollsters , SurveyUSA

A week ago, I linked to two new pollster "report cards" prepared by SurveyUSA (one for all pollsters, one for the 14 most active this year), based on average accuracy scores for all pollsters that have released presidential primary surveys this year. I included a few paragraphs to try to add some perspective, both on these specific report cards and the subject of measuring pollster accuracy in general. I did not intend to be dismissive of SurveyUSA's work nor their generally excellent performance both this year and in prior years, although I can understand why some may have read it that way. Regardless, SurveyUSA's Jay Leve has posted a lengthy response worthy of further comment.

First, and most important, Leve's post highlights an error that I need to correct. I wrote:

SurveyUSA bases their ranking on one particular measure of polling error, which compares the margin between the percentages received by the first and second place finishers on election day to the margins as reported for the same two candidates on the final poll. There are other measures of poll error (SurveyUSA has posted a paper they authored that reviews eight such measures). Those critical of SurveyUSA will note that they typically report very small percentages for the "undecided" category, so they tend to do better on their measure of choice (Mosteller 5) which does not reallocate undecided voters [emphasis added].

The words in italics are not correct, at least according to the data that SurveyUSA includes on an interactive spreadsheet posted on their web site that summarizes head-to-head accuracy comparisons against other pollsters over the last five years. That spreadsheet shows that, if anything, the opposite is true: SurveyUSA tends to do a little worse relative to other pollsters on the Mosteller 5 measure than it does on other measures. I have corrected the original post, and I apologize for the error.
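For readers unfamiliar with the measure, the margin comparison described in the quoted passage can be sketched in a few lines of Python. This is only an illustration of that description, not SurveyUSA's actual scoring code: the function name `margin_error` and all of the percentages below are hypothetical. Note that undecided voters are simply left out of the calculation rather than reallocated to the candidates.

```python
def margin_error(poll, result):
    """Absolute error on the margin between the top two finishers.

    `poll` and `result` map candidate names to percentages. Undecided
    voters in the poll are ignored, not reallocated.
    """
    # Identify the first and second place finishers from the actual result.
    first, second = sorted(result, key=result.get, reverse=True)[:2]
    actual_margin = result[first] - result[second]
    # Compute the margin for the same two candidates in the final poll.
    poll_margin = poll.get(first, 0.0) - poll.get(second, 0.0)
    return abs(poll_margin - actual_margin)


# Hypothetical election result and final poll (8 points undecided).
result = {"A": 52.0, "B": 45.0, "C": 3.0}
poll = {"A": 46.0, "B": 44.0, "C": 2.0}

print(margin_error(poll, result))  # 5.0
```

In this made-up case the poll showed a 2-point margin and the election produced a 7-point margin, so the error score is 5 points, regardless of how many respondents were undecided.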

SurveyUSA is understandably sensitive to slights from the "traditional 'headset operator' telephone pollsters," who according to Leve, "have worked for 16 years to mock and marginalize the innovative work done by SurveyUSA." While there is some truth to that characterization, I hope readers will appreciate that I have not been among the "mockers." In fact, I took to the pages of Public Opinion Quarterly, the most respected journal of survey methodology, to advise that while "healthy skepticism is appropriate . . . a reflexive rejection of IVR as 'theoretically unsound' seems unwarranted." In the same article I quoted from a paper by an academic methodologist (Joel Bloom, now of SUNY-Albany), showing that SurveyUSA had "'performed at roughly the same level as other nonpartisan polling organizations in 2002,' though it did 'somewhat better' on 'most measures.'"

While it was unfair of me to imply that SurveyUSA "cherry-picked" (as Leve put it) a favorable measure for their 2008 report card, the issue of how the various measures of polling error handle the "undecided" category is important and may have implications for where some pollsters rank. That issue is the underlying theme of the paper on such measures that SurveyUSA linked to in their scorecard post. For the record, that paper makes the case that three other Mosteller measures (Mosteller 3, 4 and 6, but not Mosteller 5) should theoretically benefit a pollster with low undecided voters, and concludes by arguing for a new measure that "rewards the pollster whose estimate is not just the most precise, but whose numbers leave him/her the least amount of wiggle room." For their 2008 report card, however, SurveyUSA picked a measure that is typically tougher on them than the others available, and they deserve credit for that decision.

Aside from the issue of how to measure error, however, there are some additional issues still worth discussing. For example, Leve does step up and suggest at least one way to determine statistical significance from their error comparisons, but it is limited. I had raised the issue of how to identify "statistically meaningful" differences on a pollster scorecard because, to be perfectly honest, we have been discussing how to best create our own scorecard and provide appropriate guidance.

In his response, Leve points to their "Interactive Election Scorecard," a spreadsheet that (among other things) computes the odds of SurveyUSA besting their competitors over the five years of comparisons included therein. Unfortunately, the spreadsheet is not set up to allow for similar comparisons among other pollsters or (as far as I can tell) for comparisons filtered for individual election years. The 2008 report card tells us, for example, that Mason-Dixon has an average error score of 8.26 on 19 polls while ARG has a score of 8.50 on 20 polls. It tells us that SurveyUSA had an average error of 4.50 on 22 polls, while Gallup had an average error of 4.60 on 2 polls. Are those differences statistically meaningful? The point of these examples, by the way, is not to trash the SurveyUSA report card but to underscore that these are tricky questions.
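One simple way to approach the "statistically meaningful" question is a permutation test on the per-poll error scores. The sketch below is an assumption about how such a check could be set up, not the method used in SurveyUSA's spreadsheet: the function name `permutation_p_value` is hypothetical, and it needs each pollster's individual poll errors, which the report card averages do not provide. It also treats polls of different contests as interchangeable, which is itself debatable.

```python
import random


def permutation_p_value(errors_a, errors_b, n_iter=10000, seed=0):
    """Two-sided permutation test for a difference in mean error.

    Returns the share of random relabelings whose gap in average error
    is at least as large as the gap actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n_a = len(errors_a)
    observed = abs(sum(errors_a) / n_a - sum(errors_b) / len(errors_b))
    pooled = list(errors_a) + list(errors_b)
    hits = 0
    for _ in range(n_iter):
        # Shuffle the pooled errors and split them into two groups
        # of the original sizes, as if pollster labels were arbitrary.
        rng.shuffle(pooled)
        group_a, group_b = pooled[:n_a], pooled[n_a:]
        gap = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        if gap >= observed:
            hits += 1
    return hits / n_iter
```

A value near 1 means the observed gap is the kind of difference that random relabeling produces all the time; a small value suggests the gap is unlikely to be chance alone. With only two polls for one pollster, as in the Gallup example, a test like this has very little power to detect anything.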

The issue of timing -- which Leve promises to address in the future -- remains important. In my post, I wrote that the SurveyUSA report card is based:

[O]n the last poll conducted by each organization. Typically, surveys get more accurate as we get closer to election day, and the polls conducted a week or more before the election tend to be at a disadvantage when compared against those from organizations like SurveyUSA that typically continue to call right up until the night before the election. You can decide whether that issue is a "bug" in the report card or a critical "feature" in SurveyUSA's approach to pre-election polling.

I realize, in retrospect, that my argument and language were a little too glib. First, while polls generally tend to get more accurate as election day approaches, I do not know for certain that SurveyUSA has a meaningful advantage on these accuracy scores because they do more late polling. I can certainly think of specific races in which they have had such an advantage, but those are anecdotes. We have a still unresolved empirical question here as to how much of SurveyUSA's relative accuracy accrues from polling a bit later in the process than many of their competitors.

Let's assume for the sake of argument that SurveyUSA tends to score higher on accuracy measures because they field more polls later. One conclusion would be that their methodology -- which involves very short questionnaires and the ability to make a lot of calls for less money within a short period of time -- allows their clients to do more polling later in the campaign. The net result is a more accurate depiction of the horse race in the final hours of the campaign. In other words, under this hypothetical, the difference amounts to a "feature" not a "bug."

At the same time, again to the extent that differences in "accuracy" depend on timing, it may not be fair to describe all of the pollsters that tend to stop earlier as relatively "inaccurate." In some cases, their surveys may have been equally accurate at the time, but received lower accuracy scores because of shifts in vote preference that occurred in the final week of the campaign. Keep in mind that different surveys are done for different purposes, and those purposes sometimes come with methodological trade-offs. If a media organization wants to measure opinions on a wide variety of attitudes beyond the basic horse race question (especially if those measurements involve open-ended questions), then an automated methodology makes less sense. Moreover, media organizations that sponsor more in-depth surveys typically want to gather their data sooner, to drive stories over the final week of the campaign, rather than waiting until election eve to release the data.

We need to understand that different polls are done for different purposes and a one-size-fits-all measure of accuracy may not make sense for all polls. Either way, this is certainly a topic wide open for further commentary, debate and, ideally, more empirical evidence.


Daniel T:

If I sound like a broken record, forgive me.

As you correctly note, poll accuracy has to be considered in light of the purpose of the poll. But I would go one step further. By definition, a poll can only be considered accurate if it correctly measures what it is supposed to measure at the time it was taken. Polls are not predictive, and to treat them as if they were may make for an amusing game but is really harmful in the long run. Put another way, accuracy is accuracy and prediction is prediction, and the two concepts must not be confused. A poll can be perfectly accurate yet fail woefully as a predictor. If you wish to create an accurate predictor of election results, you have entered the field of forecasting and left polling far behind.



The purpose of the poll?

It seems that has become an easy escape hatch for those who don't want to be benchmarked.

If the purpose of a poll two weeks out is simply to describe the race in some way, why don't pollsters note that BEFORE the election rather than after? And then only if they are wrong.

Three significant articles ran in Connecticut's largest daily two weeks before the election with a poll many felt was horribly done and wrong from the outset. None of the articles said "This poll may fail woefully as a predictor."

Of course, when the UConn poll was slammed for underrepresenting minority voters and using list-based methodology for the first time since '78, they trotted out all the excuses to explain why they missed the outcome by 18 points.

If pollsters don't want to be judged, they should say so up front in the press release and in the articles by the media funders. The truth is they don't because they fear the reaction of the media. It simply isn't what the media want to hear, and too many pollsters are afraid to risk the check.



Mark is absolutely right that timing is important, and Rick's example of the UConn / Courant poll is perfect. Between the time that the poll was conducted and the actual election (2 1/2 weeks), both candidates began to campaign in the state for the first time in the season (both ads and visits) and John Edwards left the race. While Rick cites these factors as "excuses," I find it hard to imagine any rational pollster or politico dismissing such events as meaningless to the race's outcome.

Also to Rick's point of pollsters not warning the media and the public before an election that their results should not be used as predictions, the UConn poll had the following lines in their press release: "Even at this late date, however, the race may have only just begun. Both Obama and Clinton land their ad campaigns in Connecticut this weekend, and with one-in-five likely voters still undecided, the campaigns have the potential to make a difference."

Unfortunately the media aren't the only ones who choose to ignore warnings like these, and thereby interpret numbers and outcomes in a way that is methodologically irresponsible and damaging to the entire polling industry.




