Mark Blumenthal | June 10, 2009
Topics: Polling Errors, Registration Based Sampling, Virginia
What a difference perception makes. Last year, the New Hampshire Democratic primary produced an unprecedented polling "fiasco," also described as "one of the most significant miscues in modern polling history" (to quote two of the most respected voices in political polling). This morning, I see no such angst, and for good reason. Creigh Deeds won Virginia's Democratic primary for governor by a crushing margin after the final polls had shown him leading by double digits and trending sharply upward.
But look closer. If we simply compare the final polls to the actual results, the "polling errors" were actually bigger in Virginia last night than in New Hampshire. In New Hampshire, as the table below shows, the final polls as summarized by our trend estimates of the vote for Barack Obama and John Edwards came remarkably close to their actual percentages but understated Hillary Clinton's support by nearly 9 percentage points. Last night, our trend estimates came within tenths of a percentage point of the actual votes won by second-place finishers Terry McAuliffe and Brian Moran, but they understated Deeds' final tally by a whopping 13 percentage points.
The crucial difference, of course, is that the Virginia polls gave us a clear indication of both the winner and the final trend, while the New Hampshire polls pointed us in the wrong direction on both. So while we may have been a bit surprised by the margin last night, we had ample warning that uncertain voters were "breaking" to Deeds over the final days of the campaign.
Still, it is worth noting that simply extending our trend lines on either Deeds' support or his margin over McAuliffe from Sunday through Tuesday does not explain or predict the ultimate margin. Here, courtesy of Charles Franklin, is a chart that extends our trend estimate for Deeds' support using either our standard estimate (the solid blue line), the more sensitive estimate (the dashed blue line), or a straight line ("linear fit") based only on the polls completed after May 15. As Charles writes via email, "All three are essentially the same as of election day, at 39% or so."
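To make the "linear fit" idea concrete, here is a minimal sketch of that estimate: an ordinary least-squares line fit to poll percentages against days before the election, then extrapolated to election day. The poll numbers below are hypothetical placeholders for illustration, not the actual Virginia polls.

```python
def linear_fit(xs, ys):
    """Slope and intercept of the least-squares line y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Days relative to election day (0 = election day) and Deeds' percentage
# in each poll -- hypothetical values showing a late upward trend.
days = [-20, -14, -9, -5, -2]
deeds = [26, 30, 33, 36, 38]

a, b = linear_fit(days, deeds)
# The intercept is the extrapolated election-day estimate; with these
# illustrative numbers it lands near 39%, as in Franklin's chart.
print(round(a, 1))
```

The point of the sketch is that even a straight-line extrapolation of a sharp late trend lands well short of Deeds' actual tally.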
You see the same pattern if we plot Deeds' margin over McAuliffe (the Deeds percentage on the poll minus the McAuliffe percentage). It shows the same sharp upward trend with Deeds clearly ahead but, again, not by as much as his actual 23-point margin.
My point here is not to bash the polls in Virginia. To the contrary, the much derided automated surveys conducted by Public Policy Polling (PPP) and SurveyUSA, as well as the live interviewer polls from Research 2000 and Suffolk University, provided a consistent, clear and apparently accurate picture of the trend in voter preferences (though they were more divergent about the level of candidate support seen earlier on). As with Clinton in New Hampshire last year, we can probably never know for certain whether their final estimates understated Deeds' support or whether he benefited from a virtually monolithic "break" of undecideds in his direction in the closing hours.
The point, which I tried to make over the weekend, is that late shifts and polling errors are a lot more common in primary elections because turnouts are smaller and likely voters are harder to select, because partisans are not locked into choices based on party affiliation (as they are in general elections) and because the dynamics of contests featuring three or more viable candidates produce more volatility in voter preferences.
On a slightly different topic: I wrote earlier this week about the potential benefits of sampling low turnout primaries with lists of registered voters, a subject of sometimes fierce debate among pollsters. I am not sure we can draw firm conclusions on that subject from the final wave of polls in Virginia, since polls using both list samples (PPP and Suffolk University) and those using random-digit-dial samples (SurveyUSA and Research 2000) obtained generally similar results and tracked similar trends over the last week. Earlier on, however, the list sample polls tended to show lower support for McAuliffe, although again we are limited by our small "sample size" of pollsters and inability to control for other issues (such as question format and degree of screening).
That said, the survey world needs to take more seriously the argument that PPP's Tom Jensen made last night. This Virginia primary, he wrote, "was the perfect race to be polling using the voter list and automated calls." In particular, I wish someone would devise a randomized controlled experiment (and place the results into the public domain) to test Jensen's implicit assertion about non-response bias and automated surveys:
When you're dealing with an automated poll folks who don't intend to vote don't feel the sort of social pressure they might feel from a live interviewer to participate. So folks who didn't plan to vote didn't bother to answer the poll. No harm done. You don't want a high response rate from people who aren't going to vote.
In other words, if we hold all other factors constant -- something never possible with after-the-fact comparisons of results from different pollsters -- does calling with an automated method do a better job selecting truly likely voters than calling with live interviewers? That could be done with an experiment that samples from a registered voter list and then updates the records after the election to validate turnout.
Let's give Jensen some due credit. Using his survey measurements, he predicted a turnout "somewhere in the range of 300,000 voters," while others were less certain or predicted lower numbers (though apparently not the McAuliffe campaign). With 99.8% of precincts counted, over 320,369 votes had been cast last night. Even Nate Silver has trouble predicting turnout.