I had intended to post a "quick" summary of what Tuesday night's results say about how the polls did, but like a thread pulled on a sweater, my outline kept getting longer. So apologies for the delay in getting this summary posted. What follows is a review of how the polls performed this year, with a closer look at the question posed yesterday by our own Brian Schaffner: was it "a victory for IVR polling?"
New Jersey. Our final trend estimate based on all pre-election polls was dead even, with each major party candidate receiving 42.0% of the vote and independent Chris Daggett 10.1%. Christie had a one-point lead on the RealClearPolitics average of the last five non-partisan polls (+1.0%), roughly the same margin as using our more "sensitive" trend line (+1.1%).
The unofficial count, as of this writing, has Christie leading by 4.3% (though as noted yesterday, all of these unofficial results are likely to change slightly as provisional and absentee ballots are counted). So the average polling error in New Jersey was between 3.3% and 4.3% depending on the average. Nate Silver did a compilation of comparable New Jersey polling errors (compared to final averages) in nine previous elections that ranged from a low of 0.5 to a high of 4.8. So the error yesterday, while higher than average, fell well within recent experience.
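For readers keeping score at home, "error on the margin" is just the absolute difference between a poll's margin and the actual margin. A minimal sketch (the function name is mine; the figures are the ones cited above):

```python
def error_on_margin(polled_margin, actual_margin):
    """Absolute difference between the polled and actual victory margins,
    both in percentage points (positive = leader's edge)."""
    return abs(actual_margin - polled_margin)

# New Jersey: our trend estimate was a dead-even tie (margin 0.0),
# while the unofficial count had Christie +4.3.
trend_error = error_on_margin(0.0, 4.3)   # 4.3 points
# The RealClearPolitics average had Christie +1.0.
rcp_error = error_on_margin(1.0, 4.3)     # about 3.3 points
```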
At the same time, nearly everyone has noticed that the average of the final polls from three organizations using an automated methodology (sometimes referred to as "interactive voice response" or IVR) had Christie ahead by four percentage points (46% to 42%) -- roughly the same as his unofficial margin -- while the last three live-interviewer telephone polls had Corzine leading by an average of one point (41% to 40%).
As I wrote on Monday night, what makes that gap between automated and live-interviewer polling interesting is that it was not some random fluke on the last few polls, but persisted throughout the campaign to a degree that we did not see in Virginia this year or in most states during the 2008 presidential election. My conclusion was that the consistency in the estimate of Corzine's vote on so many recent polls suggested a looming "incumbent effect": voters had largely made up their minds on Corzine, but a small and critically important number were still weighing whether to support Christie or Daggett. So, the theory goes, the IVR polls did better by removing the live interviewer and simulating a secret ballot, thus pushing voters harder to make a choice and more accurately recording their true intentions over the phone.
And what happened to Daggett? Our final trend estimate had him at 10%, but he received only 5.8% of the vote. Although it had been rising until mid-October, Daggett's support ultimately followed the traditional pattern. Many voters who had been intrigued by his candidacy ultimately concluded that their votes would be wasted and opted to support either Christie or Corzine. The Fairleigh Dickinson University poll provided a hint of where Daggett's support was heading in an experiment conducted on their last survey: They found that Daggett received just 6% -- the same number he won on election day -- when they only named Corzine and Christie as candidates but accepted Daggett as a volunteered choice. When they offered a three-way choice that included Daggett, his support jumped to 14%.
Virginia. Republican Bob McDonnell's victory in Virginia was never in doubt during the final weeks of the campaign, so political junkies were less obsessed with the polling numbers, but the polling errors in Virginia were, on average, about the same as in New Jersey. Our final trend estimate had McDonnell ahead by 13.7% (54.7% to 41.0%). The unofficial tally has McDonnell leading by 17.4% (58.7% to 41.3%), so the error, as of this writing, averages 3.7 points on the margin.
In Virginia, the gap between the results of automated and live interviewer polls was not nearly as big or as consistent as in New Jersey. The average of the final automated polls in Virginia conducted by PPP, SurveyUSA and Rasmussen had McDonnell at 56% compared to 54% on the final polls in the last week conducted by five organizations using live interviewers, while both sets of polls gave Democrat Creigh Deeds an average of 41% of the vote. However, the final automated polls by SurveyUSA and PPP along with the live interviewer survey by Virginia Commonwealth University are closest to the final margin (as of this writing).
New York City. Our final trend estimate had Mayor Michael Bloomberg leading Democratic challenger William Thompson by a 14-point margin (53.1% to 39.0%), but Bloomberg won by less than five (50.6% to 46.0%), so the polling error is large (9 points on the margin) -- roughly the same as the infamous New Hampshire polling debacle.
What happened? Marist pollster Lee Miringoff describes it as a "textbook case of pre-election poll analysis:"
It is not unusual in contests between a well-known incumbent (Bloomberg) and a relatively unknown challenger (Thompson) that the incumbent ends up getting pretty much the same number he was attracting in pre-election polls. Undecided voters tend to find the challenger or not vote at all, having already rejected the incumbent.
He refers, of course, to the "incumbent rule," a subject I speculated about at length in 2004, only to see it generally fail to apply that year or in close races in 2006 and 2008. That said, it does appear to have returned in New Jersey and New York City on Tuesday.
But that apparent reemergence raises an important question: If the rule is no longer a "rule," but rather a phenomenon that occurs only occasionally, how do we know to expect it? Miringoff wrote yesterday that Marist's polls "showed the trend that Democratic voters were 'coming home' to Thompson." That result would have been a helpful warning sign. Problem is, I can't find any reference to it in Marist's final poll release. Instead, I find this prediction: "If today were Election Day," they wrote on Wednesday without qualification, "Mayor Michael Bloomberg would handily win a third term."
If anyone deserves to say "I told you so" in New York, it is Thompson pollster Geoff Garin, who released a survey last week showing Thompson gaining (he said), trailing by only 8 points (38% to 46%) and by only 3 points (41% to 44%) among those who said they were certain to vote. The release prompted Bloomberg spokesman Howard Wolfson to retort that it "gives new meaning to the term margin of error." Not exactly. (And yes, we managed to miss this poll and omit it from our chart -- apologies to Garin and our readers for that oversight).
I asked Garin for his thoughts and he agrees that "undecideds split against incumbent" in the New York race and that such a split was knowable in advance, but argues:
[I]t is stupid to think they would split 100 to nothing. There was a high undecided in NYC because voters were cross pressured -- they did not want to reward Bloomberg for his bad behavior on term limits, but they didn't know enough about Thompson to know whether he would be up to the job.
Garin also thinks their sample made a difference:
I think the main reason we did better and the public polls were off is that we worked off the voter file, and were persnickity about who we took into what was very likely to be a low turnout election. Even among whites, the smaller the turnout scenario the better for Thompson. I am sure the public polls let in too many people.
Maine Question 1. Polling on the gay-marriage referendum was far more limited -- just seven public polls released over the course of the campaign -- and the complicated ballot language and the error-prone nature of prior referenda polling warned us to expect the unexpected. Yet while the differences between the final polls were relatively small, it is worth noting that the automated survey from PPP was the only one that showed more support for the anti-gay marriage position than opposition. Our final trend estimate showed the No side (pro gay marriage) with a two point lead (49.4% to 47.1%), but Question 1 won by nearly six (52.8% to 47.2%).
While this one experience is far from a conclusive test, there are at least theoretical reasons to think that automated surveys have an advantage in measuring true preferences on issues like gay marriage, where the presence of a live interviewer might introduce some "social discomfort" that would make the respondent reluctant to reveal their true preference.
* * *
So were automated IVR polls the big winners on Tuesday, as Mickey Kaus, Taegan Goddard and PPP's Tom Jensen argue? If what you care about most is predicting the winners, it is clear that the automated surveys provided a more accurate gauge of the outcome, especially in New Jersey where the closer simulation of the secret ballot probably gave us a heads up of an imminent "incumbent rule" effect favoring Christie. SurveyUSA also deserves credit for coming closer than most pollsters to the final margin in New Jersey, Virginia and New York City.
But that said, consider that we count on polls to do much more than predict the outcome. In addition to the points raised by Brian Schaffner here yesterday, consider two things:
First, as a live-interviewer media pollster pointed out to me yesterday, there were some inconsistencies with subgroups, particularly by race. As the table below shows, despite relatively small sample sizes, the three automated surveys showed Republicans Christie and McDonnell winning a greater percentage of the African American vote than the final live-interviewer surveys and the exit polls (though there were a few exceptions, namely Rasmussen in New Jersey and Marist in New York City).
If you believe the exit poll result, then the automated surveys provided a generally misleading sense of whether the Republican candidates were about to make bigger inroads than they did among African-American voters (consider also commenter RussTC3's observation about big differences between job approval ratings as measured by PPP and the exit polls -- as Mike Mokrzycki reminds us we do polls for reasons other than predicting the outcome).
Second, there is one last contest we need to review....
New York 23. Although three last minute polls on the special election in New York's 23rd Congressional District conducted after Republican Dede Scozzafava withdrew from the race last Saturday showed Conservative Doug Hoffman leading Democrat Bill Owens by margins of between 5 and 17 points, Owens prevailed by 4 points (49.0% to 45.9%). Whatever shortcomings we might identify in the polling, the far bigger error was the interpretation applied by pundits, most notably me, who foolishly assumed that the trend in Hoffman's direction was unstoppable and that normal assumptions about last minute developments would apply. In retrospect, it is obvious that there was nothing normal about the last 72 hours of this particular campaign.
Moreover, we should have paid closer attention to the evidence of growing voter uncertainty in the final Siena Research Institute poll. Their final survey, conducted on Sunday night, showed Hoffman with a modest but not quite statistically significant lead (41% to 36%) but also a doubling of the undecided vote (from 9% to 18%) in just a few days. So their poll showed that voter uncertainty was surging at a time when it is usually nonexistent. To his great credit, Siena pollster Steven Greenberg also argued that Owens might still gain from the Scozzafava endorsement on Sunday since "most voters are not political junkies" and had not yet heard the news (an argument I boldly dismissed since few undecided voters had a favorable impression of Scozzafava -- apologies to Greenberg for that).
But while we might plausibly reconcile the results of the Siena poll with the outcome, the PPP survey is another story. While their estimate of Owens' support (34%) was within a few points of the other polls, PPP had Hoffman receiving five percentage points more support (51%) than he ultimately received (45.9%). A late shift among the undecided voters cannot explain the difference.
I am planning to look more closely at this example, but the important point for now is that while the automated polls turned in a strong performance in New Jersey, Virginia and Maine, the PPP poll in NY-23 was highly misleading.
The larger lesson is this: Automated polls have been maligned, unfairly in my view, as inherently "unreliable." Yet when it comes to predicting election outcomes they continue to prove, NY-23 aside, at least as reliable as surveys done by conventional means. In New Jersey this week, they were more accurate in predicting the winner. At the same time, however, it would be wrong to jump to the opposite conclusion and place inherently greater trust in all automated surveys, especially when used for purposes other than predicting election outcomes.
All polls have their limitations. Rather than trying to divide them into two categories, "reliable" and "crap," we might do better to try to understand their limitations and interpret the results we see accordingly.
"So who was the most accurate pollster yesterday?"
If I had $100 for every time I've been asked that question by a reporter on the Wednesday morning after an election, I could retire early. And after five years of blogging on this beat, it's a question I'm determined to refuse to answer today.
First, all the votes are not yet counted (including 7% of the precincts in NY-23), and the counts that are available do not yet include the absentee and provisional ballots that will be added later and are not reflected in those percentage-of-precincts-reporting statistics you see on all the media vote counts this morning. Take a look at this snap judgment from November 5, 2008. It declared a "big winner" among prognosticators on the assumption that Barack Obama won by 6.1 percentage points (52.4% to 46.3%), but when all the ballots were counted the margin was 7.2 (52.9% to 45.7%). So that particular snap judgment picked the wrong "big winner."
Second, the whole notion of crowning a "big winner" based on a handful of polls in a handful of states is foolish. The final polls yesterday had random sampling error of at least +/- 3 percentage points. If a poll produces a forecast outside its margin of error, that's important. But if several polls capture the actual result within their standard error, chance alone is as likely as anything else to determine which one "nails it" and which miss by a point or two.
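That "+/- 3 percentage points" figure comes from the standard formula for the 95% sampling margin of error of a proportion. A quick sketch (the sample size here is illustrative, not taken from any specific poll above):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of roughly 1,000 respondents at p = 0.5 carries about
# +/- 3.1 points of pure sampling error -- and the *margin between*
# two candidates is noisier still, since both shares move at once.
moe = margin_of_error(0.5, 1000)
```

Note that this is only the irreducible sampling noise; likely-voter screens, weighting, and nonresponse in real polls typically push total error higher.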
Third, there are sometimes other problems with making too much of "hitting a bullet with a bullet" on the final poll, when the polls leading up to it provide different results.
Yes, there are several good stories about what went right and what went wrong with yesterday's polling, including some important lessons about the value of automated polling. Some pollsters certainly did better yesterday than others. And I'm hoping to have something written and posted on that subject later today, provided that I don't get bogged down by the calls and emails from reporters wanting me to tell them, "who was the most accurate pollster yesterday?"
Wall Street Journal, 11/3/09:
Republicans Are Poised for Gains in Key Elections
Outcomes in New York, New Jersey and Virginia Are Unlikely to Forecast Much About National Races in 2010, History Shows
Republicans appear positioned for strong results in three hard-fought elections Tuesday. But isolated, off-year contests aren't always reliable indicators of what will happen in the wider federal and state races held in even-numbered years.
Wall Street Journal, 11/4/09:
Republicans Win in Key States
A Republican sweep in Virginia and New Jersey on Tuesday shifted the political terrain against President Barack Obama only a year after his historic election.
PS For the record, the WSJ was right the first time. Despite what the press will tell you, a handful of off-year elections don't tell us much about the "political terrain" facing Obama and the Democrats. As Matthew Yglesias points out, we have these things called "polls" that we can use to measure people's political beliefs and opinions. Perhaps we should consider using those instead.
Update 11/4 11:41 AM: Dave notes in comments on my blog that the first story includes a similar passage about the election potentially revealing "much tougher political terrain," which I missed:
A Republican sweep in Tuesday's key contests would at minimum show that Democrats face much tougher political terrain than they did a year ago.
I'm not sure what the passage means (the metaphor of "political terrain" is not well-defined) but it seems to contradict the lede of the story, which states that off-year elections are not reliable indicators. The point remains that the ledes are in tension (if not in direct contradiction).
It's also worth noting the contradiction between the election "show[ing]... political terrain" (11/3) and the results actually "shift[ing] the political terrain" (11/4). Maybe it's time to retire the metaphor, which lets reporters vaguely suggest that things have changed without specifying how.
Update 11/4 8:49 PM -- Eric Boehlert at Media Matters has a virtually identical item on the AP's election coverage:
The AP on Tuesday:
To be sure, it's easy to overanalyze the results of such a small number of elections in a few places. The results will only offer hints about the national political landscape and clues to the public's attitudes. And the races certainly won't predict what will happen in the 2010 midterm elections.
The AP on Wednesday:
To be sure, each race was as much about local issues as about firing warning shots at the politically powerful. But taken together, the results of the 2009 off-year elections could imperil Obama's ambitious legislative agenda and point to a challenging environment in midterm elections next year.
(Cross-posted to brendan-nyhan.com)
Since 2002 (and probably earlier), you could do pretty well in predicting the outcomes of races for President, Senate, Governor and even the U.S. House by collecting the final polls in each race and averaging them. In fact, in 2008, the final Pollster.com trend estimates and RealClearPolitics averages did as well or better at calling election outcomes as those more "sophisticated" models you heard so much about last year.
The reason is that while highly variable, the final polls were largely unbiased in the aggregate. Any one poll might be way off from the final result, but the average of all of them usually comes reasonably close to the final result. There have certainly been exceptions in individual states, but in 2002, 2004, 2006 and 2008, the polls looked reasonably accurate once averaged across all states.
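The averaging itself is trivial; the statistical point is that independent errors tend to cancel. A toy sketch (the poll numbers below are made up purely for illustration):

```python
def average_margin(polls):
    """polls: list of (candidate_a_pct, candidate_b_pct) pairs
    from final pre-election surveys.
    Returns the averaged margin (a minus b) in percentage points."""
    a = sum(pa for pa, _ in polls) / len(polls)
    b = sum(pb for _, pb in polls) / len(polls)
    return a - b

# Three hypothetical final polls scattered around the same underlying race:
# individually they miss in different directions, but the average settles
# near the shared signal.
margin = average_margin([(46, 42), (48, 41), (44, 43)])
```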
We may perceive things differently tonight. First, instead of watching polling across 20 or 30 contests, most of us are focused on just three or four races, and for two of these -- New York's 23rd District and Maine's Question 1 -- we have only one or two recent polls to consider.
Second, as Nate Silver pointed out yesterday, the challenges in some of today's elections -- again, especially Maine and NY-23 -- may be more like what pollsters faced during last year's presidential primaries, where poll averages often missed the mark by wide margins.
Silver also posted a handy comparison of final poll averages in New Jersey elections since 2000 (below), which helps make two important points. First, as he writes, despite conventional wisdom to the contrary there has not been "any particular tendency by Democrats to outperform their numbers once the final polls are in." Second, though usually very close to the result, final poll averages in individual states typically missed the final margin by a few percentage points. So even though our final New Jersey trend estimate is a remarkable 42.0% to 42.0% tie, for example, the final margin will be close but probably not that close.
Which brings me to our final polling-wrap up for 2009. Here's what the final polls and our trend estimates are showing:
- New Jersey, again, ends up as a 42.0% to 42.0% tie on our trend estimate, a contest simply too close to call between Democratic incumbent Jon Corzine and Republican challenger Chris Christie, with independent Chris Daggett running far behind at 10.1% and likely falling. My hunch, explained last night, is that Christie prevails.
- New York's 23rd District special election for Congress was the focus of much speculation over the weekend. The last two polls, both conducted immediately after original nominee Dede Scozzafava withdrew from the race, each had Conservative Doug Hoffman leading Democrat Bill Owens, but by widely different margins (5 and 17 percentage points). Our trend estimate, which has Hoffman leading by 7 points (43% to 36%) also factors in previous polling that showed a closer contest. The ultimate margin is anyone's guess, but my sense is that Hoffman will win comfortably.
- The outcome of Virginia's race for Governor has never been in much doubt. Republican Bob McDonnell began with a roughly 7-point lead over Democrat Creigh Deeds that never significantly wavered, widening to nine points by Labor Day and ending at our final trend estimate of roughly 14 points (55% to 41%).
- Ditto for the New York City Mayor's Race. Incumbent Michael Bloomberg led Democratic challenger William Thompson consistently on polling throughout the race, although his lead on our final trend estimate (53% to 39%) is slightly narrower than earlier in the year.
- And the very few polls on Maine's Question 1, the gay marriage referendum, show a close contest, although as I wrote earlier this afternoon, referenda polling is notoriously error prone. The age composition of the PPP survey seems closer to plausible than the final DailyKos/Research 2000 survey, but beyond that, your guess is probably as good as mine.
Two notes on what's coming up later tonight. First, the consortium of network news organizations (known formally as the National Election Pool or NEP) is conducting exit polls in New Jersey and Virginia tonight. While official results will not begin to appear until the polls close, some early leaked estimates will probably start to bounce around the internet sometime after 6:00 p.m. As I explained at about this time last year (and on most election days since 2004), these are not likely to be much more accurate than the pre-election polls summarized above. Very large grains of salt are in order.
And finally, we will be live blogging here once again tonight. If all goes well, we should be using a more advanced tool that will allow our all-star line-up of contributors (Charles Franklin, Kristen Soltis, Margie Omero, Steve Lombardo and hopefully more) to join in. We hope you'll join us starting at about 6:30 eastern time.
I have to admit that I had been hoping to take a closer look at polling on Maine's Question 1 on Gay Marriage over the weekend, but got distracted by the fuss over the New York 23rd District special election. The polling is difficult to evaluate partly because there has been so little of it. While I have a lot of confidence in our trend estimates in states with large numbers of polls, the small number of polls in Maine (7 total since Labor Day) allows for just crude linear trend lines.
Another reason why the Maine polls are difficult to evaluate is that issue referenda polling is so treacherous and prone to error. A 2004 paper by Joe Shipman, then director of election polling for SurveyUSA, showed that polling on ballot measures had triple the error rate (9.5 points average error on the margin) of polls in presidential elections (3.4) and nearly double that of contests for statewide offices (4.6). I summarized the assumed reasons for that greater error rate in a long post four years ago today, but the most relevant to Maine are a greater difficulty modeling the likely electorate and the problem of accurately conveying ballot language.
A particularly painful example followed a few days after that post, when a set of ballot initiatives in Ohio produced some of the biggest polling errors in recent memory. The combination of failing to poll late and not accurately reproducing the actual ballot language was the likely culprit.
Let's start with the ballot language in Maine that voters are confronting right now:
Do you want to reject the new law that lets same-sex couples marry and allows individuals and religious groups to refuse to perform these marriages?
So, a "Yes" vote is a vote against gay marriage, and a "No" vote is for gay marriage. Confused? Imagine the uncertainty some Maine voters may be experiencing without that extra bit of explanation. As you can see in the table at the bottom of our chart, only the Pan Atlantic SMS surveys reproduce the actual ballot language -- and nothing else -- while the other pollsters provide a line of explanation to clarify the meaning of "Yes" and "No."
The final round of polling has shown a relatively close race, although results have varied. A survey conducted two weeks ago by Pan Atlantic SMS showed the No side prevailing by an 11-point margin (53% to 42%), while a Daily Kos/Research 2000 poll conducted late last week showed a dead heat (No 48%, Yes 47%). Finally, the Democratic automated polling firm PPP conducted a survey over the weekend that had the Yes side ahead by a not-quite-statistically-significant four points (51% to 47%) despite a very large (n=1,133) sample.
Complicating the issue further is that the final poll from PPP differed in both the age of the "likely voters" they selected and the way they interviewed them (via an automated, recorded-voice methodology). Less than a third of PPP's likely voters (32%) were under age 45, compared to more than half (51%) of the Research 2000/DailyKos survey. Both showed much more support for the No side from younger voters.
On the question of age, the Research 2000 sample was even younger than the Maine exit poll in November 2008 (43% age 18-45) and far younger than in November 2006 (36% age 18-45). Of course, the PPP sample was older than both, but keep in mind that exit poll estimates are sometimes too young.
The question of the automated mode is more complicated. The automated polls conducted by SurveyUSA in California in 2008 may have picked up more support for the ultimately successful anti-gay marriage Proposition 8 than live-interviewer surveys conducted at the same time. However, the convoluted timing of the various polls and the final result from SurveyUSA (showing Prop 8 narrowly failing) make it impossible to draw firm conclusions.
So what conclusions can we reach about tonight's outcome in Maine? I have more faith in the age composition of the PPP poll than the one from Research 2000, but given the much larger potential for error in ballot referenda and the close margins on the two final polls, your guess is probably as good as mine.