Mark Blumenthal | July 29, 2008
Topics: Likely Voters
Ever had a day where everything seemed to fall on your desk at once? Today, for me, has been one of those days, although just one item is obvious, and that involves those seemingly contradictory numbers on the presidential race from the Gallup organization. As I blogged yesterday, the most important differences are the result of Gallup's well-known (and often controversial) "likely voter" model.
If you are a longtime reader and remember my old blog, Mystery Pollster, you will remember that over the final weeks of the 2004 campaign, I did a seven-part review of how pollsters select likely voters, including an explanation of the Gallup model and a review of criticism of it. Most of the issues reviewed then are relevant now, but in the context of this latest controversy, let's consider (a) why pollsters try to identify "likely voters," (b) the approach Gallup and USA Today took this week and (c) some thoughts about what it all means about the state of the race.
Why Screen for Likely Voters?
Four years ago, according to the website maintained by Professor (and frequent Pollster commenter) Michael McDonald, 122 million Americans cast a ballot for president, which amounts to a turnout of 60% of the eligible adults in the United States (a significant increase from 54% in 2000).
A pre-election survey of all adults that made no effort to identify likely voters would have included the 40% who did not vote, and -- as should be obvious -- those extra interviews create the potential for error in the results, since non-voters might have different preferences than actual voters. So all pollsters care about trying to identify the likely electorate.
But there is a big problem: Simply asking respondents whether they plan to vote does not work. Many more Americans will report they are likely to vote, or will claim they have voted in the past, than actually do. Consider the following results from the final national survey conducted by the Pew Research Center before Election Day 2004 (interviewing 2,804 adults, October 27-30):
- 83% of adults said they were registered to vote (excluding the tiny percentage who live in North Dakota, the one state without voter registration)
- 71% of adults said they had already voted (5%) or rated their likelihood of voting as 10 -- "definitely will vote" -- on a 1-10 scale (66%).
- 68% of adults said they vote "always" (51%) or "nearly always" (17%)
So screening for just self-identified registered voters is a good idea, but would still include roughly 20% non-voters. And simple questions about past voting or vote intent would largely overstate the size of the electorate. As such, the Pew Center and most other media pollsters used various indirect techniques (with some success) to screen for or otherwise "model" the likely electorate.
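To make the size of the overstatement concrete, here is a small sketch that subtracts the 60% turnout figure from each of the Pew self-reports quoted above. The variable and function names are made up for illustration. Note also that the turnout figure is a share of eligible adults while the Pew numbers are shares of all adults interviewed, so the gaps are rough.

```python
# Shares of adults, in percent, taken from the figures quoted above.
ACTUAL_TURNOUT = 60  # 2004 turnout as a share of eligible adults

self_reports = {
    "registered to vote": 83,
    "already voted or rated intent 10 of 10": 71,
    "vote always or nearly always": 68,
}


def overstatement(reported, actual=ACTUAL_TURNOUT):
    """Percentage-point gap between a self-report and actual turnout."""
    return reported - actual


# Hypothetical summary: how far each self-report overshoots turnout.
gaps = {label: overstatement(pct) for label, pct in self_reports.items()}

for label, gap in gaps.items():
    print(f"{label}: {gap} points above actual turnout")
```

Every one of the self-reported measures overshoots, which is why a single screening question is not enough.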
How Does Gallup Do it Now?
As explained in an article posted yesterday by Gallup's Frank Newport, the USA Today/Gallup poll has been using a three-question scale to identify likely voters on the nine surveys they have conducted so far this year that asked a presidential vote question. Actually, that should be four questions, as they first ask adults if they are registered to vote, then ask:
1. How much thought have you given to the upcoming election for president -- quite a lot, or only a little? (quite a lot or volunteer "some" = 1 point)
2. How often would you say you vote -- always, nearly always, part of the time, or seldom? (always or nearly always = 1 point)
3. Do you, yourself, plan to vote in the presidential election this November, or not? ("yes" = 1 point)
They award points as noted above to those who give responses that have been shown to correlate strongly, in past elections, with actual turnout. How do they know? They have conducted studies in past elections where they checked the voter registration rolls to see which respondents actually voted and which did not (how long ago? - I'm not sure if Gallup has disclosed that).
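The point scheme described above can be sketched in a few lines. This is only an illustration of the three questions as quoted here, not Gallup's actual code. The `Respondent` class, its field names and the answer strings are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Respondent:
    """Hypothetical record of one interview (field names are assumptions)."""
    registered: bool
    thought: str        # "quite a lot", "some" (volunteered) or "only a little"
    frequency: str      # "always", "nearly always", "part of the time", "seldom"
    plans_to_vote: bool


def likely_voter_score(r: Respondent) -> Optional[int]:
    """Return 0-3 points, or None if the respondent is not registered."""
    if not r.registered:
        return None  # unregistered adults never reach the three questions
    score = 0
    if r.thought in ("quite a lot", "some"):
        score += 1
    if r.frequency in ("always", "nearly always"):
        score += 1
    if r.plans_to_vote:
        score += 1
    return score


# A newly registered voter who is engaged but has no voting history.
newcomer = Respondent(True, "quite a lot", "seldom", True)
print(likely_voter_score(newcomer))  # 2
```

The example respondent shows how an engaged first-time voter lands on 2 rather than 3, simply by answering the vote-history question honestly.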
When they scored the three questions, they found that 56% of adults scored a perfect 3 on the 0-3 scale, answering all three questions as a highly likely voter would. The next category -- those scoring 2 out of 3 -- amounted to another 17% of adults, which would add up to 73%. But Gallup wanted their likely voter tabulations to "model" a turnout of 60% of adults, so they weighted down the "2s" (those getting 2 out of 3 points) to a little less than one third of their original value.
I am leaving out a few details (involving Gallup's standard demographic weighting) but that is the gist of it: Likely voters are the 3s on their likely voter scale plus the 2s weighted down to roughly a third of their original value.
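The weighting step is simple arithmetic, and a rough sketch may help. The function below is hypothetical and ignores Gallup's demographic weighting entirely. It just asks what fraction of the 2s has to be kept so that the 3s plus the down-weighted 2s add up to the turnout target.

```python
def weight_for_twos(share_threes, share_twos, target_turnout):
    """Fraction of each '2' to keep so that 3s + weighted 2s hit the target.

    All arguments are shares of adults (0.56 means 56%). This is a
    simplified illustration, not Gallup's published procedure.
    """
    needed = target_turnout - share_threes
    if needed <= 0:
        return 0.0  # the 3s alone already meet or exceed the target
    return min(needed / share_twos, 1.0)


w = weight_for_twos(0.56, 0.17, 0.60)
modeled_turnout = 0.56 + w * 0.17

print(f"weight on 2s: {w:.3f}")
print(f"modeled turnout: {modeled_turnout:.2f}")
```

On the rounded percentages quoted here, the raw arithmetic comes out nearer one quarter than one third. The difference may lie in the weighting details left out above, though that is a guess on the numbers available.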
Gallup will use more or less the same procedure in the surveys they conduct in September and October, except that they will add four more questions to the scale (involving past voting behavior, knowledge of their polling place and a ten-point scale to rate vote intent).
What are the problems here? First, as should be obvious, this is not the most precise method of identifying a true likely voter. If you are not yet registered, you are not included. If you registered this year for the first time and respond honestly that you have never voted before, your preferences are weighted down to roughly a third of the weight given to other voters, or thrown out altogether if you also say you are not paying much attention to the campaign.
Second, as Robert Erikson and his colleagues reviewed in the pages of Public Opinion Quarterly four years ago, the classic 7-question Gallup model "exaggerates" reported volatility in ways that are "not due to actual voter shifts in preference but rather to changes in the composition of Gallup's likely voter pool" (also summarized here).
Third, as Mike McDonald points out in a comment earlier today, a higher than ever turnout will challenge these models and their assumptions. Other surveys continue to show a huge Democratic advantage on measures of supporter "enthusiasm" for the two candidates. Those measures have not previously been included in the Gallup-style model, but they may be important this year.
Fourth, and this is the really important one, no one knows how accurate this technique is in terms of predicting turnout in November based on an application to survey data gathered in July. We have a lot of evidence that the Gallup-style "cutoff model," clunky as it may seem, does make surveys more accurate when applied to data collected the week before the election. But I have yet to see any comparable evidence regarding data collected in July.
So I tend to agree with Gallup's Frank Newport when he told Jill Lawrence yesterday that "'registered voters are much more important at the moment,' because Election Day is still 100 days away." For now, the pool of self-identified registered voters may be too broad a representation of the likely electorate, but at least it allows for a consistent measurement. Looking at vote preference among typically higher turnout subgroups is useful, analytically, but may or may not improve our conception of where the race stands.
So What Do These Results Say About Where the Race Stands?
First, to put the question as several readers did in emails over the last 24 hours: So which poll or approach is the most accurate right now? Listen closely now: We. Don't. Know.
If the election were being held today, past evidence would argue for placing more trust in the Gallup "likely voter" model than in the preferences of registered voters. But the election is not today, and I am not convinced that any pollster has a monopoly on wisdom when it comes to predicting turnout 100 days out.
As Brian Schaffner (our new contributor) reminds us, if we look at all the recent polls, and not just one, we can still say with considerable confidence that Barack Obama is ahead. The precise margin probably depends on what assumptions one makes about turnout, which is more art than science at this point. However, as progressive blogger Chris Bowers has been pointing out lately, it is far better to be ahead than behind.
Having said that, we should not discount that two recent polls -- USA Today/Gallup and ABC/Washington Post -- show McCain doing better when the classic Gallup "likely voter" model is applied. What is truly interesting about that finding is that the opposite was true on six of seven surveys that Gallup conducted from January to May: Obama did slightly better among "likely voters" (defined as they were above) than among registered voters.
I have a theory (that someone at Gallup can probably test empirically): What changed is that the Democratic primaries ended. From February to June, Republicans who usually vote had a perfectly good reason to say they were paying "only a little attention" to the presidential campaign. All of the news was about the Obama-Clinton race. Now that the media has started to focus on the McCain-Obama contest, Republicans have greater reason to be engaged. At least, that is something worth checking.
Also, finally, consider something from the perspective of a no-longer-practicing campaign pollster: Campaigns matter. So I am less concerned at this stage about "projections" that predict the outcome than in understanding what each campaign needs to accomplish to win. If the "likely voter" pattern evident in the recent USA Today/Gallup and Washington Post/ABC polls is accurate, it tells us what the Obama campaign needs to do to win this election: They need to mobilize Americans who are ready to support Obama but who do not typically vote. That comes through loud and clear.
PS - My conclusions above raise an obvious question: If registered voters are a better subgroup to watch, why does Pollster.com use the likely voter numbers on our tables and charts? I will blog on that highly pertinent question next, I promise.
[Typo corrected. I know it's "likely voter" season because I'm misspelling Erikson again].