Likely Voters 2008: The Sequel

Ever had a day where everything seemed to fall on your desk at once? Today, for me, has been one of those days, although just one item is obvious to readers, and that involves those seemingly contradictory numbers on the presidential race from the Gallup organization. As I blogged yesterday, the most important differences are the result of Gallup's well-known (and often controversial) "likely voter" model.

If you are a longtime reader and remember my old blog, Mystery Pollster, you will remember that over the final weeks of the 2004 campaign, I did a seven-part review of how pollsters select likely voters, including an explanation of the Gallup model and a review of criticism of it. Most of the issues reviewed then are relevant now, but in the context of this latest controversy, let's consider (a) why pollsters try to identify "likely voters," (b) the approach Gallup and USA Today took this week and (c) some thoughts about what it all means for the state of the race.

Why Screen for Likely Voters?

Four years ago, according to the website maintained by Professor (and frequent Pollster commenter) Michael McDonald, 122 million Americans cast a ballot for president, which amounts to a turnout of 60% of the eligible adults in the United States (a significant increase from 54% in 2000).

A pre-election survey of all adults that made no effort to identify likely voters would have included the 40% who did not vote, and -- as should be obvious -- those extra interviews create the potential for error in the results, since non-voters might have different preferences than actual voters. So all pollsters care about trying to identify the likely electorate.

But there is a big problem: Simply asking respondents whether they plan to vote does not work. Many more Americans will report they are likely to vote, or will claim they have voted in the past, than actually do.

However, when the Pew Research Center conducted their final survey before Election Day 2004 (interviewing 2,804 adults, October 27-30), they found the following:

  • 83% of adults said they were registered to vote (excluding the tiny percentage who live in North Dakota, the one state without voter registration)
  • 71% of adults said they had already voted (5%) or rated their likelihood of voting as 10 -- "definitely will vote" -- on a 1-10 scale (66%).
  • 68% of adults said they vote "always" (51%) or "nearly always" (17%)

So screening for just self-identified registered voters is a good idea, but would still include roughly 20% non-voters. And simple questions about past voting or vote intent would greatly overstate the size of the electorate. As such, the Pew Center and most other media pollsters used various indirect techniques (with some success) to screen for or otherwise "model" the likely electorate.

How Does Gallup Do it Now?

As explained in an article posted yesterday by Gallup's Frank Newport, the USA Today/Gallup poll has been using a three-question scale to identify likely voters on the nine surveys they have conducted so far this year that asked a presidential vote question. Actually, that should be four questions, as they first ask adults if they are registered to vote, then ask:

1. How much thought have you given to the upcoming election for president -- quite a lot, or only a little? (quite a lot or volunteered "some" = 1 point)

2. How often would you say you vote -- always, nearly always, part of the time, or seldom? (always or nearly always = 1 point)

3. Do you, yourself, plan to vote in the presidential election this November, or not? ("yes" = 1 point)

They award points as noted above to those who give responses that have been shown to correlate strongly, in past elections, with actual turnout. How do they know? They have conducted studies in past elections where they checked the voter registration rolls to see which respondents actually voted and which did not (how long ago? - I'm not sure if Gallup has disclosed that).
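The scoring just described can be sketched in a few lines of Python. This is only an illustration of the three-question scale as reported above, not Gallup's actual code: the function name, the argument names and the lowercase answer strings are all assumptions made for the example.

```python
def likely_voter_score(thought: str, vote_frequency: str, plans_to_vote: str) -> int:
    """Return 0-3 points for a registered respondent on the three-question scale.

    Hypothetical helper: the answer strings are assumed to be the lowercase
    response options quoted in the question wording above.
    """
    score = 0
    # Q1: "quite a lot" or a volunteered "some" earns a point.
    if thought in {"quite a lot", "some"}:
        score += 1
    # Q2: "always" or "nearly always" earns a point.
    if vote_frequency in {"always", "nearly always"}:
        score += 1
    # Q3: a "yes" on plans to vote earns a point.
    if plans_to_vote == "yes":
        score += 1
    return score


# A habitual voter who is paying attention and plans to vote.
habitual = likely_voter_score("quite a lot", "always", "yes")

# A newly registered respondent who honestly reports seldom voting.
first_timer = likely_voter_score("quite a lot", "seldom", "yes")
```

Under these assumptions the habitual voter scores 3 and the first-time registrant tops out at 2, no matter how engaged that person is.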

When they scored the three questions, they found that 56% of adults scored a perfect 3 on the 3-point scale, answering all three questions as a highly likely voter would. The next category -- those scoring 2 out of 3 -- amounted to another 17% of adults, which would add up to 73%. But Gallup wanted their likely voter tabulations to "model" a turnout of 60% of adults, so they weighted down the "2s" (those getting 2 out of 3 points) to a little less than one third of their original value.

I am leaving out a few details (involving Gallup's standard demographic weighting) but that is the gist of it: Likely voters are the 3s on their likely voter scale plus the 2s weighted down to roughly a third of their original value.
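For readers who want to see where a weight like that comes from, here is a back-of-the-envelope sketch using only the percentages quoted above. It assumes the 3s keep full weight and the 2s are scaled so the two groups together equal the 60% turnout target; it ignores the demographic weighting, so it should not be expected to reproduce Gallup's exact figure. The function and variable names are hypothetical.

```python
def weight_for_twos(share_threes: float, share_twos: float, target_turnout: float) -> float:
    """Weight on the 2s so that 3s (full weight) plus weighted 2s hit the target.

    Solves: share_threes + w * share_twos = target_turnout for w.
    """
    return (target_turnout - share_threes) / share_twos


# Shares of all adults, as reported in the text above.
w = weight_for_twos(share_threes=0.56, share_twos=0.17, target_turnout=0.60)

# Size of the modeled electorate after down-weighting the 2s.
modeled_electorate = 0.56 + w * 0.17

print(f"weight on the 2s: {w:.3f}")
print(f"modeled electorate: {modeled_electorate:.2f}")
```

On these unadjusted numbers the weight works out to a bit under one quarter, in the same general neighborhood as the "roughly a third" described above.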

Gallup will use more or less the same procedure in the surveys they conduct in September and October, except that they will add four more questions to the scale (involving past voting behavior, knowledge of their polling place and a ten-point scale to rate vote intent).

What are the problems here? First, as should be obvious, this is not the most precise method of identifying a true likely voter. If you are not yet registered, you are not included. If you registered this year for the first time and respond honestly that you have never voted before, your preferences are weighted down to roughly a third of their original value as compared to other voters, or thrown out altogether if you say you are not paying much attention to the campaign.

Second, as Robert Erikson and his colleagues reviewed in the pages of Public Opinion Quarterly four years ago, the classic 7-question Gallup model "exaggerates" reported volatility in ways that are "not due to actual voter shifts in preference but rather to changes in the composition of Gallup's likely voter pool" (also summarized here).

Third, as Mike McDonald points out in a comment earlier today, a higher than ever turnout will challenge these models and their assumptions. Other surveys continue to show a huge Democratic advantage on measures of supporter "enthusiasm" for the two candidates. Those measures have not previously been included in the Gallup-style model, but they may be important this year.

Fourth, and this is the really important one, no one knows how accurate this technique is in terms of predicting turnout in November based on an application to survey data gathered in July. We have a lot of evidence that the Gallup-style "cutoff model," clunky as it may seem, does make surveys more accurate when applied to data collected the week before the election. But I have yet to see any comparable evidence regarding data collected in July.

So I tend to agree with Gallup's Frank Newport when he told Jill Lawrence yesterday that "'registered voters are much more important at the moment,' because Election Day is still 100 days away." For now, polls of self-identified registered voters may be too broad a representation of the likely electorate, but at least they allow for a consistent measurement. Looking at vote preference among typically higher turnout subgroups is useful, analytically, but may or may not improve our conception of where the race stands.

So What Do These Results Say About Where the Race Stands?

First, to put the question as several readers did in emails over the last 24 hours: So which poll or approach is the most accurate right now? Listen closely now: We. Don't. Know.

If the election were being held today, past evidence would argue for placing more trust in the Gallup "likely voter" model than in the preferences of registered voters. But the election is not today, and I am not convinced that any pollster has a monopoly on wisdom when it comes to predicting turnout 100 days out.

As Brian Schaffner (our new contributor) reminds us, if we look at all the recent polls, and not just one, we can still say with considerable confidence that Barack Obama is ahead. The precise margin probably depends on what assumptions one makes about turnout, which is more art than science at this point. However, as progressive blogger Chris Bowers has been pointing out lately, it is far better to be ahead than behind.

Having said that, we should not discount that two recent polls -- USA Today/Gallup and ABC/Washington Post -- show McCain doing better when the classic Gallup "likely voter" model is applied. What is truly interesting about that finding is that the opposite was true on six of seven surveys that Gallup conducted from January to May: Obama did slightly better among "likely voters" (defined as they were above) than among registered voters.

I have a theory (that someone at Gallup can probably test empirically): What changed is that the Democratic primaries ended. From February to June, Republicans who usually vote had a perfectly good reason to say they were paying "only a little attention" to the presidential campaign. All of the news was about the Obama-Clinton race. Now that the media has started to focus on the McCain-Obama contest, Republicans have greater reason to be engaged. At least, that is something worth checking.

Also, finally, consider something from the perspective of a no-longer-practicing campaign pollster: Campaigns matter. So I am less concerned at this stage with "projections" that predict the outcome than with understanding what each campaign needs to accomplish to win. If the "likely voter" pattern evident in the recent USA Today/Gallup and Washington Post/ABC polls is accurate, it tells us what the Obama campaign needs to do to win this election: They need to mobilize Americans that are ready to support Obama but that do not typically vote. That comes through loud and clear.

**PS - My conclusions above raise an obvious question: If registered voters are a better subgroup to watch, why does Pollster.com use the likely voter numbers on our tables and charts? I will blog on that highly pertinent question next, I promise.

[Typo corrected.  I know it's "likely voter" season because I'm misspelling Erikson again].



Personally, I think USAT's insistence on hyping the "likely" voter results, in order to show McCain with the lead, is simply another demonstration of the old adage: Torture the numbers long enough, and they'll confess to anything.




As always, thanks for the very informative column, especially your last paragraph. I've become more convinced than ever that the Obama CAMPAIGN is the key to the November election. All evidence points to the effect his campaign has on voter preferences. When he campaigns and advertises, his support rises measurably. When he doesn't, the support slips. Anyone familiar with the role of a "campaign" in the introduction of a new product understands what is going on.



gosh mark, mccain is running too. your bias is a bit too transparent, perhaps? in all fairness, maybe you should take a look at what mccain could do to lift his numbers as well. instead of,

'...what the Obama campaign needs to do to win this election: They need to mobilize Americans that are ready to support Obama but that do not typically vote. That comes through loud and clear.'

i would like to hear what you suggest about the nature of his campaign too.



This LV model of discounting the 2's and ignoring the 3's seems to severely impact two important groups of Obama/Democratic voters. One group would be the young voter. Anyone under the age of 22 could score no more than 2 points using this methodology if they answered accurately (they probably never voted before). Then there is the lower socio-economic group in the inner cities which pays less attention on average to campaigns and may generally vote only when they are fired up to do so (like having a black candidate when the voter is black, or having a close election), so many of these people would likely score a 2 or less as well.

On the flip side, since Gallup is actually discounting the 2's, they are giving an unfair advantage to the 3's which are more likely older voters, and older voters are skewing heavily towards McCain.

This certainly explains how their model produces a swing of 7 points in the most recent USAToday/Gallup poll.

Another note, I saw today that someone from both Gallup and Rasmussen was on Kudlow and Co. today and Kudlow was ranting about Obama bouncing and then losing the whole bounce (despite the fact that this was based on an outlier poll with questionable LV results). The Gallup guy, while not perfectly confirming Kudlow's claims, was not refuting them either. I saw this game in 2004 when the Republicans put out talking points to kill Kerry's post-convention bounce and it was in fact effective, at least in terms of popular media opinion. It's a shame that pollsters are confusing the science of polling with actual political diatribe.



Good grief! Public opinion polling has been studied scientifically, and applied, for sixty years!

And these polling firms are still "tinkering" with their methodologies. This constant tinkering conveys the impression that the Ph.D. statisticians have little or no confidence in their own methods.

How many more decades will it be before we can expect polls that inspire confidence in the results?

(By the way, a note to Mark Blumenthal who stated: "North Dakota, the one state without party registration". Actually, Texas also does not have "party registration", as in Democratic or Republican. But, maybe you meant "voter registration"; your wording could have been a little clearer. :)




"So which poll or approach is the most accurate right now? Listen closely know: We. Don't. Know."

Except for the typo, that's a brilliant answer.



Excellent post, Mark. I'd add the single most important paragraph from the USAT write-up of the results:

To determine whether they were likely voters, poll participants were asked how much thought they had given the election, how often they voted in the past and whether they plan to vote this fall. McCain's gains came because there was an even number of likely voters from each party. Last month, the Democrats had an 11-point edge.


This is perfectly consistent with the Blumenthal "end of the primary season effect", of course.

But it still raises a puzzle for me. While it is true that one would expect Republicans to be re-energized once the Obama-McCain race was engaged (and thus pass the likely voter screen), one would also have expected Democrats to have been energized by Obama's overseas trip...

So I guess I still find the -11 point swing in likely Democratic voters implausible...maybe it's because his followers are young and previously intermittent voters...or maybe due to the fact that it was a weekend only poll....??

In any case, it's all the more curious (and sloppy) that Newport would then post this:


without telling us (clearly) whether the results are based on LV or RV...


