

Polling Registered vs. Likely Voters: 2004

As Pollster.com readers have no doubt noticed, there has been much discussion in the posts and the comments here about the merits of polling registered voters (RV) versus likely voters (LV). Mark and Charles have been debating this point in their most recent exchanges about whether it is better to include LV or RV results in the Pollster.com poll averages. Charles's last post on this topic raised the following questions:

"There is a valid empirical question still open. Do LV samples more accurately predict election outcomes than do RV samples?"

Ideally, I'd have time to go back over 30 or more years of polling to weigh in on this question. Instead, I thought I'd go back to 2004 and get a sense of how well RV versus LV samples predicted the final outcome. To do this, I used the results from the final national surveys conducted by eight major survey organizations. For each of these eight polls (nearly all of which were conducted during the last three days of October), I tracked down the Bush margin among both RVs and among LVs. The figure below demonstrates the difference in the Bush margin for the LV subset relative to the RV sample from the same survey.


For most polls, LV screens increased Bush's margin, including three surveys (Gallup, Pew, and Newsweek) where Bush did 4 points better among LVs than he did among RVs. But using an LV screen did not always help Bush. In three polls (CBS/New York Times, Los Angeles Times, and Fox News), his margin remained the same, and in the Time poll (which was conducted about a week earlier than the other surveys) Bush actually did 2 points worse among LVs.

Of course, this doesn't really tell us which method was more accurate in predicting the general election outcome, just which candidate benefited more from the LV screens. To answer which was more accurate, we can plot each poll's Bush margin among both RVs and LVs to see which came closest to the 2.4% margin that Bush won in the popular vote. This information is presented in the figure below, which includes a dot for each survey along with red lines indicating the actual Bush margin.


Presumably, the best place to be in this plot is where the red lines meet. That would mean that both your RV and LV margins came closest to predicting the eventual outcomes. But, if you are going to be closer to one line over the other, you'd rather be close to the vertical line than the horizontal line. This means that the polling organization's LV screen helped them improve their final prediction over just looking at RVs. If the opposite is true (an organization is closer to the horizontal line than they are to the vertical line), their LV screen actually reduced their predictive accuracy.

The CBS/New York Times poll predicted a 3 point Bush margin for both its RV and LV samples, meaning it was just 6/10ths of a point off regardless of whether it employed its LV screen. Four organizations (Pew, Gallup, ABC/Washington Post, and Time) increased the accuracy of their predictions by employing LV screens, coming closer to the vertical line than to the horizontal line. Gallup's LV screen appeared to be the most successful, since it brought them closest to the actual result (predicting a 2 point victory for Bush despite the fact that their RV sample showed a 2 point advantage for Kerry).

On average, the RV samples for these eight polls predicted a 0.875 point Bush advantage while the LV samples predicted a 2.25 point advantage for Bush, remarkably close to the actual result. Of course, this is just one election, but it does appear as though likely voters did a better job of predicting the result in 2004 than registered voters. On the other hand, this analysis reinforces some other concerns about LV screens, the most important of which is that some LV screens created as much as a 4 point difference in an organization's predictions while in three cases LV screens produced no difference at all. It is also important to note that these are LV screens employed at the end of a campaign, not in the middle of the summer, when it is presumably more difficult to distinguish LVs. Ultimately, the debate over LV screens is an important one, and the 2008 campaign may very well provide the biggest challenge yet to pollsters trying to model likely voters.
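The arithmetic behind that comparison is simple enough to sketch (all figures in percentage points, taken from the averages above; the per-poll numbers are not reproduced here):

```python
# Compare the average RV and LV Bush margins to the actual popular-vote margin.
actual_margin = 2.4   # Bush's 2004 popular-vote margin
rv_avg = 0.875        # average Bush margin across the eight RV samples
lv_avg = 2.25         # average Bush margin across the eight LV samples

rv_error = abs(rv_avg - actual_margin)
lv_error = abs(lv_avg - actual_margin)

print(f"RV average error: {rv_error:.3f} points")   # 1.525
print(f"LV average error: {lv_error:.3f} points")   # 0.150
```

On average, then, the LV samples landed roughly a point and a third closer to the actual result than the RV samples did, which is the basis for the conclusion above.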



A better way, perhaps, to explain this chart is to say that if a point sits on the line with slope = 1, then the LV screen made no difference in the pollster's accuracy. Drawing that line, you can easily see that Pew and Gallup did significantly better with their LV results, Newsweek did significantly worse, and for all other pollsters it made no difference.



nate at 538 has the definitive answer to this question. you should check it out. it turns out that RV is more accurate up until about convention time...after that it is LV. he explains the details.



Orthogonal topic, but just hoping for a quick pointer:

How partisan poll weighting work? I'm looking for the kind of detail you'd find in a refereed scientific journal. A pointer to a reference is fine. I can't find any details on Rasmussen's web site. Why would anyone take seriously the results of a polling organization that does not publish its methods?

In particular there's something fishy about gallup's and rasmussen's daily presidential tracking polls. They're way too stable to be real, and I suspect something is being done numerically that overdamps them.



... that should be "How DOES partisan poll weighting work?"



Brian, you devote exactly one sentence to the crucial issue: "It is also important to note that these are LV screens employed at the end of a campaign, not in the middle of the summer, when it is presumably more difficult to distinguish LVs."

Most of the critics of current LV polls rely exactly on that point. I wish you had discussed it at greater length...




Mark B. did a piece on Rasmussen's weighting within the last month. As far as I can tell, on the national level they use a rolling average of the last three months and keep that weight for a month. (In fact, however, it's not quite clear. They may use the same weights for three months and then re-weight.)

The more problematic method is how Rasmussen weights state by state results. In that case, they appear to use the overall national change since the last election and apply that delta to the partisan split of either the 2006 or 2004 election. Thus, if a state was 30% Democratic in 2004, they simply add, say, nine percent (the national Democratic change) to that number and weight their results in a particular state to 39% of the sample.
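The state-level method described above amounts to one addition; a sketch, using only the illustrative numbers from the comment (30% Democratic in 2004, a hypothetical 9-point national shift):

```python
def state_party_target(state_share_last_election, national_change):
    """Apply the national partisan shift to a state's prior-election share,
    as the comment above suggests Rasmussen appears to do state by state."""
    return state_share_last_election + national_change

# A state that was 30% Democratic in 2004, with a 9-point national
# Democratic gain, gets weighted to a 39% Democratic sample.
target = state_party_target(0.30, 0.09)
print(f"Democratic weighting target: {target:.0%}")  # 39%
```

Note that this ignores any state-specific movement, which is exactly why the method is problematic: a uniform national delta is applied to every state regardless of local trends.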



You could probably throw away polling data randomly and the reduced statistics would give you a very similar pattern of shifts.

Gallup and Pew had lucky shifts. But if one could repeat this study for a half dozen elections and show a consistent effect of an LV screen pushing a pollster towards a correct answer, then one could make a statement about that particular LV screen.

Given Mark Blumenthal's posts on what's in the guts of most LV screens (Gallup's in particular) it's not clear that LV screens are any better than randomly reducing your sample size. It may well be worse (if the data being thrown away is not random).



Hi jsh1120, and thanks for that answer. Actually my question is more fundamental: What is the process by which weighting is achieved? For example, if you expect to hear from 53 dems and 47 republicans for every 100 people called on a given night (because that's the ratio you determined over some past period), do you:

a) keep calling people until you've hit the target numbers, thus throwing out every republican called after that time -- say 10 pm -- when 47 were reached. This method can be biased to get desired results (sometimes unconsciously).

b) call way more than you need and then cull through them afterward, throwing out various responses to get the 53:47 ratio you seek. This method is especially vulnerable to bias.

c) take the numbers you get and apply some kind of purely numerical adjustment to them -- this would be true weighting. If so, then how is it done? Literally multiplying the resulting proportions by .53 and .47 would be overly simplistic and a violation of Bayes' Rule.
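Option (c), as usually practiced, is post-stratification: each respondent's answers are multiplied by the ratio of the target party share to the sampled party share. A minimal sketch, with made-up party labels and targets (not any pollster's actual numbers):

```python
# Post-stratification weighting: weight = target_share / sample_share
# for each party, so the weighted sample matches the 53:47 target
# without discarding any calls.
from collections import Counter

responses = ["dem"] * 60 + ["rep"] * 40   # hypothetical night of calls
targets = {"dem": 0.53, "rep": 0.47}

counts = Counter(responses)
n = len(responses)
weights = {party: targets[party] / (counts[party] / n) for party in targets}

# The weighted party shares now match the targets exactly.
weighted_dem_share = counts["dem"] * weights["dem"] / n
print(weights)              # dem weight < 1, rep weight > 1
print(weighted_dem_share)   # 0.53
```

Whether any given pollster does this, or something closer to options (a) or (b), is exactly the kind of methodological detail that often goes unpublished.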

Bayes' Rule states that (in trying to predict the probability of voting for Obama):

p(O) = (p(O|d) * p(d)) / p(d|O)


p(O|d) = p(voting for Obama given you're a democrat)
p(d|O) = p(being a democrat given you're voting for Obama)
p(d) = p(being a democrat)

Rasmussen's partisan surveys give you only the probability of being a democrat, not the other two terms. So, how do they find those out? Do they ask as part of each night's survey? Or is that part of the 3-month partisan history? Again, open to possible bias and manipulation.
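The identity above is just Bayes' rule rearranged, and it can be checked numerically; all three input probabilities here are made up purely for illustration:

```python
# Check of the identity p(O) = p(O|d) * p(d) / p(d|O).
p_O_given_d = 0.85   # p(voting for Obama | democrat) -- hypothetical
p_d = 0.40           # p(being a democrat)            -- hypothetical
p_d_given_O = 0.68   # p(being a democrat | voting for Obama) -- hypothetical

p_O = p_O_given_d * p_d / p_d_given_O
print(round(p_O, 4))  # 0.5
```

The point stands either way: a pollster reporting only p(d) has not given you enough to recover p(O) from this identity alone.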


Chris G:

BlueMerlin- I'm not a pollster, but if I understand weighting I don't think there's as big of an issue here (state weighting excepted as jsh1120 mentions).

p(O) = p(O|D)+p(O|R)+p(O|I) as well, since p(D)+p(R)+p(I) = 1. Let's call p(O') Obama support w/out weighting, and p(D') proportion of Dems in the unweighted sample, likewise for R' and I'. The key assumption in weighting (as I understand it) is that *p(O|D') = p(O|D)*. IOW Obama's "rate" of support among sampled Ds is the same as the rate in the 3-month (or whatever) universe used to derive p(D), likewise for R and I. Disregarding random sampling error,

p(O) = p(O'|D')p(D) + p(O'|R')p(R) + p(O'|I')p(I)

It's a simple linear transformation of p(O) from sampled partisan space into "real" partisan space. No information needs to be added (MOE is a different issue).
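That transformation is a three-term weighted sum, easy to sketch with made-up support rates and party targets (none of these figures come from an actual poll):

```python
# p(O) = p(O'|D')p(D) + p(O'|R')p(R) + p(O'|I')p(I)
support_in_sample = {"D": 0.85, "R": 0.10, "I": 0.45}  # hypothetical p(O'|X')
party_targets = {"D": 0.39, "R": 0.33, "I": 0.28}      # hypothetical p(X); sums to 1

p_obama = sum(support_in_sample[x] * party_targets[x] for x in party_targets)
print(round(p_obama, 4))  # 0.4905
```

The key assumption, as stated above, is that support rates within each sampled party group match the rates in the target universe; the weighting only reallocates the party mix.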


Chris G:

Oh sorry, p(O) = p(O|D)p(D)+p(O|R)p(R)+p(O|I)p(I)



It's my "understanding," and it's purely what I've concluded, not what I "know," that Rasmussen uses the simplest form of probability weighting to get their results. I've seen no indication that they are using a Bayesian technique (where actual evidence would overwhelm a presumed partisan split.) Thus, I believe the Rasmussen approach is simply to apply a weight to each respondent's answers that reflects Rasmussen's presumed partisan split.

I have to say, however, that I'm less certain about this than I used to be as a result of some other disconcerting information from Gallup about their "weighting" for likely voters. I had presumed that Gallup used a similar probabilistic weighting (as in a stratified random sample) to reflect a probability of voting. It turns out they use a much clunkier scale that simply throws out the responses of those they deem "unlikely voters." Perhaps Rasmussen does something similar in trying to determine a partisan split.

As far as the question of whether Rasmussen asks respondents for their party ID, I believe they do in each of the tracking polls.




Rasmussen does in fact ask for party ID, but even how they ask will give somewhat different results. Rasmussen asks "are you a ______"; some other polls ask what you most identify with, and exit polls ask the question even more strictly than Rasmussen by asking about registration. As an example, I am an independent, but I mostly vote for Democrats and I can't remember ever voting for a Republican (though I have considered it). I answered Rasmussen as being Independent; however, if I had been asked with whom I identified, it would certainly have been the Democratic Party. I would imagine that many independents are anti-establishment and anti-two-party-system like I am, and I don't think that Rasmussen asks the question the best way if they are then going to weight polls on it. In years of change, I can see Democrats and Republicans switching parties in greater numbers than Independents, since they are used to being in a party, whereas to stay an independent there is nothing gained nor lost unless you want to vote in a primary in certain states.

As you can probably guess, registration is the least dynamic of the three, and what people most identify with is the most dynamic. In an election such as this one, like the one in 2006, you are likely to see the biggest advantages for Democrats with the most dynamic measures.

Rasmussen uses the previous month's results to generate the target party ID numbers used for weighting their state and national polls. The national one is likely to be less volatile, while in the state ones it can add substantially to their MoE, since they are in effect combining the error from one poll with another by weighting on factors outside of demographics (which are not nearly as dynamic). It's interesting that during June Rasmussen was about dead on in their national tracking poll, but as soon as they re-weighted for party ID (and they showed a drop in the Democrats' advantage), they quickly started measuring Obama 2 points under the advantage shown by other polls; my educated guess is that this was specifically due to their weighting by party ID and dropping the Democrats' weight. The state polls, which lack the number of respondents that the national tracking polls have, could well suffer from a MoE greater than 6 points in some cases, which is really no better than sampling 350 people, plus they may well have an average skew similar to that of Rasmussen's national tracking poll.

