

How We Choose Polls to Plot: Part II

Topics: ABC/Washington Post, Charts, Likely Voters, Pollster.com

Mark started this conversation with "Why We Choose Polls to Plot: Part I," asking how we decide to handle likely voter vs. registered voter vs. adult samples in our horse race estimates. The question was driven home by the Washington Post/ABC poll, which reported quite different results for its adult (A), registered voter (RV), and likely voter (LV) subsamples, but it is a good problem in general. So let's review the bidding.

The first rule for Pollster is that we don't cherry pick. We make every effort to include every poll, even if it sometimes hurts. So even when we see a poll way out of line with other polls and with what we "know" has to be true, we keep that poll in our data and in our trend estimates. There are two reasons. First, once you start cherry picking, you never know when to stop. Second, we designed our trend estimator to be quite resistant to the effect of any one poll (though when there are few polls this can't always be true). That rule has served us well. Whatever else may be wrong with Pollster, we are never guilty of including just the polls (or pollsters) we like.

But what do we do when one poll gives more than one answer? The ABC/WP poll is a great example, with results for all three subgroups: adults, registered voters, and likely voters. Which should we use? And how do we choose in a way that remains consistent with our prime directive: never cherry pick?

Part of the answer is to have a rule for inclusion and stick to it stubbornly. (I hear Mark sighing that you can do too much of this stubborn thing.) But again the ABC/WP example is a good one. Their RV result was more in line with other recent polls, while their LV result showed the race a good deal closer. If we didn't have a firm, fixed rule, we'd be sorely tempted to take the result that was "right" because it agreed with other data. That would build a bias into our data that would understate the actual variation in polling, because we'd systematically pick results closer to other polls. Even worse would be picking the number that was "right" because it agreed with our personal political preferences. But that problem doesn't arise so long as we have a fixed rule for which populations to include in cases of multiple results. Which is what we have.

That rule for election horse races is: take the sample that is most likely to vote, as determined by the pollster that conducted the survey. If the pollster was content to survey just adults, then so be it; that was their call. If they were content with registered voters, again, use that. But if they offer more than one result, use the one that is intended to best represent the electorate. That is likely voters, when available.
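The inclusion rule above is mechanical enough to sketch in code. This is a hypothetical illustration of the stated rule, not Pollster's actual software, and the poll numbers below are invented for the example.

```python
# Sketch of the inclusion rule: when a poll reports results for several
# populations, keep only the one closest to the likely electorate.
# Population codes and data shapes here are hypothetical.

PREFERENCE = ["LV", "RV", "A"]  # likely voters > registered voters > adults

def pick_result(results):
    """results maps population codes to (candidate_a, candidate_b) shares;
    return the most-likely-to-vote population the pollster reported."""
    for pop in PREFERENCE:
        if pop in results:
            return pop, results[pop]
    raise ValueError("no recognized population in poll")

# A poll reporting all three populations (invented numbers): the rule
# keeps the LV figures and ignores the rest.
poll = {"A": (49, 45), "RV": (49, 46), "LV": (47, 49)}
print(pick_result(poll))  # -> ('LV', (47, 49))
```

A poll that reports only RV or only adult numbers falls through to whatever its pollster chose to represent, which is exactly the "trust the pollster" principle.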

We know there are a variety of problems with likely voter screens: evidence that who counts as a likely voter can change over the campaign, and the problem of new voters. But the pollster "solves" these problems to the best of their professional judgment when they design the sample and when they calculate results. If a pollster doesn't "believe" their LV results, then it is a strange professional judgment to report them anyway. If they think their RV results "better" represent the electorate than their LV results, they need to reconsider why they define LV as they do. Our decision rule says "trust the pollster" to make the best call their professional skills allow. It might not be the one we would make, but that's why the pollster is getting the big bucks. And our rule puts responsibility squarely on the pollster's shoulders as well, which is where it should be. (By the way, calling the pollster and asking which result they think is best is both impractical for every poll AND suffers the same problems we would introduce if we chose which results to use.)

But still, doesn't this ignore data? Yes, it does. Back in the old days, I included multiple results from any poll that reported more than one vote estimate. If a pollster gave adult, RV, and LV results, then that poll appeared three times in the data, once for each population. But as I worked with these data, I decided that was a mistake. First, it was confusing because there would be multiple results for a single poll -- three dots instead of one in the graph. It also gave more influence to pollsters who reported results for more than one population than to those who reported only LV or RV. Finally, not that many polls report more than one number. Yes, some pollsters sometimes do, but the vast majority decide what population to represent and then report that result. End of story. So by trying to include multiple populations from a single poll, we were letting a small minority of cases create considerable confusion with little gain.

The one gain that IS possible is the ability to compare, within a single survey, the effect of likelihood of voting. The ABC/WP poll is a very positive example of this. By giving us all three results, they let us see the effect of their turnout model on the vote estimate. Those who report only LV results hide from us the consequences of making the LV screen a bit looser or tighter. So despite our decision rule, I applaud the Post/ABC folks for providing more data. That can never be bad. But so few pollsters do it that we can't exploit such comparisons in our trend data. There just aren't enough cases.

What would be ideal is to compare adult, RV and LV subsamples by every pollster, then gauge the effect of each group on the vote.  But since few do this, we end up having to compare LV samples by one pollster with RV samples by another and adult samples by others.  That gets us some idea of the effect of sample selection, but it also confuses the differences between survey organizations with differences in the likely voter screens. Still, it is the best we can do with the data we have.

So let's take a look at what difference the sample makes. The chart below shows the trend estimate using all the polls, and the LV, RV, and adult samples separately. We currently have 109 LV samples, 136 RV, and 37 adult. There are some visible differences. The RV (blue) trend is generally more favorable to Obama than the LV (red) trend, though they mostly agreed in June-July. But the differences are not large: all three sub-population trend estimates fall within the 68% confidence interval around the overall trend estimate (gray line). There is good reason to think that likely voters are usually a bit more Republican than registered or adult samples. The data are consistent with that, amounting to differences large enough to notice, if not to distinguish statistically with confidence. Perhaps more useful is to notice the scatter of points and how the blue and red points intermingle. While there are some differences on average, the spread of both RV and LV samples (and adult) is pretty large. The differences in samples make detectable differences, but the points do not belong to different regions of the plot. They largely overlap, and we shouldn't exaggerate their differences.
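The per-population comparison above can be sketched in miniature. Pollster's actual estimator is a local regression; here a simple centered moving average on invented poll margins stands in for it, just to show how separate trends by sample type preserve an average RV/LV gap.

```python
# Minimal sketch of estimating separate trends by sample type.
# The data are invented: RV margins run ~2 points higher than LV.

def moving_average(points, window=5):
    """Centered moving average over (day, margin) pairs, sorted by day."""
    points = sorted(points)
    out = []
    for i, (day, _) in enumerate(points):
        lo = max(0, i - window // 2)
        hi = min(len(points), i + window // 2 + 1)
        out.append((day, sum(m for _, m in points[lo:hi]) / (hi - lo)))
    return out

# Invented margins with a deterministic wiggle, one "poll" every 3 days.
rv_polls = [(d, 4 + (d % 5 - 2)) for d in range(0, 90, 3)]
lv_polls = [(d, 2 + (d % 5 - 2)) for d in range(0, 90, 3)]

rv_trend = moving_average(rv_polls)
lv_trend = moving_average(lv_polls)

# The ~2-point RV/LV gap survives the smoothing.
gap = sum(r - l for (_, r), (_, l) in zip(rv_trend, lv_trend)) / len(rv_trend)
print(round(gap, 1))  # -> 2.0
```

With real polls the two series would not share field dates, so the trends would have to be evaluated on a common grid before differencing, but the idea is the same.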


There is a valid empirical question still open: do LV samples predict election outcomes more accurately than RV samples? And when in the election cycle does that benefit kick in, if ever? That is a good question that research might answer, and the answer might lead me to change my decision rule for which results to include. But if RV samples should outperform LV samples, then the polling community has a lot of explaining to do about why they use LV samples at all. Until LV samples are proven worse than RV (or adult) samples, I'll stick to the fixed, firm, stubbornly-clung-to rule we have. And if we should ever change, I'll want to stick stubbornly to that one. The worst thing we could do is to make up our minds every day about which results to include based on which results we "like."

[Update: In Part III of this thread, Mark Blumenthal responds to some of the comments below and poses a new question.]


Mark Lindeman:

I smiled at "it is a good problem in general." Yeah, like an especially sweet Sudoku.

Thinking about the "accuracy" of various poll estimates at this point in the campaign gets pretty metaphysical, pretty fast. My gut tells me that some LV screen ought to be better than none at all -- people who say they are unlikely to vote, probably are. Of course that doesn't mean that every LV screen is better than none at all. But I agree that there's a lot to be said for a simple, inflexible rule at least as a baseline.

I'd say the difference between LV and RV trendlines here is pretty large, although I take your point. Can this become a semi-regular feature?

We all look forward to the era where pollster.com users can eliminate or "drag" individual points and watch the trendline update dynamically... no, sorry, that's just sick.



Your comment that LV skews in one direction scares me somewhat. Is it real, or is it an effect of the LV screening?

In general, do you worry at all that there can be correlations between the LV screens and the poll results? How are these LV filters constructed? Some algorithm applied to personal information and previous voting record questions, I would guess. Since things like age, gender, education, ... all correlate with political preference, if these are also then part of the LV screen, how do you disentangle the effect?

(Of course, if the LV screen consists of a single "are you likely to vote?" question, the point is moot.)

In physics we worry about "trigger bias" or "filter bias", and we estimate this by having a family of filters with varying levels of selection, so one can judge how warped the answer is getting. In an ideal world the pollster would give you a family of LV screens (say, LV_soft, LV_medium, LV_hard) and you could judge.
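The family-of-filters idea in this comment can be sketched directly: apply screens of varying tightness to the same respondents and watch the estimate move. The respondent data, the 0-10 likelihood scale, and the thresholds below are all invented for illustration.

```python
# Sketch of a family of LV screens (LV_soft, LV_medium, LV_hard) applied
# to the same synthetic respondents, to gauge sensitivity to the filter.

respondents = [
    # (stated likelihood to vote on a 0-10 scale, preferred candidate)
    (10, "A"), (9, "B"), (8, "A"), (7, "B"), (6, "B"),
    (5, "A"), (4, "B"), (3, "A"), (2, "B"), (1, "A"),
]

def estimate(threshold):
    """Share for candidate A among respondents passing the screen."""
    passed = [cand for like, cand in respondents if like >= threshold]
    return sum(1 for cand in passed if cand == "A") / len(passed)

for name, cut in [("LV_soft", 3), ("LV_medium", 6), ("LV_hard", 8)]:
    print(name, round(estimate(cut), 2))
# LV_soft 0.5, LV_medium 0.4, LV_hard 0.67
```

If the three estimates move monotonically with the threshold, that is a warning sign that the screen itself is correlated with candidate preference, which is exactly the "trigger bias" worry.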




Charles, you ask, "Do LV samples more accurately predict election outcomes than do RV samples?"

Isn't that the wrong question, though, except for polls taken just before Election Day? After all, pollsters continually tell us that their results are *not* a prediction, but a "snapshot" of *current* preferences.

The problem is that to get that "snapshot" they ask a question about a fictitious event--an election held today. But of course if the election really were held today, some people who tell the pollsters that they haven't been taking much interest in the campaign--one common screen question for likely voters-- would be very interested indeed. (For that matter, some people who aren't even registered now might long since have registered.) So to defend likely voter models by saying "They aren't intended to predict whether the respondent will vote in November but whether he would vote today" is in effect to pile one fiction on top of another.



Looking at the plot, it appears that Likely Voters show the highest variability as a function of time, while Registered Voters show the least. Is there some reason why LVs should be more volatile than RVs? If not, shouldn't one suspect that the higher variability of the LV votes is an artifact of the LV screening process?


Michael Pisapia:

It would be great to see trend estimates for the different types of polls, since RV and LV polls are measuring different potential electorates. Actually, isn't it the case that all RV polls are measuring the same potential electorate, while the various LV polls are measuring multiple and different potential electorates? What's nice about the RV polls is that at least the instrument tapping the potential electorate ("are you registered to vote?") is stable and consistent across polls, and we can be reasonably confident that different pollsters are actually mapping the same potential electorate. On the other hand, what's nice about the LV polls is that the pollster's historical knowledge about who actually goes to vote (why the pollsters get paid the big bucks) may increase the predictive capacity of the poll. As a reader of the LV polls, it would be nice to know the criteria used to define the LV population. Perhaps pollster.com should only include LV polls that publish these criteria. Maybe all of them do.



@ Amit:
There's good reason to think that the LV screens correlate highly to at least one demographic factor: age. For example, CNN's current screen asks about prior voting habits. People who have recently turned 18 obviously haven't voted before. Of course, there is also a strong correlation between age and candidate preference. I've also seen it argued that the "cutoff" method CNN uses (only counting the most likely voters rather than weighting choices by probability of voting) exacerbates this bias. Nate on fivethirtyeight.com has a post about this potential "long tail" effect.

@ Independent:
I think there has been some research showing that LV screens introduce higher volatility. This may be because part of some screens (including CNN's current screen) is current interest in the race, which is of course highly volatile.



What the likely vs. registered voter trend estimates show is, for the most part, what happened in the primaries, where Obama polled lower in the actual vote than in the opinion polls, and certainly what happened in the last general election, where Kerry polled lower in the actual vote. The question is how much pollsters have refined their models since. Rasmussen looks to have skewed slightly further toward the Republicans as well.

The bottom line is that the way you are presenting the data allows us to extrapolate snapshots of our own in one-stop fashion without necessarily "cherry picking"!



Prof Franklin,

Reasonable defense for your decisions, especially in view of the fact that most pollsters don't report all three sets of results.

However, I was struck by your comment:

"...If a pollster doesn't "believe" their LV results, then it is a strange professional judgement to report them anyway. If they think that RV results "better" represent the electorate than their LV results, they need to reconsider why they are defining LV as they do..."

I couldn't agree more. And with that in mind, I was struck by Frank Newport's "defense" of the recent USA Today/Gallup poll that had McCain ahead among LV's and trailing among RV's. In effect, despite trumpeting the LV results in USA Today, Newport claimed that the RV results were more reliable.

I realize that pollsters may not control (or even influence) the editorial decisions of their clients (e.g., USA Today), but it is rather disconcerting to see poll results effectively disowned by the firm (e.g., Gallup) that conducted the poll.



One further comment. I'd be much less skeptical of LV screens (especially this far ahead of the election) if (a) pollsters routinely published the details of their LV screens (which they don't); (b) the LV screens were roughly comparable (which they apparently are not); and (c) LV weighting weren't so absurdly "kludgy" (as Gallup's, at least, appears to be).



thoughtful, you say that "in the primaries Obama polled less in the actual poll than in the opinion polls"

According to a pro-Obama commentator at http://www.fivethirtyeight.com/2008/08/persistent-myth-of-bradley-effect.html this is false:
"On average, Barack Obama overperformed the Pollster.com trendline by 3.3 points on election day." Unfortunately, that commentator included caucuses as well as polls in his chart, but I think the basic point remains valid: people have it so much etched in their memory how Obama did worse than the polls indicated he would in New Hampshire that they forget that there were many other states where he actually outperformed the polls.

As for 2004, Bush outperformed the final polls (I mean the final polls in the week before the election, not that ridiculous exit poll) but only very slightly. They showed on average a Bush victory by 1.5%; he won by 2.45%. http://www.mysterypollster.com/main/2005/01/final_results.html



@David_T You are quite right. Caucuses are different from polling booths. I'll try to pull out examples other than NH!


Mark Blumenthal:

Just a note that earlier this afternoon I posted Part III of this thread, which addresses the comments from Amit and Independent.

