How We Choose Polls to Plot: Part III

Topics: Charts, Likely Voters, Pollster.com, Robert Erikson

In the first two installments of this online dialogue, I asked a question we have heard from readers: why do we choose the results for "likely voters" (LVs) over "registered voters" (RVs) when pollsters release both? Charles answered by explaining the rationale for our "fixed rule" in these situations. Here is the gist:

That rule for election horse races is "take the sample that is most likely to vote" as determined by the pollster that conducted the survey. If the pollster was content to just survey adults, then so be it. That was their call. If they were content with registered voters, again use that. But if they offer more than one result, use the one that is intended to best represent the electorate. That is likely voters, when available.

Despite my own doubts, I'm convinced by the rule for this reason: I can't come up with a better one. Yes, we could arbitrarily choose RVs over LVs until some specified date, but that would leave us still plotting numbers from pollsters that release only LV samples. And on which date do we suddenly start using the LV numbers? After the conventions? After October 1? What makes sense to me about our rule is that in almost all cases (see the prior posts for examples) it defers to the judgement of the pollster.

Several readers posed good questions in the comments on the last post. Let me tackle a few. Amit ("Systematic Error") asked about how likely voters are constructed and whether we might be able to plot results by "a family of LV screens (say, LV_soft, LV_medium, LV_hard)" and allow readers to judge the effect.

I wrote quite a bit back in 2004 about how likely voter screens are created, and I posted a shorter version focusing on the Gallup model two weeks ago. One big obstacle to Amit's suggestion is that few pollsters provide enough information about how they model likely voters (and how that modeling changes over the course of the election cycle) to allow for such a categorization.

"Independent" raised a related issue:

Looking at the plot, it appears that Likely Voters show the highest variability as a function of time, while Registered Voters show the least. Is there some reason why LVs should be more volatile than RVs? If not, shouldn't one suspect that the higher variability of the LV votes is an artifact of the LV screening process?

The best explanation comes from a 2004 analysis (subs. req.) in Public Opinion Quarterly by Robert Erikson, Costas Panagopoulos and Christopher Wlezien. They found that the classic 7-question Gallup model "exaggerates" reported volatility in ways that are "not due to actual voter shifts in preference but rather to changes in the composition of Gallup's likely voter pool." I also summarized their findings in a blog post four years ago.

Finally, let me toss one new question back to Charles that many readers have raised in recent weeks. The two daily tracking surveys -- the Gallup Daily and the Rasmussen Reports automated survey -- contribute disproportionately to our national chart. For example, we have logged 51 national surveys since July 1, and more than half of those points on the chart (27) are either Gallup Daily or Rasmussen tracking surveys. Are we giving too much weight to the trackers? And what would the trend look like if we removed those surveys?



Regarding the disproportionate number of Rasmussen and Gallup tracking surveys: you already answered that one yourself. Of course they are overweighted in your poll averages. In fact, in the state-by-state polls, Rasmussen and SurveyUSA are also overweighted, especially Rasmussen.

It would be very interesting to see the national trend without the Rasmussen and Gallup tracking polls included. My guess is that their overweighting and their tendency to slightly underpoll the difference would net a 1- to 2-point change for Obama. I'm guessing that it might also show higher bounces and bigger corrections (like the trend with third parties that you chart separately).

I'm not sure why there is such a focus on LV vs. RV results when the inclusion of partisan polls and Zogby Interactive polls likely introduces much more skew. The last Zogby barrage definitely messed up the trend lines in several states.

What I would really want to see, and believe would be more useful, would be to have a chart that shows the difference between the candidates, excludes partisan pollsters, only includes an equal weighting of one pollster to another, and avoids polls that have clear ambiguities related to the pollster's method (you might just discredit outliers with an equation). I would think that the trends from such a chart would be more accurate and telling.
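The "equal weighting of one pollster to another" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the pollster names and margins below are hypothetical, not real poll results, and the function name `equal_pollster_average` is invented for the example. It is not how the Pollster.com trend line is actually computed.

```python
from collections import defaultdict
from statistics import mean

def equal_pollster_average(polls):
    """Average each pollster's margins first, then average across pollsters."""
    by_pollster = defaultdict(list)
    for pollster, margin in polls:
        by_pollster[pollster].append(margin)
    return mean(mean(margins) for margins in by_pollster.values())

# Hypothetical margins (Obama minus McCain, in points); NOT real poll results.
polls = [
    ("Tracker A", 2.0), ("Tracker A", 1.0), ("Tracker A", 3.0), ("Tracker A", 2.0),
    ("Pollster B", 6.0),
    ("Pollster C", 4.0),
]

naive = mean(margin for _, margin in polls)   # every poll counts once
balanced = equal_pollster_average(polls)      # every pollster counts once
print(f"naive: {naive:.1f}, per-pollster: {balanced:.1f}")
# prints "naive: 3.0, per-pollster: 4.0"
```

In this made-up example the frequent tracker pulls the naive average toward its own readings, while the per-pollster average treats it as one voice among three.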


Chris G:

On the overweighting issue, why not bootstrap confidence intervals by sampling pollsters?
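One reading of this suggestion is a cluster bootstrap in which whole pollsters, rather than individual polls, are resampled with replacement. The sketch below assumes that reading. The data are hypothetical, and `pollster_bootstrap_ci` is an illustrative name, not an existing tool.

```python
import random
from statistics import mean

def pollster_bootstrap_ci(polls_by_pollster, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean margin, resampling whole pollsters."""
    rng = random.Random(seed)
    names = list(polls_by_pollster)
    estimates = []
    for _ in range(n_boot):
        # Draw pollsters with replacement; each draw brings all of its polls.
        drawn = rng.choices(names, k=len(names))
        margins = [m for name in drawn for m in polls_by_pollster[name]]
        estimates.append(mean(margins))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical margins (Obama minus McCain, in points); NOT real poll results.
polls_by_pollster = {
    "Tracker A": [2.0, 1.0, 3.0, 2.0, 1.5],
    "Tracker B": [1.0, 2.5, 0.5, 1.5],
    "Pollster C": [6.0],
    "Pollster D": [4.0, 5.0],
    "Pollster E": [3.0],
}

lo, hi = pollster_bootstrap_ci(polls_by_pollster)
print(f"95% interval for the mean margin: {lo:.2f} to {hi:.2f}")
```

Because a prolific pollster enters or leaves each resample as a block, the width of the interval reflects how much the average depends on which pollsters happen to be in the mix.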

I think the basic question is whether incorrectly passing the likely voter screen is at all linked (statistically) w/ candidate preference, or with more stable characteristics like party, ideology, etc. In other words, are Dems or Repubs (or Obama and McCain supporters) more likely to give a false impression of being likely to vote? If not (that is, if the two are independent), then no matter how crude the models may be, they should add *some* degree of accuracy, no matter how small.

On the question of more volatility, I think it comes down to a similar question about the relationship b/w change in likelihood to vote and change in candidate preference. If they are independent or positively correlated, the larger LV fluctuations are expected a priori since variability in both factors will sum.
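The "variability will sum" point is the usual identity Var(A + B) = Var(A) + Var(B) + 2 Cov(A, B). The check below is a deliberately simplified sketch: it assumes the LV movement can be treated as the sum of a preference component and a turnout-screen component, and all the numbers are made up.

```python
from statistics import mean, pvariance

def pcovariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys))

# Hypothetical poll-to-poll shifts, in points; NOT real data.
preference_shift = [1.0, -1.0, 2.0, -2.0, 0.0]   # change in candidate preference
turnout_shift = [0.5, -0.5, 1.0, -1.0, 0.0]      # change from who passes the LV screen
lv_shift = [p + t for p, t in zip(preference_shift, turnout_shift)]

total = pvariance(lv_shift)
parts = (pvariance(preference_shift) + pvariance(turnout_shift)
         + 2 * pcovariance(preference_shift, turnout_shift))
print(f"Var(sum) = {total:.2f}, sum of parts = {parts:.2f}")
# prints "Var(sum) = 4.50, sum of parts = 4.50"
```

With zero or positive covariance, the combined variance can never be smaller than the variance of the preference component alone, which is the sense in which larger LV swings would be expected.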

It may be interesting to actually plot the time series and do the local regression on % of RV who are considered LV, and plot the same w/in undecideds, McCain and Obama supporters.
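As a rough sketch of what such a local regression could look like, here is a bare-bones tricube-weighted local linear fit. It is a simplified LOESS-style smoother with no robustness iterations, and it is not necessarily the estimator behind the Pollster.com charts. The day numbers and LV shares are hypothetical.

```python
def local_linear_fit(xs, ys, x0, frac=0.5):
    """Tricube-weighted linear fit evaluated at x0, using the nearest frac of points."""
    n = len(xs)
    k = max(2, round(frac * n))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th nearest point, nudged up slightly so
    # that point keeps a small positive weight (and never zero bandwidth).
    h = max(dists[k - 1], 1e-12) * 1.001
    weights = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        return my  # all weight sits on one x value; fall back to the weighted mean
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)

# Hypothetical series: day of the campaign and % of RVs classified as LVs.
days = [1, 3, 5, 8, 10, 12, 15, 17, 20, 22]
lv_share = [71.0, 69.5, 72.0, 73.5, 72.5, 75.0, 74.0, 76.5, 77.0, 76.0]

smoothed = [local_linear_fit(days, lv_share, d) for d in days]
for d, s in zip(days, smoothed):
    print(f"day {d:2d}: {s:.1f}%")
```

The same function could be run separately on the LV share among undecideds, McCain supporters and Obama supporters, given a pollster willing to release those breakdowns.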

My guess is the difference in RV vs LV trajectories in the past month is due to Repubs getting more energized.



In state polls, the overweighting of Rasmussen's polls gives too much influence to the results of one pollster especially for four reasons: (1) use of a LV model at this time in the election, (2) party ID weighting (which most pollsters don't use), (3) the small sample size (N=500), and (4) the one-day or overnight field period.

I'm not sure much can be done about any of this, but the overreliance on one pollster greatly increases the risk of house effects and excessive variability (especially because of 1, 3, and 4 above).
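To put reason (3) in numbers, the textbook 95% margin of error for a single proportion can be computed directly. This sketch assumes simple random sampling and the worst case p = 0.5. Weighting and likely voter screening generally make the real uncertainty larger, so treat these figures as a floor. The comparison sample size of 1,000 is just an illustrative benchmark.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for one proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 1000):
    print(f"N={n}: +/- {100 * margin_of_error(n):.1f} points")
# prints "N=500: +/- 4.4 points" then "N=1000: +/- 3.1 points"
```

Note that this is the margin on one candidate's share. The margin on the gap between two candidates is wider still.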

