
Does One Bad Pollster Spoil the Trend Estimate?

Topics: Divergent Polls



Yesterday, in response to this post, readers raised a number of excellent questions about the effects of individual polls on our trend estimates of candidate support (and of just about everything else here as well, including presidential and congressional approval, support for the war, and more). Much of the discussion was about how to detect and exclude "bad" polls, a topic that covers a huge range of issues, including "house effects" (the tendency of polling organizations to poll consistently high or low on some questions), outliers (single polls far from the rest), and more. The discussion will provide fodder for a number of posts to come later this month as I review our methods and try to clarify these and other issues. So there is a lot to do. Consider this a down payment on the rest.

To paraphrase one question: "Why not exclude a polling organization if it consistently produces results out of line with everyone else?"

We could approach this in several ways. For example, suppose a pollster ran consistently 4 percentage points high but its polls moved in sync with the trend in all the other polls. Movement in those polls would tell you a lot about the dynamics of opinion even if the pollster were "biased" by 4 points. If the bias were consistent, we could simply subtract 4 points and have an excellent estimate of the trend. A simple shift of a pollster's average result above or below the overall trend is not, in and of itself, a clinching argument for exclusion. I'll come back to this issue in much more detail later in this series of posts.

A simpler and more direct approach is to ask what difference it makes if we include all polls rather than excluding supposedly "bad" ones. Of course we would have a major problem if the trend estimator were quite sensitive to individual polls, or to all the polls from a particular organization. Happily, this is an empirical question, so we can answer it. And we don't need to know in advance which pollster is "bad."

The plots above show the trend estimate, using our standard estimator, as the black line. This uses all the polls we have available for the national nomination contests in both parties. The light blue lines are trend estimates that result when I drop each of the 19 different polling organizations, one at a time. Though the lines are indistinguishable, there are 19 different blue ones for each candidate in the figures. If the impact of individual organizations on the trend estimate were large, some of these blue lines would diverge sharply from the black overall trend line and we'd be seriously concerned about those polls that were responsible for the divergent results.

But that isn't what actually happens. The blue lines all fall within +/- 1 percentage point of the overall trend estimate, and the vast majority fall within +/- 0.5 points. There is no evidence that excluding any single organization has more than a trivial effect on the estimated trend. This alone is strong evidence that whatever problems specific pollsters or individual polls may have, they do not seriously disturb the trend estimates we use here at Political Arithmetik and Pollster.com.
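The leave-one-pollster-out check described above is easy to sketch in code. This is a minimal illustration only: the trend function below is a crude moving average standing in for the site's actual local-regression estimator, and the poll data and pollster names (HouseA, HouseB, HouseC) are invented, with HouseC deliberately simulated to run 4 points high.

```python
import random

def trend(polls, window=15):
    """Crude trend estimate: mean of the `window` most recent poll results.
    (A stand-in for the site's local-regression estimator.)"""
    recent = sorted(polls, key=lambda p: p["day"])[-window:]
    return sum(p["pct"] for p in recent) / len(recent)

def leave_one_out(polls):
    """Re-estimate the trend after dropping each pollster in turn,
    reporting each deviation from the all-polls estimate."""
    full = trend(polls)
    houses = {p["house"] for p in polls}
    devs = {}
    for h in houses:
        subset = [p for p in polls if p["house"] != h]
        devs[h] = trend(subset) - full
    return full, devs

# Invented example data: three hypothetical pollsters polling a candidate
# near 40%, with HouseC consistently 4 points high.
random.seed(1)
polls = [{"day": d, "house": h,
          "pct": 40 + (4 if h == "HouseC" else 0) + random.gauss(0, 1)}
         for d in range(30) for h in ("HouseA", "HouseB", "HouseC")]

full, devs = leave_one_out(polls)
for h, d in sorted(devs.items()):
    print(f"drop {h}: trend shifts by {d:+.2f} points")
```

Dropping the simulated "high" pollster shifts this crude average noticeably, which is exactly the divergence the blue lines in the figures would reveal if any real pollster had that much influence.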

It is interesting that the variation around the top candidates in both parties, Clinton and Giuliani, is larger than around the third-place candidates, Edwards and Romney, with the middle candidates falling in between. This is a possible clue to one aspect of "house effects". One well-known source of house effects is how hard the interviewer pushes for an answer. Some organizations now routinely find 20% or more of respondents unable or unwilling to pick a candidate; others have less than 5% failing to choose. Now imagine yourself asked to pick, but lacking an actual preference. When pushed, whom do you most likely "settle" for in order to placate the interviewer? I'd bet on the best-known names. If that were the case, the greater variation around Clinton and Giuliani would be substantially explained by differences in how hard pollsters push on the vote-preference question. That is one more topic for another day.

The fact that we find little effect on the trend estimate due to excluding each pollster could mean one or both of two things: either no pollster is biased or discrepant enough to actually raise a problem in the first place, or the trend estimator we are using is statistically robust enough that it resists the influence of unusual pollsters or polls. The second possibility is true by design. I've chosen an estimation method and designed the approach we take so that the trend estimator should be resistant to bias due to a single organization or a single poll. While it can be fooled under the right circumstances, those should be both rare and short lived, rather than common and long term.
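The post doesn't spell out the estimator, but a standard way to get the kind of robustness described above is local regression with bisquare re-weighting (as in lowess): points far from an initial fit are down-weighted on later passes, so an outlying poll loses influence. The sketch below is my illustration of that general idea, not the site's actual method, and the data are invented.

```python
def tricube(u):
    """Tricube distance kernel: nearby points get weight near 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def bisquare(u):
    """Bisquare robustness kernel: large residuals get weight 0."""
    u = abs(u)
    return (1 - u ** 2) ** 2 if u < 1 else 0.0

def robust_local_fit(x, y, x0, span, iters=3):
    """Locally weighted fit at x0, with bisquare robustness iterations
    that progressively down-weight isolated outliers."""
    d = [tricube((xi - x0) / span) for xi in x]
    r_w = [1.0] * len(x)
    for _ in range(iters + 1):
        w = [di * ri for di, ri in zip(d, r_w)]
        fit = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        resid = [yi - fit for yi in y]
        s = sorted(abs(r) for r in resid)[len(resid) // 2]  # median |residual|
        if s > 0:
            r_w = [bisquare(r / (6 * s)) for r in resid]
    return fit

# Invented data: eleven polls near 40, plus one wild outlier at 55.
x = list(range(11))
y = [40.0] * 11
y[5] = 55.0  # the outlier
plain = sum(y) / len(y)  # simple average: pulled up by the outlier
robust = robust_local_fit(x, y, x0=5.0, span=12.0)
print(f"plain mean {plain:.2f}, robust fit {robust:.2f}")
```

The simple average is dragged more than a point toward the outlier, while the robust fit ignores it almost entirely; this is the sense in which such an estimator "resists" a discrepant poll, and also why it can be fooled only if discrepant polls are numerous enough to look like the consensus.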

We are not in a position today to reach a conclusion about the first possibility, that none of our pollsters is consistently out of line with the others. That could be, but it could also be that one or more pollsters are in fact out of step and the estimator successfully resists their influence. Addressing this more interesting question will require more work and a separate post (or series of posts). It does seem to me that there are clearly systematic differences across polling organizations. I've done a good many posts in the past on "house effects" and on individual outliers, and will do more of that in the coming weeks. But it also disturbs me that many complaints are hurled at specific polling organizations with little or no effort to support the claims empirically and systematically. That is a job we'll begin to undertake here as a route to clarifying what the actual evidence is for which polls are less "reliable" than others, and what exactly that means. Stay tuned.

Cross-posted at Political Arithmetik.



Well, you've proven a point in terms of national polling, where you have such a large number of samples from so many different pollsters. My guess is that state polling would be quite different, right?



Of necessity, the fewer the polls, the more influence any subset has. The state primary polls are certainly an issue here, since we still have fewer than two dozen polls in most states, including crucial places like IA and NH.

There will be a post looking at this case shortly (not today, but soon) to see what happens when the worst case occurs: few polls and one pollster accounting for a large proportion of those.



Chuck T.:

This is my biggest problem with averaging polls. I think, nationally, it works out pretty well. But at the state level, it's potentially a MAJOR problem. Some state pollsters are dreadful (possibly even fraudulent, but I don't have the resources to prove it), and it is EXTREMELY hard to believe that some of these polling organizations are actually surveying 600 or 800 voters via live telephone as frequently as they claim. Anyway, my point is that a couple of these polling organizations make up the lion's share of your averages, as the person who raised the Iowa issue earlier noted. Is that right, particularly if two of these organizations regularly get some funny results? You guys do great work, but I think the next step in your project is to start weighting pollsters. If I were still at Hotline, I would move to a "star" system. For instance, in Iowa, Ann Selzer does a great job, ditto for Mason-Dixon and Research 2000; they'd get 3 or 4 stars. But ARG or Strategic Vision? 1 or 2 stars at best. In fact, to earn a higher rating, I would have asked a firm to verify its methodology, disclose the calling centers it uses, etc., and then judged whether it deserved the bump. Bad polling is having too much influence on the C.W. in this race. The sad thing is that some good polling is being overlooked because so many bad pollsters are getting away with mediocrity thanks to the World Wide Web.



You are exactly right about the potential influence of individual pollsters in state races. And this is one reason we DON'T do simple averages but rather use a local regression estimator tuned to be more robust. However, that said, any mix of good and bad data can clearly be polluted by the bad. Where we are going is an open assessment of pollsters based on their data, rather than partisan unhappiness with their results or prejudice towards particular organizations. That will move the discussion of quality away from the subjective, we hope. And it may give pollsters a bit more incentive to reconsider their methods if they are demonstrably out of line with everyone else.

That said, small numbers of polls in individual states will remain the acid test of our approach.




When you average several pollsters together, I presume you weight them equally, or perhaps by sample size. But if you are trying to identify "bad" pollsters, why not compare each pollster's previous predictions to actual outcomes (say, for the past few presidential and midterm elections)?

Then you can easily turn the discrepancies into a weight to use in your average. Of course, one big problem I can see is that polling organizations will react to big errors by tweaking their methodology, so the weight you calculate may not be a good estimator of the systematic error.


Thank you very much for addressing my question in such detail.

My only curiosity would be similar to that of the first comment; I wonder how the graph would look for state primary polls, which were originally the impetus for my concern about a consistent outlier's potential ability to skew polling averages.


Amit-- Yes, that is the gist of the approach. The key is to measure both 1) "bias" (in the statistical, not the political, sense) and 2) unreliability, then compute weights that optimally account for these. We'll be there fairly soon. Your second point is also an issue, but we'll use current polling as well as past performance against election outcomes, so the approach should be adaptive if someone improves their methods.
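One common way to fold the two quantities in that reply into a single weight is to score each pollster by its mean squared error against election outcomes, since MSE decomposes as bias squared plus variance. This sketch is my illustration of that idea, not the site's actual formula; the pollster names and error track records are invented.

```python
def mse_weights(errors_by_house):
    """Weight each pollster by 1 / mean squared error versus outcomes.
    Because MSE = bias^2 + variance, a pollster that is consistently off
    (biased) or merely noisy (unreliable) is down-weighted either way."""
    mse = {h: sum(e * e for e in errs) / len(errs)
           for h, errs in errors_by_house.items()}
    inv = {h: 1.0 / m for h, m in mse.items()}
    total = sum(inv.values())
    return {h: v / total for h, v in inv.items()}  # normalized to sum to 1

# Invented track records: final-poll-minus-result errors, in points.
errors = {
    "HouseA": [0.5, -1.0, 0.5, -0.5],   # small, centered errors
    "HouseB": [4.0, 3.5, 4.5, 4.0],     # consistent 4-point bias
    "HouseC": [6.0, -5.0, 5.5, -6.5],   # unbiased but very noisy
}
w = mse_weights(errors)
for h, wi in sorted(w.items()):
    print(f"{h}: weight {wi:.3f}")
```

Note the commenter's caveat still applies: weights built purely from past elections lag behind methodological changes, which is why the reply above proposes blending them with how well each pollster's current polling tracks everyone else.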

Shadow-- Thanks-- I'm working on the Iowa (and all the other states) measurements now. Look for something on the performance in the states early next week. I can tell you for sure the results will be worse than the ones here simply due to the smaller number of polls and the greater influence of individual pollsters. But how much worse? Ahhh. That is the question!

Thanks to all for the stimulating comments and pushing this line of posts along sooner than I might have otherwise done (seeing as how I'm in Hawaii and supposedly on vacation!)


