

Approval Trends and Pollsters, Part 1

Topics: Approval Ratings, Barack Obama, Charts, Rasmussen


We get a lot of questions, comments and complaints about the effect particular pollsters have on our trend estimates. This is an important question, and today I'll start a series of posts on this issue. I want to encourage your comments and feedback. Over the series of posts I'll try to answer what I can, and we'll improve our approach where you raise points we aren't handling well enough. The focus will be presidential approval, but many of the issues are generic.

Yesterday Mark posted on the Rasmussen daily tracker and whether IVR interview methodology was enough to explain the generally low approval readings from that poll. Here I want to extend that discussion to address the frequent comment that Rasmussen is systematically distorting our trend estimates because its results are consistently below the trend line.

Let's start with a bit of data. In the Obama series above, Rasmussen accounts for 90 polls, Gallup's daily provides 87, and all other pollsters combined contribute 55.

Even a casual glance at the figure makes it clear that the Rasmussen dailies run 2-3 points below the blue trend line, while Gallup's daily runs about the same above the trend. The other pollsters scatter widely around the trend.

The most common comment we get is that Rasmussen is clearly too low and is distorting our trend estimate downward. If we removed only Rasmussen, it is certainly true the trend estimate would shift up. But the problem is: how do you "know" that it is Rasmussen who is wrong? As one commenter put it, "It's so annoying to see Obama at 57% when everybody knows he's over 60%!!!" Well, that IS annoying if you "know" the truth, but how do we know the truth? When I talk to Republicans, they are equally certain that we "know" Rasmussen is right and that it is Gallup that is obviously wrong. How can we address this difference of views in a non-partisan, data-oriented way?

The best estimates we get for our trends come when we have lots of different polling organizations represented and none of them contributes a disproportionate share of the polls. We get in trouble in the opposite case, when one poll dominates and we have few other polls to calibrate against. An extreme case would be if we only had Rasmussen right now, or only had Gallup. Happily, that isn't the case.

At the moment we have 55 polls by firms other than Rasmussen or the Gallup daily (I include 3 USAToday/Gallup polls and 1 Gallup-only poll, which are not dailies). These 55 polls come from 23 different firms, with the most from any one firm being 4 polls. This is just what we want for a standard of comparison -- lots of pollsters, none contributing too many.

This doesn't mean there are no house effects. Every polling organization has a house effect, some larger than others. But across all the pollsters we get heterogeneity in those effects with low balancing high and the result being the best estimate of the trend we can manage with polling data alone.

The chart above estimates the trend using only these 55 polls from the 23 non-daily pollsters. That trend is plotted by the black line. The blue line is our standard trend estimate, using all the polls, including the dailies. And for comparison I've shown the trends for Rasmussen only and for the Gallup daily only.

Clearly both Rasmussen and Gallup are quite different from the overall trend and from the non-daily trend. Pick your poison: neither of them agrees with the non-dailies. You can prefer high or you can prefer low, but the dailies are about equally far off the black trend.

But the key point for us is that the black line for non-dailies is very close to the standard blue trend using all the polls. The average absolute difference is barely 1 point (1.009, in fact) and 95% of the days find less than a 2 point difference between the blue and black lines. Sometimes blue is higher and sometimes black is higher. The average difference (not absolute difference) is that blue is 0.3 points below the black line. (The black line is a bit more variable because it uses only the 55 non-daily polls rather than all 232 polls.)
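The blue-versus-black comparison described here boils down to simple day-by-day summaries of the gap between the two fitted lines. Here is a minimal sketch; the daily values below are made up for illustration, not the actual fitted trends:

```python
# Summaries of the gap between two daily trend series, as described above.
def compare_trends(blue, black):
    """Return mean |gap|, share of days within 2 points, and mean signed gap."""
    diffs = [bl - bk for bl, bk in zip(blue, black)]
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    share_within_2 = sum(1 for d in diffs if abs(d) < 2) / len(diffs)
    mean_signed = sum(diffs) / len(diffs)
    return mean_abs, share_within_2, mean_signed

# Hypothetical daily approval estimates (percent), for illustration only.
blue = [62.1, 61.8, 61.0, 60.4, 59.9, 59.5]    # all-poll trend
black = [61.5, 62.0, 61.9, 60.1, 60.5, 60.0]   # non-daily-poll trend
mean_abs, share, signed = compare_trends(blue, black)
```

The same three summaries (mean absolute gap, share of days within 2 points, mean signed gap) are the ones quoted in the paragraph above.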

There are cases where we can't do this sort of analysis because of a lack of diversity in pollsters. Approval is a happy exception. It is clear there are pollster differences, but at this point they are not drastically affecting our results. If you SELECTIVELY exclude only low polls, then of course you can drive up the trend, just as you can selectively exclude only high polls and drive the trend down.

But when we take the most diverse collection of polls, we get pretty much the same trend estimates as we do with all the polls.  (You can go to the interactive charts and pick what to include or exclude and see how big a range you can get. Selection of high or low polls is the key to making the trend move a lot.)

Now, this is only part 1 of this series. I'm not claiming our trends are infallible. Far from it! I know all too well that they can break when given too little data or various kinds of bad data.

In the next installment of the series I'll respond to your comments here, and show an example of a more problematic case.



Of course, this approach could be corrupted if a regular pollster deliberately structured its sample or survey instrument so as to produce a low/high estimate of presidential approval or some other trend. It's likely that a commercial firm like Gallup or Rasmussen would have good business incentives not to do this, fearing being labeled as inaccurate. Then again, in the cable news realm, an analogous strategy has worked out quite nicely for Fox News' bottom line. With the cost of IVR and internet polling becoming lower and lower, it would not be that surprising to find a new firm backed by a wealthy partisan patron entering the game simply to produce fresh, daily "data" to serve as fodder for countering the opposing party's claims about the state of public opinion...


Chris G:

I have several general concerns:
-Is your local regression based on a Gaussian distribution (in other words, do you assume that the scatter across polls at a single time point is a bell curve)? If so, is it a fair assumption? The question in my mind isn't about whether this or that pollster is an "outlier" but whether they indicate a skewed distribution that distorts what's effectively an average in your time series.

-The rate of poll release, per pollster, varies a lot of course. So this can also distort a time series if, for example, an "outlier" pollster only releases once a month. That in turn might provide a slight but artificial dip in approval once a month. So I *think* you also need to assume that poll release is randomly distributed as a function of time. Is that also a fair assumption that you've examined?

-Pollster bias can partly reflect bias in demographic representation. But the difference in approval (or whatever) among demographics can *itself* change as a function of time. For example, GOP support may have declined steeply since January, more steeply than in other demographics, which in turn will introduce an entirely different kind of trend as a function of pollster. Just look at the last month -- according to Gallup, approval has steadily increased, while according to Rasmussen it's steadily decreased. And this is over thousands and thousands of interviews.

It's unclear how to handle any of that other than bootstrapping the trend at the level of pollsters, or something like that. Or just write out a model that explicitly treats error as a function of sampling and pollster. But to really understand what's going on, I think we'd have to separate out demographic breakdowns from each pollster, or whatever else we think might explain the bias. Say we just look at approval among independents: does variability across pollsters go down a lot? Do the trends then look more consistent?
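A pollster-level bootstrap like the one floated above could look roughly like this: resample whole pollsters (not individual polls) with replacement, recompute the estimate each time, and read a confidence interval off the percentiles. The pollster names and numbers are hypothetical, and the simple pooled mean stands in for the real smoother:

```python
import random

# Sketch of a pollster-level bootstrap: resample pollsters with replacement,
# pool their polls, and recompute a (here: simple mean) estimate each time.
def pollster_bootstrap(by_pollster, n_boot=2000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    names = list(by_pollster)
    stats = []
    for _ in range(n_boot):
        resampled = rng.choices(names, k=len(names))
        polls = [v for name in resampled for v in by_pollster[name]]
        stats.append(sum(polls) / len(polls))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical recent approval numbers, grouped by made-up pollster.
by_pollster = {"A": [61, 63], "B": [59, 58], "C": [62], "D": [56, 60]}
lo, hi = pollster_bootstrap(by_pollster)
```

Because entire pollsters drop in and out of each resample, the resulting interval reflects between-pollster (house-effect) variation, not just sampling noise within each poll.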




One key question is whether Gallup or Rasmussen ask other substantive questions prior to the core approval question. If, for example, Rasmussen asked respondents first about the bailout plans or other economic matters, that would tend to prime them to think of the bailout when judging Obama's job performance. And if they do that and Gallup does not (and the others are split between asking other questions first and not) that could explain the differences completely. And it would also mean that Gallup's numbers are closer to being an accurate reflection of what Americans think absent the hypothesized priming effect. (And yes, as I told you last year, your Big 10 polls were quite guilty of this.)

I can't find information on either Rasmussen's or Gallup's web pages to give me even the exact question wording, let alone the full context of the survey. Have you guys been able to get access to that?

-- Joel



I noticed that you're using the gross approval score. What about the net score? I think Rasmussen's huge deviation on the disapproval metric is what attracts more scrutiny to their results than to Gallup's.

There's also the fact that Rasmussen's graph just plain doesn't have enough variance -- they get streaks of identical results that are supposed to be independent samples (55-44 or 56-43). By my back-of-the-envelope calculation, a 50% (non-)confidence interval on a sample size of 1500 is still wider than 1 point (about 1.6% total); allowing for rounding, that spans a couple of percentage points. This means no single percentage-point result should have a greater than 50% chance of being hit, which alone suggests their results shouldn't be consistent to within a point over several independent releases. Add in whatever other sources of random variance exist (the news of the day, the day of the week), and there should be a lot more variance than what's seen. What's going on here?
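That back-of-the-envelope interval can be checked directly. The 0.674 below is the standard normal quantile that brackets the middle 50%, and the 55% approval share is just an illustrative value near Rasmussen's readings:

```python
from math import sqrt

# Width of a 50% interval for a single day's approval estimate at n = 1500.
n, p = 1500, 0.55            # sample size; illustrative approval share
se = sqrt(p * (1 - p) / n)   # standard error of a sample proportion
half_width = 0.674 * se      # 0.674 = normal quantile for the middle 50%
width_pts = 2 * half_width * 100   # total interval width, percentage points
```

This comes out to roughly 1.7 points, consistent with the ~1.6% figure above: even before adding news-of-the-day noise, a string of identical headline numbers across independent samples would be surprising.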



Chris G.:

- I believe pollster.com's trend lines are non-parametric smoothers, so they are not using a Gaussian model. It's just a local polynomial smoother.

- Concerns about rate of poll release are only worrying to me at the end points of the trend line. But that's just part of the (well known) phenomenon that we will tend to see more variance at the end points of the trend line than in the middle.

- I have actually played with some simple attempts at bootstrapped CIs for these data. For the most part, I've found that it doesn't add much more information than just looking at the scatterplot of the data: the amount of variability is usually fairly clear from the data themselves. The exception is near the end points (i.e. the current estimate), where I think bootstrapping, or other methods, may help to convey that the variability of our trend-line estimate is usually larger. Of course, that's also the part of the trend estimate that is usually of most interest.

- And I haven't the foggiest notion how (or if) one should deal with differences in the internals between polls.
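The local polynomial smoother mentioned in the first point above can be sketched in a few lines. This is a generic tricube-weighted local-linear fit in the spirit of loess, not pollster.com's actual code (their bandwidth and kernel choices aren't published):

```python
# A minimal loess-style smoother: a tricube-weighted straight-line fit
# around each evaluation point x0, read off at x0.
def local_linear(xs, ys, x0, bandwidth):
    """Weighted least-squares line fit around x0, evaluated at x0."""
    # Tricube kernel: full weight at x0, falling to zero at the bandwidth.
    ws = [max(0.0, 1 - (abs(x - x0) / bandwidth) ** 3) ** 3 for x in xs]
    # Fit y = a + b*(x - x0); the intercept a is the fitted value at x0.
    # Solve the 2x2 weighted normal equations directly (Cramer's rule).
    sw = sum(ws)
    sx = sum(w * (x - x0) for w, x in zip(ws, xs))
    sxx = sum(w * (x - x0) ** 2 for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxy = sum(w * (x - x0) * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    return (sy * sxx - sx * sxy) / det  # intercept a

# Sanity check: smoothing a noiseless linear series recovers the line.
xs = list(range(20))
ys = [2 + 0.5 * x for x in xs]
fitted = local_linear(xs, ys, 10, 5)
```

No Gaussian assumption appears anywhere: the fit is just weighted least squares in a moving window, which is why outlying pollsters pull the line roughly in proportion to how many polls they contribute.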



I would also add the following points (nothing new here; others have pointed these things out as well):

- Re:Rasmussen, I'm more interested in their (seemingly) lower variance and higher disapproval numbers than any effect their approval numbers might be having on the overall trend estimate.

- Are there enough other pollsters that use a 4-option approval question wording (as Rasmussen does) to compare trend lines for these two groups?

- The _real_ way to answer these questions, of course, would be to get your hands on Rasmussen's raw polling data and then step through their post-processing methodology. Then you could actually test whether omitting the party-ID weighting, for instance, was disproportionately responsible for moving their numbers away from the crowd.

(I'm not picking on Rasmussen; the same thing would be interesting to do with Gallup, as well.)



I remember sifting through Gallup and Rasmussen during the general election. Specifically, I recall asking myself the classic methodology question: "Is party identification a valid weighting value?"

I'm not a critic of Rasmussen in general, but I do have to say I have significant issue with weighting by party ID. I simply don't trust it from a pure math perspective.

My concern boils down to this:

Rasmussen calculates party ID based on polling data. To their credit, the data set is incredibly large, exceeding 10,000 surveys in most instances. However, even with a 10k sample one cannot ignore the implied margin of error. This is especially true given that they round their findings off to the nearest 10th of a percent.
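As a rough illustration of the implied error (the 37% party-ID share below is a made-up figure for the sketch, not Rasmussen's actual number):

```python
from math import sqrt

# 95% margin of error for a party-ID share estimated from ~10,000 interviews.
n, p = 10_000, 0.37                          # hypothetical party-ID share
moe_95 = 1.96 * sqrt(p * (1 - p) / n) * 100  # in percentage points
```

Even at n = 10,000 this is nearly a full percentage point of sampling error -- roughly ten times the tenth-of-a-percent precision at which the weights are reported.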

Given that one of their demographic weights is not a "real" number, i.e., not age, gender, race, etc., we have to wonder if Rasmussen isn't inducing an unacceptable level of error.

Additionally, the average variance of their reported results seems exceedingly low given what we know of random sampling error. I would hypothesize that some of this can be accounted for by the party-ID weighting methodology, as they only update their party-ID weights monthly. However, even given that possibility, the lack of variance appears highly suspect.



Let's take a look at the polling (I don't like Internet polling, so I'm un-checking those) with a focus on Gallup, Rasmussen and PPP.

All polls minus Rasmussen, Gallup, PPP and Internet (numbers in parenthesis include USA Today/Gallup):
Approve 61.6% (61.7%)
Disapprove 30.1% (30.0%)
+31.5% (+31.7%)

Just Gallup (numbers in parenthesis include USA Today/Gallup):
Approve 61.7% (62.0%)
Disapprove 29.7% (29.6%)
+32.0% (+32.4%)

Just Rasmussen:
Approve 54.4%
Disapprove 44.9%
+9.5%

Just PPP:
Approve 53.0%
Disapprove 41.0%
+12.0%

Gallup is 0.5 points off the mark of ALL the surveys minus Gallup, Rasmussen, PPP and the Internet-only ones.

Rasmussen is 22 points off the mark.

PPP is 19.5 points off the mark.

It seems to me that Gallup is actually pretty much right on track with the average of all the other polls, while Rasmussen and PPP are way, way, way off the mark.
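The "off the mark" figures above are gaps in net approval (approve minus disapprove), which can be verified directly from the numbers quoted:

```python
# Net approval gaps recomputed from the figures quoted above.
baseline_net = 61.6 - 30.1  # all polls minus Rasmussen, Gallup, PPP, internet
nets = {
    "Gallup": 61.7 - 29.7,
    "Rasmussen": 54.4 - 44.9,
    "PPP": 53.0 - 41.0,
}
gaps = {name: round(baseline_net - net, 1) for name, net in nets.items()}
```

The baseline net is +31.5, so Gallup lands 0.5 points above it while Rasmussen and PPP fall 22 and 19.5 points below, matching the gaps stated above.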

Judging by that, I don't know how anyone can take the numbers Rasmussen and PPP are presenting seriously.



A couple of suggestions: weighting and decaying.

As it stands (I'm guessing here), you appear to give every data point equal weight in some sort of regression (LOESS?). Given that the final approve/disapprove number is essentially a fancy "rolling average", any pollster who puts out daily scores will disproportionately drive your model. You can easily see this by selectively removing pollsters. Take Rasmussen out and the disapproval number moves 7.1(!) points (as of today; when I first noticed this, it was over a nine-point effect) -- a much larger effect than any other pollster (the nearest, per your point below, being Gallup). A quick count on your site shows that there are 36 pollsters: 30 doing phone, 3 doing robo-phone, and 3 doing internet -- why not give each one an equal vote in the presidential approval/disapproval?

And to help ensure that your numbers reflect the current mood, add in a decay mechanism -- basically a diminishing weight over time (I'd probably front-weight it and then have a rapid tail-off, so polls from today add 100% of their weight to the model, yesterday 98%, 2 days ago 95%, a week old 80%, two weeks old 20%, etc.). This would provide an incentive to pollsters who poll regularly, as Rasmussen would always be at 100% contribution, but not allow them to disproportionately drive your model.
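A minimal sketch of that decay scheme, linearly interpolating between the suggested anchor weights (day 0 = 100%, 1 = 98%, 2 = 95%, 7 = 80%, 14 = 20%, zero thereafter; the interpolation between anchors is my assumption, not the commenter's):

```python
# Piecewise-linear decay weights for polls by age, then a weighted average.
ANCHORS = [(0, 1.00), (1, 0.98), (2, 0.95), (7, 0.80), (14, 0.20)]

def decay_weight(age_days):
    """Interpolate the anchor weights; polls older than 14 days get zero."""
    if age_days > 14:
        return 0.0
    for (d0, w0), (d1, w1) in zip(ANCHORS, ANCHORS[1:]):
        if d0 <= age_days <= d1:
            return w0 + (w1 - w0) * (age_days - d0) / (d1 - d0)

def weighted_estimate(polls):
    """polls: list of (age_in_days, approval_percent) pairs."""
    total_w = sum(decay_weight(age) for age, _ in polls)
    return sum(decay_weight(age) * val for age, val in polls) / total_w

# Hypothetical polls: today's daily tracker plus a two-week-old survey.
estimate = weighted_estimate([(0, 60.0), (14, 50.0)])
```

Here the fresh poll gets five times the weight of the two-week-old one, so the combined estimate sits much closer to today's number.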

Obviously there are other issues as well, and taking each pollster at face value is probably a dangerous thing, given that certain methodologies will bring methodological bias into your model. Just to rattle off a handful: 1) internet panels are opt-in and don't reach non-traditional respondents (a large part of Obama's constituency), 2) internet & robo-call tend to get the stay-at-home types -- generally older and more conservative, 3) party weighting brings in subjectivity, etc. The beauty of a site like Pollster.com is that by stirring all the different pollsters in together, you generally hope these individual differences/flaws will wash out -- however, I think I would generally downweight the less robust methodologies (i.e., give internet and IVR a lesser impact than phone in your total model, and while you're at it, I'd downweight phone methodologies that don't do a cell-phone supplement).

Of course the decay weights and the methodology weights are subjective as well -- how do we decide when data become stale? how much more reliable is phone? -- so I can see not wanting to wade into that mess. But as it is, the final approve/disapprove numbers just don't jibe when one pollster contributes a weight equal to roughly 30 others -- hurting the credibility of your otherwise respected website.



Just chiming in belatedly to say thanks to Charles for the analysis, but also to second what AySz88 said.

Rasmussen may not vary more than Gallup from the overall trendline when it comes to Obama's approval numbers, but it's off on its own on his disapproval numbers. And the chasm is even larger - and seemingly increasing - on Obama's unfavourable rating.

I commented about that point in a little more detail on the blog I'm on: "Rasmussen gone rogue?"

(Sorry, don't mean it as a shameless plug, just don't want to repeat myself! Anyway, I'd bow to Charles' expertise any day, so I'd love to hear him give his take on this point.)

