
Bialik on Poll Mash-Ups


Carl Bialik, author of "The Numbers Guy" column for the Wall Street Journal, takes a balanced look today at the pitfalls of something we do here at Pollster, "mashing up surveys from various sources this election year to produce composite numbers meant to smooth out aberrant results." His piece is worth reading in full, as it considers both the benefits and risks of creating composite trends or averages:

Stirring disparate pollsters in one pot has its critics. "That's dangerous," says Michael Traugott, a University of Michigan professor and author of a recent guide to election polls. "I don't believe in this technique."

Among the pitfalls: Polls have different sample sizes, yet in the composite, those with more respondents are weighted the same. They are fielded at different times, some before respondents have absorbed the results from other states' primaries. They cover different populations, especially during primaries when turnout is traditionally lower. It's expensive to reach the target number of likely voters, so some pollsters apply looser screens. Also, pollsters apply different weights to adjust for voters they've missed. And wording of questions can differ, which makes it especially tricky to count undecided voters. Even identifying these differences isn't easy, as some of the included polls aren't adequately footnoted.
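To make the first of those pitfalls concrete, here is a minimal Python sketch, with invented numbers, contrasting a simple average (each poll counts once) with a pooled average (each respondent counts once):

    # Hypothetical polls: (candidate share in %, sample size)
    polls = [(48.0, 400), (52.0, 1500), (46.0, 600)]

    # Simple average: every poll counts equally, regardless of N
    simple = sum(share for share, _ in polls) / len(polls)

    # Pooled average: weight each poll by its number of respondents
    pooled = sum(share * n for share, n in polls) / sum(n for _, n in polls)

    print(f"simple: {simple:.1f}%  pooled: {pooled:.1f}%")  # 48.7% vs 49.9%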

Bialik quotes both me and Charles Franklin in the column, but here are a few additional thoughts. We do not consider the trend estimates a replacement for the data from individual surveys. The trend lines -- and the estimates derived from their end-points -- are best considered tools to help make sense of the barrage of often conflicting results from individual surveys. We learned in 2006 that "mashing up" surveys and "smoothing out" the variation between them helps counter the instinct to overreact to the variation between individual polls -- some of it clearly aberrant -- that is common in hotly competitive political races. Moreover, while we plot only a few summary measures here, such as vote preference and job approval, many of the surveys we report and link to include a wide variety of questions that help illuminate many other aspects of public opinion.
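For readers curious about the mechanics of "smoothing": our trend estimates come from a local regression, but the general idea -- letting nearby polls count more than distant ones -- can be conveyed with a toy kernel smoother. This Python sketch uses invented poll numbers and is not our production code:

    import math

    # Invented (day, share%) poll results
    polls = [(1, 46.0), (3, 49.0), (6, 44.5), (8, 48.0), (12, 47.0)]

    def trend(day, bandwidth=4.0):
        """Kernel-weighted local average: polls near `day` count more."""
        w = [math.exp(-((day - d) / bandwidth) ** 2) for d, _ in polls]
        return sum(wi * s for wi, (_, s) in zip(w, polls)) / sum(w)

    print([round(trend(d), 1) for d in range(1, 13)])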

Bialik is correct to argue that the benefits of averaging lessen when we start to see large and consistent "house effects" separating the results from different pollsters. If a few polls are providing good estimates while many others are misleading, the mashed-up averages may reflect more of the bad than the good. I wrote as much just before the Iowa Caucuses. Bialik correctly notes that the averages were misleading in California, where most polls showed the Clinton-Obama race closer than it turned out to be. His suggestion that we could "bolster" the case for trend estimates or averaging by comparing those numbers "directly against those from individual polling firms in terms of election accuracy" is a good one and something we are working on.
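For illustration, one rough way to quantify a "house effect" is to compare each firm's average reading with the average across all firms over the same period. A minimal Python sketch with invented numbers:

    from collections import defaultdict

    # Invented (pollster, share%) readings from the same period
    readings = [("A", 47.0), ("A", 48.0), ("B", 51.0), ("B", 52.0), ("C", 49.0)]

    overall = sum(s for _, s in readings) / len(readings)

    by_house = defaultdict(list)
    for house, share in readings:
        by_house[house].append(share)

    # House effect: a firm's mean deviation from the overall mean
    effects = {h: round(sum(v) / len(v) - overall, 1) for h, v in by_house.items()}
    print(effects)  # {'A': -1.9, 'B': 2.1, 'C': -0.4}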

Bialik adds further detail in a companion blog item that focuses, among other things, on my calls for greater disclosure of methodological details, and includes a response of sorts from Zogby International:

When I asked Zogby spokesman Fritz Wenzel for further details, such as what those flawed estimates were, and passed along a blog post from Mr. Blumenthal calling for more disclosure from the firm, Mr. Wenzel dismissed sites like Pollster.com as “rivals.” “We are satisfied that we have identified the problem in California,” Mr. Wenzel wrote in an email, “and giving our rivals more ammo in the form of methodological detail, some of which is proprietary, with which to criticize us further doesn’t make the world a better place.”

Bialik is asking his readers to comment on the value of composite poll numbers and on whether better disclosure would "make the world a better place." Your comments are welcome here or there (or both!).

 

Comments
Hudson:

This is a thoughtful response to a thoughtful article. Rather than getting defensive, I think you've presented the Times' point of view and your own in a balanced and objective way. Thanks.

____________________

Hudson:

(Sorry, that should obviously be the WSJ not the Times.)

____________________

s.b.:

Who would really hire Zogby? Or pay to read their polls? If Zogby found that kids like puppies, I'd question their results.

What happened in California was that they weren't neutral and were trying to do everything to make it look like Obama was going to win.

____________________

Jeff:

Wow, that Wenzel reply undermined Zogby's credibility. Interesting that he regards the site as a competitor.

____________________

Daniel T:

"His suggestion that we could "bolster" the case for trend estimates or averaging by comparing those numbers 'directly against those from individual polling firms in terms of election accuracy' "

I would think that this suggestion creates more problems than it solves. On the surface, this type of "weighting" seems helpful. In reality, while it might make the day-to-day data look better, it would also magnify any mistake. Past performance is no promise of future performance, and by creating a trust factor you are setting yourself up for bigger falls. Far from "bolstering" your results, you have the potential to undermine them greatly.

Right now, you have a situation with lots of auto accidents on the road, and you are going to trade that for a situation where planes crash from the sky. As a matter of fact, airplanes are the safest way to travel, but they are not perceived that way because of the concentrating effect of large crashes. Meanwhile, cars are less safe, but this gets lost in the day-to-day experience of living and we don't give it much thought.

I think the perception that you have a problem largely stems from people who don't understand stats or research very well. All the so-called pitfalls noted above really aren't pitfalls, unless you don't know what you are doing with statistics. After all, the goal is not to measure the same thing all the individual polls measure. It is to produce an average, and an average, by definition, is a meta-concept. Michael can choose "not to believe" if he wants, but the reality is that math is not a belief system; if it were, I'd be walking through walls.

____________________

Chris G:

Mark - sounds like you guys are on the right track. i don't know your background (or Dr. Franklin's), but from my armchair I'd like to recommend some kind of Bayesian algorithm that treats each pollster like a random variable and, for a given poll release, updates the current estimate based on the previous estimate, the time lag, what the poll says, and who the pollster is.
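something like this toy Gaussian update, say -- invented numbers, just to show the shape of the thing:

    def update(mean, var, poll, poll_var, house_var=1.0, days=0, drift=0.05):
        """Kalman-style update: widen the prior for elapsed time, inflate
        the poll's variance for its house, then blend prior and poll."""
        var += drift * days              # opinion can drift between polls
        obs_var = poll_var + house_var   # noisier house -> smaller weight
        gain = var / (var + obs_var)
        return mean + gain * (poll - mean), (1 - gain) * var

    m, v = 48.0, 4.0                     # prior estimate of support (%)
    for poll, pvar, hvar, lag in [(50.0, 2.0, 0.5, 1), (45.0, 3.0, 4.0, 2)]:
        m, v = update(m, v, poll, pvar, house_var=hvar, days=lag)
        print(round(m, 2), round(v, 2))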

as a numbers geek i have to say your site's the best I've seen out there, you guys do a very thorough job teasing apart bias and different factors.

____________________

Daniel T:

Chris G:

I read your comment both places and in general I agree. However, if you are going to treat the poll as a random variable, you need to treat the *entire* poll that way. You can't weight by sample size, because as soon as you do, the poll is no longer random; you have introduced bias in the form of weighting.

Either the polls are treated randomly or they are not; you can't treat some polls as random and others as not. Well, you can, but as I pointed out above, you introduce as many problems as you solve, so what's the point?

The biggest problem with poll averaging is not the methodology; it is N. As we saw in NV, it is impossible to produce a meaningful poll average with an N of three.

____________________

Chris G:

not at all Daniel T, weighting is just a scaling factor. you have some probability distribution p(X|theta) where theta is a vector of different parameters (pollster, etc) and X is your random variable (measured public opinion). just multiply that by a weight M and you've got M*p(X|theta), still random, you're just changing the relative contribution of the poll to your composite. think about it this way--you're just converting the % back to a frequency. the next step is to figure out how you combine all of these different frequencies into a single model. it's tricky but certainly feasible, just requires principled assumptions and perhaps some alternative models to test against.

it's true that simple averaging, or smoothing in time, is a bit tricky w/ different methods and a small sample of methods, but it's still likely to be better than a single poll provided that there aren't outlying methods that are especially good or bad and distort your measure (as i think Bialik talked about). but that's a surmountable problem if you also weight the relative contribution of a poll based on the pollster and past performance. a completely legitimate strategy provided that you've got the right algorithms in place.
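concretely, the "converting the % back to a frequency" step is just something like this (invented polls):

    # convert each poll's share back to a supporter count, then pool --
    # sample size then enters the combination automatically
    polls = [(0.48, 400), (0.52, 1500), (0.46, 600)]   # (share, N), invented

    supporters = sum(round(p * n) for p, n in polls)
    total = sum(n for _, n in polls)
    print(f"pooled: {100 * supporters / total:.1f}% of n={total}")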

____________________

Thom:

I'm with Daniel T. on this one, at least up to a point. I'd be nervous about a weighting scheme that gave *more* weight to the pollsters with the best track record. On the other hand, I think the averages would be much improved if all polls in the average were treated equally--but some discrimination was employed to exclude a few polls or pollsters that fail to make the grade, based on recent performance.
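In code terms, I mean filtering before averaging rather than reweighting -- something like this, with hypothetical error figures:

    # hypothetical average past error (points) per pollster
    past_error = {"A": 1.8, "B": 2.1, "C": 6.5}
    polls = [("A", 48.0), ("B", 50.0), ("C", 42.0)]    # invented readings

    kept = [share for house, share in polls if past_error[house] < 4.0]
    print(sum(kept) / len(kept))    # C is excluded; the rest count equally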

____________________


