

A Surrender of Judgment?

Topics: 2006, The 2006 Race

I had an unhappy experience yesterday morning while still down for the count with a persistent fever (it has finally broken, and thanks to all for the kind get-well wishes). As I lay shivering, achy and generally miserable, my wife kindly ventured outside to find me some distraction in the form of our dead-tree copy of the morning's Washington Post. It took me only a minute or two to discover that Jon Cohen, the new polling director at the Post, had penned a column that mounted a veiled but clear attack on this site and others like it:

One vogue approach to the glut of polls this year was to surrender judgment, assume all polls were equal and average their findings. Political junkies bookmarked Web sites that aggregated polls and posted five- and 10-poll averages.

But, perhaps unsurprisingly, averages work only "on average." For example, the posted averages on the Maryland governor's and Senate races showed them as closely competitive; they were not. Polls from The Post and Gallup showed those races as solidly Democratic in June, September and October, just as they were on Election Day.

These polls were not magically predictive; rather, they captured the main themes of the election that were set months before Nov. 7. Describing those Maryland contests as tight races in a deep-blue state, in what national pre-election polls rightly showed to be a Democratic year, misled election-watchers and voters, although cable news networks welcomed the fodder.

More fundamentally, averaging polls encourages the already excessive attention paid to horse-race numbers. Preelection polls are not meant to be crystal balls. Putting a number on the status of the race is a necessary part of preelection polls, but much is lost if it's the only one.

We need standards, not averages. There's certainly a place for averages. My investment portfolio, for example, would be in better shape today if I had invested in broad indexes of securities instead of fancying myself a stock-picker. At the same time, I'd be in a much tighter financial position if I took investment advice from spam e-mails as seriously as that from accredited financial experts.

This last point exaggerates the disparities among pollsters. But there are differences among pollsters, and they matter.

Pollsters sometimes disagree about how to conduct surveys, but the high-quality polling we should pay attention to is based on an established method undergirded by statistical theory.

The gold standard in news polling remains interviewers making telephone calls to people randomly selected from a sample of a definable, reachable population. To be sure, the luster on the method is not as shiny as it once was, but I'd always choose tarnished precious metals over fool's gold.

I want to say upfront that I find the charge that our approach was "to surrender judgment," "assume all polls were equal" and blindly peddle "fool's gold" to be both inaccurate and deeply offensive. While it is tempting to go all "blogger" and fire off an angry response in kind, I am going to try to assume that Mr. Cohen -- whom I do not know personally -- wrote his column with the best of intentions. At the same time, it is important to spell out why I fundamentally disagree with his broader conclusions about the value of examining and averaging different kinds of polls.

[Unfortunately, having lost a few days to the flu, I need to pay a few bills and attend to a few other details here at Pollster. I should be back to complete this post later this afternoon. Meanwhile, please feel free to post your own thoughts in the comments section].

Update (11/16): Since I dawdled, the second half of this post appears as a second entry.



The Post only worries about averaging when other people do it. In this case they didn't strictly average the predictions, but it was a de facto sort of averaging.


Incidentally, they haven't released Chris Matthews' predictions. Hmm, I wonder why not.



I never came to pollster.com to get a prediction of what was going to happen on election day. I came to look at the trending data within a context of all the other things going on.

For example, when the numbers on Ford in Tennessee started to trend downward, it made me wonder how effective the "Call Me" ad really was.

The trends, not any specific poll, showed that the "macaca" incident in the Allen campaign was a real turning point--not a one- or two-day event.

Trends were much harder to spot in House races, and the polling was far less accurate or reliable, but it gave some sense of who was in play.

I think the strongest part of pollster.com was the blog that gave intelligent insight into polling methods, limitations, challenges and utility.

Tips for those with the flu: get more sleep than you think you need; drink more water than you think you need; avoid the Washington Post.


Hong Kong pollster:

I think the target of WAPOST was more the Real Clear Politics site and others than yours. Pollster.com always had charts breaking out IVR and phone polls and showing combined trendlines in comparison. Any decent technician would look at those, look at the averages of all polls from other sites where they did not distinguish IVR and phone polls, and know yours was not just simply averaging the poll numbers. The WAPOST article does make a valid point about other sites that simply posted averages and also didn't bother indicating range of error and so on. Unfortunately, from long experience with the press, trying to correct technical details and point out critical but tough-to-explain details that make a difference to the layman is all but impossible. And I work with Chinese journalists and readers who are supposedly much more maths-oriented than Americans.


Chris G:

Man, not only is that attack unwarranted, but it is completely myopic. First of all, I live in Maryland and can tell you that the few polls showing a dead heat in the Senate and Gov races were widely reported in the mass media, *including the Washington Post*.

Of course, if you were gonna publish these data in a journal you couldn't just average. There's a ton you could do to treat the polls more like a time series, factor in known biases by pollster and method, etc. But come on, what you did was a hell of a lot more accurate than what practically every major outlet did. Jon Cohen cherry-picked Post polls and apparently doesn't understand that races can change overnight. And to say that traditional polling methods are "undergirded" by theory is naive. With so many problems in polling methods (nonresponse bias, likely voter models, failure to account for priors when predicting outcomes, etc.), the MOE is almost always understated.

So you have the full support of a fellow stats buff who works in science. Let's face it, this isn't physics, it's human behavior.



I had a second thought about what I wrote above. I fell into the trap of the "unitary whole fallacy," i.e., the idea that a complex entity like The Washington Post speaks with a single voice. Politics is the big game for them, and they look at it from all kinds of angles, some of which will necessarily contradict each other.

Along those lines, could it be that Mark Blumenthal is too defensive? Did Cohen really "attack" this and other sites, or was he presenting a particular line of analysis for consideration?

It isn't an all-or-nothing world. "The Post" prints a variety of views. Not only within the newspaper in general, but within individual articles. Just a thought.


Mark Lindeman:

"Tips for those with the flu: get more sleep than you think you need; drink more water than you think you need; avoid the Washington Post."

Now, that may just be the wisest and most compassionate comment I've read on the Net all year.



The oddity here is that a 'polling director' would simply ignore the powerful (and, to me, persuasive) statistical reasons that suggest an average of polls will give you a better prediction than any individual poll. Yes, the Maryland results are peculiar-- but that's an argument in favor of looking at averages. How would you know, otherwise, that the Maryland results stand out from the others?
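The statistical intuition this comment appeals to can be sketched with a small simulation (the candidate share and sample size below are hypothetical, not actual 2006 numbers): if several polls are unbiased draws from the same population, their average has a smaller sampling error than any single poll.

```python
import random
import statistics

random.seed(1)
TRUE_SHARE = 0.52   # hypothetical "true" candidate support
N_RESP = 600        # hypothetical respondents per poll

def one_poll():
    """Simulate one unbiased poll of N_RESP respondents."""
    hits = sum(random.random() < TRUE_SHARE for _ in range(N_RESP))
    return hits / N_RESP

# Compare the typical error of a single poll to that of a
# five-poll average, over many simulated trials.
single_errs = [abs(one_poll() - TRUE_SHARE) for _ in range(2000)]
avg_errs = [abs(statistics.mean(one_poll() for _ in range(5)) - TRUE_SHARE)
            for _ in range(2000)]

print(statistics.mean(single_errs))  # typical error of one poll
print(statistics.mean(avg_errs))     # smaller: averaging shrinks the error
```

The averaged error comes out roughly 1/sqrt(5) the size of the single-poll error, which is the textbook reason a five- or ten-poll average is less noisy than any one survey (it does nothing, of course, about shared biases such as a flawed likely-voter model).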



Mr. Cohen's column strikes me as blatantly foolish on a couple of fronts.

First, when referring to live caller methodology as the "gold standard," he ignores the rapidly mounting evidence that the robo-pollsters are at least as reliable as those who use live interviewers exclusively. Mr. Cohen does his readers a disservice by pretending there isn't a legitimate scientific debate over the merit and performance of IVR.

Second, regular consumers of polls, the "political junkies" Cohen refers to who bookmark Pollster.com and RCP, are well aware that all polls and pollsters are not equal, thank you.

(There are others better qualified to address the idea that five-poll averages mitigate distortions caused by an outlier.)

Further, we political junkies can also spot an apparent outlier when we see one and recalculate accordingly. Mr. Cohen's suggestion that Pollster and RCP readers swallow these averages blindly is an insult.

I regularly looked at which polls were figured into an average, and took it upon myself to toss the ones in which I had little faith. For example, I routinely disregarded the CD/RT Strategies data. And more often than not, I tossed the Zogby internet polls. Twenty seconds with my trusty Radio Shack calculator, and voila... a new average that I felt was more trustworthy.
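The recalculation this commenter describes is simple enough to sketch; the pollster names and margins below are invented for illustration, not actual 2006 data:

```python
# Hypothetical poll margins for one race (percent for a candidate).
# Names and numbers are made up for illustration only.
polls = {
    "Pollster A (phone)": 51.0,
    "Pollster B (IVR)":   49.5,
    "Internet panel":     44.0,   # the kind of poll the commenter tossed
    "Pollster C (phone)": 50.5,
}

distrusted = {"Internet panel"}

# Naive average of everything vs. an average that drops distrusted polls.
raw_avg = sum(polls.values()) / len(polls)
kept = [v for name, v in polls.items() if name not in distrusted]
filtered_avg = sum(kept) / len(kept)

print(round(raw_avg, 2))       # 48.75
print(round(filtered_avg, 2))  # 50.33
```

Dropping a single distrusted poll moves the average by more than a point here, which is exactly why readers who exercise this kind of judgment are not "surrendering" it.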

Pollster and RCP provide a valuable service by serving as a clearinghouse for publicly released data. Both provide ample context to help laymen and pros alike to interpret the information.

Mr. Cohen should be half as thorough.


Rick Brady:

MP, I tend to agree that averaging polls shouldn't be the tool for evaluating pollster performance. It's useful to help understand things leading up to an election, but all polls (and pollsters) are not created equal.

OTOH, I don't think it's appropriate to evaluate an individual pollster's performance with a snapshot comparison of one poll taken shortly before the election (that darn sampling error thing...).

So, how do you: A) evaluate the performance of a single pollster?; and B) evaluate the performance of the industry as a whole?

A) Using appropriate statistical techniques (I'd argue the recent Martin-Traugott-Kennedy measure is the best thing on the market for this type of analysis right now), look at the history of a pollster's performance. Snapshots are no good.

B) Generate a top 5 or perhaps top 10 list of pollsters based on historical performance (again using the appropriate statistical tools). An argument could be made that these pollsters' polls are basically "alike," and there are other statistical tools that can be applied to evaluate the combined performance of all 5 or 10 of these polls for a single election. Beats the averaging.
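For readers unfamiliar with the Martin-Traugott-Kennedy measure mentioned above: as I understand it, their accuracy statistic A is the natural log of the odds ratio between a poll's two-candidate split and the actual election result. A minimal sketch, with hypothetical numbers:

```python
import math

def mtk_accuracy(poll_rep, poll_dem, vote_rep, vote_dem):
    """Sketch of the Martin-Traugott-Kennedy accuracy measure A:
    the log odds ratio of the poll's two-candidate split versus the
    election result. A = 0 is a perfect call; the sign indicates
    which candidate the poll overstated."""
    return math.log((poll_rep / poll_dem) / (vote_rep / vote_dem))

# Hypothetical poll (48R-46D) scored against a hypothetical
# result (47R-51D): a small pro-Republican error.
print(round(mtk_accuracy(48, 46, 47, 51), 3))
```

Because A is a log odds ratio, values from many polls and many races can be compared, signed, and aggregated, which is what makes it useful for the kind of historical track record the comment proposes.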



