

On Being a Wikipedia, Not a Google

Topics: Charts , House Effects , Pollster.com , Strategic Vision

Reader DG emails with a reaction to my column and accompanying post on Monday about Strategic Vision LLC:

I am a huge fan of your work and deeply appreciative of all the effort you and your staff have put into making pollster.com one of the best political sites on the Internet.

I do have to confess, though, to being deeply disturbed by the debacle with Strategic Vision. The fact is that there have been problems with the shop for years, yet little attention was paid, even while respectable bloggers (such as electoral-vote.com) made the call in 2004 to stop reporting SV's numbers as they were consistently, and suspiciously, pro-GOP. SV appears to me to be a very bald-faced effort to gratuitously influence national and local debates through nefarious means, and could have seriously damaged the reputation pollsters have worked so hard to build over the preceding decades. Even worse, Strategic Vision was enabled by people who damn well should have known better, like yourself.

Your site is a one-stop shop for journalists, pundits, Administration officials, etc., and anything that gets reported by you is magnified because of that. Moreover, these people do not have the time or training to effectively evaluate polls. As such, you have a responsibility to ensure that methodological rigor is adhered to by the pollsters whose results you report, and you must begin to call out anything from being a consistently over-the-top outlier to having an uncommonly large (such as Kaiser) or uncommonly small (Fox) party ID spread. I am not even saying to stop reporting polls like Kaiser or Fox; simply make it clear that there are methodological hang-ups with the data that your readership should be aware of. Your "general philosophy" of reporting results as long as the pollster "purports" to adhere to methodological basics is at best lazy and at worst dangerous. Like it or not, websites such as yours have become such powerful aggregators of information that you must impose some kind of control to keep the mendacious and the malicious from having an undue influence. You must be a Wikipedia, not a Google.

I agree with DG's general argument: Sites like ours need to do more to help readers evaluate individual pollsters and their methods. That was the spirit of the three-part series I wrote in August titled "Can I Trust This Poll," and the reason why I want to use our site to actively promote better methodological disclosure by pollsters.

That said, I'll cop to "lazy" in just one respect: On Monday, I gave short shrift to our "general philosophy." It combines two goals: (1) making all poll results available and (2) providing an informed and critical context -- through interactive charts and commentary -- for understanding those results. The best examples are the tools we built into our interactive charts (the "filter" tool and the ability to click on any point and "connect the dots" for that pollster), which make it easy to compare the results for any individual pollster to the overall trend. We have also devoted considerable time to commentary on pollster house effects, both generally and for specific pollsters (like Rasmussen).

I'll also take issue with the idea that we "damn well should have known better" with respect to Strategic Vision. The evidence that they were a "consistently over-the-top outlier" relative to other pollsters is weak. This was Charles Franklin's take three years ago:

I tracked 1486 statewide polls of the 2004 presidential race, of which Strategic Vision did 196. The Strategic Vision polls' average error overstated the Bush margin by 1.2%. The 1290 non-Strategic Vision polls overstated KERRY's margin by 1.3%. Further, the variability of the errors was a bit smaller for Strategic Vision than for all the other polls combined.
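For readers who want to see what that kind of comparison looks like mechanically, here is a rough sketch in Python. The poll records and "actual" margins in it are invented placeholders for illustration, not Franklin's data.

# A rough sketch of the comparison Franklin describes: compute each
# pollster's average signed error on the Bush-minus-Kerry margin.
# The poll records and "actual" margins below are invented placeholders.

from collections import defaultdict

# Each record: (pollster, state, Bush %, Kerry %) -- hypothetical values
polls = [
    ("Strategic Vision", "PA", 48, 46),
    ("Strategic Vision", "FL", 50, 46),
    ("Other Pollster",   "PA", 46, 49),
    ("Other Pollster",   "FL", 47, 48),
]

# Actual Bush-minus-Kerry margins by state (also illustrative)
actual_margin = {"PA": -2.5, "FL": 5.0}

errors = defaultdict(list)
for pollster, state, bush, kerry in polls:
    poll_margin = bush - kerry                      # margin in the poll
    error = poll_margin - actual_margin[state]      # positive = overstated Bush
    errors[pollster].append(error)

for pollster, errs in errors.items():
    avg = sum(errs) / len(errs)
    print(f"{pollster}: average error on the Bush margin = {avg:+.1f} points (n={len(errs)})")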

Try the connect-the-dots tool on the 2008 Obama-McCain charts for Pennsylvania, Florida, Georgia and Wisconsin (the states where Strategic Vision released five or more "polls"), and make your own judgments for 2008.

But again, I tend to agree with DG's central thrust. We can do better. I am particularly intrigued by DG's comment about being "a Wikipedia, not a Google." What Wikipedia is about, for better or worse, is "crowdsourcing." A few weeks ago, the Wall Street Journal described crowdsourcing as the idea that "there is wisdom in aggregating independent contributions from multitudes of Web users." How might a site like ours help individuals collaborate on efforts to evaluate pollsters? If you have thoughts or suggestions on any of this, we would love to hear them.



Thanks to DG for the serious yet critical email. Emails like this are very helpful to us and we take them seriously (as opposed to the ones that say "you are idiots," which may be true in my case but isn't really helpful in improving our efforts).

Mark quotes my results for SV in 2004. The larger point is that we do a lot of analysis that may not make it into a post but which informs our published results. House effects are an example. We reestimate the house effects model regularly, but since little changes from week to week, it doesn't seem worth a continuously updated chart. Now that is arguably wrong, and someone like DG could convince me otherwise. But the point is that we don't ignore outliers and we don't ignore house effects. Rather, we analyze them constantly to reassure ourselves that they aren't driving the trend results. Maybe we should have a "back room" feature that exposes those analyses for all to see, rather than restricting them to occasional updates.
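For the curious, here is a toy illustration in Python of the basic idea behind a house effect estimate: compare each pollster's numbers to a smoothed all-poll trend and average the deviations by pollster. It is not the model we actually run, and the data in it are made up.

# A toy version of a house effect estimate: deviation of each pollster's
# results from a Gaussian-smoothed all-poll trend, averaged by pollster.
# The polls below are made up for illustration.

import numpy as np
from collections import defaultdict

# Each poll: (day, pollster, margin in points)
polls = [
    (1, "A", 5.0), (2, "B", 3.0), (4, "A", 6.0),
    (5, "C", 2.0), (7, "B", 4.0), (9, "A", 7.0), (10, "C", 3.5),
]

days = np.array([p[0] for p in polls], dtype=float)
margins = np.array([p[2] for p in polls])

def trend_at(day, bandwidth=5.0):
    """Smoothed margin on a given day (Gaussian-weighted local mean)."""
    weights = np.exp(-0.5 * ((days - day) / bandwidth) ** 2)
    return np.sum(weights * margins) / np.sum(weights)

deviations = defaultdict(list)
for day, pollster, margin in polls:
    deviations[pollster].append(margin - trend_at(day))

for pollster, devs in sorted(deviations.items()):
    print(f"Pollster {pollster}: estimated house effect = {np.mean(devs):+.2f} points")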

The SV debacle is also the result of a breach of professional ethics, at the very least in terms of (non-)disclosure and at most outright fraud. I think Mark's repeated efforts to bring about more complete disclosure are hugely boosted by this case, which makes it evident that honest pollsters (which I remain confident is almost all of them) should embrace greater disclosure as a means of protecting their reputations.

But SV's house effects are not really large enough to have raised red flags. The case against them raised in Nate Silver's analysis rests on factors other than extreme outliers or large house effects. In fact, if you wanted to fake polls, the best thing you could do is take our trend, add random noise of +/- 4%, and post your "polls." The result would be a small house effect, few outliers, and a reasonable variance. Generated by a pseudo-random number generator, the results would also pass Nate's tests for the distribution of digits.
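To make that concrete, here is a toy simulation of what such a faker would produce. The trend values are invented, and the trailing-digit tally is only a crude stand-in for the digit-distribution tests Nate ran.

# A toy simulation of the "faker" described above: start from a published
# trend, add +/- 4 points of noise, and report rounded numbers. The trend
# values are invented.

import random
from collections import Counter

random.seed(1)

trend = [48.0 + 0.05 * day for day in range(60)]    # hypothetical trend line

fake_polls = []
for value in trend:
    noise = random.uniform(-4.0, 4.0)               # +/- 4 point noise
    fake_polls.append(round(value + noise))         # report whole numbers

house_effect = sum(f - t for f, t in zip(fake_polls, trend)) / len(trend)
digit_counts = Counter(p % 10 for p in fake_polls)  # trailing-digit tally

print(f"Average deviation from trend: {house_effect:+.2f} points")
print("Trailing digit counts:", dict(sorted(digit_counts.items())))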

Over the winter holidays I'll be updating our house effects estimates and some assessments of poll performance, both present and past. You'll see a number of results from that early in the new year that will attempt to provide a better look at each pollster as well as assessments of what effect they have on trend estimates.




Actually, Google also uses crowdsourcing: it analyzes the network of hyperlinks throughout the Internet. So it doesn't quite make sense to me to say that crowdsourcing is what distinguishes the two.

I think what the writer might mean is that Wikipedia subjects its sources to scrutiny and transparency - a lot of effort goes into making sure inputs come from reliable sources. Google tends to be more automatic and opaque about how it treats its inputs.



A model for intelligent data averaging you may find useful is the Particle Data Group (PDG), based at Lawrence Berkeley National Laboratory. They keep a database of all the relevant quantities of interest to particle physicists. They do much more than simply average various results: they examine methodology, they talk to the experimenters, and they only use results published in peer-reviewed journals.

They also take care to add statistical and systematic errors properly, taking into account correlated uncertainties. They are trusted to such a degree that when the PDG says a certain particle exists, it exists; otherwise, it does not.
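If it helps, the simplest version of that combination is an inverse-variance weighted average, with the quoted uncertainty inflated by the PDG scale factor when the inputs disagree more than their errors suggest. The sketch below uses made-up measurements and ignores correlated uncertainties, which the real PDG treats with more care.

# A back-of-the-envelope PDG-style combination: an inverse-variance
# weighted average with the uncertainty inflated by a scale factor when
# the inputs scatter more than their errors suggest. The measurements
# are invented, and correlations are ignored here.

import math

measurements = [(1.115, 0.010), (1.132, 0.012), (1.121, 0.008)]  # (value, sigma)

weights = [1.0 / sigma ** 2 for _, sigma in measurements]
mean = sum(w * x for (x, _), w in zip(measurements, weights)) / sum(weights)
sigma_mean = math.sqrt(1.0 / sum(weights))

# PDG scale factor: sqrt(chi^2 / (N - 1)), applied only when it exceeds 1
chi2 = sum(((x - mean) / s) ** 2 for x, s in measurements)
scale = max(1.0, math.sqrt(chi2 / (len(measurements) - 1)))

print(f"weighted mean = {mean:.4f} +/- {sigma_mean * scale:.4f} (scale factor S = {scale:.2f})")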

You could be the PDG of polls. In fact, you are already. But it's possible that you need to increase your threshold for acceptance.



This is just a random idea from a completely unqualified reader, but maybe a rating system? I'm picturing something like the 1-5 rating scale used for restaurants and movies.

Put together a small working group of a few dozen pollsters, statisticians, political scientists, and other assorted math geeks who all agree to occasionally and anonymously audit some of the backroom data and rate each pollster in broad strokes. Then post the firm's cumulative rating any time a pollster is cited to make it easier for casual readers to assess the reliability of a poll. Link the rating to a posting, updated monthly, explaining each pollster's score and maybe some comments from reviewers that don't fit neatly into a numerical score. The vast majority of pollsters would score well, and those who don't could always request not to be included on Pollster.com.

I think readers like me who lack the skills to assess polling firms would find this a very useful feature.

