

Miller: What Pollsters Can Learn From Climate Modelers

Topics: Climate Modelers, Disclosure, Likely Voters, Nate Silver

Guest Pollster Clark A. Miller is an Associate Professor at Arizona State University. His post expands on a comment left on Pollster.com on Friday.

As Mark Blumenthal and Nate Silver have both noted in detail of late, the design of likely voter models can significantly impact how pollsters interpret and transform the raw data of voter samples into the topline results we see at pollster.com, fivethirtyeight.com, and other sites covering election polling. In turn, Mark and Nate observe, likely voter model design depends significantly on judgments that pollsters make about how to model the likelihood that any voter sampled will actually turn out and vote in the election. As we have all seen in the last few days, differences in how such judgments get made by different pollsters, combined with differences in the samples of voters collected by each poll, can mean the difference between a 1-point and a 14-point spread between the respective candidates for President.

A key challenge for consumers of polls - whether citizens, journalists, or politicians - is sorting out to what extent the likely voter model or the underlying raw data sample is responsible for variations in poll outcome. In fact, this sorting out of how judgments made by modelers impact model design and outputs is a general challenge in the use of science to inform policy choices, which I have studied for much of the past two decades. Judgments like this are inevitable in any scientific work, which is why policy officials turn to experts to make judgments on the basis of the best available knowledge, evidence, and theories.

One case that I have looked at in detail is the use of computer models of the Earth's climate to make predictions about whether the planet is experiencing global warming. As I'm sure most of you know, models of climate change have been viewed skeptically by many people. I believe the trials and tribulations of climate modelers - and also their approaches to addressing skepticism about their judgments - offer three useful insights for pollsters working with likely voter models.

  1. Transparency - climate models are far more complex than most polls, but climate modelers have made significant efforts to make their models transparent, in a way that many pollsters haven't. (In much the same way, computer scientists have called for the code used in voting machines to be open source.) By making their models transparent, i.e., by telling everyone the judgments they use to design their model, pollsters would enhance the capacity of other pollsters and knowledgeable consumers of polls to analyze how the models used shape the final reported polling outcome. They would also do well to publish the internal cross-tabs for their data.
  2. Sensitivity - climate modelers have also put a lot of effort into publishing the results of sensitivity analyses that test their models to see how they are impacted by embedded judgments (or assumptions). This is precisely what Gallup has done in the past week or so, in a limited fashion, with its "traditional" and "extended" LV models and its RV reporting. By conducting and publishing sensitivity analyses, Gallup has helped enhance all of our capacity to properly understand how their model responds to different assumptions regarding who can be expected to vote.
  3. Comparison - climate modelers have also taken a third step of deliberate comparisons of their models using identical input data. The purpose of such comparison is to identify where scientific judgments were responsible for variations among models, and where those variations resulted from divergent input data. Since the purpose of polling is to figure out what the data are saying, it is essential to know how different models are interpreting that data, which can only be done if we know how different models respond to the same raw samples.
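The sensitivity and comparison steps above can be sketched in a few lines of code. Here is a minimal illustration of running two different likely voter screens against the same raw sample, so any shift in the topline is attributable to the model rather than the data. Everything here is invented for illustration: the enthusiasm field, the cutoff values, and the screens do not correspond to any actual pollster's methodology.

```python
import random

random.seed(0)

# Hypothetical raw sample: each respondent has a candidate preference and
# a self-reported 0-10 enthusiasm score (all fields and cutoffs invented).
sample = [{"choice": random.choice(["A", "B"]),
           "enthusiasm": random.randint(0, 10)} for _ in range(1000)]

def topline(respondents):
    """Percent support for candidate A among the given respondents."""
    if not respondents:
        return 0.0
    a = sum(1 for r in respondents if r["choice"] == "A")
    return 100.0 * a / len(respondents)

# Two invented likely-voter screens applied to the *same* raw data,
# mimicking the comparison step: any difference between the toplines
# below comes from the screen, not from sampling.
strict = [r for r in sample if r["enthusiasm"] >= 8]  # tight screen
loose = [r for r in sample if r["enthusiasm"] >= 4]   # expansive screen

print(f"All respondents: {topline(sample):.1f}% for A")
print(f"Strict screen:   {topline(strict):.1f}% for A")
print(f"Loose screen:    {topline(loose):.1f}% for A")
```

Publishing results under several such screens, as Gallup did with its "traditional" and "expanded" models, is exactly this kind of sensitivity analysis in miniature.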

The reason climate modelers have carried out this activity is to help ensure that the use of climate model outputs in policy choices is as informed as possible. This can't prevent politicians, the media, or anyone else from inappropriately interpreting the outputs of their models, but it can enable a more informed debate about what models are actually saying and, therefore, how to make sense of the underlying data. As the importance of polling grows, to elections and therefore to how we implement democracy, pollsters should want their polls to be as informative as possible to journalists, politicians, and the public. Adopting model transparency, sensitivity analyses, and systematic model comparisons could go a long way toward creating such informed conversations.



A great idea. Unfortunately, the scientific philosophies of transparency, sensitivity, and comparison, along with reproducibility and falsifiability, are too often considered "too egghead" by those outside the enterprise, and too much effort given the risk of yielding unbiased results contrary to those desired. The politics and the proprietary nature of polling will likely prevent transparency et al. from becoming an essential part of polling, I fear. I would love to be proven wrong, though.



Excellent comments. At least on climate change it is easy to debate the merits based on the model and the assumptions being made.

While I think the R2K/DailyKos polls use modelling assumptions that overstate D strength ... an opinion ... not saying they will be wrong ... AT LEAST it's pretty transparent what they are doing unlike CBS, CNN, Time, etc. The media polls ought to be the most transparent of all.



The problem in part is that pollsters get assessed in the public mind by how close they were to the final result, which with sampling error is part skill but part lottery.

In addition, each is so convinced of their own correctness that they don't want others to replicate their methods, fearing they will lose their competitive advantage. In science you want others to replicate your method; it shows you were right. In commercial polling, it costs you a competitive advantage that you at least imagine you have.

The best thing is to keep up the pressure on pollsters to be transparent, and publicly do so as this post and others at pollster.com and 538.com and others do. On its own the market won't deliver transparency unless it is seen to have a competitive advantage.


The point of the R2K/Daily Kos poll is to be as transparent as possible. This is a concept first urged by Mark Blumenthal here on this site, and one we subscribe to. As PlayingItStraight notes, however you feel about it, the data is there to peruse and pick apart.

Mark has analyzed the youth vote, and we (Daily Kos) don't look too shabby, though the proof will be on 11/4.



It would be great to be able to compare, as the good professor suggests, the data from one poll to another, but even if that does not happen, getting the basic or even complete model for the dataset would allow us to see how the samples actually differ.

There are probably good economic reasons why pollsters don't share data, but it is harder to see why they would hide their models. R2K is to be complimented for putting out full crosstabs on everything!

Then, if as many do, you think they have a liberal/democratic slant, you can look at how the sample was adjusted and draw your own conclusions.

I think there are probably good political reasons why Democratic and Republican pollsters would hide their slanted results but we could also begin to ignore polls that do not open up.

Here's to increasing the state of sunshine in the process of democracy.


Dwight McCabe:

When policy makers and politicians approach data with preconceived notions, they can misinterpret the results, both unwittingly and on purpose.

I agree with Professor Miller that it's especially important that modelers and pollsters work to present the data and its meaning as clearly as possible, to guard against those who might misunderstand or distort the results.



Unfortunately, there are some pollsters who engage in what I can only regard as deliberate distortion. Here's my post from today's Tipp board.

Tipp has a unique distinction, having entered the field on 10/13.

If you take the mean and standard deviation of each day's national polls (all of them -- no cherry picking), then Tipp's reported support for Obama is outside the standard deviation each and every day.

This is truly remarkable, particularly since on most days he is the only pollster to do so, and he managed to fall outside the SD even on days he recorded the largest spread.

But his figures for McCain's support have been consistently within the standard deviation.
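The outlier check described above is straightforward to reproduce. Here is a minimal sketch of the daily test: take all national toplines for a given day, compute the mean and standard deviation, and flag any pollster more than one SD from the mean. The numbers below are invented for illustration, not actual 2008 polls.

```python
import statistics

# Hypothetical one-day snapshot of national toplines for one candidate
# (percent support); these values are made up for illustration only.
polls = {"Gallup": 51, "Rasmussen": 52, "Hotline": 50,
         "R2K": 51, "IBD/TIPP": 44}

mean = statistics.mean(polls.values())
sd = statistics.stdev(polls.values())

# Flag any poll whose topline falls more than one standard deviation
# from the day's mean -- the test applied in the comment above.
outliers = {name: v for name, v in polls.items() if abs(v - mean) > sd}
print(f"mean={mean:.1f}, sd={sd:.1f}, outliers={outliers}")
```

With these invented figures the mean is 49.6 and the SD about 3.2, so only the low value at 44 is flagged; run on real daily toplines, the same test shows which pollsters sit outside the pack.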

What does this tell us? That Tipp's model is both methodologically flawed and deliberately skewed for his client.

The evidence is incontrovertible.

Posted on October 26, 2008 5:40 PM



Question: If we can't expect people in high government positions to use scientific study and data to implement scientifically backed public policy, presumably because of partisan opinions, how can we expect pollsters to use and publish scientific data to back up polls, assuming they also have partisan leanings?



Thanks for the comments, everyone. A few quick responses:

1. I certainly don't anticipate that achieving these goals will be easy. Believe me, it wasn't in the case of climate change. The first models began to appear in the 1960s, and it wasn't until the 1990s that a systematic approach to model comparison was built. Why then? Because there was pressure from the folks who needed to make decisions based on these models. Having smart politicians who want to make sure they're getting the best information might therefore provide a lever to push firms to open up to this kind of analysis.

2. I could imagine the same being said of the media. In some respects, they are currently a large part of the problem, only wanting top-line information so they can say "X is winning" or "it's a very tight race". But they could be the best, wanting more sophisticated analysis to provide their readers. Having a poll that was known to follow well vetted modeling practices could be a competitive advantage for a media outlet, and if one outlet does it, maybe others would be obliged to follow.

3. Or the push might begin with academic pollsters. Here, the idea of building a better mousetrap and publishing about it could generate interest in these ideas.

4. @CTPonix especially -- I will be up front and say that I think people will always be tempted to interpret or spin data one way or another, whatever the subject. The reason is that evidence is never complete and certain (or at least this is exceedingly rare, especially on the time scales or spatial scales that you need the information). The strategies I suggest will therefore make no difference to how people try to spin or interpret data.

What the strategies will do, on the other hand, is make it possible/easier for knowledgeable observers to critically assess those interpretations and spins and to write about and discuss them. I think this kind of deliberation is critical precisely because there will always be uncertainty about polls, so we need to be able to foster debates about that uncertainty, not let those who "own" the data spin it with impunity.

My favorite example is the claim made by the Bush Administration that Iraq possessed WMD. The fact that the IAEA publicly disagreed with their claims forced the administration to overplay its hand and insist on the certainty of what was fundamentally uncertain evidence (or maybe they just outright lied). Either way, they were compelled to make a huge effort to find the WMD, and when they didn't they lost a whole lot of credibility. That's the way public deliberation about evidence can work, over the long-term. Same could be said of Watergate. The law could never prevent Nixon from breaking into the Democratic party offices. But it could hold him accountable later, if it had the right evidence.

