Re: Disclosure Project: Results from Iowa

Topics: 2008 , Disclosure , Divergent Polls , Iowa , The 2008 Race

Last week's Disclosure Project report produced two good questions worthy of follow-up.

Q: Given the almost complete lack of overlap in the way pollsters are defining likely caucus goers, how useful are poll averages?

Good question. Averaging polls with differing methodologies is always a bit risky if those differences affect the results in a big way. Simple averages of the most recent polls can get distorted when one "outlier" value enters the average. That's one reason why we have greater confidence in our regression trend lines. Because they draw on all of the available data rather than just a handful of recent polls, they are less likely to be thrown off by a single odd value. But Iowa is a tough case because, as the reader understands, the selected "likely voter" universes are so different, and because those differences affect the results.

It may be helpful to think of this process like a game of darts. Suppose twenty people all threw darts at a bullseye. Some throws would be more accurate, some less so. If we imagine that we could see only where the darts landed (but not the target) and then picked the center-point in the pattern of misses, that point would probably be pretty close to the bullseye.

In a sense -- and like all metaphors, this one is imperfect -- that's why poll averaging works. When all of the pollsters are aiming for roughly the same target (or the same universe of "likely voters"), the average of their efforts typically gets us closer to reality than any one poll, largely because of the inherent random variability that affects individual surveys.

But what if those "throwing the darts" can't see the target and are guessing at its location? What if some aim carefully while others throw carelessly? What if some players guess at the target by looking at the throws made by other players? In that case, the mid-point of the various throws may be off completely.

And that is the fear with polling in Iowa. If the average of the pollsters' guesses about the size and characteristics of the likely caucus-goers is about right -- even with all the obvious variation -- then averages or trend lines based on the combined results will get us closer to reality than the individual polls. However, if the consensus "best guess" about the pool of likely caucus goers is way off, then we may be in for a big surprise on January 3.
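The dart-board logic can be checked with a small simulation. The sketch below is illustrative only: the true share (28 points), the number of polls (20), the size of the random noise (4 points) and the shared miss (5 points) are all made-up assumptions, not estimates from any Iowa poll. It compares the typical error of a single poll with the typical error of the poll average, first when every pollster aims at the right target and then when all of them aim at the same wrong one.

```python
import random
import statistics


def simulate_polls(true_share, n_polls, noise_sd, shared_bias, rng):
    """Return n_polls simulated readings of one candidate's share (in points).

    Each reading is the truth, plus a miss shared by every pollster
    (shared_bias), plus that poll's own random noise.
    """
    return [true_share + shared_bias + rng.gauss(0.0, noise_sd)
            for _ in range(n_polls)]


def typical_errors(true_share, n_polls, noise_sd, shared_bias,
                   trials=2000, seed=0):
    """Mean absolute error of (single poll, average of all polls)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    single, averaged = [], []
    for _ in range(trials):
        polls = simulate_polls(true_share, n_polls, noise_sd, shared_bias, rng)
        single.append(abs(polls[0] - true_share))
        averaged.append(abs(statistics.mean(polls) - true_share))
    return statistics.mean(single), statistics.mean(averaged)


# Hypothetical inputs: 28-point true share, 20 polls, 4 points of noise.
same_target = typical_errors(28.0, 20, 4.0, shared_bias=0.0)
wrong_target = typical_errors(28.0, 20, 4.0, shared_bias=5.0)

print("same target:  single poll %.2f, average %.2f" % same_target)
print("wrong target: single poll %.2f, average %.2f" % wrong_target)
```

Under these assumptions, averaging shrinks the random scatter but leaves a shared miss untouched, which is exactly the scenario to worry about in Iowa.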

Q: So, as a close reader of the polls, where do you think the Democratic race in Iowa stands today?

This is a tougher question, but obviously the one that everyone is asking.

The safest thing we can say is that polling in Iowa represents too blunt an instrument to tell us with any precision who would be ahead if the caucuses were held today. This has less to do with the statistical "margin of sampling error" than with both the wide divergence in how pollsters define likely caucus goers and the practical difficulties of modeling the caucus process. But let's look more closely at the results.

[Image: 12-16 Iowa.png]

Our chart for the Democrats, which draws a regression line through the cloud of results, currently shows Obama with an estimated 28.2%, Clinton with 26.7%, Edwards with 22.7% and other candidates running far behind. This result represents the rough consensus of all the polls, drawing on both recent results and the apparent trend over the course of the year. But a look at the range of results for each candidate on the chart, or in the table below, shows considerable variation in the margin between Obama, Clinton and Edwards.

[Image: 12-17 Iowa Dem chart.png]

Three recent polls released since December 1 -- by Research2000, Strategic Vision and Newsweek -- show margins in Barack Obama's favor of 9, 8 and 6 percentage points, respectively, over Hillary Clinton (though only the Research2000 result is large enough to be statistically significant in its own right, assuming a 95% confidence level). Four other surveys conducted during the same period -- by Diageo/Hotline, RasmussenReports, Mason Dixon/MSNBC/McClatchy and Zogby -- show either an exact tie or Clinton ahead by 2-3 statistically insignificant points.
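For readers who want to check a "statistically significant" claim themselves, here is a rough sketch of the usual test for a lead within a single poll. It assumes a simple random sample and ignores weighting and design effects, so real margins are likely somewhat wider. The shares and sample sizes in the example are hypothetical, chosen only to show the arithmetic; they are not the actual figures from any of the polls named above.

```python
import math


def lead_margin_of_error(p1, p2, n, z=1.96):
    """Margin of error on the lead (p1 - p2) within one poll.

    p1 and p2 are the two candidates' shares as proportions (0.33 = 33%),
    n is the number of interviews, and z = 1.96 gives a 95% confidence level.
    Because both shares come from the same sample, the variance of the
    difference is (p1 + p2 - (p1 - p2)**2) / n.
    """
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)


def lead_is_significant(p1, p2, n, z=1.96):
    """True if the lead is larger than its own margin of error."""
    return abs(p1 - p2) > lead_margin_of_error(p1, p2, n, z)


# Hypothetical poll A: 33% vs. 24% among 500 respondents (9-point lead).
print(lead_margin_of_error(0.33, 0.24, 500), lead_is_significant(0.33, 0.24, 500))

# Hypothetical poll B: 35% vs. 29% among 400 respondents (6-point lead).
print(lead_margin_of_error(0.35, 0.29, 400), lead_is_significant(0.35, 0.29, 400))
```

With these made-up inputs, the 9-point lead clears its margin of roughly 6.6 points while the 6-point lead falls short of its margin of roughly 7.8 points. The same lead can be significant in one poll and not in another, depending on sample size.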

Does methodology explain the apparent divergence in these results? Perhaps. Newsweek's sample represents a larger share of Iowa's adults (24%) than the samples of most of the other polls that disclosed their Iowa methodologies. However, both Research2000 and Strategic Vision have failed to disclose comparable details about their methodologies, so we cannot be certain. It is entirely possible that these three sampled a broader slice of the Iowa population than the other pollsters.

If so, these results are generally consistent with what cross-tabulations show within individual surveys: Obama should do better the more the samples include younger voters, first-time caucus goers and independents.

So what do we make of this? The three frontrunner campaigns can all make a reasonable case why polls have been systematically under-representing their true caucus night strength. Obama supporters argue that polls aiming to replicate past turnout are missing their younger, first-time supporters. Clinton and Edwards supporters argue just the opposite, that polls are including more younger, independent voters than have voted in past caucuses. The Edwards campaign also argues that given the 15% threshold requirement to win delegates, its organizational advantage and supposed strength in rural Iowa will add 2-3 points to the actual results as compared to his poll standing.

If I had to guess, I would say there is some truth to all three arguments, and that they may effectively cancel each other out. So even if the pollsters are far apart in their individual "models" of the likely electorate, their collective average may be close to reality, and the overall average suggesting a very close race is probably right.

But it may not be. It is always possible that they are all (or mostly) aiming at the wrong target, a possibility that makes this entire exercise so terrifying to pollsters and so interesting to everyone else.



Shorter Blumenthal: I do not know what the heck is going on or what will happen on caucus night, but, hey, nor does anyone else!



A few months back, when about the only polls coming out of Iowa were the monthly SV and ARG polls and they were so wildly different from each other, I tried an experiment. I took a straight average of all the polls from December '06 and January '07 and used that as my baseline. Then I tabulated just the average monthly gain or loss for each candidate from pollsters who had polled the race more than once and added those to the baseline numbers month-by-month to get a trend.

For example in June, Clinton was up 1% in the ARG poll and 4% in SV's, so I added 2.5% to her total. Edwards was up 4% in ARG's and down 3 in SV's so his averaged gain was 0.5%, and so on. When I put it all together and charted the results, it actually tended to track Pollster.com's regression within small tolerances. Edwards did maybe a point or two better in my analysis but that was about it.
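The bookkeeping described above can be sketched in a few lines. This is only one reading of the method: it assumes each month's change is measured against the same pollster's previous month, and the baseline and poll levels below are hypothetical. Only the June changes (Clinton +1 and +4, Edwards +4 and -3) come from the example above.

```python
from statistics import mean


def average_change(prev_by_pollster, curr_by_pollster):
    """Mean month-over-month change among pollsters present in both months."""
    shared = prev_by_pollster.keys() & curr_by_pollster.keys()
    if not shared:
        return 0.0  # no repeat pollster this month, so carry the trend forward
    return mean(curr_by_pollster[p] - prev_by_pollster[p] for p in shared)


def build_trend(baseline, monthly_readings):
    """Start from a baseline and add each month's average change in turn."""
    trend = [baseline]
    for prev, curr in zip(monthly_readings, monthly_readings[1:]):
        trend.append(trend[-1] + average_change(prev, curr))
    return trend


# Hypothetical levels chosen so the June changes match the example:
# Clinton is up 1 in ARG and up 4 in SV; Edwards is up 4 in ARG and down 3 in SV.
clinton = [{"ARG": 30, "SV": 27}, {"ARG": 31, "SV": 31}]
edwards = [{"ARG": 25, "SV": 26}, {"ARG": 29, "SV": 23}]

print(average_change(clinton[0], clinton[1]))  # 2.5
print(average_change(edwards[0], edwards[1]))  # 0.5
print(build_trend(28.0, clinton))              # hypothetical 28.0 baseline
```

Because only changes within the same pollster are averaged, a pollster whose numbers run consistently high or low for a candidate shifts the baseline but not the month-to-month trend.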

The exercise convinced me that Pollster's approach is as good a way to look at poll data as any other I can think of, even under fairly adverse circumstances when the data set is relatively sparse and the methodologies used in data acquisition by various sources may not be entirely harmonious. I left off keeping my chart up when we started getting enough polls from different sources that I was more confident just averaging the raw numbers should give you as good a result as any.

(I did start another spreadsheet last week just to get a faster-moving average than Pollster's for IA, NH and SC -- I have this idea that the time scale for a political campaign is logarithmic, not linear. But that's another story.)


