

The Insider Advantage Crosstabs

Topics: 2008, Divergent Polls, Likely Voters, Pollsters, The 2008 Race

For today's puzzle, we have two new polls in Iowa, one from the ABC News/Washington Post partnership and another from the public relations firm InsiderAdvantage. The ABC/Post poll shows both Obama (at 33%) and Clinton (at 29%) significantly ahead of John Edwards (at 20%). The InsiderAdvantage survey -- or at least the result they chose to lead with -- shows that John Edwards (with 30%) has "leapfrogged ahead" of Clinton (26%) and Obama (24%). As our friends at NBC's First Read note, conflicting results like these make it "hard to know what's right or wrong."

Before digging deeper, it is worth highlighting this point from the ABC story:

Applying tighter turnout scenarios can produce anything from a 10-point Obama lead to a 6-point Clinton edge -- evidence of the still-unsettled nature of this contest, two weeks before Iowans gather and caucus. And not only do 33 percent say there's a chance they yet may change their minds, nearly one in five say there's a "good chance" they'll do so.

However, I want to pass along some problematic details about the recent InsiderAdvantage polls. One issue is that InsiderAdvantage sometimes conducts surveys using live interviewers and sometimes using an automated interactive voice response (IVR) method (in which respondents answer by pressing buttons on their touch-tone phones), but almost never specifies which method it used in its public releases. In this case, I checked with InsiderAdvantage, and they confirm that the latest Iowa surveys were done with the automated IVR method.

The second problem is potentially bigger. InsiderAdvantage typically emails us a few pages of cross-tabulations that we have sometimes posted to the site, but which they rarely post to their own. We did not receive those cross-tabulations for today's survey, perhaps because of the story I am about to share. The site RealClearPolitics has posted a more limited version of the Republican and Democratic results.

Take a look at the Democratic tab, and if you look closely, you'll see the problem: According to the crosstabs, Barack Obama gets 19.6% of the vote from men, 17.8% from women but 24.3% from all voters. Needless to say, that result is impossible, especially since they report 392 interviews conducted among men, 585 interviews among women and 977 overall (and since 392+585=977).**
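The arithmetic behind that impossibility is simple: in a consistently weighted tabulation, the topline figure is a weighted average of the subgroup figures, so it must fall somewhere between them. A minimal sketch (the percentages come from the crosstabs above; using the reported interview counts as the weight split is purely illustrative):

```python
# Sketch: the topline percentage is a weighted average of the subgroup
# percentages, so it must lie between the smallest and largest subgroup value.

def weighted_topline(groups):
    """groups: list of (weight, percent) pairs, one per subgroup."""
    total = sum(w for w, _ in groups)
    return sum(w * p for w, p in groups) / total

# Obama's percentages from the published crosstabs; the 392/585 split is
# the reported interview count by gender, used here only for illustration.
men, women = (392, 19.6), (585, 17.8)
obama_total = weighted_topline([men, women])

print(round(obama_total, 1))  # lands between 17.8 and 19.6 -- nowhere near 24.3
```

However the weight is divided between the two gender groups, the combined figure can never escape the 17.8-to-19.6 range, which is why a 24.3% topline cannot come from consistently weighted columns.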

We had posted the crosstabs for the InsiderAdvantage poll of Republicans in South Carolina earlier this month, but pulled them back when a reader noticed similar inconsistencies (for this posting, we have put the Democratic and Republican crosstabs back up on our server). The story of what happens next should give pause to anyone wondering how much faith to put in their surveys.

I emailed InsiderAdvantage to say that "something seems amiss" in their tabs. Mistakes happen, and I assumed I was simply reporting an error in the cross-tabulations that they would want to correct. Instead, I got some curious replies. I heard first from Matt Towery, the public face of InsiderAdvantage. He referred me to the statistician who weights their data and then offered this explanation:

We have produced many a poll that showed the male female column not seeming to "fit" with the totals. But as [the person who weights the data] will explain, the other weights applied cause the numbers to appear to "disagree" with the male female column. I can only tell you that we've used the same weighting system for going on ten years and it has rarely failed us.

Next, I heard from Gary Reese, an analyst at InsiderAdvantage, who shared his "guess" that "because of gender and age and race weightings, that may make individual cross-tabs read slightly off." The person that weights the data was not available, Reese wrote, but he would check with him and get back to me. The next day, Reese replied with a confirmation:

Was as I wrote yesterday. Multiple weightings of various demographics skew individual weightings that they don't necessarily add up to match the top line.

Now here I have to interject: I too have weighted data for many years, and this explanation is simply wrong. Either the data are weighted consistently (in a process that changes the "weight" given each respondent when the data are tabulated) or they are not. If cross-tabulations are based on weighted data, then the results in subgroups (men, women, etc) should be internally consistent with the total.
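To make the standard process concrete, here is a toy example (the respondents and weights are entirely invented, not InsiderAdvantage's data): each respondent carries a single weight, and both the topline and every subgroup column are tabulated from those same weights, so the columns cannot disagree.

```python
# Toy data: each respondent has a gender, a candidate choice, and one weight.
# Tabulating the topline and the gender columns from the same weights
# guarantees that the columns reconcile with the total.

respondents = [
    # (gender, candidate, weight) -- all values invented for illustration
    ("M", "Obama", 1.4), ("M", "Clinton", 0.8), ("M", "Edwards", 1.0),
    ("F", "Obama", 0.9), ("F", "Clinton", 1.2), ("F", "Edwards", 1.1),
    ("F", "Obama", 1.6),
]

def pct(candidate, rows):
    """Weighted percentage supporting `candidate` among `rows`."""
    total = sum(w for _, _, w in rows)
    return 100 * sum(w for _, c, w in rows if c == candidate) / total

men   = [r for r in respondents if r[0] == "M"]
women = [r for r in respondents if r[0] == "F"]

w_men   = sum(w for *_, w in men)
w_women = sum(w for *_, w in women)

for cand in ("Obama", "Clinton", "Edwards"):
    # The topline equals the weighted average of the two gender columns.
    combined = (w_men * pct(cand, men) + w_women * pct(cand, women)) / (w_men + w_women)
    assert abs(combined - pct(cand, respondents)) < 1e-9
```

The assertion holds by construction: once every cell is computed from the same respondent-level weights, internal consistency is automatic, which is exactly why inconsistent columns are a red flag.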

They gave me a number for the statistician that weights the data. I called, but heard nothing back, then got caught up in our office move and other more pressing stories. I finally heard back yesterday from Jeff Shusterman, the president of Majority Opinion Research (the company that conducts the InsiderAdvantage surveys) and he confirmed what should have been obvious to Towery and Reese: Only the total column in their crosstabs is weighted. Thus, for reasons that still perplex me, they choose to leave the columns for subgroups unweighted.

Before posting this item, I went back to Towery and Shusterman and asked for an explanation of the purpose of releasing weighted values for all respondents, but unweighted results for subgroups. Here is Shusterman's answer:

The purpose of the InsiderAdvantage/Majority Opinion polls are to provide a snapshot for major media outlets of the race at the time of polling and, as the election day approaches, to accurately predict the outcome of the election for which we have a substantial record of success. This snapshot and eventual prediction are contained in the total column of the cross-tabulations, which is accurately weighted. By contrast, our polls are not conducted to advise campaigns or to provide interesting subtext for academics or bloggers, so we do not weight or place emphasis on the other banner points.

If that's the case, I am not sure I understand why they choose to run "inaccurate" cross-tabulations at all, much less send them to us and to RealClearPolitics. Readers ought to take all of this "interesting subtext" into account when trying to decide which polls to rely on (and we will save for another day the issue of what weighting up subgroups by factors of three or more does to the reported "margin of error").

Back to the issue of the conflicting results from Iowa. As we have reported, pollsters in Iowa have taken many different approaches to defining likely voters. The ABC News/Washington Post surveys have at least disclosed the demographics of their likely caucus-goers and the methods used to select them. InsiderAdvantage has not. Without more of these details, it is hard to do much more than speculate and pass on the good advice from First Read:

Look at the trends of the pollsters who have surveyed the state for multiple cycles, and be careful of pollsters who haven't polled Iowa before.

**Update: Several commenters are fixated on the footnoted paragraph above but appear to have paid little attention to the rest of this post. So to be clear: The contradictory results are "impossible" only if all of the crosstabs columns were weighted consistently, which they obviously were not. The results are also "impossible" in terms of the reality the data are supposed to represent, and that is the point. If you are ready to weight all Democratic voters to 48% black, then it makes no sense to release results for the same survey by gender where men are 10.9% black and women are 18.4% black.
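The same bound applies to the racial breakdown just quoted: whatever the gender split, the overall share of black voters is a weighted average of the men's and women's shares, so it is capped by the larger of the two. A quick sketch using the figures above:

```python
# If men are 10.9% black and women 18.4% black, the overall black share is a
# weighted average of those two figures, whatever the gender split -- so it
# is capped at 18.4% and can never reach the 48% topline.

def overall_range(group_pcts):
    """Bounds on a weighted average, over all possible weight splits."""
    return min(group_pcts), max(group_pcts)

lo, hi = overall_range([10.9, 18.4])
assert not (lo <= 48.0 <= hi)  # a 48% overall share is inconsistent with the columns
```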



thank goodness you saw what many of us saw. I was hoping you'd post this asap.

can you also address the "second-choice" issue?

Question 2 in that poll (Dem side) asked:
2. If you are planning to vote for a candidate other than Clinton, Edwards, or Obama, and your candidate fails to receive the required 15% of the overall caucus vote in the first round of voting, and if the remaining choices were Hillary Clinton, John Edwards, Barack Obama, which candidate would you vote for as your second choice? (Likely voters)

(bold emphasis mine)

But the results are not clearly presented. They indicate that the total number of respondents remains 977.

I suppose it's possible that only the 12.1% (119 people) that said they supported another candidate actually answered this question, and those results were added to the previous totals for the 3 front-runners. But it doesn't say that.

So either,
1. All 977 respondents answered question 2, making the results bogus--since in most precincts the top 3 will exceed the 15% minimum threshold, or

2. the subgroup of 119 second-choice respondents must have an astronomical margin of error, making the results meaningless.

Is there a third way that those results actually make sense?



I heard that this company has ties to the Clinton campaign. Is it possible that they manipulated the results to feed into the "Edwards is a threat" narrative that they've been pushing all of a sudden? (If Edwards takes first in Iowa, it's much better for them than if Obama takes first.)



It is floating around the blogosphere that Matt Towery is a maxed out Clinton donor.

This certainly adds to the concern about reliability given that HRC would much rather see JE win over Barack Obama in Iowa.

Thank you for your evaluation of the poll.



I do this for a living for a partisan firm (meaning we actually care if our results are right).

As for Q.2, there are only two scenarios that make sense, and you have them both right: Either the question results are a re-calculation of what the new result would be among all voters (977), or everyone (977) was asked the question.

The first option would be a reasonable attempt to try to simulate the caucus process in aggregate, although flawed. The follow-up question would reclassify respondents as supporting the second-choice candidate, and then they would be combined with Clinton, Obama or Edwards supporters from the initial question to produce an overall number with all 977 respondents accounted for--but it needs to be labeled that way if it is.

The second would be a ridiculously lazy bit of survey design--apparently you would accept the new number as "correct" and forget about the initial question. This is totally stupid, but at this point it wouldn't surprise me.

I can't think of anything else that would remotely make sense.

From what I've read here on their weighting practices (totally shocking), I have zero confidence in anything that comes out of this company.



You seem to have a real thing for this one company. Did you read Bob Novak's column of January 19, 2004? InsiderAdvantage used this same process and nailed the Iowa caucus. You need to do a little research yourself before you do damage to a company and get yourself sued. I dare you to post this. By the way, Novak's column ran right before the caucus of '04: "InsiderAdvantage, which previously has polled mainly in the South, says the contest may not actually be that close in Iowa. Calculating second-choice preferences that may be decisive in the complicated caucus system, the poll gives Kerry 33 percent of the actual caucus vote to Dean's 26 percent." Robert Novak, Chicago Sun-Times, January 19, 2004/Creators Syndicate. Hey mystery pollster, have you got the guts to apologize or even post this? Bet not.



Do you not understand weighted polls?

You said: >>>Needless to say, that result is impossible, especially since they report 392 interviews conducted among men, 585 interviews among women and 977 overall (and since 392+585=977).

This is elementary school stuff. How do you not understand how weighted polls work? They take the stats they got and compare them to how many people in each age group and gender group actually tend to vote.

Obama got 35.3% and 35.2% of the younger vote, 18-44 years old, but they were a very small part of the sample, probably because younger voters only have cell phones. So what they did was to calculate how many younger voters there are compared to the older voters and weight that in, because he does so well with younger voters and they were such a tiny part of the sample--only (34)(88) people out of the 977 polled.



There's nothing wrong with trying to include the 2nd choice system in the polls to try to simulate what will happen on caucus night. In fact, it's a good idea.

The problem is that no one at the company seems to understand anything about weighting surveys, which throws open the door to all sorts of questions about how they go about determining likely voters or having a representative sample. It's about confidence.

If they were close in 2004, I would attribute it more to dumb luck than anything else.



You said>>>Take a look at the Democratic tab, and if you look closely, you'll see the problem: According to the crosstabs, Barack Obama gets 19.6% of the vote from men, 17.8% from women but 24.3% from all voters. Needless to say, that result is impossible, especially since they report 392 interviews conducted among men, 585 interviews among women and 977 overall (and since 392+585=977).

I am stunned that you missed this, Mark?????

Obama got 35.3% and 35.2% in the 18-to-29 and 30-to-44 age groups, but those age groups were only 12% of the sample. What it looks like to me is that they couldn't get hold of the younger voters (who love Obama) because they all have cell phones, so they had to give incredible extra weight to those age groups.

Those 18 to 44 are certainly a much bigger share than 12% of the voting bloc, and since he got 35% of the voters from 18 to 44, his numbers came out to 24% after the younger vote was weighted in properly.

How could you miss something so obvious?



Here's the problem:

Their "sample" consists of respondents who SELF-IDENTIFY as likely caucus goers. So, if your sample shows that only 10% of likely caucus goers are "one-armed paper hangers" and you scale that up because you think 30% of the caucus goers "should be" one-armed paper hangers, you have just negated your entire mechanism for identifying likely caucus goers.

These weighting issues go a long way towards explaining why we are seeing polls all over the map.

I'm seeing several polls recently that are "predicting", after their weighting, that only 52% of Democratic primary voters will be women. Historically, that number has been more like 60%.



There's no such thing as a likely caucus-goer, as I've been telling people for the last so many months. Even more so for "certain caucus-goer." They are collective figments of the imaginations of social scientists, who can predict likelihood to VOTE with some (limited) accuracy. The same does not hold true for caucusing. The best reflection is to consider people who did caucus in 2004 AND self-identify as being likely to do it again. Wishful thinking and good intentions are meaningless.



I think some people posting here are misunderstanding the issue:

It has nothing to do with whether they weighted age up or down, or if they made a determination about what caucus goers should look like. (That's a separate debate.)

The problem is that they are not weighting in a standard way--if you are going to weight a survey, you have to weight ALL the data.

For example, it looks like they made a determination that they are under-representing young people. And many more young people support Obama. Their response to this was to "bump up" the strength of young people in the survey, thus creating more overall support for Obama. That's fine.

However, if that is the case, then all demographics must shift in proportion and all young people must get bumped up at the same time. This means that young men should also be a greater proportion of men in general, thus creating more Obama support among men in general (but as you can see from the MEN cross tab, that is not the case). And young women should be a greater proportion of women in general, thus creating more Obama support among women in general (which is also not the case).

Crosstab data has to be internally consistent or else it's useless.

The point is that if the topline result showing Obama with 24.3 percent of the vote is "correct" by their standards (with youth weighted up), then Obama's 19.6 percent among men is incorrect: young men would have to be a large enough proportion of men to lift his number in that column as well. And that goes for every demographic in the crosstabs.

Releasing numbers that are not self-consistent is a completely foreign way to operate for reputable pollsters.

