

The DailyKos Research 2000 Controversy: 'How Bad for Pollsters?'

Topics: Daily Kos, Disclosure, Jonathan Weissman, Mark Grebner, Markos Moulitsas, Michael Weissman, Research 2000, Walter Mebane

The allegations of fraud leveled by Daily Kos founder Markos (Kos) Moulitsas and the analysis of Mark Grebner, Michael Weissman and Jonathan Weissman are compelling and troubling. As Doug Rivers wrote here earlier today, they demonstrate that "something is seriously amiss" in the Research 2000 data. All of us that care about polling data need to consider the larger issues raised by their analysis and their allegations.

The most urgent question a lot of non-statisticians have been asking is: how damning is the evidence? The short answer is that some of the patterns uncovered by Grebner, Weissman and Weissman have no obvious explanation consistent with what passes for standard survey practice (even given the generous mix of art and science at work in pre-election polling). They demand a more complete explanation.

Of the patterns uncovered by Grebner, et al., the easiest to describe to non-statisticians -- and for my money the most inexplicable -- involves the strange matching pairs of odd or even numbers. They examined the many cross-tabulations of results among men and among women posted to Daily Kos. If the result for any given answer category among men (such as the percentage favorable) was an even number, the result among women was also an even number; if the result among men was odd, the result among women was also odd. They found that strange consistency of odd or even numbers in 776 of 778 pairs of results that they examined.

Put simply, there is virtually no possibility that this pattern occurred by chance. Your odds of winning $27 million in the Powerball lottery tonight are vastly greater. Some automated process created the pattern. What that process was, we do not know.
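To make the scale of that improbability concrete, here is a back-of-the-envelope sketch (my own, not the authors' calculation). It assumes, purely for illustration, that under independent sampling the parity of each men/women pair would match about half the time, like a fair coin flip:

```python
from math import comb

# If parity matches were independent fair coin flips, what is the
# chance of seeing at least 776 matches in 778 pairs?
n, k = 778, 776
matches = sum(comb(n, i) for i in range(k, n + 1))  # ways to get >= 776 matches
prob = matches / 2**n
print(prob)  # on the order of 1e-229 -- effectively impossible by chance
```

Real parities are not exactly 50/50 coin flips, but no plausible correction to that assumption closes a gap of two hundred orders of magnitude.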

While there are many true statisticians who design samples and analyze survey data, very few do the kind of forensic data analysis that Grebner, Weissman and Weissman have presented. One widely respected expert in this field is University of Michigan Professor Walter Mebane (disclosure: Mebane was my independent study advisor at Michigan 25 years ago). I emailed him last night for his reaction.

Mebane says he finds the evidence presented "convincing," though whether the polls are "fraudulent" as Kos claims "is unclear...Could be some kind of smoothing algorithm is being used, either smoothing over time or toward some prior distribution."

When I asked about the specific patterns reported by Grebner, et al., he replied:

None of these imply that no new data informed the numbers reported for each poll, but if there were new data for each poll the data seems to have been combined with some other information---which is not necessarily bad practice depending on the goal of the polling---and then jittered.

In other words, again, the strange patterns in the Research 2000 data suggest they were produced by some sort of weighting or statistical process, though it is unclear exactly what that process was.

As such, I want to echo the statement issued this morning by the National Council on Public Polls calling for "full disclosure of all relevant information" about the Research 2000 polls in question:

"Releasing this information will allow everyone to make a judgment based on the facts," [NCPP President Evans] Witt added. "Failure to release information leaves allegations unanswered and unanswerable."

In the absence of that disclosure, and unless and until the parties have their day in court, it is also important that we give the Grebner, Weissman and Weissman analysis the respect it deserves and subject it to a thorough "peer review" online. It is all too easy to use a blog to lob sensational accusations at suspicious characters, especially when those accusations are grounded in subjects that are "all but impossible for a lay-person to be able to investigate" unless "you have a degree in statistics" (to quote our colleagues at The Hotline earlier today).

The courts have discovery and cross-examination; academic journals have a slow process of anonymous review. Online, we provide such review through reader comments and deeper analysis posted by "peers" who critique work in something much closer to real time. Examples I've seen already include the comments earlier today by Doug Rivers and the blog post by David Shor. Grebner, et al. have made a compelling case, but it is vital that we kick the tires on their work before leaping to conclusions. Remember, the truly "full disclosure" that a lawsuit's discovery process will certainly provide may take months or even years to occur.

We will all have more to say on this subject in the days ahead, but for the moment, I want to echo a point Josh Marshall made yesterday. Research 2000 was not the creation of Daily Kos, nor was it the product of a business model built on ignoring the mainstream media and disseminating data over the internet. "They've been around for some time," Marshall wrote yesterday, "and had developed a pretty solid reputation." Their clients included local television stations plus the following daily newspapers (according to the Research 2000 web site): The Bergen Record, The Raleigh News & Observer, The Concord Monitor, The Manchester Journal Inquirer, The New London Day, The Reno-Gazette, The Fort Lauderdale Sun-Sentinel, The Spokesman-Review, and The St. Louis Post-Dispatch.

A colleague asked me yesterday about the "upshot of this situation, how bad is it going to be for the [polling] industry?" The answer depends on where the evidence leads us, of course, but the early implications are ominous. The polling industry cannot simply continue on a business-as-usual course. We must push for complete disclosure as a matter of routine and we need to develop better objective standards for what qualifies as a trustworthy poll.

PS: The Atlantic Wire's Max Fisher has a thorough summary of the first wave of online commentary on the DailyKos/Research 2000 controversy. I'd also recommend the short-but-sweet commentary from Washington Post pollster Jon Cohen:

However this dispute turns out, there's a new, blazing light on the rampant confusion about the right ways to judge poll quality. Saving the longer discussion, one thing is clear: to assess quality, one needs to know the facts. At this point, too little is known about the Daily Kos/Research 2000 poll to make definitive statements. (Research 2000 has a record of releasing more information about their polling than some other prolific providers.)



Has this ever happened to anyone? I got a call from the NRA warning that there is an international anti-gun control takeover. If you get a call from them taking a survey, report them for harassment. I actually listened to the message and it was creepy. The NRA thinks the UN wants a world government that will ban handguns like they do in Great Britain.

It was a live person on the phone, and they played a recording equating this UN conference in NY, and they oddly enough think the US should attack Iran and other rogue nations and take away their guns too. I am not joking.

If I get a call from a Republican running for office, no big deal; I usually say who I am voting for and they respect that. But this is over the top.


Michael Weissman:

Rivers and I ended up in boring agreement (see his thread).

The David Shor link you provided was a statistical disaster. Check the end of that thread. He can't tell the difference between p of 0.01 or so and of 0.0000000000000001 or so. Really sad and sloppy. But he's very young, so don't count the guy out after coming up through the minors.


Michael Weissman:

BTW, for the record, the number 778 for how many (three-answer) M-F pairs were reported was a transcription error on my part. The real number was 795, posted correctly on Jonathan's blog. The number of Fav even-odd matches is then 793. The odds get worse by about another factor of 100,000 or so, which is just another drop in the bucket in this case. Anyway, my bad.


Matt Sheldon:

Posting this here as well...

It is a valid explanation for the M/F gender split pattern.


A regression on the individual gender approvals and overall approval rating suggests that R2K weights its sample to be 50% female / 50% male.

Regression Statistics
  Multiple R           0.999977223
  R Square             0.999954447
  Adjusted R Square    0.999954334
  Standard Error       0.095356729
  Observations         813

              Coefficient      t Stat
  Intercept   -0.004174906     -0.38877702
  MEN          0.499635141     1160.7836
  WOMEN        0.500651122     2143.945047
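The coefficients above are consistent with a weight of roughly 0.5 on each gender. As an illustration of the kind of regression Sheldon describes (using synthetic data generated under an assumed exact 50/50 weighting, since the original tabs are not reproduced here), one could recover such weights with an ordinary least-squares fit:

```python
import numpy as np

# Synthetic subgroup percentages, then a topline built with an assumed
# exact 50/50 weighting (rounded to whole numbers, as published tabs are).
rng = np.random.default_rng(0)
men = rng.integers(30, 70, size=800).astype(float)
women = rng.integers(30, 70, size=800).astype(float)
overall = np.round(0.5 * men + 0.5 * women)

# Regress the topline on the subgroups: intercept, MEN, WOMEN.
X = np.column_stack([np.ones_like(men), men, women])
coef, *_ = np.linalg.lstsq(X, overall, rcond=None)
print(coef)  # intercept near 0, both gender weights near 0.5
```

The near-perfect R-squared in Sheldon's table is exactly what this setup produces: the only noise is the rounding of the topline.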


Given that the weighting is 50/50, this will introduce some new properties on the numbers.

If I say that the average of the M/F approvals MUST equal the overall approval EXACTLY, then their sum must be even (twice an integer), so they MUST be both even or both odd.


Overall Approval = 51% Both MUST be the same.

52/50 or 53/49 or 54/48, etc.

Overall Approval = 52% Again, both MUST be the same.

53/51 or 54/50 or 55/49, etc.

The average of M/F should EXACTLY equal the overall approval.

They do. This is a property of the 50/50 weighting.
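Sheldon's parity argument can be checked mechanically. This sketch (the helper name is mine, for illustration) enumerates every integer men/women pair whose exact average equals a given integer topline and confirms the two numbers always share parity:

```python
def same_parity_required(overall):
    """Do all integer M/F pairs averaging exactly to `overall` share parity?"""
    # Every pair (m, f) with (m + f) / 2 == overall satisfies f = 2*overall - m.
    pairs = [(m, 2 * overall - m) for m in range(0, 101)
             if 0 <= 2 * overall - m <= 100]
    return all((m % 2) == (f % 2) for m, f in pairs)

print(same_parity_required(51))  # True: e.g. 52/50, 53/49, 54/48 ...
print(same_parity_required(52))  # True: e.g. 53/51, 54/50, 55/49 ...
```

So an exact 50/50 average published without rounding would indeed force matched parity; whether that is what actually happened in the Research 2000 tabs is the open question.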

IF this is the answer to the riddle, then there is lots of egg on everyone's face.



@ Matt Sheldon. As I mentioned after your other post, the ratio of females to males is closer to 55/45, so there would be no reason to weight them 50/50.


Matt Sheldon:

@Jlichtman:

As I mentioned in the other post, you are WAY wrong.

I think you should check the Census before lecturing anyone.

Males outnumber females at birth, but females live longer on average.

The result is a 15-64 age population that is EXACTLY 50/50.

The total population is roughly 51 F-49 M.

I am simply reporting the derived weights he used via a regression analysis on the tabs.

You are reporting incorrect demographics.



There's no need to be nasty. In any event, I should not have said population, but rather "registered voters." As I understand it, registered voters split roughly 55/45. I am certainly open to you pointing me towards data indicating the contrary.

(Also, why would anyone exclude 64+ from an election poll?)



As I mentioned on the other thread, I stand corrected on the specific data, the RV percentage seems to be roughly 52/48 (per census.gov). But this would still not make weighting 50/50 appropriate.


Full Disclosure?

What about full disclosure of the impossible weighting adjustments made to the Exit Polls to match the vote count?

Quite arbitrary on its face, no?

Are you listening, Grebner, Weissman and Weissman?

Would any or all care to respond to the 25 questions I sent to Nate Silver?




Gender weighting is not based on population mix (50/50?).

It's based on voter turnout: 54F/46M (check the 2004 National Exit Poll gender weights).

