Comment of the Day

Topics: Barack Obama, Gallup, John McCain, NCPP, Sampling Error

From "joejoejoe," regarding yesterday's release of a new national survey from CNN/ORC:

Here's the headline to the CNN story that accompanies the poll.

'CNN poll: Obama, McCain in a statistical dead heat'

I'm not sure why a 5-point lead in a poll with a margin of error of +/- 3.5% is a "statistical dead heat" but whatever. Based on the '04 turnout a 1.5% victory projects to about 1.8 million more votes for Obama than for McCain. Doesn't "dead heat" mean tie?

Yes, it does, and as such "statistical dead heat" is a phrase we wish journalists would avoid.

To be fair, Nate Silver made the same point more emphatically (citing a National Council of Public Polls release) a few hours before joejoejoe. But we appreciate our alert readers nonetheless.

Update: A highly valued reader emails:

Not exactly. Suppose there are no undecideds and Obama leads 53-47. The +/- 3.5% MOE means that the estimate of 53% for Obama has a 95% confidence interval ranging from 49.5% to 56.5%. 49.5% for Obama means 50.5% for McCain, so a McCain lead is within the margin of error. It's a little more complicated when there are undecideds, but the result would be similar.
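The reader's arithmetic is easy to check with a short sketch. It assumes, as the email does, a two-candidate race with no undecideds, so that McCain's share is simply 100 minus Obama's. The function name is illustrative and not part of any polling library.

```python
def two_way_interval(share_pct, moe_pct):
    """Interval for one candidate's share and the implied interval for the rival.

    Assumes exactly two candidates and no undecideds, so the rival's share
    is 100 minus the candidate's share.
    """
    low, high = share_pct - moe_pct, share_pct + moe_pct
    rival_low, rival_high = 100 - high, 100 - low
    return (low, high), (rival_low, rival_high)


obama, mccain = two_way_interval(53.0, 3.5)
print(obama)   # (49.5, 56.5)
print(mccain)  # (43.5, 50.5)

# A McCain lead is inside the interval if his upper bound exceeds 50.
mccain_lead_possible = mccain[1] > 50
print(mccain_lead_possible)  # True
```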

I probably posted this item too quickly. To me, and to most readers, "dead heat" means "tie." The point I agree with -- and the one made more directly by Nate Silver -- was not to imply that the 5-point margin was outside the margin of error, but rather to object to the use of the phrase "statistical tie" to describe a difference that is not quite large enough to attain statistical significance. The phrase presumes we know the race is "a tie" when all we can say is that we lack the evidence, from this one poll, to be certain that a candidate is ahead.

Caution is always in order when it comes to interpreting small differences in a single poll result, but we have more than one poll to consider. Since May 1, we have logged 37 national poll releases (omitting daily tracking releases based on overlapping samples). Only one (from Gallup) showed a "tie" result (44% to 44%). The other 36 had Obama ahead by margins of 1 to 15 percentage points. That's evidence that "tie" is not the best way to describe the current preferences in the race for president.



I thought the statistical tie headline referred to Obama's 3-point lead when Nader and Barr are included, which is within the 3.5 percent MOE.



I don't know why it is wise to narrow down the choices to McCain vs. Obama, when we all know that Nader and Barr are officially running for president. Aren't they?



I'm confused by the NCPP release; in fact it seems wrong.

Their release instructs reporters to use margin of error as if it's a Bayesian Credibility Interval. But a survey's margin of error is a frequentist 95% confidence interval. That means the survey's margin of error is right 95% of the time. So for the CNN result, it does not mean McCain has a low probability of being ahead, just that the survey's confidence interval is right 95% of the time.



I believe many political observers frequently choose to ignore surveys (or resort to different methods) that include Barr and Nader because polls this early in the race are notoriously bad predictors for third party candidates. A nice rule of thumb that I hear bandied about is that a third party candidate will take in a general election no more than half of what they poll in June or July. Of course, if they stay strong up until the election then perhaps we'll be singing a different tune.



The headline didn't surprise me at all. The media are in the Republican corner and will continue to be.


Brad Hershbein:

As a graduate student in economics and teacher of econometrics (and statistics), it's somewhat frustrating to see people misinterpret confidence intervals. In the example from the highly valued reader, a 95% confidence interval for Obama is given as 49.5% to 56.5%, with the point estimate in the middle of 53%. This does NOT mean that the true level of support for Obama lies within the interval with 95% confidence, which is how most people tend to interpret it. Rather, it means that *if* the true value is 53%, repeating the poll many times (with the same population universe of respondents) would yield a support level within the confidence interval 95% of the time.

If you want to know what the probability is that McCain is ahead of Obama, given the poll results and accepting that they're unbiased, the math gets complicated. If you assume that McCain actually has 51% support, then the probability of his receiving 47% or less in a poll is only 1.1%. (I used Monte Carlo to get this number; let me know if you want details).
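A Monte Carlo check along these lines might look like the sketch below. It is not the commenter's actual code. The sample size is an assumption: the comment does not state one, and 784 is simply the simple-random-sample size at which a 95% margin of error equals 3.5 points for a 50/50 split. The function names and the trial count are likewise illustrative.

```python
import random


def simulate_poll_share(true_share, n, rng):
    """Draw one simple random sample of size n and return the observed share."""
    hits = sum(1 for _ in range(n) if rng.random() < true_share)
    return hits / n


def prob_share_at_most(true_share, threshold, n, trials, seed=0):
    """Estimate P(observed share <= threshold) by repeated simulated polls."""
    rng = random.Random(seed)
    count = sum(
        1
        for _ in range(trials)
        if simulate_poll_share(true_share, n, rng) <= threshold
    )
    return count / trials


# Assumption: n = 784, since 1.96 * sqrt(0.25 / 784) = 0.035.
N_ASSUMED = 784

# If McCain truly had 51%, how often would a poll show him at 47% or less?
estimate = prob_share_at_most(0.51, 0.47, N_ASSUMED, trials=2000)
print(round(estimate, 3))
```

With only 2,000 simulated polls the estimate is noisy, but under these assumptions it should land in the neighborhood of one percent.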



Simply put, a 5-point lead IS within a 3.5% margin of error.
A 3.5% margin of error means that McCain could be as high as 48.5% and Obama as low as 46.5%.

Put another way, a 3.5% margin of error allows a spread of up to 7.0% to be within the margin of error.



Brad is confusing hypothesis tests (where the type I error probability is computed conditional upon the parameter value) with a confidence interval. In classical statistics, of course, it is meaningless to discuss "the probability that McCain is ahead, given the poll results." Unless one is a Bayesian, McCain is either ahead or he isn't, and you can't assign a probability to this.

You also don't need to do a Monte Carlo simulation, since if the population proportion supporting McCain is .51, the probability of a SRS of size n having McCain support of .47 or less is (approximately) Phi(-.04/.0175) = .011 where Phi is the standard normal distribution function and the s.e. of .0175 is half the MOE of 3.5%. The relevance of this calculation (about the probability of events that didn't happen) is unclear.
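The normal-approximation figure above can be reproduced in a few lines. This is a minimal sketch that takes the comment's own inputs at face value, including its shortcut of treating the standard error as half the margin of error; the variable names are illustrative.

```python
from statistics import NormalDist

moe = 0.035
se = moe / 2  # the comment's approximation: MOE is about two standard errors

# z-score of an observed 47% if the true McCain share were 51%
z = (0.47 - 0.51) / se

# Phi(z): probability of a sample share of 47% or less under that assumption
p = NormalDist().cdf(z)
print(round(p, 3))  # 0.011
```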

One can, however, adopt a Bayesian perspective and assign, say, a uniform prior to the proportion supporting McCain. In this case, the posterior is approximately normal with mean .47 (the sample mean) and standard deviation of .0175. The probability that McCain is leading in this setup would then be .043.
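The Bayesian figure follows the same way. The sketch below assumes exactly what the comment states: an approximately normal posterior for McCain's share with mean .47 and standard deviation .0175. It also assumes "leading" means a share above .50, which holds only in a two-candidate race with no undecideds.

```python
from statistics import NormalDist

# Approximate posterior for McCain's share under a uniform prior,
# using the mean and standard deviation given in the comment.
posterior = NormalDist(mu=0.47, sigma=0.0175)

# Posterior probability that McCain's true share exceeds 50%
p_lead = 1 - posterior.cdf(0.50)
print(round(p_lead, 3))  # 0.043
```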

Of course, the "statistical dead heat" terminology is, as Mark points out, silly. Lack of statistical significance doesn't prove the null hypothesis.


Brad Hershbein:

anonymous is, of course, correct about Monte Carlo being unnecessary, as the limiting distribution is normal. (S)he is also correct that in classical statistics, McCain either is ahead or isn't; I regret a poor choice of words. However, my comment on the interpretation of confidence intervals still stands. Confidence intervals are a measure of an estimator's precision, not its accuracy.

The hypothetical example was meant to indicate that, if one accepts the poll as unbiased, it is "unlikely" that McCain is truly ahead. It was meant to be a quick and dirty classical approach that assumed a counterfactual null.

