Articles and Analysis



Topics: 2008, Barack Obama, Gallup, Hillary Clinton, Newsweek, NY Times, Pew Research Center, Sampling Error, Zogby

By now your favorite political blog has probably informed you of David Runciman's essay in the London Review of Books in which he reviews the Obama-Clinton race from a British perspective and includes this broadside against American polling:

Yet if the voting patterns have been so predictable, why have the polls been so volatile? One of the amazing things about the business of American politics is that its polling industry is so primitive. Each primary has been preceded by a few wildly varying polls, some picking up big movement for Clinton, some for Obama, each able to feed the narrative of a contest that could swing decisively at any moment. All of these polls come with warnings about their margins of error (usually +/–4 per cent), but often they have been so far outside their own margins as to make the phrase ridiculous. A day before the California primary in February, the Zogby organisation had Obama ahead by 6 per cent – he ended up losing by 9 per cent. In Ohio, the same firm put Obama ahead by 2 per cent just before the actual vote – this time he lost by 10 per cent. The sampling of national opinion is even worse. Before the Indiana primary, two national polls released at the same time claimed to track the fallout from the appearance of Obama’s former pastor Jeremiah Wright on the political stage. One, for the New York Times, had Obama up by 14 per cent, and enabled the Times to run a story saying that the candidate had been undamaged. The other, for USA Today, had Clinton up by 7 per cent, leading the paper to conclude that Obama was paying a heavy price.

The reason for the differences is not hard to find. American polling organisations tend to rely on relatively small samples (certainly judged by British standards) for their results, often somewhere between 500 and 700 likely voters, compared to the more usual 1000-2000-plus for British national polls. The recent New York Times poll that gave Obama a 12 per cent lead was based on interviews with just 283 people. For a country the size of the United States, this is the equivalent to stopping a few people at random in the street, or throwing darts at a board. Given that American political life is generally so cut-throat, you might think there was room for a polling organisation that sought a competitive advantage by using the sort of sample sizes that produce relatively accurate results. Why on earth does anyone pay for this rubbish?

The polling misfires of the 2008 primary season are certainly a fair target for criticism and debate, but Runciman's diagnosis of the problem is both misleading and flawed.

First, Runciman does not compare "apples to apples," as the British polling blogger Anthony Wells puts it:

American polls normally quote as their sample size the number of likely voters, it is typical to see a poll reported as being amongst 600 “likely voters”, with the number of “unlikely voters” screened out to reach that eventual figures not made clear. In contrast, British polling companies normally quote as their sample size the number of interviews they conducted, regardless of whether those people were filtered out of voting intention questions. So, voting intentions in a UK poll with a quoted sample size of 1000, may actually be based upon 700 or so “likely voters”.

To give a couple of examples, here’s ICM’s latest poll for the Guardian. In the bumpf at the top the sample size is given as 1,008. Scroll down to page 7 though and you’ll find the voting intention figures were based on only 755 people. Here’s Ipsos-MORI’s April poll - the quoted sample size is 1,059, but the number of people involved in calculating their topline voting intention once all the unlikelies have been filtered out was only 582.

Let's also consider a few recent U.S. national polls. This week's Pew Research survey sampled 1,505 adults, 1,242 registered voters and 618 Democratic and Democratic-leaning registered voters. The Gallup Daily tracking survey typically reports on more than 4,000 registered voters and more than 1,200 Democratic and Democratic-leaning "voters." Last week's Newsweek survey screened 1,205 registered voters from 1,399 adults, and in the process interviewed 608 "registered Democrats and Democratic leaners." Some pollsters use smaller samples, some bigger, but when it comes to national surveys of general election voters, American surveys are at least as large as, if not larger than, their British counterparts.

Runciman confuses things further by comparing national British surveys to U.S. polling in low-turnout, statewide primary elections. In 2004, 61% of eligible adults voted in the U.S. presidential election, but during the 2008 primary season turnout -- while higher than usual -- typically ranged from 25% to 35% (including both Republican and Democratic primaries). "The challenge for U.S. pollsters," as Wells puts it,

is filtering out all those people who won’t actually take part. Getting lots of people per se can be a bad thing if those people won’t actually vote, the aim is getting the right people. Considering the rather shaky record of most British pollsters in some low turnout elections like by-elections, Scottish elections, the London mayoralty and so on, we really aren’t the experts on that front.

Setting aside Runciman's fallacious "our polls are bigger than yours" theme, the biggest problem with his overall argument is the assumption that larger samples would solve all problems. If only that were true. Runciman notices that actual election returns in the U.S. primaries have often "been so far outside their own margins as to make the phrase ridiculous." That's right. If the poll has a statistical bias (in sampling or in the way it selects likely voters), doubling or tripling the sample size will not solve the problem. Remember: the "margin of error" only covers the random variation that results from drawing a sample rather than trying to call all voters. It tells us nothing about other potential survey errors.
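A toy simulation makes the point. The numbers here are invented for illustration -- a hypothetical candidate with 52% true support and a likely-voter screen that systematically undercounts that support by five points -- not figures from any actual poll:

```python
import random

random.seed(42)

# Hypothetical electorate: 52% favor candidate A.
TRUE_SUPPORT = 0.52

def biased_poll(n, bias=-0.05):
    """Simulate a poll whose likely-voter screen undersamples
    candidate A's supporters by a fixed 5 points.
    The bias is systematic error, not random noise."""
    skewed = TRUE_SUPPORT + bias
    hits = sum(random.random() < skewed for _ in range(n))
    return hits / n

for n in (600, 2400, 9600):
    estimate = biased_poll(n)
    moe = 0.98 / n ** 0.5  # approximate 95% maximum margin of error
    print(f"n={n:5d}  estimate={estimate:.3f}  MoE=+/-{moe:.3f}")
```

However large n gets, the estimates cluster around 47%, five points below the true 52% -- the margin of error shrinks, but the poll stays wrong by the same amount, and the true value falls ever further outside the reported interval.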

Here is one obvious example. Take a look at the final round of polls before the Democratic primary in Pennsylvania. Which pollster had the largest sample size? The winner on that score, by far, was Public Policy Polling (PPP) with 2,338 interviews of "likely voters" conducted Sunday and Monday before the election. And which pollster had the biggest error? The same pollster, the one with the biggest sample (and this example may be unfair to PPP -- they had better luck elsewhere this year).

Rubbish indeed.

[Typo corrected].



While some aspects of your argument are valid, others are quite ridiculous.

In response to the charge that American polling sample sizes are too small, you bring up a number of polls. Let's analyze some of the math here.

I'm using 60,000,000 as a rough estimate for the population of the United Kingdom and 300,000,000 for the United States.

From the quote by Anthony Wells, you cite two polls from the UK, one with 755 respondents, and another with 582 respondents. 755 respondents equals .00125% of that population, 582 equals .00097%.

In response, you cite several American polls with 1505 and 1205 respondents (and another nebulously defined as "more than 4000"). 1505 respondents equals .0005% of the population, and 1205 equals .0004% of the population. This is less than half of the representation of the smaller UK poll, percentage-wise. Even if you use the 4000 figure, you get .0013%, which is just slightly higher than the high British number.

Shouldn't we be making the case here that something more akin to 4000 counted respondents might be more appropriate for a national poll than 1200, rather than just attacking Mr. Runciman's point of view mercilessly? To me, the point he is trying to make is that since the US is about five times larger than the UK, population-wise, our sample sizes should be appropriately increased. Is this not a reasonable idea worth testing and evaluating?

It might be worth analyzing the correlation of sample sizes as they represent percentages of the population in question to the accuracy of the poll. This would be more informative, surely.
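The arithmetic, for anyone who wants to check it (using the same rough population figures of 60 million and 300 million assumed above):

```python
# Rough population estimates assumed in this comment thread.
populations = {"UK": 60_000_000, "US": 300_000_000}

# Quoted sample sizes: UK likely-voter counts vs. US poll samples.
samples = {"UK": [755, 582], "US": [1505, 1205, 4000]}

for country, sizes in samples.items():
    for n in sizes:
        pct = 100 * n / populations[country]
        print(f"{country} n={n:4d}: {pct:.5f}% of population")
```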


richard pollara:

Mark: I think you are splitting hairs. His reasoning may be wrong but his answer is correct. Democratic primary polls have been nonpredictive. I doubt they have any constructive use in an environment where there are unquantifiable variables such as voter turnout, and where they don't have hooks such as party ID on which to hang 90% of the vote. Zogby's miss in California was the most egregious example of "polling gone wild". My recollection is that Zogby called it for Obama by 13, not the 6 that Runciman reports. A 22-point error! Yikes! Shouldn't such a dismal performance have been a wake-up call to everyone that there was something fundamentally wrong with the science? Instead the polls rolled on, primary after primary, with stunningly bad results. If all this were just a parlor game, who would care? But it is not. Candidates are using polls to tout their electability and the media cite them as though they were the holy grail. I will never forget Tom Brokaw coming on MSNBC the afternoon of Super Tuesday and reporting Zogby as though he were reading Hillary Clinton's tombstone. Is polling science or witchcraft? I tend to think the latter....



The size of the population has essentially no effect on the sampling error provided that the population is much larger than the sample size. This is the case for any national poll of the U.S. or Britain. A 600 person poll has a 4% margin of error regardless of whether the population is 300,000 or 300,000,000. Runciman displays some fundamental misunderstandings of the statistics involved here.
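A quick check with the standard simple-random-sampling formula makes the point. Population size enters only through the finite population correction, which is negligible whenever the population dwarfs the sample (worst case p = 0.5 assumed here):

```python
import math

def moe(n, population=None, p=0.5, z=1.96):
    """95% margin of error for a simple random sample of size n.
    If a finite population size is given, apply the finite
    population correction (FPC); otherwise treat it as infinite."""
    se = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return se

n = 600
for pop in (300_000, 300_000_000):
    print(f"N={pop:>11,}  MoE=+/-{moe(n, pop):.4f}")
print(f"N=   infinite  MoE=+/-{moe(n):.4f}")
```

All three come out at roughly +/-4% -- the population could be a mid-sized city or the entire United States and the answer barely changes past the fourth decimal place.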



I'd also like to point out another factor that limits the accuracy that can reasonably be expected of election polls. Specifically, unlike polls, actual vote totals all end up with "undecided" at 0% (not to be confused with "uncommitted," which is in fact a decision). The proportion of undecided voters in a poll depends on several factors, like how hard respondents are pushed to express a preference, but prior to the election, there is always some portion of the electorate that hasn't yet decided who they will vote for. The fact that many people make up their minds after being polled obviously tends to increase the amount by which polls differ from the final result.


richard pollara:

Alan: I think the undecided question raises the larger issue: are polls of any value? Polls are supposed to predict an outcome. If they can't, why bother? If 20% of the electorate is undecided and pollsters have no methodology for figuring out how those people will vote, what good is the poll? Primary polls seem to have three areas they can't quantify: turnout, likely voters and late deciders. Make a minor change in any of your assumptions about those three things and the results swing wildly.

My guess is that the future of polling will include the types of statistical modeling that Pablano is doing combined with intra-group surveys. The science that is out there right now doesn't seem to work. Perhaps it is time to try a different approach.



Richard: "Polls are supposed to predict an outcome."

That is incorrect. Polls are supposed to predict a range of outcomes that would obtain if the election were held on the day the question was asked. Hence, the hackneyed but nonetheless true description of a poll as a "snapshot" of the electorate.



Richard: "Polls are supposed to predict an outcome."

along: "Polls are supposed to predict a range of outcomes that would obtain if the election were held on the day the question was asked."

Actually, polls are one part of opinion research, nothing more and nothing less. They can try to measure public opinion during the field period. That we use them -- abuse them -- for prediction purposes is our fault. If we're responsible we use other sources as well: regression models based on actual voting, socio-economic models, prediction markets, Delphi research...

Runciman is wrong, of course, about the sample size. A larger sample does provide tighter confidence intervals, but most people forget that every sample is idiosyncratic. That's the probability of the improbable: it is highly improbable that any sample represents the "real population" exactly; in most cases there is some deviation.

Basically a sample must be small compared to the total population. The meaning of "small" in this context is: it doesn't make a difference whether the sample is drawn with or without replacement.

A sample is large enough when each cell in the multidimensional matrix of groups and questions a poll distinguishes contains so many respondents that random variation only slightly influences the result.

As the sample size grows, the maximum margin of error (that's what we refer to as the "MoE") drops -- essentially it is 1 divided by the square root of the number of people in the sample -- so random deviations from the true value can start to look significant. Thus we might be led to believe an idiosyncratic property of the sample is an actual property of the population.

It is less probable that we're misled by sampling error when the sample is huge, but when it does happen, the misleading result looks all the more authoritative. The same thing happens with biases: biases in the sample, in the likely-voter selection, or in the weighting procedures are blown up into seemingly significant properties.
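To put numbers on the 1-over-square-root rule of thumb (the exact worst-case figure at 95% confidence is 0.98 divided by the square root of n, so the shorthand overstates it only slightly):

```python
import math

# Compare the rule of thumb (1/sqrt(n)) with the exact
# worst-case 95% margin of error, 1.96 * sqrt(0.25 / n).
for n in (500, 600, 1000, 2000, 4000):
    approx = 1 / math.sqrt(n)
    exact = 1.96 * math.sqrt(0.25 / n)
    print(f"n={n:5d}  approx=+/-{approx:.3f}  exact=+/-{exact:.3f}")
```

For the 500-700 person samples Runciman complains about, both versions land right around the +/-4 per cent the pollsters report; quadrupling the sample only halves that figure.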


richard pollara:

Along: Kodak is in the snapshot business, not Gallup, Zogby or ARG. The pollsters are the soothsayers of our day. They don't get the breathless lead on the evening news -- "The latest Zogby Poll shows..." -- by promoting their polls as a caveat-laden snapshot. They are not only predicting the future, they are wrapping it up in a bundle of scientific mumbo jumbo that would make an alchemist blush. If this election cycle proved anything (and it did so week after week) it is that there is a fundamental flaw in the underlying science of polling. The pollsters have posited that they can turn iron into gold but the proof just isn't there.

I do know one thing: if a whole bunch of polls headlined in the New York Times "Beware the Ides of March," I'd pick that day to attempt my first parachute jump.



One of the reasons the NY Times polls often show Obama way ahead of McCain stems, in my opinion, from the fact that the NY Times hates John McCain. Let's not forget their article insinuating McCain had sex with a female lobbyist, or their editorial board's and opinion page's crusade against the Arizona Senator.

I think there's a good chance the NY Times is making stuff up in order to hurt McCain.


Do the British have regional primaries? Gotta compare apples to apples, folks. A Pennsylvania primary is not a national UK election.

