

Survey Practice on the "Convergence Mystery"

Topics: Allocating Undecided, Convergence Mystery, Divergent Polls, Survey Practice, Undecided

Survey Practice, AAPOR's online publication, has a new issue out that includes a follow-up to David Moore's provocative piece on what he dubbed the "convergence mystery." Moore observed that the national polls often produced "contradictory estimates and trends" -- and more specifically, a greater variance of results -- that significantly converged over the last few days of the campaign. I posted a follow-up here that showed a similar pattern in state level polling and posed a theory that might explain some of the convergence, and Moore responded with more data and a different theory.   

In their new issue, Survey Practice publishes responses from five survey researchers who deserve the title "expert": Paul J. Lavrakas, Michael Traugott, Micheline Blum, Cliff Zukin, and Don Dresser. David Moore follows up with a new article of his own that begins with this helpful summary:

The two major types of explanations offered by our experts for the convergence of polls right before the election are based on 1) changes in the pollsters’ methodology and 2) changes in the certainty of the vote choice.

The first explanation suggests that in the final weeks of the campaign, many pollsters adjust their likely voter models (mentioned by Lavrakas and Blum) or they increase their sample sizes (Dresser).

Lavrakas argues that the adjustment of the voter models, even when done explicitly to make their outcomes more consistent with other polls, should be seen as a positive action rather than as a “suspicious” activity. However viewed, such last minute changes could account for some of the convergence.

Dresser mentions the tendency of many pollsters to substantially increase their sample sizes for their final pre-election polls, to ensure as small a margin of sampling error as they can reasonably afford. He specifically mentions Pew, Harris and ABC/Washington Post.

Blum suggests that the outlier polls during the campaign were either a) less likely to poll in the final three days, or b) more likely to allocate their undecided voters, thus bringing them closer to the mean.

Zukin speculates that the convergence has less to do with pollsters’ methods and more to do with the “phenomenon we are measuring.” Specifically, as voters become more certain about their choices, polls will tend to converge toward each other. That sentiment is also found in the explanations by Lavrakas and Blum.

Moore goes on to offer his own thoughts and asks why polls, "conducted at the same time, using virtually the same wording - [are] supposedly more accurate (reliable) when opinion is more crystallized?" For those intrigued by the "mystery," it is all worth reading in full.

Among all of these efforts to explain, I was most intrigued by Blum's analysis, which was the only one of the five to scrutinize the poll data directly. Among the national polls, at least, much of the variance in mid-October can be explained by a small handful of "outlier" results two weeks out from the election:

Of the six organizations with outlier polls, three reported margins larger than 11 points, and all three consistently showed larger margins in their polls. In the final three days, however, one of these organizations allocated the undecided, giving more to McCain, one did not release a poll, and one had a margin of exactly 11 points on its final poll. Of the three organizations releasing four polls with margins smaller than 5 points, one organization (with 2 “outlier” polls) allocated the undecided in the final three days, one did not release a poll in the final three days, and one “converged.” So, of the 6 organizations, 2 organizations (accounting for 3 “outlier” polls) allocated undecided in the final three days in the direction of the previously underestimated candidate, 2 did not release polls in last 3 days, and only 2 “converged.” If we remove the 7 “outlier” polls from the 29 released in the week of 10/21-27, the variance is reduced to only 2.9 points.

Her conclusion:

Basically then, both of the explanations examined, the allocation of the undecided by seven organizations in the final three days and the absence or favorable allocation of a few “outlier” organizations, appear to be major contributors to the “convergence” seen. Apportioning the undecided in the favorable direction and the absence of previous outliers virtually guarantees less variance and the appearance of “convergence.” So, perhaps, rather than convergence, what we saw was that much of the earlier variance was due to a few outliers and that the final three days benefited from their absence or favorable apportionment of their undecided vote.

Blum's observation prompted me to take a closer look at the national polls that allocated the undecided. I find seven projections by six organizations (the two pollsters for the GWU/Battleground poll, Republican Ed Goeas and Democrat Celinda Lake, once again produced competing projections with different allocations). The process of allocating undecided voters should not, by itself, raise any suspicion. Of the seven allocations, three left the initial margin unchanged or moved it only within rounding of the poll averages, two moved the margin farther from the poll averages as of Monday, 11/3, and one moved it closer. However, that one move -- by the IBD/TIPP poll -- was important in helping produce the reduction in variance that Blum describes.
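The mechanics Blum describes can be illustrated with a small sketch. The margins below are hypothetical, not the actual 29 polls from the week of 10/21-27; they serve only to show how a few outlier results inflate the variance, and how dropping them produces most of the apparent "convergence" on its own:

```python
# Hypothetical final-week margins in points (NOT the actual 2008 polls).
# Removing a few outliers sharply reduces the variance of what remains.
from statistics import pvariance

margins = [6, 7, 7, 8, 9, 13, 14, 15]      # three outlier margins above 11
trimmed = [m for m in margins if m <= 11]  # the same set with outliers removed

print(round(pvariance(margins), 1))  # → 11.1 (all polls)
print(round(pvariance(trimmed), 1))  # → 1.0  (outliers removed)
```

Whether population or sample variance is the right summary depends on how one treats the set of released polls; the qualitative point survives either choice.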



Raghavan Mayur:

A typo in the last para -- should be IBD/TIPP Poll.


Raghavan Mayur:

As per the exit polls, the share of the electorate deciding in the last weekend was 9% in 2004 and 7% in 2008. We typically get similar shares (close to 10%) in the IBD/TIPP poll that I run. I don’t question the 10% we see in our polls, because it’s supported by the exit polls.

We use a Hierarchical Heuristic algorithm developed from the data. In both 2004 and 2008, the TIPP data show that using such a method improves accuracy – in 2004, I allocated the undecided 2 to 1 in Kerry’s favor, which reduced our final Bush margin from 3.3 points to 2.1 points – much closer to the actual 2.4.

In 2008, the algorithm broke the undecided in my data in a way that widened the Obama-McCain margin by 2.1 points, bringing our result to the exact actual margin of 7.2.
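The arithmetic of an unequal break can be sketched minimally as below. The undecided share used (3.6 points) is a hypothetical value chosen only so that a 2-to-1 break for Kerry reproduces the reported 3.3-to-2.1 shift; it is not TIPP's actual figure, and the real algorithm works from individual respondent characteristics rather than a flat split:

```python
def allocate(margin, undecided, leader_frac):
    """New margin (leader minus trailer, in points) after splitting the
    undecided pool: the leader gets `leader_frac` of it, the trailer the rest."""
    return margin + undecided * (leader_frac - (1 - leader_frac))

# Hypothetical 2004-style example: Bush +3.3, undecideds breaking 2-to-1
# for Kerry (so the leader, Bush, gets 1/3 of them).  The 3.6-point
# undecided share is an assumption chosen to reproduce 3.3 -> 2.1.
print(round(allocate(3.3, 3.6, 1/3), 1))  # → 2.1
```

An even split (`leader_frac=0.5`) leaves the margin unchanged, which is why only unequal breaks move a poll relative to its peers.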

Based on my data, at least at the national level for 2004 and 2008, here’s the moral:

1. Close to ten percent of the electorate makes its decision in the final weekend (Saturday, Sunday, and Monday).
2. They don’t break evenly. Democrats have had a clear advantage among them, at least in the past three races.

Anyone interested in pursuing this topic further is welcome to email me directly: mayur@technometrica.com.


Raghavan Mayur:

Also, I want to add that a method that examines the individual respondent characteristics of the undecided voters and allocates them accordingly is likely to be much superior to an equal or proportional allocation technique.


Election models consist of recorded data, assumptions (parameters), and calculations. Assuming the mathematical logic in the 2008 Election Calculator (EC) is correct, the assumptions must be realistic in order to determine the True Vote. The EC base-case assumptions are the best estimates derived from the following data sources: the 2008 National Exit Poll and the 2004 unadjusted aggregate state exit polls, the 2004 and 2008 official recorded vote, voter mortality tables, historical returning-voter turnout percentages, and Census total votes cast. Due to the margin of error in the data and assumptions, a thorough examination of how changes in the assumptions affect the vote share is necessary in order to have confidence in the model.

The EC contains a comprehensive set of sensitivity analysis tables. Each table is a 5x5 matrix of a two-variable range of assumptions. The combinations display the effects on vote share and margin. It is very likely that the True Vote is located in one of the 25 matrix cells; the best estimate is that it is in the central cell. The range of plausible True vote shares is reduced from 25 to 9 by focusing on the combinations that lie within the margin of error.
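A two-variable sensitivity table of the kind described can be sketched generically. The vote-share function and the assumption ranges below are placeholders, not the EC's actual model or inputs; the point is only the 5x5 structure with the best estimate in the central cell:

```python
def blended_share(returning_frac, new_voter_share, returning_share=0.50):
    """Placeholder model: a candidate's overall share when `returning_frac`
    of the electorate are returning voters (among whom the candidate gets
    `returning_share`) and the remainder are new voters."""
    return returning_frac * returning_share + (1 - returning_frac) * new_voter_share

rows = [0.70, 0.75, 0.80, 0.85, 0.90]  # assumed returning-voter fractions
cols = [0.60, 0.63, 0.66, 0.69, 0.72]  # assumed new-voter shares

# 5x5 grid of outcomes, one cell per combination of the two assumptions.
grid = [[round(blended_share(r, c), 3) for c in cols] for r in rows]
best_estimate = grid[2][2]             # the central cell
print(best_estimate)  # → 0.532
```

Restricting attention to the cells that lie within the margin of error then narrows the 25 combinations to the 9 nearest the center, as the comment describes.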

The Final 2008 National Exit Poll vote shares of returning 2004 and new voters are used as the base case. Since these shares were used to match the recorded vote, it makes perfect sense to use them for the base case. The model calculates the returning voter mix based on plausible, documented assumptions for 2004 uncounted votes, voter mortality and 2004 voter turnout in 2008. Along with the NEP vote shares, the assumptions comprise the base case in the sensitivity analysis.

View the sensitivity analysis of the effects of these variables on Obama's vote for two basic scenarios: the 2004 election was fraud-free (the recorded vote was the True Vote) and 2004 was fraudulent (the unadjusted exit poll aggregate was the True Vote).


