Presidential Job Approval, The Age Gap and the 2010 Midterm Elections

As many others have noted, the President's job approval scores are stronger among the young (18-29) than they are among older voters.

For a quick visual synopsis of this, Gallup has a nice breakdown here.

Looking just at the Nov. 30-Dec. 6 time series, you can see that the President has a 13 point gap between approval from younger voters (59% among 18-29 year olds) and older voters (46% among 65+). This gap appears to have been even more pronounced in prior weeks.

Although the gap itself is interesting, the real issue is how this gap impacts the 2010 midterm elections.


Why? Because midterm electorates tend to be older than Presidential-year electorates. How much older? If the exit polls from 2006 and 2008 are a guide, the 2010 electorate will shift by about 10 percentage points toward the 45+ age group.

Here it is worth taking a look at the exit polls from 2006 and 2008.

As you can see, in 2008 (a Presidential year) 53% of the electorate was 45 or older, but in 2006 (a midterm) 63% was 45 or older. Call it the Midterm Maturity Shift.

The problem for Democratic congressional candidates is obvious. Given lower Presidential approval scores among older voters, this midterm maturity shift could shave a few points off their base voter support.

With this in mind, I would expect Democratic campaign managers to do several things.

First, they will pump up the GOTV efforts among voters under 30. This is an obvious strategy, but it has been notoriously difficult to turn out younger voters in non-Presidential elections. The old joke has been "What do you call a campaign that needs to turn out the youth voter? Answer: A loser."

The other strategy for Democrats is to work on the senior vote. Traditionally this has been done via mountains of early direct mail; many an incumbent has been saved by senior mail. But the task may be more difficult this year given President Obama's sagging approval numbers among older voters. I suspect that the most effective strategy for Democrats among older voters will be to disqualify their Republican opponents on seniors' issues by digging up past statements about Social Security and the like. The oppo research guys will be going full tilt.

On the Republican side I expect campaign managers to do everything they can to exacerbate this job approval age gap. Viewed through the lens of 2010, this goes a long way in explaining the Senate Republicans' legislative strategy in the past few days.

The Vanishing Young Republicans

Yesterday's departure of Sen. Arlen Specter from the Republican Party re-opened the debate over the ideological direction of the GOP. Did the GOP move away from Specter, or was it Specter who left the GOP? And where do the American people fall?

My focus on this site over the last few weeks has been on young voters. And most of the news I have had for the Republican Party has been bad news, presenting a picture of a young cohort less convinced of the virtues of limited government, more supportive of gay marriage, and more inclusive of minority groups less prone to voting Republican.

In all of this, the overall ideological makeup of young voters has not yet been examined. Are young voters more liberal than older voters? Are they more likely to identify as Democrats? Recently on The View, Meghan McCain declared that 81% of young voters identified as Democrats. Though I appreciate Ms. McCain's efforts to draw attention to the GOP's troubles with young voters, the number is greatly exaggerated (and I would argue that exaggerating the problem does the cause no favors).

[Figure: party identification among voters 18-29 vs. all voters, 1972-2008 (Pollster Piece Figures.003.png)]

But the actual numbers are not much more pleasant for the GOP. According to the EMR exit polls at the presidential level, in 2008, 45% of voters 18-29 identified as Democrats while only 27% identified as Republicans. The gap between Democratic and Republican identification has not been so wide since 1976 when only 19% of voters 18-29 identified as Republican. Yet in 1976, young voters did not flee the GOP for the Democratic party. The above figure shows that voters left the Republican Party and became independents that year; Democrats actually saw a 7 point dip among 18-29 year olds in 1976 as well.

The 2008 shift is most concerning for the Republican Party in two ways. First, it shows the highest proportion of young voters identifying as Democrats since 1972. Second, it shows the largest gap between 18-29 year old party ID and overall party ID in that same time frame. Consider 1976, when the post-Watergate voters abandoned the GOP. In that year, Democrats enjoyed a 16 point advantage over Republicans overall. The gap among 18-29 year olds was 21 points - large to be sure, but not so different from voters overall.

Yet in 2008, there was a more marked difference between young voters and the overall electorate. While Democrats held a 7 point advantage over Republicans in terms of party identification overall, that advantage jumps to 18 points among voters 18-29.

However, in terms of ideology, while young voters are quite different from voters overall, the major change did not occur this year or even this decade. In 2008, "Liberal" made a one point gain among young voters, "conservative" a one point loss. The change in young voters didn't look terribly different from the change (or lack thereof) overall, a surprising finding given the major shift in partisan identification.

[Figure: ideological self-identification among voters 18-29 vs. all voters (Pollster Piece Figures.004.png)]

What is interesting is to look at 1992, when "liberal" overtook "conservative" among young voters. Conservatism dropped five points overall that year, but eight points among young voters; meanwhile, "liberal" picked up three points overall and seven points among young voters. Ever since 1992 re-calibrated the ideological makeup of the young electorate, the "liberal" label has outpaced "conservative".

Even odder, take a look back at the first chart of party identification. In 1992, the year the young electorate began identifying as "liberal" more often than "conservative", the partisan makeup of young voters was actually more Republican than that of voters overall. So is ideology simply not as linked to partisan behavior? Or did the ideological shift of the early 1990s simply wait to manifest itself in 2008 as a party identification shift, owing to a different ideological alignment of the parties themselves? The Republican Party of the 1990s and early 2000s was able to attract young voters despite the fact that young voters were more likely to be liberal than conservative. Even as recently as 2004, Democrats held only a 2 point advantage among young voters.

Between 2004 and 2008, young voters' more liberal ideology started to match up with their partisan identification. A center-left young electorate (emphasis on center) was no longer evenly divided between the parties. As for reasons why, there are countless theories that have been offered to explain the shift. Some say young voters felt out of touch with a GOP that had nominated an older candidate (indeed, look at 1996 when the Republican Party ran the older Bob Dole against Bill Clinton). Some say the Republican Party moved to the right and became an unacceptable option for young center or center-left voters. Some may point to Obama himself as a large driver of young voters affiliating with the Democratic Party.

In order to evaluate the claim that young voters left the Republican Party because of the allure of the Obama candidacy, it is helpful to look at the 2006 election and a handful of midterms preceding it. If the Obama candidacy itself was driving young voters to become Democrats, we would expect to see young voter party identification that was similar to overall party identification, or at least behavior that makes sense in the context of the previous election or two. Yet while in 1998 and 2002 roughly equivalent numbers of young Republicans and young Democrats showed up at the polls, in 2006 there was a massive shift toward the Democrats, ending in a twelve-point Democratic advantage in party identification [in the electorate overall, that advantage wound up being two points, a far smaller gap].

[Figure: party identification among voters 18-29 in midterm elections (Pollster Piece Figures.005.png)]

As it turns out, young voters began abandoning the Republican Party long before Barack Obama was even a serious contender for the presidency. Those pinning the Republican Party's poor fortunes among young voters on the Obama candidacy miss the source of the problem and certainly underestimate its severity.

I've been troubled in recent months when discussing the issue of young voters with some fellow Republicans. There seems to be a sort of conventional wisdom that we should expect young voters to trend liberal and Democratic, that the behavior of young voters in 2008 is not serious cause for concern. This stems from a belief in partisanship as a life-cycle factor: voters start out liberal and Democratic and wind up older, conservative, and Republican. But the data paint a very different picture. Take the graph of partisan identification, for instance: over the last few decades, young voters have not identified with the Democratic party in substantially higher numbers than voters overall. Even conservatism had its moment among young voters in the 1980s. Yet with the end of the Reagan presidency, young voters shifted toward liberalism. This ideological shift did not play out into actual partisan identification in a meaningful way until 2006 and 2008.

Another bit of conventional wisdom I hear from my fellow Republicans about the youth vote is that they need to vote Democratic twice before they are "locked in for life", supporting the notion that there is still time to turn the tide among this generation. Unfortunately, given that the shift began in 2006 and not 2008, for many voters the GOP may simply be too late. For the rest, if the Republican Party does not take immediate action to repair its brand, this generation may exhibit similarly low levels of Republican identification for years to come.

The Demographics of the North Carolina Polls

Time for another round-up of available poll demographics, this time from North Carolina. The most important variable in this state is the African American percentage of likely Democratic primary voters. The most recent polls -- at least among those that have disclosed their demographics -- have converged around a black percentage of 32-33%. Needless to say, given the near monolithic support that African Americans have given Barack Obama, that percentage will ultimately be critical to his share of the vote on Tuesday.

The following table shows demographic composition statistics for those pollsters that have released them. Click on the table to display a larger version that also includes the vote preference results for each poll.


The table excludes the pollsters that have, as of yet, not publicly released demographic information for their North Carolina surveys: Mason-Dixon, Rasmussen Reports, and LA Times/Bloomberg (special thanks to readers Paul and jac13 for sharing the demographic profile data that Zogby shares with paid subscribers).

As in previous states, we see considerable variation in the kinds of voters selected as "likely primary voters." Easily the most divergent likely voter sample on the list is the one from the Civitas Institute from early April, with a composition of just 28% African American and 17% under the age of 45. However, even if we set that survey aside, we still see considerable variation: from 51% to 58% female, from 39% to 55% age 18-to-44 and from 25% to 37% African American (and those last extremes come from a single pollster -- more below).

A quick review from my post on the demographics of the Pennsylvania surveys:

It is important to remember that pollsters come to these composition statistics through different paths. Some interview samples of adults, weight those demographically to match census estimates of Pennsylvania's adults, then select "likely voters" and let their demographics fall where they may. Others will weight their "likely voter" samples directly to pre-determined demographic targets. Some pollsters will not set weights or quotas for demographics, but will set such weights or quotas for geographic regions (based on past turnout and their assumptions about what might be different this time).

With that in mind, note two very striking changes from two pollsters that set pre-determined demographic targets, Public Policy Polling (PPP) and InsiderAdvantage:

  • The first three surveys released in April by PPP had an African American composition of 36% or 37%. Their most recent survey, fielded last Sunday and Monday evenings, had a black composition of just 33%.
  • The gyrations in the weighting by InsiderAdvantage are even more dizzying. Their first North Carolina survey in late March was 37% African American. Their next two surveys in April were only 25% African American, and their most recent poll last week bumped the black percentage back up to 33%. Notice that none of their percentages for women, 18-29-year-olds, 18-44-year-olds or those 65+ changed by even a single point, despite a 12-point swing in the black percentage.

Both pollsters put out written summaries of their results, but neither made any reference (that I could find) explaining or justifying their changing assumptions about the racial composition of the North Carolina electorate. [Update: On their final poll, PPP upped the black share to 35%, but explained their rationale]. By the way, we know that these two pollsters set predetermined demographic targets, because both have confirmed as much to me in previous communications (here for InsiderAdvantage and here for PPP).

The change in the PPP poll is important -- they should have noted it -- but relatively modest compared to the astonishingly large, significant and unexplained shifts in the African American composition in the InsiderAdvantage polls. InsiderAdvantage's Matt Towery likes to brag of his "significant experience" as a pollster, but after a number of curious episodes over the last few months, it is getting very hard to take those claims seriously.

It's also worth pointing out the relative stability in the racial composition of the SurveyUSA results, given that they do not force their samples to a pre-determined demographic profile (details on their procedures here). The percentage of African Americans in their four surveys since March has remained relatively stable, falling within the range of 30% to 33%.

Finally, one caution about the percentage reported as "unaffiliated" (having no party affiliation). Only PPP includes the full text of their party question, and it is possible other pollsters are asking about party identification (whether respondents "consider themselves" as partisans) rather than party registration.

Update: Almost forgot. Fivethirtyeight's Poblano posted a handy spreadsheet that can help you see just how much small changes in the racial composition of the North Carolina electorate can affect the potential margin between the candidates. It's well worth the click.
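As a back-of-the-envelope illustration of that sensitivity, here is a two-candidate sketch in Python. The group-level support rates are hypothetical round numbers chosen for illustration, not figures from any of the polls above:

```python
def obama_margin(black_pct, obama_black=90.0, obama_nonblack=40.0):
    """Obama-minus-Clinton margin (in points) implied by the electorate's
    African American share. Group support levels are hypothetical."""
    obama = (black_pct * obama_black + (100 - black_pct) * obama_nonblack) / 100
    return 2 * obama - 100  # assumes a two-candidate race

# Under these assumptions, vary the black share across the pollsters' range:
for pct in (28, 32, 36):
    print(f"{pct}% African American -> Obama by {obama_margin(pct):+.0f}")
```

Under these made-up support levels, each percentage point of African American composition moves the margin by a full point, which is why the 25%-to-37% spread among the pollsters matters so much.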

Update II: In posting this last night, I neglected to point out that North Carolina has been releasing reports on the demographics of early voters. As North Carolina is one of nine southern states still required by the Voting Rights Act of 1965 to track voter registration by race, racial tallies among early voters are also available. The demographic composition of early voters has been analyzed by Brian Schaffner, DailyKos diarist dean4ever and noted in comments by many of our readers over the weekend.

Overnight, GMU Professor Michael McDonald, whose academic focus is voter turnout, posted the following comment:

North Carolina is an exceptional state in that it provides near real-time updates of its voter registration file. Indeed, you can download the entire file of absentee and early in-person voters directly from the state's ftp site.

North Carolina is also an interesting state because race and gender are recorded on the voter file (birthdate appears to be suppressed in the absentee file). When I crunch the numbers, out of the 397,850 persons who are listed as returning a Democratic Party ballot as of 5/03:

39.9% are African American
60.8% are women

Note, a small percentage (less than 1%) of records have missing data.

Will these percentages hold for Tuesday? That is hard to say, mostly because people who study early voting (myself included) don't know much about the characteristics of early primary voters. A further confounding factor is that one-stop registration and voting is permitted for in-person early voters only, not for Election Day voters. Providing few further clues, African-Americans are only slightly more likely to vote early in-person, 40.6%, and women slightly less likely, 60.7%.

The fact that nearly 400,000 early votes have been cast so far is remarkable given past primary turnout in North Carolina. The state held a caucus in 2004 (due to a redistricting battle that delayed the primary), but 544,922 Democrats voted in the largely uncontested primary in May 2000, and 691,875 voted in May 1992 (statistics I gathered for a column noting that pollster PPP has been sampling from a total universe of 874,222). The record was 961,000 in 1984, according to the Charlotte Observer, which cites "long time N.C. political observers" guessing that "as many as 1.5 million" may vote this year. So this early vote will be a significant portion of the total votes cast, but as McDonald points out, no one knows exactly how big.

It is also worth pointing out that the Obama campaign has made early voting drives a focus of their field organizing, so it is certainly possible that the ranks of early voters are disproportionately swollen with Obama voters. Last week's poll from SurveyUSA showed Obama leading by 18 points (57% to 39%) among early voters, but that subgroup was just 2% of their total sample. Thus, one key result to watch in the final poll releases today -- among those far-sighted enough to track and report it -- will be the size and preference of the early voters.

Vacation Effect?

I took my laptop with me on vacation, and in catching up on the news, I came across one of those impossible-not-to-blog items in NBC's First Read:

NBC/WSJ pollster Peter Hart (D) tells First Read that the revision of the primary calendar -- moving Iowa forward to the first few days in January -- is really the most important political event that has happened in the past few months...Perhaps most significant of all is that no one will know who's up and who's down right before Iowa. No self-respecting polling company, he says, does polling between the 20th and 25th of December. So we very well might have no idea how Iowa will break until after the results are in.

For what it's worth, I took a close look at whether polls suffer from a "pre-Christmas effect" back in December 2005, and found less evidence for it than I had expected.

One thing seems clear though: "Self-respecting" or not, some pollsters will certainly try to measure the ups and downs of the Iowa Caucus campaign, and we will be there to track and explain. So much for another vacation in late December!

And speaking of vacation...In the rush to get out of town last week, I neglected to post a note that I would be taking some much needed R&R. Sorry about that. I'll be back next week. See you Monday.

Michael McDonald on the CPS 2006 Turnout Data

George Mason University Professor Michael McDonald, whose voter turnout web site is one of the most useful election data resources on the web, sends along this note:

The 2006 Current Population Survey (CPS) Voting and Registration Supplement, a primary source of data for many voting studies because of its large state sample sizes, is now available for download. To access these data, use the Census Bureau's Data Ferret program.

Preliminary analysis

The CPS reports that 47.8% (+/- 0.4%, remember, the CPS sample size of over 100,000 is very large and that margin of error varies with sub-sample sizes) of the citizen voting-age population reported voting, which compares to my most recent turnout rate estimate of 41.3%. The higher CPS turnout rate is consistent with a well-known phenomenon known as "over-report" bias, where more people report voting than aggregate statistics indicate. For comparison purposes, 46.1% of the 2002 CPS citizen voting-age population reported voting while my turnout rate estimate is 40.5%.

The overall percentage of the electorate reporting voting before Election Day is 18.5%, down slightly from 20.0% in 2004 and up from 14.2% in 2002. California and Washington saw increases in early voting, to 33.2% in 2006 from 29.9% in 2004 (CA) and to 71.8% from 60.6% (WA); however, increases were reported in only 15 states. This may reflect a tendency of early voting to drop off in midterm elections, so I would caution that 2006 is probably not indicative of a new downward trend in early voting, which had increased strongly in every election since 1998, when it stood at 11.2%. (Of course, these are self-reported rates, not actual election statistics such as those collected by The Early Voting Center.) If these trends persist, it may very well be true that more Californians will have voted early before the 2008 New Hampshire primary than all New Hampshire voters.

Turnout by demographic category shows that the higher turnout in 2006 versus 2002 likely came from younger, white, moderately educated citizens (slightly more women, too). Perhaps most interesting is the lower turnout among non-Hispanic African-Americans, which suggests that Democrats likely won in 2006 by expanding their base rather than relying on their core constituencies, though we can't know for certain from these data because the CPS does not ask who people voted for.

One other interesting tidbit is found in Tennessee, where Harold Ford ran in a closely contested U.S. Senate race. If the CPS is correct, non-Hispanic African-American turnout in Tennessee went down by a statistically insignificant amount between 2002 and 2006, from 41.1% to 38.9% (+/- 7.6%).


Party ID in States Shifted in 2006


Democrats gained an average of 3.4% and Republicans lost 3.0% in partisan identification between 2005 and 2006, according to a new Gallup estimate based on over 30,000 interviews conducted throughout 2006. Gallup aggregated polls throughout the year to create estimates of party identification at the state level, as they did in 2005 and previous years. Gallup's report of the results is here.

The plot above shows how uniform the shift in Democratic and Republican partisanship was across the states. The colors of the points reflect the Democratic minus Republican balance in 2005 -- the darker the blue, the more net-Democratic identifiers, and the darker the red, the more net-Republican identifiers. Light or pale points are closely balanced states, as of 2005. The size of the points is proportional to the size of the state.

There is no apparent pattern to the shifts in partisanship: regardless of the partisan balance in 2005, states shifted by about the same amount in 2006. Democratic gains and Republican losses were both roughly uniform. The +3.4 percentage point shift for the Democrats and the -3.0 point shift for the Republicans produced a net 6.4-point deterioration for the Republicans in the balance of partisanship. The lower left figure in the plot shows that independents shifted more or less randomly between 2005 and 2006.

These shifts could, in principle, represent a non-trivial gain for the Democratic party. Recall that after the 2004 election there was considerable talk in Republican circles of establishing an "enduring Republican majority", a goal that seemed within the party's grasp though certainly not assured. That hope is clearly out of reach at the moment.

Before Democrats go wild with joy, there remains a question about the electoral impact of these partisan shifts. Party identification is the strongest single predictor of vote choice at the individual level. But the shifts in partisanship in the Gallup data do not predict the shift in voting for the U.S. House in 2006.

The bottom right figure above shows that the Republican U.S. House vote shifted more or less uniformly across states as well. However, when we look at the relationship between party id shift and vote shift across states, there is no relationship at all, as seen in the figure below.


Controlling for both Democratic gains and Republican losses doesn't add to the relationship. So the conclusion here is that both partisanship and vote shifted against the Republican party in 2006, but the variation in shifts appears to have been essentially independent between partisanship and vote.
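The "uniform shift, no cross-state link" pattern can be checked with a plain Pearson correlation. The state-level numbers below are simulated stand-ins drawn around the reported average shifts, not Gallup's actual state estimates:

```python
import random

random.seed(2006)

# Simulated per-state shifts, 2005 -> 2006, in percentage points: party ID
# centered on the reported +3.4 Democratic gain, House vote drawn
# independently, mimicking a uniform national swing with state-level noise.
pid_shift = [random.gauss(3.4, 1.0) for _ in range(50)]
vote_shift = [random.gauss(6.0, 2.0) for _ in range(50)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

r = pearson(pid_shift, vote_shift)  # independent draws, so r should be small
```

With both series shifting by about the same amount everywhere, the correlation of the state-to-state wiggles is what carries the "no relationship" finding.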

Democratic gains and Republican losses in partisanship may affect the 2008 prospects in the House (and other) elections. But in 2006, it appears both vote and partisanship responded to conditions in the country without a clear impact of changes in partisanship on changes in vote.

The Speaker and the Votes that Elected Her

Today I have two possibly overlooked items to report, both related to the new Democratic House and both discovered via MyDD:

  • First, the latest CNN/ORC poll released last week included a job rating of new Speaker of the House Nancy Pelosi. As MyDD's Jonathan Singer notes, Pelosi's initial rating (51% approve, 22% disapprove) not only tops the current job rating of President George Bush, but it also exceeds the high mark for Republican Speaker Newt Gingrich (39% approve, 35% disapprove). He hit that mark on an identically worded question asked on a Gallup poll conducted in January 1995, just after the Republicans took control of the House.

On a differently worded question, Pelosi received a 44% favorable rating from the USA Today/Gallup poll earlier this month, which is 13 points higher than the all-time high for her predecessor, former Republican Speaker Dennis Hastert.

  • And speaking of the House of Representatives, MyDD diarist Adam T has gathered and tabulated all of the official and final vote count totals for all of the races for the U.S. House in 2006. In the 430 districts where votes were cast, Democratic candidates received 52.8% of the vote, Republicans received 44.9% and other candidates received 2.3%. As Adam notes, the totals do not include five districts in Florida where Democratic candidates faced no opposition - Florida does not put unopposed candidates on the ballot.

The 7.9 point margin favoring the Democrats is about a point higher than the margin I estimated in a two-part series of posts I did in November that looked at the performance of national surveys in estimating the national "generic" House vote.

The 2006 Exit Polls: How Did They Perform?

Today's Guest Pollster's Corner contribution comes from Mark Lindeman, assistant professor of Political Studies at Bard College.

In the wake of allegations that the 2004 U.S. presidential exit polls pointed to a stolen election, many observers wondered how the 2006 exit polls would turn out. One widespread rumor asserted that no exit poll results whatsoever would be made public until after the polls had been forced to match the official vote counts. But in fact, CNN.com once again posted a preliminary national House tabulation a bit after 7 PM Eastern, and posted tabulations in state races soon after the polls closed in each state. (Other outlets may have done so as well: at the time I had my hands full with just one.) These tabulations appear to show discrepancies fairly similar to the 2004 discrepancies, as I report below.

Using tabulations to estimate exit poll "red shifts"

The tabulations are not intended to project the final vote counts. Rather, they offer crude but useful insights into why voters voted as they did. Nonetheless, each tabulation is based on a particular vote estimate made at a particular time. The exit pollsters use different estimates for different purposes. Before vote counts begin to arrive, the pollsters can refer to at least three estimates. These are (as described in the post-election evaluation report on the 2004 exit polls):

  • The "best survey" or "Best Geo" estimate -- based on interview data (from exit polls and, in some states, telephone surveys of early and absentee voters), and also incorporating data on past results from the exit poll precincts
  • The "prior" estimate -- based primarily on public pre-election surveys (something like the averages posted on Pollster.com)
  • The "composite" estimate -- a hybrid which combines the interview data (Best Geo estimate) and pre-election polls (prior estimate).

The initial tabulations posted by CNN.com are based on the composite estimate -- not just on interview data. Therefore, they probably tend to understate the disparity between the exit poll results and the vote counts. For instance, the initial 2004 "screen shot" of Pennsylvania indicates that Kerry had about 54% of the vote, and the evaluation report confirms that the composite estimate was 54.2% [p. 22]. But the report also reveals that the interview-only Best Geo estimate gave Kerry almost 57% of the vote, or a 13.8-point margin [p. 22]. The official result -- Kerry won by 2.3 points -- constitutes an 11.5-point "red shift," or reduction in Kerry's net margin, compared to the Best Geo estimate. Because pre-election polls showed a very tight race in Pennsylvania, the composite estimate gave Kerry "only" an 8.5-point margin, or 6.2-point red shift. Overall in 2004, the average discrepancy was a 5.0-point red shift in the Best Geo estimate, but "only" a 3.6-point red shift in the composite estimate.
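The "red shift" arithmetic is just a difference of net margins. A minimal sketch, plugging in the Pennsylvania 2004 figures quoted above:

```python
def red_shift(poll_dem_margin, official_dem_margin):
    """Red shift in points: how far the Democratic net margin fell from
    an exit-poll estimate to the official count. Positive values mean
    the official result was more Republican than the poll."""
    return poll_dem_margin - official_dem_margin

# Pennsylvania 2004, margins in points (Kerry minus Bush):
best_geo_shift = red_shift(13.8, 2.3)   # interview-only (Best Geo) estimate
composite_shift = red_shift(8.5, 2.3)   # estimate blended with pre-election polls
```

Because the composite estimate is pulled toward the pre-election polls, its red shift (6.2 points here) understates the interview-only discrepancy (11.5 points).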

(Once vote counts start to arrive, the pollsters continually generate a variety of estimates that incorporate vote count data at both precinct and county levels. These dynamic estimates are used to inform the decisions to "call" -- or not to call -- each race. Intermittently the pollsters also generate new tabulations based on a current vote estimate. Updating the tabulations has been described by critics as replacing "pristine" exit poll results with "soiled" ones. [*] Actually, if "pristine" means "based on interviews only," none of the tabulations is pristine.)

To "estimate the estimates" from the early tabulations, I use each table in a tabulation to figure approximate party or candidate shares, then take the median of the differentials across all the tables. For instance, take this snippet of the preliminary national House poll:


We can use these percentages to estimate that 49% * 53% (or about 26%) of voters were men who voted for Democrats, and 51% * 57% (or about 29%) were women who voted for Democrats. So, based on this table, apparently Democrats got about 26% + 29% = 55% of the vote. Applying the same logic, apparently Republicans got about (49% * 45%) + (51% * 42%) = 43.5% of the vote, for about an 11.5% Democratic margin. However, other tables imply somewhat larger or smaller margins, due to the influence of rounding error. Using a median of estimates from all the tables reduces this rounding error, and a computer program interpreting the HTML tables can do the calculations almost instantly. (Because of mistakes I made on election night, I have cruder approximations for two uncompetitive Senate races -- Minnesota and Utah -- and no data for the gubernatorial races in Illinois and Tennessee.)
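The table-by-table arithmetic can be sketched in a few lines of Python. The "sex" rows match the snippet above; the second table is invented purely to show the median step:

```python
from statistics import median

# Each row: (share of sample, % Dem, % Rep), in whole percents as published.
tables = {
    "sex": [(49, 53, 45), (51, 57, 42)],          # men, women
    "age": [(12, 60, 37), (47, 55, 43), (41, 52, 46)],  # invented rows
}

def dem_margin(rows):
    """Democratic minus Republican vote share implied by one crosstab,
    weighting each row's candidate shares by its share of the sample."""
    dem = sum(w * d for w, d, _ in rows) / 100
    rep = sum(w * r for w, _, r in rows) / 100
    return dem - rep

# The median across tables damps the rounding error in any one table.
estimate = median(dem_margin(rows) for rows in tables.values())
```

Each table's published percentages are rounded, so individual tables imply slightly different margins; the median pools them into a single, steadier estimate.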

What I found

Overall, I estimate that the initial national House tabulation gave Democrats an 11.3% margin in total vote. The final tabulation currently available, weighted to the pollsters' vote estimate at that time, gives Democrats a 7.6% margin, so these figures imply a 3.7-point "red shift" -- close to the 3.6-point average in the 2004 presidential composite estimates. If the final official margin is closer to 7 points, as Mark Blumenthal has estimated, the red shift may be above 4 points. [*] However, the vote proportions are influenced at the margins by uncontested races, which appear on the ballot in some states and not in others. Without knowing exactly how NEP handles these uncontested races (nor whether voters accurately report their votes and non-votes in these races), it is unclear what vote totals we should compare to the exit poll estimates.

(Note also that the House tabulation is not quite like the state-level tabulations I discuss next. The state-level tabs, posted as the polls closed in each state, should incorporate all the interview data. The House tabulation, posted long before the polls closed in many states, relies on partial data from much of the country. I have no reason to think that the complete results would be much different.)

State-level races yield broadly similar red shift estimates. In the Senate races, I estimate that the average red shift was 2.3 points, and the median red shift was 2.8 points. In races for governor, I estimate that both the average and the median red shift was 4.0 points.


As in 2004, most of the largest exit poll discrepancies were in uncompetitive races. Some observers (here, here, and here) have cited the red shifts in the Virginia and Montana Senate races as pointing to vote miscount favoring the Republican incumbents -- who nonetheless lost both races. But since those two races had near-average red shifts, there is little reason to single them out. Perhaps the most striking discrepancy is in the Minnesota governor's race. Democratic challenger Mike Hatch appeared to have an 8-9% lead in the initial exit poll tabulation, but lost to incumbent Tim Pawlenty by about 1%. The pollster.com 5-poll average gave Hatch a narrow 2.6-point margin, so the election result was closer to expectations than the exit poll result was. Minnesota also experienced one of the largest "red shifts" in 2004.

The House red shift has also been cited as evidence of vote miscount, most elaborately in a paper issued by the Election Defense Alliance (EDA). The paper argues that respondents' reports of their presidential votes in 2004 can be used as an "objective yardstick" to evaluate the 2006 poll. In the final House tabulation, (self-reported) Bush voters outnumber Kerry voters by 6 percentage points, more than double Bush's popular vote margin. EDA's analysts reason that the exit pollsters in effect had to invent millions of Bush voters (and/or delete Kerry voters) in order to match the House vote counts -- which, therefore, must be wrong. The basic flaw in this argument is that reported past vote is not an objective yardstick. On the contrary, as I have noted elsewhere, exit polls and other polls often -- even usually -- overstate past winners' vote shares. Worse, because the authors believe that Kerry won the popular vote and that Democrats had higher turnout in 2006, they end up conjecturing in a footnote that Democrats actually won the House vote by 23(!!) percentage points, a double-digit deviation from the initial tabulation. So much for defending the reliability of exit polls!


After the events of election night 2004, the NEP pollsters (Edison Media Research and Mitofsky International) announced efforts to reduce exit poll bias. Among other things, Edison/Mitofsky planned to improve interviewer training in order to minimize any selection bias on the part of interviewers. (See, for instance, Joe Lenski's interview with Mark B.) Despite the strong evidence of red shift in the 2006 data, we cannot conclude that these efforts were ineffectual. Participation bias easily could have been larger than ever in 2006. As Mark Blumenthal has noted, a Fox News/Opinion Dynamics pre-election poll found that 44% of Democrats, versus only 35% of Republicans, said that they would be "very likely" to participate in an exit poll. Differences in levels of concern about electronic voting and election fraud may (or may not) contribute to that disparity. In any case, no methodological refinement can force Democratic and Republican voters to participate at equal rates.

Interestingly, the pilot "Election Verification Exit Polls" (EVEP) conducted by Steve Freeman and Ken Warren reported similar or larger red shifts. Freeman's initial report indicates red shifts ranging from 5 to 8 percentage points in four distinct races (two House races, Senate, and governor) in the 28 Pennsylvania precincts surveyed. Freeman argues that the survey "eliminated most of the potential sources of error" (7), presumably through careful training of the interviewers. However, Freeman also reports that several interviewers who obtained relatively low completion rates "subjectively felt that Republicans were disproportionately avoiding participation" (8).

A first glance at the EVEP data files shows at least one case of large red shift where the exit poll result seems implausible. In this precinct, the exit poll registered a 63% majority for House Democratic challenger Lois Murphy (PA-06), while the official returns gave her just 44% of the vote. Registration statistics for the precinct (Chester County precinct 021, East Bradford North 2) show that registered Republicans outnumber Democrats by more than 2 to 1 (57% to 26%). If we concede Freeman's premise that the EVEP methodology was close to ideal, this result hardly inspires confidence in exit polls' inherent accuracy.

Amy Simon: Random Digits or Lists

Today's Guest Pollster's Corner contribution comes from Amy Simon, a partner at Goodwin Simon Victoria Research.

News media and academics hold up Random Digit Dialing (RDD) sampling methodology as the gold standard for survey samples for elections. Meanwhile, many top-notch political pollsters have been serving their clients well for years by instead using samples selected from the official list of registered voters (the statewide voter file), often called Registration-Based Sampling (RBS).

RDD samples are created when a computer randomly generates the last four digits of a phone number. The advantage of RDD is that everyone with a working landline phone is included in the sample - it doesn't matter if your phone service was just turned on that morning or if your number is unlisted, since the sample isn't generated from a list of actual phone numbers. An obvious disadvantage is that an RDD sample also includes business numbers, fax numbers, disconnected numbers, and even numbers that have never been connected - so the costs of administering an RDD sample are higher since the built-in inefficiencies bring down your contact rate.

An RBS sample draws a sample from a list of registered voters. The obvious advantage of using voter files for survey samples - one that has been noted for years - is that voter file studies are cheaper to administer than RDD studies. RDD surveys must not only churn through bad numbers but also bear the cost of screening out the large portion of adults who are not registered voters, in order to find their real interview targets: respondents who self-report as registered voters and whom the pollsters, after applying their likely voter models, define post-interview as likely voters.

With RBS surveys, when you do reach an actual person on the phone, you already know -- since you ask for them by name -- that you have a real live actual registered voter on the line and therefore have a better production rate. (The cost difference between the two methods is even more significant in a primary or other low-turnout election scenario, but the debate about using RDD versus RBS samples in low, medium, and high turnout elections is another topic requiring its own separate discussion.) In states that have high quality voter history showing which registrants have actually voted in different types of elections, pollsters can use a likely voter screen to draw the sample in the first place, further ensuring that they are interviewing people most likely to vote in the kind of election they are attempting to measure.

Yet the news media and academics engaged in polling question whether RBS studies can be as accurate as RDD studies, since no voter registration list is 100% up to date, nor does any voter file include 100% of the phone numbers of voters. In fact, the phone match rate for a voter registration list is not only less than 100% but can vary significantly across a state based on geography, with suburban areas showing a higher match rate than either urban or rural areas. So drawing an RBS sample requires special expertise in controlling for this and other issues about who is potentially over- or under-represented in your sample. Why, then, do so many experienced political pollsters continue to use RBS samples despite these concerns about accuracy? We do so because we find that in many instances (though certainly not all) RBS is just as accurate as, or even more accurate than, RDD studies.

In fact, some academics and media outlets have been experimenting with voter file survey samples and have found this to be the case. Several have publicly shared at least some of their findings about the ways in which the results do or do not differ when using RDD versus voter file samples. Several studies worth reviewing are by Mitofsky, Lenski and Bloom, by Gerber and Green in Public Opinion Quarterly and the online archive of Gerber and Green's work maintained by the list vendor Voter Contact Services (VCS). These studies have largely shown that RBS studies can be just as accurate and in some cases, more accurate, than RDD studies. One hypothesis offered is that samples drawn from voter registration lists by definition consist of actual voters, while RDD studies rely entirely on respondents' self-reporting about whether they are in fact registered to vote. Given the larger and larger portion of the adult American population that is not registered to vote, the potential for survey error when relying on self-reported behavior may be introducing larger error than carefully designed RBS studies contain.

We provide here one recent example from our own work as the polling firm for Ned Lamont for U.S. Senate in Connecticut, in which we saw virtually no differences between the results of an RDD and an RBS study. In the course of the general election, at one point in September we simultaneously conducted both an RDD study and an RBS study. The results were dramatically in sync, with a margin of error of +/- 4.0 percent on the n=600 RBS study and a margin of error of +/- 3.5 percent on the n=800 RDD study. Considering the far higher cost of RDD samples as compared to RBS samples, these results certainly give weight to the common practice among political pollsters of using voter file samples instead of RDD samples in general election campaigns.
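Those quoted margins of error follow from the standard 95%-confidence formula for a proportion near 50%; a quick check, assuming simple random sampling:

```python
import math

def moe(n, p=0.5, z=1.96):
    """Margin of error, in percentage points, for a sample of size n
    at 95% confidence (z = 1.96), assuming a proportion near p = 0.5."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

print(round(moe(600), 1))  # -> 4.0, matching the n=600 RBS study
print(round(moe(800), 1))  # -> 3.5, matching the n=800 RDD study
```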


Four Pollsters on the Incumbent Rule

I spent yesterday morning at a post-election conference sponsored by Charlie Cook's Political Report, James Carville, Congressman Tom Davis and the Northern Virginia Community College. The conference kicked off with a panel of four very experienced campaign pollsters, two Republicans and two Democrats. They covered many subjects, and I can't possibly do them all justice here, but I do want to pass along some of what the pollsters had to say on what I typically refer to as the Incumbent Rule.

Carville called it "as good a pollster panel as has ever been put together," and he wasn't kidding. The Republicans were Neil Newhouse of Public Opinion Strategies and Dave Sackett of The Tarrance Group. The Democrats were Harrison Hickman of Global Strategy Group and Stan Greenberg of Greenberg, Quinlan, Rosner. Each is a principal in his own firm, each has been involved in some of the most competitive statewide races since the 1980s, and collectively their four firms polled in over 180 races for Senate, Governor and the U.S. House in 2006. It is hard to imagine any four campaign pollsters with more comparable experience.

Carville moderated the pollster panel, and his first question concerned a "doctrine" prevalent in the 1980s among campaign managers and consultants "that challengers close better than incumbents" in the final days of a campaign. As Michael Barone put it earlier this year, the idea is that "an incumbent is not going to get a higher percentage in an election than he got in the polls." Carville's question: Is that doctrine no longer valid?

A more complete look at the incumbent rule remains on my to-do list for the next month or so, but in writing up this post, I took a quick look at how incumbents fared in the (still largely) unofficial election results as compared to our final last-five-poll average in the most competitive races for Senate and Governor.


As the table below shows, looking only at the last five polls in each race, the rule did not hold up particularly well. On average, both incumbents and their prime challengers picked up 2.4 percentage points -- an almost exact 50-50 split of the undecided vote in the most competitive statewide races.** In some cases, such as the Pennsylvania Senate race, virtually all of the undecided vote went to challenger Bob Casey, but the pattern was otherwise typically muddled.
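The arithmetic behind such a scorecard is simple enough to sketch. The helper below is hypothetical, and the 46-44 race is an illustrative example rather than a row from the actual table:

```python
def undecided_split(inc_poll, chal_poll, inc_result, chal_result):
    """Fraction of the late undecided/other vote that broke to the
    incumbent, given final poll shares and actual results (all in %)."""
    undecided = 100 - inc_poll - chal_poll
    if undecided <= 0:
        raise ValueError("no undecided vote to allocate")
    return (inc_result - inc_poll) / undecided

# Illustrative race: incumbent polled 46%, challenger 44%; final tally 51-49.
# The incumbent picked up 5 of the 10 undecided points -- an even split,
# contrary to the old rule that undecideds break to the challenger.
print(undecided_split(46, 44, 51, 49))  # -> 0.5
```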

The tougher question is the one Carville put to the pollsters: Why the recent change in what had been pollster doctrine? Here is a summary of what the four pollsters had to say:

  • Republican Neil Newhouse noted the example of his client, incumbent Republican Jim Gerlach (Pennsylvania-6), who was in a 44% to 44% tie on their final internal poll conducted a week before the election. In the "old days," Newhouse said, we would have assumed an easy Murphy victory. However, Gerlach ultimately prevailed (51% to 49%) after closing with a final television ad featuring a personal appeal by Gerlach that Newhouse credited for the victory. As for the incumbent rule, Newhouse said, "we are seeing a bit of a change, but not much consistency." While he still tends to give challengers the "benefit of the doubt" when incumbents are under 50%, Newhouse believes it is no longer "carte blanche automatic" that the undecided vote on the final poll will all go to the challenger.
  • Republican Dave Sackett agreed and credited the much shorter "fifteen minute" news cycle for the ability of incumbents to turn the tables on challengers late in the campaign. He noted that his client Deborah Pryce (Ohio-15) was trailing "all the way through" on internal tracking polls, which would presumably include one in the final week (he noted via email that the margin had closed to within sampling error on the final poll). However, according to Sackett, the Pryce campaign outspent Democratic challenger Mary Jo Kilroy by a two-to-one margin over the final weekend, and he credits her narrow victory to that final burst of communication.
  • Democrat Harrison Hickman pointed out that in the 1980s, the conventional wisdom was to avoid mention of your opponent, a habit that helped explain why challengers won much of the late undecided vote. Now, he said, the general pattern is for incumbents to vigorously attack challengers throughout the campaign. "Incumbents put so much more pressure on challengers than they used to." (See this pre-election column by Dick Meyer of CBS News that includes data Hickman gathered showing the impact of negative advertising on candidate favorable ratings since 1986).
  • Finally, Democrat Stan Greenberg agreed with his colleagues that the traditional pattern, seen as recently as 1996 when Bill Clinton got 49% in their final poll and 49% of the vote on Election Day, has changed. He speculated about another possible explanation, that elections for the House and Senate have become increasingly "nationalized" since 1994. Pointing to the increasing "partisan consistency" in pre-election polls (each party's candidates winning 90% or more of that party's voters), Greenberg argued that elections now "get crystallized in a specific way nationally" and that local elections get "swept up" in a national tide that may negate the traditional last minute shift of undecided voters to challengers.

Of course, the solution to this very interesting puzzle is inherently speculative. It is also worthy of more analysis than I gave it above. Hopefully, we'll have more to come over the next month or so.

House Districts vs. Poll Results: Part II

On Monday, I looked at how well our averages of polls in U.S. House Districts did in comparison to the unofficial vote counts, and when we averaged the averages, they compared quite well. A related and important question is how well those averages did within individual districts. How often did our House poll averages - sometimes conducted over a span of more than a month - provide a misleading impression of the eventual result on Election Day? In most cases the pre-election averages in House races coincided with the eventual results, but there were a handful of districts where those averages gave a misleading impression of the outcome of the race. The tougher question is whether that misimpression was the fault of the polls or of the combination of their timing and subsequent "campaign dynamics" that changed voter preferences.

That last point is important. Pre-election polls attempt to be snapshots of voter preferences "if the election were held today." No one should expect a head-to-head vote preference question asked in the first week of October to forecast the outcome of an election held a month later. And as noted here previously, our final averages often included polls stretching back a month or more before Election Day. So consider today's discussion as much about the merits of averaging polls in House races as about the merits of the polls themselves.

Let's start with the averages that we posted on our House map and summary tables. We averaged the last five polls in each district (including those conducted by partisans and sponsored by the campaigns or political parties). We then classified each race as either a "toss-up" or "lean" or "strong" for a particular candidate based on our assessment of the statistical strength of that candidate's lead.

We were able to find at least one poll in 87 districts, but only 34 with five or more polls. As such, the House race averages often spanned far more time than our statewide poll averages. The final averages were based on 304 polls, but 58 of those polls (in 38 districts) were conducted before October. More than a third of the polls used in the averages (124) were conducted before October 15. So it would not be surprising to see averages of these results produce misleading results in any district with a late trend.

In comparing the averages to the results, I see ten districts with "reversals" - districts that we had designated as "leaning" or better to one candidate while a different candidate prevailed. Specifically:

It is worth noting that all but two of these "reversals" were seats we classified as either "lean" Democrat or Republican (a lead beyond one standard error, but not two). That is to say, the lead of the ultimately unsuccessful candidate was relatively small, though obviously not small enough to rate "toss-up" status. The exceptions were New Hampshire-1 and Florida-13, which we had classified as strong Republican and strong Democrat respectively (based on average margins of 11.8% and 7.2% respectively).
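For readers curious how such a classification might work in code, here is a minimal sketch. It assumes the usual standard error formula for the difference of two multinomial proportions; the exact formula Pollster used is not spelled out in the post, so treat the helper and thresholds as an illustration of the "one standard error vs. two" idea:

```python
import math

def classify_lead(p1, p2, n):
    """Classify a race by how many standard errors separate the leaders.
    p1, p2 are poll shares as fractions; n is the (pooled) sample size."""
    margin = abs(p1 - p2)
    # Standard error of the difference p1 - p2 for multinomial proportions.
    se = math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)
    if margin < se:
        return "toss-up"
    elif margin < 2 * se:
        return "lean"       # beyond one standard error, but not two
    return "strong"

# A 48%-44% lead in a pooled sample of n=800 rates only a "lean":
print(classify_lead(0.48, 0.44, 800))  # -> lean
```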

Some of these reversals are explicable. For example, all of the public polls released in Ohio-15 and Kansas-2 were conducted prior to October 11, so it is entirely possible that those early surveys were right and that late trends moved the ultimate winner ahead by Election Day. Also, the results for Pennsylvania-4, Arizona-5, and New Hampshire-1 all showed trends toward the ultimate winner. The polls in Florida-13 also showed a late trend to the current nominal leader, Republican Buchanan. In Nebraska-3 and Kansas-2, partisan polls with results highly favorable to their sponsors also helped skew the averages in what may have been a misleading direction.

Finally, as many readers know, the results from Florida-13 remain in dispute due to an unusually high rate of "under-votes" in one county that appear to result from a poorly designed layout of the touch-screen electronic voting equipment in that county. A compelling draft analysis by four political scientists (Frisina, Herron, Honaker and Lewis) argues that Democrat Christine Jennings would have prevailed but for the roughly 15,000 votes lost because of the touch-screen equipment.

I had anticipated some of these issues and, in a post just before the election, presented a variety of different "scorecards" based on applying various filters (only late polls, only independent polls, etc). At the time, the various alternative averages made very little bottom-line difference in terms of the number of seats we classified as leaning Democrat or Republican. For the sake of brevity, I will not go through every permutation, but the following table summarizes the number of reversals that would have resulted given various screens we could apply to the averages (that I described in my post on Monday).


Not surprisingly, applying the various filters does reduce the number of "reversal" districts, those where one candidate led in the poll averages but another won. As we throw out early polls or those conducted by partisans, however, a different kind of "miss" increases, those where we miss a switch in party because no polls are available. Our rule on Pollster.com was to assume no change in party for districts with no polls available. However, had we included only independent polls conducted after October 15, we would have made the wrong assumption about four districts previously held by Republicans where Democrats prevailed: Florida-16, Kansas-2, New York-24 and Pennsylvania-7. So remarkably, the rate of "missed" outcomes is roughly the same regardless of the filter applied.

Of course, there are a few districts mentioned above where the reasons for a late "reversal" are not immediately apparent. I'll try to take up some of these, as well as the question of how some of the more prolific pollsters fared in a subsequent post.

House District Polls vs. Results - Part I

Continuing with our post-election review of how the polls performed, I want to turn to the polls conducted in individual House races. As I discussed last week, the final results among likely voters for the national generic vote varied from each other beyond sampling error, and the overall average of those results overestimated the support for Democrats. When we look at the polls we tracked on Pollster.com in individual districts, the story is much different. As we should expect, the overall average of polls in individual districts compares remarkably well to the overall average of the actual results.**

To review: On Pollster.com we tried to track and report every poll we could find within individual districts. In the end, we logged over 400 polls in 87 districts. I have obtained unofficial, but mostly complete results in all 87 districts. As should be obvious, these more competitive districts do not have the problem of unreported results in uncontested districts. And while final certified results may change the district level results by a percentage point here or there, any such changes are unlikely to make much difference in the average results across many districts.

The table below compares the overall average poll result to the overall actual result when averaged across all districts for which polls were available. The first line of the table (a) shows the average of the last-five (or fewer) polls still posted on our House summary page in the 87 districts in which at least one poll was available. The next three lines show comparisons for three more averages, (b) only polls released after October 1, (c) only polls released after October 15 and (d) only the final poll for each pollster released after October 15. The next four rows (e through h) show each of these averages but include only independent, non-partisan polls.


Of course, as we start putting restrictions on the types of poll used to calculate averages, the number of districts with available polls declines (something I discussed in reviewing the poll data in October). As such, it was necessary to calculate a different vote count average for each method of averaging. Not surprisingly, the Democratic margin tended to increase (in both the polls and the results) as the number of districts declines. Democrats did better in the most competitive districts. The less competitive districts were more likely to be held by relatively safer Republican incumbents.

Of course, comparing the raw poll results to the actual vote count presents the perennial problem of what to do about undecided voters. So I created the following table, which shows the Democratic percentage of the two-party vote assuming an equal split of the undecided/other vote. A proportional division (D/D+R) increased the error very slightly across all categories but the differences were so slight that I omitted them from the table.
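The two allocation rules are easy to sketch. The helper names and the 50-42 poll are illustrative, not figures from the tables:

```python
def two_party_even(d, r):
    """Democratic share of the two-party vote, with the undecided/other
    vote split evenly between the two parties (all inputs in %)."""
    undecided = 100 - d - r
    return d + undecided / 2

def two_party_prop(d, r):
    """Democratic share allocating undecideds proportionally: D / (D + R)."""
    return 100 * d / (d + r)

# A poll showing 50% D, 42% R, 8% undecided:
print(two_party_even(50, 42))            # -> 54.0 (even split)
print(round(two_party_prop(50, 42), 1))  # -> 54.3 (proportional)
```

As the example shows, the two rules differ only modestly when the undecided share is small, which is consistent with the negligible differences noted above.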


Contrary to my own expectation, all of the poll averages of the Democratic share of the two-party vote come remarkably close to the overall average result regardless of the averaging method used. The differences between the methods are negligible, although somewhat surprisingly, the most predictive average was based on the last five (or fewer) polls regardless of date, including many polls conducted in September or earlier.

Although the differences are small, throwing out the partisan polls made the overall averages slightly more accurate in terms of predicting the result, although it also meant losing available data for a handful of districts in each case.

Why did the district level polls perform, on average, so much better than the "generic" vote on the national surveys? It is all about what pollsters call "measurement error," something that occurs when the question does not measure the thing we hope to measure.  Polls conducted at the district level ask respondents to choose between the names of the actual candidates. The "generic" national vote asks about generic party labels and assumes the respondents know the names of candidates.  Present the choice as it appears on the ballot, and the poll gets more accurate.   

Of course, the tables above just show how well the House race poll data worked on average. How well did these polls perform in individual districts? And how did some of the more prolific House race pollsters compare to each other in terms of poll accuracy? I will take those questions up in subsequent posts.

**PS: And I have to note that Pollster reader Mark Lindeman beat me to this observation in a comment over the weekend.

Generic House vs. National Vote: Part II

So how did national estimates of the "generic" House vote compare to the national vote for Congress? We learned in my last post on this topic that the national House vote is still being counted and is not yet set in stone. My estimate of the Democratic victory margin (roughly 7 points, 52% to 45%) is still subject to change. The survey side of the comparison is even murkier, with an unusually wide spread of results among likely voters on the final round of national surveys.

To try to make sense of all the numbers, we need to revisit the "generic" House vote and its shortcomings. By necessity as much as design, national surveys have made no attempt to match respondents to their individual districts and ask vote preference questions that involve the names of actual candidates. Instead, they have asked some version of the following:

If the elections for Congress were being held today, which party's candidate would you vote for in your Congressional district -- The Democratic Party's candidate or The Republican Party's candidate?

The problem is that the question assumes that respondents know the names of the candidates and can identify which candidate is a Democrat and which is a Republican. Such knowledge is rare, even in competitive districts, so most campaign pollsters consider it a better measure of the way respondents feel about the political parties than a tool to measure actual candidate preference.

In 1995, two political scientists -- Robert Erikson and Lee Sigelman -- published an article in Public Opinion Quarterly that compared every generic House vote result as measured by the Gallup organization from 1950 to 1994 to the Democratic share of the two-party vote (D / D + R). Among registered voters, when they recalculated the results to ignore undecided respondents, they found that the generic ballot typically overstated the Democratic share of the two-party vote by 6.0 percentage points, and by 4.9 points for polls conducted during the last month of the campaign. When they allotted undecided voters evenly between Democrats and Republicans, they found a 4.8 point overstatement of the Democratic margin, and a 3.4 point overstatement in polls taken during October (See also Charles Franklin's analysis of the generic vote, and also the pre-election Guest Pollster contributions by Erikson and Wlezien and Alan Abramowitz that made use of the generic ballot and other variables to model the House outcome).
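The overstatement calculation behind these comparisons can be sketched as follows. The 52-40 generic ballot and 52-46 actual vote below are illustrative numbers, not figures from the article:

```python
def dem_two_party(d, r):
    """Two-party Democratic share, ignoring undecideds: 100 * D / (D + R)."""
    return 100 * d / (d + r)

def overstatement(poll_d, poll_r, actual_d, actual_r):
    """How much the generic ballot overstates the Democratic two-party
    share relative to the actual vote, in percentage points."""
    return dem_two_party(poll_d, poll_r) - dem_two_party(actual_d, actual_r)

# Illustrative: a generic ballot of 52-40 against an actual vote of 52-46.
print(round(overstatement(52, 40, 52, 46), 1))  # -> 3.5
```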

Two years later, Gallup's David Moore and Lydia Saad published a response in Public Opinion Quarterly. They made the same comparison of the total House vote to the generic ballot "but included only the final Gallup poll results before the election -- poll numbers that are closest to the election and also based on likely voters" (p.605). Doing so, they reduced the Democratic overstatement from 3.4 points in October to an average of just 1.28 percentage points. In 2002 the Pew Research Center used their own final, off-year pre-election polls from 1994 and 1998 to extend that analysis. Their conclusion:

The average prediction error in off-year elections since 1954 has been 1.1%. The lines plotting the actual vote against the final poll-based forecast vote by Gallup and the Pew Research Center track almost perfectly over time.


Last year, Chris Bowers of MyDD put together a compilation of the final generic House ballot polls from 2002 and 2004 "conducted entirely during the final week" of each respective campaign. When I apply the calculations used by the various POQ authors to the 2002 and 2004 final polls (evenly distributing the undecided vote), the average Democratic overstatement was smaller still -- roughly half of a percentage point in 2002 and 2004.


Which brings us to the relatively puzzling result from this year. The following table shows the results for both registered and likely voters for the seven pollsters that released surveys conducted entirely during the final week of the campaign. The most striking aspect of the findings is the huge divergence of results among likely voters. The Democratic margin among likely voters ranges from a low of 4 percentage points (Pew Research Center) to a high of 20 (CNN).


Not surprisingly, the results show a much smaller spread when we look at the larger and more comparable sub-samples of self-identified registered voters. And some of this remaining spread comes from the typical "house effect" in the percentage classified as other or unsure. As we have seen on other measures, the undecided percentage is greater for the Pew Research Center and Newsweek (and Fox News among likely voters), less for the ABC News/Washington Post survey.

If we factor out the undecided vote by allotting it evenly and compare the results to my current estimate of the actual two-party vote (with the big caveat that counting continues and this estimated "actual" vote is still subject to change), an interesting pattern emerges:


The results of three surveys -- Gallup/USA Today, Pew Research Center, and ABC News/Washington Post -- fall well within the margin of error of the current count. The average result for these three surveys understates the Democratic share of the current count by about a half a percentage point. The likely voter models used by these surveys also show the usual pattern -- a narrower Democratic margin among likely voters than among all registered voters.

But three surveys -- CNN, Time and Newsweek -- show big overstatements of the Democratic vote, roughly 5 percentage points on average. And none of these three show the usual narrower Democratic margin among likely voters than among all registered voters. On the CNN survey, the likely voter model actually increases the Democratic margin.

It is not immediately apparent why the likely voter models of those three surveys yielded such different results, although as always, the precise details of the mechanics used on the final surveys were not publicly disclosed. Other than the general information some of these pollsters have provided previously, all we know for certain is the unweighted number of interviews classified as "likely voters" by each organization, and that information is not helpful in sorting out the differences. As indicated in the table below, each pollster identified roughly two-thirds of its initial sample of adults as likely voters.


What can we make of this unusual inconsistency? The overstatement of the Democratic margins among registered voters is generally consistent with past results, but the wide spread of results among likely voters is far more puzzling. The difference in the behavior of the likely voter models looks like a big clue, but without knowing more about the mechanics of the models employed by each pollster, conclusions are difficult.

2006 Data Available for Download

We are pleased to announce that we have posted spreadsheet files that include every poll result we gathered for the 2006 Senate, House, and Gubernatorial races. These are now available for download from each respective national summary page. For each poll we've included a link, the name of the pollster, the sample size, population, margin of error, and polling dates where available.

Generic House vs. National Count - Part I

For more than a week -- with a bout of the flu causing an unfortunate interruption -- I have been trying to come up with a reasonable tabulation of the total House vote to use as a comparison to the final rendering of the "generic" House vote by various national surveys. As it turns out, coming up with a precise total is not easy, as some votes are still being counted and other votes have not been reported. The comparison comes with a number of caveats before we even reach the unusual spread in the generic results.

The weekend before last, I copied the raw vote numbers reported by the Associated Press as posted on WashingtonPost.com (largely because the latter reported raw votes for all districts in a format conducive to spreadsheet copy-and-paste). Then last week, I went to various Secretary of State web sites to try to check any district where a large portion of the precincts (3% or more) were still uncounted in the Post/AP tallies. I was able to obtain complete counts in most districts, but not all. In some areas, counting either continues or remains incomplete pending the release of final, "certified" results.

For example, nearly a third of the vote apparently remains uncounted in California's Riverside County (mostly absentee and provisional ballots). Yet check the Associated Press tallies for Riverside's 44th and 45th Districts (as reported by CNN.com or WashingtonPost.com) and you will see that "100%" of precincts have been counted. The reports on the California Secretary of State web site are not much more help.

Or check the results for any of the House districts in Washington, where a significant share of the votes had still not been counted when news sites stopped updating their results. For example, consider the results below for Washington's 5th District. The last report from WashingtonPost.com indicated that only 64% of precincts had been counted. The last update from CNN.com indicated 75% of precincts counted. And the current unofficial tabulation available from the Washington Secretary of State's office shows a total of 232,379 votes cast. So how many votes have been counted? According to a press spokesperson for the Washington Secretary of State, the "uncounted vote" tally is maintained separately by each county in Washington. Their web site currently shows a total of 48,190 votes still uncounted (roughly 2% of the total).


The last line of the table shows what the total would be if we extrapolate from the percentage of precincts counted. These data suggest either that 5% to 10% of the ballots are uncounted, or that extrapolations based on previous reports of precincts counted were too high, or -- most likely -- a mix of both. One lesson to take away is that "extrapolations" based on the percentage of precincts counted are sometimes shaky.
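The extrapolation in question is just a ratio, which is exactly why it is shaky; here is a minimal sketch with made-up numbers:

```python
# Sketch: the naive projection of a final vote total from a partial count.
# It assumes every precinct is the same size and that counted precincts
# mirror the uncounted ones -- the assumptions that make it unreliable
# (absentee and provisional ballots rarely track precinct reports).

def extrapolate_total(votes_counted, share_of_precincts_counted):
    return votes_counted / share_of_precincts_counted

# e.g., 150,000 votes reported with 64% of precincts counted
projected = extrapolate_total(150_000, 0.64)  # projects 234,375 total votes
```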

With the caution in mind that counting continues, and that the following totals are unofficial and incomplete, here are the current totals as I have them:


Now consider that these totals still leave out the votes cast in 22 districts (19 held by Democrats and 3 by Republicans) where no votes have been reported. In six of those districts -- all in Florida and all held by Democrats -- vote counts will never be available (assuming that no write-in candidates qualified in any of the districts) because no votes were cast. Florida law leaves uncontested races off the ballot.

The totals above include vote counts for another 12 no-contest races (involving 11 Democrats and 1 Republican). The incumbent received an average of 108,608 votes in those districts, roughly 60% of the total votes cast elsewhere. If we assume that each of the incumbents in the 16 missing districts outside Florida received roughly that number of votes (a big if -- totals varied widely), we would add 2.3 million votes to the Democratic total, and a little over half a million votes to the Republican total. The Democratic margin would thus increase to roughly seven points (52% to 45%), though the exact size will depend on the assumptions we are willing to make about the various sources of uncounted votes.
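The imputation described above can be sketched as follows; the reported totals and the district split below are illustrative placeholders, not my actual tallies:

```python
# Sketch: impute votes for unreported districts by giving each missing
# incumbent the average no-contest vote total, then recompute the national
# margin. Reported totals and the party split below are placeholders.

AVG_NO_CONTEST_VOTES = 108_608      # average from the 12 reported no-contest races
missing_dem, missing_rep = 13, 3    # assumed split of missing districts

dem_total, rep_total = 28_000_000, 26_000_000  # hypothetical reported totals

dem_total += missing_dem * AVG_NO_CONTEST_VOTES
rep_total += missing_rep * AVG_NO_CONTEST_VOTES

dem_pct = 100 * dem_total / (dem_total + rep_total)
```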

And how does this estimate compare to the "generic" House vote on national surveys? I will take that up in a subsequent post, but the survey side of this comparison makes the vote count look relatively complete, precise and pristine.

Continues with Part II

A Surrender of Judgment? (Conclusion)

[This post concludes my comments started yesterday in response to a column by Washington Post polling director Jon Cohen.]

We chose to average poll results here on Pollster -- even for dissimilar surveys that might show "house effects" due to differences in methodology -- because we believed it would help lessen the confusion that results from polling's inherent variability. We had seen the way the simple averaging used by the site RealClearPolitics had worked in 2004, in particular the way their final averages in battleground states proved to be a better indicator of the leader in each state than the leaked exit poll estimates that got everyone so excited on Election Day.

As Carl Bialik's "Numbers Guy" column on Wall Street Journal Online shows, that approach proved itself again this year:

Taking an average of the five most recent polls for a given state, regardless of the author -- a measure compiled by Pollster.com -- yielded a higher accuracy rate than most individual pollsters.

And in fairness, while I have not crunched the numbers that Bialik has, I am assuming that the RealClearPolitics averages performed similarly this year.

Readers have often suggested more elaborate or esoteric alternatives and we considered many. But given the constraints of time and budget and the need to automate the process of generating the charts, maps and tables, we ultimately opted to stick with a relatively simple approach.

Regardless, our approach reflected our judgment about how best to aggregate many different polls while also minimizing the potential shortcomings of averaging. The important statistical issues are fairly straightforward. If a set of polls uses an identical methodology, averaging those polls effectively pools the sample size and reduces random error, assuming no trend changes attitudes over the period in which those polls were fielded.
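That pooling intuition is easy to put in numbers; here is a minimal sketch of the 95% margin of error, assuming simple random samples and a proportion near 50%:

```python
import math

# Sketch: the 95% margin of error for a proportion near 50%, assuming a
# simple random sample. Pooling five polls of 1,000 behaves like one poll
# of 5,000, cutting the random error by a factor of sqrt(5).

def margin_of_error(n):
    return 1.96 * math.sqrt(0.25 / n)

single = margin_of_error(1_000)  # about +/- 3.1 points
pooled = margin_of_error(5_000)  # about +/- 1.4 points
```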

In reality, of course, all polls are different and those differences sometimes produce house effects in the results. In theory, if we knew for certain that Pollsters A, B, C and D always produce "good" and accurate results, and Pollster E always produces skewed or biased results, then an average of all five would be less accurate than looking at any of the first four alone. The problem is that things are rarely that simple or obvious in the real world. In practice, house effects are usually only evident in retrospect. And in most cases, it is not obvious either before or after the election whether a particular effect -- such as a consistently higher or lower percentage of undecided voters -- automatically qualifies as inherently "bad."

So one reason we opted to average five polls (rather than a smaller number) is that any one odd poll would have a relatively small contribution to the average. Also, looking at the pace of polling in 2002, five polls seemed to be the right number to assure a narrow range of field dates toward the end of the campaign.

We also decided from the beginning that the averages used to classify races (as toss-up, lean Democrat, etc.) would not include Internet surveys drawn from non-random panels. This judgment was based on our analysis of the Internet panel polls in 2004, which had shown a consistent statistical bias in favor of the Democrats. One consequence was that our averages excluded the surveys conducted by Polimetrix, Pollster.com's primary sponsor, a decision that did not exactly delight the folks who pay our bills and keep our site running smoothly. The fact that we made that call under those circumstances is one big reason why the "surrender judgment" comment irks me as much as it does.

Again, as many comments have already noted, we put a lot of effort into identifying and charting pollster house effects as they appeared in the data. On the Sunday before the election, we posted pollster comparison charts for every Senate race with at least 10 polls (22 in all). On that day, my blog post gave special attention to a fairly clear "house effect" involving SurveyUSA:

A good example is the Maryland Senate race (copied below). Note that the three automated polls by SurveyUSA have all shown the race virtually tied, while other polls (including the automated surveys from Rasmussen Reports) show a narrowing race, with Democrat Ben Cardin typically leading by roughly five percentage points.



Which brings me to Maryland. Jon Cohen is certainly right to point out that the Washington Post's survey ultimately provided a more accurate depiction of voters' likely preferences than the average of surveys released at about the same time. Democrat Ben Cardin won by ten percentage points (54% to 44%). The Post survey, conducted October 22-26, had Cardin ahead by 11 (54% to 43% with just 1% undecided and 1% choosing Green party candidate Kevin Zeese). Our final "last five poll average" had Cardin ahead by just three points (48.4% to 45.2%), a margin narrow enough to merit a "toss-up" rating.

So why were the averages of all the polls less accurate than one poll by the Washington Post? Unfortunately, in this case, one contributing factor was the mechanism we used to calculate the averages. As it happened, two of the "last 5" polls came from SurveyUSA, whose polls showed a consistently closer race than any of the other surveys. Had we simply omitted the two SurveyUSA polls and averaged the other three, we would have shown Cardin leading by four points, enough to classify the race as "lean Democrat." Had we added in the two previous survey releases from the Baltimore Sun/Potomac Research and the Washington Post, the average would have shown Cardin leading by six.

Jon Cohen seems to imply that no one would have considered the Maryland races competitive had they adhered to polling's "gold standard.... interviewers making telephone calls to people randomly selected from a sample of a definable, reachable population." That standard would have omitted the Internet surveys, the automated surveys, and possibly the Baltimore Sun/Potomac Research poll (because it sampled from a list of registered voters rather than using a "random digit dial" sample). But it would have left the Mason-Dixon surveys standing, and they showed Cardin's lead narrowing to just three points (47% to 44%) just days before the election.

We are hoping to take a closer look at how the pollsters did in Maryland and across the country over the next month or so, and especially at cases where the results differed from the final poll averages. I suspect that the story will have less to do with the methods of sampling or interviewing and more to do with more classic questions of how hard to push uncertain voters and what it means to be "undecided" on the final survey.

Numbers Guy: Rating the Pollsters

We interrupt the previous post still in progress to bring you a feature Pollster readers will definitely want to read in full.  Carl Bialik, the "Numbers Guy" from Wall Street Journal Interactive, did some comparisons of the performance of four polling organizations that were particularly active in statewide elections: Rasmussen Reports, SurveyUSA, Mason-Dixon and Zogby International (counted twice, once for its telephone surveys and once for its Internet panel surveys).

The most important lesson in Bialik's piece is his appropriate reluctance to "crown a winner."  As he puts it, "the science of evaluating polls remains very much a work in progress."  That's one reason why we have not rushed to do our own evaluation of how the polls did in 2006.  Bialik provides a concise but remarkably accessible review of the history of efforts to measure polling error (including a quote from Professor Franklin) and a clear explanation of his own calculations.

Again, the column -- which is free to all -- is worth reading in full, but I have to share what is for us, the "money graph:"

There were some interesting trends: Phone polls tended to be better than online surveys, and companies that used recorded voices rather than live humans in their surveys were standouts. Nearly everyone had some big misses, though, such as predicting that races would be too close to call when in fact they were won by healthy margins. Also, I found that being loyal to a particular polling outfit may not be wise. Taking an average of the five most recent polls for a given state, regardless of the author -- a measure compiled by Pollster.com -- yielded a higher accuracy rate than most individual pollsters.

Thanks Carl.  We needed that today.   Now do keep in mind the one obvious limitation of Bialik's approach.  He only looked at polls by four organizations, including just one online pollster (Zogby) and just two that used live interviewers (Mason Dixon and Zogby).  There were obviously many more "conventional pollsters," although few conducted anywhere near as many surveys as the four he looked at. 

Another worthy excerpt involves Bialik's conclusions about the Zogby Interactive online surveys, especially since nearly all of those surveys were conducted by Zogby on behalf of the Wall Street Journal Interactive -- Bialik's employer.  

But the performance of Zogby Interactive, the unit that conducts surveys online, demonstrates the dubious value of judging polls only by whether they pick winners correctly. As Zogby noted in a press release, its online polls identified 18 of 19 Senate winners correctly. But its predictions missed by an average of 8.6 percentage points in those polls -- at least twice the average miss of four other polling operations I examined. Zogby predicted a nine-point win for Democrat Herb Kohl in Wisconsin; he won by 37 points. Democrat Maria Cantwell was expected to win by four points in Washington; she won by 17.
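Bialik's "average miss" is straightforward to compute; here is a minimal sketch using only the two Zogby Interactive races quoted above (the function name is mine, not his):

```python
# Sketch: mean absolute error between predicted and actual margins, in the
# spirit of Bialik's "average miss." The two data points are the Zogby
# Interactive races quoted above; everything else is my own framing.

def average_miss(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Wisconsin: predicted D+9, actual D+37; Washington: predicted D+4, actual D+17
miss = average_miss([9, 4], [37, 17])  # 20.5 points on these two races alone
```

This is also why picking winners is a weak yardstick: a poll can call 18 of 19 races correctly while still missing the margins badly.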

Again... go read it all.

A Surrender of Judgment?

I had an unhappy experience yesterday morning while still down for the count with a persistent fever (it has broken finally and thanks to all for the kind get well wishes). As I lay shivering, achy and generally miserable, my wife kindly ventured outside to find me some distraction in the form of our dead-tree copy of the morning's Washington Post. It took me only a minute or two to discover that Jon Cohen, the new polling director at the Post, had penned a column that mounted a veiled but clear attack on this site and others like it:

One vogue approach to the glut of polls this year was to surrender judgment, assume all polls were equal and average their findings. Political junkies bookmarked Web sites that aggregated polls and posted five- and 10-poll averages.

But, perhaps unsurprisingly, averages work only "on average." For example, the posted averages on the Maryland governor's and Senate races showed them as closely competitive; they were not. Polls from The Post and Gallup showed those races as solidly Democratic in June, September and October, just as they were on Election Day.

These polls were not magically predictive; rather, they captured the main themes of the election that were set months before Nov. 7. Describing those Maryland contests as tight races in a deep-blue state, in what national pre-election polls rightly showed to be a Democratic year, misled election-watchers and voters, although cable news networks welcomed the fodder.

More fundamentally, averaging polls encourages the already excessive attention paid to horse-race numbers. Preelection polls are not meant to be crystal balls. Putting a number on the status of the race is a necessary part of preelection polls, but much is lost if it's the only one.

We need standards, not averages. There's certainly a place for averages. My investment portfolio, for example, would be in better shape today if I had invested in broad indexes of securities instead of fancying myself a stock-picker. At the same time, I'd be in a much tighter financial position if I took investment advice from spam e-mails as seriously as that from accredited financial experts.

This last point exaggerates the disparities among pollsters. But there are differences among pollsters, and they matter.

Pollsters sometimes disagree about how to conduct surveys, but the high-quality polling we should pay attention to is based on an established method undergirded by statistical theory.

The gold standard in news polling remains interviewers making telephone calls to people randomly selected from a sample of a definable, reachable population. To be sure, the luster on the method is not as shiny as it once was, but I'd always choose tarnished precious metals over fool's gold.

I want to say upfront that I find the charge that our approach was "to surrender judgment," "assume all polls were equal" and blindly peddle "fool's gold" to be both inaccurate and deeply offensive. While it is tempting to go all "blogger" and fire off an angry response in kind, I am going to try to assume that Mr. Cohen -- whom I do not know personally -- wrote his column with the best of intentions. At the same time, it is important to spell out why I fundamentally disagree with his broader conclusions about the value of examining and averaging different kinds of polls.

[Unfortunately, having lost a few days to the flu, I need to pay a few bills and attend to a few other details here at Pollster. I should be back to complete this post later this afternoon. Meanwhile, please feel free to post your own thoughts in the comments section].

Update (11/16): Since I dawdled, the second half of this post appears as a second entry

Partisan Composition of Samples in 2006 Generic Congressional Ballot Surveys: Greater Disclosure, Less Controversy

Today's Guest Pollster Corner Contribution comes from Alan Reifman of Texas Tech University, who takes a closer look at this fall's pre-election polls.

In the months leading up to the 2000 and 2004 general elections, presidential election polls showed considerable variation -- both across different pollsters and within the same pollster at different times -- in the percentages of self-identified Democrats, Republicans, and Independents comprising the samples. Sample composition itself probably would not concern many people, but when these sampling variations seemed to affect the polls' candidate vs. candidate "horse race" numbers, people got agitated.

Discrepancies in polls' partisan compositions almost inevitably raise the issue of whether survey samples should be weighted (i.e., post-stratified) to match party ID figures from sources such as previous elections' exit polls, much like polls are weighted to match gender and other demographic parameters from the U.S. Census. Underlying the question of whether pollsters should weight by party ID lies another question: How fixed and enduring are voters' identifications with a party? Again, experts differ. Zogby was the first major pollster to weight on party ID, with Rasmussen following suit later. Most, if not all, of the remaining pollsters do not weight by party.
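For readers curious about the mechanics, weighting by party ID is a simple post-stratification ratio; here is a sketch with hypothetical target and sample shares:

```python
# Sketch: post-stratification by party ID. Each respondent's weight is the
# target share for their party divided by that party's unweighted share in
# the sample. All shares below are hypothetical.

def party_weights(sample_shares, target_shares):
    return {party: target_shares[party] / sample_shares[party]
            for party in sample_shares}

w = party_weights({"D": 0.40, "R": 0.30, "I": 0.30},   # unweighted sample
                  {"D": 0.36, "R": 0.32, "I": 0.32})   # assumed targets
# Democrats in this sample are downweighted (w["D"] = 0.9); Republicans and
# independents are weighted up slightly.
```

The controversy, of course, is over where the targets should come from and whether party ID is stable enough to serve as one.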

I track polls' partisan compositions at my sample weighting website. I am neither a pollster nor a political scientist, but I am a social scientist who teaches research methods and statistics, and I've spent much time studying and collecting data on party identification. I also took a graduate statistics class many years ago from Pollster.com contributor Charles Franklin.

If I had to summarize developments on the sample composition/weighting front for 2006 (where the main point of interest was the Generic Congressional Ballot), I would identify two trends:

1. Thanks to the efforts of the Mystery Pollster himself and others who raised the issue over the past few years, full "topline" documents (also known as polls' "internals"), which included party ID numbers, were freely accessible via the web for most of the national pollsters during this past election season.

2. The margin between the percentages of self-identified Democrats and Republicans (D minus R) comprising most national polls over the final two months of the campaign season was pretty stable. As a consequence, questioning of polls' partisan breakdowns was relatively rare this year.

On my website, I used Rasmussen's party ID readings as my benchmark for comparison, due to the large numbers of interviews involved (500 daily interviews, aggregated over the 90 days preceding the start of each new month). Most of the time, Rasmussen had the D-R margin at roughly 4.5 percentage points. As shown in the major chart on my website, when multiple independent polls that were in the field during roughly the same time frame (and which released the necessary party ID numbers) were available, I averaged their partisan percentages. Four polls (not including any from Rasmussen) taken from October 18-22 inclusive showed averages of D 34.4 and R 29.8, well in line with Rasmussen's margin. (The average of five polls from an earlier period, October 5-8 inclusive, had a wider Democratic margin: D 36.9, R 29.6.)

In the chart, I also provided brief verbal commentary on how each poll's partisan breakdown matched up with Rasmussen's. As can be seen, polls' D-R margins were sometimes described as "about right," with instances of "D edge understated" and "D edge overstated" almost perfectly balancing out over the final two weeks.

In the end, the New York Times exit poll (N = 13,251) showed the national electorate for U.S. House races to consist of 39% Democrats and 36% Republicans. This 3-point difference is slightly smaller than would have been anticipated from some of the late polls, but only slightly. It should also be noted that, even with its huge sample size, the Times exit poll is still a sample survey and thus carries a small margin of error (about +/- 1 percentage point).

One final note: As animated as I get by party ID percentages, I must acknowledge that they are not the whole story. For example, among the final batch of polls, FOX, Pew, and Time all had Democratic respondents outnumbering their Republican counterparts by either 3 or 4 percent. Yet these polls differed widely in their Generic Ballot readings, with FOX and Time having Democrats up 13-15 percent (with FOX's sample explicitly described as consisting of "likely" voters), whereas Pew had them up only 8 (among registered voters) or 4 (among likely voters). Other traditional issues of survey methodology -- such as question wording and order effects -- thus have to be examined for their possible role in these polls' varying D-R margins on the Generic Ballot.

So Now He Tells Us

A quick follow-up on Karl Rove's contention in his now well-known interview with NPR's Robert Siegel:

I'm looking at 68 polls a week . . . and adding them up. I add up to a Republican Senate and Republican House. You may end up with a different math but you are entitled to your math and I'm entitled to THE math.

Obviously, it didn't work out that way.  I discussed the topic here and in a subsequent interview on NPR's On the Media.  But now, thanks to Newsweek (via Kaus) we have the details on just what Rove meant by "THE math:"

The polls and pundits pointed to a Democratic sweep, but Rove dismissed them all . . .He wasn't just trying to psych out the media and the opposition. He believed his "metrics" were far superior to plain old polls. Two weeks before the elections, Rove showed NEWSWEEK his magic numbers: a series of graphs and bar charts that tallied early voting and voter outreach. Both were running far higher than in 2004. In fact, Rove thought the polls were obsolete because they relied on home telephones in an age of do-not-call lists and cell phones. Based on his models, he forecast a loss of 12 to 14 seats in the House—enough to hang on to the majority. Rove placed so much faith in his figures that, after the elections, he planned to convene a panel of Republican political scientists—to study just how wrong the polls were.

So there you have it.  Two plus two always adds to four, but sometimes our models and assumptions don't add up as well as we think they will.

Update: Adam Berinsky, an associate professor of political science at MIT, asks a good question in the comments:

Who were these Republican political scientists that were going to attend Rove's conference? I assume they were lined up before the election. If any of them are MP readers, it would be interesting to get their perspective?

I do not hear that as a rhetorical question. If any political scientists want to chime in on this issue, our "Guest Pollster Corner" is open and your comments would very much be welcome. Who knows, could Karl Rove himself be an MP reader?

Out of Step or Out of Office? (Or Just a Bad Election for Republicans?)

Today's Guest Pollster Corner Contribution comes from Simon Jackman of Stanford University, who takes a closer look at Tuesday's Senate election results. 

Two interesting questions to ask after Tuesday's election are: (1) Were the six defeated Republican senators particularly "out of step" with their respective states? (2) What will be the effect of the Democratic pickups on the look of the new, 110th Senate?

To answer the first question, I assigned a liberal-to-conservative voting score to each senator based on an analysis of the 530 non-unanimous roll call votes cast in the 109th Senate. The resulting scores are scaled to have mean zero and standard deviation 1, with lower (negative) scores reflecting a more liberal voting history and positive scores reflecting a more conservative one (the details of this scoring procedure appear in my 2004 article with Josh Clinton and Doug Rivers in the American Political Science Review). A familiar story results, with Democrats on the left and Republicans on the right, and virtually no overlap between the parties. Lincoln Chafee is estimated to be the most liberal Republican with a voting score of about zero, while Ben Nelson (NE) is the most conservative Democrat (again, with a voting score close to zero). The usual suspects anchor the extremes of both parties: Barbara Boxer (CA) and Ted Kennedy (MA) for the Democrats (scores of -1.9), and Inhofe (OK) and DeMint (SC) for the Republicans (scores of 1.3).

To gauge each state's political complexion, I use a simple and convenient proxy: Bush's share of the 2004 presidential vote in each state, which ranges from a high of 71.5% in Utah, to a low of 36.8% in Massachusetts, with a median of 52.7% (bracketed by Florida's 52.1% and Missouri's 53.3%).

The graph below shows a scatter-plot of voting score against 2004 Bush vote. Each point corresponds to a senator (red for Republicans, blue for Democrats, with senators running for re-election given a heavier shading), with 2004 Bush vote on the horizontal axis, and the roll call voting score on the vertical axis (higher is more conservative, lower is more liberal). The gray line is a regression fit to the data, not to be taken too seriously, but rather as a rough guide to how "out of step" each senator may or may not be. Republicans running for re-election are all numbered: Republican senators losing their seats are numbered 1 through 6, the other Republican senators who were re-elected are numbered 7 through 14. Chafee (RI), Santorum (PA) and Allen (VA) seem to be the only Republican losers who are obvious candidates for an "out of step with their state" kind of story (along the lines proposed by Canes-Wrone, Brady and Cogan in a 2002 article in the American Political Science Review), lying relatively distant from the regression line. DeWine (OH) seems to have been caught in what was an extremely difficult election for Republicans in Ohio, and neither Talent (MO) nor Burns (MT) appears to have been particularly "out of step". And keep in mind that there are several Republican senators just as apparently "out of step" as Santorum or Chafee who did not lose their seats: e.g., Jon Kyl (AZ), who faced no Democratic opposition in 2000, but won 53-44 in 2006; or John Ensign (NV), who won by almost identical margins in both 2000 and 2006.
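The regression-and-residuals diagnostic described above can be sketched as follows; the data points here are invented for illustration, not the actual estimates:

```python
# Sketch: regress roll-call voting score on the state's 2004 Bush vote and
# flag senators with large residuals as potentially "out of step." All
# data points below are invented for illustration.

def ols_fit(x, y):
    """Least-squares slope and intercept for a simple regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

bush_vote = [45.0, 50.0, 55.0, 60.0]   # hypothetical state Bush 2004 %
score = [-1.0, -0.2, 0.4, 1.2]         # hypothetical voting scores

slope, intercept = ols_fit(bush_vote, score)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(bush_vote, score)]
# A large positive residual marks a senator more conservative than the
# state's presidential vote would predict; a large negative one, more liberal.
```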


It is interesting to speculate on the shape of the new, 110th Senate. Chafee goes, replaced by a Democrat, leaving the Maine senators (Olympia Snowe, re-elected with a 74-21 margin in 2006, and Susan Collins) as the most moderate Republicans. It remains to be seen just how liberal or moderate the new Democrats will be. Given their states and the narrow margins with which they are projected to win, it is tough to imagine Webb (D-VA) or Tester (D-MT) being particularly liberal; perhaps they will vote more like the relatively conservative Democrats from the plains states (e.g., Nelson, NE; Conrad and Dorgan from ND) or the other Montana senator (Baucus).

First Impressions: A Good Day for Averaging

Despite exhaustion and sleep deprivation, we want to take a few minutes today for a very quick and very preliminary look at how the preelection polls did compared to yesterday's results.  Since some precincts are still out and some absentee and provisional ballots are still being counted, this quick look is inherently preliminary and subject to change, but at the statewide level, the average of the last five polls in each race did reasonably well.  In every case that we have examined so far, the leader in the average of the preelection polls was the leader on election day.

The following table includes only the most competitive Senate races that we tracked for the Slate Election Scorecard.  It shows the current unofficial result in each state as compared to our final last-5-poll average.  Since the preliminary results we gathered had been rounded to the nearest whole digit, we did the same with the final average.  Again, every leader in the polls ran ahead yesterday.


[Note:  For brevity's sake, the table above displays the results for Joe Lieberman in the Republican column, although Lieberman ran under the "Connecticut for Lieberman" party and has pledged to caucus with the Democrats in the Senate].

The list of the most competitive Gubernatorial races shows the same pattern.  While the averages did not predict the winners perfectly, the leader in the preelection polls was the leader on election day in every case.



[Update:  The original version of the above table omitted the Minnesota Governor's race, which as several commenters noted, is the one state where the nominal leader in the averages was not the winner on election day.  My apologies for the omission -- more details in a comment below.

Averaging results is obviously an imperfect solution to pre-election poll variation.  The outcome in many races was off the "last-5-poll" average by as much as, or more than, the Minnesota Governor's race:  The Pennsylvania Senate race, the Maryland races for Senate and Governor, and the races for Governor in Alaska and Michigan all featured differences from the final average at least as large as Minnesota's].

We hope to have a far more comprehensive analysis in a few days looking at more races and using vote return data that is closer to complete.  And these comparisons obviously make no effort to allocate undecided voters or use any of the more sophisticated measures of survey error.   But for now, the bottom line is that the last-five-poll averages gave a pretty good impression of the likely outcomes of each of these competitive races.
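For readers who want to replicate the method, the "last-5-poll" average is nothing fancier than this. A minimal sketch; the polls below are hypothetical, for illustration only:

```python
# Minimal sketch of the "last-5-poll" averaging approach.
# Poll figures are hypothetical, for illustration only.

def last_n_average(polls, n=5):
    """Average the (Dem, Rep) percentages of the n most recent polls.

    `polls` is a list of (end_date, dem_pct, rep_pct) tuples; ISO dates
    sort correctly as strings, so we sort and keep the last n.
    """
    recent = sorted(polls)[-n:]
    dem = sum(p[1] for p in recent) / len(recent)
    rep = sum(p[2] for p in recent) / len(recent)
    # Round to whole digits, matching the rounded preliminary results.
    return round(dem), round(rep)

polls = [
    ("2006-10-12", 48, 44),
    ("2006-10-20", 50, 45),
    ("2006-10-25", 47, 46),
    ("2006-10-29", 49, 44),
    ("2006-11-02", 51, 43),
    ("2006-11-05", 50, 45),
]
print(last_n_average(polls))  # (49, 45)
```

The oldest poll drops out; everything else is a plain unweighted mean, which is part of why a single outlying poll moves the average so little.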

Live Blogging Election Night

1:40 am  One more "alert reader" emails:

Actually, "alert reader MW" has it exactly wrong. If you look on the details page for Isle of Wight County, there are 22,861 registered voters in the county. With the count at 6,984 for Allen, 5,050 for Webb, and 163 others, the vote total of 12,197 gives a turnout of 53.35%. If the Webb number were actually 9,050, the turnout would be an absurdly high 70.84%. In addition, all the other races in the county are showing vote totals in the neighborhood of 12,000. So it seems unlikely that the Virginia site is incorrect. Still, with the more complete Virginia numbers (but without the benefit of the AP's extra 4,000 votes), Webb seems to be leading by about 1,500 votes with only a handful of heavily Democratic precincts yet to be counted. This one appears headed for the D column.

I, for one, am not nearly alert enough to sort this one out.  Probably a cue to say goodnight and get some sleep.  Thanks to all. 
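For anyone who wants to re-run the Isle of Wight arithmetic from the emails above, a quick check using the registration and vote figures quoted there:

```python
# Checking the Isle of Wight County turnout arithmetic quoted above.
registered = 22861                  # registered voters in the county
allen, webb, other = 6984, 5050, 163

total = allen + webb + other
print(f"{total} votes, {100 * total / registered:.2f}% turnout")
# -> 12197 votes, 53.35% turnout, matching the email

# If the Webb figure were really 9,050, turnout would be implausibly high:
alt_total = allen + 9050 + other
print(f"{100 * alt_total / registered:.2f}% turnout")  # roughly 70.85%
```

The numbers check out to within rounding, which is why the 5,050 figure looks more plausible than 9,050 on the registered-voter base.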

1:15 am  I have been reviewing the House races paying particular attention to the races I rated as toss-ups based on the surveys conducted in October.  Of the pure toss-ups, those that have declared winners so far are splitting about evenly between Democrats and Republicans.  As of now, I see eight toss-up races going to Democrats and seven to Republicans.   Of course, many more are still being counted.

One interesting result involves three districts that showed Democratic challengers leading by significant margins only when we included the automated Majority Watch surveys in the averages:  New York-25, New York-29 and Ohio-15.  The Republican incumbents in these three districts were all reelected.**

**Well, not quite.  Apparently, with 1% of the vote uncounted and a two percentage point lead, Pryce has not been declared the winner in Ohio-15.  Thought I saw a check mark next to that one.  My error -- apologies.  

1:03 am  Alert reader  MW writes:

In the last mid-term election Isle of Wight county had 44.5 percent voter turnout.  The 9050 tally for Webb would put it closer to that number, around 48% turnout, based on a 2004 population of 32,774.  The 5050 number would put turnout closer to 39%.  Not that this means anything per se, but the talk has been of higher voter turnout hasn’t it?

11/8 12:28 am  Back home and checking House races.  Some very interesting comments by Pollster readers are ahead of something I just heard on CNN.  The AP results are not matching the official numbers on the Commonwealth of Virginia State Board of Election web site.  This sort of error is not unusual. 

Specifically, this is from reader Jeremy Pressman:

What is with Isle of Wight county?
The state says Webb has 5050 votes while some news organizations say 9050 - obviously a huge difference in that race.

Compare: Virginia State Board of Election and CBS News

A typo??

It has happened before.  But whose typo?

10:52 pm  I am going to need to relocate and will be offline for about 45 minutes. 

10:45 pm  In the comments, Gary Kilbride suggests a great site to look at where the outstanding vote is in Virginia. As of when I last updated the page, 143 precincts were uncounted and about 40 of those precincts come from four jurisdictions (Arlington, Fairfax, Norfolk and Richmond City) that Webb is carrying by margins of 59% or better.  Another 30 are in Loudoun, which Webb is carrying 51% to 48%. 

10:17 pm While my computer was slooowly rebooting ("virtual memory low" I hate that!) reader VZ emailed to remind me that the Montana tabulations are now online at CNN.  An extrapolation on these numbers (which reflects the estimates applied as the polls closed) shows Tester leading by six (52 percent to 46 percent).  Obviously, as the polls closed 15 minutes ago, this margin is not sufficient to call the race.

10:10 pm Promoting a comment from Mark Lindeman:

I've tried to estimate exit poll margins from a few of the tabulations for the 22 Senate polls I have so far (all of which I think I saved before they were updated). Those tabulations presumably are based on composite estimates incorporating pre-election returns.

When I compare the margins to the Pollster.com pre-election average margins, the exit polls appear to be running about 3.8 points more Dem than the pre-election polls -- which suggests that the actual gap could be wider. Several caveats on that: (1) I can already tell that my eyeballed margin in Missouri is about a point too large, so the gap could narrow. (2) Exit poll discrepancies have generally run high in the Northeast, which is overrepresented. The  biggest discrepancies so far appear to be in non-competitive races, with the possible exception of CT. (3) Some part of this may be attributable to Democratic surge, and I don't have enough info yet to estimate that possible effect.

9:45 pm  That last question really gets at something important.  In some ways, it is a bad idea to think of the estimates we can extrapolate from the exit poll cross-tabulations as "exit polls."  That may sound crazy, but the tabulations in states like Virginia, Missouri and Tennessee are now being weighted (or statistically adjusted) to reflect NEP's best estimate of the outcome at any given moment.  Those estimates are gradually being updated to reflect more of the actual vote from the sampled precincts.  That makes these estimates worth looking at -- the network decision desks certainly are. 

On the other hand, the big risk in extrapolating from the exit poll crosstabulations is the considerable lag since they were run.  Right now, the time stamps are 8:38 for Virginia, 8:49 for Tennessee and 8:10 for Missouri.  So take these extrapolated estimates with a big grain of salt:  McCaskill up by 3 in Missouri, but Webb and Ford down by 2 in Virginia and Tennessee respectively. 

If we could look over the shoulders of the decision desk analysts right now, we would probably be seeing different numbers.  Oh to be a fly on the wall in that room. 

9:26 pm  Very alert reader BM emails with a question: "It looks like the exit poll you quoted in the VA senate race has changed and would indicate that Allen has a majority.  Am I missing something?"

Nope, you're not missing a thing.  The tabulation has been updated: Allen now leads by slightly more among men (55% to 44%), and Webb leads by slightly less among women (53% to 46%).  Notice that the time stamp is 8:38 pm.  What is happening is something I described in this morning's post:  The NEP analysts are gradually replacing the exit poll tallies in each precinct with actual votes from that precinct.  They are also beginning to fill out a second and larger sample of precincts from which they gather hard votes.  At any given time, they adjust the exit poll tabulations (displayed on CNN) to match the current estimate considered most reliable.  And that process appears to have shifted the Virginia estimate -- for the moment -- in Allen's favor (roughly 50% to 49%).

8:56 pm  Gary Kilbride has a very good catch in the comments.  The Missouri exit poll is up on CNN.  Those who decided in the last three days (who were 10% of all voters) went for McCaskill 57% to 38%.  Earlier deciders split nearly evenly, with 50% for McCaskill, 49% for Talent.  The overall margin in the tabulation is far, far too close to tell us who will win, but given how close the pre-election polling looked, a late break, if real, would be decisive for McCaskill.
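The overall tabulation is just the share-weighted sum of the "when decided" groups. A quick sketch using the crosstab figures quoted above:

```python
# Weighting the Missouri "when decided" crosstab: 10% decided in the
# last three days (McCaskill 57-38); the other 90% split 50-49.

def weighted_total(groups):
    """groups: list of (share_of_voters, dem_pct, rep_pct) tuples."""
    dem = sum(share * d for share, d, r in groups)
    rep = sum(share * r for share, d, r in groups)
    return dem, rep

dem, rep = weighted_total([(0.10, 57, 38), (0.90, 50, 49)])
print(f"McCaskill {dem:.1f}, Talent {rep:.1f}")
# -> McCaskill 50.7, Talent 47.9
```

So the late-decider break is worth a bit under three points on the overall margin in this tabulation, which is exactly why it would matter in a race this close.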

8:45 pm  Not sure what to make of this:  CBS News seems to be calling races much more readily than the other networks.  They just called New Jersey, and an extrapolation on the current exit poll tabulation on CNN (with an 8:21 timestamp) shows Menendez with a roughly 10 point lead (54% to 44%).  CNN just called it also. 

8:16 pm LS asks a good question in the comments:  "Why are your blog entries showing a TEN+ MINUTE lag time?"  LS, we have "cached" our servers to handle the very heavy traffic today, and as I understand it, the cache only updates every ten minutes or so.  So, unfortunately, these posts are updating on a ten minute delay.  Also (as with the last update), I'm guessing wrong about how long it takes to write these updates.

8:10 pm Ok, here's another one.  Extrapolate from the vote by gender tabulation now available on CNN and you get a 16 point lead for Democrat Bob Casey (58% to 42%).  CBS has apparently called both Pennsylvania and Ohio for the Democrats, although the other networks I've been monitoring have not.  This should tell us something important:  The analysts are being very cautious about calling the result on exit polls alone.  And these are states with candidates with double digit leads in the estimates applied to the CNN crosstabulations.  For the states with closer margins, those exit polls aren't telling us much. 

7:55 pm  Polls close in five minutes in a bunch of states with closely watched Senate contests, including Connecticut, Maryland, Missouri, Pennsylvania and Tennessee.   

7:43 pm  Interesting.  Want to see the difference between a margin big enough to "call" an election and one that isn't?  Look at Ohio.  Doing the math (all in my spreadsheet this time) the CNN tabulations show Democrat Strickland leading Blackwell in the Governor's race by roughly 26 points (62-36), but Democrat Sherrod Brown leading by 16 (58-42).  The polls have been closed for ten minutes in Ohio and they haven't called it for Sherrod Brown yet.   That should tell you what to think about a margin of less than 3 or 4 points. 

7:20 pm  Ok..here's the way you do the math, and these are not "leaked" results.  From CNN's tabulation.  Virginia:  49% men, 51% women; Allen-Webb 53-46% among men, 43-56% among women.  Allen's approximate number from this tabulation is (.49*.53)+(.51*.43) = .479.  Webb's number is (.49*.46)+(.51*.56) = .511.

That's a 3.2 point margin for Webb, which is still (a) way too close to call on the exit poll estimate alone and (b) for what it's worth, narrower than the leaked number I saw about an hour ago.   And not surprisingly, we continue to watch. [Sorry about the bad math] 
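The same extrapolation, written as a small function, with the figures quoted from CNN's tabulation above:

```python
# Extrapolating a statewide number from a gender crosstab: weight each
# candidate's percentage within a subgroup by that subgroup's share of
# the electorate.  Figures are the Virginia numbers quoted above.

def extrapolate(shares, candidate_pcts):
    """shares and candidate_pcts are parallel lists: each subgroup's
    share of the electorate, and the candidate's percentage in it."""
    return sum(s * p for s, p in zip(shares, candidate_pcts))

shares = [0.49, 0.51]                 # men, women
allen = extrapolate(shares, [53, 43])
webb = extrapolate(shares, [46, 56])
print(round(allen, 1), round(webb, 1), round(webb - allen, 1))
# -> 47.9 51.1 3.2
```

The same two-line calculation works for any crosstab whose subgroup shares sum to one (region, age, party ID), which is why these tabulations are so easy to reverse-engineer.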

7:05 pm Right now, if you go to the CNN exit poll page they are reporting the current "cross-tabulations" for each state where the polls are closed. They do not show the current vote estimate, but they do show the vote by gender, as well as the percentage male and female in each state, and it is not exactly rocket science to do the math.  

6:57 pm Something else to remember:  One of the things the network analysts are doing right now is comparing the exit poll results with averages of preelection polls -- averages not unlike those we have posted here on Pollster.com.   If the exit poll result in a state looks out of line with the preelection result, they will not call the election even if the exit poll lead looks statistically significant.  So if you see a "big" lead in a leaked exit poll, but the networks don't call that state when the polls close, you can assume that they are waiting to see hard data to confirm the exit poll result. 

6:40 pm Something to remember about those leaked numbers you may or may not be seeing.  First, if I've said it once, I'll say it a thousand times:  A "lead" of 2 or 3 points isn't much of a lead in an exit poll.  We are seeing leaked numbers but we are not seeing the current "status" assigned to that state by the exit pollsters -- whether the lead is statistically significant enough to call the race.   My guess, looking at the leaked numbers, is that the networks will need hard vote data to call the Senate races we have long considered "toss-ups."

Second, we all need to remember that in 2004, the exit polls had an average error favoring the Democrats of about 5 or 6 points on the margin.  In other words, if 2006 turns out like 2004, a 6 point lead may not be a lead.
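To make that point concrete, a tiny sketch: subtracting an assumed 5-6 point pro-Democratic bias (the 2004 average; whether 2006 behaves the same way is the open question) from a few hypothetical leaked margins:

```python
# Illustrative only: what a leaked exit poll "lead" looks like after
# subtracting an assumed historical pro-Democratic bias on the margin.
# The 5.5-point figure splits the 2004 "5 or 6 point" average cited above.

def adjusted_margin(exit_poll_margin, assumed_bias=5.5):
    """Dem-minus-Rep margin after removing an assumed pro-Dem bias."""
    return exit_poll_margin - assumed_bias

for leaked in (6, 3, 10):
    print(f"leaked Dem +{leaked} -> adjusted {adjusted_margin(leaked):+.1f}")
```

On these assumptions a leaked 6-point Democratic lead is a coin flip, and a 3-point lead actually favors the Republican, which is the whole argument for waiting on hard vote data.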

6:30 pm - So what is this live blogging thing about?  Two years ago, I vowed not to post leaked exit polls, and kept to that pledge, but in so doing opted out of the opportunity to comment on all the leaked data flying around the blogosphere.  Also, we had just brought our three-day-old son home from the hospital, and so any excuse to avoid the computer was worthy.  Tonight I want to do something different.  The first wave of leaked estimates is now out, and I want to say a few things about it.  I won't post the numbers here (and I'm not sure this will work) but I will try to offer some advice about how to read what you may be seeing. 

Sound crazy?  Maybe, but bear with me.  I'll keep posting to the top of this entry.

6:22 pm - For those who may have missed it at the bottom of the last post, here are the best links I have for the NEP network sites both reporting vote results and (eventually) displaying exit poll cross-tabulations.  When those will appear is anyone's guess. 

Slate 13: Final Update

Our next-to-last Slate Election Scorecard recaps where things stand in the closest Senate races: Maryland, Missouri, Montana, Tennessee and Virginia.  Meanwhile I wanted to give one last update on the overall "mashed-up" margin across all the Senate races on the Slate Scorecard. Consistent with the trend on Professor Franklin's Senate "national forces" charts (which have their basis in the same underlying data), the average Democratic margin across those races turned downward over the last week -- for the first time in seven weeks.


One interesting twist to these findings is that the Republican Bob Corker's gains in Tennessee explain virtually all of the Democratic decline.  The last 5-poll average in Tennessee went from a dead-even tie to a 7.4 point Corker lead in just a week.  If we remove that race from the overall average, there is still a leveling off of the six week Democratic trend but virtually no decline.  Republicans saw gains on the averages in some states over the last week, but so did Democrats and, except for Tennessee, the changes cancel out.


Democracy Corps 50 District Tracking: Final Survey

Democracy Corps, the project of Democratic pollster Stan Greenberg and Democratic consultants James Carville and Bob Shrum, just released their final tracking survey (memo, results) conducted among voters in 50 competitive districts currently held by Republicans.  To be clear, they do not conduct 50 surveys in 50 districts, but one sample of 1,201 likely voters spread out across the 50 districts.  While this approach does not allow for district-by-district projections, it is the only public survey available that has tracked attitudes on a weekly basis in the most competitive congressional districts (our massive collection of public polls, by comparison, provides what amounts to a "time lapse" snapshot of these districts taken over the course of October). 

There is evidence of a slight shift of the playing field to the Republicans at the end of last week, fully reflected in the first half sample of the survey on Thursday night. That shift includes perhaps a 2-point gain in party identification advantage amongst these likely voters (with and without leans) in the Republican districts; a 3-point rise in "right track" (though only 34 percent), a 3-point gain in Congress job approval (though only 34 percent) and a 4-point rise in "warm" reactions to Republican Congress (though only 38 percent).

Republican congressional voters' high interest in the election is up 4 points, but still lags 11 points behind that of Democratic voters. Together, that has likely cut the Democrats' margin by 2 points -- and that is not trivial in districts where Republicans are near 50 percent. But more striking is how stable this race is and how endangered the incumbents are. While the voting electorate has become marginally more Republican, it has not moved key indicators.

Of course, the polling company that conducted the survey -- Greenberg, Quinlan Rosner Research -- is a Democratic firm.  So take these results with whatever grain of salt you deem appropriate. 

The memo also makes some interesting observations about their year-long experiment with a "named" congressional ballot question. It is well worth reading in full.

Generic Ballot Update

Three of the last six national polls have found sharp downturns in the Democratic lead on the congressional generic ballot. After rising steadily since the week before the Foley scandal, the Democratic advantage has now begun to turn down. USAToday/Gallup, ABC/Washington Post and Pew Research Center all find substantial drops. Newsweek, Time and last week's CBS/New York Times polls do not find that decline, but rather show stability at around a 15-point Democratic lead.


While these shifts MAY signal a sharp change of opinion going into the weekend, the magnitude of the drop is quite uncertain with only three polls. We routinely see lots of variation across polls, especially when looking at the generic ballot margin. Nonetheless, the shifts have been enough to convince my "local regression" estimator (the blue line in the figure) to turn down for the first time in a while. Since the blue trend line considers ALL the polls, it is not overly sensitive to single polls, though the combined weight of Gallup, ABC/WP and Pew is enough to move it down about 4 points, from +15 to +11 for the Dems. It is likely that the individual polls are overstating the extent of the downturn. The trend estimator captures the "signal" among all the "noise" from poll to poll. It would take more polls to "know" how much this downturn really represents. But the "poll" taken on Tuesday will answer the question for sure.

However, the current estimate of the Democratic lead based on the trend of all recent surveys remains at roughly +11. While that is down from the peak of early October (see my post and comparison graphic from earlier this week), the final Democratic advantage has not been over 10 points (or even close) in the last 12 years. 
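For the curious, the flavor of the "local regression" trend estimate can be sketched with a kernel-weighted local mean. This toy version uses hypothetical poll margins and a local mean rather than the local line a real loess fit uses, but it shows why a few late polls move the trend line only partway:

```python
# Toy local-trend estimate: at a given date, take a weighted mean of
# nearby polls, with weights falling off via a tricube kernel.
# Poll margins below are hypothetical, for illustration only.

def tricube(u):
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def local_trend(days, margins, at_day, bandwidth):
    """Kernel-weighted mean of poll margins near `at_day`."""
    weights = [tricube((d - at_day) / bandwidth) for d in days]
    return sum(w * m for w, m in zip(weights, margins)) / sum(weights)

days = [0, 3, 6, 9, 12, 15, 18]
margins = [15, 14, 16, 15, 11, 10, 11]   # Dem lead; three late drops
est = local_trend(days, margins, at_day=18, bandwidth=10)
print(round(est, 1))  # between the old +15 level and the new low-+10s
```

Because every poll inside the bandwidth contributes, the estimate lands between the earlier +15-ish readings and the three late drops, rather than chasing any single poll.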


Dem wave crested; advantage shrinks


Across the board, in Senate, House and Governor's races, the wave boosting the Democrats crested about 10 days ago. Since then the advantage Democrats have built throughout the year has been reduced by 1.5 to 3.5 percentage points. While forces are still a net positive for the Democrats, they are weaker than they were during the week before Halloween. This implies that the most competitive races will now be harder for Democrats to win and easier for Republicans to hold, and that the anticipation of a major surge to Democrats needs to be reconsidered. While race-by-race estimates still show an 18 seat Democratic gain, with 27 seats as tossups (see our scorecard at Pollster.com here), this reduction in national forces makes it less likely the Democrats sweep the large majority of the tossup seats and could result in total gains in the 20s rather than the 30s or even 40s that looked plausible 10 days ago.

This cresting of national forces has taken place across Senate, House AND Governors races and occurred essentially simultaneously around October 25th. The estimator here, plotted as the blue line in the figures, is a measure of national effects that are common across all races. The estimate uses all polls in all races, but fits the Senate, House and Governors races independently, yet produces similar results for each in terms of timing, though with some variation in magnitude. (For more on the estimation method, see this earlier post.)


As of last Thursday's data, the downturn was clear for the Senate but no indication of change had appeared for the House. Adding the polling data from Friday, Saturday and Sunday, the downturn is now apparent there as well. I did not do the Governors last week.


The congressional generic ballot does NOT show any such change (yet!). In my earlier post I cautioned that the generic ballot might not be reflecting a realistic assessment of the Democratic advantage. It may also not be reflecting the last minute dynamics of the campaign this year.


So what does this mean? The House still looks likely to go Democratic, but probably by a smaller margin than it might have a week ago. For a while, the Senate looked to come down to who won two of VA, TN and MO. Now MT must be added to that, and TN moved to lean Rep, perhaps requiring a Dem sweep of VA, MO and MT. (Momentum in VA remains pro-Dem, while MO is completely flat and MT is strongly trending Rep.) Possible, but more of a trick than winning 2 of the former 3 states. The shrinking margin in MD may well end with a Dem win, but clearly some races that were viewed as likely Dem pickups or holds are now somewhat more in doubt than before, possibly including RI.

With two days to go there is a time limit on this dynamic. Reps may not have time to profit greatly from this trend, and we've seen sharp changes before, so Dems may be able to recover (Republicans had a bad end of the week last week, after John Kerry and the Dems had a bad start to the week). So no firm prediction here, but the evidence is that the Dems are falling back from their best chance of large gains.

Update: Two New Toss-Ups in MD

As should be evident from our "most recent polls" box on the right, the Mason-Dixon organization released their (presumably) final round of statewide surveys today.  We just updated our charts, and the new polls helped push both the Maryland Senate and Maryland Governor's races into the "toss-up" category. 


Joe Lenski Interview: Part 2

Joe Lenski is the co-founder and executive vice president of Edison Media Research. Under his supervision, and in partnership with Mitofsky International, the company of his late partner Warren Mitofsky, Edison Media Research currently conducts all exit polls and election projections for the six major news organizations -- ABC, CBS, CNN, Fox, NBC and the Associated Press. In Part I of his interview with Mark Blumenthal, he spoke generally about how the networks conduct exit polls and how they use them in their system to project winners on Election Night. The interview concludes with a discussion of the problems the exit polls experienced in 2004 and what will be done differently this year.

I want to ask more generally about how things will be different this year. First, let's talk about the issue of when and how you will release data to members of the National Election Pool (NEP) consortium and other subscribers. In the past, and please correct me if I'm wrong, hundreds of producers, editors and reporters had access to either the mid-day estimates or early versions of the crosstabulations that you would do, and the top-line estimate numbers would inevitably leak. How is that process going to be different this year?

The news organizations are really taking this challenge seriously on how to control the information, for a couple of reasons. First, each of these news organizations has made a commitment to Congress over the years that they would not release data that would characterize the winner or leader in a race before the polls have closed. So in essence, when this data leaked, it undermined the promise they had made to Congress.

The other thing is that we know these are partial survey results. No polling organization leaks their partial survey results. If it's a four-day survey they don't leak results after two days. Similarly if it's a twelve-hour exit poll survey in the field you're not going to release results after just three hours of interviews. So the data will not be distributed to the news organizations until 5:00 p.m. in 2006, and that's a change from all the previous elections. The goal is that this will be more complete data and also we will have more time to review the data and deal with any issues in the data that look questionable that we need to investigate. It will still give news organizations time for their people to look at the data before the polls start closing.

In 2004 at least one network started posting the demographic cross-tabulations online for specific states. I believe these started appearing almost as soon as the polls closed, maybe shortly thereafter. Do you have any idea if they are planning to repeat that, or if they will hold off on posting tabulations until most of the votes have been counted?

Again, that's an editorial decision by the news organizations, but they are well within their rights, as soon as the polls close within a state, to publish those results.

Let's switch gears a bit. There are a bunch of stories out now about the increase in demand for early voting or absentee ballots. In Maryland, where there are concerns about the voting equipment stemming from problems they had during the primary there, there are apparently so many requests that they are running out of ballots. What are you doing to cope with the fact that fewer and fewer voters are voting at polling places?

In the states with significant amounts of absentee or early voting, we are doing telephone surveys the week before the election of voters who have either already voted absentee or have their absentee ballot and plan to send it in or vote in person before Election Day.

Have you had time to adjust for the demand in Maryland? Is Maryland going to be one of those states?

No, Maryland will not. Most of these decisions were based on the share of the vote that was absentee in 2004 and Maryland in 2004 had a relatively small number. Yes, there will be big increases in several states. Maryland is one. Ohio may be another. We saw in 2004 really large increases in states like Florida and Iowa in terms of the people who voted early and absentee. So as absentee and early voting increases the need for us to do more of these telephone survey supplements to the exit poll is going to continue and we will need to budget for more of them.

As long-time readers of my blog will certainly know, you and Warren Mitofsky co-authored an analysis in January 2005 of the problems experienced with the 2004 exit polls and the overall system. In particular, there appeared to be a problem with interviewers deviating from the random selection procedure, especially when they faced greater challenges in completing interviews. What will you be doing differently this year to try to fix those problems?

Well a lot of it has already been done. We sat down with all of the NEP members after that report came out and did a thorough review of all the recruiting and training procedures for the exit poll interviewers and we got a lot of input from all of the professionals that work at the news organizations that do their own surveys. They looked at the materials we were using. We had discussions and we came up with an improved training manual. We also prepared and filmed a training video that all interviewers are required to watch. We developed a new more rigorous training script and a quiz or evaluation at the end of that script to make sure that the interviewers understand the important facets of their job.

In addition to all that we have the input rehearsals that we have done every year, where we have two days in which we act like it's Election Day. People call in with test results using the same phone numbers and the same questionnaires they will use on Election Day, just to make sure they understand how the process works. So all of that has already gone into effect, and I think our interviewers are much better trained this year than they were in 2004.

Another factor that came into play is that we found the error rates tended to be higher on average in precincts where there were younger interviewers, especially interviewers under the age of 25. This isn't to malign the abilities of interviewers under the age of 25; there just seems to be an interaction between older voters and younger interviewers that makes older voters less likely to fill out surveys presented to them by younger interviewers. So we've also made a concerted effort to increase the average age of our interviewers.

I wrote just the other day about a question buried at the end of a recent Fox News poll that showed Democrats were significantly more likely to agree to participate in exit polls than Republicans. I'm wondering if you think that differential willingness to participate is worse now than in 2004? And how can you do an unbiased random sample at the precinct level under those conditions?

Well, typically in past non-presidential off-year elections there haven't tended to be large exit poll biases. I have a feeling, though, that 2006 has, like 2000 and 2004, more passion than the typical off-year election. So that does worry me to some extent. Again, the more we can train our interviewers to follow the proper sampling procedures, the more we can eliminate a good bit of the bias that comes in from people in essence volunteering to take an exit poll when they weren't randomly selected to participate.

It still doesn't correct for the differential non-response that might exist even within the sampled voters. So you could properly select every fifth voter say as they are leaving a polling place but if 55 percent of the Republicans fill out a questionnaire but only 50 percent of Democrats fill it out you are still going to have a bias from non-response even if your sampling is absolutely perfect. What we are going to do, and what all the decision teams looking at this data are going to do, is know that the possibility for that type of bias exists and we'll be careful especially in states where we have seen that type of survey bias before projecting any winners.
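Lenski's point can be put in numbers with a quick sketch. The 55/50 response rates are the hypothetical figures he uses; the 50-50 electorate is my own illustrative assumption:

```python
# Differential non-response, in numbers: sampling every 5th voter can
# be perfect, but if 55% of Republicans and only 50% of Democrats
# agree to fill out the questionnaire, the completed interviews tilt
# Republican.  A 50-50 electorate is assumed for illustration.

def observed_share(true_share, resp_own, resp_other):
    """Share of completed interviews for a group with true vote share
    `true_share` and response rate `resp_own`, where the rest of the
    electorate responds at rate `resp_other`."""
    own = true_share * resp_own
    other = (1 - true_share) * resp_other
    return own / (own + other)

rep = observed_share(0.50, 0.55, 0.50)
print(f"Republican share of interviews: {rep:.1%}")   # about 52.4%
print(f"bias on the margin: {2 * (rep - 0.5):+.1%}")  # about +4.8 points
```

A five-point gap in response rates turns a true dead heat into an apparent near-five-point Republican lead on the margin, which is why the decision desks treat even "clean" exit poll samples with caution.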

The NEP exit polls are designed for a variety of different purposes, to provide an analytical tool to help people understand why people voted the way they did, to help assist with the projections, but they are not designed, at least as I understand it, to help detect fraud. So my question is, if you were in the business of designing an exit poll to do that in the United States, how would you design it differently than you do for NEP?

In answering this question I want to be careful not to malign the design of the exit polls as they now exist, because they really serve the purpose of the news organizations and what they need. What they need is a lot of information about who voted, how they voted, and why they voted, and to be able to present that information as quickly as possible on Election Night.

So there are some things we do in designing these exit polls that we wouldn't do in designing an exit poll whose sole purpose was to validate the results precinct by precinct, county by county, state by state. One is that we interview on average about 100 voters per polling location. That's basically because several costs come in. First, there are printing and shipping costs to get the questionnaires to polling places. Second, there is the interviewer cost -- you would probably need to hire more than one interviewer per location if you were doing more interviews. And third, we need to get all of this data into our system by the time the polls close in that state so that it can be reported on Election Night. So there would also be the time cost -- it would be cost prohibitive to get all that data into the system on Election Day.

So if you were going to design a system solely for the purpose of validating election results, one, you would try to interview everyone, or at least approach everyone at the polling locations where you are trying to do that validation. Two, you would have a shorter questionnaire. You would not have the twelve to twenty-five questions that are being asked in our questionnaires for analytical purposes. You wouldn't be asking a lot of the questions about the important issues in the race or be asking questions about religion, whether people are married, income, education, etc. The studies I've seen -- and there have not been enough of them, and some of them are fairly old -- tend to conclude that you get the highest response rate and the lowest error from exit poll questionnaires that are about six to seven questions long. That would give you very little data to analyze the election, and that's why the NEP questionnaires are longer than that. But if you were trying to increase response rates and decrease bias and within-precinct error, you would have shorter questionnaires as well.

The other thing you would do is to interview for all the hours in which the polls are open. Because of the time constraints, we tend to stop interviewing at most polling places about 30 to 60 minutes before the polls have closed, so we can get all the data for that day into our system. If you were going to do a survey that would validate the entire day's voting you would do interviewing from poll opening all the way to poll closing, including that last hour. So those are some of the things you would change in the design if you were designing exit polls solely to validate the voting results at the precinct level.

About That US House Scorecard

I want to take a close look at our House of Representatives summary scorecard, partly to address some of its shortcomings, but mostly to try to get an overall sense of what the available public polling is telling us about the likely outcome. This post may get a bit long and a little esoteric, but there are ultimately two big takeaway messages: The first is that we still see a remarkably large number of races -- 25 to 50 depending on which polls you trust -- where public polling data is inconclusive. The second is that if we assume that the pure "toss-up" races split about evenly between the parties, the Democrats stand to gain 30 to 35 seats on Tuesday.

Many of you have posted comments or sent email asking why we took the approach we did for classifying House ratings. We considered many different approaches. None were perfect. We ultimately chose the simplest -- replicating the approach used for the statewide races -- because creating something more complicated required far more time and computer programming than we had available. In this post I want to look at what some alternative approaches would tell us about who is ahead and who is behind.

First, let's review how our scorecard classifications work. We take an average of the last five random-sample polls in each district and then classify each race based on the statistical significance of the leader's margin. We classify races where a candidate leads by one standard error or better as "leaning," and leads of at least two standard errors as "strong." The rest we classify as toss-ups, meaning that the surveys provide no conclusive evidence about which candidate is ahead. If no polling is available, we assume no change in party and assign the district a "strong" status for the incumbent party.
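As a quick illustration, that classification rule can be sketched in code. This is a simplified sketch, not the actual scorecard program, and it assumes the standard error is computed from the spread of the last five poll margins:

```python
# Simplified sketch of the "lean / strong / toss-up" rule described above.
# Assumption: standard error taken as the standard error of the mean
# margin across the last five polls (the real scorecard may differ).
import statistics

def classify(margins):
    """margins: the leader's margin (in points) in each of the last five
    polls, positive values favoring the same candidate. Returns one of
    'strong', 'lean', or 'toss-up'."""
    mean = statistics.mean(margins)
    se = statistics.stdev(margins) / len(margins) ** 0.5
    if abs(mean) >= 2 * se:
        return "strong"
    if abs(mean) >= se:
        return "lean"
    return "toss-up"

print(classify([10, 9, 11, 10, 10]))  # consistent lead -> "strong"
print(classify([1, -2, 3, -1, 0]))    # noisy, near-zero -> "toss-up"
```

The point of the rule is just that a lead only counts when it is large relative to the poll-to-poll noise.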

That last step is important for House races, because we can find no public poll data for 351 of the 435 districts. However, very few of those missing districts are considered even potentially competitive by the various respected handicappers. We currently itemize seven theoretically competitive seats as "no-poll" in the scoreboard (because the Cook Political Report listed these among the seats with the "potential" to become competitive), but Cook considers five of the seven incumbents in these districts "likely" to be reelected (i.e., "not considered competitive at this point").

So far, so good. But one big problem, as many of you have pointed out, is that polls in House districts are far less numerous than those in statewide contests. As such, a lot of those "last 5 poll" averages include some pretty stale results. While we have logged more than 250 new House polls since October 1, there are still only 32 districts with five or more polls to average. Applying the "last 5 polls" filter still leaves 37 polls from September -- and 25 polls from the summer months -- contributing to the averages we use to classify districts.

In some cases, those stale results can give a very distorted impression of where the race stands today. Consider Pennsylvania-07, the district currently represented by Republican Curt Weldon. We currently rate that district a toss-up, based on the average of five polls that includes two from September and one from March. Weldon trailed by an average of seven points in the two polls conducted in October -- enough to shift the district to "strong" Democrat status.

So I put all of our House data into a big spreadsheet and did some "what-if" analysis. The first question I asked was, what would happen if we had applied a filter so that only polls released since October 1 could be included in our averages. Here is the result:


As the table shows, the net impact on the scoreboard is not dramatic but improves the lot of the Democrats: The count of seats at least leaning Democratic grows from 221 to 222, while the count at least leaning Republican drops from 187 to 184. The number of toss-up seats grows from 27 to 29, and all but two of those toss-up seats (Georgia-12 and Indiana-07) are currently held by Republicans.

The net changes on the scoreboard obscure a bit more reshuffling at the district level. For those keeping track: Four districts (Florida-16, New Hampshire-02, Ohio-02 and Pennsylvania-07) move from toss-up to Democrat, but three (Indiana-07, Iowa-01 and New York-20) shift from leaning or better Democrat to toss-up. Three more seats (Arizona-05, Kentucky-02 and California-50) move from Republican to toss-up based on unfavorable trends since September.

The table also reminds us of the relatively small number of surveys available in many of these districts. The good news is that the average number of polls per district drops only slightly (from 3.4 to 2.9) when we count only the October polls. The bad news is that more than half of the competitive districts have been polled two or fewer times (40) or not at all (12) in October.

While we're at it, there are a few more good "what if" questions we can ask....

[11/5 (11:45): Picking up where we left off yesterday...].

What about partisan polls? As I have noted previously, the House data includes quite a few internal campaign polls, roughly one of every four in our database, and polls from Democratic campaigns outnumber those from Republicans by more than four-to-one (85 to 20). Since October 1, one-in-five House polls have come from partisans, and again those polls have been released mostly by Democrats (42 to 10).

Do all these Democratic polls tilt our scoreboard in favor of the Democrats? Yes, but only slightly. If we focus on the averages filtered to include only polls released since October 1, removing the partisan polls leaves the number of Democratic seats unchanged at 222 and shifts a net two seats to the Republicans (from 184 to 186). The absence of favorable internal polls makes three potential Democratic pickups seem less likely (Florida-13, Nebraska-03, and Ohio-01), but also leaves three other Republican incumbents looking more vulnerable (New York-19, North Carolina-08 and New York-20).


What about the Majority Watch automated polls? Their two waves of October surveys account for roughly a third of the House district polls released in the last month, and as the table shows, removing them from the averages does reduce the Democratic advantage on the scorecard. Keeping our October-only filter on, the number of Democratic seats drops from 222 to 215 with the Majority Watch surveys removed, while the number of Republican seats increases from 186 to 192.

What is driving the change? Removing the Majority Watch surveys changes our classification of 15 seats. In seven of these districts, Majority Watch conducted the only public polls released in October; all seven are seats held by Republicans and were classified as toss-ups or likely Democratic pick-ups using their data. So without any polling data available, our model assumes "no change" and shifts all seven seats to the Republican column. In another eight seats, the absence of the Majority Watch surveys tips the balance in the averages just enough to shift our classification -- six seats move toward the Republicans and two move to the Democrats.

Those changes raise an important question: How do the Majority Watch results differ from those of other pollsters in districts where we have other sources of data available? I count 40 districts in which public polls were released in October by both Majority Watch and other pollsters. So I went back to my big spreadsheet and averaged the averages for those 40 districts two ways: once including only the Majority Watch surveys, once including only the results from other pollsters.


The results are a bit different. The Majority Watch surveys indicate a 3.3 point lead for the Democratic candidates in those districts (49.1% to 45.8%), compared to a 1.0 point lead from other pollsters (44.6% to 43.6%). But notice that the percentage going to undecided or third party candidates is more than twice as large on the traditional telephone surveys (11.8%) as on the Majority Watch automated surveys (5.1%). So we have two potential explanations for the difference: One is that the automated surveys reach different kinds of voters (who tend to be more opinionated and more Democratic in their preferences). Another is that both types of surveys reach the same mix of voters, but that the absence of a live interviewer better simulates the "secret ballot" and entices more uncertain voters to express their true preference for the Democratic candidates. Which theory explains the difference here? Take your pick.
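For readers checking the numbers, the margins and the undecided/other shares quoted above follow directly from the two sets of averages:

```python
# Margin and undecided/other share implied by a pair of candidate averages
# (the four percentages are the ones quoted in the text).

def margin_and_rest(dem, rep):
    """Return (Dem lead in points, percent undecided/other)."""
    return round(dem - rep, 1), round(100 - dem - rep, 1)

print(margin_and_rest(49.1, 45.8))  # Majority Watch: (3.3, 5.1)
print(margin_and_rest(44.6, 43.6))  # other pollsters: (1.0, 11.8)
```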

Another question: What if we remove both the partisan and automated surveys? Unfortunately, at that point, this particular "model" essentially blows up because we have no polls to look at in 39 of the competitive districts. Since more than two thirds of the October "no-poll" districts (28 of 39) are currently held by Republicans, removing these polls shifts the scorecard in the Republican direction. Adding back the pre-October data nets us only five additional districts, but makes virtually no change in the scorecard numbers.


Still, even if we look only at the smaller number of districts with traditional live interviewer surveys conducted by independent pollsters, we still see Democrats leading by statistically meaningful margins in nine Republican districts. Moreover, these same surveys show Democrats with significant leads in 11 districts currently held by Democrats and indicate "toss-up" races in another 20 seats now held by Republicans.

[11/5 - 4:30 p.m. - Back again. And finally...]

One more thought about the last paragraph. Those 20 "toss-up" races exclude 9 districts with no traditional polls released during October that are currently rated either "toss-up" or "lean Democrat" by the Cook Political Report.

But let me try to sum this up, following the same formula I used in discussing these results for the Slate Election Scorecard earlier in the week. The math is easier given one important finding: Not a single Democratic candidate in a district now held by a Democrat is currently trailing, regardless of the combination of polls examined. So the text and the table that follow focus on potential Democratic pickups.


  • Eight seats currently held by Republicans show a Democrat leading by a statistically meaningful margin regardless of what combination of polls we look at: Arizona-8, Colorado-7, Indiana-2, Indiana-8, North Carolina-11, New Mexico-1, Ohio-18, and Pennsylvania-10.
  • One seat deserves its own category: The one and only poll in the Texas-22 district formerly represented by Rep. Tom Delay shows Democrat Nick Lampson leading. However, a complicated ballot (Republican Shelley Sekula-Gibbs is a write-in candidate) makes this result tenuous.
  • Nine more Republican seats look to be in statistically meaningful jeopardy, but only when we count the automated Majority Watch surveys (either because those are the only surveys available or because they tip the balance making the Democrat's lead statistically meaningful): Florida-16, Iowa-1, New Hampshire-2, New York-24, New York-25, New York-26, New York-29, Ohio-15, Pennsylvania-6 and Pennsylvania-7.
  • Three more Democrats would show significant leads if we include the internal surveys released by partisan pollsters: Florida-13, Nebraska-3, and Ohio-1.

To sum up: If you trust the automated Majority Watch surveys and assume a pickup in Texas-22, then Democrats are leading in exactly the 18 seats they need to win a majority. If you trust all polls (including those released by partisans on both sides), then they currently lead in enough districts to pick up 21 seats. And they are not currently trailing in any.

But even more important: Polls have been conducted in October in another 29 seats where the averages indicate a statistical tossup. Only two of these seats are currently represented by Democrats. How well the Democrats ultimately do depends on how many of these still-too-close-to-call races they ultimately win. If they split evenly, then Democrats are looking at a gain of between 29 and 34 seats depending on which polls you trust.

But wait -- we need to remember one very important caveat. Even if we exclude the pre-October surveys, we are still looking at something of a time-lapse "snapshot" of voter preference. If Republicans have made late gains nationally over the last week (and at least two new national surveys out today suggest that they have), then these results may overstate the likely Democratic gains. As usual, we will need to wait for the actual results to know for certain.

UPDATE: On that last note, be sure to see the post by Charles Franklin on late trends in the generic Congressional ballot.

Joe Lenski Interview: Part 1

Joe Lenski is the co-founder and executive vice president of Edison Media Research. Under his supervision, and in partnership with Mitofsky International, the company of his late partner Warren Mitofsky, Edison Media Research currently conducts all exit polls and election projections for the six major news organizations -- ABC, CBS, CNN, Fox, NBC and the Associated Press. He spoke with Mark Blumenthal last week about plans for the exit polls and network projections this year.

It's really almost surreal for me -- and I think for all of us -- to think about an Election Day and the topic of exit polls without the presence of your mentor and former business partner, the late Warren Mitofsky. A few days after his passing in September, I wrote a post that recalled a phone call I made about 16 years ago, when I was young and foolish, and how astonished I was in retrospect that he took the call, and that he was patient and kind in answering what was really a very naive question. You sent me an email a few days later, and I wondered if you could share with our readers the thoughts you shared with me.

It's true, Warren did have a real enthusiasm for being around young people, teaching young people, listening to their questions and answering them. In sorting out his affairs after he passed, I looked at his calendar, and he was involved with just about every university in the area that's doing some sort of polling. He was on the board of the Marist Poll, he was teaching a course on exit polling at Columbia University, he was helping Seton Hall establish their sports poll, he was scheduled to do a lecture at American University in DC. So the 27-year-old Mark Blumenthal who called him 16 years ago wasn't an oddity. There were twenty-somethings all over the place who had been learning from him in the classroom or in New York AAPOR workshops, or making the same types of calls you made and getting answers from him over the phone. I heard a lot about that at his memorial service, and I saw a lot of tributes similar to the one you wrote mentioning very similar stories.

Well, let's get to the business at hand. I'd like, in the limited time we have, for you to briefly give our readers some sort of sense of how this whole operation works. I think most political junkies understand that television networks conduct exit polls on Election Day and project winners at the end of the night. I don't think they have a sense for how complex this whole operation is. Could you give us a brief explanation of how it works?

Sure. First, there is a group called the National Election Pool [NEP], and just so everyone understands who that group is, it is the pool of the five television networks -- ABC, NBC, CBS, CNN, FOX -- and the Associated Press. We at Edison Research and Mitofsky International have a contract with those six members, and we provide them with exit polling, sample precinct vote counts, and election projection information on Election Day and Election Night. The news organizations have the editorial control: they choose the races to cover, they choose the size of the samples, they choose the candidates to cover, they write the questions that are asked. We at Edison Research and Mitofsky International implement that -- we have a system in place where this year we'll have over a thousand exit poll interviewers around the country at more than a thousand polling locations. We will have more than two thousand sample precinct vote count reporters at more than two thousand locations around the country. We'll be gathering that information during the day, distributing it to the six members and several dozen other news organizations that subscribe to our service, and we will also be providing our analysis and projections of the winners of those races at poll closing and after poll closing as actual votes come in. The news networks and the Associated Press reserve the right to make their own projections based on our data and any other data they may collect, and they have their own decision teams in place to review any projections we send them. But basically the source of the data they will be using on elections is the exit polls and the sample precinct vote counts our interviewers and reporters collect, and the county vote returns that are collected by the Associated Press and fed through our system into our computations and out to the members and subscribers.

What sort of system or algorithm will you be using to project which party wins control of the House of Representatives?

We at Edison-Mitofsky are not going to project House seats. The individual news organizations are going to make projections seat by seat. What we are going to provide is an estimate of the national vote by party in the House races, but there are a bunch of complications in taking that and applying it at a seat-by-seat level. It's a lot like the Electoral College. We know the popular vote doesn't necessarily translate into Electoral College votes. Similarly, because of gerrymandering, we know that the popular vote for the House does not translate directly into House seats either.
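[Editor's Note: a toy example makes the vote-versus-seats point concrete. The district numbers below are hypothetical, not from the exit poll system.]

```python
# Toy illustration (three hypothetical, equal-turnout districts) of how
# the national House vote can diverge from the seat count.

districts = [  # (Dem %, Rep %)
    (90, 10),  # Dems pile up votes in one lopsided win...
    (45, 55),  # ...but narrowly lose the other two seats
    (45, 55),
]

dem_vote = sum(d for d, r in districts) / len(districts)
dem_seats = sum(1 for d, r in districts if d > r)
print(dem_vote, dem_seats)  # 60.0% of the vote, but only 1 of 3 seats
```

A party can "waste" votes in lopsided wins while losing close races elsewhere, so its national vote share and its seat share can diverge substantially.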

But in addition there are other complications. One is that there are 55 House districts where one party or the other has not nominated a candidate. And this year, because of the added Democratic activism, there are only 10 districts where Republicans are running unopposed but 45 districts where Democrats are running unopposed. So there are 45 districts where the Democrats are going to get 100 percent of the vote for the House, and those districts alone are going to account for 4, 5, or 6 points of Democratic advantage, solely from uncontested races.

So I think all those factors could contribute to Democrats having a sizeable lead in the popular vote for the House and in the exit poll estimate of the popular vote for the House, but that might not necessarily translate into a Democratic majority in seats in the House or a Democratic majority in seats that is as large as the popular vote that they are going to receive.

So again, early exit poll estimates or even later exit poll estimates may show a significant Democratic lead in terms of the Democratic vote for the House that may not translate into House seats, but that doesn't mean the exit poll is wrong. It just means the exit poll is measuring something different. It's measuring the number of votes by party; it's not necessarily measuring the number of seats per party.

[Editor's Note: For a detailed discussion of the relationship between the national vote for Congress and seat gain or loss, see this post by Pollster.com's Charles Franklin].

So the consortium members will have that data available to them on Election Night and may use that as part of their decision matrix to essentially call the race for the House. Is that right?

Again, this is an editorial decision the news organizations themselves will make. To predict the number of seats in the House, you really have to look at those 40, 50, 60 competitive seats, district by district, and make estimates on each one.

One of the misperceptions about the exit poll projection system, I think, is this: the mid-day estimates based on the exit polls would often leak, people would see them, and they would assume that a candidate leading by two or three or four percentage points was going to win. What can you tell us about the margin of error, if you will, for those exit poll estimates? If you look at them at the end of the day, just before the polls close, how much of a margin would a candidate need before you consider it statistically meaningful enough to call the election?

Well, that varies based on the size of the precinct sample, the number of interviews taking place in each state, and the correlations with past vote -- the higher the correlations, the lower the calculated standard error. One of the interesting things about your question is that everywhere the data leaked in 2004, it was only the estimates that leaked -- never our computational status, which tells whether the race is "too close to call," or what we call "leading status," or "call status." All of those races -- and there were four presidential states where Kerry had a one, two, or three point lead in the exit poll that ended up going for Bush -- none of those ever moved outside the "too close to call" status when we were distributing that to our members. So all the news organizations that had paid for the data and were looking at it knew those races were too close to call, even if it was 51-48 in the exit poll. Those margins were well within the standard errors we calculate before we assign even a "leading" or "call" status to a race. Nothing that leaked on the web included the standard errors or the computation statuses that we assigned to each of those races based on the margin determined by the calculated standard errors.

And just briefly, what level of statistical confidence do you require before you give a state "call status," which is the recommendation to your NEP consortium members that you are ready to call a winner?

Again that varies depending on the circumstances. The rough rule of thumb is three standard errors, which would be 99.5% confidence.
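[Editor's Note: as a rough check on that rule of thumb, a three-standard-error threshold can be translated into a normal-theory confidence level. This is a back-of-the-envelope calculation, not the actual decision rule, and the exact figure depends on one- versus two-sided conventions.]

```python
# Normal-theory confidence implied by a k-standard-error threshold.
# A back-of-the-envelope check only; the real projection models involve
# more than this single calculation.
import math

def two_sided_confidence(k):
    """P(|Z| <= k) for a standard normal Z."""
    return math.erf(k / math.sqrt(2))

print(round(two_sided_confidence(3) * 100, 2))  # about 99.73
```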

Blumenthal's interview with Joe Lenski continues tomorrow with a discussion of the problems the exit poll experienced in 2004 and what will be done differently this year.

Weekend Media

Just a note that I did two interviews on Thursday that will air over the weekend.  One was for a story that should run on the Saturday broadcast of the CBS Evening News (unless the LSU vs. Tennessee game runs late).  Either way, it will be posted sometime tomorrow at cbsnews.com.  

I was also interviewed by Bob Garfield for the NPR program "On the Media."  The segment is a follow-up to an NPR interview of Karl Rove that I posted on last weekend.  I am told that the interview will air on local NPR stations over the weekend, although streaming and MP3 audio of the interview is now available for download on the On the Media web site.  Local air times for On the Media can be found here.

House 06: National forces estimate


I estimated the net national forces in the Senate races last night. Here is the same estimation procedure applied to the House. This is based on 86 House races with a total of 380 polls. The data are much less dense than for the Senate, but the results are surprisingly stable. Unlike the Senate, however, the data do not extend back very far in time, so the starting point here is June 1, 2006. The sizes of the effects here are also NOT comparable to the Senate forces, since both are estimated independently and the zero point is arbitrary in both cases. Relative movement is meaningful, so it is fine to say that since June 1 the net national forces in the House (for these 86 races) have risen about 6 percentage points.

As with the Senate, this is a good explanation for why so many House seats held by Republicans are now competitive. UNLIKE the Senate, these effects appear to have continued to grow recently. Even a much rougher fit still produces the upward rise at the end. This also is consistent with the growth in the Democratic advantage on the generic ballot, though the details of the dynamics are somewhat different.

The evidence then is favorable to larger than anticipated Democratic gains in the House, but smaller gains in the Senate, at least as of November 3. Four days to go.

Note: This entry is cross-posted at Political Arithmetik.

House 06: Generic Ballot


The generic ballot measure for the House has surged since September 22 and has not stopped rising. The surge began the week in which the National Intelligence Estimate (NIE) appeared, followed by Bob Woodward's book, State of Denial. A week later the Foley scandal broke, adding to the move that began a week earlier.

In the week or so before the NIE was published there was a small trend in the Republican direction, which was remarked on in political news, but this very modest movement was abruptly reversed. I would not have thought the NIE or Woodward revelations would have had much effect on mass public opinion, but the timing here is pretty convincing that they did in fact play a role. I speculate that this was due to undermining the growth in approval of the administration on terrorism that built in late August and early September, though convincing data on this link are missing. Certainly no such inference is needed with regard to the impact of the subsequent Foley scandal.

The generic ballot is, of course, only a rough indicator of election outcomes (see here, but also see the forecasting efforts of Bafumi, Erikson and Wlezien here and Alan Abramowitz here). I also think the current upturn is a political equivalent of "irrational exuberance," in the sense that the run-up in the polls seems likely to seriously overstate the actual vote margin. The current 17 point Democratic margin would be enormous, and even applying the "Charlie Cook Correction" of subtracting 5 points would still imply a 56-44 Democratic triumph. It may happen, but the generic ballot has virtually always overstated the Democratic lead, and this overstatement seems to get worse as the polling margin increases.
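The "Charlie Cook Correction" arithmetic above can be spelled out as a two-party split (a rough heuristic, not a forecast model):

```python
# Convert a two-party margin into a 56-44-style split.
# Numbers are the 17-point generic-ballot lead and the 5-point
# correction discussed in the text.

def two_party_split(margin):
    """Return (leader %, trailer %) of the two-party vote for a given margin."""
    return 50 + margin / 2, 50 - margin / 2

corrected = 17 - 5                 # 17-point lead minus the 5-point correction
print(two_party_split(corrected))  # (56.0, 44.0)
```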

For a bit of perspective, the figure below plots the generic ballot since 1994. The conclusion is clear -- the poll measure has not been anywhere near current levels in the past 12 years. The practical result of this remains to be seen, but if Democrats fail to capitalize on this opinion advantage there will be some interesting research to be done to understand why the seat gains failed to respond to this advantage in vote intent (well, in generic vote intent, which isn't the same thing).


Note: This entry is cross-posted at Political Arithmetik.

Gov 06: State of play


Here is a recap of what are (or at least once were!) the competitive Governor's races (AK and ID lack enough data for the analysis and are omitted). The graph is ordered from the strongest Republican in the lower left corner to the strongest Democrat in the upper right.

Recent action that may affect election day is visible in NV, where Republican U.S. Rep. Jim Gibbons is facing allegations of sexual assault. The race had looked strong for Gibbons but has now narrowed, with Gibbons leading by under 5 points. In Maryland, a small but consistent lead for Democratic challenger Martin O'Malley has all but vanished, leaving Gov. Robert Ehrlich a chance to hold on to the office. The reverse has happened in Minnesota, where Republican Gov. Tim Pawlenty has lost the small lead he held over Democratic challenger Mike Hatch, with the race now a dead heat. In Iowa, Democratic fortunes have improved to a small lead, as have those of endangered Oregon Democratic Gov. Ted Kulongoski. In Wisconsin, incumbent Dem. Gov. Jim Doyle has persistently held on to a small but relatively steady lead, making challenger U.S. Rep. Mark Green's chances look longer than many (including me) expected.

No other races give any indication of shifts likely to threaten current leaders. The bottom line should be a considerable gain for Democrats. Our Pollster.com scoreboard shows 28 Dem, 20 Rep with 2 races too close for an assignment. This would give the Democrats a majority of Governorships for the first time since 1994, with potential advantages going into 2008 presidential contests.

Note: This entry is cross-posted at Political Arithmetik.

Updates - TN Moves to Lean Republican

Our most recent update changed some of our designations.  Good news for Republicans in Tennessee: Rasmussen's latest shows Republican Corker leading Democrat Ford by eight points (53% to 45%) and moves Tennessee to "lean" Republican status.  As the chart shows (though you may need to click it and choose the "since October 15" view to see the recent trend), the new result confirms similar findings earlier in the week from Reuters/Zogby and CNN/ORC.

On the other hand, hopeful news for Democrats in Arizona, where a new Arizona Daily Star poll helps narrow our last-five-poll margin enough to move the state from strong to "lean" Republican.

Finally, a new survey in Idaho's 1st Congressional District -- only the third released there to date -- shifts our designation of that district from lean Republican to toss-up.

I will have more on our House classifications -- lots more -- late today.

Sen 06: Four Critical Races


There have been some important changes in the Senate polling over the past week. Tennessee now appears to have turned against Democratic Rep. Harold Ford, while Virginia has moved away from Republican Sen. George Allen to a clear tossup. From now on, when people use the term "tossup" they should show the plot of the Missouri race, which lacks trends, bumps, wiggles or hints of what is to come. But the big news of today is the move that has been made in Montana, where Democrats were ready to claim (and many Republicans to concede) Sen. Conrad Burns' seat. President Bush visited and apparently money is now being devoted to this new "firewall" seat. A Burns win would require a Dem sweep of VA, TN and MO to manage a Senate majority. That is obviously a much higher burden than the "2 of 3" wins in these states required with MT in the Dem bag. It is worth noting that Burns is still behind in the trend estimate for MT, but clearly the level of competition has risen, and the odds of a Democratic Senate have shrunk. To make matters worse, in Maryland Democrat Ben Cardin still leads Republican Lt. Gov. Michael Steele, but that lead has been shrinking steadily, and while the normally Democratic state would be expected to go Democratic, the trend here and in the Governor's race (see here) suggests that the Maryland race cannot be assumed to be over. The good news for Democrats (other than in VA) is that New Jersey Sen. Robert Menendez appears to have recovered his lead from Republican Thomas Kean Jr.

So as it now stands, the Dems need 3 of the 4 seats in MT, VA, MO and TN, while holding MD. That may be a tall order, and it makes it likely we won't know control of the Senate until the MT vote is in, in the wee hours of Mountain Standard Time. Stay up! It will be fun.

Note: This entry is cross-posted at Political Arithmetik.


Here is an item published by Roll Call on Wednesday, which we almost missed, about two Zogby polls in New York's 25th District that two media outlets refused to run:

The Post-Standard newspaper in Syracuse and WSYR-TV had asked Zogby to conduct a second poll of the race after the pollster acknowledged that his firm had improperly weighted the results of a survey last week. In that case, Zogby polled the 25th district but then weighted the data using voter registration information from the more-Republican 24th district.

Zogby promised the two media outlets that he would do a new poll from scratch, but when the results of that survey came in both declined to run them. Jim Tortora, the news director of WSYR-TV, wrote on the station's Web site that after consulting with outside polling experts, he was concerned that Zogby had conducted the second poll using the same larger sample of 5,000 likely voters as he had on the first survey.

"With respect to Mr. Zogby, we felt the questions raised ... left us with only one choice: We had to pull the poll," Tortora wrote.

Used the same sample?  Here is the explanation from WSYR's Tortora about their analysis of the second poll:

This time, the Post Standard arranged an independent expert from the University of Connecticut's Department of Public Policy to review the findings of the second Zogby poll. Late Tuesday, we discovered that some of the same people who were called for the first poll, were called again. Zogby confirmed they did indeed use the same larger sample of 5000 likely voters, to come up with this "new" poll sample of 502 likely voters. Our independent expert felt this raised a red flag...an unknown variable. The concern? How would you react if you were called twice in about a week, to answer the same questions? Would you answer differently? The same? Would you even take the call? 

And as the Syracuse Post-Standard reported, "27 people responded to both polls."  If that were not enough, even after all of this came to light, Tortora reports:

Mr. Zogby firmly stands by his findings. He insists his methodology is sound, and was prepared to join us live at 5:30pm to explain his findings and back-up his results.  He points out pollsters often disagree about each other's methods. 

In its story on the controversy, the Syracuse Post-Standard spoke to a number of other "national polling consultants," and none supported Zogby's sample recycling.

"I think it's sort of a rookie mistake if you're including people a second time from a database," said Cliff Zukin, past president of the American Association for Public Opinion Research, an industry group based in Lenexa, Kan.

A bad practice

Zukin, a professor of public policy and political science at Rutgers University, spoke before he was told who conducted the poll.

He said it's considered a bad practice to call the same people twice for "random" polls.

"The problem is the first interview activates them," Zukin said. "They follow the news differently. So the people become different from a random citizen.

"If you didn't purge those people from the database," he said, "then that is a significant methodological problem. It gives you a problem to make any inference from these data."

Yes, we often disagree, but there are limits to what can be waived off as a mere difference of opinion.  If Mr. Zogby or any other pollsters want to explain and defend the practice of reusing sample, our Guest Pollster's Corner is wide open.

Sen 06: National Forces Estimate


This is certainly a good year for Democrats, but how good? And what are the national forces at work? I can estimate a summary of national forces to answer these questions.

I estimate a model that pools ALL Senate race polls, then iteratively fits a local regression (my usual trend estimator here) while simultaneously extracting a race-specific effect. This procedure has the effect of removing the difference between PA (with a strong Dem lead) and AZ (with a substantial Republican advantage) and likewise for all the states, effectively centering them at zero. The trend estimate that results then will move up if across most states the trend has been up, while if pro-Dem and pro-Rep movements equal one another, the national trend will be zero. There is no fixed metric for this national force, so it is convenient to pick a zero point for identification, in this case January 1, 2006.
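For readers curious about the mechanics, the backfitting procedure described above can be sketched roughly as follows. This is a simplified illustration, not the code actually used here: the local regression is stood in for by a Gaussian-kernel moving average, and all function and variable names are my own.

```python
# Sketch of a pooled "national forces" estimator: iterate between
# (1) estimating a race-specific offset and (2) fitting a common
# trend to the offset-free poll margins, then anchor the trend at
# zero on a chosen reference day for identification.
import numpy as np

def moving_trend(days, values, window=30.0):
    """Crude local-regression stand-in: Gaussian-weighted local mean."""
    out = np.empty(len(values), dtype=float)
    for i, d in enumerate(days):
        w = np.exp(-0.5 * ((days - d) / window) ** 2)
        out[i] = np.sum(w * values) / np.sum(w)
    return out

def national_forces(race, day, margin, n_iter=20, ref_day=0.0):
    """race: array of race labels; day: days since Jan 1; margin: Dem margin."""
    race = np.asarray(race)
    day = np.asarray(day, dtype=float)
    margin = np.asarray(margin, dtype=float)
    effect = {r: 0.0 for r in set(race)}      # race-specific offsets
    trend = np.zeros_like(margin)
    for _ in range(n_iter):
        # 1) re-estimate each race's offset from the de-trended margins
        resid = margin - trend
        for r in effect:
            effect[r] = resid[race == r].mean()
        # 2) re-fit the common trend to the offset-free margins
        offsets = np.array([effect[r] for r in race])
        trend = moving_trend(day, margin - offsets)
    # identification: anchor the national trend at zero on ref_day
    ref = trend[np.argmin(np.abs(day - ref_day))]
    return trend - ref, effect
```

The returned trend is the common "wind" shared by all races; the per-race offsets soak up the fact that PA and AZ sit at very different levels.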

The estimator finds that the Democratic margin has grown by 5 points across all races due to this national force. Where Republicans have enjoyed increased support, they have had to do it in the face of this opposing wind, while Democrats who would have been trailing by 5 points if January conditions still prevailed, will now have a "wind assisted" tossup race.

The dynamics of this national force have been generally increasing all year, but with significant partial reversals at times. From a June high of about 4 points, this force shrank to 2 points in August, then surged to 5 points by September 1. A brief improvement for Republicans took place in early September. At the time Republicans claimed to see new movement in their favor, and these data lend some support for that claim. However, that trend was sharply reversed after September 24 with the first publication and subsequent release of the National Intelligence Estimate, followed by Bob Woodward's book, State of Denial. This was followed a week later by the Foley scandal, and once more the Democratic advantage increased to about 6 percentage points. In the last two weeks of October there was a brief move in a Republican direction, then back to favor the Democrats. As of November 2, however, the national forces have again moved in a Republican direction, this time somewhat more strongly. While it is tempting to explain this as a result of Sen. Kerry's verbal difficulties, the downturn started before the joke-gone-wrong, so perhaps the Senator does not deserve all the credit for the 1.5 point decline since mid-October. As it stands, the estimate is only a little under 5 points. However, as a national force, common to all races, this decline of even 1.5 points is enough to be crucial for either party in Virginia and Missouri. If it moves more, it could affect the Tennessee or Montana races as well (and conceivably Maryland).

For my money, these are sensible estimates of the magnitude of national forces at work in this election. A gain of 5 points in the margin turns a 50-45 race into a 47.5-47.5 tie. Estimates much bigger than this would seem too large to be plausible, as they would imply that too many races had become competitive or developed Democratic leads.

The method I use here does not lend itself to the usual confidence interval estimates. But some sense of the variability of the estimator can be seen below. The estimation errors, indicated by the gray dots, are estimates of where the trend would be IF the series had stopped on the day represented by the dot. This method is sensitive to last observations and while the fit is quite stable when there is abundant data on both sides of a point of interest, it is often a poor predictor of what will come next. The deviations of the gray dots around the line show when the trend would have gone up more, or down more, than the blue trend estimator finally settled on. The errors are worse near points of change in direction, which makes sense. While the variability is not trivial, and indicates considerable uncertainty near changes of trend, the area covered by the gray dots is still relatively small compared to the size of the effect being estimated. The practical implication is that we have to be cautious in suggesting that the current trend will continue, because a change in direction is not well predicted by the model. That said, we can be reasonably confident that the trend estimator would not be radically different if we add more observations. (Which we won't do, after November 7.)


Note: This entry is cross-posted at Political Arithmetik.

MT Senate: Moves to Toss-up

Two pieces of housekeeping: First, our latest update of the charts and scoreboards moves the Montana Senate race to the toss-up category. The new Reuters/Zogby poll showing Democrat Tester up by just a single percentage point confirms a recent Rasmussen poll showing Tester ahead by just three points. These two new polls help pull Tester's lead over Republican Senator Conrad Burns to just 3.2% on the last five public polls, just enough to move Montana to the toss-up category.

Second, those who watch this site closely have noticed the lag between updates of our "most recent polls" box and the charts. Until today, our charts and tables have updated infrequently, sometimes only once a day. The reason is largely technical (and not worth attempting to explain), but from now until Election Day we are committed to far more frequent updates - hopefully at least three updates a day. Also, by popular demand, we will try to post updates like this one on the blog when our categorization of a race changes.

Yes, We Have House Charts!

A few quick updates on the poll data we track on races for the House of Representatives:

First, as of last night, we now have charts available for all 84 House districts for which we currently have polling data. Clicking on a link for any House district on our House map and national summary table now takes you directly to the chart for that race, just like the links on our Senate and Governor maps. This latest update means that any of our 84 House district charts** can be embedded on your blog or website using the new "embed chart" feature (see yesterday's post for details).

Second, an apology for the slightly slower pace of blog posts over the last 48 hours or so as we worked to get these new upgrades and features up and running. I have also spent a lot of time the last few days crunching my "big-spreadsheet-o'House" and will have a more in-depth review of the available polling data later today. For those who cannot wait, you can find the abridged version in our Slate House Election Scorecard updates on Tuesday and Wednesday.

Finally, a quick update on a bit of anecdotal evidence I discussed last Saturday. There is one source of polling largely out of public view - the internal polls conducted by the campaigns and party committees. Some of these get released, but typically only when they show good news for that particular campaign. So one indirect measure of where things stand is which side is releasing more of its internal polling, and by that measure the Democrats are a lot more confident: Since Labor Day, Democrats have released 54 internal polls for House candidates logged into our Pollster.com database, Republicans have released only 13. And that confidence has not abated in the last two weeks. Since October 15, Democrats have released 21 internal polls, Republicans only 2.

**Unfortunately, many of the House races have only a handful of polls. As of this morning, roughly half of the districts in our database have three or fewer polls, and that will make for a very sparse looking chart. Keep in mind that the trend line represents the average of the last 5 (or fewer) polls at any given point in time. So for the first few polls in the series, the lines may draw in ways that seem a little confusing.
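As a concrete illustration, the last-five-polls trend line described in this footnote amounts to nothing more than the following sketch (the function name and data layout are my own):

```python
# Each point on the trend line is the mean of the five (or fewer)
# most recent poll margins, taken in chronological order.
def last_five_trend(polls):
    """polls: list of (date, margin) tuples; returns list of (date, avg)."""
    polls = sorted(polls)                      # chronological order
    trend = []
    for i, (date, _) in enumerate(polls):
        window = [m for _, m in polls[max(0, i - 4):i + 1]]  # up to 5 polls
        trend.append((date, sum(window) / len(window)))
    return trend
```

With only one or two polls in a district, the "average" is just those polls, which is why sparse charts can draw in confusing ways early in the series.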

Jacob Eisenstein: Using Kalman Filtering to Project the Senate

Today's Guest Pollster Corner contribution comes from Jacob Eisenstein. While not technically a pollster -- Eisenstein is a PhD candidate in computer science at MIT -- he recently posted an intriguing U.S. Senate projection (and some familiar looking charts) based on a statistical technique called "Kalman filtering" that he applied to the Senate polls. He explains the technique and its benefits in the post below.

Polls are inexact measurements, and they become irrelevant quickly as events overtake them. But the good news about polls is that we're always getting new ones. Because polls are inexact, we can't just throw out all our old polling data and accept the latest poll results. Instead, we check to see how well our poll coheres with what we already believe; if a poll result is too surprising, we take it with a grain of salt, and reserve judgment until more data is available.

This can be difficult for the casual political observer. Fortunately, there are statistical techniques that allow this type of "intuitive" analysis to be quantified. One specific technique, the Kalman Filter, gives the best possible estimate of the true state of an election, based on all prior polling data. It does this by weighing recent polls more heavily than old ones, and by subtracting out polling biases. In addition, the Kalman Filter gives a more realistic margin-of-error that reflects not only the sample sizes of the polls, but also how recent those polls are, and how many different polling results are available.

The Kalman Filter assumes that there are two sources of randomness in polling: the true level of support for a candidate, which changes on a day-to-day basis by some unknown amount; and the error in polling, which is also unknown. If the true level of support for a candidate never changed, we could just average together all available polls. If the polls never had errors, we could simply take the most recent poll and throw out the rest. But in real life, both sources of randomness must be accounted for. The Kalman Filter provides a way to do this.
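To make the two-noise-source idea concrete, here is a minimal one-dimensional sketch of the kind of filter described above. This is my own illustration under the random-walk assumption, not Eisenstein's code; the process-noise value q is an arbitrary placeholder.

```python
# One-dimensional Kalman filter over a sequence of polls: true support
# is modeled as a random walk (variance q per day), and each poll is a
# noisy observation whose variance comes from its sample size.
def kalman_polls(polls, q=0.05, init_mean=50.0, init_var=100.0):
    """polls: list of (day, observed_pct, sample_size), sorted by day.
    Returns a list of (mean, variance) estimates after each poll."""
    mean, var = init_mean, init_var
    prev_day = polls[0][0]
    estimates = []
    for day, obs, n in polls:
        # Predict step: true support drifts day to day, so uncertainty
        # grows in proportion to the time elapsed since the last poll.
        var += q * (day - prev_day)
        prev_day = day
        # Measurement variance from binomial sampling error, in
        # percentage-point units.
        r = obs * (100.0 - obs) / n
        # Update step: the Kalman gain weights the new poll against the
        # prior estimate according to their relative uncertainties.
        gain = var / (var + r)
        mean += gain * (obs - mean)
        var *= (1.0 - gain)
        estimates.append((mean, var))
    return estimates
```

Setting q to zero recovers a plain average of all polls; making q huge effectively keeps only the latest poll, which matches the two limiting cases in the paragraph above.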

Pollsters are happy to tell you about margin-of-error, which is a measure of the variance of a poll; this reflects the fact that you can't poll everybody, so your sample might be too small. What pollsters don't like to talk about is the other source of error: bias. Bias occurs when a polling sample is not representative of the population as a whole. For example, maybe Republicans just aren't home when the pollsters like to call -- then that poll contains bias error that will favor the Democratic candidates.

We can detect bias when a poll is different from other polls in a consistent way. After repeated runs of the hypothetical biased poll that I just described, careful observers will notice that it rates Democratic candidates more highly than other polls do, and they'll take this into account when considering new results from this poll. My model considers bias as a third source of randomness; it models the bias of each pollster, and subtracts it out when considering their poll results.

The Kalman Filter can be mathematically proven to be the optimal way to combine noisy data, but only under a set of assumptions that are rarely true (these assumptions are listed at my own site). However, the Kalman Filter is used in many engineering applications in the physical world -- for example, the inertial guidance of rockets -- and is generally robust to violations of these assumptions. In the specific case of politics, I think the biggest weakness of this method is that elections are fundamentally different from polls, and my model does not account for the difference between who gets polled and who actually shows up to vote. I think this can be accounted for, but only by looking at the results of past elections.

Using the Generic Ballot to Forecast the 2006 House and Senate Elections

[Today's Guest Pollster's entry comes from Alan I. Abramowitz, the Alben W. Barkley Professor of Political Science at Emory University in Atlanta, Georgia. He is also a frequent contributor to the blog Donkey Rising.]

In order to predict the outcome of the 2006 House elections, I create a model incorporating both national political conditions and candidate behavior. Pre-election Gallup Poll data on the generic ballot and presidential approval are used to measure national political conditions and data on open seats and challenger quality are used to measure the behavior of congressional candidates. The model is tested with data on U.S. House elections between 1946 and 2004. A simpler model based only on national political conditions is tested with data on U.S. Senate elections from the same period. Based on the estimates for the models, I forecast the 2006 House and Senate election results.

The dependent variable in the House forecasting model is the change in the percentage of Republican seats in the House of Representatives. The model includes six independent variables. The percentage of Republican seats in the previous Congress is included to measure the level of exposure of Republicans compared with Democrats in each election: the larger the percentage of Republican seats in the previous Congress, the greater the potential for Republican losses. A variable for Republican vs. Democratic midterm elections is included to capture the effect of anti-presidential-party voting in midterm elections. Net presidential approval (approval - disapproval) in early September is included to measure public satisfaction with the performance of the incumbent president, and the difference between the Republican and Democratic percentage of the generic ballot in early September is included to measure the overall national political climate. The actions of congressional candidates are measured by two variables: the difference between the percentages of Republican and Democratic open seats and the difference between the percentages of Republican and Democratic quality challengers, defined in terms of elected office-holding experience.

The model does a very good job of explaining the outcomes of past House elections: all of the independent variables except the percentage of Republican seats in the previous Congress have statistically significant effects, and the model explains 87% of the variation in House seat swings since World War II. Even after controlling for presidential approval and the actions of strategic politicians, the generic ballot variable has a substantial impact on the outcomes of House elections: a 10-point advantage in the generic ballot produces a swing of approximately nine seats in the House with all other independent variables held constant.
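The structure of a model like this can be sketched as a simple ordinary-least-squares fit. To be clear, the coefficients below come from synthetic data, not from Abramowitz's Table 1; only the shape of the regression is being illustrated, and the function names are my own.

```python
# Sketch of a seat-swing regression: y is the change in GOP seats,
# X holds the six predictors (exposure, midterm dummy, net approval,
# generic-ballot margin, open-seat difference, challenger quality).
import numpy as np

def fit_seat_model(X, y):
    """X: n x 6 predictor matrix; y: GOP seat change per election.
    Returns (intercept, coefficient vector) from ordinary least squares."""
    A = np.column_stack([np.ones(len(y)), X])     # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def predict_seat_change(intercept, coefs, x):
    """Plug one election's predictor values into the fitted model."""
    return intercept + float(np.dot(coefs, x))
```

A forecast like the one below is then just the fitted coefficients applied to this year's values of approval, the generic ballot, open seats, and challenger quality.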


House Forecast
We can use the results in Table 1 to predict the outcome of the 2006 House elections. Based on a net approval rating for President Bush of -17, a Democratic advantage of 12 points in the generic ballot, a Democratic advantage of 2% in open seats, and a Democratic advantage of 3% in challenger quality, the model predicts a Democratic gain of 29 seats in the House of Representatives.

Senate Seat Change Model
The dependent variable in the Senate model is the change in the number of Republican Senate seats. The independent variables are the number of Republican seats at stake in the election (a measure of exposure), a variable for Republican vs. Democratic midterm elections, net presidential approval in early September, and the difference between the Republican and Democratic percentage of the generic ballot in early September. Variables measuring candidate behavior are not included in the Senate model because data on challenger quality is not available for Senate elections and relative numbers of Republican and Democratic open seats had no impact on the outcomes of Senate elections when it was added to the model.


The results in Table 3 show that the Senate forecasting model is not as accurate as the House forecasting model, explaining only 65% of the variance in the outcomes of Senate elections since World War II. This is not surprising since the model does not include any variables measuring candidate behavior. Moreover, Senate seat swings are probably influenced more by chance because there are far fewer contests in each election and a larger percentage of these contests are competitive.

Despite the limitations of the Senate model, however, the results indicate that three of the five independent variables have significant effects. In the Senate model, in contrast to the House model, seat exposure is the single strongest predictor of outcomes. This is consistent with the results of previous models of Senate election outcomes such as Abramowitz and Segal (1986). According to the results in Table 2, for every additional seat that the Republican Party has to defend in a Senate election, it loses an additional 0.8 seats.

While the effects of the presidential approval variable are not quite significant at the .05 level, the generic ballot variable does have a statistically significant, and substantively important, impact on the outcomes of Senate elections despite the fact that the question asks about voting in House elections. The results in Table 2 indicate that an advantage of 10 points in the generic ballot produces a swing of about two seats in the Senate with all other independent variables held constant.

Senate Forecast
We can use the results in Table 2 to predict the outcome of the 2006 Senate elections. Democrats need a gain of six seats to take control of the Senate. Based on a net approval rating for President Bush of -17 and a Democratic advantage of 12 points in the generic ballot, the model predicts a Democratic gain of 2.5 seats in the 2006 Senate elections. The main reason why the predicted Democratic gain is relatively small is that only 15 Republican seats are being contested this year.

Both national conditions and the behavior of candidates influence the outcomes of U.S. House elections. President Bush's low approval ratings and especially the large advantage that Democrats currently enjoy in the generic ballot suggest that Democrats are very likely to regain control of the House of Representatives in November. Democratic gains are also likely in the Senate but it will be difficult for Democrats to pick up the six seats that they need to take control of the upper chamber because only 15 of the 33 seats up for election in 2006 are currently held by Republicans.

Majority Watch Mashup

Picking up on the post from earlier tonight, the new Majority Watch surveys released today provide another strong indicator of recent trends, in this case regarding the race for the U.S. House.  The partnership of RT Strategies and Constituent Dynamics released 41 new automated surveys conducted in the most competitive House districts. 

Since they conducted identical surveys roughly two weeks ago in 30 of the 41 districts, we have an opportunity for an apples-to-apples comparison involving roughly 30,000 interviews in each wave.  The table below shows the results from both waves for each of those 30 districts.  The bottom-line average indicates that overall, the Democratic margin in these districts increased slightly, from +1.9 to +2.7 percentage points, during October.
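The apples-to-apples comparison itself is straightforward to reproduce; here is a minimal sketch (the district names and data layout are invented for illustration):

```python
# Average the Democratic margin (Dem % minus Rep %) within each wave,
# restricted to the districts polled in both waves.
def wave_shift(wave1, wave2):
    """wave1/wave2: dicts mapping district -> (dem_pct, rep_pct).
    Returns (avg margin in wave 1, avg margin in wave 2) over shared districts."""
    shared = sorted(set(wave1) & set(wave2))
    def avg_margin(wave):
        margins = [wave[d][0] - wave[d][1] for d in shared]
        return sum(margins) / len(margins)
    return avg_margin(wave1), avg_margin(wave2)
```

Restricting both averages to the same districts is what makes the comparison apples-to-apples: any shift reflects movement within those districts, not a change in which districts were polled.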


Whatever one may think of their automated methodology, the Majority Watch surveys used the same methodology and sampling procedures for both waves.  And as with the similar "mashup" of polls in the most competitive Senate races in the previous post, these also show no signs of an abating wave.

Interests disclosed: Constituent Dynamics provided Pollster.com with technical assistance in the creation of our national maps and summary tables.

Slate 13 Update

Charlie Cook writes tonight:  "With the election just eight days away, there are no signs that this wave is abating."   Some supporting evidence:  The overall average Democratic margin in the Slate 13 -- the 13 most competitive Senate races we have been tracking on the Slate Election Scorecard -- has increased for the sixth straight week (from +3.7 to +4.1 percentage points over the last week). 


Again, the value in looking at this overall "mash-up" is that it combines a very large number of surveys, including at least 35 new statwide surveys in the 13 states released in the last week.  In any one state, the averge might be a little lower or a little higher due to the "house effects" or other variation in recent surveys.  By rolling up the results of many surveys, we should minmize the noise.  And that approach shows now end to slow Democratic trend in Senate races since mid-September.

PS:  The Slate Election Scorecard update for tonight focuses on the Senate race in New Jersey, where two new polls moved that State back to "lean" Democrat status.

McDonald: 5 Myths About Turning Out the Vote

Professor Michael P. McDonald, a nationally renowned authority on voter turnout (and an occasional commenter on Pollster.com), had a timely op-ed piece published in today's Washington Post reviewing the academic evidence that debunks "5 Myths About Turning Out the Vote." It's well worth reading in full.

McDonald covered a topic on a lot of minds lately (mine included), the Republicans' vaunted "72-Hour Campaign:"

Republicans supposedly have a super-sophisticated last-minute get-out-the-vote effort that identifies voters who'll be pivotal in electing their candidates. Studies of a campaign's personal contact with voters through phone calls, door-to-door solicitation and the like find that it does have some positive effect on turnout. But people vote for many reasons other than meeting a campaign worker, such as the issues, the closeness of the election and the candidates' likeability. Further, these studies focus on get-out-the-vote drives in low-turnout elections, when contacts from other campaigns and outside groups are minimal. We don't know what the effects of mobilization drives are in highly competitive races in which people are bombarded by media stories, television ads and direct mail.

Also, in 2002 and 2004, the 72-Hour Campaign benefited from a political environment and national mood largely favorable to Republicans. Not so this time. We will soon see whether they can work the same magic in the climate of 2006.

Again, McDonald's piece is a good summary of academic findings that all political junkies should know. Go read it all.

Mellman: Another Measure of Stability

[Democratic Pollster Mark Mellman posted a comment here on Friday in response to the final installment of my three-part series on the national data on the race to control Congress. It was structured around a metaphor Mellman has used to characterize the Democrats' chances on November 7:

There's a big anti-Republican wave out there. But that wave will crash up against a very stable political structure, so we won't be sure of the exact scope of Democratic gains until election night. We really don't yet know which is ultimately more important -- the size of the wave or the stability of the structure.

Since not all readers browse the comments, I am promoting Mellman's remarks as a contribution to our Guest Pollster Corner section].

When I talk about stability I have a couple of other factors in mind in addition to incumbency advantages. As I noted in my original Hill article last March....

One measure of political instability: the number of Republicans holding seats that vote Democratic for president and vice versa. When big political waves hit, that is precisely where much of the action is. In the two prior presidential elections, Bush (the father) or Reagan had won 30 of the 34 seats Democratic incumbents lost in 1994. Similarly, two-thirds of the Republican incumbents who lost in 1982 were running in districts presidential Democrats had won just previously.

Today, though, there are fewer mismatched seats than at any point in recent history. Going into 1994, 53 Democrats held seats won by Bush in 1992. Today just 18 Republicans hold seats won by Kerry. So, while forces in the political environment push strongly in a Democratic direction, they are acting on a relatively stable structure: Hence the test.

Karl Rove's Math

Alert reader GS and AAPOR colleague CP alerted me to an intriguing (and somewhat contentious) NPR interview of chief Republican strategist Karl Rove conducted last Tuesday by correspondent Robert Siegel. Whatever one might think about Rove's spin, his comments remind us that for all the data we have gathered here on Pollster.com, the party strategists have their own flow of data that remains hidden from public view.

According to the transcript, the interview kicks off with Rove, "responding to a question about public polls and analysis predicting a Republican loss in November:"

KARL ROVE: I see several things; first, unlike the general public, I'm allowed to see the polls on the individual races and after all this does come down to individual contests between individual candidates. Second of all, I see the individual spending reports and contribution reports. For example at the end of August in 30 of the most competitive races in the country, the house races, the Republicans had 33 million cash on hand and Democrats had just over 14 million.

Siegel asked next about television advertising and their content. Then he came back to the topic of polls.

SIEGEL: We are in the home stretch though and many would consider you on the optimistic end of realism about...

ROVE: Not that you would be exhibiting a bias or anything like that, you're just making a comment, right?

SIEGEL: I'm looking at all the same polls that you are looking at.

ROVE: No, you are not, no you're not, no you're not, you're not. I'm looking at 68 polls a week [for candidates for the US House and US Senate, and Governor.]** You may be looking at 4 or 5 public polls a week that talk about attitudes nationally but that do not impact the outcome of individual races.

SIEGEL: If you could name races between, certainly Senate races, all...

ROVE: Like the poll today that showing Corker's ahead in Tennessee or the poll showing Allen is pulling away in the Virginia Senate race.

SIEGEL: Leading Webb, in Virginia, yea...

ROVE: Yeah, exactly.

SIEGEL: ...you've seen the DeWine race and the Santorum race and, I don't want to...you call [the] races.

ROVE: I'm looking at all of these Robert and adding them up. I add up to a Republican Senate and Republican House. You may end up with a different math but you are entitled to your math and I'm entitled to THE math.

SIEGEL: I don't know if we're entitled to a different math but your...

ROVE: I said THE math.

Now whatever one thinks of Rove's spin -- and I'm certainly dubious, at least with respect to the House -- he is probably not exaggerating the number of polls he sees each week in statewide and congressional races. The Republican campaign committees are likely conducting weekly tracking polls in at least a dozen competitive Senate races and 30 or more House contests. They have also probably fielded surveys less frequently over the last month in another 40 to 50 less competitive House races to check their status. On top of that, many individual campaigns are sharing their own internal tracking polls privately with Rove and their national party.

The Democratic campaigns and the Democratic campaign committees have similar research programs underway (and interests disclosed: my partners at Bennett, Petts & Blumenthal conduct some of the internal tracking polls for the DCCC and DSCC).

If you wanted to build a true "dream" polling scorecard for the House, you would combine Rove's spreadsheet with the counterpart maintained by Rahm Emanuel at the DCCC. The numbers in that combined scorecard would represent the collective efforts of the pollsters with by far the most experience measuring preferences at the Congressional District level.

We cannot see that data, unfortunately, but we might be able to judge Rove's spin by the number of partisan polls that have been publicly released by the campaigns and party committees. Of the polls in our House database, 43 of the partisan polls released since Labor Day came from Democrats, only 11 from Republicans.

I am not giving away any trade secrets in pointing out that campaigns and party committees release internal polls only when they show good news for their candidates. Bad news rarely sees the light of day. If Rove's internal polls really add up to a "Republican House," it is hard to imagine we would not see more Republican polls showing it.

**I revised the "rush transcript" posted on NPR.org (also characterized as "transcribed excerpts") to include the discussion between Siegel and Rove on the races in Virginia, Tennessee, Ohio and Pennsylvania. The transcript omits that exchange and instead substitutes the phrase in brackets.

Correction: The original version of this post incorrectly reported the number of partisan polls released since Labor Day in our database as 47 from Democrats and 12 from Republicans. Apologies for the error.

Bafumi, Erikson & Wlezien: Forecasting House Seats from Generic Congressional Polls

(Editor's note: Today's Guest Pollster's Corner contribution comes from Professors Joseph Bafumi of Dartmouth College, Robert S. Erikson of Columbia University and Christopher Wlezien of Temple University. The post is based on a larger paper available for download here).

Although the Democrats hold a large advantage in generic ballot polls, there has been considerable uncertainty about whether they will win the most House seats. Doubts are often expressed about the accuracy of the generic ballot polls, and the way district lines are drawn raises further doubts about whether the Democrats could win a large enough share of the vote to win the most seats. We estimate how the generic ballot "vote" translates into the actual national vote for Congress and ultimately into the partisan division of seats in the House of Representatives. Based on current generic ballot polls, we forecast an expected Democratic gain of 32 seats, with Democratic control (a gain of 15 seats or more) a near certainty.

To begin with, we estimate a regression equation predicting the House vote in the 15 most recent midterm elections, 1946-2002, from the average generic poll result during the last 30 days of each campaign. The generic polls turn out to be very good predictors, as we have shown. Based on the current average of the generic polls (57.7% Democratic, 42.3% Republican) the forecast from this equation is a 55% to 45% Democratic advantage in the popular vote (1).
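The authors' equation is estimated on the 1946-2002 midterms; their data are not reproduced here, so as a minimal sketch of the technique, here is an ordinary-least-squares fit of national House vote on the generic-ballot average, using made-up placeholder pairs (not the authors' data) and the 57.7% generic average cited above:

```python
import numpy as np

# Hypothetical (generic-ballot Dem %, actual Dem House vote %) pairs for
# past midterms -- illustrative placeholders, NOT the authors' 1946-2002 data.
generic = np.array([52.0, 55.0, 48.0, 57.0, 50.0])
actual = np.array([51.0, 53.5, 48.5, 54.5, 50.5])

# Fit actual = a + b * generic by ordinary least squares.
# np.polyfit returns coefficients highest degree first: [slope, intercept].
b, a = np.polyfit(generic, actual, 1)

def forecast_vote(generic_avg):
    """Translate a generic-ballot average into a forecast national vote share."""
    return a + b * generic_avg

# Plug in the 57.7% Democratic generic average cited in the post.
print(round(forecast_vote(57.7), 1))
```

Note the characteristic feature this reproduces: the fitted slope is well below 1, so a lopsided generic-ballot margin regresses toward a narrower actual vote margin, much as the authors' 57.7% generic share translates into a 55% vote forecast.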

But would this mean that the Democrats also win the most seats? The Democrats winning 55% of the vote would represent a 6.4 percentage point swing from 2004, when they received 48.6%. If Democrats were to win exactly 6.4 percentage points more of the 2006 vote in every district than they won in 2004, they would win 228 seats. However, an average swing of 6.4 percentage points will be spread unevenly, sometimes more than 6.4 points and sometimes less. Moreover, the prediction that the average vote swing will be 6.4 points is itself subject to error.

We take these considerations into account with a set of simulations described in our larger paper. The simulations suggest that a predicted national vote surge of 6.4 percentage points would yield the Democrats 235 seats, for a 32-seat gain -- 7 seats more than uniform swing would produce.
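The simulations themselves live in the larger paper, but the general technique (uniform swing as a benchmark, then Monte Carlo draws that add national forecast error plus district-level variation) can be sketched as follows. The district vote shares and both error standard deviations below are assumed placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2004 Democratic vote shares in 435 districts -- a placeholder
# distribution, NOT actual election returns.
dem_2004 = rng.normal(48.6, 12.0, 435).clip(20, 80)

SWING = 6.4        # predicted national swing, from the post
NATIONAL_SD = 2.0  # assumed error on the national swing forecast
DISTRICT_SD = 4.0  # assumed district-level deviation from the national swing

# Uniform-swing benchmark: add the same 6.4 points everywhere, count wins.
uniform_seats = int(((dem_2004 + SWING) > 50).sum())

def simulate_seats(n_sims=10_000):
    """Average Democratic seat count over simulated elections."""
    totals = []
    for _ in range(n_sims):
        national = SWING + rng.normal(0, NATIONAL_SD)
        district = dem_2004 + national + rng.normal(0, DISTRICT_SD, dem_2004.size)
        totals.append(int((district > 50).sum()))
    return sum(totals) / len(totals)

print(uniform_seats, simulate_seats())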

A Democratic pickup of 32 seats might seem high to some readers. For a reality check, we compared our district level predictions from our simulations with the results of available district polls. The two sets of numbers match nicely. Our simulations might even underestimate Democratic strength in the sampled districts.


Of course if the generic ballot numbers shift as the election nears, the forecast should be revised according to the weight of new polling information. Figure 1 shows how the forecasts can shift with possible changes in the generic vote. If current trends in the Congressional generic ballot polling persist, the Democrats are near certain to win control of the House (2). But if the lead dips into the single digits, the Republicans can rekindle their hopes of holding on.

(1) As of October 24, PollingReport.com listed the results of 6 likely-voter generic ballot polls conducted during the final 30 days of the campaign, by CNN (2), ABC/Washington Post, Fox/Opinion Dynamics, Gallup/USA Today and Newsweek. The results for ABC/Washington Post listed on PollingReport.com actually are for registered voters, and we obtained the likely voter results from the news release posted on realclearpolitics.com. (Back to text).

(2) Readers conditioned to the idea that a districting advantage would allow the Republicans to govern with a minority of the votes cast might be surprised that the national-vote threshold at which control is likely to revert to the Democrats is only 51%. The explanation is the partisan asymmetry in 2006 retirements. Among retirees who had faced major-party competition in 2004, 19 were Republicans and only 6 were Democrats. Strategic Republican retirements in anticipation of a Democratic wave would cause an electoral ripple even if the larger wave does not arrive. Our calculations are that, even with no vote swing whatsoever from 2004 to 2006, the Democrats would pick up 5 or more seats just from the greater number of Republican than Democratic retirements. (Back to text).