February 10, 2008 - February 16, 2008


Pre-President's Day "Outliers"

Topics: 2008 , Frank Newport , John McCain , Kathy Frankovic , Mark Mellman

The Hartford Courant's Joann Klimkiewicz examines the problems of polling in 2008.

Kathy Frankovic shares her skepticism about polls that tell us which candidate is most electable in 2008.

Frank Newport finds that John McCain "displeases" many conservative Republicans.

Gary Langer says race has been the "single most powerful demographic in vote choices" in the Democratic primaries so far.

David Hill sees evidence that "immigration is a dud as an electoral issue."

Mark Mellman considers the complexities of the "politics of identity" in the Democratic primaries of 2008.

Tom Webster crunches the exit poll numbers on Republicans in Virginia and Maryland who listen to talk radio.

Josh Goodman compiles the exit poll results on abortion and immigration.

Carl Bialik calculates the odds of a tie in Syracuse.

Karl Rove does poll analysis on a white board.

TPM Catches a Milestone

Topics: 2008 , Barack Obama , Gallup , John McCain , Rasmussen

Talking Points Memo has a headline this morning pointing out something we were too busy to blog yesterday: For the first time, Barack Obama's number on our national trend estimate (47.1%) is now greater than Hillary Clinton's (46.0%).


We are doing a bit of an overhaul on our database to facilitate inclusion of the daily tracking from both Gallup and Rasmussen Reports. The chart does not yet include data from Rasmussen, largely out of a concern that their more frequent updates would dominate the trend average. We are hoping to revise the chart to include the Rasmussen data soon.

The Problems of Primary Polling

Topics: 2008 , Jay Cost

I have written a lot this year about the surprising level of indecision about the presidential race among many Democrats on the eve of the primary elections, and the problems that uncertainty creates for polling. Jay Cost weighs in this week with an essay that approaches the same issue from a slightly different perspective. His take is especially relevant now that the primaries have moved beyond hotly contested states like Iowa and New Hampshire to states where information levels about the candidates are significantly lower.

Cost reminds us that "average voters do not pay much attention to politics" and that in general elections, partisan identification serves as the "cognitive heuristic" or "mental shortcut" that facilitates decision making despite low information. He points out that party identification is an "incredibly precise predictor of vote choice" and that it makes for stability in poll measurements:

Accordingly, we will see the polls vary only a little bit throughout the campaign. Oftentimes, they will break in late October or even early November. However, the magnitude of the break will be relatively modest.

However, as Cost points out, primary elections are different:

In a primary campaign, voters must choose among candidates who are all of the same party. Partisanship therefore does not enter into their decisions. It is a non-factor. I think this might be inducing the wild swings in the polls. The polls are varying because the voters are; the voters are varying because their partisanship is not stabilizing their preferences. [...]

It thus should be unsurprising that candidate personalities are so influential in voters' decision-making processes. How else do you make determinations when party distinctions are non-existent? Candidates often try to create clear contrasts, but these usually amount to making mountains out of molehills. The average voter is not really paying much attention, anyway. Thus, they have to go by their personal evaluations of the candidates.

And when those personal evaluations are mostly positive, some voters are having a hard time making up their minds. So their choices, as described to pollsters, may be tenuous. Cost's essay is good; go read it all.

DC-AAPOR event on Wednesday

Topics: AAPOR , Jon Cohen , Pollster.com

An announcement for those in the DC area. I will be participating in a discussion on "Politics and Polling" next Wednesday afternoon hosted by the DC Chapter of the American Association for Public Opinion Research (DC-AAPOR). Hope you can join us.

Politics and Polling
Wednesday, February 20th, 2008, 3:30pm - 5:00pm
The Pew Research Center, 1615 L Street, NW, Suite 700, Washington, DC 20036

Please click here to RSVP no later than COB Monday, February 18; seating is limited.
Danna Basson, Mathematica Policy Research, Inc.
Jon Cohen, The Washington Post
Mark Blumenthal, Pollster.com

As the November elections near, please join DC-AAPOR in an informative discussion on how the general public evaluates candidates and how well the candidates are doing.

"The Impact of Accessible Political Knowledge on Voters' Candidate Evaluations, Issue Positions, and Issue Consistency," Danna Basson

The Current State of the 2008 Presidential Elections, Jon Cohen and Mark Blumenthal

Questions and Answers

Why So Much Volatility in Texas?

Topics: 2008 , ARG , Barack Obama , Hillary Clinton , John McCain , Jon Cohen , Likely Voters , Rasmussen , Washington Post

Not surprisingly, the three new Texas polls we posted yesterday provoked quite a bit of discussion. We have three polls showing very different results for the Democrats, but much more consistency for the Republicans. How can that be?

First, a quick summary: A survey sponsored by the Texas Credit Union League and conducted by two campaign pollsters, Hamilton Campaigns (D) and Public Opinion Strategies (R), has Clinton leading Obama by eight points (49% to 41%). A new automated survey from Rasmussen Reports has Clinton leading by sixteen (54% to 38%), and a new survey from American Research Group (ARG) shows Obama leading by six (48% to 42%). The Republican results are far more consistent, showing John McCain leading Mike Huckabee by margins of four to eight points.

One likely reason for much of the apparent "volatility" in the Democratic results is that the Obama-Clinton vote preference shows large variation on five critical variables: race and ethnicity, gender, age, socio-economic status and party affiliation (percent non-Democratic on party ID). Small changes in pollster methods (such as whether they sample from a list, how they select respondents within each sampled household, what time of day they call, whether they use live interviewers or an automated methodology and how they weight their data) can produce important differences in sample composition that will in turn affect the vote preference results.
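The arithmetic behind this composition effect is easy to illustrate. Here is a toy sketch in Python, with invented subgroup shares and preference levels (not figures from any of these polls), showing how a five-point shift in one subgroup's share of the sample can move the topline margin by several points even though no voter within any subgroup changes her mind:

```python
# Toy illustration (hypothetical numbers): how a small shift in sample
# composition moves the topline margin even when preferences within
# each subgroup are unchanged.

def topline(shares, prefs):
    """Weighted topline: each subgroup's share of the sample times the
    candidate's support within that subgroup."""
    return sum(s * p for s, p in zip(shares, prefs))

# Subgroups: white, black, Latino (invented preference levels)
clinton_pref = [0.50, 0.15, 0.60]  # Clinton support within each group
obama_pref   = [0.40, 0.80, 0.30]  # Obama support within each group

sample_a = [0.55, 0.20, 0.25]  # one pollster's sample composition
sample_b = [0.50, 0.25, 0.25]  # another's: 5 points more black voters

for name, shares in [("Sample A", sample_a), ("Sample B", sample_b)]:
    c = topline(shares, clinton_pref)
    o = topline(shares, obama_pref)
    print(f"{name}: Clinton {c:.1%}, Obama {o:.1%}, margin {c - o:+.1%}")
```

With these invented numbers, the five-point composition difference turns a dead heat into a roughly four-point Obama lead, which is exactly the kind of spread we are seeing across the Texas polls.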

Here is the data available online from the three most recent surveys (some of which was posted by our readers in comments yesterday):


Unfortunately, only the TCUL/Hamilton/POS poll provides complete information on its sample composition, although the ARG summary provides percentages for selected subgroups. From these data we can see that the TCUL survey includes slightly more Latino voters and slightly fewer African-American voters than the ARG survey. That explains a few points of the difference between them but (as noted below) not all.

The table above also includes sample composition statistics from the 2004 Texas Democratic exit poll, although the 2008 composition will likely be different. Just how different we will not know until the votes are cast, but the exit polls so far this year in other states provide some guidance. The Washington Post's Cohen and Agiesta have put up a very helpful compilation showing the demographic shifts from 2004 to 2008 in 17 states that have held primaries or caucuses so far this year. Women have made up a slightly greater share of Democratic electorates almost everywhere (averaging about a 4 percentage point gain). The percentage of 18 to 29 year olds has also increased in just about every state, up 4 points on average.

The changes in race and ethnicity have been less consistent. Most relevant to Texas are California and Arizona, the two states with the largest Latino populations. In California, the Latino contribution surged (+14), while the African American percentage was roughly constant (-1). In Arizona, the African American percentage was up far more (+6) than the Latino contribution. Cohen and Agiesta also note that the black percentage of the Democratic electorate is down slightly in two states (Florida and Virginia) where the Latino percentage increased.

The racial and ethnic composition of the three most recent surveys does not explain their different Obama-Clinton results. As the following table shows, the biggest difference among the three is that the ARG survey reports an even race among Texas Latino Democrats, while the Hamilton/POS and Rasmussen surveys give Clinton a roughly two-to-one lead, comparable to her showing in other states with large Hispanic populations.


Another factor in the "volatility" of these polls -- a factor that is next to impossible to evaluate from the data available -- is how tightly (and accurately) they screen to identify "likely voters." In 2004, the Texas Democratic primary attracted 839,231 voters, 6% of all eligible adults and 5% of all adults in the state. Democratic turnout has increased everywhere this year, nearly doubling on average in primary states (as a percentage of eligible adults) although the state-by-state patterns have varied widely. Texas is all but certain to see a big turnout boost, but just how big is anyone's guess.

The key point here is that polls may yield different results depending on how broadly or narrowly they conceive of the Texas primary electorate. Unfortunately, the degree to which they screen for "likely voters" is hidden from our view.

Exit Poll Data: Education and Race

Topics: 2008 , Barack Obama , Exit Polls , Hillary Clinton , Jon Cohen

The Washington Post's Jon Cohen has posted some extremely useful data from the exit polls to his Behind the Numbers blog. He ran the Clinton-Obama vote by education among white voters and found evidence of a large "education gap:"

In each of the states where the Post subscribed to exit polls (and voters were asked about their level of education), Clinton did better among non-college than college-educated white voters. She also outpaced Obama among non-college whites in all 14 of these states, but beat him by more than a single percentage point among college graduates in only five.

This data helps shed some light on a subject of speculation earlier this week: whether Obama is "finally cracking the code of the working class white voter," as one observer put it. Our initial look at the exit polling on this issue was inconclusive, because the official exit poll tabulations show the results by education (and income) among all voters. Since the biggest differences between Clinton and Obama have been by race and ethnicity, the share of African American or Latino voters in each state determines whether they do better or worse among the less-well-educated voters in that state.

Cohen's tabulations control for race, showing the percentage by education among white voters in each state, thus allowing for better comparisons across states:

02-15 Post Exit Poll Data

Obama's share of non-college whites in Virginia was, as many assumed, higher than in any other state except Illinois, although his performance among this subgroup has been relatively consistent elsewhere. Obama's percentage of non-college whites in Maryland was similar to most of the other states. Also, as some have speculated elsewhere, his percentage of non-college whites was lowest in three Southern states: Florida, South Carolina and Tennessee.

Somewhat surprising -- to me at least -- is the much larger variation across states among college educated white voters. Obama had large double digit leads among college educated white voters in Virginia, Missouri and Illinois but trailed by double digits among college whites in New York, New Jersey and Florida.

Some of these differences are clearly related to the home state advantages (Illinois, New York and possibly New Jersey). Others may have to do with the relative expenditure of resources (candidate time, television advertising and field organizing) by Obama and Clinton in each state. Do our readers have other theories?

POLL: Research 2000 Wisconsin, Gallup Daily Tracking

WISC-TV/Research 2000
(Dems, Reps)

Obama 47, Clinton 42... McCain 48, Huckabee 32, Paul 7

Gallup Poll

Obama 47, Clinton 45... McCain 53, Huckabee 28

Bialik on Poll Mash-Ups

Topics: Pollster.com

Carl Bialik, author of "The Numbers Guy" column for the Wall Street Journal, takes a balanced look today at the pitfalls of something we do here at Pollster, "mashing up surveys from various sources this election year to produce composite numbers meant to smooth out aberrant results." His piece is worth reading in full, as it considers both the benefits and risks of creating composite trends or averages:

Stirring disparate pollsters in one pot has its critics. "That's dangerous," says Michael Traugott, a professor at the University of Michigan and author of a recent guide to election polls. "I don't believe in this technique."

Among the pitfalls: Polls have different sample sizes, yet in the composite, those with more respondents are weighted the same. They are fielded at different times, some before respondents have absorbed the results from other states' primaries. They cover different populations, especially during primaries when turnout is traditionally lower. It's expensive to reach the target number of likely voters, so some pollsters apply looser screens. Also, pollsters apply different weights to adjust for voters they've missed. And wording of questions can differ, which makes it especially tricky to count undecided voters. Even identifying these differences isn't easy, as some of the included polls aren't adequately footnoted.

Bialik quotes both me and Charles Franklin in the column, but here are a few additional thoughts. We do not consider the trend estimates worthy replacements for the data from individual surveys. The trend lines -- and the estimates derived from their end-points -- are best considered tools to help make sense of the barrage of often conflicting results from individual surveys. We learned in 2006 that "mashing up" surveys and "smoothing out" the variation between them helps counter the instinct to overreact to variation between individual polls -- some of it clearly aberrant -- that is common in hotly competitive political races. Moreover, while we only plot a few summary measures here, such as vote preference and job approval, many of the surveys we report and link to include a wide variety of questions that help illuminate many aspects of public opinion.

Bialik is correct to argue that the benefits of averaging lessen when we start to see large and consistent "house effects" separating the results from different pollsters. If a few polls are providing good estimates while many other polls have misleading results, the mashed-up averages may reflect more of the bad than the good. I wrote as much just before the Iowa Caucuses. Bialik correctly notes that the averages were misleading in California, where most polls showed the Clinton-Obama race closer than it turned out to be. His suggestion that we could "bolster" the case for trend estimates or averaging by comparing those numbers "directly against those from individual polling firms in terms of election accuracy" is a good one and something we are working on.
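One of the pitfalls Bialik lists, the equal weighting of polls with very different sample sizes, is easy to see with a toy example. (Our actual trend estimates come from a local regression rather than a simple average; the numbers below are invented.)

```python
# Simplified sketch of the sample-size issue: a plain average treats
# every poll equally, while weighting by sample size gives larger
# polls more influence. All numbers are invented for illustration.

polls = [
    # (candidate share in %, sample size)
    (48, 400),
    (52, 600),
    (55, 2000),  # one much larger poll
]

simple_avg = sum(share for share, _ in polls) / len(polls)
weighted_avg = (sum(share * n for share, n in polls)
                / sum(n for _, n in polls))

print(f"simple average:   {simple_avg:.1f}")
print(f"weighted average: {weighted_avg:.1f}")
```

Here the two averages differ by nearly two points, solely because of how much say the big poll gets. Neither choice is obviously "right," which is part of Traugott's objection.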

Bialik adds some additional detail in a companion blog item that focuses, among other things, on my calls for greater disclosure of methodological details, which includes a response of sorts from Zogby International:

When I asked Zogby spokesman Fritz Wenzel for further details, such as what those flawed estimates were, and passed along a blog post from Mr. Blumenthal calling for more disclosure from the firm, Mr. Wenzel dismissed sites like Pollster.com as “rivals.” “We are satisfied that we have identified the problem in California,” Mr. Wenzel wrote in an email, “and giving our rivals more ammo in the form of methodological detail, some of which is proprietary, with which to criticize us further doesn’t make the world a better place.”

Bialik is asking his readers to comment on the value of composite poll numbers and whether better disclosure would "make the world a better place." Your comments are welcome here or there (or both!).

POLL: Texas from ARG, Rasmussen, POS/Hamilton

Texas Credit Union League
conducted by Public Opinion Strategies (R) and Hamilton Campaigns (D)

Clinton 49, Obama 41... McCain 45, Huckabee 41, Paul 6
(Crosstabs: Dems, Reps)

American Research Group

Obama 48, Clinton 41... McCain 42, Huckabee 36, Paul 11

Rasmussen Reports

Clinton 54, Obama 38... McCain 45, Huckabee 37, Paul 7

White Men With Obama Since The Beginning

Topics: 2008 , Barack Obama , Exit Polls , Hillary Clinton

Much was made this week of Obama's performance among white men in Virginia. Indeed, his support with white men was seen as both the key to Obama's Potomac Primary victories and a sign of broadening support to include those formerly in Clinton's base. Others are skeptical, even worrying that white male superdelegates might tip the scale toward Clinton.

In fact, Virginia was neither the first state (nor even first Southern state) where Obama bested Clinton among white men. Nor was it the state where he won this group by the largest margin. Obama has been doing well with this group since the beginning of primary season.

Below is a table of the Clinton/Obama vote among white men, from exit poll data from every contest thus far. The table is ranked in descending order, with the state showing the largest Obama margin at the top.


Compared to Virginia, Obama did even better with white men in Utah, New Mexico, and California (setting his home state of Illinois aside). This pattern is also not a function of election type or overall outcome. Obama led with white men in states with primaries and states with caucuses, and in states that he won and states that Clinton won.

Further, the country doesn't exactly fall into an obvious North/South divide. While Obama tends to do less well with white men in the South, he still led with the group in Georgia (in addition to Virginia), and trailed with the group in New Jersey and Missouri.

Finally, it's also worth reminding ourselves about the contest that started it all - the Iowa caucuses. Among white men in Iowa, Obama garnered a 10-point lead over Clinton, and an 8-point lead over Edwards.

** (Thanks to Joe Lenski for correcting the original graph by sending the official numbers from Iowa.)

POLL: ARG, Gallup Daily Tracking

American Research Group

Obama 47, Clinton 45... McCain 54, Huckabee 31

Gallup Daily Tracking

Obama 46, Clinton 45... McCain 51, Huckabee 29

The Keith Number

Topics: Sampling Error

My latest National Journal column looks at Keith Olbermann's polling innovation, the Keith Number.

POLL: Quinnipiac Ohio & Pennsylvania

Quinnipiac University

Clinton 55, Obama 34

Clinton 52, Obama 36

We encourage our readers to click through to see field dates, sample sizes, margins of sampling error, target populations and additional results.

POLL: Gallup Daily Tracking

Gallup Poll

Obama 45, Clinton 44... McCain 51, Huckabee 29, Paul 6

Economic Conditions
Poor 34, Excellent/Good 23
Getting worse 80, Getting better 13

Low Information Voters and Television Ads

Topics: 2008 , Associated Press , Barack Obama , Hillary Clinton , Jon Cohen , Pew Research Center

TNR's Michael Crowley blogged this question from a reader last night:

I haven't seen all the exit data, but listening to the talking heads it seems Obama is finally cracking the code of the working class white voter. But maybe it is simpler than that.

Could it be that downscale voters are also "low-information" voters when compared with their Volvo-driving broadband-surfing upscale brethren? If so, it would suggest that all it was going to take was a bit of time for the word to get through that Obama is looking like a winner. Wealthier voters may be the leading edge of a wave. Downscale voters may be the ones who catch trends later--and then really give them mass market power. If so, it's bad news for Senator Clinton going into Ohio and Pennsylvania.

I can answer part of that question. Downscale voters -- those with less education and lower incomes -- are absolutely "low information voters" as compared to their more upscale brethren. That finding has been a consistent theme of 40 or 50 years of political survey research and was vividly reaffirmed by the updated political knowledge study released by the Pew Research Center last year. Here is a slightly condensed version of a table from that report showing that lower income and less well educated Americans are by far the least informed on an index based on 23 questions of political knowledge:

02-13 Pew knowledge table.png

The harder question to answer is whether the Virginia and Maryland results represent some sort of "breakthrough" for Obama among downscale white voters. It is not immediately clear from the exit polls, as Noam Scheiber put it, whether Obama's strong showing in Virginia is "a sign of an expanding coalition, or...the predictable result of a contest waged on favorable terrain." Scheiber sees signs of "genuine growth for Obama" in the exit poll results, particularly improvement among older whites and Catholics. It is hard to be certain about any progress among downscale whites since, as Scheiber points out, the exit poll tabulations do not break out the results by education and race or by income and race.

A different but important piece of this puzzle comes from an exit poll question that received surprisingly little attention last night (at least in the coverage that I watched). In both Virginia and Maryland, exit pollsters asked voters to "rate the importance of campaign ads...in your vote in today's presidential primary." As the table below shows, Obama did much better in both states among those who rated political ads as important.

02-13 exit poll ad importance.png

The Associated Press reports that Obama "far outspent his rival on television advertising" in Virginia and Maryland, and these exit poll results are consistent with a perceived Obama advantage in campaign ads. A Clinton supporter who saw nothing but Obama advertisements on television is more likely to say the ads were "not important," while an Obama backer would be more inclined to react favorably to ads that seemed mostly about Obama.

It is thus difficult to tease out from these results the actual importance of the ads in persuading voters in Virginia and Maryland. Still, this is a topic worth digging into more deeply. Did the content of those advertisements -- or simply the fact that Obama had a substantial advantage on that score -- help move some of the lower information voters that have been more supportive of Clinton in other states?

Update: The Washington Post's Jon Cohen reports some additional and highly relevant exit poll numbers which we discuss here.

Potomac Primary Results Watch

Topics: 2008 , CBS , CNN , Exit Polls , Mark Lindeman , MSNBC

Live blogging will be light tonight (hopefully), to add whatever value we can to the much more current news you can get on television or major news sites.

First, here are the usual official exit poll links: MSNBC, CNN, CBS.

Second, Mark Lindeman has been running extrapolations for us each primary night, using the publicly available exit poll cross-tabulations to recover the overall estimate used to weight the data. These overall numbers reflect the estimate the exit pollsters have the most confidence in at any moment. They are based on some combination of exit polls and pre-election polls as the polls close, and increasingly on a sample of actual vote returns as the night wears on. What we post here is probably stale; it may be an hour or more behind what the network "decision desks" are looking at and use to call the race.

Updates will follow in reverse chronological order, all times Eastern.


9:34 - If you're watching television, you know that the networks have just called Maryland for Obama and McCain. Here, via Mark Lindeman, are the underlying vote estimates used to weight the cross-tabs just posted online: Obama 62%, Clinton 35% -- McCain 55%, Huckabee 29%, Paul 8%, Romney 7%.

9:21 - Reader Daniel T asks:

I just took another look at the CNN exit poll and now all the numbers have changed but the number of respondents did not. The exit poll now shows McCain winning 50% of the male/female vote when it showed Huckabee winning the female vote before.

Can you explain why/how this happened?

Yes [though admittedly, this process is confusing]. Here is how I explained the process last week:

So I've alluded to the fact that these estimates improve over the course of the evening. What does that mean? It means that as the polls close, the estimates are based on some combination of results from the exit poll interviews and (believe it or not) pre-election poll averages. Once the polls close, the interviewers attempt to obtain actual results for their sampled precinct (or another "reporter" attempts to get the results from the county or state registrar). The exit poll analysts use these numbers to do two things: First, they gradually replace the exit poll results in their estimate models with the actual count precinct-by-precinct. They also calculate the "within precinct error" statistic for each state (or regions within each state - not sure) that are used to adjust exit poll results from the other precincts where actual count is not yet available.

The exit pollsters also have "reporters" who gather hard vote counts from a much larger random sample of precincts. All of this data goes into the computer models and is used to create various estimates of the vote. The "decision desk" analysts look at all of those estimates in deciding whether to "call" a race.

A separate operation within exit-poll-central takes whatever estimate they deem most trustworthy and uses it to weight the subgroup tabulations that we can read online. We take those tabulations as they appear online and extrapolate the estimates above from them. They typically do one update 30 to 60 minutes after the polls close and another two or three over the course of the evening. All of this is a long-winded way of saying that what you are seeing here is not as current as the information the network analysts are using to make their calls.
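For the curious, the extrapolation itself is simple arithmetic: because the published cross-tabs are weighted to the exit pollsters' current best topline estimate, that topline can be recovered from any cross-tab. A sketch with invented numbers (these are not Lindeman's actual inputs):

```python
# Hypothetical sketch of the extrapolation: recover the weighted
# topline estimate from a published gender cross-tab. Numbers are
# invented for illustration.

male_share = 0.43          # men's share of the weighted electorate
obama_among_men = 0.60     # Obama's share among men
obama_among_women = 0.64   # Obama's share among women

# The topline is the share-weighted average across the cross-tab rows.
obama_topline = (male_share * obama_among_men
                 + (1 - male_share) * obama_among_women)
print(f"implied topline: {obama_topline:.1%}")
```

Every cross-tab published at the same time implies the same topline, which is why the extrapolation works regardless of which table you start from.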

8:51 - Call it "McShift:" After the calls for McCain by NBC, CNN and (presumably) the other networks, an update to the online cross-tabulations shows McCain ahead by eight points (49% to 41%). Keep in mind, as noted above, that updates to these tabulations run well behind the updates to the estimates used to call the race, and the estimates improve as the exit pollsters are able to incorporate actual vote returns into their random samples of precincts.

8:30 - Reader Andrew Therriault emails to point out something minor but nonetheless noteworthy that reminds us that exit polls -- like all other surveys -- are subject to "measurement" error. Roughly 3% of the respondents to the Virginia Democratic exit poll provided answers that are not exactly consistent: They either support Obama but say they would be "disappointed" if Obama is the nominee, or support Clinton and say they would be "disappointed" if Clinton is the nominee. So, presumably, a small number of respondents did not hear these questions as intended (unless you have a better explanation).

7:44 - The initial estimate for the Democratic race in Virginia had Obama leading Clinton, 62% to 38% (Mark corrects me...either 61%-38% or 62%-37%).

7:43 - The tabulations that appeared for Virginia just after the polls closed showed McCain ahead of Huckabee by four points (46% to 42%) and Obama leading Clinton. An update that we noticed at about 7:30 showed Huckabee with a numeric advantage (46% to 44%), though we should stress that these differences are almost certainly not large enough to be statistically significant.



Clinton 56, Obama 39... McCain 50, Huckabee 36, Paul 6

North Carolina
Obama 50, Clinton 40... McCain 45, Huckabee 40, Paul 5

Regression Analysis of the Democratic Race

Topics: Barack Obama , Exit Polls , Hillary Clinton , Jay Cost , NY Times

Over the last few days, a number of political scientist bloggers have turned their statistical firepower on the Democratic presidential race, producing some analyses that are both tantalizing in their implications and confusing for those unfamiliar with multiple regression analysis. The most interesting posts come from Brendan Nyhan, Jay Cost and DailyKos diarist Poblano. Other than pointing you to their efforts, here are a few thoughts.

In many ways, the Democratic contest is the perfect problem for multiple regression analysis. Many different important variables appear to be strongly related to candidate support: race and ethnicity, gender, age, socio-economic status and whether voters participate in a primary or caucus (to name just the most obvious). We are really interested in understanding the independent effects of each of these factors. You can see crude efforts along these lines in the exit poll tabulations: How does vote preference vary by gender or age, for example, once we control for race? The promise of multiple regression is the ability to estimate the independent effects for a large number of different variables on vote choice while controlling for all of them simultaneously.

Another tempting feature of multiple regression analysis -- at least in theory -- is the ability to take a model that does a good job predicting the Obama-Clinton vote looking backwards, plug values for the upcoming contests for each of the variables into the model (race, gender, age, etc) and attempt to predict the outcomes. The lure of predicting "what might happen at the end of an active campaign" (as Poblano put it), is what led Bill Kristol to cite Poblano in his New York Times column. Obviously, if it were possible, we would all like to use hard data to anticipate what might happen in Ohio, Texas or Pennsylvania.

At the same time, the efforts by the aforementioned bloggers also demonstrate just how complex and challenging multiple regression analysis can be when applied to real world problems using real world data. Here are three reasons to be cautious about interpreting the models linked to above:

1) The data are imperfect. As Jay Cost explains, we have a choice between two kinds of data: "micro-level" exit poll data and "macro-level" data from statewide results. Exit polls collect data on the vote preferences and characteristics of individual voters. That level of data is ideal, since we want to understand how individuals vote (not states or counties). Unfortunately, for now, only the subgroup exit poll tabulations are available, and not for all states. The networks have not conducted "entrance polls" for most of the smaller caucus states.

Data is plentiful at the aggregate level (mostly states) but far less precise. One problem is that Census data (on race, age, religion or socio-economic status) is based on the total population rather than those who participated in the Democratic primaries or caucuses. We also have a relatively small number of states to consider, and we have to deal with the statistical problem that population sizes vary considerably from state to state.

2) The models are poor predictors of the future. The limitations of the data are one reason why these sorts of regression models make for poor predictors of future outcomes. Consider the predictive accuracy of Poblano's model. He says it explained 95% of the variation in 26 states that voted through February 5 and reports estimates that predicted Obama's actual share of the vote within these states "within an average of two points." However, as TNR's Josh Patashnik points out, the model overestimated Obama's support in Louisiana (+11 points) and Nebraska (+8) and understated it in Washington (-14) and Maine (-7). The reason is something statisticians call "overfitting." The number of variables in Poblano's model (9) was large relative to the number of cases involved (26 states). So the "fit" of Poblano's model to the past data is deceiving because it is, in essence, too good. The 95% of variance explained is unique to those 26 states and thus does not generalize to predict the results in other states with anywhere near as much precision.
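A quick simulation makes the overfitting point vivid. The sketch below fits nine predictors to 26 cases of pure random noise (no real election data is involved) and then scores the same fitted model on 26 fresh cases:

```python
import numpy as np

# Overfitting sketch: with 9 predictors and only 26 cases, even pure
# noise "fits" the past data, but the fit does not carry over to new
# cases. All data here is randomly generated.

rng = np.random.default_rng(0)
n_train, n_test, n_vars = 26, 26, 9

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

X_train = rng.normal(size=(n_train, n_vars))
X_test = rng.normal(size=(n_test, n_vars))
y_train = rng.normal(size=n_train)   # outcome unrelated to predictors
y_test = rng.normal(size=n_test)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n_train), X_train])
coefs, *_ = np.linalg.lstsq(A, y_train, rcond=None)

in_sample = r_squared(y_train, A @ coefs)
out_sample = r_squared(
    y_test, np.column_stack([np.ones(n_test), X_test]) @ coefs)

print(f"in-sample R^2:     {in_sample:.2f}")   # positive even on noise
print(f"out-of-sample R^2: {out_sample:.2f}")  # typically near zero or worse
```

The in-sample fit is substantially positive even though the predictors are pure noise, while the out-of-sample fit collapses, which is exactly the gap between Poblano's 95% "explained" and his misses in Louisiana, Nebraska, Washington and Maine.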

Reducing the number of variables does not solve the problem; it just makes the "fit" of the model to the existing data look less impressive (though more realistic). Jay Cost explains why his own model is a decent vehicle for explaining the existing data but a poor predictor of future outcomes:

The model's predictive power (69%) is very high from a certain perspective. From another perspective, though, its accuracy is not great enough to [allow for] "publishable" predictions - not when candidates are often separated by tiny margins.

3) Demography is not always destiny. Or to put it another way, campaigns matter. At least that is the underlying assumption behind all the personal campaigning, field organizing and paid advertising that both campaigns are doing. What these models lack is a good measure of the influence of the various means of campaigning. Once again, a lack of decent data is the primary culprit. For example, we do not yet have FEC reports providing decent breakdowns of how much the candidates spent in each state. Also, the University of Wisconsin's Advertising Project will ultimately have breakdowns of what each candidate spent on television advertising in each media market, but those data are not yet in the public domain. The impact of campaigning going forward is an important question. Will it matter, for example, that the calendar will now slow down enough that the candidates can devote significantly more time and paid advertising to states like Ohio, Texas and Pennsylvania than they did to the Super Tuesday states?

Jay Cost's model does include "number of candidate visits" as a variable meant to "measure campaign effects per state." He reports that:

Clinton does better as the number of candidate visits increases. This was a bit of a surprise, but it is good news for her. Campaign effects seem to incline the electorate to her.

This finding is intriguing, but I wonder how the results might differ if Cost had used separate variables for the visits of each candidate rather than just the total number of visits for all candidates.

Mechanical issues of this sort help illustrate one of the practical limitations of regression modeling. It is a very powerful tool, but it is also sensitive to decisions the analyst makes about what data to use and what variables to include. We will no doubt see more attempts to model the primary campaign in the future. Do not be surprised if reasonable people disagree about what data is most appropriate, what model best "fits" the data and which conclusions are best supported.

Update: Just want to underline a point that may have been unclear. Neither Brendan Nyhan nor Jay Cost used their regression models to try to predict future outcomes.

POLL: Gallup Daily Tracking

Gallup Poll

Clinton 45, Obama 44... McCain 52, Huckabee 28, Paul 5

Economic Conditions
Poor 33, Excellent/Good 24
Getting Worse 79, Getting Better 14

We encourage our readers to click through to see field dates, sample sizes, margins of sampling error, target populations and additional results.

POLL: PPP (D) Wisconsin

Public Policy Polling (D)

Obama 50, Clinton 39... McCain 53, Huckabee 32

We encourage our readers to click through to see field dates, sample sizes, margins of sampling error, target populations and additional results.

POLL: Constituent Dynamics MD/DC/VA

Constituent Dynamics

Obama 53, Clinton 36

Obama 51, Clinton 34

Washington, DC
Obama 63, Clinton 27

We encourage our readers to click through to see field dates, sample sizes, margins of sampling error, target populations and additional results.



Obama 60, Clinton 38... McCain 48, Huckabee 37, Paul 7

Obama 55, Clinton 32... McCain 52, Huckabee 26, Paul 10

We encourage our readers to click through to see field dates, sample sizes, margins of sampling error, target populations and additional results.

POLL: UNH New Hampshire Senate

University of New Hampshire Granite State Poll

New Hampshire
Shaheen 54, Sununu 37

We encourage our readers to click through to see field dates, sample sizes, margins of sampling error, target populations and additional results.

POLL: Brown University Rhode Island Primary

Brown University/Taubman Center for Public Policy

Rhode Island
Clinton 36, Obama 28

Clinton 43, McCain 32
Obama 42, McCain 30

POLL: USA Today/Gallup, AP/Ipsos, Gallup Daily Tracking

USA Today/Gallup

Obama 47, Clinton 44... McCain 53, Huckabee 27

AP/Ipsos
Clinton 46, Obama 41... McCain 44, Huckabee 30, Paul 9

Obama 48, McCain 42... Clinton 46, McCain 45

Gallup Daily Tracking
(a completely different sample from the USA Today/Gallup survey)

Clinton 46, Obama 44... McCain 56, Huckabee 25

Economic Conditions
Poor 34, Excellent/Good 24
Getting Worse 78, Getting Better 14

John Gorman

Topics: Fox News , John Gorman , Pollsters

Ben Smith has just reported that Pollster John Gorman, the CEO and founder of Opinion Dynamics Corporation, has passed away.

Since regular readers may know Gorman through the polls his company conducts for Fox News, they may be surprised to learn that he was the co-founder, along with Patrick Caddell, of Cambridge Survey Research, the company that polled for President Jimmy Carter and many other Democrats.

Smith quotes a reader who says of Gorman, "He was the smartest pollster I ever knew. I learned something every single time I spoke to him. He will be sorely missed." You will hear a lot of that over the next few days.

My former business partner David Petts, who worked for Caddell and Cambridge in the early 1980s, relays these thoughts:

Gorman's ability to excel in commercial and academic circles was extraordinary. In addition to having many successful and appreciative clients, he co-authored one of the single most important articles ever published on voters' decision making in presidential elections.

Lawrence Shiman, who worked for Gorman at Opinion Dynamics and considers him a mentor, shares this reaction:

He combined statistical rigor with common sense to a rare degree in his approach to survey research. What was also rare in the polling industry was his humility - he never had to be the face of any organization, and was perfectly willing to give those who worked for him the limelight even when John deserved it most...He will be missed.



American Research Group

Obama 56, Clinton 38... McCain 54, Huckabee 32

Obama 55, Clinton 37... McCain 50, Huckabee 25

We encourage our readers to click through to see field dates, sample sizes, margins of sampling error, target populations and additional results.

POLL: MSNBC/McClatchy/Mason-Dixon MD


Obama 53, Clinton 35... McCain 54, Huckabee 23

Sleep Deprived Post-Super Tuesday "Outliers"

Topics: 2008 , Associated Press , Frank Newport , Gary Langer , Kathy Frankovic , Mark Mellman , Mike McDonald , Wall Street Journal

Regular readers should note that I added an update this afternoon with more thoughts on the response from Frank Newport. Now on to this week's "outliers"....

Al Hunt is betting against the pollsters, big time.

Wall Street Journal "Numbers Guy" drills down (here and more recently here) on why media delegate counts have been so varied.

Bialik also covered "purported pitfalls" in the exit polls and discrepancies among the California pollsters last week.

Brian Schaffner examines how well polls predicted delegates won on Super Tuesday.

Jennifer Agiesta parses the Republican exit poll numbers on evangelicals.

AP is conducting an internal review on why they mistakenly called Missouri for Hillary Clinton on Tuesday.

Kathy Frankovic looks more closely at possible shifts in perceptions of Bill Clinton.

Mark Mellman reviews the divisions John McCain creates among Republicans.

David Hill looks into the Barna Group survey of born-again Christians.

Gary Langer reviews all the numbers on turnout and the youth vote.

Michael McDonald has updated his 2008 Presidential Primary Turnout page to include Super Tuesday results.

Kos is not a big fan of Zogby or ARG.

The Week looks at how opinion polls work.

Josh Green's baby doesn't like automated political calls.

And nothing to do with "scientific" polls, but...Chris Bowers reminds me that he is a lot younger than I am.

SurveyUSA's Pollster Report Card

Topics: 2008 , Pollster , Pollster.com , SurveyUSA

SurveyUSA, the well-known provider of automated polls, has posted a pollster "report card" based on the final polls reported by public pollsters during the 2008 primary season to date. Actually, they have put up two report cards, one for all pollsters that have released at least one survey and another for just the 14 most active pollsters.

Several readers have emailed asking us to do our own analysis or to verify SurveyUSA's statistics. Professor Franklin has been doing post-primary "polling error" analyses that graph the performance of all polls, and should have something on the Super Tuesday polls soon. A pollster report card of our own is also on a Pollster.com to-do list that is, alas, long and growing longer each day. For now, let me just point out two important characteristics of the SurveyUSA report card:

First, their statistics are based on the last poll conducted by each organization. Surveys typically get more accurate as election day approaches, so polls conducted a week or more before the election tend to be at a disadvantage when compared against those from organizations like SurveyUSA that typically continue to call right up until the night before the election. You can decide whether that issue is a "bug" in the report card or a critical "feature" of SurveyUSA's approach to pre-election polling.

Second, SurveyUSA bases their ranking on one particular measure of polling error, which compares the margin between the percentages received by the first and second place finishers on election day to the margin reported for the same two candidates on the final poll. There are other measures of poll error (SurveyUSA has posted a paper they authored that reviews eight such measures). Those critical of SurveyUSA will note that they typically report very small percentages for the "undecided" category, so they tend to do better on their measure of choice (Mosteller 5), which does not reallocate undecided voters. Again, your call as to whether that is a bug in the report card or a fair way to highlight one of the positive attributes of SurveyUSA's methodology. [CORRECTION: I was incorrect to imply that SurveyUSA has an advantage on the Mosteller 5 measure. If anything, that measure appears to be relatively tougher on them than on other pollsters. See the response from SurveyUSA's Jay Leve here and my comments here].

Others -- most notably, Chicago Tribune pollster and frequent Pollster.com commenter Nick Panagakis -- are critical of the Mosteller 5 measure for focusing on the margins rather than individual percentages. His beef is that pollsters report a "margin of sampling error" based on individual percentages, which will be smaller than the average errors on the margin between two percentages. So, Panagakis argues, the measure used by SurveyUSA makes the magnitude of the errors seem unacceptably large.
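To make the two points concrete, here is a small sketch using made-up poll numbers (not any pollster's actual data): first the Mosteller 5 calculation, then the sampling error of a margin compared with that of a single percentage.

```python
import math

# Hypothetical final poll and election result for a two-way race
# (illustrative values only).
poll = {"A": 50.0, "B": 42.0}      # final poll percentages
result = {"A": 53.0, "B": 47.0}    # election-day percentages

# Mosteller Measure 5: the absolute difference between the poll's
# margin and the actual margin for the top two finishers.
poll_margin = poll["A"] - poll["B"]          # 8 points
actual_margin = result["A"] - result["B"]    # 6 points
mosteller_5 = abs(poll_margin - actual_margin)
print(mosteller_5)  # 2.0

# Panagakis's point: the sampling error of a margin is larger than
# the sampling error of a single percentage. For two shares p1, p2
# estimated from the same sample of size n:
#   Var(p1 - p2) = [p1(1-p1) + p2(1-p2) + 2*p1*p2] / n
n = 600
p1, p2 = poll["A"] / 100, poll["B"] / 100
se_single = math.sqrt(p1 * (1 - p1) / n)
se_margin = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n)
print(round(se_single * 100, 1))  # 2.0 points
print(round(se_margin * 100, 1))  # 3.9 points (nearly double)
```

In other words, a survey whose reported "margin of error" is plus or minus 2 points carries roughly double that uncertainty on the gap between two candidates, which is one reason errors measured on margins look large.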

Finally, the real challenge for any "pollster report card" is providing guidance on when the differences among the pollsters are statistically meaningful and when they are based mostly on random chance. Put another way, how much bigger should the average error be (and on how many polls should it be based) before we conclude that the difference between two pollsters is truly significant? Unfortunately, I do not have good guidance on that question. Perhaps our more statistically astute readers can chime in with their thoughts.
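For what it's worth, one rough way to frame the question is to treat each pollster's per-poll errors as a sample and compute a Welch-style t statistic for the difference in average error. The sketch below uses entirely hypothetical error figures:

```python
import math

# Hypothetical per-poll errors (in points) for two pollsters;
# these numbers are invented for illustration.
errors_a = [2.1, 4.0, 1.5, 3.2, 2.8, 5.1, 1.9, 3.5]
errors_b = [3.0, 5.5, 2.2, 4.8, 3.9, 6.0, 2.5, 4.4]

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

diff = mean(errors_b) - mean(errors_a)
# Welch-style standard error of the difference between the two means
se = math.sqrt(sample_variance(errors_a) / len(errors_a)
               + sample_variance(errors_b) / len(errors_b))
t = diff / se
print(round(t, 2))
```

With these numbers, a full one-point gap in average error across eight polls apiece yields a t statistic of roughly 1.6, short of the conventional 1.96 threshold, a hint that report-card differences need a fair number of polls per firm before they mean much.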

[Update: Again, a response from SurveyUSA's Jay Leve here and my comments here]

POLL: VA/MD Rasmussen, Mason-Dixon

Rasmussen Reports

Obama 55, Clinton 37

Obama 57, Clinton 31

Mason-Dixon
Obama 53, Clinton 37... McCain 55, Huckabee 27

We encourage our readers to click through to see field dates, sample sizes, margins of sampling error, target populations and additional results.