March 2, 2008 - March 8, 2008
Los Angeles Times' editor Don Frederick thinks pollsters are doing better, yet the paper closes down its own polling operation.
The Providence Journal considers the challenges facing political polling.
David Hill says the "youth vote" is unreliable and "over hyped."
Mark Mellman considers underdogs and bandwagons.
Frank Newport examines Hillary Clinton's strength among Catholic Democrats.
Jay Cost looks at how Hillary Clinton won Ohio and Texas.
John Judis sees exit poll evidence of Barack Obama's weaknesses.
Political Scientist Tom Holbrook charts changes in the Democratic primary electorate.
Carl Bialik looks at the delegate math in Ohio and Texas and how many Michigan and Florida voters were "silenced."
Jennifer Agiesta digs deeper into the relative perceptions of Obama and McCain, especially among independents.
Scott Shepard revisits the Bradley-Wilder effect.
John Distasio reports on efforts by the New Hampshire Attorney General to stop polling conducted for the Republican Governors' Association.
Kathy Frankovic reviews the challenges of polling in developing nations.
There has been considerable buzz over the last two days about the surveys released yesterday by SurveyUSA that test both McCain-Obama and McCain-Clinton trial-heat questions in all 50 states. Putting aside the concerns some have about SurveyUSA's automated methodology and the other usual caveats about horse race polling at this stage in the campaign, I tend to agree with the critique from Matt Yglesias (via Sullivan):
Each of these polls has a sample size of 600, so the margin of error will come into play. What's more, there are 100 separate polls being aggregate here, so the odds are that several of these are just bad samples.
True on both counts. SurveyUSA colors in states on their maps even if a candidate leads by a point or two, margins that are not close to achieving statistical significance. However, since SurveyUSA says they did 600 interviews in each state, we can take their analysis a step further, applying statistical sampling error to the candidates' margins in each state.
Professor Franklin and I have done just that, classifying each state based on the statistical significance of the candidate's lead. We call a state "strong" for the candidate if they lead by a margin that is statistically significant at a 95% level of confidence, the level typically used to calculate the "margin of error" attached to most surveys. We label as "lean" any state where a candidate leads by more than one standard deviation, which amounts to a 68% confidence level. We label all other states as toss-ups.
Note also that these significance tests assume "simple random sampling," which produces a smaller error margin than we would get if we could take into account that SurveyUSA, like virtually all pollsters, weights its data. We would need access to the raw data and weights in order to do truly correct significance testing.
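For readers who want to try the strong/lean/toss-up rule themselves, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the spreadsheet used for the tables: the function names are hypothetical, the candidate shares in the example are made up, and the standard error formula (for the gap between two shares drawn from the same simple random sample) is one common textbook choice that ignores weighting.

```python
import math

def lead_standard_error(p1, p2, n):
    """Standard error of the lead (p1 - p2) when both shares come from
    the same simple random sample of n respondents (assumed formula)."""
    return math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)

def classify_state(p1, p2, n=600):
    """Label a lead 'strong' (95% confidence), 'lean' (more than one
    standard error, roughly 68% confidence) or 'toss-up'."""
    z = abs(p1 - p2) / lead_standard_error(p1, p2, n)
    if z >= 1.96:
        return "strong"
    if z >= 1.0:
        return "lean"
    return "toss-up"

# Hypothetical shares, not taken from the SurveyUSA releases.
for shares in [(0.50, 0.41), (0.48, 0.43), (0.46, 0.45)]:
    print(shares, classify_state(*shares))
```

With 600 interviews, a lead of roughly eight points or more clears the 95% bar in this sketch, while a lead of a point or two falls well inside the toss-up range.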
The tables and maps appear below, followed by some discussion. First, here are the results and a map showing an Obama vs. McCain match-up (you can click on any of the images for a larger size version):
And here are the results and a map showing a Clinton vs. McCain match-up:
If you would prefer, you can also download the spreadsheet that we used to create the tables.
Now that you have all of the data before you, let's consider the merits of the project and a few caveats about the data. First, this sort of project -- which involved 30,000 interviews completed in 50 states over a three-day period (February 26-28) -- would not have been feasible with live interviewers.
On the other hand, the automated methodology is controversial with traditional survey researchers. I wrote about the arguments for and against IVR (interactive voice response) surveys in Public Opinion Quarterly, and I have blogged often on the subject, both here at Pollster and on its forerunner MysteryPollster. Readers are obviously welcome to share their opinions about the IVR methodology in the comments.
The other caveats noted by SurveyUSA are worth repeating: They surveyed all self-reported registered voters, and did not attempt to screen for "likely voters" (although many national pollsters do the same at this stage, feeling that we are too early in the process to attempt to predict which voters will actually cast ballots). McCain would likely do slightly better in both match-ups under a "likely voter" screen. Also, we are obviously still eight months from the election. Much can and will change in terms of voter perceptions and preferences.
Let us also keep in mind the limitations of random sampling error. It tells us only about the variability that comes from calling a sample of households rather than dialing every working phone number in every state. As with any survey, it tells us nothing about the potential for error based on the wording of the questions, the selection of respondents within the household and the voters missed because they lack land-line phones or do not participate in the survey. Be careful about using the misnomer "statistical tie" to describe states in the toss-up category. One candidate would likely show a "significant" lead if we could increase the sample size -- we just lack the statistical power to know which candidate that would be.
Finally, keep in mind that since we are looking at 100 tests (2 each in 50 states), these results probably misclassify five states by chance alone (as opposed to the way we would classify them if SurveyUSA had called every working telephone in the 50 states).
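The arithmetic behind that "five by chance" figure is simple enough to sketch. The snippet below assumes, for illustration only, that the 100 tests are independent and that each carries the 5% error rate implied by a 95% confidence level; the variable names are hypothetical.

```python
n_tests = 100   # 2 match-ups in each of 50 states
alpha = 0.05    # error rate implied by a 95% confidence level

# Expected number of tests that go wrong by chance alone.
expected_errors = n_tests * alpha

# Chance that at least one of the 100 tests goes wrong.
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected chance errors: {expected_errors:.0f}")
print(f"probability of at least one: {p_at_least_one:.3f}")
```

Under those assumptions, getting every one of the 100 classifications right would be the surprise.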
With all the caveats out of the way, what does all this data tell us? Consider this summary of the electoral vote totals**:
These data are less useful in forecasting the ultimate result than they are in gauging the relative strength of both Clinton and Obama as of last week (February 26 to February 28). Those dates are important, since both the Gallup Daily and Rasmussen Reports automated tracking have shown Clinton gaining ground on Obama nationally over the last week.
Nonetheless, as of last week, Hillary Clinton led in states that add up to a slightly greater electoral vote total counting the leaners (250 for Clinton vs. 244 for Obama). Still, Obama appeared to put more states into play (138 pure toss-up for an Obama-McCain race vs. a Clinton-McCain race). So Obama's initial electoral vote advantage is greater.
The most interesting aspect of these surveys is the states that explain those differences. Let's consider first the states where Obama does better than Clinton:
- Obama moves three states from lean McCain to strong Obama: Colorado, Iowa and Oregon
- Obama moves two states from strong McCain to lean Obama: Nevada and North Carolina
- Obama leads in two states that are toss-ups in a Clinton-McCain race: New Mexico (lean) and Washington (strong)
- Obama moves four states from strong McCain (against Clinton) to toss-up: Nebraska, New Hampshire, North Carolina and Virginia
On the other hand, Clinton does better than Obama in a smaller number of states:
- Clinton moves one state from strong McCain to strong Clinton: Arkansas
- Clinton moves one state from strong McCain to lean Clinton: West Virginia
- Clinton leads in the two states that are toss-ups in an Obama-McCain race: Florida (strong) and New Jersey (lean)
- Clinton moves one state from strong McCain to toss-up: Tennessee
- Clinton moves one state from lean McCain to toss-up: Pennsylvania
Here is another table that makes it easier to see these comparisons (again, click on the image to see a full size version):
So, Pollster readers, what do you think?
**And yes, after putting these tables together I see that SurveyUSA split the Nebraska electoral votes based on the vote totals, something I did not do.
Update: Nick Beaudrot (via Yglesias) creates thematic maps based on the same data keyed to the size of the candidate margin.
My NationalJournal.com column, which looks at how the leaked and at-poll-closing exit polls compare to the actual results, is now online.
Obama 46, Clinton 40
American Research Group
Obama 58, Clinton 34
Clinton 41, Obama 41
Clinton 55, Obama 39
Here is a quick review of a few items of interest I neglected to link to in the aftermath of this week's Junior Tuesday primaries.
First, SurveyUSA has posted report cards comparing pollster performances in four contests: Ohio Democrats, Ohio Republicans, Texas Democrats and Texas Republicans. One suggestion for this already helpful feature would be some indication of whether the results for each pollster fall within the range of random sampling error of the actual result. In other words, in theory, even when polls are as right as they can be about an election result they are still subject to random sampling error. As such, a poll that is "right" should capture the actual result within its "margin of error." If all polls are "right" then the ranking of best to worst is a matter of chance.
Yes, when we compile these scores over many different elections, those random factors should cancel out, but when focusing on an individual race, it would be useful to see, at a glance, which polls captured reality within random sampling error and which did not.
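The check suggested here can be sketched in a few lines of Python. This is only an illustration: the function names, the sample size and the shares below are hypothetical, and the calculation assumes simple random sampling with the usual 95% margin of error on a single candidate's share.

```python
import math

def margin_of_error(share, n, z=1.96):
    """95% margin of error for one candidate's share, assuming a
    simple random sample of n respondents."""
    return z * math.sqrt(share * (1 - share) / n)

def within_sampling_error(poll_share, actual_share, n):
    """True if the actual result falls inside the poll's margin of error."""
    return abs(poll_share - actual_share) <= margin_of_error(poll_share, n)

# Hypothetical polls of 600 respondents against a hypothetical 54% result.
for poll_share in (0.52, 0.45):
    captured = within_sampling_error(poll_share, 0.54, n=600)
    print(f"poll {poll_share:.0%}: within sampling error = {captured}")
```

A report card could flag each pollster this way next to its error score, so that readers can see which misses are larger than chance alone would explain.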
Second, here's an example of the lengths pollsters will go to in trumpeting their successes. The Boston Herald's Marie Szaniszlo sent a bouquet to Suffolk University pollster David Paleologos in the form of a short piece puffing his "knack for calling races." The evidence? A poll in Ohio that "came decidedly closer to the mark" than a survey by "polling giant Zogby," and a New Hampshire poll that showed "Obama winning by 5 compared with Zogby, which showed him leading by 13."
Adam Lewis of the Boston Phoenix responded with a blog entry calling the piece "a bit of a debacle," pointing out that Clinton, not Obama, won the New Hampshire primary, and noting several other instances this spring (the New Hampshire Republican primary and the Massachusetts and California Democratic primaries) in which the Suffolk poll had been far off the mark compared to other pollsters. His final point:
In her lede, Szaniszlo sets up a David-and-Goliath narrative, with "polling giant Zogby" showing Clinton and Obama tied just before Ohio's Democratic primary and "a small polling center based at Suffolk University" putting Clinton ahead 52-40. Clinton won by ten points. Good for Paleologos, but good for a few other pollsters, too, all of whom go unmentioned.
True, but one more thing to consider: That Ohio poll by Suffolk also forecast a Democratic electorate that was 8% African American and 38% age 65 or older. The reality, according to the exit poll, was 18% African American and 14% 65+. None of the other pollsters -- save for the Columbus Dispatch mail-in poll sample that interviewed only previous primary voters -- came anywhere close to that demographic mix.
Finally, Wall Street Journal Numbers Guy Carl Bialik blogged on Tuesday about the challenges we all face telling good polls from bad, focusing on the decision by our friends at RealClearPolitics to drop polls from the American Research Group from their averages. Here is a quote from our own Charles Franklin summing up our desire to report all polls that at least claim to provide a representative sampling of "likely voters:"
“Lots of pollsters have shown volatility, not just ARG.” Prof. Franklin added, “The inclusion or exclusion of a pollster runs the perils of cherry-picking polls, something we’ve tried not to do.”
But because of another difference between Real Clear Politics and Pollster, American Research’s numbers aren’t having much of an impact; the two poll aggregators basically agree that Sen. Clinton is ahead by six or seven points in Ohio, and by two points in Texas. That’s because Pollster’s method “has always discounted the effects of outliers — the more dramatically out of line a poll is, the less weight it gets,” Prof. Franklin said.
Those looking for more details on our method might want to review this post Charles did back in August.
Clinton 52, Obama 37
ABC News/Washington Post
(ABC story, results; Post story, results)
Obama 53, McCain 42
Clinton 50, McCain 47
Clinton 48, McCain 46
Obama 46, McCain 46
UPDATE: SurveyUSA will also release 50 statewide surveys testing general election match-ups for McCain vs. Obama and McCain vs. Clinton later today.
Ok, another Tuesday night and another election night. Polls will close at 7:00 p.m. (Eastern) in Vermont, at 7:30 p.m. in Ohio and at 9:00 p.m. in Rhode Island and Texas. Exit poll tabulations will be posted by the networks at these links:
Carrying on with our "live blogging" tradition, I'll post what seems relevant here on what we can learn from the non-leaked exit poll information tonight. Updates will follow in reverse chronological order -- all times Eastern.
11:32 - Using a far less sophisticated extrapolation, the current Ohio estimate looks to be 55% Clinton, 43% Obama. Quite a shift from the initial four point estimate.
11:25 - Mark Lindeman is signing off to get some sleep. I'll post what seems relevant, but for now, our estimate extrapolator is offline. Thanks for your help Mark!
11:11 - The tabulations for Vermont and Rhode Island have updated. Rhode Island's estimate now shows a 59% to 40% Clinton lead (was 51% to 48% before the update -- quite a shift). Vermont shows a 60% to 38% Obama lead.
10:05 - I'm going to be offline for about 30 minutes while I relocate. Until then, look for Lindeman's updates in the comments.
9:40 - Promoted from the comments, from my very able helper Mark Lindeman:
In case anyone is wondering, that wasn't a typo about RI. The exit poll tab, consistent with the early leak, shows a very close race -- but the networks have seen enough votes to make a call. It's not news that the exit polls have often (but not consistently) overstated Obama's performance.
9:37 - From the Texas Democratic crosstabs, a preliminary look at the other set of numbers I've been obsessing over for the last few days (numbers from 2004 and 2000, in that order, in parentheses):
- 57% Female (52%)
- 30% Latino (24%)
- 19% African American (21%)
- 16% Age 18-29 (9%)
- 44% Age 18-44 (34%)
- 55% Age: 18-50 (47%)
- 13% Age: 65+ (19%)
- 33% Independent/Republican (26%)
- 43% College degree (37%)
- 61% Income: $50K+ (49%)
One extra caution on Texas: From what I understand, the estimates are especially subject to change because the weighting of early vote vs. election day vote and the geographic mix (which helps determine the racial composition) really require a hard count to get right. And if that last sentence goes over your head, the point is, the numbers above are not set in stone.
9:21 - [Sorry this got garbled a few moments ago -- finishing the thought now]: Caro asks "Are the early votes (pre-election) included in the exits?" In Texas the exit pollsters would have conducted a telephone survey of early voters that they fold into the interviews conducted at polling places. They have done telephone interviews with early voters in Texas before. With the number of early votes cast in Ohio, I assume that an absentee vote telephone survey was conducted there as well, but I'm not certain.
9:10 - More estimates crunched by Mark Lindeman: In Rhode Island, Clinton 51%, Obama 48%. Among Texas Republicans, McCain 49%, Huckabee 38%.
9:02 - The current cross-tab estimate for Texas shows 50% Clinton, 49% Obama. See the usual caveats below.
7:45 - From the Ohio Democratic crosstabs, a preliminary look at the numbers I've been obsessing over for the last few days (numbers from 2004 and 2000, in that order, in parentheses):
- 59% Female (52%, 60%)
- 19% African American (14%, 17%)
- 15% Age 18-29 (9%, 8%)
- 44% Age 18-44 (32%, 36%)
- 54% Age: 18-50
- 31% Independent/Republican (29%, 24%)
- 37% College degree (37%, 27%)
- 56% Income: $50K+ (49%, 45%)
Keep in mind that these numbers, like all in the current tabulations, are preliminary and will likely change over the course of the night.
7:35 - As polls close in Ohio, the posted tabulations show 52% Clinton, 48% Obama. Again (can't say this enough) these are preliminary estimates that will grow more accurate as the night wears on. See the 6:59 entry for caveats.
7:22 - Pollster reader Thatcher has reposted some leaked mid-day exit poll results found on Huffington Post. For those who want to speculate about the meaning of those numbers -- and that's everyone, right? -- I'd recommend my exit poll tips last posted on February 5 and especially the "comparable" leaked numbers posted at about this time that day. Bottom line: a percentage point or two in either direction on a mid-day exit poll doesn't mean much.
7:10 - A few minutes ago, MSNBC's Norah O'Donnell reported two key statistics from the very preliminary exit poll tabulations from Texas and Ohio. In Ohio, 22% of Democratic primary voters described themselves as independent and 10% as Republican, which compares to 24% and 2% respectively in 2004. In Texas, 24% in the early tabulations are independent and 10% Republican (which compares to 20% and 5% respectively in 2004). For what it's worth, the combined non-Democrat percentage in Ohio is 8 to 15 percentage points higher than in any of the pre-election polls that released party ID results. In Texas, that number is on the high side of what public polls were showing.
7:00 - MSNBC projects Barack Obama the winner, but Keith Olbermann tells us that "we do not have a number" yet for Vermont. That's true -- they do not have a number they consider "air-worthy." However, they have posted preliminary cross-tabulations on the MSNBC web site, and those currently indicate an estimate of 64% for Obama, 34% for Clinton. See the caveat below -- these estimates are preliminary and will become more accurate as the evening progresses.
6:59 - Shortly after the polls close in each state, our friend Mark Lindeman will report the extrapolated overall vote estimate used to weight the exit poll cross-tabulations. These estimates begin as a mashup of pre-election polls and the exit poll interviews conducted by the networks at polling places and over the phone (with early voters). They improve, becoming more accurate over the course of the night. Click here for the usual caveats on how these numbers are derived and how they improve over the course of the evening.
Cook Political Report/RT Strategies
National 2/28 - 3/2
Obama 47, McCain 38
Full cross tabs here.
Let's start with the bottom line: The final value of our trend estimate for Texas (at least as of this writing) shows Hillary Clinton running slightly ahead of Barack Obama (47.6% to 45.9%), but I would advise readers against treating that as a solid prediction of the outcome. It may turn out that way, of course, but variation among individual polls and -- more importantly -- uncertainty at this hour about the racial composition of the Texas electorate mean that the ultimate result is unknowable.
First, let's take a look at the latest version of my table comparing the demographic composition of most of the polls out over the last few weeks, updated with the surveys released since my post on Friday (and let's say a thank you to all the pollsters who have released this data -- what a change since Super Tuesday):
If you read through the lines a bit, you can see the different approaches that various pollsters take to "modeling" the likely electorate. Some arrive at a set of arbitrary weighting quotas for gender, age and race and apply these consistently on each survey. Notice the way the percentages for both Zogby and InsiderAdvantage are identical on all of their surveys, except one. The exception is the Latino percentage on the most recent InsiderAdvantage, which plunges from 37% to 27% (while all other demographics remain spot on identical). Perhaps someone there had a change of heart about their model?
Some -- such as ABC/Washington Post and SurveyUSA -- take a very different approach. They begin by interviewing a sample of all adults in Texas, weight the demographics of the adult sample to Census estimates for Texas, choose "likely voters" based on their answers to screen questions and allow the demographics of the likely voters to "fall where they may." See this post for more details on SurveyUSA's approach.
An important underlying point here is that some pollsters have more confidence than others in the ability of their measurements to "predict" the likely electorate and its demographics. My own sense (and be advised that other pollsters may not agree) is that pre-election polls are much better at measuring the opinions and preferences of respondents than at precisely predicting who will vote and who will not. I will spare the detail this morning, but the bottom line is that screen questions are at best a crude measure of who will turn out. We can select "likely voters" with a greater probability of voting than those we screen out, but that's as good as it gets.
The best approach in situations like these, when voters demonstrate huge differences by racial and demographic subgroups, is to watch those differences and understand the potential range of outcomes. So let's do just that.
The good news, again, is that most of the pollsters have released both racial composition data and cross-tabulations of the vote by race, which allows for the following table. A few observations: Most surveys have been reasonably consistent (within the vagaries of sampling error) in their results for Latinos and African Americans. Most have shown Clinton with a roughly two-to-one lead among Latinos and most have shown Obama winning 75% to 85% of African Americans. Those results are generally consistent with exit poll results from other states (although Obama has typically done a few points better on the exit polls than in final pre-election polls).
However, pollsters have been less consistent in their measurements of white voters. For example, in polls released in the last 24 hours, SurveyUSA and Rasmussen show Clinton leading by just four points among white voters, while Mason Dixon, PPP and InsiderAdvantage show Clinton with margins of 15 or more points among white voters.
I have included three sets of averages at the bottom of the table: One for all of the polls listed, one for the final poll by each organization and one for final polls released over the last three days. Keep in mind that cross-tabs by race were not available for all surveys.
Let's use these results to put together a "what if" analysis of the turnout (similar to that found in the Belo/Public Strategies analysis). I have created a spreadsheet in Excel (download here) or Google Documents (edit here) that you can use to try and test your own assumptions (details on how to edit the Google docs version at the end of this post).
I set up the spreadsheet using the following assumptions:
(1) A racial composition of 51% Anglo, 29% Latino and 20% African American -- the average of the assumptions and findings of the pollsters in the first table; (2) A 56% to 44% Clinton margin among white voters -- which takes the average above and proportionately allocates undecideds; (3) A 67% to 33% Clinton margin among Latinos and (4) an 85% to 15% Obama margin among African Americans (which assumes that Obama overperforms in this constituency by about as much as he has elsewhere this year). I am not making predictions here, just grabbing for reasonably defensible assumptions based on the available data as a starting point -- your "mileage" may vary.
As it turns out, these assumptions produce a roughly two-point Clinton lead, the same as our current trend estimate. That's not a great surprise, since they are based on mash-ups of the demographics and results of most of the polls used to generate the overall estimate. But now, play "what if" and see how making very small changes in any of the assumptions can easily alter the outcome.
For example, apply racial composition findings from the ABC/Washington Post survey (39% Latino, 17% African-American), leave all other assumptions as is, and you get a 6 point Clinton victory. On the other hand, apply the racial composition used on the Belo/Public Strategies polls (25% Latino, 22% African American), and you get a half-point Obama win. Leave the racial composition as-is, but assume that SurveyUSA or Rasmussen has the white vote right (Clinton leading by just four points) and Obama wins by two. But if InsiderAdvantage, PPP and Mason Dixon have Clinton's margin among white voters right (roughly 15 points), and all other assumptions remain constant, Clinton wins by six. I could go on.
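For those who would rather script the "what if" than open a spreadsheet, the same arithmetic can be sketched in Python. This is a minimal illustration rather than a copy of the spreadsheet: the function and variable names are hypothetical, and it assumes a two-way race in which each group's vote splits entirely between Clinton and Obama.

```python
def clinton_margin(composition, clinton_share):
    """Clinton's lead over Obama in percentage points, given each group's
    share of the electorate and Clinton's share of the two-way vote
    within each group. A negative value means an Obama lead."""
    total = sum(composition.values())
    clinton = sum(composition[g] * clinton_share[g] for g in composition) / total
    return (2 * clinton - 1) * 100

# Starting assumptions described above.
baseline_mix = {"anglo": 0.51, "latino": 0.29, "african_american": 0.20}
clinton_share = {"anglo": 0.56, "latino": 0.67, "african_american": 0.15}

# Alternative mix: 25% Latino, 22% African American, remainder Anglo.
belo_mix = {"anglo": 0.53, "latino": 0.25, "african_american": 0.22}

baseline = clinton_margin(baseline_mix, clinton_share)
belo = clinton_margin(belo_mix, clinton_share)
print(f"baseline: Clinton {baseline:+.1f}")
print(f"Belo composition: Clinton {belo:+.1f}")
```

Swap in your own composition or group-level splits to see how quickly the lead changes hands.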
Again, your assumptions may be different on any of the above, so open up the spreadsheet and have at it. But now hopefully, you have a sense for why races like these give pollsters heartburn.
To edit the Google Docs spreadsheet: You will need a free Google account. Click here to display the published version, then click the "edit this page" link at the bottom right of the screen and you will see a "view only version of the spreadsheet." If you use the "File" pull-down at upper right to copy and rename the spreadsheet, you can edit, change and even publish as you see fit.
Hat-tip to David Ianelli and Elise Hu for providing the final Belo/Public Strategies numbers by race. Typos in the tables have been corrected.
Update: In the comments, Steve P asks some good questions:
Extrordinarily high turn out overall but I would think that [African-American] turnout is probably outpacing their regular share. But that Hispanic share is probably down.
I understand that this is difficult to guage since we have never had a primary go this deep into the schedule before on a competitive basis. Wouldn't you have to toss election models out because we have never seen anything like this before?
How well has final polls lined up with exit poll data demographically? Are we seeing the same demographic portions despite the much larger pie?
Should we throw out old demographic models? Certainly. But which model makes the most sense? To answer Steve's question briefly: While we have seen consistently greater proportions of younger and higher income voters participating in Democratic primaries as measured by exit polls, the change in racial composition has been inconsistent. We will know the answer in a few hours, of course, but a critical question is whether Texas will be more like California this year (where the Latino share surged from 16% to 30%, while the African American share dropped a point from 8% to 7%) or more like Arizona (where the Latino share increased one point, from 17% to 18%, but the African American contribution quadrupled, from 2% to 8%).
ABC News/Washington Post
(ABC story, results; Post story, blog, results)
Obama 50, Clinton 43
67% of Democrats and those who lean Democratic want Clinton to stay in the race if she wins one of Texas and Ohio but loses the other; 29% want her to drop out.
Before going back into the deep weeds of Ohio and Texas, I want to pass along a link to the long article by my National Journal colleague Ron Brownstein. He did an extensive dive into the primary exit polls to paint a picture of the "new Democratic coalition" being forged by the race for the Democratic nomination:
From New Hampshire to California, and from Arizona to Wisconsin, exit polls from this year's contests show the Democratic coalition evolving in clear and consistent ways since the 2004 primaries that nominated John Kerry. The party is growing younger, more affluent, more liberal, and more heavily tilted toward women, Latinos, and African-Americans.
In the 18 states for which exit polls are available from both 2004 and 2008, the share of the Democratic vote cast by young people has risen, often by substantial margins. Voters earning at least $100,000 annually have also increased their representation in every state for which comparisons are available -- again, usually by big margins. Women's share of the vote has grown in 17 of the 18 states (although generally by smaller increments). In 12 of the states, Latinos have cast a larger percentage of votes, as have the voters who consider themselves liberals. African-Americans have boosted their share in 11 of the 18 states.
This story is well worth the click.
Brownstein's thesis gets at one challenge of primary polling this season that my discussions of the demographics of Texas and Ohio only hinted at. The challenge is about more than getting race, ethnicity, gender and age right. Polls have also had to cope with a surge in better educated, higher income voters in Democratic primaries. Click on the link in Brownstein's story labeled "A Shifting Landscape" (in the right column, just below the cover image) and notice the large and consistent increase in $100K+ voters.
One thing that surprised me a bit is how many Texas and Ohio pollsters fail to include any measure of the education or income level in their surveys. How can you diagnose how well your sample represents the primary electorate if you have no measure of socio-economic status?
Ohio 3/1 - 3/2
Clinton 52, Obama 40
Also, full cross tabs are available.
Ohio 2/27 - 3/2
Clinton 49, Obama 45
Last Thursday, the New York chapter of the American Association for Public Opinion Research (AAPOR) held a post-mortem on "What Happened" to the polls in New Hampshire. The meeting included presentations by pollsters from Gallup, the Marist Institute and CBS News on the polls conducted by their own organizations. Gary Langer, the ABC News polling director, blogged a complete report on the discussion that is well worth reading in full. Some highlights follow.
Gallup's Frank Newport attributed "half of the misstatement" of their poll to their likely voter model:
Gallup, whose final poll had Obama ahead by 13 points, had a closer 5-point Obama lead among people who described themselves as registered voters. That means its likely voter modeling, used to produce a more accurate estimate of who’ll actually vote, instead introduced error.
Gallup’s editor-in-chief, Frank Newport, said the modeling included factors such as enthusiasm and attention to the race, both of which may have increased for Obama and slacked off for Hillary Clinton after Obama’s Jan. 3 victory in Iowa. Unlikely voters – those excluded from the model – were much better for Clinton. “Obviously that was a cause for the incorrect likely voter numbers that Gallup put out,” he said.
The conference also featured the first discussion of post-election follow-up surveys conducted by both Gallup and the Marist Institute:
Newport and Miringoff based their conclusions partly on post-election polls in which they called back respondents to their pre-election polls in an effort to see where those polls went wrong. Analysis of those data is not complete, though Newport said Gallup hopes to post some conclusions on its website next week.
Both said their callback polls reached about two-thirds of the original poll respondents; they hadn’t yet weighted these samples to adjust for the noncoverage, a step that could improve their analysis.
See the full article for some additional details on the call back surveys and a discussion of their potential pitfalls.
The Columbus Dispatch released a mail-in survey of registered Democrats and Republicans in Ohio this morning. We have chosen not to include that survey in our chart for the Ohio primary because the Dispatch made the odd choice of sampling only registered Democrats and Republicans in a semi-open primary that allows non-partisan registrants to participate. We did briefly and inadvertently include the poll in our chart earlier this afternoon, but have removed it.
The Columbus Dispatch has long conducted pre-election polls by mail, but our issue with this particular survey is unrelated to its mode. The Dispatch sends out poll "ballots" to voters randomly selected from Ohio's list of registered voters. This method has been surprisingly accurate in general elections since 1980, something I wrote about approvingly in October 2004. On the other hand, the Dispatch poll produced a disastrous result on a set of ballot initiatives in 2005, owing partly to some deviations from their usual methodology, such as not replicating the exact ballot language, including an undecided option and fielding the survey a week earlier than usual.
However, in this case, the key issue is that the Dispatch sampled only registered partisans, that is, voters with some previous history of voting in primaries. Why does that matter?
Ohio has a "semi-open" primary. The state has no formal "party registration," in that voters do not choose a party when they register to vote. However, those who vote in primaries have their party affiliation recorded in the voter lists. Those who have previously voted in a primary and want to switch their party affiliation can do so by filling out a form on primary day (or when they request an absentee ballot). But those who have never voted in a primary before [and are registered to vote] -- those considered "non-partisan" by the registrar of voters -- can opt to participate in any primary simply by showing up on Election Day (for more details, see the blog post by Pollster reader Tom Fox).
As of 2006, Ohio had 7.6 million registered voters, but only 2.4 million voted in the primary election (of either party) in 2004. Slightly more, 2.5 million, voted in the Ohio primary in 2000, and the turnouts in off-year primaries are lower.
As such, the majority of Ohio's registered voters do not participate in primaries and are therefore listed as "non-partisan," yet they remain fully eligible to participate in Tuesday's primary. The current voter file maintained by Voter Contact Services (a political list vendor) includes 7.9 million registered voters, of whom 20% are "registered Democrats," 19% are "registered Republicans" and 60% are non-affiliated.
Keep in mind that "party registration" in Ohio is very different from the self-reported "party identification" that most surveys measure. While 60% are unaffiliated on the voter lists, a recent SurveyUSA poll of registered voters finds only 23% identifying as "independent" (while 44% identify as Democrats and 29% as Republicans).
Typically, most primary voters in Ohio have voted in primaries before. So the choice by the Dispatch to sample only those with previous primary history may have been appropriate for typical off-year primaries. The 2008 primary will be anything but typical, however, and their decision to exclude non-partisan voters from their sample is questionable.
Ohio's Secretary of State Jennifer Brunner is predicting that 52% of Ohio's registered voters will participate this week, a level that the Associated Press appropriately described as "incredibly high." The AP also reported that Brunner cited as evidence the early requests for absentee ballots and the experience of other states this year.
As the Dispatch observed, if Brunner is right and this week's turnout hits 4 million, it would mean that "well more than a quarter of Ohio's 5 million-plus nonpartisan voters will vote" in the primary. That means that roughly 30% of the voters in the two primaries would be unaffiliated. Presumably, given the interest in the Obama-Clinton race, the percentage of non-affiliated voters in the Democratic primary would be higher.
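The arithmetic behind that rough 30% figure can be sketched as follows. This is only an illustration, assuming the round numbers quoted above (4 million total voters, 5 million nonpartisan registrants, one quarter of them voting); because the Dispatch says "well more than a quarter" and "5 million-plus," the result is best read as a floor. The function name is hypothetical.

```python
def unaffiliated_share(total_turnout, nonpartisan_registered, nonpartisan_turnout_rate):
    """Share of all primary voters who are unaffiliated on the voter list."""
    nonpartisan_voters = nonpartisan_registered * nonpartisan_turnout_rate
    return nonpartisan_voters / total_turnout


# Round figures quoted from Brunner and the Dispatch; treated as assumptions.
share = unaffiliated_share(
    total_turnout=4_000_000,
    nonpartisan_registered=5_000_000,
    nonpartisan_turnout_rate=0.25,
)
print(f"Unaffiliated share of primary voters: {share:.0%}")
```

Raising either the nonpartisan registrant count or the turnout rate pushes the share above this floor.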
Of course, the percentage of unaffiliated voters that turn out on Tuesday is a matter of speculation. However, since the Dispatch sample design entirely excludes this potentially critical category of voters from its sample, we are not including it in our trend chart.
On Friday I posted demographic profile data on recent polls in Texas. Today I have the equivalent data from Ohio, and a few thoughts about what it all means.
Last week, I emailed questions about poll demographics to all pollsters that had fielded recent surveys in Ohio. Fortunately, those requests were less necessary than in previous contests, as more and more pollsters have been including demographic composition data in their releases. A thank you is in order, however, to the pollsters at the Washington Post, the University of Cincinnati, Quinnipiac University and Public Policy Polling that shared data not already in the public domain.
The racial composition of the Ohio Democratic electorate is less of a puzzle than in Texas, if only because Ohio's Latino population is relatively small (amounting to 3% or less on the various polls that reported it). Still, the surveys show meaningful variation in their African American composition, from a low of 12% on the University of Cincinnati "Ohio Poll" to a high of 22% on today's new poll from the Cleveland Plain Dealer and Mason-Dixon Polling and Research. With Barack Obama winning the overwhelming majority of black voters, differences of a few percentage points can have a significant effect on vote preference. If African Americans had been 16% of the respondents instead of 22% in the PD/Mason-Dixon poll, Clinton would lead by roughly 10 rather than 4 percentage points.
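To see how a composition change of that size can move the margin, here is a rough re-weighting sketch. The 4-point Clinton lead and the 22% and 16% shares come from the paragraph above; the margin among African American respondents (Clinton trailing by 75 points) is purely an assumption chosen for illustration, not a figure from the PD/Mason-Dixon release, and the function name is hypothetical.

```python
def reweighted_margin(overall_margin, share_now, share_new, group_margin):
    """Recompute a candidate's overall margin after changing one group's share.

    The margin among everyone outside the group is backed out from the
    published overall margin, then both groups are recombined at the new share.
    """
    rest_margin = (overall_margin - share_now * group_margin) / (1 - share_now)
    return share_new * group_margin + (1 - share_new) * rest_margin


# ASSUMPTION: Clinton's margin among African American respondents is -75 points.
# This value is illustrative only; the actual crosstab is not given in the post.
new_margin = reweighted_margin(
    overall_margin=4.0,   # Clinton +4 as published
    share_now=0.22,       # African American share in the poll
    share_new=0.16,       # alternative share considered in the text
    group_margin=-75.0,
)
print(f"Clinton margin at 16% African American: {new_margin:+.1f}")
```

Under that assumption the recomputed lead comes out at about 10 points. A different assumed margin among African American respondents would shift the result, which is why it should be read as a sketch rather than a reconstruction of the actual calculation.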
The variation in age has been bigger (although again, age comparisons are more difficult because pollsters are inconsistent about the age breaks they use). The percentage of 18-to-45-year-olds varies from a low of 26% by the University of Cincinnati and 28% by Quinnipiac to a high of 46% by SurveyUSA. The percentage of 18-to-50-year-olds varies from a low of 44% in today's PD/Mason-Dixon poll to highs of 57% and 60% in the most recent surveys from ARG.
As with Texas, I have included comparable numbers from the 2004 exit poll (based on final data from the Roper Center Archives), although the "right" answer for this year will be unknowable until all the votes are cast and this year's exit poll is available.
Also, as in Texas, I asked pollsters to estimate the percentage of Ohio adults represented by their samples (unless they included the necessary data in their releases). This statistic is a rough measure of how tightly they screened for likely voters. Ohio's Democratic primary drew 12% of eligible adults in 2000 and 15% in 2004 (as per Michael McDonald's turnout page). How high will it go on Tuesday? The Ohio Secretary of State is predicting a total turnout (for Democrats and Republicans) of roughly four million voters, representing a 60-70% increase compared to the last two presidential elections.
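The 60-70% range can be checked against the Ohio primary turnout totals cited earlier for 2000 and 2004 (about 2.5 million and 2.4 million voters, both parties combined). This is a minimal sketch, assuming those totals are the baseline the projection is being compared against.

```python
def percent_increase(new, old):
    """Percentage increase from old to new."""
    return (new - old) / old * 100


projected = 4.0  # millions of voters, the Secretary of State's projection
past_turnout = {2000: 2.5, 2004: 2.4}  # millions, both parties' primaries combined

for year, turnout in past_turnout.items():
    change = percent_increase(projected, turnout)
    print(f"vs. {year}: {change:.0f}% increase")
```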
With that in mind, consider the percentage of adults that some of these surveys represent. The number of polls included in the table below is smaller, because fewer pollsters were willing or able to provide an estimate. Obviously, the percentages of adults sampled are much higher than previous turnouts, higher than even the optimistic projection from the Ohio Secretary of State. They vary from a low of 27% in the PPP survey to a high of 40% in the most recent poll by SurveyUSA.
What all of this means is that polls are in disagreement about who will vote in Tuesday's primary, and that uncertain composition will likely determine the winner. The polls we have before us can tell us a great deal about how preferences differ across the key demographic and regional groups, but the tools of survey research are simply not powerful enough to predict who will vote with great precision. I'll have more thoughts on this issue after we see the final round of surveys tomorrow.
Obama 46, Clinton 45