The most recent polling in New Jersey shows an excruciatingly close race between incumbent Democrat Jon Corzine and Republican challenger Chris Christie. As of this writing, our standard trend estimate (below) puts Corzine "ahead" by a negligible 0.8% (41.4% to 40.6%). The more sensitive setting on our smoothing tool makes the Corzine margin slightly narrower (0.6%), the less sensitive setting makes it slightly larger (0.9%). Any way you look at it though, the differences between the estimates -- and more importantly, between Corzine and Christie -- are virtually meaningless. Right now, the current polling snapshot of this race is as close as these things get.
For perspective on the closeness of the margin you might want to stroll down memory lane and revisit my final Election Day update from Tuesday, November 4, 2008. We showed only four states where the Obama-McCain margin on our trend-estimates was less than 2 percentage points, and the leader ultimately won the state in 2 of 4 states. So a margin of under two percentage points puts us well within true toss-up territory in terms of predictive accuracy, especially with a weekend of polling still to go.
Understandably, the close nature of the race has political junkies turning these numbers upside down and reading every possible tea leaf in search of the key to the outcome. After doing much of the same (while out with the flu) the last few days, the best answer I can give based on the empirical evidence -- for the moment at least -- is that this race is currently looking very close.
Are things trending toward Corzine? Yes, when compared to early September, our chart indicates a decline of roughly four percentage points for Christie and an increase of roughly three points for Corzine. Over the course of the summer, Christie had been dropping (from a high of roughly 49% in early July), while Corzine remained flat.
What is less clear is whether the closing trend has continued over the last two weeks. As of this writing, only three pollsters have tracked more than once since mid-October, allowing apples-to-apples trend comparisons. Two -- SurveyUSA and Democracy Corps -- show Corzine's margin two percentage points better. One, Rasmussen, shows it one point worse. None of these differences is statistically significant on its own, and the patterns are obviously small and inconsistent.
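To see why a one- or two-point shift in the margin is hard to distinguish from noise, here is a minimal sketch of the arithmetic. The candidate shares and the sample size of 600 per poll are hypothetical (none of the pollsters' actual sample sizes are given here), and the calculation treats each poll as a simple random sample with no weighting or design effect.

```python
import math

def margin_se(a, b, n):
    """Standard error of the lead (a - b) in a single poll of n respondents,
    treating the poll as a simple random sample."""
    return math.sqrt((a + b - (a - b) ** 2) / n)

def change_moe(a1, b1, n1, a2, b2, n2, z=1.96):
    """Approximate 95% margin of error on the change in the lead
    between two independent polls."""
    se = math.sqrt(margin_se(a1, b1, n1) ** 2 + margin_se(a2, b2, n2) ** 2)
    return z * se

# Hypothetical inputs, not real poll results: (leader share, trailer share, n)
first = (0.39, 0.41, 600)   # candidate trails by two points
second = (0.41, 0.41, 600)  # candidate is tied two weeks later

moe = change_moe(*first, *second)
observed_shift = (second[0] - second[1]) - (first[0] - first[1])
print(f"shift in margin: {observed_shift:+.1%}, margin of error: +/-{moe:.1%}")
```

With inputs like these, the margin of error on the change in the lead comes to about ten points, far larger than a two-point shift.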
That said, the trend over the next four days may not be as smooth, and the Daggett "wild card" that everyone has focused on for the last few months is the reason. Consider at least three ways that the Daggett effect leaves us even more uncertain about the outcome:
Individual level uncertainty -- The Monmouth University Polling Institute reported yesterday on a focus group they convened earlier this week in Edison, NJ among voters who are still either undecided or just leaning to a candidate. While they explicitly warn against treating the findings as representative of all undecided voters, the clearest finding was a sense of unhappiness with both major candidates: "These voters claim that this is the most difficult election choice they have ever faced. Nearly all said that Jon Corzine has not done a good enough job to deserve reelection. They simply have not heard enough from Chris Christie to cast their lot with him." Their final decision about Daggett, the report says, may come down to whether he has a chance of winning.
Aggregate level uncertainty -- One statistic worth pondering: On the last ten polls, all conducted in the last week, the portion of the electorate that is either undecided or supporting a candidate other than Corzine or Christie averages 16.5% (with a range of 11% to 23%). As a crude measure of voter uncertainty, that's considerably more than the 5% or so we saw at this stage of last year's presidential election.
Measurement artifacts? -- Complicating this issue even further are the measurement challenges that pollsters face when testing lesser known independent candidates, especially when voters are unhappy with the top two choices. Offer just three choices and no explicit undecided category and some undecided voters will choose the independent as their way of expressing uncertainty. On the other hand, fail to prompt for the independent and you may measure a number that's much lower (see, for example, the intriguing experiment embedded in the Fairleigh Dickinson poll). Reality likely falls somewhere in between. And no one can be certain of the effect that the other 9 candidates will have.
And finally, there is the intriguing pattern noted earlier this week by PPP's Tom Jensen and explored last night by Nate Silver. Christie has done consistently better on telephone polls conducted using an automated, recorded voice than on those using live interviewers. Using the filter tool on our chart, as of this writing, Christie runs roughly three points ahead of Corzine on the automated polls, but Corzine runs a little less than three points ahead on live interviewer polls. The chart below, which Charles Franklin kindly prepared this afternoon, shows that the difference has been consistent throughout the race (his margins are likely different than on our interactive chart due to his use of slightly different smoothing levels).
We also see a similar though far less pronounced and consistent effect in Virginia, and then only since Labor Day.
What this effect is about, and what it portends for the outcome in New Jersey, I cannot say. Nate Silver has some plausible speculation about automated surveys being potentially more sensitive to an enthusiasm gap between Republicans and Democrats, although if that is true, I have no explanation for why we saw no such consistent difference between automated and live interviewer surveys in the Obama-McCain polling last year. We should have new surveys over the weekend or on Monday from all three automated pollsters in New Jersey (SurveyUSA, PPP and Rasmussen) and from at least three of the live-interviewer polls. So this phenomenon will be interesting to watch.
Either way, the combination of a very close snapshot and many indicators of potential volatility makes for a very uncertain outcome.
My National Journal column for the week, now posted, defends automated, recorded voice polling from what is becoming a common line of attack: without a live interviewer anyone, regardless of age, might participate in the survey. Please click through for the details.
Since I typically file my NationalJournal.com columns on Friday afternoon to appear on Monday morning, I get a chance to mull them over all weekend before posting these quick updates on Pollster. This weekend, I realized that one conclusion could have used more emphasis: My bottom line on automated polls is that they have established a strong record in measuring campaign horse-race results in pre-election polls. Over at least the last four election cycles, they have been as accurate as live election polls at the end of the campaign, and their horse race results generally track well with live interviewer surveys. So I think that it is wrong to condemn automated polls simply because they use a recorded voice rather than live interviewers.
That said, we need to keep in mind that the mode of interview is just one aspect of a methodology. If you look at the best known automated surveys, you will see a lot of variation in how they draw their samples, how persistent they are in attempting to call-back households where no one answers on the first call, how many interviews they conduct, how they identify likely voters, how they weight the data and, finally, in the questions they ask. All of those factors might make any given automated poll more or less reliable or accurate than any given live interviewer poll.
Also, while automated surveys have proven themselves in one particular application -- measuring campaign horse race numbers late in the campaign -- we need to be careful about overlooking potential shortcomings for other kinds of research. I would certainly not recommend an automated interview for any general population study that wants to ask more than four or five substantive questions or that involves open-ended questions that allow respondents to answer in their own words.
On a slightly different subject, the column also highlights one statistic that Charles Franklin computed:
[T]he national job approval data does not support the assertion that automated polls are more "erratic." My Pollster.com partner Charles Franklin checked and found that despite identically sized three-day samples, the Rasmussen daily tracking poll is less variable than Gallup (showing standard deviations of 1.8 and 2.4, respectively), probably because Rasmussen weights its results by party identification.
Charles also sent along a chart, which is based on deviations from the trend line for Obama's job approval rating since taking office in January.
The tails of the Gallup curve are slightly wider than those of the Rasmussen curve. The point, again, is not that Rasmussen is better or worse than Gallup, only that presidential approval is slightly less variable as measured by Rasmussen, probably because they weight by party.
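For readers curious about what "deviations from the trend line" means in practice, here is a minimal sketch of that kind of calculation. It is only an illustration: the readings are made up (not Gallup or Rasmussen data), and a simple centered moving average stands in for the trend line, which is an assumption rather than the estimator Franklin used.

```python
from statistics import pstdev

def moving_average(values, window=5):
    """Centered moving average; near the ends it averages whatever
    neighbors are available."""
    half = window // 2
    trend = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):min(len(values), i + half + 1)]
        trend.append(sum(chunk) / len(chunk))
    return trend

def deviation_sd(values, window=5):
    """Standard deviation of each reading's deviation from the trend."""
    trend = moving_average(values, window)
    residuals = [v - t for v, t in zip(values, trend)]
    return pstdev(residuals)

# Hypothetical daily approval readings, for illustration only
steady = [60, 61, 60, 61, 60, 61, 60]
jumpy = [60, 64, 58, 63, 57, 64, 59]

print(f"steady series: {deviation_sd(steady):.2f}")
print(f"jumpy series:  {deviation_sd(jumpy):.2f}")
```

A series that bounces around more from day to day produces a larger standard deviation around its own trend, which is the sense in which one tracking poll can be called more variable than another.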
You can certainly make a case that rolling average daily tracking, whether automated or traditional, includes a lot of random variation, and that those seeking a narrative can find whatever story they want in the meaningless daily bumps. On that score, I generally agree with the advice offered by the First Read piece I quoted in the column: Beware -- lots of daily approval polls with widely differing methods "lets some folks cherry-pick what they want."
Finally, one subject that deserves more attention than the two brief paragraphs I gave it is what we lose when a live interviewer does not gather the data. A few weeks ago, a survey researcher named Colleen Porter shared a defense of quality interviewing in the form of an anecdote on the member-only listserv of the American Association for Public Opinion Research (AAPOR). She gave me permission to share the story here, in which she describes monitoring an interview being conducted on behalf of her client:
The interviewer is amazing. Her surname is Hispanic--is she this good in Spanish, too? Of course they put their best interviewers on the first night; I would, too, when I was at a survey lab.
When she asks about the location of an event, the respondent commences a story about the many times it has happened. The interviewer repeats the question exactly as worded, with emphasis on "LAST TIME," but a tone of complete patience as if reading a new question. The respondent focuses, and answers promptly.
That is exactly how it is supposed to work. Score! As the respected client, I am off in a room alone, and there is no one to give a high five. I punch the air. I love to hear good interviewing.
Update: Brendan Nyhan emails to pass along a 2006 journal article by respected political scientist Gary Jacobson (requires a subscription to view the published article, but you can access an earlier conference version of the paper here, in pdf format). Jacobson's paper is based on the 50-state job approval surveys that automated pollster SurveyUSA conducted during 2005 and early 2006. In the article's appendix, he describes how he "examined the data carefully for internal and external consistency as well as intuitive plausibility" and found that "they passed all of the tests very satisfactorily." His conclusion:
In sum, I found no reason to believe that the quality and accuracy of the aggregate data produced by SurveyUSA's automated telephone methodology is in any way inferior to that produced by other telephone surveys, and I thus have no qualms about using the data for scientific research on aggregate state-level political behavior.
My National Journal column for the week, on the surprisingly dire view of the future of polling from SurveyUSA's Jay Leve, is now online.
At a panel at last week's Joint Statistical Meetings in Washington DC, Leve delivered a presentation with this surprising conclusion: "If you look at where we are here in 2009," for phone polling, he said, "it's over... this is the end. Something else has got to come along." Intrigued? Hope so. Click through for the details.
*Correction: The original headline and subheading on both the National Journal column and this entry incorrectly stated that Leve forecasts "doom" for all of polling and the polling profession. Leve sees doom for a particular kind of polling, what he calls "barge-in telephone polling" -- in essence, this means telephone surveys as we now know them, both live operator and automated. However, as I hope the last paragraph of the column makes clear, he is optimistic about the future of polling: "And for those who might ask, he adds that he 'doesn't look to the future with despair but with wonder' at the opportunities for the polling profession."
About a month ago, I wrote a post about the fairly obvious and consistent differences among pollsters on the Barack Obama job approval question -- what we usually refer to as "house effects." At issue is that two of the national pollsters that have produced consistently lower scores for Obama use an automated, recorded voice to ask questions rather than live interviewers. My argument was that we should not overlook the other factors that might also explain the house effects in evidence on our job approval chart.
One admittedly far-fetched hypothesis I floated to explain the consistently lower approval scores produced by Public Policy Polling (PPP), one of the automated pollsters, is that they ask a slightly different question: Most of the others ask respondents if they "approve or disapprove of the way Barack Obama is handling his job as president." PPP asks if they "approve or disapprove of Barack Obama's job performance" (emphasis mine). I wondered if "some respondents might hear 'job performance' as a question about Obama's performance on the issue of jobs," and suggested that they conduct an experiment to check.
Well, it turns out that the folks at PPP took my advice. They randomly split their most recent North Carolina survey (pdf) in two. The full survey interviewed 686 registered voters, so each half sample had roughly 340 interviews. One random half-sample heard their usual question (rate "Barack Obama's job performance"). The other half heard the more standard question (rate "the way Barack Obama is handling his job as president"). According to PPP's Tom Jensen, the two versions "actually came out completely identical- 51 [percent approve] / 41 [percent disapprove] on each."
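For a rough sense of how precise half-samples of that size are, here is a minimal sketch of the sampling-error arithmetic. It assumes simple random sampling with no weighting or design effect (PPP's actual margin of error is not stated here), and the function names are just illustrative.

```python
import math

def moe(p, n, z=1.96):
    """Approximate 95% margin of error for a single proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_diff(p1, n1, p2, n2, z=1.96):
    """Approximate 95% margin of error for the difference between
    two proportions from independent samples."""
    return z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

n_half = 686 // 2   # each random half of the 686 interviews
approve = 0.51      # approval share reported for both question versions

half_moe = moe(approve, n_half)
diff_moe = moe_diff(approve, n_half, approve, n_half)
print(f"each half-sample: +/-{half_moe:.1%}")
print(f"difference between halves: +/-{diff_moe:.1%}")
```

On those assumptions, each half-sample's approval figure carries a margin of error of roughly five points, and the gap between the two versions roughly seven. That makes the experiment a useful check on a large wording effect, though too small a test to detect a difference of a point or two.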
So much for my theory. That said, the bottom line from last month's post remains the same:
While tempting, we cannot easily attribute to [the automated methodology] all of the apparent difference to Obama's job rating as measured by Rasmussen and PPP on the one hand, and the rest of the pollsters on the other. There are simply too many variables to single out just one as critical.
To review, let's quickly list a few (I discussed most in the original post).
1) Population. Rasmussen interviews "likely voters;" PPP interviews registered voters. Most of the other national media polls interview and report on all adults, although a handful (most notably Fox/Opinion Dynamics, Quinnipiac, Diageo/Hotline, Cook/RT Strategies and Resurgent Republic) report results from registered voters.
Alert reader Tlaloc suggested that while our charts allow easy filtering by mode (live interviewer, automated, etc) it would be even more useful to filter by population. We will add that feature to our to-do list. Meanwhile, Charles Franklin prepared the chart below, which shows three solid (loess regression) trend lines for Obama's approval percentage. Black shows the polls of all adults, blue shows the polls of registered voters (including PPP, whose individual releases are designated with blue triangles) and red shows the Rasmussen Reports results.
As the chart shows, the three categories produce consistently different estimates of Obama approval, with Rasmussen lowest, adult surveys highest and registered voter surveys somewhere in the middle. Moreover, the three PPP surveys are closer to the Rasmussen result than the other registered voter surveys (and we omitted the small handful of other pollsters besides Rasmussen that report Obama approval among "likely voters").
2) Question format. If you scan the "undecided" column of our table of recent Obama job approval results (and really that should be "not sure" -- another item for our to-do list), you will see quite a lot of variation. Although Rasmussen rarely reports a specific result, they usually have only a percentage point or so that is neither approve nor disapprove. The unsure percentages for CNN/ORC, ABC/Washington Post, AP/GfK and Ipsos/McClatchy tend to be in the low single digits. PPP has produced an unsure response of 6-8 percent. Meanwhile, pollsters like Pew Research Center, CBS News, Fox/Opinion Dynamics typically produce unsure responses over 10 percentage points.
The reason for the variation is usually some combination of the format of the question, including the number of answer choices offered, whether the pollster offers an explicit "unsure" category and whether they have an added push of those who are initially reluctant to answer the question. The point is not that any particular method is right or wrong, but that these differences matter.
3) Sample frame. PPP is unlike virtually all of the other national pollsters in that they sample from a list of all registered voters culled from voter rolls. Phone numbers are usually obtained by attempting to match names and addresses to listed telephone directories. As such, a significant number of selected voters are not covered -- PPP does not say how many are missed in their public releases. That difference in coverage may also contribute to the apparent house effect.
4) Live interviewer vs automated telephone. If we could easily control for the first three factors, we might be able to reach some conclusion about whether the lack of a live interviewer produces an effect of its own. In other words, holding all other factors equal, are some respondents providing different answers to the job approval question when asked by an automated method rather than a live interviewer? Unfortunately, we have national results on Obama job approval from just three pollsters that use the automated phone mode (Rasmussen, PPP and SurveyUSA - and just one poll from the latter).
The above is not an exhaustive list of the possible reasons for pollster house effects. So again, it's next to impossible to try to reach any firm conclusions about the automated mode alone. Also, as I concluded last month (and it bears repeating):
Just because a pollster produces a large house effect in the way they measure something, especially in something relatively abstract like job approval, it does not follow automatically that their result is either "wrong" or "biased" (a conclusion some readers have reached and communicated to me via email), only different. Observing a consistent difference between pollsters is easy. Explaining that difference is, unfortunately, often quite hard.
In a column this past Sunday, Washington Post polling director Jon Cohen explains why the Post has not reported on recent surveys "purporting to show the status of" the upcoming Democratic primary contest for governor in Virginia. Their bottom line:
None of the recent polls in the Virginia governor's race meet our current criteria for reporting polls: Two primary ones were by Interactive Voice Response, commonly known as "robopolls," and the third was a partial release from one of the candidates eager to change the campaign story line.
Cohen's piece starts a conversation worth having about the difficulty of polling in low turnout primaries, about the coverage of "horse race" results and where journalists should draw the line in reporting on polls conducted by campaigns or of otherwise unknown or questionable quality. For today, I am going to shamelessly gloss over those bigger issues (and shamelessly promote that I'll take up some of them in my about to resume NationalJournal.com column next week) and consider instead the narrower issue of the Post's policy against reporting the results of automated polls (also known as interactive voice response, or IVR).
Cohen makes two arguments for not reporting automated surveys:
1) Automated polls take "less care" determining likely voters:
Given the great complexity in determining "likely voters" in the upcoming electoral clash, extra care should be taken to gauge whether people will show up to vote. Unfortunately, polls that use recorded voice prompts typically take less care than polls conducted by live interviewers.
2) Automated polls are impractical for surveys asking more than a half-dozen substantive questions:
People are generally less tolerant of long interviews with computerized voices. One recent Virginia robopoll asked six questions about the governor's race; the other asked four....Lost in the brevity is much, if any, substance. Neither of the two in Virginia asked about the top issues in the race, what candidate attributes matter most or anything about the economy. Without this essential context, these thin polls offer little more than an uncertain horse race number. In understanding public opinion, "why" voters feel certain ways is crucially important.
Expanding on the second point, Cohen also points out that the requisite brevity of automated polls leads campaign pollsters to rarely use them. He quotes Joel Benenson and Bill McInturff and cites the poll released by Virginia candidate Brian Moran (conducted by Greenberg Quinlan Rosner).
Let's take these in reverse order. First, he is right that the automated methodology is inappropriate for longer, in-depth surveys and that a single, automated pre-election poll can typically "offer little more than an uncertain horse race number." So we would want to stick to live interviewer surveys if we want to understand the broader currents of public opinion surrounding an election (the goal of the work done by the Post/ABC poll) or if we want to plot campaign strategy or test campaign messages (the goal of campaign pollsters). The inherent brevity of automated polls is the primary reason that campaign pollsters still rely on traditional, live-interviewer methods for their work.
Similarly, the need for a very short questionnaire on automated polls prevents the use of a classic Gallup-style likely voter model (which requires asking seven or more questions about vote likelihood, past voting and attention paid to the campaign). However, I do not agree that the absence of a Gallup style index means that automated polls take inherently "less care" with likely voter selection than other state-level pre-election surveys. Many pollsters, including most of those that work for political candidates, rely on other techniques (such as screening questions, geographic modeling and stratification and the use of vote history garnered from registered voter lists) to sample and select the likely electorate.
Do we really think the polls produced by SurveyUSA and PPP in Virginia take "less care" in selecting likely voters than the Mason-Dixon Florida primary poll reported yesterday by the Post's Chris Cillizza or the Quinnipiac New Jersey primary poll reported in Sunday's Post?
And while I will grant that the accuracy of final pre-election polls is a potentially flawed measure of overall survey quality, it is the best yardstick we have to assess the accuracy of likely voter selection methods. After all, the Gallup-style likely voter models were developed by looking back at how poll estimates compare to election outcomes and tweaking the indexes until they produced the most accurate retrospective results. With each new election, pollsters look back at how their models performed, adjusting them as necessary to improve their future performance. Thus, if a pollster is careless in selecting likely voters, it ought to produce less accurate estimates on the final poll.
On that score, automated "robo" polls have performed well. As PPP's Tom Jensen noted earlier this week, analyses conducted by the National Council on Public Polls (in 2004), AAPOR's Ad Hoc Committee on Presidential Primary Polling (2008), and the Wall Street Journal's Carl Bialik all found that automated polls performed about as well as live interviewer surveys in terms of their final poll accuracy. To that list I can add two papers presented at last week's AAPOR conference (one by Harvard's Chase Harrison and one by Fairleigh Dickinson University's Krista Jenkins and Peter Woolley) and papers at prior conferences on polls conducted from 2002 to 2006 (by Joel Bloom and Charles Franklin and yours truly). All of these assessed polls conducted in the final weeks or months of the campaign and saw no significant difference between automated and live interviewer polls in terms of their accuracy. So whatever care automated surveys take in selecting likely voters, the horse race estimates they produce have been no worse.
One reason why is that respondents may provide more accurate reports of their vote intention to a computer than to a live interviewer. We know that live interviewers can introduce an element of "social discomfort" that leads to an underreporting of socially frowned upon behavior (smoking, drinking, unsafe sex, etc). Is it such a stretch to add non-voting to that list?
So let me suggest that this argument is really about the value of polls that measure the "horse race" preference -- and little more -- a few weeks or months before an election. Is that something worth reporting? Jon Cohen and ABC News polling director Gary Langer, the two principals of the ABC/Washington Post polling partnership, have been consistently outspoken in saying, "no," urging us all to "throttle back on the horse race."
I have no doubt of the sincerity of their commitment to that goal or the obstacles they face putting it into practice, but I wonder if urging abstinence is a workable solution. Political journalists and their political junkie readers are intensely and instinctively interested in the basic assessments that "horse race" numbers provide. Poll references have a way of showing up in stories about the Virginia governor's race, even in a newspaper that is supposedly not reporting on Virginia primary polls. Just yesterday, for example, the Post's print edition debate story reported that the Virginia candidates "sought to stamp a final impression in a race where polls show the majority of voters remain undecided" and Chris Cillizza told us in his online blog that "polling suggests [Terry McAuliffe] leads both [Brian] Moran and state Sen. Creigh Deeds."
So the "polls" show something newsworthy enough to report, but the reporters are not allowed to name or cite the polls they looked at to reach that conclusion. Does that make any sense?
I am pondering two somewhat related questions this afternoon, but both have to do with national surveys conducted using an automated ("robo") methodology (or more formally, IVR or interactive-voice-response) to measure Barack Obama's job approval rating. One is the ongoing Rasmussen Reports daily tracking, the other is the just-released-today national survey by Public Policy Polling (PPP).
Both surveys are certainly producing lower job approval scores for President Obama than those from other pollsters. The difference for Rasmussen is painfully obvious when you look at our job approval chart, magnified by the sheer number of data points they contribute to the chart. Look at the chart and you can see two bands of red "disapproval" points with the trend line falling in between. Point to and click on any of the higher scores and you will see that virtually all come from Rasmussen. Similarly point to and click on a Rasmussen "black" approval point and you will see that virtually all of their releases fall somewhere below the line.
The most recent Rasmussen Reports job rating for Obama is 55% approve, 44% disapprove. Use the filter tool to drop Rasmussen from the trend, and the current trend estimate (based on all other polls) is, with rounding, 61% approve, 30% disapprove. Leave Rasmussen in and the estimate splits the difference. The latest PPP survey produces a result very similar to Rasmussen: 53% approve of Obama's job performance and 41% disapprove.
I know that Charles Franklin is working on a post that will discuss the impact of the Rasmussen numbers on the job approval chart, so I am going to defer to him on that aspect of this discussion. (Update: Franklin's post is up here).
But since some will find it very tempting to jump to the conclusion that the IVR mode explains the difference -- as PPP's Tom Jensen did back in February -- I want to take a step back and consider some of the important ways these surveys differ from other polls (and from each other) that have little or nothing to do with IVR.
First consider the Rasmussen tracking: Like many other national polls it begins with what amounts to a random digit dial sample -- randomly generated telephone numbers that should theoretically sample from all working landline telephones. However, unlike many of the national surveys, it does not include cell phone numbers, it screens to select "likely voters" rather than adults, and Rasmussen weights by party identification (using a three-month rolling average of their own results weighted demographically, but not by party). Rasmussen also asks a different version of the job approval rating. Other pollsters typically ask respondents to say if they "approve" or "disapprove." Rasmussen asks them to choose from four categories, "strongly approve, somewhat approve, somewhat disapprove or strongly disapprove."
And Rasmussen uses an IVR methodology.
Now consider PPP: Unlike Rasmussen, they draw a random sample from a national list of registered voters compiled by Aristotle International (which gathers registered voter lists from Secretaries of State in each of the 50 states plus the District of Columbia and attempts to match each voter with a listed telephone number in the many states where that information is not provided by the state). As far as I know, Aristotle has not published the percentage of registered voters on that list for which they lack a working telephone number, but it is likely a significant percentage. The critical issue is that the population covered by PPP is going to be different than that covered by other pollsters including Rasmussen.
So any coverage problems aside, PPP still samples a different population (registered voters) than most other public polls. Like most other pollsters, but unlike Rasmussen, they do not weight by party identification. Finally, they also ask a job approval question that is slightly different from most other pollsters.
Consider these versions:
Gallup (and most others): "Do you approve or disapprove of the way Barack Obama is handling his job as president?"
Rasmussen: "How would you rate the job Barack Obama has been doing as President... do you strongly approve, somewhat approve, somewhat disapprove, or strongly disapprove of the job he's been doing?"
PPP: "Do you approve or disapprove of Barack Obama's job performance?"
Note the very subtle difference: Others ask about how Obama is "handling his job" or about the job he "has been doing as president." PPP asks about his "job performance." Might some respondents hear "job performance" as a question about Obama's performance on the issue of jobs? That hypothesis may seem far-fetched (and it probably is), but a note to PPP: It would be very easy to test with a split-form experiment.
Oh yes, in addition to all of the above, PPP uses an IVR methodology.
As should be obvious from this discussion, not all IVR methods are created equal. I happened to be at a meeting this morning with Jay Leve of SurveyUSA, one of the original IVR pollsters. As he pointed out, "there is as much variability among the IVR practitioners as there would be among the live telephone operators" on methodology, including some of the other more arcane aspects of methodology that I haven't referenced.
So the main point: While tempting, we cannot easily attribute to IVR all of the apparent difference in Obama's job rating as measured by Rasmussen and PPP on the one hand, and the rest of the pollsters on the other. There are simply too many variables to single out just one as critical. The lack of a live interviewer may well play a role, but the differences in the populations surveyed, the sample frames and the text of the questions asked or some other aspect of methodology may be just as important.
More generally, just because a pollster produces a large house effect in the way they measure something, especially in something relatively abstract like job approval, it does not follow automatically that their result is either "wrong" or "biased" (a conclusion some readers have reached and communicated to me via email), only different. Observing a consistent difference between pollsters is easy. Explaining that difference is, unfortunately, often quite hard.
I must admit, despite the fact that my National Journal colleagues publish The Hotline just one floor down from my office, I missed this brief announcement (subscription required) on Tuesday appended to results from a recent survey from Public Policy Polling (PPP):
Traditionally, the Hotline has only published live-telephone interview surveys while excluding interactive voice response (IVR) polls, despite the increased media coverage of many of these so-called "robo-polls." In our constant effort to remain tuned to industry developments, and to determine if such distinctions are fair and valid, the Hotline will begin running selected numbers from IVR polls during the upcoming cycle. Specifically, head-to-head matchups, favorability ratings and approval ratings from IVR outfits will appear on an interim basis in the Hotline's Latest Edition through the '10 midterms. This data -- from firms such as InsiderAdvantage, Public Policy Polling, Rasmussen Reports and SurveyUSA -- will be published alongside live-telephone data, but will be clearly labeled as IVR results.
For those who are unfamiliar, The Hotline has been a DC institution for more than 20 years, serving up a daily political news summary chock full of polling data since the days when the preferred mode of delivery was the fax machine. They have long refused to publish surveys that used an automated methodology rather than live interviewers, so in our small world, their decision to publish IVR results, even if only on an "interim" basis, is important and, in my view at least, a welcome step.
"Numbers Guy" Carl Bialik devotes his Wall Street Journal column and a companion blog post today to the subject of the automated "interactive voice response" polling that has become such a staple of the current campaign. Both are well worth reading in full.
Bialik managed to interview most of the major players in the political IVR field, and had a reaction from our partner Charles Franklin, summing up our own philosophy regarding the automated polls (that use a recorded voice rather than a live interviewer, and ask respondents to answer questions by pressing keys on their touch-tone phones):
The automated-polling method, says Charles Franklin, professor of political science at the University of Wisconsin and co-developer of the poll-tracking site Pollster.com, "can prove itself through performance or it can fail through poor performance, but we shouldn't rule it out a priori."
The column notes that IVR pollster SurveyUSA ranks second most accurate among all pollsters during the 2008 primaries in the ratings compiled by Nate Silver and that IVR polling was indistinguishable during the primaries in terms of how the final poll compared to the election result:
Their accuracy record in the primaries -- such as it was -- was roughly equivalent to the live-interviewer surveys. Each missed the final margin by an average of about seven points in these races, according to Nate Silver, the Obama supporter who runs the election-math site fivethirtyeight.com.
Franklin did our own compilation of polls conducted during the final week of the 2006 campaign (for a paper presented at the AAPOR conference last year) and reached essentially the same conclusion.
The article also indicates some cracks may be forming in the intense skepticism that the survey research establishment has long held for IVR surveys. Bialik notes that a polling textbook (The Voters Guide to Election Polls) authored by Paul J. Lavrakas and Michael Traugott, "refers to these surveys as Computerized Response Automated Polls -- insulting acronym intended." But at the end of the column, Lavrakas indicates a willingness to consider the methodology:
Accepting responses by touch tones may have a particular advantage this election, says Mr. Lavrakas, former chief methodologist at Nielsen Media Research, because it may extract more-honest responses from white respondents about their intent to vote for Sen. Obama. "Ultimately the proof is in the pudding, and those firms that use IVR for pre-election polling and do so with an accurate track record should not be dismissed," he says.
My NationalJournal.com column, on those wildly variant automated polls in North Carolina from Public Policy Polling (PPP), is now online.
A few additional pieces of the story: First, I get a lot of email asking about the firms like PPP. Who are they? Who pays for the polls? PPP was founded by a North Carolina businessman named Dean Debnam, and its clients are mostly Democratic candidates holding or seeking local office in North Carolina. According to Tom Jensen, PPP's communications director, Debnam founded the company to help provide "low cost, high quality polling" to candidates for local offices who "could never afford a $12,000 poll."
PPP is a good example of a growing trend that the automated (or interactive voice response - IVR) technology makes possible. It is easier than ever for organizations with little prior experience in survey research to make calls, ask questions, tabulate the results and disseminate them via the Internet. Where my pollster colleagues disagree -- often vehemently -- is whether the new firms like PPP are delivering the "high quality" polling they promise. For example, one campaign pollster friend I talked to this week said he had a "hearty laugh" about the change in the sample selection methodology I describe in the column because, "I doubt seriously that they had one in the first place."
I should say that Jensen has been very responsive on behalf of PPP and as transparent about their methods as any pollster we have dealt with. On the other hand, PPP made no reference to their changed sample selection in their most recent releases (here and here, though they did note the change in a separate blog entry). They also neglected to extract the relevant vote history data from the sample that would have allowed a simple tabulation of the results from the latest survey using the older, narrower universe of past primary voters. Those are the kinds of mistakes that fuel skepticism among experienced pollsters.
Professor Franklin and I share an attitude about these sorts of surveys that sometimes puts us at odds with many of our colleagues in survey research. We believe we ought to judge all surveys by their performance rather than simply dismissing them by their methodology alone. Skepticism is certainly appropriate for newcomers using relatively unproven methods, but we will continue to track and follow the results from companies like PPP in order to evaluate their ultimate success or failure in achieving their stated goals.
In case you missed our update, the most recent Gallup Daily result on the Democratic race shows a near dead-heat, with Barack Obama ahead of Hillary Clinton by a single percentage point (47% to 46%), a margin not nearly large enough to attain statistical significance. That one point lead is somewhat apropos, since it is virtually identical to the average of all of Gallup's Daily releases since February 8 (Obama 46%, Clinton 45%). So the question for the day: How much of the daily variation over the last six weeks has been real and how much is random noise?
Let's start with the chart of the Gallup Daily results since their three-day track completed on February 8 (and released on February 9). That was the first three-day result collected entirely after the results from the Super Tuesday primaries were known.
While the Gallup trend has shown several "figure eights" over the last few weeks (as reader "emcee" put it), most of that variation occurs within the range that we should expect from a survey with a +/- 3 point margin of sampling error.
To illustrate that point, consider the hypothetical possibility that the preferences among Democrats have remained perfectly stable for the last six weeks. Let's assume that the average result since February 8 -- 46% to 45% favoring Obama -- has been the unchanging reality. What sort of random variation should we expect from taking a sample rather than interviewing the entire population?
First, remember that the so-called "margin of error" applies to the individual percentages, not the margin between the candidates. So under our hypothetical "no change" scenario, we would expect the Obama percentages to fall somewhere between 43% and 49% (46% +/- 3) and the Clinton percentages to fall somewhere between 42% and 48% (45% +/- 3).
Since February 8, the results of the actual Gallup Daily have fallen outside that range on just three days:
March 1, when Obama led 50% to 42%
March 13, when Obama led 50% to 44%
March 18, when Clinton led 49% to 42%
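For readers who like to see the arithmetic, here is a minimal sketch in Python of the check behind that list. It is only an illustration: the function name `outside_range` is made up for this example, and the only inputs are the hypothetical 46% to 45% baseline, the +/- 3 point margin and the three daily results quoted above.

```python
# Hypothetical "no change" baseline and the reported margin of sampling error.
BASELINE = {"Obama": 46, "Clinton": 45}
MARGIN = 3  # percentage points, applied to each candidate separately


def outside_range(result, baseline=BASELINE, margin=MARGIN):
    """Return the candidates whose percentage falls outside baseline +/- margin."""
    return [
        name
        for name, pct in result.items()
        if abs(pct - baseline[name]) > margin
    ]


# The three days quoted in the post.
days = {
    "March 1": {"Obama": 50, "Clinton": 42},
    "March 13": {"Obama": 50, "Clinton": 44},
    "March 18": {"Obama": 42, "Clinton": 49},
}

for day, result in days.items():
    print(day, outside_range(result))
```

Note that the check runs on each candidate's percentage, not on the gap between them, which is the point of the "margin of error" reminder above.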
But wait. As some of you may remember, most political surveys (including Gallup) calculate the margin of error using a 95% confidence level. That assumption means that we should expect results slightly outside the margin of error for one poll in twenty.
Unfortunately, at this point our story gets a little bit more complicated, because the "one in twenty" assumption applies to statistically independent measurements. Since each Gallup Daily release is based on a three-day rolling average, there is overlap in the sample on successive days. So only the results from every third day are truly "independent." I'll skip over some even more confusing explanation and get to the bottom line: Since February 8, roughly one-in-seven independent samples from the Gallup Daily series has produced a result outside the margin of error from my hypothetical, no-change, 46-45 scenario. That's a little bit more than we would expect by chance alone, but not much more.
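One rough way to put a number on "not much more" is a simple binomial calculation, sketched below in Python. The counts are my own assumptions for illustration, not figures from the Gallup data: I assume about 14 non-overlapping three-day samples in six weeks, so that "one in seven" works out to 2 of them, and I treat each sample as having a flat 5% chance of landing outside the margin. The helper name `prob_at_least` is likewise made up.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of k or more 'outside the margin' results in n independent samples."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumptions for illustration only (not figures from the post):
# six weeks of tracking gives roughly 14 non-overlapping three-day samples,
# and "one in seven" of those would be 2 samples.
n_independent = 14
n_outside = 2
chance_per_sample = 0.05  # the "one poll in twenty" implied by 95% confidence

p = prob_at_least(n_outside, n_independent, chance_per_sample)
print(f"P(at least {n_outside} of {n_independent} outside) = {p:.3f}")
```

Under those assumptions the probability comes out at roughly 15 percent, so two stray results in fourteen tries would be a bit unusual but hardly remarkable. A more careful version would account for checking two percentages per sample.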
Having said all that, my explanation still oversimplifies. It ignores the possibility for meaningful change within the standard "margin of error" -- subtle shifts that might not attain statistical significance in a single three-day sampling, but might over the course of a week or more.
A better way to distinguish the meaningful patterns is to compare Gallup's results to those from another pollster or two. Let's start with a chart of the Rasmussen Reports daily tracking poll over the same six week period. Not surprisingly, the average of the Rasmussen data gathered since February 8 also shows Obama leading by a single percentage point (45% to 44%).
Compare the two charts (or look at the chart below, which plots a Clinton-minus-Obama margin for both polls) and you will see several features in common:
Both show a shift from Clinton to Obama between Super Tuesday and mid-February
Both show Obama maintaining a low single-digit lead from mid to late February
Both show Clinton rising a few days before the March 4 primaries and falling a few days after
And yet, at about the time the news surrounding Jeremiah Wright became a full-blown media obsession (March 14), the results of the two polls appear to diverge. Why is that?
We should keep in mind that Gallup and Rasmussen collect their data differently (and ask slightly different questions -- see the postscript). Gallup uses live interviewers, makes repeated call-backs to unavailable respondents, samples cell phone numbers, and routes calls to Spanish speaking interviewers when they reach a Spanish speaking household. Rasmussen uses an automated system and recorded voice to conduct interviews, a slightly tighter screen for "likely voters," yet (as I understand it) makes no call-backs, does not call cell-phones and makes no provision for bilingual interviewing.
Some, I am sure, will readily conclude that one or more of these characteristics (or perhaps others that I've omitted) provide "obvious" explanations for the discrepancies. I am reluctant to make too much of these differences. The reasons will be clearer after we look at data from a third source. I obtained it earlier today from an anonymous but trusted pollster that I'll call "Polimatic." Here is a chart of Polimatic's tracking data for the last six weeks:
Those who notice the greater stability in the Polimatic data as compared to Gallup and Rasmussen are on to something important. Next consider how the Clinton-minus-Obama margin from the Polimatic data compares to the other pollsters:
See some interesting patterns? Starting to form theories about what type of poll Polimatic is, or how their methodology might influence their results?
Well, before you go too far, I should fess up. I fibbed. "Polimatic" is not a pollster at all. The data are based on a simulation run by our friend Mark Lindeman. Mark created a spreadsheet that generates random results consistent with a three-day rolling average tracking sample of 1,26040 interviews and the assumption that the "true" population value remains an unchanging 46% to 45% Obama lead.
The Polimatic line is more stable, suggesting that the consistently highest highs and lowest lows of the blue and red lines probably represent real divergence. However, the purely random variation of the simulated poll trend line is frequently hard to distinguish from the real surveys.
To generate the results above, I closed my eyes and clicked the mouse to let the spreadsheet recalculate. As such, the "Polimatic" line illustrates one potential trend showing nothing but random noise around a 46% to 45% margin. I'll say it one more time to be clear: All of the variation in the Polimatic trend lines is based on purely random chance. Any resemblance to real changes as measured by Gallup or Rasmussen is entirely coincidental.
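If you want to play the same game at home, the idea can be sketched in a few lines of Python. To be clear, this is not Mark Lindeman's spreadsheet, just my own rough stand-in. The function name `simulate_tracking`, the 42-day span, the 400 interviews per night and the random seed are all assumptions chosen for illustration. Only the unchanging 46% to 45% "true" values and the three-day rolling average come from the description above.

```python
import random


def simulate_tracking(days=42, per_day=400, p_obama=0.46, p_clinton=0.45, seed=1):
    """Simulate a three-day rolling-average tracking poll with fixed true support."""
    rng = random.Random(seed)
    daily = []
    for _ in range(days):
        obama = clinton = 0
        for _ in range(per_day):
            u = rng.random()
            if u < p_obama:
                obama += 1
            elif u < p_obama + p_clinton:
                clinton += 1
            # everyone else is undecided or prefers someone else
        daily.append((obama, clinton))

    rolling = []
    n = 3 * per_day  # interviews in each three-day window
    for i in range(2, days):
        window = daily[i - 2 : i + 1]
        obama_pct = 100 * sum(d[0] for d in window) / n
        clinton_pct = 100 * sum(d[1] for d in window) / n
        rolling.append((round(obama_pct), round(clinton_pct)))
    return rolling


results = simulate_tracking()
margins = [c - o for o, c in results]  # Clinton-minus-Obama, as in the charts
print(results[:5])
print("largest Clinton lead:", max(margins), "largest Obama lead:", -min(margins))
```

Change the `seed` argument and you get a different "trend" each time, which is the software equivalent of closing your eyes and clicking the mouse.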
So what can we conclude from all this?
First, there has been far more stability than change in the national Obama-Clinton vote preference since Super Tuesday, and that includes the period of the last ten days. To the extent that we have seen real changes, they are barely bigger than what we might expect by chance alone.
Second, if you look closely, you will notice that the seemingly odd divergence between Gallup and Rasmussen since the Wright story broke is really not that unusual. It is comparable to similar separations in the trend lines that occurred around February 13 and February 29. Random variation will do that.
Third, and probably most important, it is far too easy to look at these rolling average tracking surveys and see compelling narratives and spin interesting theories from what is often little more than random noise.
PS: Yes, as a few readers have already suggested in prior comments, some of the stability in national Democratic vote preference may stem from the fact that most states have already held their primaries and caucuses. We had some discussion about a month ago about how Gallup alters its screen slightly to accommodate states that have already voted. However, neither Gallup nor Rasmussen alters their vote question for those who have already voted. Here is the text used by each:
Gallup: Which of these candidates would you be most likely to support for the Democratic nomination for president in 2008, or would you support someone else? [ROTATED: New York Senator, Hillary Clinton; Former Alaska Senator, Mike Gravel; Illinois Senator, Barack Obama]
Rasmussen: If the Democratic Presidential Primary were held in your state today, would you vote for Hillary Clinton or Barack Obama? [options are rotated]
PPS: While I was writing this post, Mickey Kaus blogged a theory for the divergent Gallup and Rasmussen trend lines:
The 'Bradley Effect' is Back? Gallup's national tracking poll has Obama retaking the lead over Hillary after bottoming out on the day of his big race speech. Rasmussen's robo-poll, on the other hand, shows Obama losing ground since last Tuesday. True, even Rasmussen doesn't seem to be putting a lot of emphasis on his survey's 6-point shift. But isn't this week's primary race exactly the sort of environment--i.e., the issue of race is in the air--when robo-polling is supposed to have an advantage over the conventional human telephone polling used by Gallup? Voters wary of looking like bigots to a live operator--'and why didn't you like Obama's plea for mutual understanding that all the editorial pages liked?'--might lie about their opinions, a phenomenon known as the Bradley Effect. But they might be more willing to tell the truth to a machine. ...
Or more likely, the apparent differences between them are about random variation in one or both polls. If you average the results from data collected since March 14 (the day the Wright story exploded) they are not very different:
Live Interviewer Gallup Daily: Clinton +2 (47% to 45%)
Automated Rasmussen Reports: Obama +1 (45% to 44%)
Kaus also links to an automated PPP survey in North Carolina that fielded on the evening of March 17, the night before the Obama speech. As such, it is consistent with Gallup's "bottoming out" for Obama, not contradictory. The SurveyUSA results I blogged about on Friday were also collected from March 14 to March 16, just after the Wright story broke but before Obama's speech.
Yesterday, Clinton chief strategist Mark Penn released a polling memo highlighting "some pretty big changes" in polling numbers that suggest "a strong swing in momentum in the race to Hillary." Later in the afternoon, ABC News correspondent Jake Tapper posted some analysis by Peyton Craighill of the ABC News Polling Unit:
Mark Penn’s note is full of overblown claims based on current polling. He’s cherry picking numbers from recent polls. Much of his claim of a Clinton swing is based on the latest tracking data from Gallup in which Clinton is now ahead by 7 points. If you go back two more days Obama has a 7-point lead in a separate USA Today/Gallup poll. CBS has a new poll out today that shows a close 46-43 percent Obama-Clinton race. The CBS poll also has the match ups with McCain at 48-43 percent for Obama-McCain and 46-44 percent for Clinton-McCain. We see little indication of a shift to Clinton. Of the nine polls cited in his note, five of them are not airworthy.
Tapper adds: "'Airworthy' is a term our Polling Unit uses for polls so poorly done we are discouraged from mentioning them on air." I believe Tapper left out the word "not" in that sentence. Polls considered "not airworthy" are those ABC does not mention on air, and that category includes polls conducted using an automated methodology, such as those by SurveyUSA (ABC details its standards here).
Without reopening the long debate on automated polls (a topic we've written about often), we should note that the latest round of SurveyUSA polls do generally show Obama's support worsening in general election matchups against McCain. Of course, all of those surveys were fielded last weekend (March 14-16) while the Jeremiah Wright sound-bites played endlessly on the cable news networks but before Obama's speech on Tuesday. Probably the wisest advice on how to interpret poll numbers this week comes from some commentary yesterday by NBC News political director Chuck Todd:
Don't use the polls this week to judge where Obama is and what kind of damage...is it long term or is it short term. I'd wait a week and look at the polls in a week and then we'll know how badly this [hurt Obama] because there has certainly been critical mass as far as attention has been concerned on the speech and how he is trying to pivot and move on. So if there is an uptick then we will know that what we are seeing is bottom, what we are seeing today is the worst, and if today is bottom, the Obama campaign probably thinks they can recover.
For those who will be watching results from the Mississippi primary tonight, here is a breakdown of the demographics of recent surveys as well as the tabulations of vote by race. First, the demographic composition and overall results:
Obviously, we have far fewer polls (and pollsters) to consider than for last week's primaries in Texas and Ohio. Only three pollsters have been active in Mississippi -- ARG, Rasmussen and InsiderAdvantage -- and their reported demographic compositions have been reasonably consistent. ARG uses live interviewers, Rasmussen Reports uses an automated (IVR) methodology and InsiderAdvantage has used both in recent months but does not specify their methodology on their last two releases.
The vote results by race are less consistent. All show Clinton with a wide lead among white voters and Obama with a wider lead among African-Americans, but the specific results -- particularly Obama's support among black voters -- have varied. Assuming that the networks conduct an exit poll tonight, we will see in a few hours how the results from that survey compare to those above.
Update: Rasmussen Reports emails with demographic composition numbers (thank you), so I updated the table above.
Update2: As jr886 points out in the comments, the folks at The Page are certainly expecting exit poll results.
There has been a considerable buzz over the last two days about the surveys released yesterday by SurveyUSA that test both McCain-Obama and McCain-Clinton trial-heat questions in all 50-states. Putting aside the concerns some have about SurveyUSA's automated methodology and the other usual caveats about horse race polling at this stage in the campaign, I tend to agree with the critique from Matt Yglesias (via Sullivan):
Each of these polls has a sample size of 600, so the margin of error will come into play. What's more, there are 100 separate polls being aggregated here, so the odds are that several of these are just bad samples.
True on both counts. SurveyUSA colors in states on their maps even if a candidate leads by a point or two, margins that are not close to achieving statistical significance. However, since SurveyUSA says they did 600 interviews in each state, we can take their analysis a step further, applying statistical sampling error to the candidates' margins in each state.
Professor Franklin and I have done just that, classifying each state based on the statistical significance of the candidate's lead. We call a state "strong" for the candidate if they lead by a margin that is statistically significant at a 95% level of confidence, the level typically used to calculate the "margin of error" attached to most surveys. We label as "lean" any state where a candidate leads by more than one standard deviation, which amounts to a 68% confidence level. We label all other states as toss-ups.
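For the curious, here is a minimal sketch in Python of how such a classification might be computed. The function name `classify_lead` is made up, and the standard error formula is the usual textbook one for the difference between two percentages drawn from the same simple random sample. I am offering it as an assumption about the approach, not as a copy of our spreadsheet. The example percentages at the bottom are invented, not SurveyUSA results.

```python
from math import sqrt


def classify_lead(pct_a, pct_b, n=600):
    """Classify a lead as 'strong', 'lean' or 'toss-up' from two percentages."""
    a, b = pct_a / 100, pct_b / 100
    diff = abs(a - b)
    # Standard error of the difference between two shares from the same sample,
    # assuming simple random sampling (no weighting or design effect).
    se = sqrt((a + b - (a - b) ** 2) / n)
    z = diff / se
    if z >= 1.96:  # significant at the 95% confidence level
        return "strong"
    if z > 1.0:  # more than one standard error: roughly 68% confidence
        return "lean"
    return "toss-up"


# Invented examples with 600 interviews each.
for a, b in [(52, 42), (50, 44), (46, 45)]:
    print(f"{a}% vs. {b}%: {classify_lead(a, b)}")
```

With 600 interviews, a ten point lead clears the 95% bar, a six point lead clears only the one standard error bar, and a one point lead clears neither.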
Note also that these significance tests assume "simple random sampling," which produces a smaller error margin than we would get if we could take into account that SurveyUSA, like virtually all pollsters, weights its data. We would need access to the raw data and weights in order to do truly correct significance testing.
The tables and maps appear below, followed by some discussion. First, here are the results and a map showing an Obama vs. McCain match-up (you can click on any of the images for a larger size version):
And here are the results and a map showing a Clinton vs. McCain match-up:
If you would prefer, you can also download the spreadsheet that we used to create the tables.
Now that you have all of the data before you, let's consider the merits of the project and a few caveats about the data. First, this sort of project -- which involved 30,000 interviews completed in 50 states over a three-day period (February 26-28) -- would not have been feasible with live interviewers.
On the other hand, the automated methodology is controversial with traditional survey researchers. I wrote about the arguments for and against IVR (interactive voice response) surveys in Public Opinion Quarterly, and I have blogged often on the subject, both here at Pollster and on its forerunner MysteryPollster. Readers are obviously welcome to share their opinions about the IVR methodology in the comments.
The other caveats noted by SurveyUSA are worth repeating: They surveyed all self-reported registered voters, and did not attempt to screen for "likely voters" (although many national pollsters do the same at this stage, feeling that we are too early in the process to attempt to predict what voters will actually cast ballots). McCain would likely do slightly better in both match-ups under a "likely voter" screen. Also, we are obviously still eight months from the election. Much can and will change in terms of voter perceptions and preferences.
Let us also keep in mind the limitations of random sampling error. It tells us only about the variability that comes from calling a sample of households rather than dialing every working phone number in every state. As with any survey, it tells us nothing about the potential for error based on the wording of the questions, the selection of respondents within the household and the voters missed because they lack land-line phones or do not participate in the survey. Be careful about using the misnomer "statistical tie" to describe states in the toss-up category. One candidate would likely show a "significant" lead if we could increase the sample size -- we just lack the statistical power to know which candidate that would be.
Finally, keep in mind that since we are looking at 100 tests (2 each in 50 states), these results probably misclassify five states by chance alone (as opposed to the way we would classify them if SurveyUSA had called every working telephone in the 50 states).
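The arithmetic behind that "five states" figure is short enough to sketch in Python. The one assumption I am adding is that the 100 tests are independent of one another, which is not strictly true here since the same respondents answered both match-up questions in each state.

```python
tests = 100        # 2 match-ups in each of 50 states
error_rate = 0.05  # chance of a misleading result at 95% confidence

# Expected number of misclassifications, and the chance of at least one,
# treating the tests as independent (an assumption for illustration).
expected_misclassified = tests * error_rate
prob_at_least_one = 1 - (1 - error_rate) ** tests

print(f"expected misclassifications: {expected_misclassified:.0f}")
print(f"chance of at least one: {prob_at_least_one:.3f}")
```

In other words, it would be surprising if every one of the 100 classifications turned out to be right.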
With all the caveats out of the way, what does all this data tell us? Consider this summary of the electoral vote totals**:
These data are less useful in forecasting the ultimate result than they are in gauging the relative strength of both Clinton and Obama as of last week (February 26 to February 28). Those dates are important, since both the Gallup Daily and Rasmussen Reports automated tracking have shown Clinton gaining ground on Obama nationally over the last week.
Nonetheless, as of last week, Hillary Clinton led in states that add up to a slightly greater electoral vote total counting the leaners (250 for Clinton vs. 244 for Obama). Still, Obama appeared to put more states into play (138 pure toss-up for an Obama-McCain race vs. a Clinton-McCain race). So Obama's initial electoral vote advantage is greater.
The most interesting aspect of these surveys is the states that explain those differences. Let's consider first the states where Obama does better than Clinton:
Obama moves three states from lean McCain to strong Obama: Colorado, Iowa and Oregon
Obama moves two states from strong McCain to lean Obama: Nevada and North Carolina
Obama leads in two states that are toss-ups in a Clinton-McCain race: New Mexico (lean) and Washington (strong)
Obama moves four states from strong McCain (against Clinton) to toss-up: Nebraska, New Hampshire, North Carolina and Virginia
On the other hand, Clinton does better than Obama in a smaller number of states:
Clinton moves one state from strong McCain to strong Clinton: Arkansas
Clinton moves one state from strong McCain to lean Clinton: West Virginia
Clinton leads in the two states that are toss-ups in an Obama-McCain race: Florida (strong) and New Jersey (lean)
Clinton moves one state from strong McCain to undecided: Tennessee
Clinton moves one state from lean McCain to undecided: Pennsylvania
Here is another table that makes it easier to see these comparisons (again, click on the image to see a full size version):
So, Pollster readers, what do you think?
**And yes, after putting these tables together I see that SurveyUSA split the Nebraska electoral votes based on the vote totals, something I did not do.
Update: Nick Beaudrot (via Yglesias) creates thematic maps based on the same data keyed to the size of the candidate margin.
A few comments on our post of the new SurveyUSA Texas poll raised two questions worthy of further discussion.
First, reader s.b. notes:
[W]ith an automated survey, if its in English, they aren't sampling spanish only or mostly spanish speakers. I think it skews these results.
Some pollsters (such as Gallup) offer voters the opportunity to complete the survey in Spanish when they encounter Spanish speaking respondents. Most pollsters, however, will simply end the interview in these instances. I asked SurveyUSA's Jay Leve about their procedure in Texas and he notes that while they do have the facility to offer respondents the option to complete a survey in either English or Spanish (and have done so in mayoral elections in New York and Los Angeles and some congressional districts), they did not offer a Spanish interview for their Texas poll.
However, before leaping to conclusions about the SurveyUSA results, keep in mind that only one of the other Texas pollsters reports using bilingual interviewing for any of their surveys [Correction: interviews for the Washington Post/ABC News poll "were conducted in English and Spanish"]. Three of the other pollsters -- Rasmussen Reports, PPP and IVR polls -- also interview with an automated methodology rather than live interviewers.
And before leaping to conclusions about all the Texas polls, we might want to know just how many Latino voters in Texas speak only Spanish. I have not done survey work in Texas, but my memory from conversations with pollsters that do is that the percentage that will actually complete an interview in Spanish when offered is typically in the low single digits.
Second, several commenters have speculated about the small changes in the demographic composition of the last two SurveyUSA Texas polls. For example, "Mike in CA" points out:
Hispanic turnout at 28% sounds just about right. The last SUSA survey had it at 32% which was way too high. It seems SUSA has scaled back their Hispanic estimates, so they must have a reason. Additionally, the boosted AA to 23%, from 18%. Seems reasonable considering the extraordinary increases in early voting turnout from Houston and Dallas [emphasis added].
That's not quite right. Keep in mind that SurveyUSA's approach to likely voter modeling is comparable to that used by Iowa's Ann Selzer, in that they do not make arbitrary assumptions about the demographic composition of the likely electorate. As SurveyUSA's Jay Leve explains, they "weight the overall universe of Texas adults to U.S. census" demographic estimates, then they select "likely voters" based on screen questions and allow their demographics to "fall where they may." So some of the demographic variation from survey to survey is random, but large and statistically significant variation should reflect real changes in the relative enthusiasm of voters. Leve goes into more detail in the email that I have reproduced after the jump, which also includes the full text of the questions they use to select likely voters.
Jay Leve and his crew at SurveyUSA have been busy this week. Following up on our discussion of their pollster report cards, SurveyUSA has a new and improved scorecard chart for individual state primaries (example for Florida Republicans with explanation here, example for Wisconsin Democrats here). The new state-level report card format includes eight different measures of error and a number of additional variables intended to help us "better understand the correlation between the methodological choices an election pollster makes and the results an election pollster produces."
Length of field period.
Proximity of poll release to election
The number of undecided voters
The number of respondents interviewed
The sample source (if available)
The interviewing technique (if available)
The method of respondent selection (if available)
They also updated their "high-level" report cards (summarizing one measure of error for all final polls in 2008) to include polls from Wisconsin.
Finally, in another interesting innovation, they are also soliciting reader input on the McCain story:
What questions should SurveyUSA ask Americans in its polling today about today’s New York Times story, today’s Washington Post story, and John McCain’s response to it? We welcome your suggestions at editor@surveyusa.com.
Washington Post polling director Jon Cohen reported an easily overlooked but important statistic yesterday, especially to anyone thinking about the reliability of the last round of Iowa polls. Using the Iowa tables here at pollster.com, he determined that public polls in Iowa this year have interviewed nearly 80,000 "likely caucus goer" respondents:
As a ratio of voters polled to expected turnout, this must be something of a record. (In 2004 about 120,000 people participated in the Democratic caucuses, and in 2000 about 90,000 in the GOP contest.)
And it's not just the public pollsters calling. Campaigns have been known to set up a phone bank or two to gauge opinion, solicit support and cajole voters to actually show up and spend hours caucusing in the middle of winter.
A month-and-a-half ago, already deep into the "silly season" but well before the final stretch, eight in 10 likely Democratic caucus goers and nearly six in 10 on the GOP side said they'd been called on the telephone by at least one of the campaigns. And Pew reported the pervasive use of robo-calls (though most Iowans who get such automated calls about the campaign said they usually hang up).
I can add two confirming anecdotes. The first comes from a comment left by "Randy Iowa" here at Pollster just last night:
Is there a Do Not Call list that i can get on? I have received a survey call everyday this week and at least one candidate has called everyday as well.
I emailed Randy, and sure enough, he is an Iowa voter. He says that "80%" of the calls he received were automated. Interestingly, he is also a non-affiliated voter (not registered with a party) who has never participated in a caucus (though has "voted Republican my entire life"). (By the way, the short answer to Randy is no. Pollsters and political campaigns are exempt from the federal do not call restrictions, though at least one group is trying to change that).
I wonder how many calls those identified as past caucus goers are getting? Here is one possible answer in the form of an email I received about an hour ago from a "help desk" operator at a major residential telephone company. He apparently assumed (mistakenly) that Pollster.com conducts surveys:
Subject: Please stop calling this customer
This customer is getting upwards of 20 calls a day from automated poll services, she lives in Iowa and her phone number is 563-[redacted]. Please stop calling her.
Not surprisingly, the recipient of the calls lives near Davenport, Iowa.
Aside from the spectacle of the sheer volume of "poll" calls, we might want to think about what all that calling is doing to the response rates the real pollsters are getting. And if pollsters are having a harder time getting voters to respond this week, are those suddenly reluctant voters skewing the results? We may never know, of course, but if nothing else, I would be very nervous were I using an automated (IVR) methodology to collect survey data in Iowa right now. More important: I wonder how many Iowans have been ignoring their ringing phones altogether the last few days?
A suggestion from alert reader and frequent commenter
Andrew:
I write to suggest that you analyze
the huge discrepancy between the latest Rasmussen and Washington Post/ABC
polls. I'm talking about the Republican nomination. Rasmussen says Thompson is
up by 4 over RG, while WP/ABC says Rudy is up by 20 pts over FT, who isn't even
in second place here (36 RG to 14 FT). One of these pollsters is
obviously very wrong. Two polls cannot both be accurate, if their margin of
victory do not approximate each other. This is a humongous 24 point
discrepancy.
Here, with a little assist from Professor Franklin, is a
chart showing the discrepancy that Andrew noticed. The two surveys do seem to
show a consistent difference that is clearly about more than random sampling
error. The ABC News/Washington
Post survey shows Giuliani doing consistently better, and Thompson
doing consistently worse, than the automated surveys conducted by Rasmussen
Reports, although the discrepancy has been largest in terms of how the most recent ABC/Post poll compares to Rasmussen surveys conducted over the last month or so.
To try to answer Andrew's question, it makes sense to take
two issues separately. First, why are
the surveys producing different results for the Republican primary?
At the most basic level, these surveys seem to be measuring
the same thing: Where does the Republican nomination contest stand nationally? And
both surveys begin with a national sample of working telephone numbers drawn
using a random digit dial (RDD) methodology. Take a closer look, however, and
you will see some pretty significant differences in methodology:
The
ABC/Post survey uses live
interviewers. Rasmussen uses an automated recorded voice that asks
respondents to enter their answers by pushing buttons on a touch tone
keypad. This method is known as Interactive Voice Response (IVR). The
response rates -- and more importantly, the kinds of people that respond --
are likely different, although neither pollster has released specific
response rates for any of the results plotted above.
The
ABC/Post survey attempts to
select a random member of each household to be interviewed by asking "to
speak to the household member age 18 or over at home who's had the last
birthday" (more details here). Rasmussen interviews whatever adult member
of the household answers the telephone. Both organizations weight the
final data to reflect the demographics of the population.
Rasmussen
Reports weights each survey by
party identification, using a rolling
average of recent survey results as a target (although their party
weighting should have little effect on a sub-group of Republican primary
voters). The ABC/Post survey
does not weight national surveys at this stage in the campaign by party
ID.
[Update -- one I overlooked: The ABC/Post survey includes Newt Gingrich on their list of choices. Gingrich receives 7% on their most recent survey. If the Rasmussen survey prompts Gingrich as a choice, they do not report it. It is also possible that Rasmussen omits other candidates as well. Their report provides results for just Giuliani, Thompson, Romney and McCain. Update II -- Scott Rasmussen informs via email: "We include all announced candidates plus Fred Thompson"].
And
perhaps most important for Andrew's question: The ABC/Post survey asks the presidential primary question of all
adults that identify with or "lean" to the Republicans. The Rasmussen
survey screens to a narrower slice of the population: Those they select as
"likely Republican primary voters."
Unfortunately, neither pollster tells us the percentage of
adults that answered their Republican primary question, but we can take a
reasonably educated guess: "Leaned Republicans" have been somewhere between 35%
and 42% of the adult population on surveys conducted in recent months by Gallup and the Pew
Research Center. If Rasmussen's likely voter selection model for Republicans
is analogous to their model
for Democrats, their "likely Republican primary" subgroup probably
represents 20% to 25% of all adults.
Consider also that, even before screening for "likely
voters" and regardless of the response rate,
those willing to complete an IVR study may well represent a population that is
better informed or more politically interested than those who complete a survey
with an interviewer.
Put this all together, and it is clear that the Rasmussen
survey is reaching a very different population, something I would wager
explains much of the difference in the results charted above.
Now, the second question, which result is more "accurate?" It is tempting to say that this question is impossible to
answer, since we will never have a national primary election to check it against.
But a better answer may be that "accuracy" in this case depends on what we want
to use the data for.
If we were trying to predict the outcome of a national
primary, and if all other aspects of methodology were equal (which they're
not), I would want to look at the narrower slice of "likely voters" rather than
all adult "leaned Republicans." Since the nomination process involves a series of
primaries and caucuses starting with Iowa and New Hampshire, and since
the results from those early contests typically influence preferences in the
states that vote later, we really need to focus on early states for a more
"accurate" assessment of where things stand now. While interesting and fun to
follow, these national measurements provide only indirect indicators of the
current status of the race for the White House.
Why would the ABC/Post survey want to look at all
Republicans, rather than likely voters? Here is the way ABC polling director
Gary Langer explained it in his online
column this week:
I like to think there are two things we cover in
an election campaign. One is the election; the other is the campaign.
The campaign is about who wins. It's about tactics
and strategy, fundraising and ad buys, endorsements and get-out-the-vote
drives. It's about the score of the game - the horse race, contest-by-contest,
and nothing else. We cover it, as we should.
The election is the bigger picture: It's about
Americans coming together in their quadrennial exercise of democracy - sizing
up where we're at as a country, where we want to be and what kind of person
we'd like to lead us there. It's a different story than the horse race, with
more texture to it, and plenty of meaning. We cover it, too.
We ask the horse race question in our national
polls for context - not to predict the winner of a made-up national primary,
but to see how views on issues, candidate attributes and the public's personal
characteristics inform their preferences.
Questions like Andrew's are more consequential in the statewide surveys we
are tracking here at Pollster.com, and those surveys have been producing some
discrepancies even bigger than the one charted above. We will all be in a
better position to make sense of those differences if we know more about the
methodologies pollsters use. I'll be turning to that issue in far more detail
next week.
The latest automated SurveyUSA poll in the Kentucky
Governor's race provides us with one of those classic conflicting poll stories
that we just love here at Pollster.com, because it illustrates how small differences
in methodology can have profound effects on the results. In this case, SurveyUSA
shows Democrat Steve Beshear leading incumbent Republican Ernie Fletcher by a
23 point margin (59% to 36%) with only 5% undecided. Meanwhile, an InsiderAdvantage
poll conducted a week earlier shows Beshear leading by just three points (41%
to 38%) with a much larger number (21%) in the undecided category.
What explains the difference? Continue after the jump for
more explanation, but my best guess is that the solution can be found in this
conundrum: On a poll, "undecided" means something different than "still trying
to decide."
Today's New
York Times gives prominent play to a story
on bills working their way through various state legislatures across the
country to crack down on prerecorded campaign calls:
Nearly two-thirds of registered
voters nationwide received the recorded telephone messages, which as political
calls are exempt from federal do-not-call rules, leading up to the November
elections, according to a survey by the Pew Internet and American Life
Project, an independent research group. The calls, often known as
robocalls, were the second most popular form of political communication,
trailing only direct mail, the group said.
The article did not address the potential impact, if any, on
automated surveys conducted using a pre-recorded script that ask respondents to
answer by typing keys of their touch tone telephones. As of January, according to
the newsletter of the
Council of American Survey Organizations (CASRO), there were already "sixteen
bills in seven different states addressing automated calls." However, as I read
the CASRO report, most of these new bills -- like the federal "do-not-call"
regulations -- do not appear to restrict calls made for the purpose of survey
research.
I thought it might be useful to ask the opinion of survey
researchers who conduct automated "interactive voice response" (IVR) surveys
for their reaction to today's Times
story.
Jay Leve is the editor of SurveyUSA,
a firm that conducts public polls exclusively with IVR. His comment:
The people who try to deceive voters, using whatever
technology, should be put in prison. Nothing is more repugnant than individuals
or firms who use technology to disenfranchise voters, which is what the calls
being debated do. Many such calls are designed to suppress turnout. They are
the 21st Century Bull Connor, with a fire-hose replaced by Ethernet.
SurveyUSA welcomes carefully drawn legislation that makes it a crime to mislead
voters, by whatever means. SurveyUSA opposes sloppily drawn legislation, in any
jurisdiction, that fails to recognize the vital community interest served by
legitimate, hyper-local public opinion research.
Thomas Riehle is a partner in RT Strategies, a firm that usually conducts
telephone surveys using live interviewers (including the polls conducted for
the Cook Political
Report). One exception was last year's Majority
Watch project, which fielded pre-election polls via IVR in contested U.S.
House races. Riehle's comment:
The research industry, under the
leadership of CMOR [the Council for Marketing and
Opinion Research], has done a good job in helping legislators and
regulators distinguish between a telemarketing program contacting hundreds of
thousands of households with sales or advertising mass-marketing messages,
which 'do not call' lists regulate, and live telephone interviews with a few
hundred or a thousand households who complete a survey research project. I
would hope that regulators or legislators intending to limit any negative impact
they might find caused by hundreds of thousands of political telemarketing
recorded calls will not unintentionally limit the ability to complete a few
hundred survey research calls using recorded-voice interviews.
As they say in radio land, our comment line is open. What do
you think?
Both Mickey
Kaus and Chris
Bowers at MyDD noticed that Rasmussen Reports has been showing a much
closer race on their automated national
tracking of the 2008 Democratic presidential primary contest. Both floated
different theories for that difference that imply that Rasmussen's numbers
are a more accurate read. This post takes a closer look at those arguments,
although the bottom line is that hard answers are elusive.
The chart below shows how the recent Rasmussen surveys
compare to the trend for all other conventional polls as tracked by Professor
Franklin here at Pollster. The bolder line represents the average trend across
all conventional surveys, while the shorter narrow lines connect the recent
Rasmussen surveys. Click the image to enlarge it, and you will see that all but
one of the Rasmussen surveys shows Barack Obama running better than the overall
trend. The Rasmussen results for Clinton
show far more variability, especially during the first four weeks of
Rasmussen's tracking program. They show Clinton
running worse than other polls over the last three weeks. Note that a new survey
released overnight by
Gallup (that shows Clinton's lead "tightening") has not altered
the overall trend.
Of course the graphic above includes survey questions that continue
to include Al Gore on the list of candidates. In order to reduce the random
variability and make the numbers as comparable as possible, I created the following
table. It shows Clinton leading by an average of roughly 15 points (38.6%
to 23.8%) on the three most recent conventional telephone surveys, but by just
5 points (33.0% to 28.3%) on the three most recent Rasmussen automated surveys
(surveys that use a recorded voice and ask respondents to answer by pressing
buttons on their touch tone phones). Given the number of interviews involved,
we can assume that these differences are not about random sampling error. Something
is systematically different about the Rasmussen surveys that has been showing a
tighter Democratic race over the last three weeks.
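To make the "not about random sampling error" claim concrete, here is a minimal sketch that compares the two average margins against their combined sampling error. The pooled sample size of 1,500 interviews per group is a hypothetical stand-in (the actual counts are not reported here), and the calculation assumes simple random samples with no weighting effects.

```python
import math

def margin_variance(p_lead, p_trail, n):
    """Sampling variance of (p_lead - p_trail) when both shares come from one sample."""
    margin = p_lead - p_trail
    return (p_lead + p_trail - margin ** 2) / n

# HYPOTHETICAL pooled interviews per group of three surveys.
HYPOTHETICAL_N = 1500

# Averages from the table: Clinton vs. Obama.
conventional_margin = 0.386 - 0.238   # conventional telephone surveys
rasmussen_margin = 0.330 - 0.283      # Rasmussen automated surveys

# Standard error of the gap between two independent margins.
se_gap = math.sqrt(
    margin_variance(0.386, 0.238, HYPOTHETICAL_N)
    + margin_variance(0.330, 0.283, HYPOTHETICAL_N)
)

gap = conventional_margin - rasmussen_margin
z = gap / se_gap
```

With those assumed sample sizes, the gap of roughly ten points is more than three standard errors wide, which is why chance alone is an unlikely explanation. Much smaller samples would weaken that conclusion.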
But what is that difference? That's a tougher question to answer.
Here are some theories, including those suggested by Bowers and Kaus:
1) The automated methodology yields
more honest answers about vote choice (and thus, a more
accurate result). The theory is that some people will hesitate to reveal
certain opinions to another human being, particularly those that might create
some "social discomfort" for the respondent. Thus, Kaus provides his "Don't Tell Mama"
theory: "men don't like Hillary but are reluctant to say so in public" or to
"tell a human interviewer -- especially, maybe, a female interviewer."
2) The people sampled by
Rasmussen's surveys are more representative of likely Democratic primary
voters because it uses a tighter screen. Chris Bowers makes
that point by arguing that the Rasmussen screen looks slightly tighter than
those used by other pollsters - "38-39% of the likely voter population" rather
than the "40-50% of all registered voters [sampled by] the vast majority of
national Democratic primary polls."
3) The people sampled by automated
surveys are more representative of likely primary voters because
they give more honest answers about whether they will vote. We
know from at least 40 years of validation studies that many respondents will
say they voted when they did not, due to the same sort of "social discomfort"
mentioned above. Voting is something we are supposed to do, and a small portion
of adults is reluctant to admit to non-voting to a stranger on the telephone. In
theory, an automated survey would reduce such false reports.
4) The people sampled by automated
surveys are less representative of likely primary voters because
they capture exceptionally well informed respondents. This theory is one
I hear often from conventional pollsters. They argue that only the most
politically interested are willing to stay on the phone with a computer, and so
automated surveys tend to sample individuals who are much more opinionated and
better informed than the full pool of genuinely likely voters.
Let's take a closer look at the arguments from Kaus and
Bowers.
Kaus makes much of the fact that the Rasmussen poll shows a
big gender gap, with Clinton
showing a "solid lead" (according to Rasmussen) among women, but trailing 11
points behind Obama among men. He wonders if other polls show the same gender
gap. While precise comparisons are impossible, all the other polls I found that
reported demographics results also show Clinton doing significantly better
among women than men (Cook/RT
Strategies, CBS News,
Time and the Pew Research
Center). Rasmussen certainly shows Obama doing better among men than the
other surveys, but then, Rasmussen shows Obama doing better generally than the
other surveys.
Of course (if it turns out the
gender gap in the two polls is roughly comparable) it could be that many men and
many women don't like Hillary but are reluctant to say so in public.
His backup may be plausible, especially when interviews are
conducted by women, although we obviously have no hard evidence either way.
Bowers' theory feels like a better fit to me, especially if
we also consider the possibility that the absence of an interviewer may
reduce the "measurement error" in the selection of likely voters. The bottom
line, however, is that we really have no way to know for sure. It is certainly possible,
of course, that Rasmussen's sampling is less accurate. All of these
theories are plausible, and without some objective reality to use as a
benchmark, we can only speculate about which set of polls is the most valid.
What strikes me most, as I go through this exercise,
is how little we know about some important methodological details. What are
the response rates? Are Rasmussen's higher or lower than conventional polls? How many respondents answered the primary vote questions on recent surveys conducted by ABC News/Washington Post, NBC/Wall Street Journal and Fox News and the most recent CNN survey? Many
pollsters provide results for subgroups of primary voters, yet virtually none
tell us about the number of interviews behind such findings. We also know
nothing of the demographic composition of their primary voter subgroups, including
gender, age or the percentage that initially identify as independent.
And how exactly do those pollsters that currently report on "likely
voters" select primary voters? How tight are their screens? Very little of this information
is in the public domain (and given that these numbers involve primary results,
my likely
voter guide from 2004 is of little help).
I emailed Scott Rasmussen to ask about their likely voter
procedure for primary voters. His response:
We start
with the tightest segment from our pool of Likely Voters... Dems are asked about
how likely they are to vote in Primary... Unaffiliateds are asked if they had the
chance, would they vote in a primary... if so, which one...
I am not completely sure what the "tightest segment" is, but
my guess is that they take those who say they will definitely or certainly vote
in the Democratic primary. He also confirmed that the 774 likely Democratic
primary voters came from a pool of 2,000 likely voters. So last night I asked what portion of adults qualified as likely voters so we might do an apples-to-apples comparison of the relative "tightness"
of survey screens. As of this writing, I have not received an answer. UPDATE: Via email, Scott Rasmussen tells me that while he did not have numbers for that specific survey readily available, the percentage of adults that qualify as likely general election voters is typically "65% to 70%...for that series." He promised to check and report back if the numbers for this latest survey are any different.
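With the figures from the update, the apples-to-apples comparison can be roughed out. This is a minimal sketch that assumes the "typical" 65% to 70% likely voter share applies to this particular survey, which Rasmussen had not yet confirmed. The function name is my own invention for illustration.

```python
def share_of_adults(primary_n, likely_voter_n, likely_voter_share_of_adults):
    """Fraction of all adults who pass both the likely voter and primary screens."""
    return primary_n / likely_voter_n * likely_voter_share_of_adults

# 774 likely Democratic primary voters out of 2,000 likely voters.
primary_share_of_likely_voters = 774 / 2000

# ASSUMPTION: the typical 65%-70% range holds for this survey.
low = share_of_adults(774, 2000, 0.65)
high = share_of_adults(774, 2000, 0.70)
```

On those assumptions, the Rasmussen Democratic primary subgroup is about 39% of likely voters, in line with the figure Bowers cites, and works out to roughly 25% to 27% of all adults.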
But with respect to all pollsters again, and not just Mr. Rasmussen, why
is so little of this sort of information in the public domain? Most media pollsters
pledge to abide by professional codes of conduct that
require disclosure of basic
methodological details on request. Maybe it's time we start asking for that
information for every survey, and not just those that produce quirky results.
Picking up on the post from earlier tonight, the new Majority Watch surveys released today provide another strong indicator of recent trends, in this case regarding the race for the U.S. House. The partnership of RT Strategies and Constituent Dynamics released 41 new automated surveys conducted in the most competitive House districts.
Since they conducted identical surveys roughly two weeks ago in 30 of the 41 districts, we have an opportunity for an apples-to-apples comparison involving roughly 30,000 interviews in each wave. The table below shows the results from both waves from each of those 30 districts. The bottom line average indicates that overall, the Democratic margin in these districts increased slightly, from +1.9 to +2.7 percentage points, during October.
Whatever one may think of their automated methodology, the Majority Watch surveys used the same methodology and sampling procedures for both waves. And as with the similar "mashup" of polls in the most competitive Senate races in the previous post, these also show no signs of an abating wave.
Interests disclosed: Constituent Dynamics provided Pollster.com with technical assistance in the creation of our national maps and summary tables.
Andrew Kohut is the President of the Pew Research Center and arguably the
dean of the survey research profession. President of the Gallup
Organization from 1979 to 1989, Kohut recently received the American
Association for Public Opinion Research's highest honor, their
2005 Award for Exceptionally Distinguished Achievement. He spoke with
Pollster.com's Mark Blumenthal last week about how the Pew Research
Center will measure voting intentions for the upcoming elections and
about the future of survey research.
Topic A - for just about everybody right now - is handicapping the races for control of the House and Senate. I'm sure our readers would be interested in your take. But I think perhaps of even greater interest would be what kinds of surveys and measures you are looking at and will be looking at over the coming weeks?
Well, we're going to do what we traditionally do in off-years and that is measure voting intentions for the House. Generally in off-years the pre-election polls do a pretty good job of estimating the popular vote for the House and we know that has a correspondence to the number of seats that each party has. In 1994 we were very fortunate that The Times Mirror Center, the center that preceded Pew, was among the first to say, "We've got a Republican plurality in the popular vote." We didn't have quite enough of a margin in the poll, even though the poll provided a very accurate estimate of the popular vote to flatly predict that Republicans would take over, but we described it as a high likelihood. We could have the same thing happen in this election. What I'm struggling with is that safe-seat redistricting has made the relationship between the popular vote and seats won by each party less than what it once was. And so we're going to have to try to make our estimates, taking into account the traditional relationship between seats and votes and how that relationship may have changed since the '90s Census was used to redistrict.
Will you be looking at any of the statewide surveys or congressional level surveys that are out in the public?
Well, I look at them just for the sake of trying to understand what else is going on out there, but what I learned from Paul Perry at the Gallup Organization was to not use ad-hoc judgments, but to focus on the survey measures that we use to estimate the size of the vote of the party or a candidate. So in the meantime we're concentrating on whether our turnout scale is working well, how the undecideds are likely to break, what the last minute trends are if any, and how stable are people's choices. Those are the things that are really most important to me. I'm not a handicapper, I'm a measurer. There's a difference.
Actually that's a perfect segue to another question I wanted to ask. Just before the 2004 election, as you well know, your final survey gave George Bush a three-point lead in the popular vote. And you did a projection in which you allocated the remaining six percent that were undecided about evenly and predicted a 51 to 48 Bush win, which turned out to be right on the nose exactly the way the popular vote broke. You wrote in that final report, "Pew's final survey suggests the remaining undecided vote may break only slightly in Kerry's favor." And I think you did a three-to-three allocation or something close to that. And I just wondered what you can tell us about the process you used to reach that conclusion then and what does it say about what you will do in the coming weeks?
Well, we do a couple of things. First, we throw out half of the undecideds because validation surveys show that they vote at very low rates. Then, we look at a regression equation that predicts choices based upon all of the other questions we have in the survey among the decideds and apply that model to the undecideds. We also then look at the way the leaners - that is the people who don't give us a choice initially - are breaking and make the assumption that the leaners are closer to the undecideds than to the people who give us an answer right off the top of their heads when we ask them it first.
I want your readers to know that we ask several questions, the first one is the flat out question where we ask where you lean, and we look at how the leaners break. We take those two estimates in mind and divide the undecideds. They are based upon measures. They're not based upon "you know I think," "I got this feeling," "history tells us," or any of this other stuff where you can let judgments get in your way.
What I learned from Paul Perry - and I keep going back to him because he taught me everything I know about this - is that what you should be prepared to do is to have a way of measuring all of the things that you're interested in covering and be able to look at those measurements in the current election relative to your experience in previous elections. And we try to do that. The one time I didn't do that was in 2002, because I was pre-occupied with other things. On an ad hoc basis, I kicked out one of my traditional questions out of the turn-out scale and it really hurt our projection. It made it too Democratic. I won't do that again. I chalk that mistake up to being pre-occupied with the first Global Survey that we were doing at the same time. In any event having said that, that's my philosophy and that's the way we will pursue it here at the Pew Research Center.
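As a minimal sketch of only the last step discussed above, the arithmetic of dividing the undecided vote can be written out as follows. The 48% and 45% starting points are back-derived from the figures mentioned in the question (a three-point lead, six percent undecided, a 51 to 48 projection) on the assumption of an exact three-to-three split. The function name is hypothetical, and nothing here reproduces Pew's regression model or turnout scale.

```python
def allocate_undecided(decided, undecided, split):
    """Add each candidate's assumed share of the undecided vote to decided support."""
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    return {name: decided[name] + undecided * split[name] for name in decided}

# ASSUMPTION: pre-allocation support implied by a 51-48 result and an even split of 6 points.
decided = {"Bush": 48.0, "Kerry": 45.0}

# Even division of the six undecided points, as described in the question.
projection = allocate_undecided(decided, 6.0, {"Bush": 0.5, "Kerry": 0.5})
```

Changing the split fractions shows how sensitive a close projection is to the allocation rule, which is why Kohut stresses basing the split on measures rather than hunches.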
I'd like to take a more forward look at what trends you've seen developing in survey research. If you could try to imagine a world in ten or twenty years, how differently do you think the very best political surveys will be conducted?
I really don't know the answer to that. Hopefully somehow we're going to solve the problem of a sampling frame for online surveys, because I'm a firm believer that unless you have a sampling frame in which you can draw samples of people online, it's hard to do these post-facto weightings of people who opt-in to samples and make that work. I haven't seen it yet to my satisfaction. Obviously means of communication are so much more sophisticated and varied - the old land-line telephone will probably be a relic - so I don't have a good answer for you. I'm confident this is a practice that is pretty nimble and full of people who are survivors and will figure a way to cope with it. What that way is, I'm not sure.
I guess that takes me to one last topic. We've logged in over 1000 statewide polls in our database at Pollster.com, and more than half of the statewide surveys have been either automated recorded voice telephone (IVR) or Internet panel. And of the 200 or so polls that have been released on the House, about half of those have been automated. You spoke about the Internet panel problem and I wonder what sort of reaction you have to the explosion of automated recorded IVR surveys.
Well, I know they did reasonably well in one election. I would have to see them perform over a longer period of time. I'd like to see where they succeed and where they don't succeed. They always remind me a little bit of a New Yorker cartoon of two hounds sitting in front of a computer screen and one turns to the other and says, "On the internet they don't know we're dogs." One of the things that really bothers me about this is that we just don't know who we're talking to. And that goes to the very premise of the practice of sampling: you should know who you're talking to. In any event I will take a wait-and-see - I want to see more evidence before I come to some conclusion about it, other than my true discomfort with completion rates that low and not knowing firmly or clearly who you're dealing with.
With the addition of House race data to Pollster.com, it is a good time to talk about the difficulty of measuring the status of the race to control Congress at the district level. Political polling is always subject to a lot of variation and error (and not all of it the random kind), but Congressional district polls have their own unique challenges.
First, we are tracking something different in terms of voter attitudes and preferences than in other races, particularly contests for President. Two years ago, voters received information about George Bush and John Kerry from nearly every media source for most of the year. Huge numbers of voters tuned in to watch live coverage of nationally televised candidate debates. In races for the Senate and House, news coverage is far less prevalent and voters pay considerably less attention until the very end of the campaign. Even then, voters still get much of their information about House candidates from paid television and direct mail advertising.
Of course, in the top 25 or 30 House races, the candidates (and political parties) have already been airing television advertising. However, if you expand the list to the next 30-40 races that could be in play, the flow of information to voters drops off considerably. Middle-tier campaigns in districts in expensive media markets (like New York or Chicago) will depend on direct mail rather than television to reach voters.
So generally speaking, voter preferences in down ballot races are more tentative and uncertain. The (Democratic affiliated) Democracy Corps survey of Republican swing districts released last week reported 26% of likely voters saying there is at least a "small chance" they may still change their minds about their choice for Congress. When they asked the same question about the presidential race in mid-October 2004, only 14% said they saw a "small chance" or better of changing their mind about voting for Kerry or Bush.
This greater uncertainty means that minor differences in methodology can have a big impact on the results. Specifically, pollsters may vary widely in terms of the size of the undecided they report depending on how hard they push uncertain voters.
Second, the mechanics of House race polling can be very different from statewide methodology. The biggest challenge involves how to limit the sample to voters within a particular House district. In statewide races the selection is easy. Since area code boundaries do not cross state lines, it is easy to sample within individual states. So most of the statewide polls we have been tracking use a random digit dial (RDD) methodology that can theoretically reach every voter with a working land line telephone.
No such luck with Congressional districts, whose gerrymandered borders frequently divide counties, cities, even small towns and suburbs. Since very few voters know their district numbers, pollsters use a variety of strategies to sample House districts. Most of the partisan pollsters, as well as the Majority Watch tracking project, use samples drawn from lists of registered voters (sometimes referred to as "registration based sampling" or RBS). These lists make it easy to select voters within a given district, but the lists frequently omit telephone numbers for large numbers of voters (typically 20% to 40%**). Remember the real fear that RDD surveys are missing cell-phone-only households? Right now the missing cell phone households represent roughly 6-8% of all voters. Lists, obviously, miss many more. If the uncovered households differ systematically from those with working numbers on the lists, a bias will result.
Again, most partisan pollsters (including my firm) are comfortable sampling from lists, because the benefits of sampling actual voters within each District appear to outweigh the risks of coverage bias (see the research posted by list vendor Voter Contact Services for a sampling of arguments in favor of RBS). Media pollsters are generally more wary. SurveyUSA, for example, conducted a parallel test of RDD and RBS in a 2005 experiment that found a large and consistent bias in RBS sampling favoring one candidate. "SurveyUSA rejects RBS as a substitute for RDD," their report read, "because of the potential for an unpredictable coverage bias." So in House polls they often use RDD and screen for voters in the given district based on voters' ability to select their incumbent member of Congress from a list of all members of Congress from their area.
These various challenges have made many media outlets and public pollsters wary of surveys in House races. As of two weeks ago, we had logged more than 1,000 statewide polls for Senate or Governor into our Pollster.com database for 2006. As of yesterday, we had tracked only 173 polls conducted in the most competitive House races, but as the table below shows, only 47 of those came from independent media pollsters using conventional telephone methods.
Nearly half of all the House race polls come from two automated pollsters: SurveyUSA (23) and especially the Majority Watch project of RT-Strategies and Constituent Dynamics (56). Also, more than a quarter of the total (52) are partisan surveys conducted by the campaigns, the party committees or their allies, with far more coming from Democrats (44) than Republicans (8).
The sample sizes for House race surveys are also typically smaller. While national surveys typically involve 800 to 1000 likely voters, and statewide surveys 500 to 600, many of the House polls involve only 400 to 500 interviews (although the Majority Watch surveys have been interviewing at least 1000 voters in each district).
Finally, very few districts have been surveyed by public pollsters more than a few times since Labor Day. Only two of the 25 seats now held by Republicans rated as "toss-ups" by the Cook Political Report have been polled 5 or more times. Most of these critical seats have been polled 2 to 4 times. Put this all together, and the results are likely to be more varied and more subject to all sorts of error than other kinds of political polls. After the 2004 election, SurveyUSA put together a collection of results for every pre-election public opinion poll released in the U.S. from October 1 to November 2, 2004. Their spreadsheets included 64 House race surveys, and their calculations of the error of each survey indicate that those few House races had more than double the error on the margin (5.82) than the polls conducted in the presidential race (3.43).
All of which goes to say that while we too will be watching the House polls more closely over the next three weeks, for all the tables and numbers, we know far less about these races than meets the eye. More on what we do know tomorrow.
**Correction: Colleagues have emailed to point out that quoted match rates for list samples have improved in recent years and now typically range from 60% to 80%. I won't quarrel, although I have had past experiences where the quoted rate exaggerated the actual match once non-working numbers were purged from the sample.
We have devoted much attention recently to the flood of new national surveys showing small declines in the Bush job approval rating and modest Democratic gains on the generic House ballot question since mid-September. Until today, I had not looked closely at levels of party identification reported on those surveys. It turns out those have also trended Democratic recently, a finding that may explain some of the apparent "house effect" differences among statewide pollsters over the last few days.
The debate over weighting surveys by party identification has been a focus of this blog since its inception. My posts on the subject from 2004 and beyond are worth reviewing, but the gist is this: Pollsters typically ask respondents some variant of a question asking whether they consider themselves "a Democrat, a Republican or an independent?" The so-called "Party ID" question has been asked, examined and studied for more than 50 years, and an ongoing debate exists about whether to weight (or statistically adjust) survey results by party.
The crux of the debate is whether party identification is more like a fixed demographic characteristic (such as gender or race) or more like an attitude that can change with the prevailing political winds. For most adults, party identification does appear to be highly stable, changing rarely if ever. The problem is that some small portion of voters (perhaps 10% or 15%) appear willing to jump back and forth -- usually between one of the parties and the independent category -- depending on the wording of the question, its position in the survey, how hard the interviewer pushes for an answer, or, in some cases, what has been happening in the news.
Those who argue for weighting by party say that the real trends tend to be slow and gradual and that party weights can adjust dynamically over time to accommodate these slow moving trends (see also the party weighting page maintained by Prof. Alan Reifman). Those who argue against party weighting (a class that includes most of the national media pollsters) worry that such an approach will suppress real but short-term changes that sometimes occur in reaction to news events (such as the period just after the 9/11 attacks or the period just after the 2004 Republican convention).
A look at the party identification data from the recent surveys suggests we may be in the midst of another such short-term change. The table that follows shows party identification results for six national surveys conducted before and after the resignation of Congressman Mark Foley. Five of six show some Democratic gain in party identification:
This change may also explain the wide divergence in results reported by the two automated pollsters in two nearly simultaneous surveys conducted this week in Missouri and Ohio. In both states, SurveyUSA showed the Democratic candidates with significantly greater leads (+14 in Ohio and +8 in Missouri) than Rasmussen (+6 in Ohio and -1 in Missouri). While both pollsters use the automated "interactive voice response" (IVR) methodology, Rasmussen weights by party and SurveyUSA does not. Moreover, the most recent SurveyUSA samples have grown more Democratic since August.
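To make the mechanics concrete, here is a minimal sketch of how weighting by party can shrink a lead when the raw sample tilts toward one party. Every number below is hypothetical: the sample counts, the vote shares within each party group and the party targets are invented for illustration and are not taken from the SurveyUSA or Rasmussen polls, nor does the sketch describe either firm's actual weighting procedure.

```python
# Hypothetical poll: respondent counts by party ID, plus the share of each
# group backing the Democratic and Republican candidates. Illustration only.
sample = {
    "Democrat":    {"n": 400, "dem": 0.88, "rep": 0.08},
    "Republican":  {"n": 300, "dem": 0.07, "rep": 0.89},
    "Independent": {"n": 300, "dem": 0.48, "rep": 0.38},
}

# Hypothetical party targets a pollster might weight to (must sum to 1).
targets = {"Democrat": 0.36, "Republican": 0.34, "Independent": 0.30}


def margin(sample, shares):
    """Democratic minus Republican share, given each party group's share of the electorate."""
    dem = sum(shares[g] * sample[g]["dem"] for g in sample)
    rep = sum(shares[g] * sample[g]["rep"] for g in sample)
    return dem - rep


# Unweighted: each group counts in proportion to how many people were interviewed.
total = sum(group["n"] for group in sample.values())
unweighted = {g: sample[g]["n"] / total for g in sample}

print(f"Unweighted margin: {100 * margin(sample, unweighted):+.1f}")
print(f"Party-weighted margin: {100 * margin(sample, targets):+.1f}")
```

In this made-up example the Democratic lead falls from roughly 10 points to roughly 4 once the sample is weighted back to the party targets, even though no respondent changed an answer. Whether that adjustment corrects a skewed sample or suppresses a real shift is exactly the debate described above.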
Does this shift in party identification represent a real shift in attitudes among the population of adults or registered voters, or does it reflect some short-term enthusiasm among Democrats to be interviewed? Is the change a momentary spike or will it persist until Election Day? These are the questions that professional pollsters are mulling over right now, and the answers are not obvious. We will just have to wait and see (no pun intended).
Our Slate Election Scorecard update tonight focuses on two new polls in New Jersey that confirm recent gains by Bob Menendez and move the race to lean Democrat status. The overall scorecard tally now indicates 49 seats leaning or held by Democrats, 49 seats held or leaning Republican. Is this change indicative of a larger Democratic surge?
Two new polls out this evening from SurveyUSA in Ohio and Missouri both show the Democratic candidates in each state leading by much wider margins than on other recent polls. These results and the sometimes improbably wide Democratic margins on the generic House ballot in some recent national surveys leave some wondering whether, as reader Gary Kilbride put it in a comment a few hours ago, "the current poll numbers skew misleadingly toward Democrats due to the Foley scandal." He wonders if the same might be happening to the Majority Watch congressional district results released today.
I will have more to say about all of this tomorrow, but for tonight one quick note about those new Majority Watch congressional surveys. Although they released results from 32 districts today, only nine involved follow-up surveys in districts polled previously using comparable ballot tests. The table below shows the August and October results for those nine districts.
All of these Districts are currently represented by Republicans and all were rated as toss-ups by the Cook Political Report when the polls went into the field (they moved CO-07 to lean Democratic status just yesterday). While Tom Riehle's analysis made much of the apparent Republican improvement in Washington-08, Virginia-02 and Indiana-02, the overall pattern looks more random. Those Republican advances were largely offset by Democratic gains in North Carolina-11 and New Mexico-01. Overall, the average Democratic margin declined by just a single percentage point.
The bigger story may be that the average Republican percentage across these nine districts has not budged from 46% since August or that none of the Republicans in the nine districts holds a statistically significant lead. More on the meaning of these House polls tomorrow.
This "Guest Pollster Corner" contribution comes from Thomas Riehle, a Partner of RT Strategies.
Editor's note: In a 2:30 p.m. press conference, Riehle announced that when they sum up the results of the 63 surveys they have conducted since August and consider races where a candidate holds a lead beyond the margin of error, Democrats currently lead or safely hold 217 seats, while Republicans lead in or hold 198 seats. Democrats will need to win 218 seats to gain majority control. Full data are now available at www.majoritywatch.com, including a summary of all top-line results to date. The Pollster.com House Race page is updated to include all of the new Majority Watch data.
Majority Watch, a project of RT Strategies and Constituent Dynamics, sponsored by Waggener Edstrom Worldwide, is the most comprehensive project ever undertaken to identify and conduct polls in most of the highly contested House races across the country. In August and September, Majority Watch polled in 30 House districts. On October 1, we polled Mark Foley's Florida 16th district with two simultaneous polls, one in which respondents were informed that a vote for Foley would count for the Republican candidate to be named later, and one in which respondents did not get that information.
Today, Majority Watch begins to release results from Round II. In 32 races polled in the current cycle:
Republican incumbents who seemed to be in trouble in late August have held on or even improved their positions. In Washington's 8th C.D., Republican Rep. Dave Reichert has moved from 3 points behind to 3 points ahead of Democrat Darcy Burner, 48%-45%. In Virginia's 2nd C.D., Republican Rep. Thelma Drake has moved from 8 points behind to a marginal 2-point lead over Democrat Phil Kellam, 48%-46%. In Indiana's 2nd C.D., Republican Rep. Chris Chocola has moved from 12 points behind to only 4 points behind Democrat Joe Donnelly, 46% for Chocola to 50% for Donnelly. In Colorado's 7th C.D., Majority Watch polling shows Republican Rick O'Donnell is tied with Democrat Ed Perlmutter in the race to fill the open seat, essentially unchanged since August. All of these were among the first races Democrats targeted, and that early warning may have given Republicans the heads-up they needed to remain competitive and avoid getting swept away.
Most Republican leaders have survived the worst of Foley's Folly, but in localized areas where there was a local media hook for the story (Florida, New York, possibly Arizona), damage may have been severe for many Republicans, at least at this time -- there's still time to recover.
On the positive side for Republicans, neither Speaker Denny Hastert (ahead by 10 points, 52%-42%) nor House Page Board chairman Rep. John Shimkus (ahead by 17 points, 53%-36%) seems to have suffered. The highest profile Republican House incumbent closest to Washington, D.C., Rep. Frank Wolf in Virginia's 10th C.D., remains ahead of Democrat Judy Feder, 47%-42%.
On the other hand, in Ohio, where the Republicans were already beset by the culture of corruption charge, Republican Conference Chairperson Deborah Pryce is behind by double digits, in Ohio's 18th District Republican Joy Padgett trails Democrat Zack Space by 9 points, and even in Ohio's 2nd C.D., Rep. Jean Schmidt is marginally behind Democrat Victoria Wulsin by 3 points, 45%-48%.
In New York, NRCC Chairperson Tom Reynolds has stumbled badly, trailing Democrat Jack Davis by 16 points, 56% for Davis to 40% for Reynolds. In the open seat in New York's 24th C.D., Democrat Michael Arcuri has opened a significant lead, 53%-42% over Republican Raymond Meier. Even Republican Rep. Peter King, never shy about pointing out when the leadership is wrong and vocal in his anger at how House leaders have handled the Foley case, seems to have suffered -- he is only marginally ahead, 48%-46% over Democrat Dave Mejias.
In a surprise, Arizona Republican Rep. Rick Renzi is marginally behind Democrat Ellen Simon, 50%-46%.
The Philadelphia suburbs remain troublesome for Republicans, with Republican Reps. Jim Gerlach and Curt Weldon trailing their Democratic challengers.
Majority Watch takes advantage of new technologies, married to the oldest standards of sampling and vote modeling, to extend the practice of public opinion polling down to the level of House races. Calls are made by IVR recordings ("robo-calling"). The sample is drawn from voter lists of active voters, with Majority Watch controlling in-home selection in those households where more than one voter resides. The calls are kept extremely short in order to keep response rates as high as those for many publicly-released telephone interviewer polls (about 20% response rate using the standard AAPOR definition). And consumers are increasingly comfortable pushing buttons to respond to recorded voices -- can any reader say he or she is unfamiliar with the notion of "press 1" for one thing or "press 9" for another? These "robo-calls" perform not much differently than traditional telephone interviewer calls for very short, "horse-race" polls.
Majority Watch is currently polling in ten more House districts for release next week (GA-08, IL-08, IL-10, NH-01, NH-02, NY-19, NY-20, NY-25, NY-29, and OH-01), at which time we will have solid polls, with about 1,000 voters, in each of 55 House races. Depending on developing political circumstances, we may further expand the list and conduct more polls after next week.
Based on what you know right now, do you think Speaker of the House Dennis Hastert should remain in his position as Speaker of the House? Do you think he should resign as Speaker of the House but remain a member of Congress? Or do you think he should completely resign from Congress?
27% Remain Speaker
20% Resign leadership
43% Resign from Congress
10% Not sure
MP readers may want to note that the results above, from a one-night sampling of 1,000 adults conducted Thursday night, were actually the second of a two-night tracking poll. From the SurveyUSA release:
Though Thursday night's polling data is not good news for Hastert, the data is an improvement from SurveyUSA interviews conducted 24 hours prior, on Wednesday night. Then, 49% of Americans said Hastert should resign from Congress, 17% said he should remain as Speaker, and 23% said he should resign his Leadership post but remain a member of Congress. Though the day-to-day movement is small, and some of it is within the survey's 3.2% margin of sampling error, the movement is consistent across the board and therefore worthy of comment.
There are inherent limitations to surveys with short field periods; however, when a news story is changing hour-by-hour, nightly tracking studies can provide a valuable "freeze-frame" snapshot of what Americans were thinking at a moment in time.
As part of their "interactive crosstabs" for this poll, SurveyUSA provides a time series chart that allows users to plot trends for each of the key subgroups (via the pull-down menu that appears in the upper left corner of the data table).
Now I have no idea whether SurveyUSA intends to continue tracking this question going forward. They are obviously a lot busier now than when they tracked the response to Hurricane Katrina for 24 days in September 2005. But if these results intrigue you, it's probably worth checking the SurveyUSA Breaking News page for further updates.
Update: Rasmussen Reports, the other big automated pollster, also conducted a survey on whether Hastert should resign. Their results offer a lesson on the challenges of writing this sort of question:
Should Dennis Hastert Resign from His Position as House Speaker?
36% Yes
27% No
37% Not sure
Do you have a favorable or unfavorable opinion of Dennis Hastert?
10% Very favorable
14% Somewhat favorable
19% Somewhat unfavorable
16% Very unfavorable
41% Not sure
The number who say Hastert should resign as speaker is much higher on the Rasmussen survey (36%) than on the SurveyUSA poll (20%), but SurveyUSA reports a much higher number (63%) who say Hastert should resign either as speaker or from Congress. Offering three choices rather than two appears to make a big difference. And the fact that 41% on the Rasmussen survey say they do not know Hastert well enough to rate him helps explain why the question format and language make such a big difference.
The timing also differed: The Rasmussen survey was conducted over the last two nights (Thursday and Friday) while SurveyUSA tracked on Wednesday and Thursday nights.
Our Slate Senate Scorecard update for tonight focuses on a new Rasmussen poll in Connecticut that shows Joe Lieberman leading Democratic nominee Ned Lamont by ten points (50% to 40%).
Tracking the Connecticut Senate race is especially challenging because the most active pollsters in the state have shown consistent differences in their results -- at least until today. See the chart below (courtesy Charles Franklin), which shows Lieberman's margin over Lamont (Lieberman's percentage minus Lamont's percentage):
Both the Rasmussen automated surveys and the conventional, live interviewer phone polls conducted by Quinnipiac University showed Lieberman's margins narrowing since July but holding fairly steady over the last month. However, until the survey released today, the Rasmussen surveys have consistently shown a closer margin than the Quinnipiac polls. This pattern is similar to the one we described yesterday in Tennessee, where Democrat Harold Ford is running stronger on the Rasmussen surveys than on conventional telephone interview polls conducted by Mason-Dixon.
In this case it is harder to use the survey mode (live interviewer vs. automation) to explain the differences because the house effects are inconsistent by mode. Another live interview pollster (American Research Group) has also shown a consistently closer race, while automated pollster SurveyUSA reported Lieberman ahead by 13 points in early September.
Today's result, however, brings the Rasmussen and Quinnipiac polls into agreement, at least for the moment. The last Quinnipiac poll released last week also showed Lieberman leading by 10 points. So is the latest turn in the Rasmussen trend line the sign of new Lieberman momentum, a convergence in the poll results or just an outlier result? Only time, and more surveys, will tell for sure.
Hmm. Just yesterday we had one with Ford up by 5; not long before that there was one with Corker up by 5. Is it just me, or is this more variation than we usually see? Are voter sentiments that volatile (or superficial)? Or is there something about this race that makes minor differences in polling methodology more important? Or is this normal?
At the moment at least, I agree with the answer he received later from Michael Barone that the poll numbers in Tennessee do not appear unusually volatile. Barone pointed out that the results of nearly all the Tennessee polls this year appear to fall within sampling error of the grand average. That point is worth expanding on, but it is also worth noting that the averages conceal some important differences among the various Tennessee surveys.
First, let's talk about random sampling error. If we assume that all of the polls in Tennessee used the same mode of interview (they did not), that they were based on random samples of potential voters (the Internet polls were not), that they had very high rates of response and coverage (none did), that they defined likely voters in exactly the same way (hardly), that they all asked the vote question in an identical way (close, but not quite) and that the preferences of voters have not changed over the course of the campaign (no again), then the results for the various polls should vary randomly like a bell curve.
Do the appropriate math, and if we assume that all had a sample size of roughly 500-650 voters (most did), then we would expect these hypothetically random samples to produce results that fall within +/- 4% of the "true" result 95% of the time. Five percent (or one in twenty) should fall outside that range by chance alone. That is the standard "margin of error" that most polls report (which captures only the variation due to random sampling). But remembering the bell curve, most of the polls should cluster near the center of the average. For example, 67% of those samples should fall within +/- 2% of the "true" value.
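For readers who want to check that arithmetic, here is a minimal sketch using the textbook formula for a simple random sample and a normal approximation. The function names are my own, and the calculation assumes a candidate near 50% and none of the real-world complications listed above (weighting, likely voter screens, non-response).

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the 95% confidence interval for a proportion p from n interviews."""
    return z * math.sqrt(p * (1 - p) / n)


def share_within(delta, n, p=0.5):
    """Chance that a sample proportion lands within +/- delta of the true value p."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return math.erf(delta / (standard_error * math.sqrt(2)))


# Sample sizes typical of the Tennessee polls (500 and 650) and of national polls (1000).
for n in (500, 650, 1000):
    moe = 100 * margin_of_error(n)
    within_two = 100 * share_within(0.02, n)
    print(f"n={n}: margin of error +/- {moe:.1f} points, "
          f"{within_two:.0f}% of samples within +/- 2 points")
```

For samples of 500 to 650 the margin of error comes out at roughly 4 points, and somewhere between 60% and 70% of samples should land within 2 points of the true value, which is where the rough two-thirds figure comes from. At 1,000 interviews both numbers tighten noticeably.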
Now, let's look at all of the polls reported in Tennessee in the last month, including the non-random sample Zogby Internet polls:
As it happens, the average of these seven polls works out to a dead-even 44% tie, which helps simplify the math. In this example, only 1 of the 14 results (7%) falls outside the range of 40% to 48% (that is, 44% +/- 4%). And only 4 of 14 (28%) fall outside the range of 42% to 46% (or 44% +/- 2%). So as Michael Barone noted, the variation is mostly what we would expect by random sampling error alone. Considering all the departures from random sampling implied above, that level of consistency is quite surprising.
These results may seem more varied than in previous years partly because the sample sizes are considerably smaller than the national samples of (typically) 800 to 1000 likely voters that we obsessed over during the 2004 presidential race.
The confluence of the averages over the last month (or even over the course of the entire campaign, as Barone noted) glosses over both important differences among the pollsters and some real trends that the Tennessee polls have revealed. Charles Franklin helped me prepare the following chart, which shows how the various polls tracked the Ford margin (that is, Ford's percentage minus Corker's percentage). The chart draws a line to connect the dots for each pollster that has conducted more than one survey. The light blue dots are for pollsters that have done just one Tennessee survey to date.
The chart shows a fairly consistent pattern in the trends reported by the various telephone polls, both those done using traditional methods (particularly Mason-Dixon) and the automated pollster (Rasmussen). Franklin plotted a "local trend" line (in grey) that estimates the combined trend picked up by the telephone polls (both traditional and automated). The line "fits" the points well: It indicates that Ford fell slightly behind over the summer, but surged from August to September (as he began airing television advertising).
As Barone noticed, the five automated surveys conducted since July (including one by SurveyUSA) have been slightly and consistently more favorable to Ford than the three conventional surveys (two by Mason-Dixon and one by Middle Tennessee State University). But the differences are not large.
The one partisan pollster - the Democratic firm Benenson Strategy Group - released two surveys that showed the same trend but were a few points more favorable to Democrat Ford than the public polls. This partisan house effect among pollsters of both parties for surveys released into the public domain is not uncommon.
But now consider the green line, the one representing the non-random sample surveys of Zogby Interactive. It tells a completely different story: The first three surveys were far more favorable to Democrat Ford during the summer than the other polls, and Zogby has shown Ford falling behind over the last two months while the other pollsters have shown Ford's margins rising sharply.
This picture has two big lessons. The first is that for all their "random error" and other deviations from random sampling, telephone polls continue to provide a decent and reasonably consistent measure of trends over the course of the campaign. The second is that in Tennessee, as in other states we have examined so far, the Zogby Internet surveys are just not like the others.
UPDATE: Mickey Kaus picks up on Barone's observation that the automated polls have been a bit more favorable to the Democrats in Tennessee and speculates about a potentially hidden Democratic vote:
Maybe a new and different kind of PC error is at work--call it Red State Solidarity Error. Voters in Tennessee don't want to admit in front of their conservative, patriotic fellow citizens that they've lost confidence in Bush and the GOPs in the middle of a war on terror and that they're going to vote for the black Democrat. They're embarrassed to tell it to a human pollster. But talking to a robot--or voting by secret ballot--is a different story. A machine isn't going to call them "weak."
Reynolds updates his original post with a link to Kaus and asks whether the same pattern exists elsewhere.
Another good question, although for now our answer is incomplete. We did a similar "pollster compare" graphic on the Virginia Senate race over the weekend. The pattern of automated surveys showing a slightly more favorable result for the Democrats was similar from July to early September, but the pattern has disappeared over the last few weeks as the surveys have converged. In Virginia, the most recent Mason-Dixon survey has been the most favorable to Democrat Jim Webb.
While we will definitely take a closer look at this question in other states in the coming days and weeks, it is worth remembering that most of the "conventional surveys" in Tennessee and Virginia were done by one firm (Mason-Dixon), while most of the automated surveys to date in Tennessee have been done by Rasmussen. As such, the differences may result from differences in methodology other than the mode of interview among these firms (such as how they sample and select likely voters or whether they weight by party as Rasmussen does).
If one story is more important than all others this year--to those of us who obsess over political polls--it is the proliferation of surveys using non-traditional methodologies, such as surveys conducted over the Internet and automated polls that use a recorded voice rather than a live interviewer. Today's release of the latest round of Zogby Internet polls will no doubt raise these questions yet again. Yet for all the questions being asked about their reliability, discussions using hard evidence are rare to non-existent. Over the next month, we are hoping to change that here on Pollster.com.
Just yesterday in his "Out There" column (subscription only), Roll Call's Louis Jacobson wrote a lengthy examination of the rapid rise of these new polling techniques and their impact on political campaigns. Without "taking sides" in the "heated debate" over their merits, Jacobson provides an impressive array of examples to document this thesis:
[I]t's hard to ignore the developing consensus among political professionals, especially outside the Beltway, that nontraditional polls have gone mainstream this year like never before. In recent months, newspapers and local broadcast outlets have been running poll results by these firms like crazy, typically without defining what makes their methodology different - something that sticks in the craw of traditionalists. And in some cases, these new-generation polls have begun to influence how campaigns are waged.
He's not kidding. Of the 1,031 poll results logged into the Pollster.com database so far in the 2006 cycle from statewide races for Senate and Governor, more than half (55%) have been done by automated pollsters Rasmussen Reports, SurveyUSA or over the Internet by Zogby International. And that does not count the surveys conducted once a month by SurveyUSA in all 50 states (450 so far this year alone). Nor does it count the automated surveys recently conducted in 30 congressional districts by Constituent Dynamics and RT Strategies.
Jacobson is also right to highlight the way these new polls "have made an especially big splash in smaller-population states and media markets, where traditional polls - which are more expensive - are considered uneconomical." He provides specific examples from states like Alaska, Kansas and Nevada. Here is another: Our latest update of the Slate Election Scorecard (which includes the automated polls but not those conducted over the Internet) focuses on the Washington Senate race, where the last 5 polls released as of yesterday's deadline had all been conducted by Rasmussen and SurveyUSA.
Yet the striking theme in coverage of this emerging trend is the way both technologies are lumped together and dismissed as unreliable and untrustworthy by establishment insiders in both politics and survey research.
Jacobson's piece quotes a "political journalist in Sacramento, Calif," who calls these new surveys "wholly unreliable" (though he does include quotes from a handful of campaign strategists who find the new polls "helpful, within limits").
Consider also the Capital Comment feature in this month's Washingtonian, which summarizes the wisdom of "some of the city's best political minds" (unnamed) on the reliability of these new polls. Singled out for scorn were the Zogby Internet polls - "no hard evidence that the method is valid enough to be interesting" - and the automated pollsters, particularly Rasmussen:
[Rasmussen's] demographic weighting procedure is curious, and we're still not sure how he prevents the young, the confused, or the elderly from taking a survey randomly designated for someone else. Most distressing to virtually every honest person in politics: His polls are covered by the media and touted by campaigns that know better
The Washingtonian feature was kinder to the other major automated pollster:
SurveyUSA's poll seems to be on the leading edge of autodial innovation. Its numbers generally comport with other surveys and, most important, with actual votes.
[The Washingtonian piece also had praise for the work of traditional pollsters Mason-Dixon and Selzer and Co, and complaints about the Quinnipiac College polls]
Or consider the New York Times' new "Polling Standards," noted earlier this month in a Public Editor column by Jack Rosenthal (and discussed by MP here), and now available online. The Times says both methodologies fall short of their standards. While I share their caution regarding opt-in Internet panels, their treatment of Interactive Voice Response -- the more formal name for automated telephone polls -- is amazingly brusque:
Interactive voice response (IVR) polls (also known as "robo-polls") employ an automated, recorded voice to call respondents who are asked to answer questions by punching telephone keys. Anyone who can answer the phone and hit the buttons can be counted in the survey - regardless of age. Results of this type of poll are not reliable.
Skepticism about IVR polling based on theoretical concerns is certainly widespread in the survey research establishment, but one can search long and hard for solid evidence that IVR polling, or even Internet polling, is unreliable and come up empty. Precious little exists, and the few reviews available (such as the work of my friend, Prof. Joel Bloom, or the 2004 Slate review by David Kenner and William Saletan) indicate that the numbers produced by the IVR pollsters comport with actual election results as well as or better than those from their traditional competitors.
The issues involving these new technologies are obviously critical to those who follow political polling and require far more discussion than is possible in one blog post. So over the next six weeks, we are making it our goal here at Pollster to focus on the following questions: How reliable are these new technologies? How have their results compared to election results in recent elections? How do the current results differ from the more traditional methodologies?
On Pollster, we are deliberately collecting and reporting polls of every methodology -- traditional, IVR and Internet -- for the express purpose of helping poll consumers make better sense of them. We certainly plan to devote a big chunk of our blog commentary to these new technologies between now and Election Day. And while the tools are not yet in place, we are also hoping to give readers the ability to do their own comparisons through our charts.
More to say on all the above soon, but in the meantime, readers may want to review my article published late last year in Public Opinion Quarterly (html or pdf), which looked at the theoretical issues raised by the new methods.
Interests disclosed: The primary sponsor of Pollster.com is the research firm Polimetrix, Inc., which conducts online panel surveys.
Our daily Slate Scorecard update posted earlier this evening focuses on the new poll from Mason-Dixon that shows a narrowing race in the Virginia Senate contest pitting incumbent Republican George Allen against Democratic challenger Jim Webb.
We also discuss why the Slate Scorecard does not include the online Zogby Interactive/Wall Street Journal polls and make the following observation:
The latest Zogby results for Virginia -- showing Webb ahead 50 percent to 43 percent -- help explain our caution. Zogby's Virginia samples have been consistently more favorable to Webb than other pollsters, suggesting a bias in Zogby's online methodology.
With the help of Charles Franklin, here is a chart showing the consistent difference in Virginia. It plots the Allen margin (that is, Allen's percentage of the vote minus Webb's percentage -- click on the graph for a full size image) for each of the four pollsters that have tracked the race. All four show the same sharp drop in Allen's lead since July, but the Zogby result (the green line) has been consistently more favorable to Webb than the three telephone pollsters that use random probability samples of all telephone households rather than samples of Internet volunteers.
The differences are not trivial. The latest polls from SurveyUSA, Mason-Dixon and Rasmussen have shown Allen with leads of 3, 4 and 5 percentage points respectively. Zogby's result is very different, showing Webb with a seven percentage point lead.
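To put a number on that gap, here is a minimal Python sketch that uses only the four margins quoted above. The function and variable names are mine, invented for illustration, and comparing Zogby to a simple average of the three telephone polls is an assumption on my part, not the method behind the chart.

```python
# Allen margin = Allen % minus Webb %, from the latest poll by each pollster
# (figures as quoted in the post; a negative number means Webb leads).
latest_allen_margin = {
    "SurveyUSA": 3,
    "Mason-Dixon": 4,
    "Rasmussen": 5,
    "Zogby Interactive": -7,
}

def gap_from_others(margins, pollster):
    """Return one pollster's margin minus the simple mean of all the others."""
    others = [m for name, m in margins.items() if name != pollster]
    return margins[pollster] - sum(others) / len(others)

zogby_gap = gap_from_others(latest_allen_margin, "Zogby Interactive")
print(zogby_gap)
```

On these four numbers, the latest Zogby result sits 11 points more favorable to Webb than the average of the three telephone polls.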
Incidentally, we do include all of the Zogby Interactive results and other Internet polls in the charts and tables here on Pollster.com. Our aim is to give readers the ability to compare results across pollsters. One minor wrinkle -- at least for now -- is that the 5-poll averages reported here may differ from those on Slate because the averages here include the Zogby numbers while those reported on Slate do not. We are hoping to address that conflict in a future update to our chart pages.
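For readers curious how much that wrinkle can matter, here is a rough Python sketch of a last-five average computed with and without Internet polls. The poll list is entirely made up to show the mechanics, the labels are placeholders rather than real pollsters, and the function is my own illustration, not the code behind either site.

```python
def last_n_average(polls, n=5, exclude_modes=()):
    """Average the margin over the n most recent polls.

    polls         -- list of (mode, margin) tuples, oldest first
    exclude_modes -- survey modes to drop before taking the last n
    """
    kept = [margin for mode, margin in polls if mode not in exclude_modes]
    recent = kept[-n:]
    return sum(recent) / len(recent)

# Hypothetical margins, oldest first; not real poll results.
example_polls = [
    ("phone", 10),
    ("phone", 16),
    ("phone", 5),
    ("internet", -1),
    ("phone", 5),
    ("phone", 3),
    ("internet", -7),
]

with_internet = last_n_average(example_polls)
without_internet = last_n_average(example_polls, exclude_modes=("internet",))
```

Note that dropping the Internet polls does two things at once: it removes their numbers, and it reaches further back in time to fill out the five, so the two averages can differ by more than the Internet polls alone would suggest.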
The big news yesterday for true political junkies was the release of separate polls conducted simultaneously in 27 of the most competitive districts nationwide (with surveys in three more districts ongoing) using an automated recorded voice rather than live interviewers. The surveys were conducted for a project dubbed "Majority Watch" by the team of RT-Strategies, a DC-based firm that polls for the Cook Political Report, and Constituent Dynamics, a company that specializes in the new automated methodology. While the slick Majority Watch website provides full crosstabs if you click far enough, many readers have asked: Are these surveys legitimate? Are they reliable? The best answer, to paraphrase the Magic Eight Ball, is "reply hazy, ask again later."
The formal name for the automated methodology is Interactive Voice Response (IVR). Two companies - SurveyUSA and Rasmussen Reports - have conducted IVR surveys for years. While those companies do many things differently, both typically sample using a random digit dial (RDD) methodology that has the potential to reach every working land-line phone in a particular state. Unlike traditional surveys, the IVR polls use a recorded voice, rather than a live interviewer, and respondents must answer by pressing the keys on their touch-tone telephone. With IVR, the pollster's ability to randomly select a member of each sampled household is also far more limited.
The Majority Watch surveys add a few new twists. My friend Tom Riehle of RT Strategies kindly provided some additional details not included on the Majority Watch methodology page:
1) Majority Watch drew its samples from lists of registered voters rather than through random digit dial sampling. The advantage of this approach is that it solves the problem of how to limit the survey to those living in the correct district (a big challenge with RDD sampling). It also excludes non-registrants and allows the use of individual level vote history to determine who is a "likely voter."
The downside to voter list sampling - sometimes called Registration Based Sampling (RBS) - is that it only covers voters who have either provided their phone number to the registrar of voters or whose numbers are listed in public phone directories. "Match rates" (the percentage of voters on the list with working phone numbers) vary widely from state to state and district to district, but rarely exceed 60%. If the uncovered 40% (or more) differ in their politics, a bias can result.
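The arithmetic of that coverage problem is simple enough to sketch in a few lines of Python. Everything here is hypothetical: the function name, the 60% match rate and the candidate's support in each group are made-up inputs chosen only to show how the size of the bias depends on both the match rate and how different the unmatched voters are.

```python
def coverage_bias(match_rate, support_matched, support_unmatched):
    """Return (true support, error from polling only the matched voters).

    match_rate        -- share of listed voters with a usable phone number
    support_matched   -- candidate's share among voters the poll can reach
    support_unmatched -- candidate's share among voters it cannot reach
    """
    true_support = (match_rate * support_matched
                    + (1 - match_rate) * support_unmatched)
    return true_support, support_matched - true_support

# Hypothetical inputs: a 60% match rate and a 5-point gap between groups.
true_support, bias = coverage_bias(0.60, 0.50, 0.45)
```

With these invented inputs, a poll that reaches only matched voters would overstate the candidate by about two points. If the matched and unmatched voters hold the same views, the bias is zero no matter how low the match rate.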
Pollsters continue to debate the merits of RDD and RBS sampling, and that debate deserves more attention than I will give it today. The short story is that most media pollsters continue to use RDD sampling, especially for national polls. Internal campaign pollsters have been making far greater use of list sampling, especially at the Congressional District level where they use RBS almost exclusively.
2) Majority Watch used individual vote history to select the "likely voters." The lists provided in most states by the registrar of voters typically report vote history. If you voted in the 2004 presidential election, but not in that school board election in 2005, the list will say so. It is a matter of public record. Majority Watch used an approach common to campaign pollsters: They sampled only those who cast votes in at least two of the last four general elections in their precinct (which included "off-year contests" in 2005 and 2003). What this means, in effect, is that most of their respondents voted in both the 2004 presidential election and at least one other general election race.
Majority Watch based their "likely voter" model entirely on vote history from the list, and did not ask "screen" questions to select their sample.
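The two-of-the-last-four rule is easy to express in code. The sketch below is my own illustration in Python, not anything Majority Watch provided: the record layout is hypothetical, and I am assuming the "last four" general elections are 2002 through 2005, which the description above implies but does not spell out.

```python
GENERAL_ELECTIONS = (2002, 2003, 2004, 2005)  # assumed "last four" generals

def is_likely_voter(years_voted, elections=GENERAL_ELECTIONS, minimum=2):
    """True if the voter-file record shows votes in at least `minimum`
    of the listed general elections."""
    return len(set(years_voted) & set(elections)) >= minimum

# Hypothetical voter-file records: voter id -> years with a recorded vote.
voter_file = {
    "A": [2004],
    "B": [2003, 2004],
    "C": [2002, 2004, 2005],
    "D": [],
}
sample_frame = [vid for vid, years in voter_file.items() if is_likely_voter(years)]
```

Here only voters B and C make it into the sample frame. Voter A, who turned out only for the 2004 presidential election, is excluded before anyone picks up the phone.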
3) The Majority Watch pollsters used an interesting approach to selecting a random voter in each household and matching the interviewed respondent to the actual voter on the list. They randomly selected one voter to be interviewed within each household, but then used the automated method to interview whoever answered the phone. The interview included questions asking respondents to report their gender and age. After each interview, a computer algorithm checked to see if the reported gender and age matched the data for that individual on the voter file. If the gender and age data did not match, they threw out the interview and did not include it in their tabulations.
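As a rough sketch of that after-the-fact check, here is how the comparison might look in Python. The field names and the two-year age tolerance are assumptions of mine; I do not know how closely the reported age had to match the voter file in the actual Majority Watch algorithm.

```python
def matches_voter_file(reported, listed, age_tolerance=2):
    """Keep an interview only if self-reported gender and age agree with the
    voter-file record for the person who was sampled.

    age_tolerance is a hypothetical allowance for rounding and birthdays.
    """
    same_gender = reported["gender"] == listed["gender"]
    close_age = abs(reported["age"] - listed["age"]) <= age_tolerance
    return same_gender and close_age

# Hypothetical interviews paired with the sampled voter's file record.
interviews = [
    ({"gender": "F", "age": 52}, {"gender": "F", "age": 53}),  # plausibly the sampled voter
    ({"gender": "M", "age": 19}, {"gender": "M", "age": 47}),  # age mismatch
    ({"gender": "M", "age": 60}, {"gender": "F", "age": 61}),  # gender mismatch
]
kept = [reported for reported, listed in interviews
        if matches_voter_file(reported, listed)]
```

Only the first interview survives. A check like this cannot prove the right person answered, since two household members of the same gender and similar age would both pass, but it does screen out the most obvious mismatches.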
4) According to the methodology page, the Majority Watch pollsters then weighted their data "to represent the likely electorate by demographic factors such as age, sex, race and geographic location in the CD." But how did they determine the demographics of the likely electorate in each district? The answer is surprisingly complicated.
They obtained data on each district reported by the U.S. Census as part of the Current Population Survey (CPS). Keep in mind, as noted in a post last week, the CPS is also a survey (albeit with a huge sample size and very high response rate), subject to some of the same over-reporting of voting behavior as other surveys.
The Census publishes data for gender and race by Congressional District, but not for age. So the Majority Watch pollsters created their own estimate of the age distribution by applying state level CPS estimates of turnout by age cohort to the district level estimates of age for all adults. If that last sentence was confusing - and I know it was - don't worry. Just note that the estimate of age for "2002 Voters" provided in the lower left portion of each district page on the Majority Watch site is their estimate, as extrapolated from statewide CPS data and not an official Census estimate.
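For those who would rather see that extrapolation than read it, here is a minimal Python sketch of the idea as I understand it: multiply each age group's share of district adults by the statewide turnout rate for that group, then rescale so the shares add to 100%. The age groups and all of the numbers are hypothetical, and the function is my illustration rather than the Majority Watch procedure.

```python
def estimate_voter_age_shares(district_adult_shares, state_turnout_by_age):
    """Apply statewide turnout rates to a district's adult age mix and
    renormalize, giving an estimated age mix among voters."""
    voters = {age: district_adult_shares[age] * state_turnout_by_age[age]
              for age in district_adult_shares}
    total = sum(voters.values())
    return {age: share / total for age, share in voters.items()}

# Hypothetical inputs, not Census figures.
district_adults = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
state_turnout = {"18-34": 0.25, "35-54": 0.50, "55+": 0.60}
estimated = estimate_voter_age_shares(district_adults, state_turnout)
```

Because older adults turn out at higher rates in this made-up example, the estimated electorate skews older than the adult population. The weak point is the assumption that turnout by age in the district mirrors the state as a whole.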
Also note something that has confused many readers who have looked at the Majority Watch web site. All of the demographic data that appears on their district level page is taken (or derived) from U.S. Census data. It is not based on data from their surveys!
Finally, those who have drilled down deep into the Majority Watch crosstabs will notice that the age distribution on the poll is older than their Census-based age estimate. That is because the Majority Watch pollsters also looked at the estimated age distribution obtained directly from the voter lists (based on the birthdays voters provide when they register to vote). This subject is definitely worthy of more discussion, but voter lists consistently show an older electorate than the CPS survey estimates. The Majority Watch pollsters set an age target that was a meld of their CPS estimate/extrapolation and the list estimates, and weighted the data to match. How accurate and reliable is this approach? I have no idea, and am quite sure other pollsters will see shortcomings.
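To show what melding two targets and weighting to the result could look like, here is a short Python sketch. It is a guess at the general shape of the procedure, not a description of it: the even 50/50 blend, the two age groups and every number below are assumptions I made up for illustration.

```python
def blend_targets(cps_shares, list_shares, list_weight=0.5):
    """Mix two age distributions; list_weight is the share given to the
    voter-list estimate (0.5 is a hypothetical choice)."""
    return {age: (1 - list_weight) * cps_shares[age]
                 + list_weight * list_shares[age]
            for age in cps_shares}

def cell_weights(sample_shares, target_shares):
    """Weight for each age group = target share / unweighted sample share."""
    return {age: target_shares[age] / sample_shares[age]
            for age in sample_shares}

# Hypothetical shares for two age groups.
cps = {"under 55": 0.60, "55+": 0.40}
voter_list = {"under 55": 0.50, "55+": 0.50}
sample = {"under 55": 0.40, "55+": 0.60}

target = blend_targets(cps, voter_list)
weights = cell_weights(sample, target)
```

In this example the blended target is older than the CPS-based estimate but younger than the list, and the older respondents are weighted down to reach it. How much of each source goes into the blend is exactly the kind of judgment call other pollsters might question.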
Here is the bottom line for those wondering how much faith to put in the Majority Watch data: Political polling gets considerably more difficult at the congressional district level. While the Majority Watch approach is innovative, it is also new and untested, and it departs from standard survey practice in a number of ways. And, according to Tom Riehle, the design of the survey may evolve between now and Election Day (and yes, future tracking surveys are planned):
We are privately taking the methodology and results to some tough critics to find out what questions they ask that we may not have thought to ask, in order to keep moving the quality closer to the best quality that can be achieved in telephone interviewing. In that sense, this is a work in progress, because we have made our best effort to develop an excellent methodology, and will continually improve that methodology based on the informed and legitimate questions of methodological critics.
The Majority Watch surveys may turn out to yield reliable results, or they may not. We really will not know until we watch how they compare to other public polls and the ultimate election results. And here at Pollster.com, we are hoping to help you do just that.