Pollster.com

Articles and Analysis

 

Does IVR Explain the Difference?

Topics: Approval Ratings , Charts , Divergent Polls , IVR , PPP , Rasmussen

I am pondering two somewhat related questions this afternoon, but both have to do with national surveys conducted using an automated ("robo") methodology (or more formally, IVR or interactive-voice-response) to measure Barack Obama's job approval rating. One is the ongoing Rasmussen Reports daily tracking, the other is the just-released-today national survey by Public Policy Polling (PPP).

Both surveys are certainly producing lower job approval scores for President Obama than those from other pollsters. The difference for Rasmussen is painfully obvious when you look at our job approval chart, magnified by the sheer number of data points they contribute to the chart. Look at the chart and you can see two bands of red "disapproval" points with the trend line falling in between. Point to and click on any of the higher scores and you will see that virtually all come from Rasmussen. Similarly point to and click on a Rasmussen "black" approval point and you will see that virtually all of their releases fall somewhere below the line.

The most recent Rasmussen Reports job rating for Obama is 55% approve, 44% disapprove. Use the filter tool to drop Rasmussen from the trend, and the current trend estimate (based on all other polls) is, with rounding, 61% approve, 30% disapprove. Leave Rasmussen in and the estimate splits the difference. The latest PPP survey produces a result very similar to Rasmussen: 53% approve of Obama's job performance and 41% disapprove.

I know that Charles Franklin is working on a post that will discuss the impact of the Rasmussen numbers on the job approval chart, so I am going to defer to him on that aspect of this discussion. (Update: Franklin's post is up here.)

But since some will find it very tempting to jump to the conclusion that the IVR mode explains the difference -- as PPP's Tom Jensen did back in February -- I want to take a step back and consider some of the important ways these surveys differ from other polls (and with each other) that have little or nothing to do with IVR.

First consider the Rasmussen tracking: Like many other national polls it begins with what amounts to a random digit dial sample -- randomly generated telephone numbers that should theoretically sample from all working landline telephones. However, unlike many of the national surveys, it does not include cell phone numbers, it screens to select "likely voters" rather than adults, and Rasmussen weights by party identification (using a three-month rolling average of their own results weighted demographically, but not by party). Rasmussen also asks a different version of the job approval question. Other pollsters typically ask respondents to say if they "approve" or "disapprove"; Rasmussen asks them to choose from four categories: "strongly approve, somewhat approve, somewhat disapprove or strongly disapprove."

And Rasmussen uses an IVR methodology.
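For readers curious about the mechanics of party-ID weighting, here is a minimal sketch. This is a hypothetical one-variable weighting scheme, not Rasmussen's actual procedure, and the sample and target shares are invented for illustration; the idea is simply that each respondent's weight scales their party's observed share up or down to match a target distribution (in Rasmussen's case, a rolling average of their own demographically weighted results).

```python
def party_id_weights(parties, targets):
    """parties: one party label per respondent; targets: {party: share}.
    Each respondent gets weight = target share / observed share for
    their party, so the weighted party mix matches the targets."""
    n = len(parties)
    counts = {}
    for p in parties:
        counts[p] = counts.get(p, 0) + 1
    return [targets[p] * n / counts[p] for p in parties]

# Invented sample and invented targets (a stand-in for the rolling
# average described above -- not real Rasmussen numbers).
sample = ["D"] * 40 + ["R"] * 30 + ["I"] * 30
weights = party_id_weights(sample, {"D": 0.36, "R": 0.33, "I": 0.31})
```

Note that this is also why the choice of targets matters so much: the weighted result can only be as good as the party-ID estimate it is pegged to.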

Now consider PPP: Unlike Rasmussen, they draw a random sample from a national list of registered voters compiled by Aristotle International (which gathers registered voter lists from Secretaries of State in each of the 50 states plus the District of Columbia and attempts to match each voter with a listed telephone number in the many states where that information is not provided by the state). As far as I know, Aristotle has not published the percentage of registered voters on that list for which they lack a working telephone number, but it is likely a significant percentage. The critical issue is that the population covered by PPP is going to be different than that covered by other pollsters, including Rasmussen.

So any coverage problems aside, PPP still samples a different population (registered voters) than most other public polls. Like most other pollsters, but unlike Rasmussen, they do not weight by party identification. Finally, they also ask a job approval question that is slightly different from most other pollsters.

Consider these versions:

  • Gallup (and most others): "Do you approve or disapprove of the way Barack Obama is handling his job as president?"
  • Rasmussen: "How would you rate the job Barack Obama has been doing as President... do you strongly approve, somewhat approve, somewhat disapprove, or strongly disapprove of the job he's been doing?"
  • PPP: "Do you approve or disapprove of Barack Obama's job performance?"

Note the very subtle difference: Others ask about how Obama is "handling his job" or about the job he "has been doing as president." PPP asks about his "job performance." Might some respondents hear "job performance" as a question about Obama's performance on the issue of jobs? That hypothesis may seem far-fetched (and it probably is), but a note to PPP: It would be very easy to test with a split-form experiment.
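The split-form experiment suggested above could be run along these lines. This is a hypothetical sketch, not PPP's procedure: the two wordings are quoted from the post, but the random assignment and the simulated responses are invented for illustration.

```python
import random

# Split-form sketch: randomly assign each respondent one of two wordings,
# then compare approval rates between the forms.
FORM_A = ("Do you approve or disapprove of the way Barack Obama is "
          "handling his job as president?")
FORM_B = "Do you approve or disapprove of Barack Obama's job performance?"

def split_form_gap(responses):
    """responses: list of (form, approved) pairs, form in {"A", "B"}.
    Returns the approval rate on form A minus the rate on form B."""
    def rate(form):
        votes = [appr for f, appr in responses if f == form]
        return sum(votes) / len(votes)
    return rate("A") - rate("B")

rng = random.Random(0)
# Simulated respondents with identical underlying approval on both forms;
# in that case the measured gap should hover near zero.
responses = [(rng.choice("AB"), rng.random() < 0.6) for _ in range(1000)]
```

If the "jobs" reading were real, the gap between forms would show up as a consistent difference well outside sampling noise.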

Oh yes, in addition to all of the above, PPP uses an IVR methodology.

As should be obvious from this discussion, not all IVR methods are created equal. I happened to be at a meeting this morning with Jay Leve of SurveyUSA, one of the original IVR pollsters. As he pointed out, "there is as much variability among the IVR practitioners as there would be among the live telephone operators" on methodology, including some of the other more arcane aspects of methodology that I haven't referenced.

So the main point: While tempting, we cannot easily attribute to IVR all of the apparent difference in Obama's job rating as measured by Rasmussen and PPP on the one hand, and the rest of the pollsters on the other. There are simply too many variables to single out just one as critical. The lack of a live interviewer may well play a role, but the differences in the populations surveyed, the sample frames and the text of the questions asked, or some other aspect of methodology, may be just as important.

More generally, just because a pollster produces a large house effect in the way they measure something, especially in something relatively abstract like job approval, it does not follow automatically that their result is either "wrong" or "biased" (a conclusion some readers have reached and communicated to me via email), only different. Observing a consistent difference between pollsters is easy. Explaining that difference is, unfortunately, often quite hard.
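One simple way to make "house effect" concrete: measure the average gap between one pollster's readings and a baseline built from everyone else's polls. The sketch below is a hedged illustration, not Pollster.com's actual trend methodology (which fits a trend line rather than a plain mean), and the poll numbers in it are invented.

```python
def house_effect(polls, pollster):
    """polls: list of (pollster_name, approval_pct). Returns the mean gap
    between the named pollster's readings and the average of all the
    other pollsters' readings."""
    own = [v for name, v in polls if name == pollster]
    others = [v for name, v in polls if name != pollster]
    return sum(own) / len(own) - sum(others) / len(others)

# Invented example values, loosely echoing the pattern described above.
polls = [("Rasmussen", 55), ("Rasmussen", 56), ("Gallup", 61),
         ("CBS", 62), ("ABC/Post", 60)]
# house_effect(polls, "Rasmussen") -> 55.5 - 61.0 = -5.5
```

A consistent negative gap like this is exactly the "different, not necessarily wrong" pattern discussed above: the calculation identifies the difference but says nothing about which baseline is closer to the truth.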

 

Comments
kevin:

Hmmm, I'm inclined to just write off the PPP results to a good ol' fashioned outlier - not uncommon when the base phenomenon is so split. I.e., with the current electorate so polarized (a more accurate statement than the current conservative meme of "Obama is polarizing"), you simply have a greater likelihood of getting a skewed number, as "polarized" is another way of saying "much higher variance/standard deviation".

However, the Rasmussen numbers and the impact on the model just smell fishy to me (as a market researcher and social psychologist who's used to working heavily with trended quant opinion data). Beyond their obvious extreme systematic bias (and I mean that in the statistical sense, not ideological) towards disapproval, they also have several other oddities for your model: 1) a complete lack of variance (for two weeks now they've been exactly 55/44 with a rare point up or down). And 2) their disproportionate contribution to the overall mean (in your comments you say the trend estimate falls exactly between the mean of all others and the Rasmussen score - why does 1 source have a weight equal to 15 others?).

The first of those is often a sign to me that someone is "gaming" the numbers - reminds me of research I conducted in Eastern Europe in the early 90s; amazingly enough, I seemed to get an awful lot of "straight-line" responses (e.g., all 5s or whatever) from countries with weak survey industries and poor economies (as interviewers, paid by the interview, would often just fill out surveys themselves rather than do the door-to-door legwork we specified).

The second is easily corrected - why not just weight the data so a survey source that is "flooding" the model (another sign that they might be attempting to game the system) only counts equal to any single other source? Adding in a decay mechanism would help account for new and old data in your predictive model, and "reward" pollsters who keep current data flowing, without exaggerating their effects.
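The commenter's two suggestions - per-source capping and time decay - can be combined in one small sketch. This is hypothetical, not Pollster.com's actual model: decay older polls exponentially, then normalize within each source so every pollster's total weight is equal, no matter how many polls it releases. The half-life and the poll values are invented.

```python
import math

def decayed_source_weights(polls, half_life_days=14.0):
    """polls: list of (source, age_days, value). Returns the weighted
    mean in which every source contributes equal total weight, with
    newer polls weighted more heavily within each source."""
    lam = math.log(2) / half_life_days
    raw = [(s, math.exp(-lam * age), v) for s, age, v in polls]
    totals = {}                       # total raw weight per source
    for s, w, _ in raw:
        totals[s] = totals.get(s, 0.0) + w
    num = sum(w / totals[s] * v for s, w, v in raw)
    den = sum(w / totals[s] for s, w, _ in raw)
    return num / den

# Ten identical Rasmussen-style releases vs. one other poll: "flooding"
# moves the estimate no more than a single release would.
polls = [("Rasmussen", d, 55.0) for d in range(10)] + [("Gallup", 0, 61.0)]
# decayed_source_weights(polls) -> (55 + 61) / 2 = 58.0
```

The decay term does the "rewarding" the commenter describes: within a source, a fresh poll outweighs a stale one, so pollsters who keep current data flowing shape their own source's contribution without outvoting anyone else.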

____________________



