Articles and Analysis


The RT Strategies - Polimetrix Test

Topics: 2008 , Internet Polls , The 2008 Race

While I was busy at the AAPOR conference last week, Thomas Riehle of RT Strategies released results from an intriguing experiment on the left-of-center blog MyDD. His test involved an Internet panel survey that used the same questionnaire as two recent surveys conducted by his company for the Cook Political Report (analysis, crosstabulations: telephone & Internet). The comparison finds less support for Hillary Clinton in the Democratic primary (24%) than in a conventional telephone survey (32%). While that difference deserves some commentary, it is a bit tricky to interpret for two reasons, one personal and one substantive.

The personal issue is one we were going to have to confront sooner or later, and now is as good a time as any. The company that Riehle used to conduct the Internet panel survey is Polimetrix, the same company that also owns and sponsors Pollster.com (something we have always disclosed). Readers should know that my arrangement with Polimetrix provides complete editorial freedom. I can write whatever I want, and no one from the company has ever tried to influence, direct or even review anything I have written (including this post). Moreover, we are walled off (physically and otherwise) from the work that Polimetrix does for its clients. In this case, for example, Thom Riehle advised me that he would be conducting some sort of parallel test, but I had no idea Polimetrix was involved until he shared the data last week.

Having said all that, of course, I certainly understand if some of you choose to be skeptical about our relationship with Polimetrix. If I were in your shoes, I probably would be too.

On to the substantive issue. Not surprisingly, as soon as Chris Bowers posted the results last Friday, some of his readers questioned the accuracy of Internet panel samples. At about the same time, coincidentally, Professor Franklin and I were presenting a paper at the AAPOR conference on the accuracy of the various survey modes (conventional telephone, automated [IVR] telephone and Internet) in the 2006 campaign, including surveys conducted by Polimetrix. We will be presenting that paper here starting in the next few days.

For now, however, let me say a few things about Polimetrix. Better yet, let me share something I wrote in October 2005, long before I had any business relationship with the company. The background is that most Internet surveys depend on a non-random "panel" of individuals who volunteer to take occasional surveys, often in exchange for small cash incentives. Polimetrix adds a new twist:

At Polimetrix, [founder Doug] Rivers has been developing a new type of sampling methodology based on a non-probability Internet panel. The key difference is something he calls "sample matching." The gist of it is this: Other Internet panels recruit panel volunteers wherever they can find them, draw samples from that panel and then weight the results to try to match the demographics or attitudes of the general population. The Polimetrix approach to political polling is to draw a true random sample from the list of registered voters provided by election officials then go into their panel of volunteers and select the closest possible match for each sampled voter based on a set of demographic variables. They then interview each panel member that best "matches" the randomly selected voter.

The matching uses a complex statistical algorithm (something so complex that MP finds it difficult to decipher, much less evaluate). According to an email from Rivers, the variables used for sample matching in the California polling are "age, race, gender, ethnicity, party registration, vote history, census block characteristics, precinct voting behavior, and some consumer variables."

Complex, yes, but it is a bit of a stretch to assert, as Riehle does in his analysis, that this process produces a "true, randomly selected sample." Yes, Polimetrix starts with a random sample (in this case, a stratified random sample of respondents from the 2005 American Community Survey, a "true" random sample survey conducted by the U.S. Census) and matches each sampled record to members of their volunteer panel. The end result of this process may approximate a random sample, but it does not produce a "true" random sample. Of course, Doug Rivers would probably argue that an "approximate" random sample is also what we get when we take the results from a random digit dial telephone sample with 85% coverage and a 20% or lower response rate and weight it to match population estimates. But that is another argument for another day. The point here is that the Polimetrix sampling procedure is a bit more complicated (and controversial) than Riehle implies.
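For readers who want a more concrete picture, here is a heavily simplified sketch of the matching idea: draw a random sample from the voter frame, then pair each sampled record with the closest available panel volunteer. The variable names and the crude mismatch-count distance below are my own illustrative assumptions; the actual algorithm uses more variables and a far more sophisticated metric.

```python
import random

# Illustrative sketch of "sample matching" (NOT Polimetrix's actual
# algorithm): draw a random target sample from a voter frame, then match
# each target to the nearest panel volunteer on a few assumed variables.

def distance(target, volunteer):
    """Count mismatches across the matching variables (a crude metric)."""
    keys = ("age_bracket", "gender", "race", "party", "voted_2004")
    return sum(target[k] != volunteer[k] for k in keys)

def sample_match(voter_frame, panel, n):
    """Draw n random frame records; match each to its nearest panelist."""
    targets = random.sample(voter_frame, n)
    available = list(panel)
    matched = []
    for t in targets:
        best = min(available, key=lambda v: distance(t, v))
        matched.append(best)
        available.remove(best)  # interview each panelist at most once
    return matched
```

The resulting interviews are with panel volunteers, not with the randomly drawn voters themselves, which is exactly why the output only approximates a random sample.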

So what do we make of the results? Chris Bowers' first reaction was that the survey results, which show Hillary Clinton receiving less support online (24%) than on the phone (32%) in a Democratic primary match-up, tend to support his "Inflated Clinton Poll Theory." As he described it Friday, his theory is that "live-interviewer telephone polls might create a sort of social pressure that alters results," that voters might "tell machines different things about their political preference than they tell live humans." For a variety of reasons, Chris backed off the theory a bit in a subsequent post Friday night, concluding that the problem "probably has no clear answer."

I mostly agree with his second thought. The problem with using this particular experiment to test this theory or any theory about the survey mode is that these two polls differed in three important ways:

  • The mode -- One was conducted with interviewers by telephone, the other was a self-administered Internet survey.
  • The vote question -- While both asked the same vote preference question, the online version explicitly prompted for an "undecided" category, while the telephone version did not (as explained in Riehle's summary).
  • The sampled voters -- The two surveys used very different sampling methods.

Whatever you think about the merits of conventional telephone sampling versus the Polimetrix method, it is clear that the two techniques yielded different kinds of voters. Here is a summary of the composition of the Democratic and Republican subgroups in each survey, based on the weighted subgroup sizes listed in the cross-tabulations (phone and Internet):


The biggest difference is that for both the Democratic and Republican primary voter subgroups, the Internet sample yielded fewer college graduates and more self-identified independents than the survey conducted by telephone. Among Democrats, the Internet sample was less Caucasian than the telephone sample. While some of these findings are puzzling (and the education difference is at odds with conventional wisdom about Internet samples), they indicate that, for whatever reason, the two methods identified different types of people as likely primary voters.

Which sample is more "accurate"? Who knows? I put scare quotes around "accurate" because the concept of a national primary electorate is fuzzy all by itself, given the widely varying rules for participation and turnout across states. We lack any widely accepted benchmark estimate to use as a comparison. Moreover, once we start comparing the two surveys - whether overall or within subgroups - we can certainly identify differences, but we can only speculate about the explanations for those differences. They may result from the involvement of an interviewer (and the "social desirability" pressures that come with it), from reading choices rather than hearing them, from the harder "push" that comes from omitting the undecided category, or because the two methods sampled different kinds of voters (or some or all of the above).

One last thought: Thom Riehle deserves great credit for conducting such a test and putting the complete results (including the full cross-tabulations) into the public domain for all to see. It would be useful to compare the demographic compositions of all surveys of primary voters, not just those involving new or experimental methodologies. Unfortunately, very few pollsters release the necessary data.

So let's close with a quick (rhetorical) pop-quiz: Other than RT Strategies and SurveyUSA, which pollsters routinely release data on the demographic and partisan composition of their primary voter subgroups? Anyone?



"the education difference is at odds with conventional wisdom about Internet samples"

This could be due to a skewed age distribution -- if the internet survey has an excess in the 18-22 age range (or younger), this could be a group of *current* college students who will be college grads in a few years. (Note also that these students probably are short on landline phones and thus underrepresented in RDD surveys.)

(posted by Eric)


Kevin Roust:

Working through the age and education cross tabs, I can mostly answer my question. The two surveys actually use different top-level categories for both age and education, but also provide 5-year age and six-category education counts in the cross tabs.

I estimated the fraction of the surveyed group that belongs to 24 age-education combinations:
18-34, 35-49, 50-64, 65+
some HS, HS, some College, 2yr deg, 4yr deg, Grad deg
(estimating the education distribution of 30-34 and 45-49 internet respondents based on the 30-44 and 45-64 groups, respectively).

These groups were at least two percentage points different between the two surveys (+ is more in internet survey):
50-64 HS: +4.6%
65+ HS: +3.8%
18-34 some college: +3.0%
35-49 HS: +2.0%
18-34 4yr deg: -2.3%

In short, about 3% more of the internet survey appears to be current college students (my hypothesis), and about 10% more of the internet survey respondents than of the phone survey respondents are seniors, Boomers, or Gen Xers with only a high school diploma.
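The arithmetic behind this comparison can be sketched simply: convert each survey's weighted counts in the age-by-education cells to shares of the total, then flag cells that differ by at least two percentage points. The counts below are invented for illustration; the real values come from the published cross-tabs.

```python
# Sketch of the cross-tab comparison above, using made-up counts.

def cell_shares(counts):
    """Turn weighted cell counts into shares of the survey total."""
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

def big_differences(internet_counts, phone_counts, threshold=0.02):
    """Cells where internet share minus phone share is >= threshold in size."""
    net = cell_shares(internet_counts)
    phone = cell_shares(phone_counts)
    cells = set(net) | set(phone)
    diffs = {c: net.get(c, 0) - phone.get(c, 0) for c in cells}
    return {c: round(d, 3) for c, d in diffs.items() if abs(d) >= threshold}
```

A positive value means the cell is more heavily represented in the internet survey, matching the sign convention in the list above.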

This seems to raise a fascinating question: who are these "older" (than me) people with a high school education, good internet access, but few landline phones? My wife posits either (or both) military or RV-dwellers...

(same as above, but now the comment system works)

