Selzer: Study on Data Quality

[J. Ann Selzer is the president of Selzer & Company and conducts the Des Moines Register's Iowa Poll.]

Can you trust your data when response rates are low? And, in this age of the ubiquitous internet, do we make too much out of its inability to employ random sampling? We asked and answered those questions in a study we conducted a few years ago, commissioned by the Newspaper Association of America. Given recent online discussions of data quality, I revisited this study.

In April and May of 2002, five surveys, all asking the same questions, were conducted in the same market. The only difference was the data collection method used to contact and gather responses from participants. This rare look at what role data collection methodology plays in the quality of data yields some fascinating results. Our goal for each study was to draw a sample that matched the market, to complete interviews with at least 800 respondents for each separate study, and to gather demographics to gauge against the Census.

Method of contact. Our five methods of contact were:

  • Traditional random digit dial (RDD) phone (landline sample);

  • Traditional mail;

  • Mail panel, contracting with a leading vendor to send questionnaires to a sample of their database of previously screened individuals who agree to participate in regular surveys, with a small incentive;

  • Internet panel, contracting with a leading vendor to send an e-mail invitation to a web survey to a sample of online users who agree to participate in regular surveys, with a small incentive; and

  • In-paper clip-out survey, with postage paid.

The market. We selected Columbus, Ohio as our market. It was sufficiently large that the panel providers could assure us we would end up with 800 completed surveys, yet it is perceived to be small enough that mid-sized markets would feel the findings would fit their situation.

Analysis. To compare datasets, we devised an intuitive method of analysis. For each of six demographic variables (age, sex, race, children in the household, income, and education), we compared the distribution to the 2000 Census, taking the absolute value of the difference between the data set and the Census. For example, our phone study yielded 39% males and the Census documents 48%, so the absolute value of the difference is nine points. We calculated this score for each segment within each demographic, added the scores, then divided by the number of segments to control for the fact that some demographics have more segments than others (for example, age has six segments, education has three). We then summed these standardized scores for each method; the totals give us a comparison that lets us rank the methods according to how well each fits the market. Warren Mitofsky improved our approach for this analysis.
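
The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names and data layout are hypothetical. The only figures taken from the article are the 39% and 48% male shares; the female shares are the implied complements, assuming sex has two segments.

```python
def demographic_score(sample_pct, census_pct):
    """Average absolute gap, in percentage points, across one demographic's segments."""
    gaps = [abs(sample_pct[segment] - census_pct[segment]) for segment in census_pct]
    return sum(gaps) / len(gaps)


def method_score(sample, census):
    """Sum of per-demographic scores for one data collection method (lower fits better)."""
    return sum(demographic_score(sample[name], census[name]) for name in census)


# Sex, phone study: 39% male vs. 48% in the Census (from the article).
# The female figures are implied complements, assuming two segments.
phone = {"sex": {"male": 39, "female": 61}}
census = {"sex": {"male": 48, "female": 52}}

print(demographic_score(phone["sex"], census["sex"]))  # 9.0
print(method_score(phone, census))                     # 9.0 (only one demographic here)
```

In a full comparison, each method's dictionary would hold all six demographics, and the methods would be ranked by `method_score`, lowest first.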

Problem with the internet panel. I'll just note that both panel vendors were told the nature of the project: that we were doing the same study using different data collection methods to assess the quality of the data. I said we wanted a final respondent pool that matched the market. They would send reminders after two days. Participants would get points toward rewards, including a monthly sweepstakes. The internet panel firm e-mailed 7,291 questionnaires; after 850 completed responses were obtained, they made the survey unavailable to others who had been invited. Because the responses to the first 850 completed surveys were so far out of alignment with the Census, we opted to implement age quotas post-hoc, to systematically substitute some in the 45-54 age group (who were too plentiful) with respondents in other age groups (who were underrepresented), using additional invitations to the survey. We reported out both sets of findings, those before and those after the adjustment.

Results. Unweighted, the RDD phone contact method was best; the in-paper clip-out survey was worst.


Weighting just for age and sex improved all data collection methods. Most notable is traditional mail, which comes close to competing with traditional phone contact after weighting for age and sex. The in-paper survey showed the greatest improvement because the respondent pool was strongly skewed toward older women. One in four respondents to that survey were women age 65 and older (26%). The median age was 61 (meaning, just to be clear, half were older).
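
For readers unfamiliar with this kind of weighting, here is a minimal sketch of simple cell weighting in Python. It is an illustration only, not the procedure used in the study, which the article does not describe. The function names are hypothetical, and for brevity it weights on sex alone, reusing the phone study's 39% male share and the Census's 48% (with the female shares as implied complements).

```python
def poststrat_weights(sample_share, census_share):
    """Weight for each cell = Census share / sample share."""
    return {cell: census_share[cell] / sample_share[cell] for cell in census_share}


def weighted_shares(sample_share, weights):
    """Cell shares after applying the weights, renormalized to sum to 1."""
    totals = {cell: sample_share[cell] * weights[cell] for cell in sample_share}
    grand_total = sum(totals.values())
    return {cell: total / grand_total for cell, total in totals.items()}


# Shares as proportions; female values are implied complements (assumption).
sample_sex = {"male": 0.39, "female": 0.61}
census_sex = {"male": 0.48, "female": 0.52}

weights = poststrat_weights(sample_sex, census_sex)
adjusted = weighted_shares(sample_sex, weights)

for cell in census_sex:
    print(cell, round(weights[cell], 2), round(adjusted[cell], 2))
```

Here men are weighted up by a factor of roughly 1.23 and women down to roughly 0.85, so the weighted sample matches the Census on sex. Weighting on age and sex together would apply the same ratio to each age-by-sex cell. It guarantees a match only on the variables used for weighting, not on anything else the survey measures.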


Other data. This study was commissioned by the newspaper industry, so it was natural to look at readership data. Scarborough is to newspapers what Nielsen is to television, and we had their data from the market for comparison. Partly because of the skew toward higher income and, especially, higher educational attainment in the internet panel, that method produced stronger readership numbers: higher than the Scarborough numbers and higher than any other data collection method. This was one more check on whether a panel can replicate a random sample, and it casts doubt on whether a panel can ever sufficiently control for all relevant factors to deliver a picture of the actual marketplace.

Concluding thoughts. I have to wonder how this study might change if replicated today. The rapid growth in cell-phone-only households probably changes the game somewhat. Panel providers probably do more sophisticated sampling and weighting than was done in these studies. Our mail panel vendor indicated they typically balance their sample draw, though their database in Columbus, Ohio, was just on the low end of being viable for this study, so we're confident less rather than more pre-screening was done. We did not talk with the online vendor about how they would draw a sample from their database, though we repeatedly said we wanted the final respondent pool to reflect the market. It is our sense little was done to pre-screen the panel or to send out invitations using replicates to try to keep the sample balanced. Nor did they appear to have judged the dataset against the criteria we requested before forwarding it to us; it did not look like the Columbus market. We specified we did not want weighting on the back end because we wanted to compare the raw data to the Census. Had they weighted across a number of demographics, they certainly could have better matched the Census. And maybe that is their routine now. But I wonder how the readership questions might have turned out, for example. The Census provides excellent benchmarks for some variables, but not all. Without probability sampling, I always wonder if the attitudes gathered from panels do, in fact, represent the full marketplace.

Epilogue. Of course it would be a good idea to replicate this study given recent changes in cell phone use. The non-profit group that commissioned this study just announced it is laying off half its staff, so they are unlikely to lead this quest.