Pollster.com

May 20, 2007 - May 26, 2007

 

Pre-Holiday Roundup

With the holiday weekend fast approaching, I want to clear out my in-box of a number of interesting items that have been piling up over this past week:

  • On Wednesday, NBC's Chuck Todd reported on a voter focus group conducted in Baltimore County, Maryland this week sponsored by the Annenberg Center of the Univ. of Pennsylvania. The group, which was conducted by veteran Democratic pollster Peter D. Hart, will be aired at some point soon on C-SPAN (according to Todd's column). Check the C-SPAN schedule for details.

    Todd's take is that these 12 voters (5 Democrats, 4 Republicans and 3 independents) were more interested in presidential candidates who "provided a vision and leadership rather than one who had real-world experience." And they were also "torn between wanting a candidate who provided hope and a candidate who made them feel safe."

    His column is worth reading in full, but when you do, I recommend starting with the caveat at the end of page 2 as well as a few of my own: Remember, a focus group is not a true random sample of anything. It can tell you a lot about the 12 people in the room, but projecting those attitudes on some wider population is inherently risky. And, as Todd points out, attitudes that seem strongly held by focus group participants can be misleading. He shares an example of a similar group in 1999 that "indicated how potent of a threat Bill Bradley was to Al Gore."

    Consider another example involving a December 2003 focus group, also sponsored by Annenberg and conducted by Hart, that convinced columnist Mark Shields that Howard Dean had "established a beachhead" among blue-collar Democrats and independents in Toledo, Ohio.

  • The Peter D. Hart who conducts focus groups for Annenberg (and the NBC/Wall Street Journal poll with Republican Neil Newhouse) is NOT the same person as Peter Hart, the analyst with the media watch group Fairness & Accuracy In Reporting (FAIR). The latter Hart has an op-ed piece out this week calling media reporting on early horse-race polls "a complete waste of time."

  • On the American Prospect Tapped blog, University of Maryland political science professor Tom Shaller has published a critique worth reading of a recent study by the group Third Way that took an odd approach to analyzing data from the 2004 and 2006 exit polls (via Stoller & Cillizza).

  • Earlier this week, the Wall Street Journal described how the practitioners of political "microtargeting" are "taking their mastery of sophisticated new campaign techniques into the corporate world" (via Sullivan). The small irony is that I have often seen political microtargeting described as an application of corporate data mining to political targeting.

  • CBS News has an interview posted today with Ann Selzer, the pollster who conducts the Iowa Poll for the Des Moines Register, about her methods and interpretations of recent data.

  • CBS News polling director Kathy Frankovic has a new column on the CBS.com web site. Her inaugural effort, "Trust But Verify," provides thoughts on the value of putting poll results into the proper context.

  • Finally, ABC News polling director Gary Langer strives to put just context around a controversial question asked on the recent Pew Center survey of American Muslims.

Enjoy the holiday weekend!

By Mark Blumenthal on May 25, 2007 9:05 PM

Washington Scandals and Baby Names

1Monica0525.png

The appearance this week of Monica Goodling before the House Judiciary Committee sparked a conversation in the Political Arithmetik household about a previous Monica-related Washington scandal. It perhaps says something about our household that this provoked a search for empirical evidence concerning the effect of the Clinton-Lewinsky scandal on the popularity of Monica as a name. Was it urban legend that the scandal had an effect? Was the effect large or small? Was it immediate? Let's run the numbers.

Monica was a reasonably popular name in the 1970s, ranking between 39th and 56th over the decade. As it happens both Monica Lewinsky and Monica Goodling were born in the summer of 1973, two weeks apart, when the name was ranked 40th, its second highest ranking. (Monica ranked between 59th and 141st in the 1960s.) [My thanks to my colleagues at the coffee shop for suggesting I check tennis player Monica Seles, who turns out to also be a 1973 baby. Granted, she isn't connected to a DC scandal despite being born in 1973, and being born in the former Yugoslavia makes the relevance to our current investigation a tad suspect.]

If we were going to pick a name to go with a DC scandal from babies born in 1973, better bets would have been Jennifer, Amy, Michelle, Kimberly, Lisa, Melissa, Angela, Heather, Stephanie or Rebecca, the top 10 girls' names that year. But Monica at 40th wasn't rare by any means.

The 1970s were the peak years for Monicas. By the 1990s the name had slowly but steadily declined to rank between 76th and 88th during 1990-1997.

And then the events of 1998 intervened. The Clinton-Lewinsky scandal broke on January 21, 1998, reached its fevered peak by the end of 1998 with the impeachment of President Clinton and was resolved by the Senate's failure to convict on February 12, 1999. Of course that didn't prevent late night comics from continuing to milk the material for months, years, perhaps forever after.

The impact on parents was immediate, but not as drastic as I had expected. There were 11 months of 1998 in which the scandal's impact could be felt. And the ranking of Monica dropped from 79 in 1997 to 105 in 1998, a substantial but not precipitous drop. Of course events were unfolding during this year, so perhaps it is reasonable to focus on 1999, by which time surely every expectant parent in America would be aware of the Clinton scandal.

And in 1999 the ranking of Monica did fall dramatically, to 151, just a bit below where it stood in 1960.

So indeed, the impact of the scandal produced an immediate and substantial response, as one would surely expect. No urban legend this.

But what I find fascinating is the continued decline since 1999. I would expect the impact to be greatest in the immediate aftermath of the infamous episode and to level off or perhaps even abate thereafter. Instead, the data suggest a much slower response and a much longer diffusion of unpopularity through the population. Having dropped 72 places between 1997 and 1999, the popularity of Monica dropped ANOTHER 99 places from 1999 through 2006, the last year for which we have data, to now stand at the 250th name on the popularity list.
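For readers who want to check the rank arithmetic, here is a minimal Python sketch using only the rankings quoted in this post. The dictionary and the `rank_drop` helper are names made up for illustration, not anything supplied by the Social Security Administration.

```python
# Monica's rank among girls' names, as quoted in this post.
# Only the years mentioned in the text are included.
ranks = {1997: 79, 1998: 105, 1999: 151, 2006: 250}


def rank_drop(start_year, end_year):
    """Places lost between two years (positive means less popular)."""
    return ranks[end_year] - ranks[start_year]


drop_scandal = rank_drop(1997, 1999)  # the two years around the scandal
drop_after = rank_drop(1999, 2006)    # the seven years that followed

print("1997 to 1999:", drop_scandal, "places")
print("1999 to 2006:", drop_after, "places")
```

The two figures reproduce the 72-place and 99-place drops described above.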

One interesting speculation is to consider the effect of the Clinton-Lewinsky scandal on the parents who are just now having baby girls. Many of them would have been in their teens or early 20s during the height of the scandal, compared to parents of 1999 or 2000 who would have been on average 7 or 8 years older. I wonder if the impact of the scandal was larger on teenage and college-age parents-to-be. These are ages not noted for consumption of political news, but they are ages extremely well known for crude sexual humor, for which Kenneth Starr provided an abundant supply of raw material. So I wonder if this cohort that is now giving birth was somewhat more affected by the scandal than were even slightly older cohorts who were past the age of campus humor as well as early sexual development. That could explain the continued and steady decline in use of Monica as a girl's name. It would also predict a leveling off once births are dominated by cohorts who were too young to understand the Clinton-Lewinsky scandal at the time.

The alternative is a slow diffusion of unpopularity throughout the culture, which is having an increasing effect regardless of personal experience with the scandal. If so, there is little reason to expect a leveling off of ranking. But there is also a puzzle about why the cultural diffusion is as slow as it has been.

It seems unlikely that Monica Goodling's testimony will significantly reduce the already declining popularity of the name. But given the current standing of "Monica", it is much less likely that a DC scandal in 2035 or so will feature a Monica in the starring role.

Prospective parents may want to visit the source of these data, the Social Security Administration's Popular Baby Names site here.

A superb academic study of the sociology of naming babies is A Matter of Taste: How Names, Fashion and Culture Change, by Stanley Lieberson.

Technical Appendix (added 5/26/07)

Warning: This is the really geeky part. Unless you think log2(x) is really cool, you might want to turn back now!

"Professor M" posted a comment on the cross post of this article at Pollster.com. Rather than "geek up" Pollster, I'm replying here. (This was supposed to be a "just for fun" post, after all.)

His/her comment is:

    Hmmm. Try graphing the percent of babies given the name Monica in each year instead of the popularity rank. I think your discussion might change.

The good Professor M makes an excellent point. Let's think why. The actual rate of name use is quite small, even for the most popular names. For example, in 2006 the most popular name for girls was Emily. That name was used for 1.0267% of girls born. The number 2 name was Emma, 0.9159%. This difference in percentages is actually rather large. When we get down to ranks 101 and 102 we find Mya at 0.1602% and Amanda at 0.1599%. When we get down to Monica at 250, the rate is 0.0650% and for Carly at 251 the rate is 0.0649%.

So the rate of name use gets closer together for adjacent ranks as we go from more popular to less popular ranks. In my plots above, a change of one rank is the same vertical distance in the plot whether we are going from 1 to 2 or 100 to 101 or 250 to 251. But the percentage rates would not be changing by the same amount for each of those ranks. Instead, the difference in percentage rate would be getting smaller as we go from more popular to less popular rankings. In techie terms, the relationship between rank and percentage use is non-linear. And that can produce a different look to the plot, as Professor M suggests. So let's take a look.
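To make the shrinking gaps concrete, the following Python sketch compares adjacent ranks using only the six 2006 percentages quoted above. The data structure and the `adjacent_gap` helper are hypothetical names chosen for illustration.

```python
# Percent of girls given each name in 2006, keyed by rank,
# exactly as quoted in the text above.
pct_2006 = {
    1: ("Emily", 1.0267),
    2: ("Emma", 0.9159),
    101: ("Mya", 0.1602),
    102: ("Amanda", 0.1599),
    250: ("Monica", 0.0650),
    251: ("Carly", 0.0649),
}


def adjacent_gap(rank):
    """Difference in percentage points between a rank and the next rank."""
    return round(pct_2006[rank][1] - pct_2006[rank + 1][1], 4)


# One step down the rankings at the top, middle and bottom of the list.
gaps = {rank: adjacent_gap(rank) for rank in (1, 101, 250)}

for rank, gap in gaps.items():
    upper = pct_2006[rank][0]
    lower = pct_2006[rank + 1][0]
    print(f"rank {rank} ({upper}) to {rank + 1} ({lower}): {gap:.4f} points")
```

A one-rank step is worth roughly a tenth of a percentage point at the top of the list and only about a ten-thousandth of a point near rank 250, which is the non-linearity at issue.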

I've converted the percentages into rate per 10,000 girls born, just to avoid the decimal points. That makes no difference for the look. So let's look at what Professor M suggests:

1AppFig10529.png

And behold! As Professor M suggested, the look is a bit different. What appears as a continued sharp drop after 1999 in my plot of rankings, now looks more like a continued decline but not so sharp, and much more of the decline came between 1997 and 1999. Also, the declining popularity of Monica between 1973 and 1997 appears more substantial, dropping from 41 per 10000 to 22 per 10000.

So Professor M's point is well taken. The changes in rates are significantly different from the changes in ranks. The popularity of Monica has continued to decline since 1999 but not nearly so dramatically as it appears in my ranking graph.

But...

Is the raw percentage (or per 10,000) rate the right measure either? As the rate approaches zero, it becomes impossible to decrease by a constant amount. From 1973 to 1997, the rate of use of Monica fell from 41.0 to 22.1 per 10,000, a decline of 18.9. But in 1999 the rate was 10.96 per 10,000. It would be impossible for that to decline by another 18.9, lest we end up with a negative rate of name use! The point is, a constant change in the raw rate is impossible as we approach low incidence of the name. So perhaps linear change in the rate is also not a good way to model this.
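The arithmetic behind this objection can be sketched in a few lines of Python. The rates are the per-10,000 figures quoted in this appendix; `pct_to_per_10k` and the other names are hypothetical helpers for illustration only.

```python
def pct_to_per_10k(pct):
    """Convert a percentage into a rate per 10,000 (1% = 100 per 10,000)."""
    return pct * 100


# Monica's rate per 10,000 girls born, as quoted in the text.
rate = {1973: 41.0, 1997: 22.1, 1999: 10.96}

# The raw decline over 1973-1997.
decline_73_97 = rate[1973] - rate[1997]

# Subtracting that same decline from the 1999 rate gives an impossible value.
projected = rate[1999] - decline_73_97

print(f"decline 1973 to 1997: {decline_73_97:.1f} per 10,000")
print(f"1999 rate minus the same decline: {projected:.2f} per 10,000")
print(f"Monica in 2006: {pct_to_per_10k(0.0650):.2f} per 10,000")
```

The projected value comes out below zero, which is why a constant raw decline cannot continue once the name becomes rare.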

An alternative is to think of the "half life" of the name use. This equates a fall of 1/2 from say 40 to 20 per 10000 with an equivalent proportionate change from 10 to 5 per 10000. This makes proportionate declines equal across the entire range of name rates. In effect, this says a fall of 1/2 in usage rate is the same wherever it occurs.

A simple way to measure this is to use the log base 2 of the rate per 10000. In base 2, each unit increase on the log2 scale is a doubling of the rate. So 1=log2(2), 2=log2(4), 3=log2(8), 4=log2(16), 5=log2(32) and 6=log2(64). Those values cover the range of Monica rates, and the critical point is that each 1 unit increase is a doubling and each 1 unit decrease is a halving of the rate of use.
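As a small illustration of the log2 scale, this Python sketch prints the values listed above and checks that a halving counts as one unit wherever it occurs. The `halvings` function is a hypothetical helper, not part of any library.

```python
import math

# Each doubling of the rate adds exactly one unit on the log2 scale.
for rate in (2, 4, 8, 16, 32, 64):
    print(f"log2({rate}) = {math.log2(rate):.0f}")


def halvings(start_rate, end_rate):
    """Number of halvings needed to go from start_rate down to end_rate."""
    return math.log2(start_rate / end_rate)


# A fall from 40 to 20 and a fall from 10 to 5 are the same size here.
high_fall = halvings(40, 20)
low_fall = halvings(10, 5)
print(high_fall, low_fall)
```

Both falls measure as one unit, which is the "half life" idea described above.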

Replotting the data on this log2(rate per 10000) scale produces the following:

2AppFig40529.png

Now we see that from 1973 to 1997 the log2 rate fell from 5.4 to 4.5, or almost a full unit, representing a halving of the rate. From 1997 to 1999 it fell from 4.5 to 3.5, another halving. And from 1999 to 2006 from 3.5 to 2.7, a bit less than half again.

On this scale of proportionate change then, the drop from 1997 to 1999 is huge, a full halving of the rate (from 22.1 per 10,000 to 10.96) in just 2 years. The subsequent decline from 10.96 to 6.50 is a 41% decrease in rate over 7 years.
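The figures in the last two paragraphs can be reproduced with a short Python sketch, assuming only the four per-10,000 rates quoted in this appendix. The variable names and the `pct_decrease` helper are illustrative.

```python
import math

# Monica's rate per 10,000 girls born, as quoted in the text.
rates = {1973: 41.0, 1997: 22.1, 1999: 10.96, 2006: 6.50}

# Log base 2 of each rate, rounded to one decimal as in the text.
log2_rates = {year: round(math.log2(r), 1) for year, r in rates.items()}


def pct_decrease(start_rate, end_rate):
    """Proportionate decline from start_rate to end_rate, in percent."""
    return 100 * (start_rate - end_rate) / start_rate


scandal_drop = pct_decrease(rates[1997], rates[1999])
later_drop = pct_decrease(rates[1999], rates[2006])

print(log2_rates)
print(f"1997 to 1999: {scandal_drop:.0f}% decrease")
print(f"1999 to 2006: {later_drop:.0f}% decrease")
```

The rounded log2 values match the 5.4, 4.5, 3.5 and 2.7 read off the plot, and the two proportionate declines come out at about 50% and 41%.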

Now this plot is not identical to my ranking plot, but it is pretty close. The qualitative description in my original post applies about as well to this one as it did to the ranking plot. So I stand by my original comments.

I had not looked at these issues before Professor M's comment, so I am very grateful to him/her for pointing this out. And indeed, as we saw above, the raw rates do look somewhat different. But on reflection, prompted by that comment, I think the log2 rate is probably the most reasonable way to look at this. The ranks alone can be misleading because the equal intervals between ranks distort the changes in rate. But the raw rates are also misleading because changes cannot remain constant when there is a lower limit of zero usage which we approach. Proportionate change seems more compelling in this case, and log2 is a convenient and easy to understand approach to this.

And one last technical point. The plot of rate against rank is strongly non-linear, as Professor M implies. The plot of log2(rate) against rank is much closer to linear, though with some continued bend. This is why my final log2 plot above more closely resembles the rank plot. Since log2 rate is close to linear with rank, the two plots must look quite similar.

3AppFig20529.png

4AppFig30529.png
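A rough numerical check of the linearity claim, using only three 2006 data points quoted earlier in this appendix (ranks 1, 101 and 250, converted to rates per 10,000). Three points cannot stand in for the full plots, so treat this Python sketch as an illustration under that assumption; the function and variable names are made up.

```python
import math

# (rank, rate per 10,000) for Emily, Mya and Monica in 2006,
# converted from the percentages quoted in the text.
points = [(1, 102.67), (101, 16.02), (250, 6.50)]


def segment_slopes(transform):
    """Slope of transform(rate) against rank for each pair of neighbors."""
    slopes = []
    for (r1, y1), (r2, y2) in zip(points, points[1:]):
        slopes.append((transform(y2) - transform(y1)) / (r2 - r1))
    return slopes


raw_slopes = segment_slopes(lambda rate: rate)
log_slopes = segment_slopes(math.log2)

# If a relationship were perfectly linear, both segments would share one
# slope and the ratio below would equal 1.
raw_ratio = raw_slopes[0] / raw_slopes[1]
log_ratio = log_slopes[0] / log_slopes[1]

print(f"raw rate: first slope is {raw_ratio:.1f}x the second")
print(f"log2 rate: first slope is {log_ratio:.1f}x the second")
```

On these three points the slope ratio for the log2 rate is much closer to 1 than the ratio for the raw rate, though still above it, consistent with "closer to linear, though with some continued bend."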

Cross-posted at Political Arithmetik.

By Charles Franklin on May 25, 2007 6:27 PM

POLL: InsiderAdvantage Georgia Senate

A new InsiderAdvantage/Majority Opinion statewide survey of 500 registered voters in Georgia (conducted 5/22 through 5/23) finds Sen. Saxby Chambliss edging out ex-Gov. Roy Barnes (42% to 40%) in a hypothetical senatorial match-up; Chambliss leads Vernon Jones 48% to 31%.

By Eric Dienstfrey on May 25, 2007 6:16 PM

POLL: CBS/Times 08, Immigration

Additional results from the new CBS News/New York Times national survey (CBS story, 08 results, Immigration results; Times story, results) of 1,125 adults (conducted 5/18 through 5/23) find:

  • Among 275 Republican primary voters asked to choose between three candidates, former Mayor Rudy Giuliani leads Sen. John McCain (36% to 22%) in a national primary; former Gov. Mitt Romney trails at 15%.

  • Among 441 Democratic primary voters asked to choose between three candidates, Sen. Hillary Clinton leads Sen. Barack Obama (46% to 24%) in a national primary; former Sen. John Edwards trails at 14%.

  • 62% of Americans (66% of Democrats, 61% of Republicans) think illegal immigrants "who have lived and worked in the United States for at least two years" should be given "a chance to keep their jobs and eventually apply for legal status;" 33% think they should be deported.

By Eric Dienstfrey on May 25, 2007 4:44 PM