### Pre-President's Day "Outliers"

#### Mark Blumenthal | February 16, 2008

##### Topics: 2008, Frank Newport, John McCain, Kathy Frankovic, Mark Mellman

The Hartford Courant's Joann Klimkiewicz examines the problems of polling in 2008.

Kathy Frankovic shares her skepticism about using polls to tell us which candidate is most electable in 2008.

Frank Newport finds that John McCain "displeases" many conservative Republicans.

Gary Langer says race has been the "single most powerful demographic in vote choices" in the Democratic primaries so far.

David Hill sees evidence that "immigration is a dud as an electoral issue."

Mark Mellman considers the complexities of the "politics of identity" in the Democratic primaries of 2008.

Tom Webster crunches the exit poll numbers on Republicans in Virginia and Maryland who listen to talk radio.

Josh Goodman compiles the exit poll results on abortion and immigration.

Carl Bialik calculates the odds of a tie in Syracuse.

Karl Rove does poll analysis on a white board.

## Comments

I thought Frank Newport's comments were the most interesting, though his conclusion misses the mark. The issue is not whether McCain can cozy up to the more conservative Republicans; he can. The issue is whether he can do so without alienating the moderates and independents he needs to win the general election. The danger for McCain, and it is a danger no one seems to acknowledge, is that his efforts to "unite" the party alienate the independents without comforting the conservatives. Given McCain's big mouth and his habit of saying indelicate things, I personally feel this is the most likely outcome and one of the reasons why I predict he will lose.

Posted on February 16, 2008 12:23 PM

To Mark Blumenthal,

Subject: Survey USA Strike back

Survey USA is correcting the record: today they released a long page on their website rebutting some of the points Mark made a few days ago about their report card and their surveys' accuracy. It's a nice read, convincing, and hard-nosed toward Pollster. Mark, I would appreciate it if you took a look at it and let us know what you think.

As a blogger, I've learned that Survey USA is very sensitive about their reputation and rightly so. However, sometimes they seem to be bullying people who dare to criticize them.

It's fair to say Survey USA has done a very good job this cycle, and in past years too, as their nice graphic tries to show today. I welcome their forthrightness, and they deserve to be applauded for being transparent with the public. Few pollsters release their crosstabs free to the public, but Survey USA always tries to be open with us political junkies, who rely on their internals to see where the races truly stand and sometimes throw some flak at them.

Posted on February 16, 2008 3:43 PM

Link to Survey USA long rebuttal

http://www.surveyusa.com/index.php/2008/02/16/about-those-surveyusa-pollster-report-cards-part-i/

Posted on February 16, 2008 3:47 PM

Now that the absentee votes have been counted, the vote in Syracuse is no longer a tie. Anybody with any experience in politics rather than academia would have known that this change was inevitable.

Posted on February 16, 2008 5:33 PM

I did not see Mark's original post that jr1886 refers to and could not find it on the site. Does anyone have a link?

I read through the post at SurveyUSA and overall his approach seems OK, so I am not sure what Mark's concerns are. One thing I will say is that his notion that the odds of his scorecard happening by chance are a billion to one is false. It might be true that the odds of the specific pattern of results are more than a billion to one, but the idea that the odds of a winning percentage of .527 or whatever are a billion to one is false. A .527 winning percentage is only slightly better than the .5 long-run average one gets from flipping a coin. Since the difference between .527 and .5 is .027, the question is what the odds are of this .027 deviation occurring over 700 or whatever number of trials. I haven't calculated it, but I would be shocked if it approached a billion to one or even a million to one. As to whether this .027 represents a statistically significant difference, I admit that my initial reaction was, "I'm not in grad school anymore. Leave it alone."
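The back-of-the-envelope question in this comment can be checked with a short sketch. It takes the comment's placeholder figures at face value (a .527 winning percentage over 700 trials, both given as "or whatever") and assumes a fair-coin null with a normal approximation; the function name `upper_tail_normal` is made up for illustration.

```python
import math

def upper_tail_normal(win_rate: float, trials: int, null_rate: float = 0.5) -> float:
    """One-sided normal-approximation probability of a win rate at least this high."""
    # Standard error of a proportion under the null hypothesis.
    se = math.sqrt(null_rate * (1 - null_rate) / trials)
    z = (win_rate - null_rate) / se
    # Upper-tail area of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Placeholder figures from the comment, not SurveyUSA's actual totals.
p_value = upper_tail_normal(0.527, 700)
print(f"p = {p_value:.3f}, roughly 1 in {1 / p_value:.0f}")
```

Under those assumptions the gap of 0.027 sits a bit under one and a half standard errors from .5, so the chance probability is several percent, nowhere near a million to one. The conclusion depends entirely on the assumed trial count and win rate.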

Posted on February 16, 2008 9:30 PM

Here's the link:

/blogs/surveyusas_pollster_report_car.html

SurveyUSA likes playing hardball to get credit for their polls. They slammed Bob Novak, which was hilarious, and forced the Prince of Darkness to correct his original false reporting about their California poll.

http://www.surveyusa.com/index.php/2008/02/11/earth-to-robert-novak-lies-damned-lies-and-the-prince-of-falseness/

I received an e-mail from them too about my analysis of their California poll, in which I sided with Zogby because I wrongly thought that they had overestimated Latino turnout. I have to give them credit though: they polled many races this cycle and they've done very well. MO was a spoiler, but they got the big prize by nailing CA.

Posted on February 16, 2008 9:50 PM

What are the odds that SurveyUSA's winning percentage on Mosteller 5 happened by chance alone?

SurveyUSA uses a trinomial distribution (win, loss, tie) to determine its winning percentage. How this works is not clear from their website, since no details of how a tie is handled are given. As a consequence, I threw out ties and went with a binomial system of wins and losses. The Mosteller 5 data reports 712 wins and 473 losses, which represents a winning percentage of .514 (712/1385). Using the binomial distribution method found here: http://faculty.vassar.edu/lowry/binomialX.html the expected mean is 692 with a standard deviation of 18.6. Thus, the results of SurveyUSA are about one SD away from the mean. Precisely, the odds that SurveyUSA arrived at this result by chance are 15%. Given the fact that we normally want P

Note, however, that there is a much bigger problem with evaluating SurveyUSA data. The winning percentages listed by them suffer from selection bias; the draws are non-random. This is so because, as they admit, they don't survey the same places/events as everyone else, nor do they survey all possible elections. For example, Mosteller 1 produces 920 wins out of 1136 for a win percentage of .809. This produces a result that is 20 SD away from the mean and a probability that it happened by chance of .000001. The difficulty is that we don't know if this is a result of better polling or a result of better selection of polls (they only poll where they do well). Clearly, by the Mosteller 1 method SurveyUSA is doing something right, but it simply could be cherry-picking polls. (This would also be true for Mosteller 5, but it is just getting lost in the noise.)

In the end, because of selection bias, we really can't determine the odds of SurveyUSA's results being better than chance, because the draws are not random. And even if we assume randomness, we are left with decidedly mixed results that vary based upon the measure used.
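The mean, standard deviation, and "how many SDs away" figures quoted in this comment can be reproduced with a minimal sketch. It takes the inputs exactly as written in the comment (712 of 1385 for Mosteller 5, 920 of 1136 for Mosteller 1) without checking them against SurveyUSA's data, and it swaps in a normal approximation in place of the Vassar calculator the comment used; `normal_summary` is a hypothetical helper name.

```python
import math

def normal_summary(wins: int, trials: int, p: float = 0.5):
    """Return (mean, sd, z, one-sided tail) for a Binomial(trials, p) null."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    z = (wins - mean) / sd
    # Normal approximation to P(X >= wins); an assumption, not an exact value.
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return mean, sd, z, tail

# Inputs as stated in the comment above.
m5 = normal_summary(712, 1385)   # Mosteller 5, trial count as written
m1 = normal_summary(920, 1136)   # Mosteller 1

for label, (mean, sd, z, tail) in (("Mosteller 5", m5), ("Mosteller 1", m1)):
    print(f"{label}: mean={mean:.1f} sd={sd:.1f} z={z:.2f} tail={tail:.3g}")
```

With those inputs, Mosteller 5 lands about one SD above the mean with a tail probability near 15%, and Mosteller 1 lands more than 20 SD out, where the tail is far smaller than .000001. None of this addresses the selection-bias objection, which is about whether a coin-flip null is appropriate at all.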

Posted on February 16, 2008 10:52 PM

Not sure why this sentence got cut off..

Given the fact that we normally want P less than 5%, the results of the Mosteller 5 series are NOT statistically significant.

Posted on February 16, 2008 10:56 PM

Daniel T,

I think there might be a slight mistake in your calculation. In the data that SurveyUSA gives, there are 1559 trials (counting ties) and 1185 trials if ties are ignored (I think you had 1385 trials instead). They say that a tie counts for half of a win, so their score in the case of the Mosteller 5 metric is 899 (712 wins plus half of the 374 ties out of 1559 trials). Assuming a win probability of 0.5, the P-value of 899 "wins" out of 1559 trials would be 7.74x10^-10, close to the values they are mentioning. If you ignore ties (i.e. throw them out and just go for wins out of wins+losses) the P-value is even smaller (~10^-12). Looking at their win% table I doubt any of their results are likely to be explained by chance in any simple null model where the players of the game have equivalent chances of winning a given match.
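The two P-values in this paragraph can be recomputed with an exact binomial tail, as a rough check. The sketch below assumes the counts quoted in the thread (712 wins, 473 losses, 374 ties), a fair-coin null, and a one-sided test; the helper `binomial_upper_tail` is illustrative and is not the tool the commenter used.

```python
from math import comb

def binomial_upper_tail(successes: int, trials: int) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    # Count the favorable outcomes with exact integers, then divide once.
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials

# Counts as quoted in the comments; not independently verified.
wins, losses, ties = 712, 473, 374

# Ties split: each tie counts as half a win (374 is even, so this is exact).
half_score = wins + ties // 2
p_ties_split = binomial_upper_tail(half_score, wins + losses + ties)

# Ties dropped: only decided contests count as trials.
p_ties_dropped = binomial_upper_tail(wins, wins + losses)

print(f"ties split:   {half_score}/{wins + losses + ties}  p = {p_ties_split:.3g}")
print(f"ties dropped: {wins}/{wins + losses}  p = {p_ties_dropped:.3g}")
```

Python's arbitrary-precision integers keep the sum exact until the final division, so no special statistics library is needed. Whether a one-sided or two-sided tail is the right comparison is a judgment call; doubling the values gives the two-sided version under this symmetric null.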

Responding to SurveyUSA's article, I think Mark's original question is deeper; this is not about winning and losing a coin toss, but rather about predicting election results. In my opinion, the relevant question is: given two scores on any given scale that one uses (Mosteller 1 or 5 or what have you), can we create a model of whether or not the observed difference in scores is significant? If I understand correctly, part of what Mark is asking has to do with how you count wins, losses and ties, rather than with whether or not SurveyUSA is a statistical winner of some hypothetical polling game. This seems to me to be a much harder question to answer.

Posted on February 17, 2008 1:51 AM

"Gary Langer says race has been the "single most powerful demographic in vote choices" in the Democratic primaries so far."

And thanks to the Clinton campaign for making it so.

Posted on February 17, 2008 5:46 AM

Docd7:

Where did you find how they were handling ties? I couldn't find that anywhere.

Anyway, I'd discount the value of counting a tie as 1/2 a win. Conceptually, that is very sketchy. A tie is a tie and a win is a win. If you were talking coin tosses and it landed on its edge, all rational people would toss that out and flip again. I would want some backup that this is normal practice in such situations before I accepted it. Having said that, given their assumptions, the analysis is near enough.

As for the "deeper" question, I argue that the notion that polls "predict" is a false notion. Polls are not designed to predict; they are designed to take a snapshot of the population at a particular point in time. While the win/loss is an amusing game, it's meaningless. The correct answer is that a poll "wins" when it is an accurate snapshot. I said this before and I'll say it again: using polls as a predictive model is an abuse of polling; it's trying to get apple juice out of oranges.

Polls don't predict, they can't predict, and trying to use polls to create a model of prediction is not difficult; it's impossible. The failure lies not in the math but at the conceptual level.

Posted on February 17, 2008 7:27 PM

Daniel T,

I found out about the way they handle ties from reading their excel spreadsheet. I think they split ties because they feel that, in those cases, the scores are too close to accurately call who won, so they assign half a win to both parties. Another way of putting it is that they assume that any given player actually won half of their tie games.

I am not sure if this is a reasonable way to approach the tie problem, or if it is in any way standard practice in situations like these. It is clear that ignoring ties (i.e. throwing them out entirely, as if they did not occur) actually lowers SurveyUSA's P-values; that is, including a number of 50/50 outcomes will always make it seem more likely that a given set of trials resulted from a set of equal probability coin tosses. In that way, I would say that their handling of ties in no way unduly favors their case.
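The claim that splitting ties can only weaken the apparent result follows from a short calculation, sketched here under the fair-coin null used in the thread. The symbols $W$, $L$, $T$ (wins, losses, ties) are introduced for illustration, and the normal approximation is an assumption.

$$
z_{\text{split}} = \frac{\left(W + \tfrac{T}{2}\right) - \tfrac{W+L+T}{2}}{\tfrac{1}{2}\sqrt{W+L+T}} = \frac{W-L}{\sqrt{W+L+T}},
\qquad
z_{\text{dropped}} = \frac{W - \tfrac{W+L}{2}}{\tfrac{1}{2}\sqrt{W+L}} = \frac{W-L}{\sqrt{W+L}}
$$

The numerators match, and the denominator with ties included is never smaller, so $z_{\text{split}} \le z_{\text{dropped}}$ whenever $W > L$. With the counts quoted earlier ($W = 712$, $L = 473$, $T = 374$), this gives roughly $239/\sqrt{1559} \approx 6.05$ versus $239/\sqrt{1185} \approx 6.94$.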

Including ties all as "losses," or as a separate class in a multinomial distribution, is much more problematic because it is difficult to determine what the a priori probability of a tie should be. In the case of a pure win/loss system (in which ties are apportioned with equal probabilities) the underlying binomial distribution with equal probabilities is clearly the correct null model; if ties represent their own class of outcomes, the appropriate null model becomes much more fluid and thus more susceptible to manipulation in one direction or another.

I have no idea whether or not polls should be used as predictive tools in some theoretical sense. I do know, however, that polls are often employed by people for this purpose. I would bet that many people visiting this very blog are doing so to get a sense of how "their" candidate might fare in upcoming contests. Indeed, people all over the place are pointing to polls as a major reason that one candidate is more "electable" than another. Given the somewhat widespread inaccuracy of polls this year, I find that particular argument hard to swallow, but there you have it.

I guess I would just say that polls might not be suitable predictive tools, but that is how they are employed and conceptualized during an election season. In that case I think it is reasonable to try and create a good theory of how well a given polling methodology or firm performs at that task compared to others; at least that gives the poll consumer an understanding of which method or pollster is statistically likely to be the best predictor.

Posted on February 18, 2008 11:56 AM

DocD7

Thank you for your thoughtful comments.

"In that case I think it is reasonable to try and create a good theory of how well a given polling methodology or firm performs at that task compared to others; at least that gives the poll consumer an understanding of which method or pollster is statistically likely to be the best predictor."

My only objection to this statement is that you don't really learn anything by such an approach. Even if you could demonstrate that certain pollsters' results are more closely correlated with voting results, you simply have a correlation, not a prediction. Correlation looks backwards (i.e., it's historical in nature), polling takes a snapshot of the present, and predictive models look forward (such as weather forecasting). If one were going to create a predictive model of election results, one would start with the realization that polling is only one factor among many that would need to be taken into account, in the same way that heat, moisture, and wind are only a few of the variables that go into weather forecasting.

I do realize that many people want polling to be predictive. But that desire simply stems from conceptual confusion. I think the industry would be much better off educating the layman about the limitations of polling rather than playing off the layman's confusion and creating models that look really fancy in their math but don't really mean much in practice. The key advantage of education is that it not only informs, it also lowers people's expectations of what polling can do to a realistic level.

Posted on February 19, 2008 2:13 PM
