### Pennsylvania Poll Errors

#### Charles Franklin | April 23, 2008

##### Topics: ARG, Barack Obama, Hillary Clinton, Pollsters, Rasmussen, Suffolk, SurveyUSA, Zogby

Pennsylvania was a pretty good night for most pollsters, certainly compared to some earlier primaries this year. A few made it into the "five-ring" of the target, while almost all were within the "ten-ring". Only two polls, one rather old, got the winner wrong.

Polls finished on or after April 14 are included in the analysis here.

These errors are based on the vote counts at the Pennsylvania Secretary of State web site as of Wednesday afternoon, with 99.44% of precincts reporting: Clinton 1,237,696 votes to Obama's 1,029,672. Rounded to one decimal place, that is 54.6% to 45.4% of the two-candidate vote.
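As a quick check of the arithmetic, the shares and the margin can be recomputed from the raw counts. This minimal sketch assumes the percentages are shares of the two-candidate total (they sum to 100%, which suggests as much); the variable names are illustrative only.

```python
# Two-party vote shares from the Pennsylvania Secretary of State counts
# quoted above (99.44% of precincts reporting).
clinton_votes = 1_237_696
obama_votes = 1_029_672

total = clinton_votes + obama_votes  # two-candidate total: 2,267,368

clinton_pct = 100 * clinton_votes / total
obama_pct = 100 * obama_votes / total
margin = clinton_pct - obama_pct  # Clinton's lead in percentage points

print(f"Clinton {clinton_pct:.1f}%, Obama {obama_pct:.1f}%, margin {margin:.1f}")
# Clinton 54.6%, Obama 45.4%, margin 9.2
```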

There are a number of different ways to compute accuracy for individual pollsters. SurveyUSA has an excellent assessment and explanation of these, as well as measures for all pollsters in all primaries this year. (Their Pollster Report Card is currently masked, awaiting a 100% count from Pennsylvania, so I can't link to it right now. I don't expect the remaining precincts to change these figures at one-decimal precision, though I will check and update if necessary.)

The measure of accuracy I use here is closeness to the "bullseye" of the target above. I think that is what most people would intuitively think of as accuracy: getting both candidates right. A "perfect" poll would sit exactly on the crosshairs in the middle of the target, which corresponds to getting both candidates' votes exactly right.

Because polls almost always include "undecided" voters, their results tend to fall in the lower left quadrant, underestimating the final vote for each candidate. (In a two-candidate race it is impossible to land in the upper right quadrant, though that was possible in the multi-candidate primaries earlier in the year.)
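One way to see why the upper right quadrant is out of reach: write $p_C, p_O$ for a poll's Clinton and Obama percentages, $v_C, v_O$ for their vote percentages, and $u$ for the poll's undecided share. Assuming the vote splits entirely between the two candidates ($v_C + v_O = 100$, ignoring any write-ins) and $u = 100 - p_C - p_O \ge 0$, the two candidate errors must satisfy:

$$
e_C + e_O = (p_C - v_C) + (p_O - v_O) = (p_C + p_O) - 100 = -u \le 0
$$

So both errors cannot be positive at once, and the larger the undecided share, the further below the line $e_C + e_O = 0$ the poll has to sit.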

To summarize a pollster's accuracy, I calculate the distance from their poll to the crosshairs of the bullseye. (The distance is the square root of the sum of squared errors for each candidate, if you recall your math about triangles and the hypotenuse.) This "Total Error" is plotted by pollster below. Smaller errors are to the left.
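A minimal sketch of the Total Error calculation in Python. It assumes the rounded official shares (54.6% and 45.4%) as the bullseye; the post does not say whether the plotted values use rounded or unrounded shares, so small differences are possible. The function name `total_error` is illustrative.

```python
import math

# Official result, rounded to one decimal point as in the post (assumption:
# the bullseye uses these rounded shares).
ACTUAL = {"Clinton": 54.6, "Obama": 45.4}


def total_error(poll: dict[str, float], actual: dict[str, float] = ACTUAL) -> float:
    """Euclidean distance from a poll to the bullseye (the actual result).

    Each candidate's error is poll share minus vote share; the total error
    is the square root of the sum of squared candidate errors.
    """
    return math.sqrt(sum((poll[c] - actual[c]) ** 2 for c in actual))


# Quinnipiac's poll completed 4/18-20/08: Clinton 51, Obama 44.
quinnipiac = {"Clinton": 51, "Obama": 44}
print(round(total_error(quinnipiac), 2))  # 3.86
```

Here the candidate errors are -3.6 and -1.4 points, so the distance is the square root of 12.96 + 1.96.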

Quinnipiac gets the bragging rights by this measure, with their 51%-44% from polling completed 4/18-20/08. They are followed by Suffolk, ARG and SurveyUSA.

The dots become darker the closer to election day the poll was taken. In this plot, more recent polls are usually more accurate than older ones. This is especially clear in the Zogby/Newsmax polling.

A reasonable complaint about this measure is that a poll that finds more "undecided" voters will tend to land further from the bullseye. The measure therefore penalizes pollsters who are more sensitive to potential uncertainty among voters, while possibly rewarding those who push respondents harder for an answer. Deciding how hard to push for a preference is part of the "art" of polling, and reasonable pollsters may differ on it.

An alternative measure focuses on the "margin" between the candidates in the poll compared to the margin in the vote. By this approach, a poll with a 10-point margin is "right on" if the vote margin is 10 points. But this is true for a poll that has it 55-45 as well as for one that has it 45-35 or even 25-15. Despite this drawback, the margin measure doesn't penalize a poll for its undecided rate, and so it has fans. By that measure the pollsters line up as below.
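To make the drawback concrete, this sketch scores the three hypothetical 10-point polls from the paragraph above on both measures, again assuming the rounded 54.6%/45.4% result as the target. The helper names are illustrative, not taken from any polling library.

```python
import math

ACTUAL_CLINTON, ACTUAL_OBAMA = 54.6, 45.4
ACTUAL_MARGIN = ACTUAL_CLINTON - ACTUAL_OBAMA  # 9.2 points


def margin_error(clinton: float, obama: float) -> float:
    """Poll margin minus vote margin (positive = Clinton's lead overstated)."""
    return (clinton - obama) - ACTUAL_MARGIN


def total_error(clinton: float, obama: float) -> float:
    """Distance from the bullseye, for comparison with the margin measure."""
    return math.hypot(clinton - ACTUAL_CLINTON, obama - ACTUAL_OBAMA)


# The three hypothetical 10-point-margin polls from the text.
for clinton, obama in [(55, 45), (45, 35), (25, 15)]:
    print(f"{clinton}-{obama}: margin error {margin_error(clinton, obama):+.1f}, "
          f"total error {total_error(clinton, obama):.1f}")
# 55-45: margin error +0.8, total error 0.6
# 45-35: margin error +0.8, total error 14.2
# 25-15: margin error +0.8, total error 42.4
```

All three polls share the same +0.8 margin error, while their total errors range from under one point to over forty, which is the trade-off between the two measures.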

Here two pollsters can each claim victory. Suffolk and Zogby/Newsmax each had a 10-point margin in their final polls, just a bit over the 9.2-point margin in the vote count. Insider Advantage's final poll, on 4/21, has a larger error with a 7-point margin; their 10-point margin came from their 4/20 poll. Likewise, Rasmussen's 9-point margin came from a much older poll taken 4/14 (50%-41%), while their final poll, taken 4/20, had a 5-point margin (49%-44%).
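The margin comparison for final polls can be reproduced from the figures quoted in this post. This sketch uses only those numbers (Quinnipiac's 51-44, Rasmussen's 49-44 final poll, and the final-poll margins for Suffolk, Zogby/Newsmax, and Insider Advantage); the chart itself may include pollsters not listed here, and the dictionary name is illustrative.

```python
# Final-poll margins (Clinton minus Obama, in points) quoted in the post.
final_margins = {
    "Suffolk": 10,
    "Zogby/Newsmax": 10,
    "Quinnipiac": 51 - 44,
    "Insider Advantage": 7,
    "Rasmussen": 49 - 44,
}
VOTE_MARGIN = 9.2  # 54.6% - 45.4%

# Absolute margin error: distance between each poll's lead and the actual lead.
abs_errors = {name: abs(m - VOTE_MARGIN) for name, m in final_margins.items()}

# Print from most to least accurate by this measure (ties keep input order).
for name, err in sorted(abs_errors.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {err:.1f}")
```

Suffolk and Zogby/Newsmax tie at 0.8 points, Quinnipiac and Insider Advantage tie at 2.2, and Rasmussen's final poll trails at 4.2.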

*Cross-posted at PoliticalArithmetik.com.*

## Comments

It looks like you missed the *final* ARG poll, posted late yesterday, showing Clinton with a 16-point lead, which is less accurate than the prior day's 13-point lead.

Posted on April 23, 2008 11:07 PM

I'm new to this world of polling but you do a great job of making it fascinating!

In reading your explanation, I can easily see that there's potential for a great academic debate about just what a poll is supposed to represent: is it a snapshot of opinion in time, or is it a predictor? (I'm sure we'd all quickly agree, though, that when discussing elections we wouldn't care so much about the snapshot if it weren't an indicator of things to come.)

At any rate, I can also see that these two ways of measuring accuracy beg for a compromise on how much the pollster is punished for undecideds. It seems that you could easily "weight" the error based on the difference in spread (bottom graph) by multiplying it by the other metric (the hypotenuse from the first graph). This seems like it would reward those that were closer on both metrics. I do admit, though, that there is potential for a wide variance.

Again, new kid here. So, I'll apologize if this is overly simplistic or if there's a hole in my logic.

Posted on April 24, 2008 10:30 AM
