
Rating Pollster Accuracy: How Useful?

Topics: Accuracy, Brendan Nyhan, Courtney Kennedy, Fivethirtyeight, Nate Silver

I have been posting quite a bit lately on the subject of the transparency of Nate Silver's recently updated pollster ratings, so it was heartening to see his announcement yesterday that FiveThirtyEight has established a new process to allow pollsters to review their own polls in his database. That is a very positive step and we applaud him for it.

I haven't yet expressed much of an opinion on the ratings themselves or their methodology, and have hesitated to do so because I know some will see criticism from this corner as self-serving. Our site competes with FiveThirtyEight in some ways, and in unveiling these new ratings, Nate emphasized that "rating pollsters is at the core of FiveThirtyEight's mission, and forms the backbone of our forecasting models."

Pollster and FiveThirtyEight serve a similar mission, though we approach it differently: Helping those who follow political polls make sense of the sometimes conflicting or surprising results they produce. We are, in a sense, both participating in a similar conversation, a conversation in which, every day, someone asks some variant of the question, "Can I Trust This Poll?"

For Nate Silver and FiveThirtyEight, the answer to that question often flows from their ratings of pollster accuracy. During the 2008 campaign season, Nate leaned heavily on earlier versions of his ratings in posts that urged readers to pay less attention to some polls and more to others, with characterizations running the gamut from "pretty awful" or "distinctly poor" to the kind of pollster "I'd want with me on a desert island." He also built those ratings into his forecasting models, explaining to New York Magazine that other sites that average polls (among them RealClearPolitics and Pollster.com) "have the right idea, but they're not doing it quite the right way." The right way, as the article explained, was to average so that "the polls that were more accurate [would] count for more, while the bad polls would be discounted."

For better or worse, FiveThirtyEight's prominence makes these ratings central to our conversation about how to interpret and aggregate polls, and I have some serious concerns about the way these ratings are calculated and presented. Some commentary from our perspective is in order.

What's Good

Let's start with what's good about the ratings.

First, most pollsters see value in broadly assessing poll accuracy. As the Pew Research Center's Scott Keeter has written (in a soon to be published chapter), "election polls provide a unique and highly visible validation of the accuracy of survey research," a "final exam" for pollsters that "rolls around every two or four years." And, while Keeter has used accuracy measurements to assess methodology, others have used accuracy scores to tout their organizations' successes, even if their claims sometimes depend on cherry-picked methods of scoring, cherry-picked polls or even a single poll. So Silver deserves credit for taking on the unforgiving task of scoring individual pollsters.

Second, by gathering pre-election poll results across many different types of elections over more than ten years, Silver has also created a very useful resource to help understand the strengths and weaknesses of pre-election polling. One of the most powerful examples is the table, reproduced below, that he included in his methodology review. It shows that poll errors are typically smallest for national presidential elections and get bigger (in ascending order) for polls on state-level presidential, senate, governor, and primary elections.

[Table from Silver's methodology post: average poll error by type of election (image: 2010-06-17-silver-election-error.png)]

Third, I like the idea of trying to broaden the scoring of poll accuracy beyond the final poll conducted by each organization before an election. He includes all polls with a "median date" (at least halfway completed) within 21 days of the election. As he writes, we have seen some notable examples in recent years of pollsters whose numbers "bounce around a lot before 'magically' falling in line with the broad consensus of other pollsters." If we just score "the last poll," we create incentives for ethically challenged pollsters to try to game the scorecards.
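
For anyone who wants to apply a similar inclusion rule to their own collection of polls, a minimal sketch in Python might look like the following. The 21-day window comes from Silver's description; the function and field names are mine and purely illustrative, not FiveThirtyEight's.

    from datetime import date, timedelta

    def median_field_date(start: date, end: date) -> date:
        """Midpoint of the field period -- the poll is at least half complete by this date."""
        return start + (end - start) // 2

    def qualifies(poll_start: date, poll_end: date, election_day: date,
                  window_days: int = 21) -> bool:
        """Include a poll only if its median field date falls within window_days of the election."""
        return (election_day - median_field_date(poll_start, poll_end)) <= timedelta(days=window_days)

    # Example: a poll fielded Oct. 10-14 ahead of a Nov. 2 election (median date Oct. 12)
    print(qualifies(date(2010, 10, 10), date(2010, 10, 14), date(2010, 11, 2)))  # True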

Of course, Silver's solution creates a big new challenge of its own: How to score the accuracy of polls taken as many as three weeks before an election while not penalizing pollsters that are more active in races like primary elections that are more prone to huge late swings in vote preference. A pollster might provide a spot-on measurement of a late breaking trend in a series of tracking polls, but only their final poll would be deemed "accurate."

Fourth, for better or worse, Silver has already done a service by significantly raising the profile of the Transparency Initiative of the American Association for Public Opinion Research (AAPOR). Much more on that subject below.

Finally, you simply have to give Nate credit for the sheer chutzpah necessary to take on the Everest-like challenge of combining polls from so many different types of elections spanning so many years into a single scoring and ranking system. It's a daunting task.

A Reality Check

While the goals are laudable, I want to suggest a number of reasons to take the resulting scores, and especially the rankings of pollsters using those scores, with huge grains of salt.

First, as Silver himself warns, scoring the accuracy of pre-election polls has limited utility. The ratings tell you something about whether pollsters "accurately [forecast] election outcomes, when they release polls into the public domain in the period immediately prior to an election." As such:

The ratings may not tell you very much about how accurate a pollster is when probing non-electoral public policy questions, in which case things like proper question wording and ordering become much more important. The ratings may not tell you very much about how accurate a pollster is far in advance of an election, when definitions of things like "likely voters" are much more ambiguous. And they may not tell you very much about how accurate the pollsters are when acting as internal pollsters on behalf of campaigns.

I would add at least one more: Given the importance of the likely voter models in determining the accuracy of pre-election polls, these ratings also tell you little about a pollster's ability to begin with a truly representative sample of all adults.

Second, even if you take the scores at face value, the final scores that Silver reports vary little from pollster to pollster. They provide little real differentiation among most of the pollsters on the list. What is the range of uncertainty, or if you will, the "margin of error" associated with the various scores? Silver told Markos Moulitsas that "the absolute difference in the pollster ratings is not very great. Most of the time, there is no difference at all."

Also, in response to my question on this subject, he advised that while "estimating the errors on the PIE [pollster-introduced error] terms is not quite as straightforward as it might seem," he assumes a margin of error "on the order of +/- .4" at a 95% confidence level. He adds:

We can say with a fair amount of confidence that the pollsters at the top dozen or so positions in the chart are skilled, and the bottom dozen or so are unskilled i.e. "bad". Beyond that, I don't think people should be sweating every detail down to the tenth-of-a-point level.

That information implies, as our commenter jme put it yesterday, that "his model is really only useful for classifying pollsters into three groups: Probably good, probably bad and everyone else." And that assumes that this confidence is based on an actual computation of standard errors for the PIE scores. Commenter Cato has doubts.
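
A back-of-the-envelope check -- my arithmetic, not Silver's -- shows why a margin of error of +/- .4 leaves so little room to rank pollsters against one another: two scores have to differ by more than about half a point before the gap means anything statistically.

    import math

    def scores_distinguishable(pie_a: float, pie_b: float, moe_95: float = 0.4) -> bool:
        """Rough check: can two PIE scores be told apart at ~95% confidence?

        Treats each score as an independent estimate whose 95% margin of error
        is moe_95, i.e. a standard error of roughly moe_95 / 1.96.
        """
        se = moe_95 / 1.96                           # standard error of one score
        se_diff = math.sqrt(2) * se                  # standard error of the difference
        return abs(pie_a - pie_b) > 1.96 * se_diff   # threshold is ~0.57 with the default MOE

    print(scores_distinguishable(-0.30, 0.10))  # False: a 0.4-point gap is within the noise
    print(scores_distinguishable(-0.60, 0.20))  # True: a 0.8-point gap stands out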

But aside from the mechanics, if all we can conclude is that Pollster A produces polls that are, on average, a point or two less variable than Pollster B, do these accuracy scores help us understand why, to pick a recent example, one poll shows a candidate leading by 21 points and another shows him leading by 8 points?

Third, even if you take the PIE scores at face value, I would quarrel with the notion that they reflect pollster "skill." This complaint has come up repeatedly in my conversations with survey methodologists over the last two weeks. For example, Courtney Kennedy, a senior methodologist for Abt SRBI, tells me via email that she finds the concept of skill "odd" in this context:

Pollsters demonstrate their "skill" through a set of design decisions (e.g., sample design, weighting) that, for the most part, are quantifiable and could theoretically be included in the model. He seems to use "skill" to refer to the net effect of all the variables that he doesn't have easy access to.

Brendan Nyhan, the University of Michigan academic who frequently cross-posts to this site, makes a similar point via email:

It's not necessarily true that the dummy variable for each firm (i.e. the "raw score") actually "reflects the pollster's skill" as Silver states. These estimates instead capture the expected difference in accuracy of that firm's polls controlling for other factors -- a difference that could be the result of a variety of factors other than skill. For instance, if certain pollsters tend to poll in races with well-known incumbents that are easier to poll, this could affect the expected accuracy of their polls even after adjusting for other factors. Without random assignment of pollsters to campaigns, it's important to be cautious in interpreting regression coefficients.

Fourth, there are good reasons to take the scores at something less than face value. They reflect the end product of a whole host of assumptions that Silver has made about how to measure error, and how to level the playing field and control for factors -- like type of election and timing -- that may give some pollsters an advantage. Small changes in those assumptions could alter the scores and rankings. For example, he could have used different measures of error (that make different assumptions about how to treat undecided voters), looked at different time intervals (Why 21 days? Why not 10? Or 30?), gathered polls for a different set of years or made different decisions about the functional form of his regression models and procedures. My point here is not to question the decisions he made, but to underscore that different decisions would likely produce different rankings.
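
To illustrate how much these choices can matter, here is a toy comparison -- my own made-up numbers, not Silver's data or his actual error measure -- of two common ways to score the same poll against the same outcome: comparing the margins directly, versus allocating undecided voters proportionally before comparing candidate shares.

    def margin_error(poll_dem, poll_rep, result_dem, result_rep):
        """Error measured on the candidate margin, ignoring undecideds."""
        return abs((poll_dem - poll_rep) - (result_dem - result_rep))

    def allocated_error(poll_dem, poll_rep, result_dem, result_rep):
        """Average candidate error after allocating undecideds proportionally."""
        decided = poll_dem + poll_rep
        dem_share, rep_share = 100 * poll_dem / decided, 100 * poll_rep / decided
        total = result_dem + result_rep
        return (abs(dem_share - 100 * result_dem / total)
                + abs(rep_share - 100 * result_rep / total)) / 2

    # Same poll (47-42 with 11% undecided), same outcome (52-48):
    print(margin_error(47, 42, 52, 48))     # 1.0 point of margin error
    print(allocated_error(47, 42, 52, 48))  # about 0.8 points of average candidate error

The two measures disagree about the size of the error for the very same poll; multiplied across thousands of polls, that kind of disagreement is enough to shuffle a ranking.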

Fifth, and most important, anyone who relies on Silver's PIE scores needs to understand the implications of his "regressing" the scores to "different means," a complex process that essentially gives bonus points to pollsters that are members of the National Council of Public Polls (NCPP) or that publicly endorsed AAPOR's Transparency Initiative prior to June 1, 2010. These bonus points, as you will see, do not level the playing field among pollsters. They do just the opposite.

In his methodological discussion, Silver explains that he combined NCPP membership and endorsement of the AAPOR initiative into a single variable and found, with "approximately" 95% confidence, "that the [accuracy] scores of polling firms which have made a public commitment to disclosure and transparency hold up better over time." In other words, the pollsters he flagged with an NCPP/AAPOR label appeared to be more accurate than the rest.

His PIE scores include a complex regressing-to-the-mean procedure that aims to minimize raw error scores that are randomly very low or very high for pollsters with relatively few polls in his database. And -- a very important point -- he says that the "principle purpose" of these scores is to weight pollsters higher or lower as part of FiveThirtyEight's electoral forecasting system.

So he has opted to adjust the PIE scores so that NCPP/AAPOR pollsters get more points for accuracy and others get less. The adjustment effectively reduces the PIE error scores by as much as a half point for pollsters in the NCPP/AAPOR category, and pollsters with the fewest polls in his database get the biggest boost in their PIE scores. He applies a similarly sized, analogous penalty to three firms that conduct surveys over the internet. He explains that his rationale is "not to evaluate how accurate a pollster has been in the past -- but rather, to anticipate how accurate it will be going forward."

Read that last sentence again, because it's important. He has adjusted the PIE scores that he uses to rank "pollster performance" not only on their individual performance looking back, but also on his prediction of how they will perform going forward.
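
To make the mechanics concrete, here is a schematic sketch of what "regressing to different means" does. It is my own toy illustration, not Silver's actual formula, weights or data: a pollster's raw error score is pulled toward a group mean, and the fewer polls a pollster has, the harder the pull.

    def shrink_score(raw_error: float, n_polls: int, group_mean: float,
                     prior_weight: float = 30.0) -> float:
        """Schematic shrinkage: blend a pollster's raw error with a group mean.

        prior_weight plays the role of "how many polls the prior is worth";
        the value here is arbitrary and chosen only for illustration.
        """
        w = n_polls / (n_polls + prior_weight)
        return w * raw_error + (1 - w) * group_mean

    raw, n = 0.60, 10                              # a poor raw score from a firm with few polls
    print(shrink_score(raw, n, group_mean=-0.10))  # pulled toward a favorable group mean: ~0.08
    print(shrink_score(raw, n, group_mean=0.50))   # pulled toward the other group's mean: ~0.53

In this example, the same raw performance lands nearly half a point apart depending solely on which mean it is regressed to -- which is roughly the size of the adjustment described above.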

Regular readers will know that I am an active AAPOR member and a strong booster of the initiative and of efforts to improve pollster disclosure generally. I believe that transparency may tell us something, indirectly, about survey quality. So I am intrigued by Silver's findings concerning the NCPP/AAPOR pollsters as a group, but I'm not a fan of the bonus/penalty point system he built into the ratings of individual pollsters. Let me show you why.

The following is a screen-shot of the table Silver provides that ranks all 262 pollsters, showing just the top 30. Keep in mind this is what his readers get to when they click on the "Pollster Ratings" tab displayed prominently at the top of FiveThirtyEight.com:

[Screenshot: FiveThirtyEight pollster ratings, top 30 of 262 (image: 2010-06-17-538ratings-screenshot.png)]

The NCPP/AAPOR pollsters are denoted with a blue star. They dominate the top of the list, accounting for 23 of the top 30 pollsters.

But what would have happened had Silver awarded no bonus points? We don't know for certain, because he provided no PIE scores calculated any other way, but we did our best to replicate Silver's scoring method while recalculating the PIE score without any bonus or penalty points (regressing the scores to the single mean of 0.12). That table appears below.**

[I want to be clear that the following chart was not produced or endorsed by Nate Silver or FiveThirtyEight.com. We produced it for demonstration purposes only, although we tried to replicate his calculations as closely as we could. Also note that the "Flat PIE" scores do not reflect Pollster.com's assessment or ranking of pollster accuracy, and no one should cite them as such].

[Table: top 30 pollsters ranked by our recalculated "Flat PIE" scores, with no bonus or penalty points (image: 2010-06-17-flatPIE.png)]

The top 30 look a lot different once we remove the bonus and penalty points. The number of NCPP/AAPOR designated pollsters in the top 30 drops from 23 to 7 (although the 7 that remain all fall within the top 13, something that may help explain the underlying NCPP/AAPOR effect that Silver reports). Those bumped from the top 30 often move far down the list. You can download our spreadsheet to see all the details, but nine pollsters awarded NCPP/AAPOR bonus points drop in the rankings by 100 or more places.

[In a guest post earlier today on Pollster.com, Monmouth University pollster Patrick Murray describes a very similar analysis he did using the same data. Murray regressed the PIE scores to a different single mean (0.50), yet describes a very similar shift in the rankings].

Now I want to make clear that I do not question Silver's motives in regressing to different means. I am certain he genuinely believes the NCPP/AAPOR adjustment will improve the accuracy of his election forecasts. If the adjustment affected only those forecasts -- his poll averages -- I probably would not comment. But it does more than that. It appears to dramatically alter rankings prominently promoted as "pollster ratings," ratings that are already having an impact on the reputations and livelihoods of individual pollsters.

That's a problem.

And it adjusts those ratings in a way that is not justified by his findings. Joining NCPP or endorsing the AAPOR initiative may be statistically related to other aspects of pollster philosophy or practice that made them more accurate in the past, but no one -- not even Nate Silver -- believes that a mere commitment made a few weeks ago to greater future transparency caused pollsters to be more accurate over the last ten years.

Yet in adjusting his scores as he does, Silver is increasing the accuracy ratings of some firms and penalizing others on those grounds, in a way that is also contrary to AAPOR's intentions. On May 14, when AAPOR's Peter Miller presented the initial list of organizations that had endorsed the transparency initiative, he specifically warned his audience that many organizations would soon be added to the list because "I have not been able to make contact with everyone" while others faced contractual prohibitions Miller believed could be changed over time. As such, he offered this explicit warning: "Don't make any inferences about blanks up here, [about] names you don't see on this list."***

And one more thought: If you look back at both tables above, you will notice that Strategic Vision, LLC -- whose name Silver strikes out and marks with a black "x" because he concludes that its polling "was probably fake" -- cracks the top 30 "most accurate" pollsters (of 262) on both lists.

If a pollster can reach the 80th or 90th percentile for accuracy with made-up data, imagine how "accurate" a pollster can be by simply taking other pollsters' results into account when tweaking its likely voter model or weighting real data. As such, how useful are such ratings for assessing whether pollsters are really starting with representative samples of adults?

My bottom line: These sorts of pollster ratings and rankings are interesting, but they are of very limited utility in sorting out "good" pollsters from "bad."

**Silver has not, as far as I can tell, published the mean he would regress PIE to had he chosen to regress to a single mean. I arrived at 0.12 based on an explanation he provided to Doug Rivers of YouGov/Polimetrix (who is also the owner of Pollster.com) that Rivers subsequently shared with me: "the [group mean] figures are calibrated very slightly differently than the STATA output in order to ensure that the average adjscore -- weighted by the number of polls each firm has conducted -- is exactly zero." A "flat mean" of 0.12 creates a weighted average adjscore of zero. I emailed Silver this morning asking if he could confirm. As of this writing he has not responded.
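
For readers who want to check that interpretation against their own replication, here is a minimal sketch of the calibration it implies, using the same schematic shrinkage as the illustration above and made-up numbers rather than Silver's data: solve for the single mean that drives the poll-weighted average of the adjusted scores to zero.

    def calibrate_flat_mean(raw_errors, poll_counts, prior_weight=30.0):
        """Solve for the single mean m such that the poll-weighted average of the
        shrunken scores w_i*raw_i + (1 - w_i)*m comes out to exactly zero."""
        num = den = 0.0
        for raw, n in zip(raw_errors, poll_counts):
            w = n / (n + prior_weight)
            num += n * w * raw     # weighted contribution of the raw scores
            den += n * (1 - w)     # weighted contribution of the unknown mean
        return -num / den

    # Toy data: three firms with different raw scores and poll counts
    print(calibrate_flat_mean([0.40, -0.05, -0.20], [20, 150, 60]))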

***In the interest of truly full transparency, I should disclose that I suggested to Nate that he look at pollster accuracy among pollsters that had endorsed the AAPOR Transparency Initiative before he posted his ratings. He had originally found the apparent effect looking only at members of NCPP, and he sent an email to Jay Leve (of SurveyUSA), Gary Langer (polling director of ABC News) and me on June 1 to share the results and ask some additional questions, including: "Are there any variables similar to NCPP membership that I should consider instead, such as AAPOR membership?" AAPOR membership is problematic, since AAPOR is an organization of individuals and not firms, so I suggested he look at the Transparency Initiative list. In his first email, Silver also mentioned that "the ratings for NCPP members will be regressed to a different mean than those for non-NCPP members." I will confess that at the time I had no idea what that meant, but in fairness, I certainly could have raised an objection then and did not.

 

Comments
DCM:

Excellent critique & hopefully Nate will appreciate your feedback as valuable constructive criticism.

The pollster rating concept is certainly worthwhile - but this iteration should remain a 'draft' work-in-progress until it has been further evaluated over time.

Since Silver posted this new system, my main concern has been the breadth of comparing large, diverse data universes, rather than limiting the comparisons to more directly comparable elections only.

Mark - can you attempt to run a dynamic update for the rating given to Selzer [whom Nate rates & speaks of highly] by incorporating their wide miss in the IA GOP GOV primary earlier this month?

IIRC, Selzer predicted Branstad by 29+/- and the margin was 'only' 9+/-.

Since Selzer currently has a small universe of qualifying polls in the database, it seems to me that this result might knock them down considerably.

Is that fair & equitable to ding a pollster THAT much for polling a small state primary - especially when most other pollsters avoided it... [R2K may have met the same fate by polling many races where others did not release.]

In other words, that example would probably be a good test case for how the ratings can & will be manipulated. Nate's system seems ripe for gaming - publish only the 'easy' general elections & hit the easy ones often, while avoiding the special elections & primaries & CDs & small or difficult state-level races.

I fear that this rating system will inhibit pollsters, or at least inhibit the results being made publicly available for fear of being 'dinged' & reputations tarnished, when it will be easier to go with the flow [ala SV] and get 'good' ratings.

And as you correctly noted, the variance between ratings is mostly not very significant. I would be interested in IF they called the race wrong or 'corrected' late to approach the mean of other pollsters.

____________________

AySz88:

About the term "skill", it might be clarifying to note that the term has a different meaning in the context of algorithms and metrics that score the quality of "experts" when they are attempting predictions. For one example, here is how it is used in hurricane season forecasting. Technically, it's Silver's model that is the "expert" trying to do the predicting. But in building the model, it is the pollsters which are "experts" predicting the election result (even though the pollsters are not actually trying to do any such thing).

This can also help explain why Strategic Vision (and any other pollster) could do well with fake data. If you allow me the baseball analogy, it'd be like trying to use stats to segregate between talent and lifelong steroid usage, but with only one suspected steroid user. The difference between "find skill" and "find skill, but not skill at cheating" is huge - finding cheating is almost an orthogonal problem. (Obviously, the poor pollsters aren't cheating, but I don't see any obvious way to get any further.)

On the transparency "bonus", I think the root problem is that it's such a coarse classification. Consider if skill was correlated with some probability of a pollster having signed on to the transparency initiative. A few lucky pollsters may well have just happened to be outliers (poorer pollsters that happened to have signed on), and the magnification can cause a huge variance in PIE scores. (Now that I think of it, is such variance included in that "+/- 0.4" number, or has he assumed that the membership list is a given and not probabilistic?) If there's any way to make things more continuous (like using other metrics such as interviews conducted per year; I made more suggestions in the guest post comments), it'd be nice to see what could be gained.

____________________

Matthew Huntington:

There's an additional issue with pollster accuracy vs. skill, which came up in an earlier thread.

Let's say Joe and Ted are running against each other, with an election to be held November 4th.

Pollster 1 surveys on November 1st. It shows Joe: 47%, Ted 45%, undecided 8%.

Pollster 2 surveys on October 25th. It shows Joe: 47%, Ted 45%, undecided 8%.

Which survey is more accurate?

Well, they seem equally accurate, right? But Nate gives Pollster 2 the bonus, because their survey was done earlier. In fact, it's even worse - if Pollster 2 had also done a survey on November 1st and gotten the same results, his rating would actually drop, even if his accuracy was dead on.

I'm not convinced, on an individual level, that polls taken right before an election are inherently more accurate than ones taken a week before the election, in part because polls taken right before an election are usually in a compressed time frame (you can't do one day truly random and then a second day with selected targets to get the demographics right), and partly because people just stop answering polls near an election.

But on a group level it's just wrong. A pollster who polls routinely six days before an election starts badly in the hole against a pollster who polls only ten days or more out. That can't be right.

Now, when he uses that reliability for indexing, it works perfectly. Pollster one's poll gets a penalty for the poll and a bonus for being taken recently, which effectively cancel out. So when you take a look at the two polls above, they're given roughly equal weight. But that really doesn't have anything to do with the skill of the pollster.

I think that for 13 days before vs. 6 days before, there tend to be two possibilities:

1. Nothing happens during that week, so the changes tend to be random noise (people doubting themselves and such).

2. Something happens during the week, which throws the old polls out the window.

Being a good pollster by Nate's metric seems to be mostly about guessing whether a bombshell is going to hit, then polling as far out as possible if there isn't one (and not checking again), and waiting until after the bomb hits if there is one coming.

Let's take my earlier example one more time:

Pollster 1 surveys on November 1st. It shows Joe: 47%, Ted 45%, undecided 8%. Five other polls are taken on this date, showing Ted winning by a 52-44 margin.

Pollster 2 surveys on October 25th. It shows Joe: 47%, Ted 45%, undecided 8%. Five other polls are taken on this date, showing Joe winning by a 52-44 margin.

It should be obvious, I hope, that the actual accuracy of the pollster can no longer be easily determined. If Ted wins 53-47, Pollster 1 did much worse than the other pollsters on that date, while Pollster 2 did much better. If Joe wins by 8 points, then Pollster 2 did horribly and Pollster 1 did great at not going 'with the flow' and correctly predicting that Joe would win.

But Nate's study doesn't seem to go that deep. And when we're talking relatively small sample and error sizes, that makes all the difference in the world.

____________________

dpearl:

I think it would be interesting to see a rating system based on head-to-head comparisons like they use for rating peoples' abilities in chess.

____________________

kingsbridge77:

The problem with Nate Silver is the fact that he's a pro-Obama hack. If you come to Pollster.com, for example, you don't see political opinion. You see numbers only, and if there's opinion that opinion is based on poll numbers. Nate simply defends whatever Obama does.

So You won't see me visiting 538 very often.

____________________

DCM:

There is also the Q concerning variance due to the early voting variable & its impact on late polling accuracy.

So much of the vote in many states is now cast absentee + early, including mail-in & walk-in, and that percentage of the total is expanding rapidly. More than a few elections exceed 50% of the total, which can be cast as far as 30 days or so in advance of 'election day'.

IIRC, only a few pollsters asked the Q in the 2008 general - "Have you already cast your vote" - but I do not see that asked across the board, plus how can you ask an actual 'absentee' in a small random weighted sample?

So it stands to reason that late movements or bounces can easily be misinterpreted when the voting is spread out over a month-long period, and this will impact polls [same as landline/cellphone] increasingly in the future imho - unless the model can accurately reflect this with transparent access to crosstabs.

This also can impact the concept of 'LV' - as it does not take 'high enthusiasm' to cast an early or absentee ballot with no standing in lines, no weather, no other constraints.

Many of us are on the registrar's list to have our ballots mailed out directly to us for every scheduled election [no additional request required]. It shows up in the mail, so you return it whether you are enthused or not...

LV [especially early in a cycle] is a strawman concept - except when based on historical patterns. And 'enthusiasm' is mostly a bogus consideration in terms of accurate polling projections for the future [except perhaps to track trends].

____________________

dpearl:

DCM: good point on the mail-in issue. I hope more pollsters begin to ask the "have you already voted?" question routinely and weight their results accordingly when they publish election prediction polls. I'll bet that will be the standard by 2012.

____________________

Farleftandproud:

I get tired of some from the left complaining about Obama. Of course he could be tougher on the GOP and get legislation done through reconciliation more, but after seeing this clip it makes me so thankful for having him in the white house.

http://crooksandliars.com/john-amato/eric-cantors-insane-rant-town-hall-meet

____________________

coltrane:

Thanks for the excellent, thorough and detailed analysis, Mark. I have been a regular reader of 538, and have been troubled at times by some of the sweeping conclusions Nate makes (Florida, post-Crist has only a 22% chance of not being GOP but PA, post-Sestak has a 71% chance of going GOP???), and this will indeed help me take Nate's pollster ratings with a large grain of salt. You explained it, btw, in very clear and not enormously technical language, and that is much appreciated by a layman like me...

____________________

murphro:

Mark,

Nate just slapped you pretty good in his response posted today 6.20.10. What are you going to do about it? If you believe the business of conducting polls is useful/helpful/important, why then would you not want to determine a pollster's relative accuracy and competence? Your career choice and your arguments just don't add up. Nate says he does not think you are lazy in not creating your own rating system, but I have to wonder. Is your point that polling should be a faith-based enterprise rather than a systematic and mathematical one?

____________________

brambster:

Nate has always mixed his opinion with stats, or rather, he seems to often form an opinion and then shape his stats to that opinion.

The danger for Nate of course is that eventually he'll find his opinion on the wrong side of the stats and he'll look worse for it instead of better as he has.

____________________

Robert Ford:

"The danger for Nate of course is that eventually he'll find his opinion on the wrong side of the stats and he'll look worse for it instead of better as he has."

Brambster, that has already happened. See the exchange between myself and Silver on UK election forecasting in April-May. Nate's poor results on this occasion seem to have had little impact on his faith in pure stats or his dismissive opinion of academics or others skeptical about his findings.

____________________

hoosier_gary:

What I see as a problem for Nate is that he tends to act like a precious prima-donna. Right now, he is seething mad that people are questioning his methods as if he is the one and only world authority on polls.

He poured a lot of work into his ranking system. That doesn't make it right, though. There are too many arbitrary factors that he seems to have just made up. Like you mentioned above, he is attempting to predict future performance of pollsters. That's nonsense.

____________________

SystematicError:

These comments and responses go to the differing DNA of these two sites. Pollster.com started off as "Mystery Pollster" offering commentary on the arcana of polls, while 538 started life as "poblano", doing predictions on Dailykos.

Of course, a database of pollsters and their polls is more attuned to the needs of Nate Silver. After correction, his pseudoexperiments give more coherent (not necessarily more accurate) results.

Anyway, this is like Murray Gell-Mann and Richard Feynman arguing about whether they are "quarks" or "partons".

To stick with the nuclear/particle physics analogy, you guys (pollsters) are where nuclear physics was in the 1960's. A plethora of results and not enough consolidation or standardization. A group from Berkeley formed the Particle Data Group and brought sanity. You need to do the same.

Perhaps you (Mark Blumenthal), along with Profs. Charles Franklin and Sam Wang, and Nate Silver, and maybe some dirt-on-their-soles pollsters (Ann Selzer?), could form a group that examines and consolidates polls.

A blue-ribbon panel of pollsters.

If enough pollsters cooperated, you could study the demographic weighting errors. You could publish standardized likely-voter screens. Get a handle on this cellphone problem. Define metrics for what is a "push poll".

____________________

John Zogby and Nate Silver: 2000-2008 True Vote vs. Recorded Vote Rankings

Richard Charnin (TruthIsAll)

July 17, 2010

http://richardcharnin.com/SilverRankings.htm

As discussed in my open letter to Nate Silver, his methodology for ranking pollsters is based on an invalid premise: that the recorded vote is an appropriate basis for measuring performance. Due to systemic election fraud, the recorded vote is not justified. The best measure is the True Vote, which is derived from total votes cast, rather than votes recorded. Using the Census value for total votes cast in the prior and current elections, we deduct four-year voter mortality and, combined with a best estimate turnout of living voters in the current election, we utilize National Exit Poll vote shares to calculate the True Vote.

Given the True Vote for the 2000, 2004, 2006 and 2008 elections, we can measure pollster performance in predicting the vote. Good pollsters such as John Zogby should not be penalized in the rankings because of election fraud. Conversely, biased pollsters such as Rasmussen should not have been rewarded in Silver’s rankings for predicting a fraudulent recorded vote.

Reputable election analysts who have crunched the numbers agree that the 2000 and 2004 elections were stolen and Democratic Landslides were denied in the 2006 midterms and the 2008 presidential election.

The following tables illustrate pollster performance for the four elections against both the True Vote and the recorded vote. The rankings are straightforward; they are based on the deviation between the final poll (adjusted for undecided voters) and the True and recorded votes.

The projections allocated 75% of undecided votes to the Democrats, who were the challengers in 2004, 2006 and 2008.
In 2000, Clinton was the incumbent who had high approval and a strong economic record, therefore a 50/50 split was assumed in the undecided vote.
In 2004, Bush had a 48% approval rating which declined to 25% in 2008. Obama was the de-facto challenger; McCain represented the incumbent.

Nate Silver ranks Zogby DEAD LAST. The historical record proves that Silver is DEAD WRONG.

This is what Zogby had to say just before Election Day 2004:
The key reason why I still think that Kerry will win… traditionally, the undecideds break for the challenger against the incumbent on the basis of the fact, simply, that the voters already know the incumbent, and it's a referendum on the incumbent.

And if the incumbent is polling, generally, under 50 percent and leading by less than 10, historically, incumbents have lost 7 out of 10 times. In this instance you have a tie, a President who is not going over 48, undecideds who tell us by small percentages that the President deserves to be reelected. And in essence, it gives all the appearances that the undecideds -- the most important people in the world today -- have made up their minds about President Bush.

The only question left is: Can they vote for John Kerry? If it's a good turnout, look for a Kerry victory. If it's a lower turnout, it means that the President has succeeded in raising questions about John Kerry's fitness.

There was a very heavy turnout of 22 million first-time voters and others who did not vote in 2000. In his Election Day polling, Zogby had Kerry winning by 50-47% with 311 electoral votes, indicating that 75% of undecided voters broke for Kerry. This was a virtual match to the 52-47% unadjusted state exit poll aggregate data later released in the Edison Mitofsky 2004 Evaluation Report.

Let’s review Zogby’s performance in the tables below:

In 1996, Zogby’s forecast ranked # 1.
He forecast that Clinton would win by 8.1%.
Clinton had an 8.4% recorded vote margin.

In 2000, Zogby ranked #1 of 10 national polls.
His final projection was within 0.5% of Gore’s recorded vote
But it was 2.4% lower than the True Vote.
Gore did better than the recorded vote indicates.
There were 6 million uncounted votes.
Gore won by at least 3 million votes.
The election was stolen.

In 2004, Zogby ranked #14 (tied) of 18 polls.
His final 3-day tracking poll projection deviated 1.2% from Kerry’s recorded 48.3% share.
Zogby’s Election Day polling had Kerry by 50-47%
There were 4 million uncounted votes.
Kerry had a 53.2% True Vote share and won by 10 million votes.
The election was stolen.

In 2006, Zogby ranked #7 of 11 Generic Polls.
There were over 3 million uncounted votes.
The pre-election Generic Poll Trend Model forecast a 56.4% Democratic Landslide.
The unadjusted National Exit Poll also gave the Democrats 56.4%.
The Democratic landslide was denied.

In 2008, Zogby’s True Vote rank was # 4 of 15 polls.
Obama had a 52.9% recorded share and a 9.5 million vote margin.
But Obama had a 58% True Vote share and won by 22 million votes.
The Obama landslide was denied.

Of the 54 polls listed, Zogby’s True Vote rank is #26.
So why is he at the very bottom of Silver’s list far below the next lowest?

____________________

I have always said that numbers do the talking.
This is one of those cases, simply impressive!

____________________


