Pollster.com

Articles and Analysis

 

46-45 Plus or Minus 3

Topics: 2008 , Barack Obama , Bradley/Wilder , Divergent Polls , Gallup , Hillary Clinton , IVR Polls , Mark Lindeman , Pollster , PPP , Rasmussen , Sampling Error , SurveyUSA

In case you missed our update, the most recent Gallup Daily result on the Democratic race shows a near dead-heat, with Barack Obama ahead of Hillary Clinton by a single percentage point margin not nearly large enough to attain statistical significance (47% to 46%). That one point lead is somewhat apropos, since it is virtually identical to the average of all of Gallup's Daily releases since February 8 (Obama 46%, Clinton 45%). So the question for the day: How much of the daily variation over the last six weeks has been real and how much is random noise?

Let's start with the chart of the Gallup Daily results since their three-day track completed on February 8 (and released on February 9). That was the first three-day result collected entirely after the results from the Super Tuesday primaries were known.


03-25 Gallup Daily.jpg

While the Gallup trend has shown several "figure eights" over the last few weeks (as reader "emcee" put it), most of that variation occurs within the range that we should expect from a survey with a +/- 3 point margin of sampling error.

To illustrate that point, consider the hypothetical possibility that the preferences among Democrats have remained perfectly stable for the last six weeks. Let's assume that the average result since February 8 -- 46% to 45% favoring Obama -- has been the unchanging reality. What sort of random variation should we expect from taking a sample rather than interviewing the entire population?

First, remember that the so-called "margin of error" applies to the individual percentages, not the margin between the candidates. So under our hypothetical "no change" scenario, we would expect the Obama percentages to fall somewhere between 43% and 49% (46% +/- 3) and the Clinton percentages to fall somewhere between 42% and 48% (45% +/- 3).
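For readers who want to see the arithmetic, here is a rough sketch in Python of why the margin of error on each percentage differs from the margin of error on the lead. The sample size of 1,260 and the function names are assumptions made for illustration. The formulas are the textbook ones for a simple random sample and ignore weighting and design effects, so treat the output as a ballpark rather than Gallup's own calculation.

```python
import math

Z_95 = 1.96  # critical value for a 95% confidence level


def moe_share(p, n, z=Z_95):
    """Margin of error, in points, for one candidate's percentage."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


def moe_lead(p1, p2, n, z=Z_95):
    """Margin of error, in points, for the lead (p1 - p2) when both
    shares come from the same sample of n respondents."""
    # Var(p1 - p2) = Var(p1) + Var(p2) - 2*Cov(p1, p2), with Cov = -p1*p2/n
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return 100 * z * math.sqrt(variance)


n = 1260  # ASSUMPTION: interviews in one three-day sample
obama, clinton = 0.46, 0.45

print(round(moe_share(obama, n), 1))          # each share: just under 3 points
print(round(moe_lead(obama, clinton, n), 1))  # the lead: a bit over 5 points
```

Because a respondent who picks one candidate cannot pick the other, the two percentages tend to move in opposite directions, which is why the uncertainty on the lead comes out close to double the uncertainty on either share.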

Since February 8, the results of the actual Gallup Daily have fallen outside that range on just three days:

  • March 1, when Obama led 50% to 42%
  • March 13, when Obama led 50% to 44%
  • March 18, when Clinton led 49% to 42%
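The check behind that list is simple enough to sketch in a few lines of Python. The baseline values and the three daily results come from the text above; the names `outside_range`, `BASELINE`, and `results` are just illustrative.

```python
MOE = 3  # Gallup's reported margin of sampling error, in points
BASELINE = {"Obama": 46, "Clinton": 45}  # hypothetical "no change" values

# The three daily results listed above, as (Obama %, Clinton %).
results = {
    "March 1": (50, 42),
    "March 13": (50, 44),
    "March 18": (42, 49),
}


def outside_range(value, center, moe=MOE):
    """True if a percentage falls outside center +/- moe."""
    return abs(value - center) > moe


for day, (obama, clinton) in results.items():
    flags = []
    if outside_range(obama, BASELINE["Obama"]):
        flags.append(f"Obama {obama}%")
    if outside_range(clinton, BASELINE["Clinton"]):
        flags.append(f"Clinton {clinton}%")
    print(day, "->", ", ".join(flags))
```

On March 1 and March 13 only the Obama percentage is out of range. On March 18 both percentages are.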

But wait. As some of you may remember, most political surveys (including Gallup) calculate the margin of error using a 95% confidence level. That assumption means that we should expect results slightly outside the margin of error for one poll in twenty.

Unfortunately, at this point our story gets a little bit more complicated, because the "one in twenty" assumption applies to statistically independent measurements. Since each Gallup Daily release is based on a three-day rolling average, there is overlap in the sample on successive days. So only the results from every third day are truly "independent." I'll skip over some even more confusing explanation and get to the bottom line: Since February 8, roughly one-in-seven independent samples from the Gallup Daily series has produced a result outside the margin of error from my hypothetical, no-change, 46-45 scenario. That's a little bit more than we would expect by chance alone, but not much more.
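To put a rough number on "by chance alone," the sketch below uses the binomial distribution to ask how often a given number of independent samples would land outside the margin of error if nothing were changing. The count of 15 independent samples is an assumption (about six weeks of daily releases, taking every third day), and the 5% rate is the textbook figure for a single percentage. Checking both candidates' percentages at once would push the per-sample rate somewhat higher, so this is a simplified baseline, not a full accounting.

```python
from math import comb


def prob_at_least(k, n, p=0.05):
    """Chance that k or more of n independent polls fall outside the
    margin of error, if each does so with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_independent = 15  # ASSUMPTION: ~45 daily releases, one independent sample per 3 days

for k in (1, 2, 3):
    print(k, round(prob_at_least(k, n_independent), 2))
```

Under these assumptions, seeing a couple of out-of-range samples in six weeks is not a rare event even when true preferences never move.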

Having said all that, my explanation still oversimplifies. It ignores the possibility of meaningful change within the standard "margin of error" -- subtle shifts that might not attain statistical significance in a single three-day sampling, but might over the course of a week or more.

A better way to distinguish the meaningful patterns is to compare Gallup's results to those from another pollster or two. Let's start with a chart of the Rasmussen Reports daily tracking poll over the same six week period. Not surprisingly, the average of the Rasmussen data gathered since February 8 also shows Obama leading by a single percentage point (45% to 44%).


03-25 Rasmussen2popup.php

Compare the two charts (or look at the chart below, which plots a Clinton-minus-Obama margin for both polls) and you will see several features in common:

  • Both show a shift from Clinton to Obama between Super Tuesday and mid-February
  • Both show Obama maintaining a low single-digit lead from mid to late February
  • Both show Clinton rising a few days before the March 4 primaries and falling a few days after

And yet, at about the time the news surrounding Jeremiah Wright became a full-blown media obsession (March 14), the results of the two polls appear to diverge. Why is that?


03-25comparison1.jpg

We should keep in mind that Gallup and Rasmussen collect their data differently (and ask slightly different questions -- see the postscript). Gallup uses live interviewers, makes repeated call-backs to unavailable respondents, samples cell phone numbers, and routes calls to Spanish-speaking interviewers when they reach a Spanish-speaking household. Rasmussen uses an automated system and recorded voice to conduct interviews and a slightly tighter screen for "likely voters," yet (as I understand it) makes no call-backs, does not call cell phones and makes no provision for bilingual interviewing.

Some, I am sure, will readily conclude that one or more of these characteristics (or perhaps others that I've omitted) provide "obvious" explanations for the discrepancies. I am reluctant to make too much of these differences. The reasons will be clearer after we look at data from a third source. I obtained it earlier today from an anonymous but trusted pollster that I'll call "Polimatic." Here is a chart of Polimatic's tracking data for the last six weeks:


03-25polimatic.jpg

Those who notice the greater stability in the Polimatic data as compared to Gallup and Rasmussen are on to something important. Next consider how the Clinton-minus-Obama margin from the Polimatic data compares to the other pollsters:


03-25polimatic_compare.jpg

See some interesting patterns? Starting to form theories about what type of poll Polimatic is, or how their methodology might influence their results?

Well, before you go too far, I should fess up. I fibbed. "Polimatic" is not a pollster at all. The data are based on a simulation run by our friend Mark Lindeman. Mark created a spreadsheet that generates random results consistent with a three-day rolling average tracking sample of 1,26040 interviews and the assumption that the "true" population value remains an unchanging 46% to 45% Obama lead.

The Polimatic line is more stable, suggesting that the consistently highest highs and lowest lows of the blue and red lines probably represent real divergence. However, the purely random variation of the simulated poll trend line is frequently hard to distinguish from the real surveys.

To generate the results above, I closed my eyes and clicked the mouse to let the spreadsheet recalculate. As such, the "Polimatic" line illustrates one potential trend showing nothing but random noise around a 46% to 45% margin. I'll say it one more time to be clear: All of the variation in the Polimatic trend lines is based on purely random chance. Any resemblance to real changes as measured by Gallup or Rasmussen is entirely coincidental.
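For anyone who wants to roll their own "Polimatic," here is a minimal sketch of the same idea in Python rather than a spreadsheet. It is not Mark Lindeman's actual spreadsheet: the function name, the 420 interviews per day, the 46-day span, and the seed are all assumptions chosen for illustration. It simply draws random "respondents" from a fixed 46% to 45% population each day and then reports three-day rolling percentages.

```python
import random


def simulate_tracking(days=46, per_day=420, obama=0.46, clinton=0.45, seed=None):
    """Simulate a three-day rolling tracking poll with fixed true preferences.

    Returns a list of (obama_pct, clinton_pct) tuples, one per rolling window.
    """
    rng = random.Random(seed)
    daily = []
    for _ in range(days):
        o = c = 0
        for _ in range(per_day):
            r = rng.random()
            if r < obama:
                o += 1
            elif r < obama + clinton:
                c += 1
            # anything else counts as other/undecided
        daily.append((o, c))

    rolled = []
    n = 3 * per_day  # respondents in each three-day window
    for i in range(2, days):
        window = daily[i - 2 : i + 1]
        o_pct = 100 * sum(d[0] for d in window) / n
        c_pct = 100 * sum(d[1] for d in window) / n
        rolled.append((o_pct, c_pct))
    return rolled


series = simulate_tracking(seed=2008)  # ASSUMPTION: arbitrary seed for repeatability
margins = [c - o for o, c in series]   # Clinton minus Obama, as in the comparison chart
print(round(min(margins), 1), round(max(margins), 1))
```

Changing the seed (or passing `seed=None`) is the equivalent of closing your eyes and clicking the mouse again: each run produces a different trend line built from nothing but sampling noise.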

So what can we conclude from all this?

First, there has been far more stability than change in the national Obama-Clinton vote preference since Super Tuesday, and that includes the last ten days. To the extent that we have seen real changes, they are barely bigger than what we might expect by chance alone.

Second, if you look closely, you will notice that the seemingly odd divergence between Gallup and Rasmussen since the Wright story broke is really not that unusual. It is comparable to similar separations in the trend lines that occurred around February 13 and February 29. Random variation will do that.

Third, and probably most important, it is far too easy to look at these rolling average tracking surveys and see compelling narratives and spin interesting theories from what is often little more than random noise.

PS: Yes, as a few readers have already suggested in prior comments, some of the stability in national Democratic vote preference may stem from the fact that most states have already held their primaries and caucuses. We had some discussion about a month ago about how Gallup alters its screen slightly to accommodate states that have already voted. However, neither Gallup nor Rasmussen alters their vote question for those who have already voted. Here is the text used by each:

Gallup: Which of these candidates would you be most likely to support for the Democratic nomination for president in 2008, or would you support someone else? [ROTATED: New York Senator, Hillary Clinton; Former Alaska Senator, Mike Gravel; Illinois Senator, Barack Obama]

Rasmussen: If the Democratic Presidential Primary were held in your state today, would you vote for Hillary Clinton or Barack Obama? [options are rotated]

PPS: While I was writing this post, Mickey Kaus blogged a theory for the divergent Gallup and Rasmussen trend lines:

The 'Bradley Effect' is Back? Gallup's national tracking poll has Obama retaking the lead over Hillary after bottoming out on the day of his big race speech. Rasmussen's robo-poll, on the other hand, shows Obama losing ground since last Tuesday. True, even Rasmussen doesn't seem to be putting a lot of emphasis on his survey's 6-point shift. But isn't this week's primary race exactly the sort of environment--i.e., the issue of race is in the air--when robo-polling is supposed to have an advantage over the conventional human telephone polling used by Gallup? Voters wary of looking like bigots to a live operator--'and why didn't you like Obama's plea for mutual understanding that all the editorial pages liked?'--might lie about their opinions, a phenomenon known as the Bradley Effect. But they might be more willing to tell the truth to a machine. ...

Or more likely, the apparent differences between them are about random variation in one or both polls. If you average the results from data collected since March 14 (the day the Wright story exploded) they are not very different:

  • Live Interviewer Gallup Daily: Clinton +2 (47% to 45%)
  • Automated Rasmussen Reports: Obama +1 (45% to 44%)

Kaus also links to an automated PPP survey in North Carolina that fielded on the evening of March 17, the night before the Obama speech. As such, it is consistent with Gallup's "bottoming out" for Obama, not contradictory. The SurveyUSA results I blogged about on Friday were also collected from March 14 to March 16, just after the Wright story broke but before Obama's speech.

 

Comments
Thatcher:

Thanks Mark.

Gotta love the "Art" of Noise.

____________________

Mark Lindeman:

Some folks may be interested to know that for this run of Polimatic, the standard deviation was about 2.4, compared with about 3.8 for Gallup and 4.4 for Rasmussen. This was a somewhat unusually "flat" run; the average standard deviation is more like 2.7.

So, a lot of the movement in the Gallup and Rasmussen lines can be attributed to noise, but a lot can't be. Also, there is a considerable (but moderate) correlation between the Gallup and Rasmussen time series, about 0.47. This campaign does have dynamics, but (as far as these poll series are concerned) they are fairly subtle.

____________________

lsmakc:

anyway you slice it, seems like a flat line here.
pretty tremendous and fascinating in and of itself and speaks to the conceptual tug of war in motion.

having made calls myself, i have found a vast difference in response based on phrasing prompts, tonal prompts ie how you read the question and striking up a conversation to get to the real meat. if you have a sampling size of less than a thousand it would be interesting to sample the three different approaches, in groups of 600 each then compare!

anyway - i'd like to see a poll of the re-vote question. i think you might discover embedded in this answer some interesting trends particularly as the issue heats up.

____________________

Chris G:

Right, so any kind of rolling average can give you spurious trend lines that span multiple days, it's just the nature of filtering a time series in that way. But I think the 0.47 correlation suggests something weak relative to MOE only.

so here's my question: just what *is* the real, average daily fluctuation in Obama-Clinton among states that haven't voted yet? 1.5%? 0.5%? the answer to that question could be very useful in assessing, e.g., Clinton's chances of catching up in the popular vote. the lower the daily fluctuation, the less moveable it suggests voters have been, the harder it is for Clinton to catch up. but the difference b/w 1.5% and 0.5% is actually pretty big in light of that question, even though they're both small relative to MOE.

a related question is how long do those real trends tend to actually last? do the lengths of those real trends tend to match news cycles? do candidates have good and bad weeks when it comes to actual change in vote preferences? or do changes accumulate over that timescale by chance?

perhaps (cross)covariances would be more informative than correlations (as well as data from more pollsters)

____________________

Ciccina:

Very, very interesting. And even though I find the technical part baffling, kudos to Mark Lindeman for his success with the simulation ("Polimatic" is a fine name for an alter ego, btw). But shouldn't you be grading papers or something? ;-)

Per Kaus - the Bradley effect thing is so tedious. If I were to accept Kaus' premise - that support for Obama is overstated - I could come up with a half-dozen "explanations" based on pernicious socio-cultural forces. Why does it always have to be Bradley?

I guess it's become part of the narrative.

Sigh.

____________________

Mark Blumenthal:

For the record: The pseudonym "Polimatic" was entirely my doing, though I'll grant that the similarity to the name of a certain company was not entirely coincidental.

Another topic: Am I the only one experiencing problems with the preview function (text disappearing from the comments box)?

____________________

lsmakc:

4 sev days.

____________________

illinoisindie:

Blumenthal, super fine analysis, I actually read and followed the whole thing, bottom line democratic party is temporarily split down the middle. The highs and lows oscillate around the same individual points for both candidates. I think pollimatic should go public... let me know the PPS (LOL) (Now back to my P-P plots)

____________________

Patrick:

This is very interesting, but again, this year is like no other year, so the national polls don't mean as much. Never in recent history have 2 candidates running for a party nomination had such polarized demographic support. In a new poll, 37% of Clinton supporters say they will not vote for Obama. And 26% of Obama supporters say they will not vote for Clinton. Even if half of those people relent and vote for the other candidate, that is still enough to tip the entire election. This is why Clinton has as good as said that she knows she has to invite Obama to be her running mate if she gets the nomination. This is the only way she or Obama can even come close to "unifying" the party and appeal to the other candidate's supporters. The Obama camp doesn't seem to be thinking that way since everyone in DC has been saying that the only way Bill Richardson would endorse Obama is if he was promised the VP slot. (The fact that the candidates don't like each other is, as it always is in these races, irrelevant). As we all know, Obama has the overwhelming support of African Americans, youth voters, and young urban professionals. Clinton has the overwhelming support of older voters, blue collar Dems, Hispanics, Asian Americans, and in some states, women and white voters. Richardson can help Obama w/ Hispanics, esp. Mexican-Americans, but probably not that significantly %wise (and having 2 mixed race candidates on a ticket could turn off some working class Americans). Obama has won more states, but they are largely "Red" states and smaller states, and often he has won because of Republican and Independent voters (esp in caucuses where the vote percentages and delegate distributions are skewed due to the small % turnout and skewed demographic participation). Clinton has won fewer states, but they are largely "Blue" states and larger states (with more electoral votes) and she has won more votes - up to 1 million more - from actual Democrats. 
Both have won some of the key "swing" states, mostly in close contests. She "won" FL and MI, but not by the DNC rules, so those delegates still need to be figured out. Not seating the delegates from 2 very big important swing states would be a death sentence for the Democratic Party. Because the presidential race is a "winner take all" state by state electoral race, the state by state polls are much more important than the national polls this year, especially in the big "Blue" and "Swing" states, and most especially FL, OH, and PA. These 3 states have decided the last several presidential elections and there is no reason to think they won't again. Some polls show that Obama could "swing" a few states such as CO from "red" last time to "blue" this time, and some show that a few 2004 "blue" states such as OR could possibly go "red" if Clinton is the nominee, but nowhere near enough to make up for losing the electoral votes of PA, OH, and FL combined. The reality is that the vast majority of states are either "Red" or "Blue" and the vast majority of African Americans in the US live in safely "Red" states (MS, SC, NC, AL, TN, etc). Current polls show Clinton running much better against McCain (winning or tying rather than losing significantly) in OH and FL. In PA, they both run about even w/ McCain in current polls (but she runs a little better in most). Remember, John Kerry won PA, but lost OH and FL. If he had won OH, he would have defeated Bush. OH is one of Clinton's strongest states. Clinton also runs much better than Obama in MA, NJ, and a few other key Dem states, but Obama would probably still win those. So as usual, OH and FL will very likely be the key to a Democrat winning the White House. And according to current polls, of the 2, only Clinton can win at least one of them. It's still early and this could change, of course, but the superdelegates have to look at the polls in these key states when they decide who to tilt the nomination to, presumably in June.
This is more important than pledged delegates or national polls.

____________________

Jon:

Excellent analysis! Thanks for writing things up so clearly. Looking at the three different results really helps give an intuitive feel for how much of the variation is random.

As somebody who's done a lot of detailed analysis in the corporate environment, it's appalling to me how little most of the "experts" understand or discuss issues like this. It is possible to get a huge amount of information from results that aren't statistically significant (for example, a few years ago we were able to accurately predict the security benefits of a set of changes to Windows from an analysis based on less than 30 data points) but it needs to be done with deep respect for and understanding of the data.

One of the things to be aware of is the possibility of shared distortions in the sampling. The results are equally consistent with a situation where (say) Obama's up 49-44 and both models and the simulation missample by underestimating the likely increased registration and turn out among younger voters -- or vice versa, of course. One of the things I'd be very interested in is how Gallup, Rasmussen, etc., have recalibrated their models during the course of the primary season.

jon

____________________

Mark Lindeman:

Ciccina, thanks for your concern -- the papers arrive tomorrow. ;) (It didn't take too long to put the sim together: it isn't very complicated. Get Excel to generate a bunch of random responses, then count 'em, then roll 'em.)

Chris G: At the risk of obviousness, it isn't just the filtering; it's also the possibility that several days will trend in the same direction by sheer dumb luck. --Your questions are interesting ones, but in the end one big event could render the whole enterprise moot, or so it seems to me. Still, some fun could be had in those directions.

____________________

great stuff. have you compiled for the states won by obama and clinton, including mi and fl for hillary the difference in electoral vote tallies?

factor in for a win in pa, kentucky, wva, oregon for hillary and playing conservatively... a win for obama in indiana and nc.

dont know about guam and pr.

dying to know.

____________________

JS:

You are adding the standard error (SE) for each mean (percentage), and coming up with an interval (at the 95% confidence level) that is twice the calculated CI (or MoE) of plus/minus 3%.

However, if you look at this as a difference of means test (same sample design), the SE is roughly the same as it is for each mean estimator (you lose one degree of freedom).

Therefore, if the test (Ho) is X (vote for BO) - X (vote for HRC) = 0, you have about the same SE as you would for testing a X for BO and HRC separately. They are not additive.

Therefore, looking at the difference between BO and HRC's estimated primary vote support should have about the same CI as that for each independently.

Since that CI is 3%, I conclude that in fact several survey results between BO and HRC are not explainable by random effects.

What am I missing?

Thanks.

JS

____________________

Why don't they just plot with error bars? It seems that would clear up the confusion and not be beyond the understanding of the layman.

____________________

Mark Lindeman:

JS: In the Gallup from 2/8 on, the correlation between Obama and Clinton estimates is -0.79. I don't think it will do to construe these as independent!

That isn't to say that all change in the Gallup and Rasmussen series is due to random sampling error. But all change in the Polimatic series is -- and you can see that the variability in margin is considerably larger than +/- 3.

____________________

Chris G:

Mark L- My main point is I think you guys are downplaying campaign dynamics too much, they can't be dismissed just because they can't be picked up in a couple of noisy time series. I agree that many pundits, amateur and expert, tend to read tea leaves, but that's one of the great things about Dr. Franklin's local regressions. they're not perfect but I think pooling data across pollsters at least gives us a clearer picture of what's going on numerically.

So if you look at the local regression plot over the past 6 weeks or so, it looks like Clinton's gained about 2 points while Obama's stayed flat. if that's an accurate reflection, then it means roughly an average of 1/3% increase in Clinton's support per week. of course you won't pick that up in one or two time series with such enormous error relative to those changes, but the changes are still significant to electoral outcomes. 2 points is huge in a tight race

it's Mark B's 1st conclusion that i think is way off:

"First, there has been far more stability than change in the national Obama-Clinton vote preference since Super Tuesday, and that includes the period of last ten days. To the extent that we have seen real changes, they are barely bigger than what we might expect by chance alone."

that simply does not follow from the simulations. the only thing that can be inferred is that if we're looking at these 2 time series alone, any meaningful changes in support are swamped by the noise. that's all we can conclude

____________________

mattn:

Just curious about something: how much does random change of the sample population itself affect all this? That is, pollimatic assumes that the only day to day change is random sampling variations of a stable underlying population. But that assumes that they are all equally accessible. Maybe if response rates were, say, 80% instead of 50% (or less?) I could believe that, but obviously that's not the case. There's a fair amount of re-weighting that occurs after the raw data comes in, and pollimatic doesn't account for that, does it?


Also, would variations in the sample population be independent? It wouldn't take much serial autocorrelation to generate some of these supposed outliers.

____________________

ALL:

Created this account just to say thank you, Mark. It's good to see election polls receive something like the level of analysis and explication that's regularly devoted to, say, baseball.

In re: Patrick's comment on the states, and on weak support amongst both candidates' supporters for the other candidate: has anyone seen a study of the correlation between preferences during the primaries and general election voting? My guess would be that there's a sudden jump in how predictive polling is once the nominees have been decided (i.e. in excess of the effect of the increasing proximity of the election itself), but I haven't seen any data.

____________________

Mark Lindeman:

Chris: Yeah, I don't think you and I are far apart. I'm not sure to what extent Mark B. is saying what he thinks about the unobserved "true dynamics," and to what extent he is characterizing the numbers we can actually see. Regardless, there are dynamics, even if two time series don't give us tons of insight into what they are. He may give those short shrift.

On the other hand, I think "more stability than change" is at least defensible (subjective, of course). The see-saw doesn't seem to have tilted very far in either direction. Two points is huge in a tight race if it translates into two percent of votes, but in this case I'm not sure what it means. It's fun to think about, at least.

mattn: "Polimatic" is indeed very simple. There's no attempt to mimic what Gallup and Rasmussen may or may not do with weights, or how day-of-week effects might influence results, or any such things. It just generates lots of random "respondents" and averages across them. It's useful as a baseline, but there's nothing realistic about it.

____________________

RS:

Great work, Mark and Mark!

I am sure if you ran enough runs with Polimatic, you could come up with a shape similar to the Daily Gallup or Daily Rasmussen... Didn't one of the polls that came out last week suggest that voters affected either way by l'affaire Wright were a wash?

As for Chris G's comments on campaign dynamics - they definitely do play a part, but it largely seems to make only minor differences either way, and largely lost in the noise. What would be important is how voters in the upcoming states feel, and again there it largely appears to be a wash: PA/WV/KY/PR for Senator Clinton, and NC/OR/SD/maybe IN for Senator Obama.

And as far as delegates go - after all, as Mark Penn reminded us as late as Feb 13, this is a delegate race - that can't be good for Senator Clinton.

____________________

JS:

Mark,


Thanks for your response to my post yesterday.

I am not sure I was clear in making my point, which was a technical statistical one, but an important one.

Your original post said that the MoE applies to the different percentages, not to the difference between them.

I debate that. The MoE of the difference in percentages is about the same as that of each individual percentage, even in the same sample. (The SE, on which the MoE is based, is slightly different. The sample SD, which is the estimator, is slightly different since the percentages are different, but only slightly; and you lose one degree of freedom.)

Look at this example. If you test Ho: X (Obama) = 0 the SE of the sampling distribution would be SE (X (Obama)). If you test Ho: X (Clinton) = 0 the SE of the sampling distribution would be SE (X (Clinton)). And SE (X (Obama)) = SE (X (Clinton)), or more precisely, is mathematically close. (See above.)

Further, and this is my point: If you test Ho: X(Obama) - X(Clinton) = 0, the SE is about the same as either of the two individual tests.

Therefore, the MoE for comparing Clinton and Obama's percentages is not double the MoE for each individually. It is a separate calculation, which yields, in this case, an almost identical MoE.

As to your point that it cannot be argued that these are separate samples, I agree. They are the same sample. Nevertheless, a t-test for a difference of means in the same sample (as opposed to a paired difference of means t-test), is about identical to the test for each mean individually.

I would be happy to send the equations, but it is hard to do in text.

This seems to be a common error in evaluating the difference in two candidates' poll results. Typically, as I believe you did, the argument is that each percentage's outer-bound MoE must lie outside the outer-bound of the other's MoE, in effect doubling the MoE to get a statistically significant result. What should be done is to calculate the MoE of the difference in percentages, which is based on a SE distribution which will be almost identical to each SE individually.

Again, the practical implication is that there are more statistically significant differences than you are reporting by your method.

This is very dry statistics, but the resulting point is important.

Perhaps I am missing something in your argument. Let me know. I am interested in your evaluation of the above.

And thanks for your work.

JS

____________________

Parviziyi:

Mark Lindeman was right when he said "a lot of the movement in the Gallup and Rasmussen lines can be attributed to noise, but a lot can't be".

Both Gallup and Rasmussen show that Obama started going down (and Clinton up) at the outbreak of the NAFTA-gate scandal on Feb 29, and continued going down for a week. That story went away then and Obama creeped up again for a week, until the outbreak of the Jeremiah Wright scandal on March 15, causing his numbers to go down again for a week.

Obama's support is more adversely affected by bad news than Clinton's because he's relatively unknown or 'unvetted', as Clinton calls it, and because his image has had (still has) a sort of halo of virtue around it. Anything that takes some of the sheen off that halo will drive down his numbers. Clinton on the other hand has the image of an old dog politician extensively vetted but known to have unscrupulously manipulative tendencies, and she is relatively unaffected by bad news stories such as her "Bosnian sniper fire" scandal.
***********************
Separately, JS is mistaken because of the large, negative correlation (-0.79) between the random noise in Clinton's series and the random noise in Obama's series. Negative correlation means the subtraction (Clinton - Obama) is additive, not cancellative and not effectless, for the noise.

____________________


