Pollster.com


Lunchtime Status Update for 9-19

Topics: 2008, Charts, Map, Status Update, Trend lines

Another day, another 37 new statewide polls (as of this writing) logged into the Pollster.com database. Today's batch moved several states into the toss-up column. Specifically, the new Big Ten polls in Minnesota, Wisconsin, and Pennsylvania helped tip the balance to move those states from lean Obama to toss-up. Two new polls in New Jersey helped move that state from strong to lean Obama.

When we returned from the Republican convention, our map classifications showed Obama with 260 electoral votes, McCain with 179 and 99 in the toss-up category. Since then, Obama's total has dropped to 202, McCain's has grown to 208, and the electoral vote total of the states currently rated as toss-ups has swelled to 128.

As of today the trendline for the latest national surveys shows a modest rebound for Obama over the last week. Our current national estimate shows Obama leading by just over two percentage points (47.2% to 44.9%). About a week ago, McCain had moved slightly ahead.

No such trend is obvious yet in the state trends, which look as much like a dead heat as they have since we started running them this summer. But keep in mind that our state-level classifications are based on state-level polling only, and the trend lines in our charts and the estimates they produce are inherently conservative, in that they require more than one new poll before the trendline moves significantly. As such, our current electoral vote count probably reflects where the national trends were about a week ago.

The way we classify states probably deserves some explanation, since our traffic has grown considerably and, if the questions in my inbox are any guide, many of you assume that our classifications are subjective (as they are on many other sites). To be clear: Our process is entirely empirical and automated. We input new polls and the system draws loess regression trendlines for each state. The end-point of the trendline serves as our estimate (analogous to the "averages" you see on other poll aggregating web sites). We then calculate confidence intervals (margin of error) around each estimate based on the average sample size for the polls in each state (more details in our FAQ).

Thus, the classification is automated and depends entirely on the size of the margin separating the candidates. The classifications will sometimes be slightly inconsistent from state to state, because the polls in some states (such as Pennsylvania and Ohio) use bigger sample sizes on average than others (such as West Virginia and Vermont).
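For readers who want to see the mechanics, here is a minimal R sketch of the procedure described above. The column names, the 0.98/sqrt(n) margin-of-error convention, and the strong/lean cutoff are illustrative assumptions for the sketch, not the actual Pollster code.

## Minimal sketch of the automated classification (illustrative only)
classify_state <- function(polls) {
  # polls: data frame with columns date (Date), obama, mccain (percent), n (sample size)
  polls$t      <- as.numeric(polls$date)
  polls$margin <- polls$obama - polls$mccain                     # Obama minus McCain, in points
  fit <- loess(margin ~ t, data = polls)                         # trendline through the polls
  est <- predict(fit, newdata = data.frame(t = max(polls$t)))    # end-point = current estimate
  moe <- 100 * 0.98 / sqrt(mean(polls$n))                        # MOE from the average sample size
  leader <- if (est > 0) "Obama" else "McCain"
  status <- if (abs(est) < moe) "toss-up" else
            if (abs(est) < 2 * moe) paste("lean", leader) else   # strong/lean cutoff is a guess
            paste("strong", leader)
  list(estimate = est, moe = moe, status = status)
}

With 800-person polls, for example, the margin of error in this sketch works out to about 3.5 points, so a 2-point lead would land a state in the toss-up column.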

Also, as discussed earlier in the week, the trend estimates in smaller states with fewer available polls tend to be more sensitive to the latest new polls. As such, changes in classification in states like West Virginia and Montana (where polls are rare) may occur on the basis of fewer new polls than in states like Ohio or Pennsylvania (where they are far more frequent).

 

Comments
adam:

Why was the percentage of African-American respondents so low for the Big Ten polls yesterday?

That obviously skewed the results towards McCain. You had something like 50% under-representation in PA. Unless Dr. Franklin weighted for demographics (and I found no evidence that he did), you will severely underestimate Obama's support.

____________________

joelspolls:

Mark,

Did you notice that if you assign each toss-up to whoever is currently ahead there and add them into the electoral vote count, it is an absolute tie: 269 to 269?

I also want to note that the Big 10 polls make the very unfortunate mistake of placing the vote preference question pretty far into their survey, after banks of questions on a number of topics that might influence the vote choice. This is, in my opinion, the biggest mistake an election poll can make and I am especially shocked that they did it this way given that Charles was involved. Can you comment on that, as well as on which other firms are doing it that way? Thanks!

-- Joel

____________________

jme:

Will Mark or Charles please, PLEASE actually specify the loess model they are using?

span = ?
degree = ?

Please! Fellow R geeks want to know! (He says, presuming by the look of Charles' graphics that they must be coming from R)

____________________

Michael McDonald:

Charles and Mark,

Can you elaborate on how you construct your 95% confidence intervals? You state that you are averaging sample sizes, so if 10 polls come out with a 4-point lead, all with a MoE of +/-4%, you would call that a tossup state with a 4-point lead and a MoE of +/-4%, despite the added strength of ten polls all finding the same margin. Is this correct? I thought you were doing something more sophisticated.

And let me chime in with jme: releasing an R module and an on-line dataset would be nice for transparency, and it would let you leverage the R community to improve/tweak the algorithm. I'm sure you have plenty of time available before the election :-)

____________________

Alan Abramowitz:

The samples for the Big 10 polls in PA and MI and, to a lesser extent, OH seriously under-represent African American voters. A 5% AA sample for PA is way, way off the mark. Given the extremely strong support for Obama among AA voters, this is going to produce a misleading result. The PA poll also appears to be too Republican by far, showing an almost even split in party id when party registration in PA now favors Dems by 13 points. Party id and registration are not the same thing, of course, but such a large disparity between them is highly unlikely. In my opinion, these polls should not even be included in the averages.

____________________

Wow, it looks like Fox News has been at it again with Photoshop and the scary Obama photo.

____________________

jme:

Yes yes, what Michael McDonald said: in principle doing this would be super simple, just post links to the .csv or whatever holding the polling data and another link to the text file with the relevant R code Charles is using.

Of course, that probably wrecks pollster.com as a _business_ venture (which I assume it is at some level). All I really want is a complete statistical specification of the loess curve and the CI calculation; I can do the R work myself.

____________________

RS:

Here's Mark B.'s response to Alan Abramowitz:
"Individual polls do not influence the estimates too much, unless there are too few polls."
[Just to be clear, I imagined that, and didn't hack Abramowitz's e-mail account ;-) ]

But seriously, I think we deserve a clarification from Professor Franklin on the skewed demographics of the Big Ten polls. I mean, it's fine to say that the Big Ten states will decide the election (though I'd disagree and point to VA/CO), but a founder of Pollster.com needs to be better at polling... No?
I don't see a post-9/15 post on PoliticalArithmetik, either...

____________________

RS:

Re my post @2:53 PM:
I should clarify - a founder of Pollster.com needs to be more transparent when polling. I usually find Professor Franklin's posts insightful.

____________________

My fellow Americans:

I am fallaciously inebriated by my latest approval polls, which show me above 30%!

Heckuva job, me!

Turd Blossom was right when he calculatized that by not any having press conferences, public appearances, and otherwise not showing my face outside West Wing family quarters, it would popularize me.

I look forward to not seeing anyone until January 20, 2009, and I'm sure that my countrymen feel the same way.

____________________

Michael McDonald:

The more I think about this, the more I am troubled by it. Mark, are you really averaging sample sizes to construct the MoE? This would seem to give more weight to a small-sample "outlier" poll that would then bring a state into the toss-up category, like, say, the Big 10 PA poll. Ideally, we would want to discount smaller sample-sized polls, not give them equal weight.

Btw, I am not suggesting that the Big 10 PA poll is completely off-base, though Alan raises some serious issues. McCain has spent $1.6 million in the past week to Obama's $0.9 million, which should move the needle in normal situations, except that we're seeing a large swing towards Obama in the national poll numbers. Indeed, it seems a little odd that the Obama-leaning battleground states would be tightening when the national numbers are moving in his direction. This tightening seems more artificial than real, based on a few small-sample outlier polls.

____________________

Michael McDonald:

So I know that I may be having this conversation with myself, but I am hoping that a few of the other academics who frequent here will chime in.

I'm concerned that, now that we're going to start getting a spate of small-sample state polls, more states are going to move into toss-up status simply because their MoEs are larger.

Mark says that they average MoEs. I actually think that it is more complicated than that. I imagine Mark and Charles view each poll as an independent normal distribution and are applying standard formulas that permit one to add the variances of independent normal distributions. Thus, I suspect they divide each MoE by 1.96, square it, add them together, take the square root of the sum, and multiply the result by 1.96 to get the overall MoE for a series of polls (there must be some cutoff date to include polls into this calculation).

If this is the case, then a poll with a smaller sample, and a higher MoE, even if it exactly confirms the trend line, can move a state into tossup status simply by virtue of its larger MoE, which will "average" into the composite MoE.

This seems wrong and counterintuitive: if a series of ten polls all show a 4-point lead with a MoE of +/-4, the state will be considered a tossup, because the average MoE using the standard formula will still be +/-4, despite the fact that not a single one of the ten polls shows a lead in the other direction. There has got to be a better way.

Yet, this is exactly what is happening at Pollster.com as states are suddenly swinging into the tossup category. We're getting new polls with sample sizes of 500 and less from academic survey units and survey houses with "battleground polls" that cover many states, but do so with small state-level sample sizes.

My point is that some of these states are not actually getting closer; the polling is getting crummier (in the sense that we're getting lots of small sample-sized polls).

____________________

jme:

I suppose I qualify as an academic (in stats, not poli sci) and no, you are not having this conversation with yourself, Michael.

Although all I have to contribute at the moment, off the top of my head, is that I completely agree with you.

It seems nonsensical to say that if we have 10 polls showing a candidate with a +2 lead (within a +/-4 MoE), we are _truly_ uncertain about who is currently in the lead (just pulling that hypothetical out of my head).

Quite frankly, if all we're talking about is _coloring_ the stupid states on the map, then I might just ditch CI's from the loess fit altogether. Instead, why not do some sort of binomial CI on the proportion of recent polls with a particular candidate in the lead (by any margin)?

The idea being that if the state is a toss-up, we'd expect roughly half the polls to show each candidate with a lead. So we'd color the states based on how confident we are that this proportion is different from 0.5.

Maybe include the sample sizes as a weighting factor in the process?
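For concreteness, a minimal sketch of what that check could look like in R; the split of recent leads is hypothetical, and this version ignores the sample-size weighting question.

## A sign-test style check on who leads in recent polls (hypothetical numbers)
recent_leader <- c(rep("Obama", 13), rep("McCain", 2))   # leader in each of the last 15 polls
bt <- binom.test(sum(recent_leader == "Obama"), length(recent_leader), p = 0.5)
bt$p.value    # a small p-value argues against calling the state a genuine toss-up
bt$conf.int   # binomial CI for the proportion of polls showing an Obama lead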

____________________

RS:

Pollster.com FAQ says: "we calculate a "confidence intervals" around the trend estimate based on the _average sample size_ for the available polls in each state" (emphasis mine)

That tells me that if the polls have sample sizes of 500, 800 & 1100, the MOE is based on an average sample size of 800 - not an average of MOEs. Not sure what Michael McDonald or jme are talking about, but then I am not a stats PhD!

If the polls were just a simple average a la RCP, a CI or MOE can probably be calculated by pooling the samples together, to get a larger sample size and thus reducing the MOE. But presumably that's harder with the Loess trendlines Pollster.com uses?

____________________

jme:

Our point (I think), RS, is that common sense says that when we aggregate the results of multiple polls, we should be _more_ confident in our estimates (i.e. the CI's should be narrower than for any single poll).

But if pollster uses the average sample size to construct its CI's, then as we accumulate lots of state polls, which may tend to have fairly small sample sizes, not only will the CI's not get narrower, they might actually get _wider_ if the new polls _lower_ the average sample size for polls in that state.

I have to agree with Michael that the method described on the pollster FAQ seems quite wrong to me.

Pooling the estimates is possible, but raises its own set of problems.

____________________

RS:

@jme:
That's true.
But I think Michael McDonald is effectively viewing the cumulative MOE as something to be propagated as a sum of squares; his procedure comes down to:
0.98*sqrt(1/n1 + 1/n2) (using the 0.98/sqrt(n) formulation)
So for sample sizes of 300 and 500, this gives an MOE of 7.1% - obviously bigger than either sample alone.

But the Pollster.com method says:
cum. MOE = 0.98/sqrt(average(n1,n2)) = 4.9% for 300 & 500 sample sizes.

The latter obviously gives a tighter MOE; but it definitely does not account for multiple polls. Maybe we need a simple t-test - though that would not account for sample sizes. Maybe you have an idea for weighting the polls by the sample sizes?
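For concreteness, the two calculations written out in R with the 300/500 example above; the pooled version is included only for comparison with the simple-average idea RS mentions a few comments up.

## The two MOE calculations above, with n1 = 300 and n2 = 500
n1 <- 300; n2 <- 500
0.98 * sqrt(1/n1 + 1/n2)        # sum-of-squares propagation: ~0.071, about 7.1 points
0.98 / sqrt(mean(c(n1, n2)))    # average-sample-size method:  ~0.049, about 4.9 points
0.98 / sqrt(n1 + n2)            # pooling the samples instead: ~0.035, about 3.5 points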

____________________

Mark Blumenthal:

All: Apologies for just now seeing this thread. It's been one of those days...one of those weeks, really.

Short answer on the confidence intervals is that RS is reading it right: We take the average n-size across all surveys for a given state and calculate confidence intervals based on that average n. As such, it is unlikely that a few recent polls with smaller sample sizes are making much difference, although later tonight (if I can stay awake), I'll take a look at how those statistics have changed over the last week or two and post something here.

Having said that, Mike is right that we are probably understating the statistical confidence we could have given the number of interviews driving the end-point estimate, although our intention was to err on the side of a conservative (small-c) estimate of leaders.

I'm going to have to save the longer answer for later, but I'd like to keep this thread going, as this discussion -- and your collective expertise -- is very helpful and much appreciated. We struggled with this issue mightily back in June, and are not opposed to tinkering with it now (development resources allowing).

Right now, I'm way overdue for dinner... will add more thoughts later. And yes, it's R that runs the graphics, although Franklin is really the one to speak to span, degree, etc.

As for the Big Ten surveys, I'm hoping to write something about them, but I can't speak at all for Franklin's design or weighting as I've had no involvement in conducting the surveys.

____________________

RS:

MarkB:
Thanks for clarifying the cum. MOE.

"our intention was to err on the side of a conservative (small-c) estimate of leaders" - Nice save :-)

As for the Big Ten polls - yes, I am really hoping Franklin writes a post, as he's most familiar with the surveys. Of course, it'd be good to get your insight as well!

____________________

Mark Lindeman:

@Mike: Sorry to miss the excitement -- I was off staring at paint that is supposed to go on our front door. w00t.

I remember Mark B. struggling with this issue back in late July. He was doing something subtler then, but commenters were getting confused, so he reasoned that this approach would be easier to explain. Now that polls are starting to rush in, I think the time is ripe for a subtler approach that gives recent polls more weight -- so that four small but very fresh polls are better, not worse, than four large but stale polls. It should be somewhat conservative.

____________________

jme:

Ok, after pondering this for a grand total of like 20 minutes, this is my two cents:

Option 1: Pool the sample sizes. In other words, the final loess estimate is a weighted average (based on time, and presumably sample size as well), so the final estimates should be considered as a single poll with sample size equal to the sum of the sample sizes of the polls that constituted the weighted average.

Option 2: Bear with me here... construct 1000 (say) bootstrapped loess curves (resampling residuals of the original loess fit, but localized, so that the 1000 bootstrap replicates for observation i resample only the residuals in the relevant loess window; maybe this, or something like it, is standard practice in bootstrapping nonparametric regression?). Then just look at the proportion of times one candidate is ahead in the final loess estimate, and use binomial CI's to assess the lean of the state.

There are obvious problems with both (I think), but that's the best I could come up with quickly.

Now that I've put myself out there, maybe I'll actually go back and reread the discussion on this topic from back in July. ;)
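A rough sketch of what Option 2 might look like in R. It resamples residuals globally rather than within each loess window, which is a simplification of the localized scheme described above, and the data-frame layout (date, obama, mccain, n) is an assumption of the sketch.

## Rough sketch of Option 2: bootstrap the loess end-point, then count leads
bootstrap_lead <- function(polls, B = 1000) {
  polls$t      <- as.numeric(polls$date)
  polls$margin <- polls$obama - polls$mccain
  fit  <- loess(margin ~ t, data = polls)
  fhat <- predict(fit)                        # fitted trend at each poll
  res  <- residuals(fit)
  end  <- data.frame(t = max(polls$t))
  ends <- replicate(B, {
    boot        <- polls
    boot$margin <- fhat + sample(res, replace = TRUE)   # global residual resampling
    predict(loess(margin ~ t, data = boot), newdata = end)
  })
  mean(ends > 0)   # share of replicates with Obama ahead at the end-point
}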

____________________

jme:

By the way, for those curious, I think the relevant post by Mark from July is:

/blogs/housekeeping_our_classificatio.html

The sort of "squirrelly" results he noticed then are happening again. See, for example, Montana, West Virginia and Indiana being colored yellow.

The issue isn't so much the loess estimate, but the coarseness of the coloring scheme, in my opinion. I agree that the loess lines give the best picture; the problem is that the red/blue/yellow coloring somewhat undercuts/contradicts the loess picture.

Anyway...back to dinner...

____________________

Michael McDonald:

I'm back from dinner myself. I'm glad for a little clarification from Mark and the thoughts from the other posters (Hurrah! We've found a thread to talk about estimation issues!). There are actually two sources of variability to consider: the variability from the MoE within a poll and the observed variability across the polls themselves. I'd like very much to see exactly what formula Mark and Charles are employing, which is another good reason to get the R code, or whatever.

This sudden emergence of many tossup states may be caused by small-n outliers messing with the trend estimate and MoE. My intuition is telling me that this may be further compounded by the whip-sawing of the two convention bounces and now the economic meltdown, making the loess regression trendline particularly unstable at the moment.

Mark L.: my complaint is not necessarily about new vs. old polls, rather that the small-sample polls are starting to dilute the information contained in the concurrent large-sample polls.

____________________

Alan Abramowitz writes above:

"The samples for the Big 10 polls in PA and MI and, to a lesser extent, OH seriously under-represent African American voters. A 5% AA sample for PA is way, way off the mark. Given the extremely strong support for Obama among AA voters, this is going to produce a misleading result. The PA poll also appears to be too Republican by far, showing an almost even split in party id when party registration in PA now favors Dems by 13 points. Party id and registration are not the same thing, of course, but such a large disparity between them is highly unlikely. In my opinion, these polls should not even be included in the averages."

Perhaps a moment to look at the actual results of the polls would be helpful. For the states Alan singles out:

PA: Trend estimate is +3.2 for Obama. Big Ten estimate is even. Within MOE.
MI: Trend estimate is +3.8. Big Ten estimate is +4.0. Within MOE.
OH: Trend estimate is -2.1. Big Ten estimate is +1. Within MOE.

The other states are

IL: Trend estimate is +11.9. Big Ten estimate is 16. Within MOE.
IN: Trend estimate is -3.0. Big Ten estimate is -4. Within MOE.
IA: Trend estimate is +8.8. Big Ten estimate is even. OUTSIDE MOE, an Outlier.
MN: Trend estimate is +3.5. Big Ten estimate is +2. Within MOE.
WI: Trend estimate is +3.1. Big Ten estimate is +1. Within MOE.

US: Trend estimate is +2.0. Big Ten National estimate is +1. Within MOE.

So what's the issue here? Subgroups have large random variation. A group that is, say, 15% of a 600-person sample has an expected size of 90, or a MOE for that group of 10.5%. So anything between 4.5% and 25.5% would not be implausible. That is Sampling 101. Given that, it isn't surprising to miss some demographics by quite a lot even in a "perfect" random sample. "Random sampling" also means "random variation", NOT a perfect match to every demographic group in a population.
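For reference, the arithmetic behind those figures, using the rough 1/sqrt(n) margin that the 10.5% number implies:

## The subgroup arithmetic above
n_group <- 0.15 * 600                 # expected subgroup size: 90
1 / sqrt(n_group)                     # ~0.105, the roughly 10.5% margin cited above
0.15 + c(-1, 1) / sqrt(n_group)       # ~0.045 to ~0.255, i.e. the 4.5% to 25.5% range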

I'm not happy about the IA outlier. But I'm confident our methods were appropriate and followed industry standard practices. If we drew an outlier so be it. Outliers happen. That's been a theme here at Pollster and at Political Arithmetik for several years. I'm no more immune to outliers than anyone else, and when we produce an outlier I expect to take my lumps for that just as I give the lumps to other outliers. But Alan's claim that "these polls should not even be included in the averages" is wildly out of line with the actual results of the 9 polls we conducted.

Charles

____________________

krog:

Some weighting, proportional to sample size, would give less importance to small sample outliers.

Isn't that the gist of the issue?

But Franklin can't be faulted for using customary post hoc analytic tools.

____________________

Wow-- lots of geeks reading today! Thanks!

Choice of smoothing span is more art than I wish it were. I've used "inter-ocular" tests, and I've used Generalized Cross Validation and AIC and BIC. None seem to produce satisfying results in every case, especially with a dynamic process such as polling, which is both dynamic over time and has a number of cases that is constantly growing.

But since you asked: (The preview isn't showing this right but I hope it will post correctly. If not, see subsequent comment where I'll try again.)

## POLLSTER Smoothing function
# Return smoothing parameter given a number of polls
# (approximate reconstruction: the assignment and comparison operators were stripped
#  when this was posted; the .7 cap is original, the 15-poll window is inferred
#  from the discussion below)
gsmoothfrac <- function(npolls){
  smoothfrac <- 15/npolls
  smoothfrac <- ifelse(smoothfrac > .7, .7, smoothfrac)
  return(smoothfrac)
}


Recently I've been experimenting with setting a floor on the degree of smoothing as well:

# Default smoothing--MINE, used for small N but not identical to pollster
# (approximate reconstruction; the .85 cap is original, the floor value is a placeholder)
gsmoothfrac2 <- function(npolls){
  smoothfrac <- 15/npolls
  smoothfrac <- ifelse(smoothfrac > .85, .85, smoothfrac)
  smoothfrac <- ifelse(smoothfrac < .25, .25, smoothfrac)  # floor (placeholder value)
  smoothfrac
}


The main issue is choice of npolls and whether to set a top or bottom for smoothing. With few polls, I've found the .85 ceiling helps avoid problems when only a handful of polls are available (really we should wait for more data, but people want to see the trends ...). Likewise I've been concerned that as the number of polls gets really large the npolls=15 is too small and produces too much noise, so I've set a floor to the smoothing.

See Loader's book on smoothing for more discussion of criteria for setting the span, but the bottom line is that a wide range of spans give "acceptable" results, and that numerical evaluation is often quite flat across the range to be optimized, so it is hard to find a numerical optimum.
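For context, here is roughly how a span function like the one above would plug into loess(); the simulated data and degree = 1 are placeholders for the sketch rather than the actual Pollster settings.

## Illustrative only: simulated polls run through loess() with a data-driven span
set.seed(1)
polls <- data.frame(t      = 1:40,                              # days, fake
                    margin = rnorm(40, mean = 3, sd = 4),       # Obama minus McCain, fake
                    n      = sample(500:1100, 40, replace = TRUE))
fit <- loess(margin ~ t, data = polls,
             span   = gsmoothfrac(nrow(polls)),   # window shrinks as polls accumulate
             degree = 1)                          # degree is a guess; see jme's question above
tail(predict(fit), 1)                             # end-point of the trendline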

Charles

____________________

joelspolls:

Charles,

Thanks for weighing in, but it doesn't look like you saw my question about the Big 10 polls. You guys ask the vote question after a bunch of other stuff, which inevitably biases the vote question by activating numerous considerations among the respondents that otherwise may not have been activated. This is survey design 101 stuff -- if you have a key question that you want to get a clean read on, you have to ask it first. Could you please comment on that? Thanks!

-- Joel

____________________

The smoothing code I tried to post isn't working for reasons beyond my poor understanding of html. Sorry about that. I'll consult a guru and see if I can get it to go up.

Charles

____________________

slinky:

I just want to add my voice to the growing sentiment that the R code be released with the span and degree constants. I also think that hackers here could improve the regression by weighting, bootstrapping, jackknifing, and other simulation methods. In addition, I'm convinced that pollster could play an important role in reassessing the future of the electoral college, which most of my professional friends think is a bad 18th-century joke, as compared with what an excellent electoral system could be.

____________________

Joel--

Sorry-- I'm still catching up but since you just posted you get to jump ahead in the queue.

It may be Survey Design 101 stuff, but it is not without differing opinions.

We open with registration, likelihood of voting, interest and trust. Then we do direction of country and state, economy and then fav/unfav about the four Pres/VP candidates plus President Bush and Sen. Clinton. We then do vote.

Some pollsters go directly to vote, then ask everything else, taking Joel's advice. That theory is you want what people say they will do before they think at all about politics.

Other pollsters, including us, ask a few questions before the vote. The view here is that WHEN people actually vote, presumably they ARE thinking about politics. Does anyone believe that people will go to the polls without some notion of the direction of the country, the economy, and the candidates? To ask the vote question after that is to put the voter in the frame of mind they are apt to have at the polling place.

A few pollsters even put the vote question very late in the survey, so ALL the political considerations are activated first. That's an extreme case of getting the respondent to think about issues facing the country before saying how they will vote. Perhaps it produces a more considered opinion.

I don't think there is a clear empirical answer to which of these is best. Our approach is very common in surveys, so I don't think we are in any way unusual in this. But I think you could legitimately wonder if we should have asked about something other than direction and the economy, both of which are currently bad news for President Bush, and perhaps for Sen. McCain. Fair point, but again we aren't unusual in following this order.

My view is that getting respondents to think about the candidates first, with the fav/unfav items (which appear in random order, by the way) is a good thing because it asks for some brief consideration that is balanced by party and by office (plus Bush and Clinton, also balanced by party.)

Order effects are interesting, and given enough resources and respondents it would be fun to play with alternative placement of items to see what effects they produce. But common practice is to put vote early but not first, as we do.

Charles

____________________

Well, Michael may not be talking to himself, but it looks like about five of us are really not having much of a social life.

The MOE issue boils down to a question of whether you are measuring the CI of your ESTIMATOR or the variability to be expected of polls.

Initially, I calculated the CI around the loess estimator. That is, as several have noted, a good deal tighter than the distribution of polls (duh!) and made me initially very happy.

But then the polls were obviously much more variable than that estimate. Duh again.

So I thought about this for a while in light of our purpose here. If we were in the business of election forecasting, I would stick to the CI around the estimator. But the business we are actually in is explaining polls and how they vary. So it seemed to me on reflection that we should make the best estimate we could of where the race "really" was, as the trend estimator. But that we should calculate a MOE around that line based on the distribution of polls we would expect given an average sample size. Hence the CI we calculate and use.
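To make the distinction concrete, here is a sketch of the two kinds of interval computed from the same fit; the data-frame layout and the 0.98/sqrt(n) convention follow the earlier examples in this thread, and this is an editorial illustration rather than the actual Pollster code.

## The two kinds of interval from one loess fit (a sketch)
both_intervals <- function(polls) {
  polls$t      <- as.numeric(polls$date)
  polls$margin <- polls$obama - polls$mccain
  fit <- loess(margin ~ t, data = polls)
  pr  <- predict(fit, newdata = data.frame(t = max(polls$t)), se = TRUE)
  list(estimator_ci = pr$fit + c(-1.96, 1.96) * pr$se.fit,                   # tight: CI of the estimator
       poll_band    = pr$fit + c(-1, 1) * 100 * 0.98 / sqrt(mean(polls$n)))  # wide: expected poll spread
}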

I know that this means our estimates of who is ahead in each state have too wide an interval. That means we could claim more certainty for our map classification than we actually do. And I know that will annoy some of you (all of you?)

But I also think about the fallibility of models and future campaign dynamics and the link between polls and outcomes. Taking those into account, I decided we should take the wider interval and be conservative (small-c!).

Do consider the point: we are trying to provide a fair view of polling. If we used the CI for the estimator, a LARGE number of polls would fall outside that CI. How would you explain that 60% of polls were falling outside the 95% confidence interval? It was this that Mark alludes to above-- we puzzled far more readers than we enlightened. And what they were puzzled about was legitimate-- what variability in polls should we expect to see around the best estimate of the trend?

I feel good about the choice we made. I think it is fair to pollsters, and fair to readers to see the best trend estimate we can do. I also think to claim the CI around the estimate is a good prediction of future election outcomes is a considerable error (not that I see anyone above saying exactly that.) We don't HAVE a forecasting model here-- we have a model that characterizes poll movements, estimates the trend, shows the variability, and gives a range of reasonable variation around those trends for the observed polls.

The map is the problematic part of this. We try to be clear that we are characterizing the current state of polling, not predicting the election. But you can make a good argument that the coloring is too driven by the polling variability and not our best estimate of the CI around the trend. I get that, but I think consistency across charts and polls, and the uncertainty about what happens between now and November, justify it.

I look forward to seeing what you folks think about this tradeoff.

Thanks for the conversation.

Charles

____________________

Mark tells me that the comments section chokes on less than signs and (apparently) curly brackets. So that really messes up R code!

Try this link to the code:

https://mywebspace.wisc.edu/chfrankl/web/GsmoothFrac.R?uniq=-gon8tw

Charles

____________________

RS:

Charles:
Thanks for the explanations. Back to the Big Ten polls, you ask: "So what's the issue here?"

I just realized this cuts across both your roles: as a pollster and on Pollster :-)

The issue here is that - based apparently on just the Big Ten polls - MN, WI and PA have all gone yellow. As Alan Abramowitz and others have pointed out, the demographics are skewed - apparently mostly against Obama (cue Obamabot refrains). Granted, as you say, these variations match the 2004 exit polls to within the MOE for the subgroup.

But the potential variation in the demographic composition is not reflected in the final MOE - otherwise the overall MOE would be very large (propagating the error on subgroups through to the final composition as sums-of-squares, for instance), making the poll useless.

I don't know how SUSA or PPP does it, come to think of it - but their demographic compositions are not usually obviously off (I am not saying Big Ten is off!) I think PPP fixes their demographics to expectations; this helped them in the WI democratic primary:
http://www.publicpolicypolling.com/pdf/PPP_Wisconsin_Release_021808.pdf
(via the Pollster.com archives!)

So as I said, this cuts across both your roles (from my POV!) One answer is that the MN, WI and PA margins were close to the trend MOE, and the Big Ten tipped them yellow. Perhaps a tighter MOE is required for classification, or some sort of a t-test (there could be something more complicated like jme's 1000 bootstraps, but I only go so far with statistics!)
Another potential answer is for the pollster to provide, a la PPP-WI-Dem, the results as you found them, and the results if the demographics matched "expectations."

____________________

Loyal:

Okay, I'm a cool guy and not a geek so I hope it's ok to comment here :). Or, as they say in radio, long-time reader, first-time poster.

I am not a pollster, my statistical skills are spotty, and I'm a SAS guy not able to read or write R. So maybe I'm really going to be off with this. But there are some methods used in meta-analysis of health care research that I could see being useful here. Considering a relative risk of who the poll supports as the outcome, and the inverse of the variance as the weight, one could estimate effect size in that fashion.

But getting to what I understood Charles to say about understanding the variability of the polls, couldn't you perform a random effects regression with some of the predictor variables being things like percent AA, percent Republican, etc., and by doing so come to understand to what extent the different results can be explained by these factors?

Serious question since I am used to doing my analyses in other venues than polling.

Thank you in advance.
Loyal

____________________

Mike Drew:

This discussion is miles and miles over my head (graduate of UW-Madison PoliSci though I am, unfortunately!), so I need to ask this in the most layman's terms imaginable: What about turnout?

I guess it's a general polling question, but I don't see how the universal assumption that turnout numbers will be completely unlike recent elections is reflected in these or other polls. Doesn't that X factor make polls' real predictive value this year questionable?

____________________

RS:

@Mike Drew:
See the PPP-WI poll I linked to in my post @12:28 AM. Still, for Obama partisans, I'd suggest working as if conventional turn-out models apply, so any increased turnout of favorable demographics will increase the likelihood of an Obama victory. If you are a McCain fan, forget what I said ;-)
"graduate of UW-Madison PoliSci though I am, unfortunately" - unfortunate to be a graduate of UW-Madison, or unfortunate that the discussion is over your head? Either way, Professor Franklin won't be happy ;-)

____________________

Mark Lindeman:

@Michael McD.: yeah, I didn't mean to imply that time was your particular concern -- just thinking out loud. I'm not sure what jme had in mind about bootstraps, but I was lying awake thinking about replicating each poll based on sample size (maybe with some extra noise) and repeatedly drawing the loess through the cloud -- which may be the same idea, or else similar in spirit.

@Loyal: Your last idea is good, but gets tricky because there is so much variability in disclosure -- not to mention timing. Basically, there are more unknowns than data points. Still, to get some sense of how (e.g.) %AA affects the results is very feasible. (I haven't done enough with relative risk to think whether that's a good approach -- I tend to like the measures I'm used to.)

@Mike Drew: I don't know/remember if you lived through the similar debate in 2004 -- we probably all should start brushing up on Mark B.'s posts now! -- but many LV models don't constrain turnout (so, if interest is higher, the implicit turnout projections will go up). While many people assume that higher turnout inherently favors Democratic candidates, of course it all depends on who turns out and why. Being wrong about turnout is not necessarily a problem; constraining the demographics to match past elections would be. But I'm thinking generally and prior to coffee. I haven't stared too much at recent poll internals, and I think it would have been too soon anyway, esp right after the conventions.

____________________

jme:

Sheesh, shouldn't have gone to bed! Look at all the excitement I missed.

Thanks for the technical details, Charles, us geeks love it!

Charles summed things up pretty well I think:

"The map is the problematic part of this. We try to be clear that we are characterizing the current state of polling, not the election prediction. But you can make a good argument that the coloring is too driven by the polling variability and not our best estimate of the CI around the trend. I get that, but think consistency across charts and polls, and the uncertainty about what happens between now and November is justified."

Indeed, the map is the problem. In my view, the map is simply a _summary_ of the more detailed (and accurate) picture painted by the scatterplots and loess curves. Of course, when you summarize fairly rich data like that, some losses are inevitable.

I take your point Charles that the coloring can include uncertainty about what _may_ happen in the future, but I think that's too subtle for your average viewer. I prefer to view the decision to color states as being tied to the state of things _right now_, which is consistent with your general philosophy of not predicting the future (a la 538).

I don't think that my worries about the coloring stem from me being less _conservative_ (small c!), it comes from my concerns about _consistency_! (small c!). As I pointed out above, when I look at the states MT, WV and IN and see that they are yellow, I think, "Wow, those states are toss ups!". And then I click through and look at the plots and realize that no, none of them are really toss-ups, they're just kinda close at the moment.

In other words, the data on the colored map should be _consistent_ with the data on the individual state plots, and I don't think that's happening right now.

I'm still curious how Charles/Mark would respond to my hypothetical from way above, or we could take WI as an example. Doesn't it seem _excessively_ conservative to call WI a toss up when we've had, let's see, 15 consecutive polls that show Obama in the lead? Certainly his lead is small, but (ignoring the time factor for a moment) wouldn't the standard repeated sampling interpretation of a CI lead us to suspect that if the state really were a "toss up" that McCain would show a lead in around half of those polls? (He says, not bothering to do the actual calculations...) Indeed, even if Obama's true lead were only +2, the probability of 15 polls in a row seems awfully small (again, I'm ignoring the time component here).

That's my concern: that the trend line CI's are _so_ conservative that they ignore important information that is actually being displayed beautifully in the trend lines themselves and become inconsistent.

Go stats geeks!

____________________

Loyal: Good to have you-- and I used to be a SAS guy so feel welcome. The meta analysis techniques would be great if we had consistent access to the data on demographics within the polls. This is one of those things pollsters often don't release so it is impossible to know what role they play across polls (as opposed to their effects at the individual level within polls.) Mark Blumenthal's efforts on the "Disclosure Project" during the primaries proved how hard it is to get even pretty basic information about every poll. We had to give it up but it proved the point that more is needed. We put up everything from the Big Ten Poll as an example so everyone can have a look and a say. But you can't do meta analysis without the original distributions.

Mike Drew: Sorry if we let you down at UW. Should have pushed those stats courses more in PoliSci! The turnout stuff is critical and I hope to have more for you on that in the next day or so. LV models are usually better for Reps, but the Dem effort this year is a wild card. Curious to think about how to model those effects at this point.

RS: Yes-- there is a lot of analysis we are doing on the demographics, though as the meta-analysis discussion points out, most polls don't release as much as we do, so it is hard to know how we compare.

More RS: The demographic variability doesn't propagate into a huge overall MOE. Suppose we had simply asked a single question: vote. What would the MOE of that be? .98/sqrt(n). Now what if we collect a second variable? We haven't changed the sampling design at all, just collected another variable. That doesn't change the sampling distribution of the vote variable.

And I don't have anything to say to Mark Lindeman-- he already said what I would have wished to have said!

Charles

____________________

jme-- who wrote as I was composing-- you've convinced me we should do a comparison and post it. I've been thinking of one based on the consistency of polls above and below zero (on the Obama-McCain scale). WI HAS been very consistently above zero even as it has drifted down from the +13 we found in our WI survey in June. That no poll has put McCain ahead is important data.

But this discussion makes me consider also using the CI for the estimator as an alternative approach and one that is better in several ways.

Of course, this will create confusion by having "two ratings systems" but so what? It will be fun.

Thanks! Time to leave for soccer.

Charles

____________________

RS:

Charles:
I understand what you say about random sampling. But I much prefer (what I understand to be) the SUSA approach - weight the sample to match the state demographics, and then apply a LV screen. Not even the PPP approach that juices up the demographic to match an 'expected' turnout (though that worked for them in WI). YMMV, of course, and it does, apparently.

Could the Big Ten release the splits within demographics? I attempted to calculate the numbers after adjusting the demographic make-up to "expectations" but was stumped by the lack of splits within subgroups. Like, say, this:
http://www.dailykos.com/dailypoll/2008/09/20

And then we could play with it as much as we wanted... :-)

Thanks for the discussion!

____________________

Loyal:

Thanks to Charles and to Mark Lindeman for their warm responses. Much appreciated.

I think I understand about the failure to disclose internals.

To do what I suggest with the best models, you would need to know preference by race and party affiliation at the same time. And by the way, one could add days before the most recent poll as another variable in the RE model to provide an opportunity for secular trend to show (this would work if we are only looking at recent point estimates; to include older data one would also need transformed variables to account for swings in each direction).

And from what I see you often have by race and by party, but not by both.

But I suspect that there is data about what percent of each race is in each party on a state-by-state, or at least region-by-region, basis.

So if you had preference by party and preference by race, you could probably do a pretty good job of imputing the likely internal distribution and extrapolate in a way that still might be informative, because of the capacity of the RE model to incorporate these other factors like racial and party distributions.

Or am I just thinking too much?

Off to my niece's wedding and I'll be back tomorrow to read what I've missed.

Thanks again.

Loyal

____________________

jme:

Ok, these are my final thoughts, I promise!

I'm glad I've helped convince Charles to do a post looking into incorporating the stability of who's leading into coloring the states.

I just want to reemphasize that my criticism here isn't that "the statistics are wrong", but that condensing the scatterplot+trend line for a single state to one of 5 values (dark blue, light blue, yellow, light red, dark red) inherently involves a significant loss of information, and that I doubt there is a perfect solution. I suspect that we could cleverly devise some procedure that would produce more "accurate" classifications for reasonably densely polled states like IN, WI or PA, but I doubt that anything we come up with will make the color assignments for states like WV or MT behave the way we would expect.

Additionally, although I understand the desire to do this sort of state coloring (hey, it's popular!), it runs somewhat counter to the extremely admirable philosophy at pollster of simply trying to convey the current state of polling in the context of different sources of variation. I mean, when you reduce a state to one of 5 colors, you're completely sweeping the variability info under the rug.

Again, I understand the pressures to do this sort of categorizing of states, but one of the things I _really_ respect about pollster is that I sensed an effort to resist that kind of reductionism in poll analysis.

I mean, would it be so terrible to map the trend line estimates to something closer to a continuous color palette that goes from red-white-blue (538 does something like this)? You'd lose the ability to neatly count electoral college votes, but it would allow the map to very simply reflect the current state of the trend line estimates.

I guess like any self-respecting stats geek, I'm pulled in two directions: get all technical and start devising complicated estimation procedures to classify each state, or treat it as a simple (hah!) data visualization problem and get all Tufte-ian about the way to display the trend line data on a single US map.

____________________

Thanks to all of you for a very helpful and stimulating discussion. If we can manage to do half the good ideas here it will be great.

Charles

____________________

Judy Shapiro:

Like Loyal, I had been wondering why meta-analysis techniques weren't being used.

Some of the suggestions here actually do involve pretty standard meta-analysis techniques. For example, jme's suggestion that we simply count the number of polls in a state that come out favoring Obama, and the number favoring McCain, would be a sign test. As jme notes, this sort of analysis ignores changes over time, but given how incredibly conservative the sign test is, and given that there haven't been huge overall trends towards one candidate or the other in the past few months, I don't see that as a big problem.

So, Charles, I'm trying to understand your comment, "you can't do meta analysis without the original distributions," since there are meta-analysis techniques like the sign test that require nothing more than the result of the poll. And, there are many other, less conservative meta-analysis techniques that require just the result of the poll and the MoE. Are you basically saying that the different polls may be different in quality and that you can't tell the relative quality of the polls you are summarizing (in terms of whether they weighted the proportion of AA voters properly, say)? Therefore, you don't want to use meta-analysis techniques because those techniques would be making the false assumption that all the polls were of equal quality? Just trying to see if I understand.

Judy

____________________

Judy-- Thanks much for the comments--

I was responding narrowly to the parts of Loyal's comment that seemed to require having the crosstabs from within the survey-- stuff that is almost never available, or even just the marginal distributions of all variables. So I didn't mean it to come over as a general rejection of meta analysis.

I'm not a meta analysis expert at all, so maybe Loyal and Judy can join me for a cup of coffee and offer me a tutorial! The sign test Judy mentions would be a simple thing to use, for example. I'm as much trapped in my narrow focus and competence as anyone, so some expansion to include ideas from meta analysis would seem like a good thing.

Thanks-- keep 'em coming.

Charles

____________________

LouieLou:

Do the poll numbers reflect people who subscribe to cable service, i.e. VoIP (Voice over IP) phones, and cellphone subscribers? It seems that none of the people I've spoken to have been contacted for their input to the poll numbers. Cable-service phone numbers may not be readily available, unlike the yellow pages (landline phones).

____________________


