Weighting by Party...and the CBS/New York Times Poll

Topics: CBS/New York Times, National Journal, Party Identification, Party Weighting

My NationalJournal.com column for the week, on the continuing debate over party identification, how pollsters measure it and what they should do when they see variable results, will be posted later this morning.

The timing is a bit ironic. I wrote the column yesterday afternoon and then noticed this morning that the venerable CBS/New York Times poll took the highly unusual step (for them) of weighting by party ID in addition to their usual weighting procedure (emphasis added):

The combined results have been weighted to adjust for variation in the sample relating to geographic region, sex, race, marital status, age and education. In addition, the land line respondents were weighted to take account of household size and number of telephone lines into the residence, while the cellphone respondents were weighted according to whether they were reachable only by cellphone or also by land line.

Because of fluctuations in party identification, this poll was also weighted by averaging in party preferences from three recent past Times/CBS News polls.

That last line surprised me because previously, CBS and the New York Times had a policy of not weighting by party. Many casual readers of the CBS summary reports (like this one for their most recent survey) tend to assume they do, because they provide weighted and unweighted interview counts for each party subgroup. In the past, the minor partisan differences between their weighted and unweighted samples have come from their standard procedure of adjusting demographics (gender, age, race, etc.) to match Census estimates (the part described in the first paragraph quoted above).

I emailed Kathy Frankovic, the CBS News director of surveys, to ask about the decision to weight by party. Her response was that the party ID adjustment described above "is something that we have done once before in the past, when it seemed appropriate." Although I asked, she did not provide any explanation for why they deemed it appropriate this time.

The way that the CBS/New York Times pollsters chose to weight this survey is not quite the "dynamic weighting" system long advocated by Alan Abramowitz, Ruy Teixeira and others and now used by Rasmussen Reports for their national and statewide surveys, but it's close. They chose to weight to their own recent estimates of party ID rather than to results of exit polls from years past or to surveys done by other pollsters. That approach is the most defensible method, and avoids some of the potential pitfalls that I outline in my column today.
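A minimal sketch of that kind of adjustment, in Python. The release does not say exactly how the averaging was done, so everything here is an assumption for illustration: the party shares are made up, and the helper names `average_party_shares` and `party_weights` are hypothetical.

```python
def average_party_shares(polls):
    """Average the party ID shares (in percent) across a list of polls."""
    parties = polls[0].keys()
    return {p: sum(poll[p] for poll in polls) / len(polls) for p in parties}


def party_weights(observed, target):
    """Weight factor per party so the observed shares match the target shares."""
    return {p: target[p] / observed[p] for p in observed}


# Hypothetical party ID shares (percent) from three recent past polls.
past_polls = [
    {"R": 29.0, "D": 38.0, "I": 33.0},
    {"R": 28.0, "D": 39.0, "I": 33.0},
    {"R": 30.0, "D": 37.0, "I": 33.0},
]
# Hypothetical shares observed in the current sample.
current = {"R": 32.0, "D": 36.0, "I": 32.0}

target = average_party_shares(past_polls)
weights = party_weights(current, target)
# Applying the weights brings the current sample back to the target shares.
reweighted = {p: current[p] * weights[p] for p in current}

print(target)
print(weights)
```

Whether the current poll is itself averaged in is not stated in the release; adding `current` to the list passed to `average_party_shares` would pull the targets toward the current sample and soften the adjustment.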

Today's CBS/New York Times release also marks the first appearance this cycle of the unique CBS/New York Times likely voter model. Rather than trying to select or screen for likely voters, the CBS/New York Times method weights voters based on their probability of turning out. I explained the procedure at length in a blog post four years ago.
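To make the distinction concrete, here is a rough Python sketch contrasting a probability-weighted estimate with a simple cutoff screen. It is not the actual CBS/New York Times model: the respondents, their turnout probabilities, the cutoff and the function names are all hypothetical.

```python
# Each respondent: (candidate, demographic_weight, turnout_probability).
# All values are hypothetical.
respondents = [
    ("A", 1.0, 0.90),
    ("B", 1.2, 0.50),
    ("A", 0.8, 0.20),
    ("B", 1.0, 0.95),
]


def probability_weighted_share(respondents, candidate):
    """Everyone counts, in proportion to weight times turnout probability."""
    total = sum(w * p for _, w, p in respondents)
    support = sum(w * p for c, w, p in respondents if c == candidate)
    return support / total


def screened_share(respondents, candidate, cutoff):
    """Only respondents at or above the cutoff count, at full weight."""
    kept = [(c, w) for c, w, p in respondents if p >= cutoff]
    total = sum(w for _, w in kept)
    support = sum(w for c, w in kept if c == candidate)
    return support / total


weighted_a = probability_weighted_share(respondents, "A")
screened_a = screened_share(respondents, "A", cutoff=0.6)
print(round(weighted_a, 3), round(screened_a, 3))
```

In the weighted version, a respondent with a 20 percent chance of voting still contributes a little; under the screen that respondent is dropped entirely.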

PS:  Pollster.com contributor Kristen Soltis made the case for weighting by party, with a focus on an earlier CBS/NYT poll.



So I have a question, Mark, about which last three polls they are using. The CBS News/NY Times poll comes out once a month, so the weighting would be from the last three months. This would have a much higher percentage of people identifying as Democrats than is currently the case, which is why Rasmussen switched to a six-week window that changes weekly.

However, if it's just CBS polls they are using, the last three would only go back to Aug 31, making them more current but more influenced by the conventions. These would have a lower Democratic identification and higher Republican identification.

I'm just guessing here, but I bet they used the three month polls to oversample Democrats. Just guessing. Please update if you know.



Mark, do you know what their exact weightings were for this poll? I have thought for some time that the shift in party ID is much more important than the candidate numbers on a daily basis. What we can see clearly from Rasmussen is that the swing is to Republican, and that means that many independents are actually Republican voters. If McCain gets them to come out, he should have an easier time of getting their vote. This race was always going to be decided by independents. If more of them are Republican-leaning than Democratic-leaning, and it seems that way, the trend is to McCain.

That's why I'd bet the NYT has used numbers as old as possible to justify oversampling Democrats.



Here are the RV numbers:

Weighted:

31.6 Republican
40.5 Democrat
27.8 Independent

Democratic Advantage: 8.9

Unweighted:

30.3 Republican
40.0 Democrat
29.5 Independent

Democratic Advantage: 9.7

As you can see, had the raw numbers been used, Obama would have been up by more than with the weighted numbers, so no complaints about this weighting from McCain supporters, please.


Mark Lindeman:

@s.b.: The questionnaire gives a different answer than Justin's (I don't know Justin's source), on page 31: 28% Rep, 39% Dem, 26% Ind. (Presumably these would be the weighted results.)

I agree with you that many of the Inds are Republican-leaning and that McCain should therefore do well in this category. However, if these folks drift back from R to I, that in itself would improve McCain's performance among "independents" without helping his bottom line one bit. So it doesn't necessarily matter how McCain does among independents per se.

That said, oversampling (or overweighting) Democrats as you mention isn't the only way in which weighting for party ID right now could have an ugly effect on topline estimates. Let's suppose for a moment that a bunch of former "Independents for McCain" recently became "Republicans for McCain." That in itself would boost the R share, drop the I share, and drop McCain's apparent performance among I's. Then, if the pollsters downweighted R's and upweighted I's, the effect would be to underestimate McCain's support.

For instance, let's pretend -- I am radically simplifying -- that at time A there are 25% Republicans for McCain, 35% Democrats for Obama, and 40% Independents who split 25%/15% (= 62.5/37.5) for McCain, so that McCain and Obama were tied. Now suppose that at time B, no one changes vote intention, but 5 percentage points of McCain Independents re-identify as Republicans, so now we have 30% McCain Republicans and 35% Independents split 20%/15% (= 57.1/42.9).

Our hypothetical pollsters say, "Whoa, we ended up with too few independents -- we had better upweight them, and downweight the Republicans, to get back to 25R/35D/40I." The I's are therefore weighted to split roughly 23/17, and the R's are weighted to 25/0. Suddenly Obama takes a 52% to 48% "lead" on the strength of his apparently better performance among independents -- even though (by assumption) no one changed vote intention.

These are crazy hypothetical numbers, and I have no reason to think that NYT/CBS actually gave Obama a 4-point bump. But they do dramatize the possible dangers of weighting on party self-identification when it is in flux.
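The arithmetic in this hypothetical can be checked with a short Python sketch. The numbers are the ones given above; the function and variable names are just illustrative.

```python
def topline(party_shares, mccain_within):
    """Return (McCain, Obama) percentages from party shares (in percent)
    and McCain's share of the vote within each party (0 to 1)."""
    mccain = sum(party_shares[p] * mccain_within[p] for p in party_shares)
    obama = sum(party_shares[p] * (1 - mccain_within[p]) for p in party_shares)
    return mccain, obama


# Time A: 25R / 35D / 40I, independents split 25/15 for McCain.
time_a_shares = {"R": 25.0, "D": 35.0, "I": 40.0}
time_a_within = {"R": 1.0, "D": 0.0, "I": 25 / 40}

# Time B: 5 points of McCain independents now call themselves Republicans.
time_b_shares = {"R": 30.0, "D": 35.0, "I": 35.0}
time_b_within = {"R": 1.0, "D": 0.0, "I": 20 / 35}

true_a = topline(time_a_shares, time_a_within)
true_b = topline(time_b_shares, time_b_within)
# Pollsters force the time B sample back to the time A party shares.
reweighted_b = topline(time_a_shares, time_b_within)

print(true_a, true_b, reweighted_b)
```

The unweighted toplines at both times come out tied, while the reweighted time B sample shows roughly 52 to 48 for Obama, purely as an artifact of the party weights.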


Mark Lindeman:

I should say that the questionnaire numbers don't necessarily contradict Justin's. The q numbers appear to be for all adults; his are for RVs only.

However, if his comparison is between no weights at all and all weights, then I don't think we can tell what the effect of the party weighting is -- although we can surmise that it may be pretty small.



Sorry, I should have given my source. I'm referring to the numbers at the bottom of http://www.cbsnews.com/htdocs/pdf/Sep08b-Elec.pdf

The numbers you give, Mark, are indeed the weighted numbers for all adults, though the non-weighted numbers are virtually the same according to the document I linked. The one difference you will see is that the DK/NA answer is grouped in with Independents in the CBS release.



So the NYT/CBS poll has the Democrats up by 11% in party identification. That is very high. Rasmussen's is 5.1% for the last six weeks averaged. Could a difference of 6% in the party weighting not have given Obama a 4% bump? Sounds about right to me. Rasmussen said that a 1% jump for Republicans translates into about a .75% gain for McCain, and a loss of 1% for the Dems equals about a .5% gain for McCain. So the difference between Rasmussen's 38.7 Dem/33.6 Rep and the NYT/CBS 39% Dem/28% Rep is about the same for Dems but 5.6% lower for the Republicans.

So, it is not that they have oversampled Dems. And interestingly, when you look at polls, Obama's raw numbers are much more stable than McCain's, which fluctuate. I have noticed this. It is Republicans they have undersampled.

This 5.6%*.75 rough estimate from Rasmussen gives McCain an extra 4.2%. So yes, Virginia, or Mark in this case, the weighting of this poll by party affiliation did indeed roughly give Obama a 4 point bump, or more correctly underestimated McCain's support by 4%.
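Spelled out as a small Python sketch, this back-of-the-envelope estimate looks like the following. It assumes the Rasmussen and NYT/CBS party ID numbers are directly comparable and takes the .75 and .5 rule-of-thumb factors quoted above at face value; the variable names are just illustrative.

```python
# Party ID shares (percent) as quoted in this comment.
rasmussen = {"D": 38.7, "R": 33.6}
nyt_cbs = {"D": 39.0, "R": 28.0}

# Rule-of-thumb factors attributed to Rasmussen above.
GAIN_PER_REP_POINT = 0.75   # McCain gain per 1-point Republican increase
GAIN_PER_DEM_POINT = 0.5    # McCain gain per 1-point Democratic decrease

rep_gap = rasmussen["R"] - nyt_cbs["R"]   # Republicans are lower in NYT/CBS
dem_gap = nyt_cbs["D"] - rasmussen["D"]   # Democrats are slightly higher in NYT/CBS

# Estimate using the Republican gap only, as in the comment.
estimated_shift = GAIN_PER_REP_POINT * rep_gap
# Same estimate with the small Democratic gap included.
estimated_shift_with_dems = estimated_shift + GAIN_PER_DEM_POINT * dem_gap

print(round(rep_gap, 1), round(estimated_shift, 2), round(estimated_shift_with_dems, 2))
```

This is a what-if calculation, not a reconstruction of what the pollsters did; it only shows how the 4-point figure follows from the stated inputs.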

Please write more about party affiliation numbers, Mark, as I think it is actually the most important number in this campaign. Again, if more independents are actually soft Republicans, it really means the race is much different than people perceive. For example, everyone assumed McCain needs to win Democratic votes or that Palin was an appeal to Democratic women. No, he has to get Republican independents to re-identify as Republican and come out and vote for him.

Also, the Rass numbers have changed sharply and this 11% spread is in no way accurate anymore. This poll also, by the way, has a 20% Dem spread for the generic congressional vote, which also would lead one to believe their weighting was off.

Anyways, thanks for the answers. I've said enough. Please look into party identification in polls more Mark. I heard the Hotline polls were only using 8% independents, no confirmation though.



By the way, Rasmussen has the race tied today, which is exactly where this poll would be if they used the same weighting by party identification.

Don't you think it's fascinating, Mark, when two polls actually say exactly the same thing, but appear to say something very different?



Also, Democrats talk about the fact that their party registration has increased significantly in many states. This is probably due in good part to the extended primary season and many independents or even Republicans registering as Democrats so they could vote in the primary. Some switched to vote for Obama for sure and may still vote for him. However, many switched to vote for Clinton and they probably won't vote for Obama in the general. So all this talk about voter registration being up for Dems in many states may be misleading, especially if voters are still self-identifying as Republicans or independents when polled.

These are the numbers to watch for sure.

The answer to the question, who are you, is much less changeable and much more fundamental than a snapshot of the question, who will you vote for.


Mark Lindeman:

@Justin: thanks, I saw your post late and didn't have the energy to look. Hmmmm. As I said, what remains unclear is the impact of the party weights in particular.

@s.b.: Dunno which Mark you are talking to -- maybe both. I think you're assuming that the Ras and CBS/NYT party ID numbers are comparable, which isn't necessarily the case. I don't even know if they use the same question wording. So your calculation is a what-if, not a solid reconstruction. But you might be able to sell me on it if you trace the party ID numbers back to the beginning of the year. I'm curious, but can't "go there" right now. (It would be inconclusive in the end, because whatever we learned about party ID house effects earlier in the year might not apply right now.)

Many people already thought that Palin was more about re-energizing "the base" than going after Democratic women -- but I agree that freehand interpretation of the party ID tables can lead to all kinds of mistakes.



s.b., in order for this poll to have a weighted advantage of 5.1, they would have had to move a full 4.6 points from their raw numbers. That seems like a pretty extreme weighting measure.

Rasmussen charges for his crosstabs, which I'm absolutely not going to pay for. I do wonder, however, if he lists his raw numbers along with his weighted numbers.


Mark Lindeman:

@s.b.: Oh, about Diageo: I see 41D/36R/19I. It appears that Diageo pushes Independents to lean, while CBS/NYT doesn't.



Yes both Marks.



Yes, Justin, you can see all of the party weighting numbers on Rasmussen's web site, and their history back to 2000, on a monthly basis. Also of note is that Rasmussen's sample size for this data is massive, 45,000 people, and as such should be fairly accurate.

Interestingly the last quarter from April to June has the party spread at 9.9% on Rasmussen, very close to the NYT numbers.

Of course, Mark, applying Rasmussen's party ID numbers to the NYT poll is game playing, but interesting nonetheless. All I'm saying is that the raw data of this poll and the Rasmussen numbers show exactly the same thing if the same party ID numbers are used. Rasmussen's numbers would show Obama +4 if you applied the NYT party ID weightings. N'est-ce pas?

I don't think it can be disputed that Party id is shifting Republican as both Rasmussen and Gallup are showing the same shift.

So has NYT/CBS decided to do this to make this poll more reflective of the current race or more reflective of the race a few months ago? To me it looks like the decision was made to have these numbers come out looking like the race did a few months ago. An 11% party ID spread is huge. Way, way, way too big. It is less than 5% now, perhaps as low as 3%, as even Rasmussen's numbers go back to the beginning of August.

Again I think these numbers are actually the most crucial and the most telling in all of the polls.



Thanks for the Diageo numbers, Mark L. A 5% spread seems more in line with Gallup and Rass, but is Diageo weighting for party ID?



I guess my last question is a which-comes-first, the-chicken-or-the-egg kind of thing. Does Diageo make the result a 5% spread in party ID and a 52-48 female gender advantage, or did the numbers just happen that way?

As an aside, even if someone does weight for party ID, do they then weight those numbers internally for gender, region, age, etc.?

So is the gender weight applied before or after the party ID or age weighting, and which do you think is most important, Mark?



Sorry I'm blah blah blahing probably to no one but I find this whole weighting aspect of polls fascinating.

So pollsters tend to assume that something like female-male voting ratios are static around 52%-48%. They also assume age ratios of voters are fairly static, although one hopes with the aging of the boomers that those numbers are shifted accordingly. Then we get to race, touchy to say the least.

So some weightings can fairly reliably be done because these numbers really don't change. However, in Florida, for example, people with criminal records can now vote and couldn't in the past. Should pollsters adjust their gender and race ratios to take into account that 980,000 people, mostly black men, who were kept off the voter rolls last time can now vote? Interesting question, don't you think?

Obama people will say that young people will come out more than they have in the past and that black people will come out more. I personally don't buy this and don't think even the primary numbers showed this to any extent except in a few states, Iowa for young people for example.

Then you have party ID. Dem party registration went up during the primaries. Will this be a significant factor, and should pollsters weight for it, for example with an unprecedented 11% spread in voter party ID in the NYT poll? Did I say unprecedented?

Then, which do you weight first is also a concern. If Party id is weighted, then do you weight for gender inside the party id or before, or do you assume that women who want to vote for Obama will identify as Dems?

Here's what I think. I don't think any weighting should be done at all. I think if you random dial enough phones from the phone book, not voter rolls or party rolls, and let the chips fall, then the sample will be fine.

Geography is the only weighting that should be done, and that can be done by phone number with a computer. For example, Zogby was way off in California for the Dem primary because he oversampled San Francisco. We vote by geography: by senate or congressional district or state. That's the only weighting that matters and is truly static.

The only other weighting that should be used is a question on voter intention. Do you think you will vote 1-10? Only sample those that answer 8-10.

Who cares what someone's race is. Who cares if they have voted before. Who cares what their age or gender is. It's a simple question. Are you going to vote?

That's why I like SUSA. They let the chips fall. I think anything else is alchemy, fascinating in how it influences polls, but more of an art than a science.

I'm a scientist, so I believe in experimenter bias and think it should be eliminated as much as possible. Weightings are experimenter bias.



Thanks, s.b.

I understand the way he reaches his targets; I'm just wondering if he releases the raw data from his polls. I want to know what the actual responses were from the 3000 people called over the last three days of the tracking polls.

Transparency in his methodology is good, but I want transparency in the current polls.



Justin, Yeah I think you have to subscribe to get the daily internals. I don't subscribe.

Gallup is showing similar results though and a sample of 45,000 over six weeks is massive and should trend pretty accurately and fairly currently.

Much more so than three polls over three months (or four, if this one wasn't used to weight itself), as with the NYT/CBS poll.


Mark Lindeman:

@s.b.: I haven't checked what Ras would show with the NYT weights -- I'll take your word for it. Surely there has been a shift toward the Republicans, although I can't tell whether it is continuing. From what's happened with the Gallup tracker, I tend to doubt it.

If you mean that you think that folks like Kathy Frankovic decided to give Obama a bump, I don't believe that for a moment, any more than I've believed the attacks against Gallup. I think these people work hard to do the best job they can. I'm happy to second-guess them, cautiously, but armchair quarterbacking is hazardous enough as it is.

I can't tell whether Diageo/Hotline does party ID weighting or not. The results lead me to suspect that they are doing some sort of dynamic weighting. I see 42/35 three days in a row, then a single 43/34, then four straight 41/36s. Diageo may well use an iterative weighting algorithm in which it doesn't really matter what is weighted "first," but I don't know that -- nor what demographics are included. (Gender is the only one they report, as far as I can see, but I won't assume that they show everything.)
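For readers unfamiliar with iterative weighting, here is a minimal Python sketch of raking (iterative proportional fitting) on two margins, gender and party ID. It is only an illustration of the general technique, not a description of what Diageo/Hotline does: the cell counts and target shares are hypothetical, and `rake` and `margin_shares` are made-up helper names.

```python
def margin_shares(weights, dim):
    """Share of the total weight in each category along one dimension."""
    total = sum(weights.values())
    shares = {}
    for key, w in weights.items():
        shares[key[dim]] = shares.get(key[dim], 0.0) + w / total
    return shares


def rake(cells, targets, order, iterations=100):
    """Repeatedly rescale cell weights so each margin matches its target."""
    weights = dict(cells)
    for _ in range(iterations):
        for dim in order:
            current = margin_shares(weights, dim)
            for key in weights:
                weights[key] *= targets[dim][key[dim]] / current[key[dim]]
    return weights


# Hypothetical unweighted counts by (gender, party ID).
cells = {
    ("F", "D"): 230.0, ("F", "R"): 150.0, ("F", "I"): 140.0,
    ("M", "D"): 170.0, ("M", "R"): 160.0, ("M", "I"): 150.0,
}
# Hypothetical target shares: dimension 0 is gender, dimension 1 is party ID.
targets = {
    0: {"F": 0.52, "M": 0.48},
    1: {"D": 0.40, "R": 0.35, "I": 0.25},
}

gender_first = rake(cells, targets, order=(0, 1))
party_first = rake(cells, targets, order=(1, 0))

print(margin_shares(gender_first, 0))
print(margin_shares(gender_first, 1))
```

Once the procedure has converged, starting with gender or starting with party ID gives essentially the same cell weights, which is the sense in which the order of weighting need not matter.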

However, in the cases I've studied, demographic weights are applied to all respondents, before the likely voter screen (if any) is applied. So, the pollsters don't make assumptions about whether turnout will be up or down among (say) young people or blacks. They weight the sample of all respondents to match the population, then let the likely voter screens tell them whether (say) black turnout is likely to be up or down.

I'm fairly confident that "if you random dial enough phones from the phone book," the sample will be nowhere near fine. Between the unlisted numbers and the cell-only users, I'm not sure which way you will be off (although someone knows). That isn't to say what is the right way to weight -- only, let's not kid ourselves.

Likely voter screens are a whole 'nother story, but I don't think you'll get much bite from a 0-10 voter intention scale with a cutoff at 8.



Ok Mark, thanks for the info. Demographics are weighted to population, not previous exit polls as I had thought, then the LV screen does the job.

It's kind of you not to assume bias in pollsters, or any other kind of social science, but you bet it's there. As I have said, experimenter bias is the number 1 error to try to avoid in any study or hard science experiment.

The NYT, a paper I used to be a fan of, has had the most disappointing, biased, mean and downright misogynistic coverage of this race, including the primaries, that I have ever seen. I would never have given Fox News the time of day before this election. Now I believe it to have the most unbiased and only real political journalism out there. The liberal press have completely lost all credibility. Do I accuse the NYT of trying to bias this study? Well, look at the results.

Who else is showing a 20% spread in the generic congressional vote? No one. Not even close. There is something wrong with this survey. If my numbers came out like this after I applied a new weighting system and I were a pollster, I would remove the weighting and go back to the old methodology.

Three months ago a 20% congressional spread would have been the highest on the charts, now it's utter fiction. Impossible.

So yes I do accuse them of deliberately skewing this data. The pollster may be a nice lady but she has a boss or client right?

Drug studies are done all the time with hard science that get exactly the result the client paying the bill wants. Social science is even more open to this.



By the way a 4% spread for Obama and a 20% spread for a Democratic congressional vote means Obama is underperforming the Democratic brand by 16%. I believe this is also the highest discrepancy shown for quite some time.

If Obama is underperforming by 16% against Dems, he is in trouble. I don't believe this to be true by the way. I think this study is flawed.



This is a very interesting discussion. One other element of the oversampling/overweighting issue: I've noticed on some polls (those that provide internals, at least) that the under-30 turnout seems to be grossly oversampled. For example, in one of their tracking polls a few days ago DailyKos had the under-30 at 25% of their sample population. SurveyUSA, I believe, did so as well.

I find it highly unlikely that the under-30 proportion of the total turnout will be higher than 12-14%. It's just simple demographics: the proportion of the eligible-voter population that is under-30 has been rapidly declining as the country is aging.

Put another way: the under-30 turnout is roughly the same as it was in 1972 (the first presidential election in which under-21 voters could vote); however, the proportion of the under-30 voters in the total turnout has declined by a full 1/3. The under-30 share of the total turnout is now barely 9%.

(See especially table 3 of the following study -- very eye-opening: http://www.civicyouth.org/PopUps/FactSheets/FS_Youth_Voting_72-04.pdf )

Now, since Obama is polling the under-30 vote at something like 2-1, an oversample on the order of 50% of the under-30 turnout would create horrible validity problems.
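A rough Python sketch of the size of that effect, under stated assumptions. Only the 2-1 split among under-30 voters and the 25% sample share come from the discussion above; the 13% turnout share is a midpoint of the 12-14% range, the 48% Obama share among older voters is purely hypothetical, and undecided and third-party voters are ignored.

```python
def obama_share(under30_share, under30_obama, older_obama):
    """Overall two-way Obama share given the under-30 share of the electorate."""
    return under30_share * under30_obama + (1 - under30_share) * older_obama


UNDER30_OBAMA = 2 / 3   # "something like 2-1" among under-30 voters
OLDER_OBAMA = 0.48      # hypothetical share among voters 30 and over

sampled = obama_share(0.25, UNDER30_OBAMA, OLDER_OBAMA)    # under-30 at 25% of sample
expected = obama_share(0.13, UNDER30_OBAMA, OLDER_OBAMA)   # under-30 at 13% of turnout

# Points added to Obama's share by the larger under-30 sample share.
inflation_points = 100 * (sampled - expected)
print(round(inflation_points, 2))
```

With these inputs the larger under-30 share adds a bit over two points to Obama's share; the result scales with the gap between the two age groups, so a different assumption for older voters changes it directly.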




Rasmussen polls 45,000 people to get his party weightings, and they show only a 5 point edge for the Dems, and that was before Palin was even chosen.

NYTimes weightings are off by at least 5 points, if not more.



Not sure if you're still reading this, Mark, but do we have any idea of Gallup's methodology for weighting, or even know for sure if they do weight by party?

Right now it looks like Rasmussen and Hotline are using about a five point Democratic advantage while Kos/Research2000 is using a nine point advantage (similar to this NYT/CBS poll), but I can't find any information on Gallup.



Justin, Gallup doesn't weight by party ID; they don't even screen for LV as of yet. They do, however, poll party ID.

Yes, as much as I like SUSA's methodology, under-30s are far more likely to say they will vote than actually vote.

Obama was able in a few states to bring large numbers of young people to the "polls". Most of these states had caucuses. Surprise it's easier for a young person to hang out for four hours, past midnight, in Texas to wait to vote.

I do not think the youth vote in primary states was especially high. One also assumes that 50 states all voting on the same day is a whole lot more difficult to get excessive young people out than in one caucus at a time.

I do not believe youth turnout will be especially high this election. Call me a pessimist or a realist, whatever. In many states, in fact, during primaries the older vote exceeded projections and the youth vote did not. He will be able to organize the youth to come out on campuses; outside of that, I doubt it.

SUSA is open to this skew.



Do you have a source for that Gallup info, s.b.?



I'm curious reading your posting on how this weighting affects third party candidates. If there is no weighting involved, they could be completely removed from the poll.

Another interesting question is that the third party candidates are usually around 5%, so if the primary candidates are at 50%, 5% would be a 10% error in the poll. Is this true?

