The Demographic Composition of the PA Polls

Topics: 2008, Barack Obama, Divergent Polls, Hillary Clinton, National Journal, Pollsters, Quinnipiac, Zogby

Just before the March 4 primaries, I did posts on the demographic compositions of the polls from both Texas and Ohio. With the ever-valuable Eric Dienstfrey away on vacation this week, I am doing this post in a bit of a rush (so apologies in advance for typos). I would strongly recommend reviewing my post on the Texas demographics as a companion to this piece.

I have broken the available results into two tables below. Most come from documents posted on the web. Quinnipiac provided results on request, and the Zogby numbers were shared with my colleagues at the National Journal.

The racial mix of the Pennsylvania polls is not quite as critical to the level of candidate support as in Texas, since the share of black and Latino voters is smaller. Still, since Obama typically does better among African-Americans, men, younger voters and those with college degrees or higher incomes, while Clinton does better with whites, women, older voters and those with lower incomes or without a college degree, the demographic composition of the electorate will play a role in determining the outcome of the race.


The surveys show more variation on some characteristics than others. Most, for example, show the percentage of women as somewhere between 55% and 58%, and most show the African-American percentage as somewhere in the mid-teens. Of course, with Barack Obama expected to receive 80% to 90% of the black vote, the difference between an African-American composition of 13% and 18% can alter Obama's vote total by 3 to 4 points.
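A rough sketch of that arithmetic in Python may help. The 85% figure is simply the midpoint of the 80% to 90% range above. The 40% support among all other voters is a hypothetical placeholder, not a number taken from any of these polls, and the size of the shift depends heavily on it.

```python
def overall_share(black_share, support_black, support_other):
    """Candidate's overall share, given the black share of the electorate."""
    return black_share * support_black + (1 - black_share) * support_other

SUPPORT_BLACK = 0.85  # midpoint of the 80%-90% range cited above
SUPPORT_OTHER = 0.40  # hypothetical placeholder, NOT from any poll

low = overall_share(0.13, SUPPORT_BLACK, SUPPORT_OTHER)
high = overall_share(0.18, SUPPORT_BLACK, SUPPORT_OTHER)
shift = high - low

print(f"{low:.1%} -> {high:.1%} (shift of {shift * 100:.2f} points)")
```

The shift is just the 5-point change in composition multiplied by the gap in support between the two groups, so a wider gap produces a larger swing. If nearly all remaining voters back the other candidate, the margin moves by roughly twice the shift in one candidate's share.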

On the other hand, we see quite a bit of difference in age. Unfortunately, the pollsters do not all use the same categories to ask about and report respondent age. Still, the variation is clear, particularly in the percentages in the 18-to-29, 18-to-35 and 18-to-44 categories. Depending on the pollster, 18-to-29-year-olds are anywhere from 4% to 16% of the sample, and 18-to-44/45-year-olds are anywhere from 22% to 43%. Given that Obama typically does much better among younger voters, and that Clinton does much better among retirees, this variation is obviously critical. [Update: Brian Schaffner also blogged on this issue today].


Socio-economic status is another critical characteristic in the Obama-Clinton race, especially in Pennsylvania (and something that I have written about often). Unfortunately, quite a few pollsters either ask or report nothing about the level of self-reported education or income of their samples. Still, we see considerable variation. The percentage of respondents with college degrees varies from 29% to 44%. I should point out that education and especially income are subject to more measurement error than other demographic items, especially if the text of the question and the number of categories differ.

Finally, since readers asked for it the last time, I have also posted one more table that includes all of the data above, plus the vote preference results. You will need to click on the graphic below to see a larger, readable version.

It is important to remember that pollsters come to these composition statistics through different paths. Some interview samples of adults, weight those demographically to match census estimates of Pennsylvania's adults, then select "likely voters" and let their demographics fall where they may. Others will weight their "likely voter" samples directly to pre-determined demographic targets. Some pollsters will not set weights or quotas for demographics, but will set such weights or quotas for geographic regions (based on past turnout and their assumptions about what might be different this time).
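To make the first two paths concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the ten-person toy sample, the targets, and the helper names `poststratify` and `weighted_share` are all hypothetical, and no pollster's actual procedure is this simple.

```python
from collections import Counter

def poststratify(respondents, targets, key):
    """One weight per respondent so weighted shares of `key` match `targets`."""
    counts = Counter(r[key] for r in respondents)
    n = len(respondents)
    return [targets[r[key]] / (counts[r[key]] / n) for r in respondents]

def weighted_share(respondents, weights, key, value):
    """Weighted share of respondents whose `key` equals `value`."""
    matching = sum(w for r, w in zip(respondents, weights) if r[key] == value)
    return matching / sum(weights)

# Hypothetical toy sample of 10 adults (not real poll data).
adults = (
    [{"age": "18-44", "likely": True}] * 2
    + [{"age": "18-44", "likely": False}] * 2
    + [{"age": "45+", "likely": True}] * 5
    + [{"age": "45+", "likely": False}] * 1
)

# Path 1: weight all adults to assumed census-style targets, then screen
# for likely voters and let their demographics fall where they may.
adult_targets = {"18-44": 0.50, "45+": 0.50}
adult_weights = poststratify(adults, adult_targets, "age")
screened = [(r, w) for r, w in zip(adults, adult_weights) if r["likely"]]
likely = [r for r, _ in screened]
likely_weights = [w for _, w in screened]
path1 = weighted_share(likely, likely_weights, "age", "18-44")

# Path 2: weight the likely voters directly to a pre-determined target.
lv_targets = {"18-44": 0.35, "45+": 0.65}
direct_weights = poststratify(likely, lv_targets, "age")
path2 = weighted_share(likely, direct_weights, "age", "18-44")

print(f"Path 1 share of 18-44 among likely voters: {path1:.1%}")
print(f"Path 2 share of 18-44 among likely voters: {path2:.1%}")
```

In the first path the age mix of likely voters is an outcome of the screen, while in the second it is fixed in advance by the pollster's target, which is one reason two polls of the same state can report different compositions.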

Trying to discern the differences in these methods is beyond our capacity today. The important thing is to remember that different pollsters conceive of "likely voters" in different ways, and the "likely voters" they are reporting on are not identical.

Update: Poblano at FiveThirtyEight.com blogged some worthy thoughts about differences in likely voter models today.

Please note that, given the crunch of time, I have probably not proofed the tables as well as I should have. If you catch a typo, please do not hesitate to send an email so I can correct it.



The table with the cross-tabs and vote-splits is very illuminating - thanks, Mark!

I thought SUSA didn't try to force a demographic composition - just weighting by age, race etc. and then let the LVs fall where they may. Or am I wrong? Because in these results it appears that SUSA (a) dropped the %female from 58% to 55%; (b) bumped the 18-44 fraction from 38% to 43%; and (c) most recently, dropped the 65+ category from 24% to 21%. Could this mean Senator Clinton's support is weakening?

It almost appears as if SUSA's changes can be explained by demographic changes:

1. 4/5-7 56-38 to 54-40 on 4/12-14: Females dropped from 58% to 55%; 18-44 went up from 38% to 43%.

2. 4/12-14 from 54-40 to 50-44 on 4/18-20: 18-34 went up from 19% to 21%, but importantly, 65+ reduced from 24% to 21%. [Undecideds also increased from 3% to 6%; perhaps Clinton voters are reconsidering the negative attacks, as Poblano says.]

Also, comparing PPP (46-49 C-O) and SUSA (50-44 C-O), (as Mark points out) the change in African-American vote from 14% (SUSA) to 18% (PPP) means an almost 4% bump for Senator Obama.
Of course, there are other differences as well - PPP says Senator Obama does slightly better among women with more undecideds than SUSA, though PPP says 58% women compared to 55% in SUSA.

Anyway, interesting stuff... Thanks again, Mark!


Mark Blumenthal:

@RS: You remember correctly -- SUSA does not "force" the demographic composition of likely voters.


Virginia Centrist:

mark -

It's much more useful to look at regional breakdown tomorrow when you look at exit polls. The networks can estimate regional breakdown using hard numbers (actual vote counts at precincts), whereas the demographic exit polls can be rife with error (social pressures cause dishonesty, enthusiasm gap creates selection bias, targeting model is incorrect or based on bad turnout predictions, etc.)

With that in mind, I think the numbers to watch are the SE turnout % in the exit polls. If SE turnout breaks 46%, then Obama will have a "good" night (it'll be in the low to mid single digits). If it CRUSHES expectations and exceeds 50%, then Obama may win. If SE turnout falls below 40%, then we'll see a HRC win of about 10 points.

Election predicting using selected precinct turnout beats using demographic turnout where the error potential is higher.



Love this site!

After poring over all the various analyses here, I'm left wondering: could Obama pull out a win today? I could not find an exit poll for the 2004 PA Dem Primary, as opposed to the Texas analysis, which did have that data.

But exit polls for the PA 2004 general election had the 18-44 age group at 49% (granted, this is Republicans and Dems mixed), the over-60 group at 20%, and females at 53%.



@Virginia Centrist:
While Philly turn-out is important, more important is how Senator Obama wins Philly.

SUSA earlier had southeast PA (including Philly) at 50-46; PPP says 58-32. PPP's margin with Philly at 45% of the total vote (= +11% of statewide) is much better for Senator Obama than SUSA's earlier split with Philly at even 55% of the PA vote (= +2% of statewide).

Just checked - in the latest SUSA poll, Senator Obama is ahead 55-41 in Philly. Still not as good as PPP, but much better.

All that comes down to demographics - the reason Philly/suburbs are good for Senator Obama (more affluent, more educated, more African-Americans). Who votes will be critical... as of course, how many of them!

