Charles Franklin | June 19, 2007
Topics: 2008, The 2008 Race
State-level polling is vital for understanding the nomination contest. Candidates have to win in the states, and of course the early states are especially important. But the polling in states raises a variety of issues that should be kept in mind when reading these state primary charts.
First, the number of polls in a state varies widely, and that affects our trend estimates. Ideally, the trend estimator should have a dozen or more polls to work with before we take the trend very seriously. When the number of polls drops too low, the trend estimator will jump around considerably if new polls are very far from previous polling, and may produce jagged trend lines that are likely to change with more data. Despite this danger, we've estimated the trend with as few as eight polls, rather than stick to the safer minimum of twelve. Too many states have between 8 and 12 polls to ignore, and while we are cautious, most of the trends look pretty reasonable even with fewer than 12 polls. When the polls are consistent with each other, the trend estimate will still be pretty good even with eight. But when there is substantial disagreement among the polls, we will get jagged or otherwise "bad" trends. Since the plots show the actual polls, you can look at the data yourself and decide whether the trend is a reasonable fit to the data, or if it is erratic enough to be discounted. You decide. On the bright side, as more polls appear these trend estimates should all stabilize and this problem should disappear.
The trend estimator here is the more stable and conservative estimator ("old blue"). This is because the generally small number of polls would make a less stable estimator jump around, chasing random variation in the polls rather than giving a good estimate of the actual trend. The stable estimator may be a little slow to notice sharp changes in trend, though with the current (rather low) number of polls this will not be a big problem. When the number of polls grows enough, we'll check the more sensitive estimator as well.
Even with the stable estimator, a single poll at the end of the data can exert quite a bit of influence on the trend estimator. Be careful about extending the trends, especially if there is only a single recent poll after a period without polling. On the other hand, if the polls are consistent with each other then the trend estimator will follow them.
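To make the end-point problem concrete, here is a minimal sketch. It uses an ordinary straight-line least-squares fit as a stand-in, which is an assumption for illustration and not the estimator behind our charts. The poll values, the day numbers, and the helper names `fit_line` and `value_at` are all hypothetical.

```python
def fit_line(days, support):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(support) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, support))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def value_at(line, day):
    """Evaluate a fitted (slope, intercept) line at a given day."""
    slope, intercept = line
    return slope * day + intercept

# Hypothetical data: eight consistent polls hovering around 30 percent.
days = [0, 10, 20, 30, 40, 50, 60, 70]
support = [30, 31, 29, 30, 31, 30, 29, 30]
before = fit_line(days, support)
before_end = value_at(before, days[-1])

# One more hypothetical poll, 50 days after the last one and well above the rest.
days_late = days + [120]
support_late = support + [38]
after = fit_line(days_late, support_late)
after_end = value_at(after, days_late[-1])

print(f"end of trend without the late poll: {before_end:.1f}")
print(f"end of trend with the late poll:    {after_end:.1f}")
```

In this made-up case a single late poll moves the end of the fitted line by several points, even though the eight earlier polls agree closely with one another. A real smoother will behave differently in detail, but the caution is the same.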
When there are fewer than eight polls, we don't try to compute a trend, and instead simply print the median of the existing polls. The median should be a pretty good estimate when the data are stable or scattered, but when there is a strong upward trend the median will tend to underestimate the candidate's current standing (and overestimate with a downward trend). Again, visual inspection of the data points should give you a good insight into this.
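The median rule and its weakness can be sketched in a few lines. The eight-poll threshold comes from the text above; the function name `summary_method` and both sets of poll numbers are hypothetical and chosen only for illustration.

```python
from statistics import median

MIN_POLLS_FOR_TREND = 8  # threshold described in the text

def summary_method(n_polls):
    """Which summary to show for a state, given how many polls it has."""
    return "trend" if n_polls >= MIN_POLLS_FOR_TREND else "median"

# Hypothetical candidate rising steadily across six polls (percent support).
rising = [20, 22, 24, 26, 28, 30]
# Hypothetical candidate whose support is stable but noisy.
stable = [25, 23, 26, 24, 27, 25]

rising_median = median(rising)
stable_median = median(stable)

# Gap between the most recent poll and the median of all polls.
latest_gap = rising[-1] - rising_median

print(summary_method(len(rising)), rising_median, stable_median, latest_gap)
```

Both series have the same median, but for the rising candidate the median sits well below the latest poll, which is the underestimate described above. For the stable candidate the median is a fair summary.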
We are offering two "views" of the data. The "long-term" view consists of all data from January 1, 2005 through April 1, 2008 (as those late data become available!). This gives the most comprehensive view of how the nomination races have shaped up and how they have changed over time. On the other hand, some people would like to zoom in and focus just on recent data from 2007 and later. We give you that option, with a set of charts that run from January 1, 2007 through April 1, 2008. You choose.
The trend lines always use all the data regardless of which view you are looking at. Therefore the lines represent exactly the same numbers in either view. This can occasionally produce puzzles. A trend that fits the overall data in the long-term view may appear not to fit a small segment of data in the zoomed-in view. A good example of this is Gingrich in Florida. In the long-term view, the trend line fits the data quite well, but there are three polls at the start of 2007 that are well away from the other polls and from the trend line. In the zoomed-in view for Gingrich in Florida, the trend seems not to fit these data from early 2007, but it is actually the three polls that are "abnormal," not a poor fit of the trend line. When in doubt, check the long-term view.
If you switch from the long-term view (2005-2008) to the zoomed-in view (2007-08), be aware that the change in aspect ratio of the chart means the slopes of the trends will change. The numbers represented by the trend lines are exactly the same, but the lines will look shallower in the zoomed-in view, because a shorter period of time is stretched across the width of the chart. Be careful judging a "sharp" change in the long-term view, which will look more gradual in the zoomed-in view. Only the aspect ratio has changed, not the actual rate of change.
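A rough calculation shows how large the visual effect can be. This sketch assumes both views are drawn at the same pixel width with the same vertical scale, which is an assumption about the charts and not something guaranteed here. The constants `CHART_WIDTH_PX` and `PX_PER_POINT`, the rate of change, and the function `on_screen_slope` are hypothetical.

```python
from datetime import date

CHART_WIDTH_PX = 600   # hypothetical chart width, same in both views
PX_PER_POINT = 4       # hypothetical vertical pixels per percentage point

def on_screen_slope(rate_per_day, start, end):
    """Vertical pixels per horizontal pixel for a trend changing at rate_per_day."""
    span_days = (end - start).days
    px_per_day = CHART_WIDTH_PX / span_days
    return rate_per_day * PX_PER_POINT / px_per_day

rate = 0.05  # hypothetical trend: 0.05 percentage points per day

long_slope = on_screen_slope(rate, date(2005, 1, 1), date(2008, 4, 1))
zoom_slope = on_screen_slope(rate, date(2007, 1, 1), date(2008, 4, 1))
ratio = long_slope / zoom_slope

print(f"long-term view slope: {long_slope:.3f}")
print(f"zoomed-in view slope: {zoom_slope:.3f}")
print(f"long-term / zoomed-in: {ratio:.2f}")
```

Under these assumptions the same trend is drawn about 2.6 times steeper in the long-term view than in the zoomed-in view, simply because the long-term view packs 1,186 days into the width that the zoomed-in view gives to 456.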
With state-level polls there is often a great deal of variation across pollsters. Some polls may be excellent while others are poor, but it is hard to know which is which. Sample size can also vary considerably. This will produce more variation in polls than we might see with larger national samples of more consistent quality. This is why the trend estimator is probably more important in the state polls: it will iron out the differences among polls. And given enough polls, the trend estimator will tend to ignore polls that are far from most of the other data. (However, it can be fooled if the bad poll is right at the end of the data, so be careful if the latest poll seems out of line with previous polls.)
We've written here before about the problems of sampling "caucus goers" or even likely primary voters. These problems are even worse than identifying likely voters in general elections. This should be expected to add more variability to the polls in the states. Again, the trend does its best to extract the signal from the noise.
Finally, different pollsters deal with the candidate list differently. Some exclude Gore while others include him. Some include Gingrich, others do not. These are design decisions by the pollster, but they may increase support for the included candidates because the excluded candidate's supporters are forced to pick a candidate from the list. Our view is that the pollster has made their best judgment about how to ask the questions, and we don't second-guess that. But it means one pollster might show more support for the top candidates than another because they have excluded, say, Gore, who would otherwise get 10-15% support. If you want to know these details, go to the individual polls.
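A little arithmetic shows the size of the effect. This sketch assumes that an excluded candidate's supporters split among the remaining candidates in proportion to those candidates' existing support. That is a simplifying assumption; real supporters may split quite differently or move to undecided. The candidate labels A, B and C, all the percentages, and the function `drop_candidate` are hypothetical.

```python
def drop_candidate(shares, excluded):
    """Reallocate the excluded candidate's share proportionally to the rest."""
    freed = shares[excluded]
    remaining = {name: pct for name, pct in shares.items() if name != excluded}
    total_remaining = sum(remaining.values())
    return {
        name: pct + freed * pct / total_remaining
        for name, pct in remaining.items()
    }

# Hypothetical poll that includes Gore at 12 percent (within the 10-15% range).
with_gore = {"A": 35.0, "B": 25.0, "C": 15.0, "Gore": 12.0}
without_gore = drop_candidate(with_gore, "Gore")

for name, pct in without_gore.items():
    print(f"{name}: {with_gore[name]:.1f} -> {pct:.1f}")
```

Under this assumption the leading candidate gains the most points from the exclusion, so two pollsters measuring the same electorate could report top-line numbers several points apart purely because of the candidate list.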