Guest Pollster | December 10, 2009
Topics: Fivethirtyeight, Gay marriage, Maine Question 1, Nate Silver, Polling Errors, Pollster.com
Harry Enten is a student at Dartmouth College.
The past two years have shown us that predicting voter support for same-sex marriage ballot measures is no easy task. Pollster.com's aggregate trend estimates, reflecting pre-election polling, incorrectly projected that voters in California and Maine would vote against measures to ban same-sex marriage. Nate Silver, using a regression model that included a state's religiosity, year of the measure, and whether the measure included a ban on civil unions, also incorrectly predicted that Maine's amendment to ban same-sex marriage would fail.
In a post this past Friday, Silver offered a possible explanation: "It's not clear that the results in Maine are comparable to those in other states. Question 1 was the only gay marriage ballot initiative that did not seek to rewrite its state's constitution... there was no particularly good way to model the uncertainty."
While Question 1 was rare in that it did not amend the state constitution, it is not the only ballot measure against same-sex marriage to have left a state constitution untouched. In 2000, California voters passed Proposition 22 (the California Defense of Marriage Act), an ordinary statute, by a margin of 61%-39%. I was interested to see if including California's 2000 vote and a variable signifying that it was not a constitutional amendment would have improved Silver's model. To do so, I simply added a dummy variable controlling for whether the measure in question amended the state's constitution or merely altered state law.
The result is a model that would actually have done worse in Maine, predicting a yes vote for Question 1 of only 33.4% (vs. 43.5% for Silver's initial model) when the actual yes vote was 52.9%. Adding a dummy variable for an off-year election to this model, as Silver did "ad hoc" to his, would still put the yes vote at only 37.9%.
Still, I was inspired by Silver's 2008 presidential regression models that combined polling and states' demographic data to find out if combining polling data with other variables could create a more accurate prediction of same-sex marriage ballot measures.
I have built a linear regression model based on 25 state gay marriage referenda from 1998 to 2009. The model attempts to predict support for banning same-sex marriage using five variables: projected support for the measure from pre-election polls, a state's religiosity, year of the measure (where 1 is 1998, 2 is 1999, and so on), a dummy variable controlling for whether the measure in question amended the state's constitution or merely altered state law, and a dummy variable controlling for whether the election was off-year.
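A minimal sketch of how a five-variable linear regression of this kind could be fit by ordinary least squares follows. It is an illustration, not the code behind the results reported here: the function names (fit_ols, predict, solve_linear_system), the toy rows, and the coefficients used to generate the toy outcomes are all assumptions, and every number is a synthetic placeholder rather than a value from the actual dataset.

```python
# Ordinary least squares in pure Python (no third-party libraries).
# All data below is SYNTHETIC and for illustration only.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] minimizing the squared error."""
    design = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # normal equations: X'X b = X'y


def predict(coefs, row):
    """Predicted 'yes' share for one election."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))


# Columns: poll share for "yes", religiosity, year index (1 = 1998),
# amends-constitution dummy, off-year dummy. Hypothetical values.
rows = [
    (55.0, 60.0, 1, 1, 0),
    (62.0, 70.0, 3, 1, 0),
    (58.0, 52.0, 7, 0, 0),
    (66.0, 75.0, 7, 1, 1),
    (51.0, 48.0, 9, 1, 0),
    (70.0, 82.0, 8, 1, 1),
    (49.0, 46.0, 11, 1, 0),
    (47.0, 48.0, 12, 0, 1),
]
# Hypothetical coefficients, used only to generate toy outcomes.
true_coefs = [5.0, 0.8, 0.1, -0.3, 1.5, 2.0]
actual_yes = [predict(true_coefs, r) for r in rows]

coefs = fit_ols(rows, actual_yes)
print([round(c, 3) for c in coefs])
```

Because the toy outcomes are generated without noise, the fit should simply recover the generating coefficients. With real election results the residuals would not be zero, and a statistics package would also report the standard errors needed to judge significance.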
The results for this model are very encouraging for those of us hoping to add value to polling data and predict future results of same-sex marriage ballot measures. I found that 92.1% of the variation between the different same-sex marriage elections was explained by the model compared with 80.7% for Silver's unaltered model. The average difference between the model's predicted support for an amendment in an election and the actual support for the amendment was 2.69% (compared with Silver's 4.46%). Importantly, this difference was greater than 2.00% in only 4 instances (Michigan 2004, Montana 2004, North Dakota 2004, and South Dakota 2006) and greater than 4.00% in only two (Michigan 2004 and North Dakota 2004).
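The fit statistics quoted above can be computed along the following lines. This is a sketch under two assumptions: that "variation explained" refers to the ordinary R-squared, and that the "average difference" is a mean absolute error. The function names and the sample numbers are hypothetical and are not taken from the dataset.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` accounted for by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average size of the miss, ignoring its direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def misses_over(labels, actual, predicted, threshold):
    """Labels of the elections whose absolute miss exceeds `threshold`."""
    return [
        label
        for label, a, p in zip(labels, actual, predicted)
        if abs(a - p) > threshold
    ]


# Hypothetical elections and vote shares, for illustration only.
labels = ["Election A", "Election B", "Election C"]
actual = [50.0, 60.0, 70.0]
predicted = [52.0, 59.0, 69.0]

print(r_squared(actual, predicted))
print(mean_absolute_error(actual, predicted))
print(misses_over(labels, actual, predicted, 1.5))
```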
The polling data is the best predictor for support for same-sex marriage amendments. Indeed, a simple regression in which the poll variable alone predicts the final result explains 86.4% of the variation in support for same-sex referenda across elections.
Despite the polling variable's dominance, the year variable is statistically significant at the 95% confidence level in the model. That is, an effect this large would be unlikely to appear by simple chance if the year of the measure had no real relationship with the result. The year variable has a negative coefficient, suggesting that in more recent years polling is less likely to underestimate support for the propositions. This finding supports a study by NYU's Patrick Egan that concluded that any possible "gay Bradley Effect," the theory that some respondents were uncomfortable sharing their opposition to gay marriage with a stranger on the telephone, has subsided in recent years.
The reason for this abatement is unclear, but it may have to do with the fact that the issue of same-sex marriage is no longer heavily used as a wedge issue nationally. Senator McCain mentioned the issue fewer times in 2008 than President Bush did in 2004, and Congress has not voted upon the Federal Marriage Amendment since 2006. This explanation would be consistent with Georgetown's Daniel Hopkins's finding that the Bradley Effect for black candidates began to disappear in the mid-1990s once issues with a racial undertone (such as welfare reform and crime) began to recede from the national debate.
The off-year and religiosity variables are statistically significant at the 90% confidence level in the model. The coefficient for the off-year variable is positive, implying that polling underestimates support for the "yes" vote in off-year elections. This is not surprising considering that these elections tend to have lower turnout (and are thus more difficult to poll) and are dominated by older voters, who are more likely to be opposed to same-sex marriage.
The coefficient for the religiosity variable is positive, meaning that, when controlling for the other variables, polls tend to underestimate support for the measures in more religious states. Last year, Mark DiCamillo, director of The Field Poll in California, argued that polling errors for same-sex marriage referenda resulted from late shifts and a boost in turnout among Catholics and regular churchgoers. He speculated that these shifts resulted from "last minute appeals" from religious figures. If DiCamillo is correct, and if gay marriage opponents have used similar tactics elsewhere, we would expect this effect, and thus the polling error, to be larger in more religious states.
The variable controlling for whether the measure in question amended the state's constitution or merely altered state law is not statistically significant. That is, there is a relatively high probability that any effect this variable had on the predictive value of this model occurred only by chance. It is important to point out that the results for this variable should be viewed with caution, because we only have two observations of measures that altered state law rather than a constitution.
Of course, I was also interested in testing if my model can work proactively and not merely explain past results. I wanted to investigate if, unlike the Pollster.com aggregate, it would have accurately predicted the results for California and Maine. To estimate the result for California as I would have prior to the 2008 election, I eliminated all the observations from the 2008 and 2009 elections from my dataset: California 2008, Florida 2008, and Maine 2009. This altered model called for the "yes" side to win in California with 51.9% of the vote, an error of 0.3%. To estimate the result for Maine, I simply eliminated the Maine 2009 observation. This modified model called for the same-sex marriage ban to pass in Maine with 50.6% of the vote, an error of 2.3%.
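The hold-out procedure described above, refitting on earlier elections only and then predicting the one that was left out, can be sketched as follows. To keep the example short it uses a one-variable regression on the poll number instead of the full five-variable model, which is a deliberate simplification. The function names and all of the records are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if n < 2 or sxx == 0:
        raise ValueError("need at least two distinct poll values to fit")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def holdout_prediction(records, target):
    """Predict `target` using only elections from earlier years.

    Everything held in the target's own year or later is dropped, so the
    fit sees only what would have been known before that election.
    """
    training = [r for r in records if r["year"] < target["year"]]
    intercept, slope = fit_line(
        [r["poll"] for r in training],
        [r["actual"] for r in training],
    )
    return intercept + slope * target["poll"]


# Hypothetical records: poll share and actual "yes" share, in percent.
records = [
    {"name": "State A", "year": 2004, "poll": 50.0, "actual": 53.0},
    {"name": "State B", "year": 2004, "poll": 60.0, "actual": 63.0},
    {"name": "State C", "year": 2006, "poll": 55.0, "actual": 58.0},
    {"name": "State D", "year": 2008, "poll": 52.0, "actual": 40.0},
    {"name": "State E", "year": 2008, "poll": 50.0, "actual": 51.0},
]
target = records[4]

estimate = holdout_prediction(records, target)
print(round(estimate, 1), "predicted vs.", target["actual"], "actual")
```

Filtering on the election year reproduces both cases in the text: a 2008 target drops every 2008 and 2009 observation, while a 2009 target keeps the 2008 ones.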
All of these findings support the argument that we can add value to polling data on same-sex marriage amendments when we combine it with variables such as the religiosity of a state and the year of the measure. We should recognize that polling ballot measures is always very difficult due to their confusing language. Polling same-sex marriage measures is especially problematic because of added factors such as a possible same-sex marriage Bradley Effect. My model helps to eliminate some, but by no means all, of the possible errors that result from these problems.
Notes on Data
1. For my model, off-year is defined as any election that did not take place during a presidential election (primary or general) or a midterm general election. This includes Missouri 2004, Kansas 2005, Texas 2005, and Maine 2009. Silver's model only counts Kansas 2005, Texas 2005, and Maine 2009 as off-year elections. I used my measure because non-presidential primaries, like traditional off-year elections, are often plagued by low turnout.
2. For Silver's and my model, religiosity is measured by the percentage of adults in a state who considered religion an important part of their daily lives in a 2008 Gallup study.
3. Prior studies have found that, because ballot questions are confusing, voters become increasingly aware of what a "yes" or "no" vote means for same-sex marriage ballot measures as the election approaches (most likely by relying on advertisements). My polling variable therefore only uses data taken within three weeks of the election. When more than one firm conducted a poll within three weeks of the election and less than a week separated the polls, I used an average of the firms' final polls. For Maine, this rule means I included an average of the final Public Policy Polling and Research 2000 polls in my dataset, but not the Pan Atlantic poll, because it was taken more than a week before Public Policy Polling's final poll was conducted.
While most of the data in my model is easily available, prior polling for same-sex marriage referenda is surprisingly difficult to find. I managed to locate and verify 25 elections with a measure to ban (or allow the state legislature to ban, as is the case with Hawaii) same-sex marriage and a poll within three weeks of the election. I allocated undecided respondents in proportion to how decided voters were planning on voting: projected vote in favor of the amendment by polls = those planning on voting yes / (those planning on voting yes + those planning on voting no).
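The poll-selection rule in note 3 and the undecided allocation above can be sketched together. Several details are assumptions made for the illustration: each firm's last poll in the three-week window is treated as its "final" poll, "less than a week" is read as fewer than seven days before the most recent final poll, and the two-way shares (rather than the raw yes and no numbers) are averaged. The firm names, dates, and percentages are hypothetical.

```python
from datetime import date


def two_way_yes_share(yes, no):
    """Projected 'yes' share with undecideds split proportionally."""
    return 100.0 * yes / (yes + no)


def polling_variable(polls, election_day, window_days=21, gap_days=7):
    """Average the firms' final polls that fall inside the window.

    Returns None when no poll ended within `window_days` of the election.
    """
    # Keep each firm's most recent poll inside the pre-election window.
    final_by_firm = {}
    for poll in polls:
        days_out = (election_day - poll["end"]).days
        if 0 <= days_out <= window_days:
            best = final_by_firm.get(poll["firm"])
            if best is None or poll["end"] > best["end"]:
                final_by_firm[poll["firm"]] = poll
    if not final_by_firm:
        return None
    # Drop final polls that ended a week or more before the latest one.
    latest = max(p["end"] for p in final_by_firm.values())
    kept = [
        p for p in final_by_firm.values()
        if (latest - p["end"]).days < gap_days
    ]
    shares = [two_way_yes_share(p["yes"], p["no"]) for p in kept]
    return sum(shares) / len(shares)


# Hypothetical election and polls, for illustration only.
election_day = date(2012, 11, 6)
polls = [
    {"firm": "Firm A", "end": date(2012, 11, 3), "yes": 47, "no": 48},
    {"firm": "Firm A", "end": date(2012, 10, 25), "yes": 44, "no": 50},
    {"firm": "Firm B", "end": date(2012, 10, 31), "yes": 48, "no": 47},
    {"firm": "Firm C", "end": date(2012, 10, 23), "yes": 40, "no": 50},
]

print(polling_variable(polls, election_day))
```

In this toy case Firm A's earlier poll is superseded by its final one, and Firm C's poll is dropped because it ended more than a week before the latest final poll, which mirrors the Pan Atlantic exclusion described in note 3.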
Complete dataset is available here.