A brief break from polls to comment on graphics and politics. Today's New York Times has an op-ed by NBC's political director Chuck Todd and a graphic designed by Nicholas Felton. The text and graphic are here. The text describes the data (quite completely-- an unusual but welcome touch!) noting that candidates are stratified by time in rough line with their poll standing and that debates played a part in both the rise of Mike Huckabee and the slippage of Hillary Clinton.

The graphic is a variation on a pie chart, showing the total number of minutes each candidate spoke in the 21 debates held in 2007. It is an odd fact that statisticians and analysts of statistical graphics universally hate pie charts, while graphic designers for mass media love them. The latter seem to equate statistical graphics with pie charts, while the former have ranted for years about their defects.

The appealing metaphor of a pie chart is its division of a whole into parts. Whatever the slices represent, they have to add up to a "whole" pie. The trouble here is that the text and data are about amounts of time, and only implicitly about the share of a total that each candidate receives. Further, the pie mixes the two parties into one whole pie. If we care about shares of the pie at all, there really should be two pies, each divided within party, since Republicans can't eat any of the Democratic debate pie, nor vice versa.

When what we want to compare are magnitudes, rather than shares of a whole, the data are more clearly presented as distances than as areas. It is easy to see which distances are longer than others, and relatively difficult to see differences between the areas of pie slices, especially when the slices are not adjacent to each other.

So let's look at the same data in a different format and see what we can see.

The chart above reproduces the information given in the original pie chart. It plots the number of minutes each candidate spoke during all debates, and distinguishes party by color (red and blue here; shading in the pie chart).

The main points made in the op-ed are also evident here. There is clearly a great deal of stratification across the candidates, with front runners getting more time than the "back of the pack" candidates. But there are a few comparisons I think stand out more in this chart. The advantage of Obama and Clinton over Edwards and Richardson is clear here. Richardson got only about 2/3 of the time of Obama, with Edwards a bit better but still well short of Clinton. The next cluster of Democrats-- Dodd, Biden and Kucinich-- got only about half the time of Obama, with Gravel far behind even that.

On the Republican side, Giuliani and Romney were closely matched, with McCain a little bit behind. As with the Democrats, there is then a large gap until we reach Huckabee at about 3/4 of Giuliani's time. A smaller but still clear gap separates Huckabee from the cluster of Paul, Hunter, Fred Thompson and Tancredo. Another gap separates that cluster from Brownback, and one more sets apart the short-lived candidacies of Tommy Thompson and Gilmore and the single appearance by Keyes.

To my eye, these differences are easier to perceive and compare when the number of minutes is simply the location of the dot in the chart above than when it is the area of a pie slice.

There is actually more data presented in the text of the op-ed than is present in the graphic. Todd's text notes differences in number of debates by party (11 for Dems, 10 for Reps), which means Dems should have about 10% more total time available than Reps (if, that is, the debates were equal length, something we don't know from the text but see below). He also gives the data on how many debates each candidate participated in, an obviously important point since we are comparing Keyes' minutes in a single debate with Obama's time in 11 debates. For some purposes we might care only for total time, but for others we might want to adjust for number of debates. I do that in the chart below, which shows the average number of minutes per debate for debates in which the candidate participated.

One immediately clear point in this chart is that Obama and Clinton retain their advantage over Giuliani and Romney even when we adjust for the extra Dem debate. Giuliani and Romney got the same time per debate that Edwards and Richardson received, but that leaves them well back of Obama and Clinton.

The most important shift in the chart is the movement of Fred Thompson to the midst of the top 4 Republicans. Thompson only participated in 5 of the 10 debates, so his total time in the first chart (and the original pie chart) dramatically misrepresents the attention he received after entering the race. Thompson received substantially more time per debate than did Huckabee, though in 10 debates Huckabee had 73 total minutes to the 49 Thompson got in 5 debates, as shown in the first chart.
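The per-debate adjustment described above is a single division. Here is a minimal Python sketch using only the totals and debate counts quoted in this post. The dictionary layout and the function name are illustrative assumptions, not anything taken from the op-ed.

```python
# Totals and debate counts quoted in the post for two Republican candidates.
candidates = {
    "Huckabee": {"minutes": 73, "debates": 10},
    "Fred Thompson": {"minutes": 49, "debates": 5},
}


def minutes_per_debate(minutes, debates):
    """Average speaking time over the debates a candidate actually attended."""
    return minutes / debates


per_debate = {
    name: minutes_per_debate(row["minutes"], row["debates"])
    for name, row in candidates.items()
}

for name, average in per_debate.items():
    print(f"{name}: {average:.1f} minutes per debate")
```

Thompson trails Huckabee on total minutes, but the order reverses once each total is divided by the number of debates attended.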

In his one debate, Alan Keyes got more time than any other second tier Republican averaged over more debates.

But why do Giuliani and Romney continue to get less time per debate than do Obama and Clinton? Todd's text points out that there are 12 Republican candidates but only 8 Democrats who have to divide the time. That seems reasonable, but the data are a bit more complicated. While there are 12 Republicans in the charts, three of them participated in four or fewer debates while all but 1 of the 8 Democrats participated in at least 10 debates. When we count total candidate debate appearances, the Republicans had 88 and the Democrats 81, less than a 10% difference due to number of candidates and appearances, not the 12 to 8 ratio of candidates.

What is quite different is the total number of minutes the candidates of each party spoke. Democrats totaled 794 minutes over 11 debates, a total time per debate of 72.2 minutes of candidates actually speaking. For Republicans, the total in 10 debates was 666 minutes, or 66.6 minutes per debate. So the Democratic advantage in the original pie chart and my first chart above has built into it a longer total for Democrats regardless of the number of debates.

Further, if we divide by number of candidate appearances, we get how many minutes each candidate would have gotten if the total speaking time had been divided exactly equally for each debate appearance. The result for Democrats is 794/81=9.8 minutes per candidate per debate, while Republicans had 666/88=7.6 minutes per candidate per debate. For whatever reasons of debate format and schedule, Democrats enjoyed more time to talk even adjusting for number of candidates and number of debates. (Had Reps had the same 81 appearances as Dems, they would still have only had 8.2 minutes each.)
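For readers who want to check the arithmetic, here is a small Python sketch of the party-level calculations above. The numbers are the totals given in this post. The variable and function names are my own, chosen for illustration.

```python
# Party totals as given in the post: minutes spoken, debates held,
# and total candidate appearances across those debates.
party_totals = {
    "Democrats": {"minutes": 794, "debates": 11, "appearances": 81},
    "Republicans": {"minutes": 666, "debates": 10, "appearances": 88},
}


def minutes_per_debate(party):
    """Total candidate speaking time in an average debate."""
    return party["minutes"] / party["debates"]


def equal_share(party):
    """Minutes per candidate per debate if time were split evenly."""
    return party["minutes"] / party["appearances"]


per_debate = {name: minutes_per_debate(p) for name, p in party_totals.items()}
fair_share = {name: equal_share(p) for name, p in party_totals.items()}

# Counterfactual from the text: Republican minutes spread over only
# as many appearances as the Democrats had.
rep_share_with_dem_appearances = (
    party_totals["Republicans"]["minutes"]
    / party_totals["Democrats"]["appearances"]
)

for name in party_totals:
    print(f"{name}: {per_debate[name]:.1f} min per debate, "
          f"{fair_share[name]:.1f} min per candidate per debate")
print(f"Republicans at 81 appearances: {rep_share_with_dem_appearances:.1f}")
```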

So the disadvantage in minutes per debate even for Republican frontrunners compared to Democratic leaders is not just an artifact of number of debates or of candidates. It is a real difference and it might be of political interest to know what accounts for it. Did Republican debates run shorter on average? Did questions to Republicans run longer on average, leaving less time for answers? The data don't answer these questions. But even over an equal 10 debates and an equal number of candidates, Democrats would have enjoyed almost an hour longer to speak (about 722 minutes vs 666).
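The "almost an hour" comparison can be reproduced by scaling the Democratic total down to 10 debates at the Democrats' own per-debate rate. This Python sketch reflects my reading of that calculation, and the variable names are assumptions.

```python
# Totals from the post.
dem_minutes, dem_debates = 794, 11
rep_minutes, rep_debates = 666, 10

# Democratic speaking time per debate, projected onto the
# Republicans' 10-debate schedule.
dem_rate = dem_minutes / dem_debates
dem_over_ten = dem_rate * rep_debates

gap = dem_over_ten - rep_minutes

print(f"Democrats over 10 debates: {dem_over_ten:.1f} minutes")
print(f"Gap vs Republicans: {gap:.1f} minutes")
```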

Let's adjust the speaking time to show which candidates got more than their fair share relative to the time available per candidate per debate. In this case, 100% means the candidate got exactly the "fair share", or 7.6 minutes per debate for Republicans and 9.8 minutes per debate for Democrats. On this scale, leading candidates got up to 140% of their party's fair share, while the lowest share was 63%.
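To make the fair-share scale concrete, here is a hedged Python sketch for Huckabee, the one candidate whose total minutes and debate count are both spelled out in this post (73 minutes in 10 debates). The function name is hypothetical. The other candidates' totals appear only in the charts, so they are left out rather than guessed.

```python
# Republican party totals from the post.
rep_total_minutes = 666
rep_appearances = 88

# Equal-split benchmark: minutes per candidate per debate (about 7.6).
fair_share = rep_total_minutes / rep_appearances


def share_of_fair_time(minutes, debates, fair_share_minutes):
    """Candidate's per-debate average as a percentage of the fair share."""
    return 100 * (minutes / debates) / fair_share_minutes


# Huckabee: 73 minutes over 10 debates, as quoted in the post.
huckabee_pct = share_of_fair_time(73, 10, fair_share)
print(f"Huckabee: {huckabee_pct:.0f}% of a fair share")
```

A value of 100 would mean the candidate spoke for exactly the equal-split time in an average debate.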

This new scale now removes the differences in total time between parties, and lets us compare relative advantage or disadvantage between parties and candidates. The data are plotted below.

Now that we are no longer confounding differences in total time between parties, new perspectives emerge from the data. The top two candidates in both parties were equally advantaged-- all four got about 140% of the time that an equal time rule would have given them. The previous comparisons masked this due to the shorter Republican times. But relative to a fair division of time, the top two were treated almost identically in both parties.

But the 3rd and 4th places were treated rather differently. In the Republican debates, McCain and Fred Thompson received about 120% of a fair share, while on the Democratic side Edwards and Richardson got only slightly more than a fair share would entitle them to, Richardson at 101% and Edwards at 109%. Based on this share of time comparison, then, the debates treated the Republican race as more of a 4 person contest, while the Democratic debates divided a top 2 from an "average" third and fourth.

The extra shares for leaders must come, of course, from the rest of the pack. Huckabee, who now threatens to win Iowa, has received only a 96% share of a fair time allocation. The bottom 5 among Republicans all got less than 80% shares. Among Democrats, the gap between Richardson and the rest runs from 101% down to 80.7% and below.

So total time favored the Democrats, and did so even after adjusting for number of debates and debate participants. The separation into top 2 vs 3rd and 4th was especially clear for Democrats. The Republican race gave relatively more time to 3rd and 4th place candidates.

The text of the op-ed contains only three passages that give interpretations of the data:

The front-running Democrats, thanks mostly to a smaller field (but also to one additional debate), got a lot more time to speak than the front-running Republicans.

Not surprisingly, the times for each candidate seem to follow the polls, with the leading contenders getting more minutes. As Mr. Huckabee’s poll numbers rose, his speaking time increased.

The debates had effects on both voters and candidates. Mr. Huckabee’s performances helped him emerge from the pack, and a few tough moments for Mrs. Clinton set the stage for her eventual fall in the polls.

The first point is right that Dems got more time, as the charts all show. But that advantage wasn't entirely due to fewer candidates and one more debate. The advantage was more real than that: Democrats got more time to speak per debate and per candidate. Changing the graph allows us to see this in a way the pie chart did not.

Also revealed by the charts here are systematic differences between candidates that illuminate the nature of the stratification within and between parties. Front runners are advantaged, but the Democratic race was treated as having 2 clear leaders while the Republican race had 4 apparent contenders, based on speaking time. That too is masked by the pie chart.

Finally, two of Todd's three points above are not addressed by the graphic or the data given. While it is clear that the order of speaking times roughly follows support in the polls, this is not entirely the case. For example, Biden has more poll support than Dodd, yet Dodd got slightly more speaking time. (For that matter, and more powerfully, Clinton held a large lead in the national polls during almost all of the debates, yet trails Obama in total time.) Moreover, the dynamic element Todd mentions is not illustrated by the graphic at all. If Huckabee's speaking time rose with his polls, we can't see it here. And did Thompson's time drop with his polls? Alas, the op-ed doesn't list these data (which would be quite lengthy). But a graphic could have illustrated this dynamic aspect of the data in no more space than the pie chart.

Nor does the pie chart provide any evidence of the role of debates in the rise of Huckabee or the decline of Clinton. Did the polls move up or down noticeably following any of the debates? Did candidate time in one debate precede a rise in polls, or did a rise in polls precede more speaking time? We could see these things in a graph, but it would take well more than a thousand words to describe them. The better the graph the more words it is worth.

Dan:

This is a great little essay on the graphic presentation of data. It ought to be in a textbook. Thank you.

One small point: For me, the graphs would have been more effective if speaking time had been on the vertical axis instead of the horizontal. I don't know if I'm typical, but I've always found it easier to compare quantities by height rather than width.

There are, of course, more problems with pie charts - they distort the data; they are hard to read; humans are bad at perceiving angles; etc.

The charts you give here are great!

John:

Yes! More posts like this please...would love to see Pollster talk more broadly about political data (not just polling data).

