Pareto Charts
If period-share charts show how relative leadership positions among data categories change over time, Pareto charts help viewers understand each category's contribution to the total. Analysts commonly use Pareto charts to determine whether a small number of categories contribute disproportionately—that is, whether the data implies an "80/20 rule."
The Pareto chart requires a data set aggregated by category and sorted in descending order. Then, for each record in the data set, the analyst calculates the cumulative total as an absolute number and (optionally) as a percentage. Table 6-2 shows the sample security-related data set used in the period-share discussion, enhanced with calculated values for the Pareto chart. I have sorted the 2004 vulnerability counts and added two calculated columns: "Cumulative" and "Cumulative %."
Table 6-2. Pareto Chart Data Set (Security Example)
Vendor |
2004 |
Cumulative |
Cumulative % |
Symantec |
32 |
32 |
28.3% |
McAfee |
14 |
46 |
40.7% |
F-Secure |
11 |
57 |
50.4% |
Check Point |
8 |
65 |
57.5% |
ISS |
8 |
73 |
64.6% |
Trend Micro |
7 |
80 |
70.8% |
Zone Labs |
6 |
86 |
76.1% |
OpenSSL |
5 |
91 |
80.5% |
Sophos |
5 |
96 |
85.0% |
Sygate |
5 |
101 |
89.4% |
Webroot |
4 |
105 |
92.9% |
Panda |
3 |
108 |
95.6% |
SSH |
3 |
111 |
98.2% |
Kaspersky |
2 |
113 |
100.0% |
IronPort |
113 |
100.0% |
|
Total |
113 |
113 |
100.0% |
Figure 6-21 shows a naive, sample Pareto chart for this data set, created using a standard combination-chart wizard and heavily reformatted. Even though we've followed our graphics guidelines—the chart looks neat enough—clear readability issues emerge.
Figure 6-21 Sample Pareto Chart
To begin with, we needed to stretch the chart horizontally quite a bit in order to fit everything in. But because Pareto charts by definition attempt to show 80/20 distributions, we can safely cut the low-scoring items out of the list and save space. Focusing on the top 10 vendors, rather than including all of them, follows the 80/20 rule. Second, turning the chart on its axes helps quite a bit, although it requires some spreadsheet hackery to do.
Figure 6-22 shows the redrawn version of the Pareto chart. To create the chart, I plotted the two data series (one for the absolute vulnerabilities per vendor, the other for cumulative percentages) initially as two horizontal, overlapping bar graphs. The secondary vertical axis (right side of the chart) contains the cumulative percentage bars. I hid the secondary vertical axis' tick marks and labels. Then, I set the fill color and line for the second series to "no fill," rendering them invisible. Last, I added a polynomial trend line with a polynomial regression with an order of "6." This adds the red "cumulative %" line to the chart. Why do this? Because Excel cannot display a horizontally oriented line chart that uses categories (although it will do so with bar charts, which is how the first data series appears). Unfortunately, until Excel's charting capabilities improve, analysts will need to resort to hacks of this sort.
Figure 6-22 Pareto Chart (Redrawn)