Visualization: How to Present Security Data to Get Your Point Across
One picture is worth ten thousand words.
—Fred R. Barnard (1927)
Graphic design which evokes the symmetria of Vitruvius,
the dynamic symmetry of Hambidge,
the asymmetry of Mondrian;
which is a good gestalt,
generated by intuition
or by computer,
by invention
or by a system of coordinates,
is not good design
if it does not communicate.
—Paul Rand (1985)
Security is such a complex topic that it defies easy description. In addition to the natural camouflage afforded the subject by its often-esoteric terminology and concepts, the sheer scope and volume of available security data overwhelm practitioners and laypeople alike. Because of these two facts, most information security professionals have no real idea how to "show security," either literally or figuratively.
It is fair to say that one of the reasons security professionals have so much difficulty dealing with their bosses is that they lack simple and clean metaphors for communicating priorities. Astute analysts recognize—correctly—that visual representations can dramatically enhance managers' abilities to understand security issues. Unfortunately, few analysts have received any formal training on the subject of presentation graphics or (more generally) graphic design. In addition, product vendors generally provide poor or inflexible graphical reporting tools.
This chapter is all about how to get your point across—that is, how to present your hard-won data in a clean and elegant manner that informs, illustrates, and illuminates.
In my view, information security urgently needs fresh thinking about data visualization. Most of what passes for information graphics in the security field generally takes one of two tired forms:
- Simple bar and pie charts showing samples of a single coarse-grained metric, such as the number of vulnerabilities found on BugTraq or the number of undesirable e-mails caught by an organization's spam engine
- Traffic lights that show the "health" of a range of analysis topics, typically built by hand-coloring reds, yellows, and greens into a grid or thermometer bulb
I find both of these approaches problematic. Bar and pie charts tend to be graphically inefficient; pie charts in particular take up a great deal of space relative to the number of distinct data points they show. In addition, they tend to include only a single metric or data range, rather than (for instance) juxtaposing several ranges.
Traffic lights are worse, because they oversimplify issues too much. In the same way that arithmetic means dilute important points by steamrolling over the outliers (see Chapter 5, "Analysis Techniques"), traffic lights obscure the variation, exception, and detail that lead to insight and smart decision-making.
"But wait," wail fans of traffic lights. "Senior management likes nice graphics. They want something simple. They don't understand the rarefied world of information security. If we give them anything more complicated, they won't understand it!" A former colleague of mine once made a statement like this to me, apparently seriously. I have met many information security professionals who agree with him. But the statement is pure rubbish, and arguably condescending. Want proof that the boss is not a simpleton? Consider a typical stock index chart in the Wall Street Journal or New York Times. Most charts of the Dow Jones Industrial Index contain these features:
- A time-based horizontal axis, often the last 30 trading days
- High, low, and closing positions for the dates in the range
- Trading volumes for each day in the range
- Often, a 30-day moving average
Doing the math: 4 data points per day, times 30 days, equals 120 pieces of data. These data appear in a compact, two-or-three-square-inch graphic. The boss understands this quite well, thank you very much. Compare this to a traffic light graph that shows exactly one data point that is neither accurate nor precise, or to the low-resolution "DefCon"-style bar charts espoused by the likes of Symantec[1] and ISS[2].
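The arithmetic above can be restated as a rough measure of data density. The following is only a sketch: the `data_density` helper is a hypothetical name, and the assumption that a traffic light occupies the same two square inches as the stock chart is mine, made purely for comparison.

```python
# Data points in a typical 30-day stock index chart, per the text:
# high, low, close, and volume for each trading day.
POINTS_PER_DAY = 4
TRADING_DAYS = 30


def data_density(points: int, area_sq_in: float) -> float:
    """Return the number of data points per square inch of exhibit."""
    return points / area_sq_in


stock_chart_points = POINTS_PER_DAY * TRADING_DAYS  # 120 pieces of data

# The text gives the chart size as two to three square inches.
density_low = data_density(stock_chart_points, 3.0)
density_high = data_density(stock_chart_points, 2.0)

# Assumption: a traffic light drawn at the same two-square-inch size.
traffic_light_density = data_density(1, 2.0)

print(f"Stock chart: {density_low:.0f} to {density_high:.0f} points per square inch")
print(f"Traffic light: {traffic_light_density:.1f} points per square inch")
```

Under these assumptions the stock chart carries roughly two orders of magnitude more information per square inch than the traffic light.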
As an industry, we can do better than simple pie charts and traffic lights. We need to treat viewers of security metrics data—managers, regulators, and the general public—with more respect. In this case, "respect" means recognizing that intelligent people can, with a minimal amount of training, learn powers of discernment that go beyond nodding and smiling at low-resolution, brain-damaged exhibits.
We need to think of graphically representing metrics as an information visualization challenge, not simply as a "reporting issue." The term "information visualization" is relatively new to the business landscape. Broadly defined, it refers to the practice of using high-resolution graphics and related exhibits to display sets of data, particularly when the sets are large. If the analytical techniques reviewed in the preceding chapter describe ways to uncover patterns in data, information visualization provides methods of showing them off to maximum effect. Visualization concerns include composition, color, typography, arrangement, and use of space (both positive and negative).
Many readers might perk up their ears here and say, "Ah, so you mean making charts!" Yes and no: while information visualization does indeed often mean creating charts, these are means but not ends. Charts are often one part of the larger process of carefully evaluating the best way to present the information at hand.
As mentioned previously, this chapter discusses ways to graphically show off data to their best advantage, without losing the richness and texture that best facilitate deep understanding. Unfortunately, some of the most compelling examples described in this chapter cannot be easily reproduced with standard off-the-shelf office productivity packages. In these cases, I'll describe ways to create the exhibits yourself using custom tools.
A warning to the reader: as if you could not tell already, this chapter is heavily flavored with the strong tastes of my own opinions. If the taste seems excessively bitter, that is because I find more affinity with the aesthetic tastes of graphic designers and high-end management consultants than those of information security vendors and professionals. A relative latecomer to security, in my early years I was part of a business team that contracted Boston Consulting Group (BCG) for a seven-figure management consulting engagement. Believe me, several million dollars buys you a hell of a lot of management-grade graphical excellence. Since then, I have been a fan of the management consulting "house style" in general, and of McKinsey & Company in particular. Certain business magazines strongly influence my worldview, notably The Economist. Needless to say, the sophistication of visualization used by the organizations I have just listed could not be more different from the sorts of things we have been seeing in information security lately.
This chapter contains three major sections:
- Design principles—six basic rules to live by
- Guidelines for various exhibit formats—theory and practice for sixteen ways to visualize security data
- Thinking like a cannibal—three real-life examples showing how to rework existing exhibits
Design Principles
Before diving into the fun bits (the graphics!), I'd like to lay down some fundamental design principles that will help you create high-impact exhibits. These principles apply equally to all charting and data analysis packages: Microsoft Excel, Keynote, SAS, SPSS, JFreeChart, and others. However, the most common tool used for prototyping business graphics is the spreadsheet. What I am about to say will make the most sense to readers in that context. You can also apply these principles to automated exhibit generation, although I leave that as an exercise for the reader.
Generally speaking, mainstream software packages do not serve the cause of information visualization well. The default chart exhibits produced by spreadsheets are far too loud, colorful, and needlessly decorative. Excess chart bloat buries data in an avalanche of shininess, tick marks, unnecessary grids, irrelevant backgrounds, and other foolish bits of graphical frippery. But wisdom, as P.J. O'Rourke once put it, is "knowing the difference between can't and shouldn't." Just because an analyst can use a program to pollute charts with distracting visual noise does not mean it is a good idea to do so.
This chapter does not attempt to furnish a treatise in graphic design. Others, notably the great Edward Tufte, have written beautifully and extensively on the subject already. You should, instead, see this chapter as a summary of effective presentation principles—part Envisioning Information, part How to Lie With Statistics.
Effective visualization of metrics data boils down to six principles:
- It is about the data, not the design
- Just say no to three-dimensional graphics and cutesy chart junk
- Don't go off to meet the wizard
- Erase, erase, erase
- Reconsider Technicolor
- Label honestly and without contortions
Following these six principles will result in exhibits that are clean, clear, and visually attractive. Let us start with the first one.
It Is About the Data, Not the Design
Good information visualization is like good graphic design. If the reader does not notice anything amiss, it succeeds: the audience pays attention to the data, not the decoration. But if the reader sees something that prompts a gawk or a head-scratch, the exhibit design may be overwhelming the data.
Data should stand on their own, without extra supporting props or bangles. Forcefully and reflexively check any urges to "dress up the data."
Just Say No to Three-Dimensional Graphics and Cutesy Chart Junk
I have never understood the fascination with three-dimensional pie and bar charts. I am continually astounded at how otherwise respectable security software companies insist on shipping reporting modules that sport ridiculous, gratuitous 3-D graphics. Unless your professional duties include preparing exhibits for the Department of Energy's nuclear weapons simulation program, few conceivable data sets genuinely merit a 3-D exhibit.
Simple, clean, "flat" charts make the same points a faux 3-D chart does, but with less ink. Certainly, ordinary bar charts and pie charts do not require 3-D effects; the artificial depth only distracts the viewer from the data.
Recent versions of Microsoft's ubiquitous Excel spreadsheet software allow users to add photographs and flashy wallpapers to the backgrounds of charts or to the colored portions of area charts. Avoid these unless the exhibit serves some theatrical purpose. For example, a flashy photo background might feel right at home as part of a sales-oriented slide deck containing scads of music and the obligatory slide transitions. Nobody will take the exhibit seriously anyway, so the extra flash will not matter. But for situations in which the presenter intends to inform, persuade, or present results of analyses, charts should use white or translucent backgrounds and should omit 3-D.
Don't Go off to Meet the Wizard
Thanks to the profusion of "wizards," "assistants," talking paper clips, and other assorted digital menservants, modern desktop applications have made it easier than ever to create incredibly busy and tasteless graphics. It is helpful that Excel's wizards speed users through the process of selecting data series, titling charts, and labeling axes. However, the results disgorged at the end are, at best, overeager. Even the humblest line chart is festooned with a Technicolor palette, distracting axis tick marks, unnecessary grid lines, and a drab gray background. All these aspects distract the reader from the data.
An additional downside is that Excel's default layout wizards produce a particular, immediately recognizable style, one that screams "amateur!" (For me, spotting Excel punters is an admittedly snobbish, and slightly guilty, pleasure.) Use digital menservants carefully, and only as a starting point for exhibits. Generally speaking, graphics created for all but the most casual personal uses require cleanup.
Erase, Erase, Erase
Most charts produced by desktop software default settings contain a profusion of superfluous ticks, grid lines, plot frames, and chart frames. There is a good reason why most mainstream business publications use them sparingly: they look clumsy, and they distract attention from the data. You can eliminate all these ornaments without losing any meaning. In fact, your chart will look cleaner as a result.
The general rule: if you do not need it, erase it. Start getting into the habit of eliminating the tick marks immediately after creating a chart. Generally this involves formatting the axes with "No major tick marks" and "No minor tick marks." Likewise, eliminate the plot frame and chart frame by formatting each with "No border." These are not needed; the axis lines provide all the framing the chart needs. For bar charts, eliminate the enclosing borders for the bars; the bars themselves provide all the information needed.
Grid lines are trickier. Although I usually erase them, they do have appropriate uses. For sparse exhibits in which subtle comparisons are neither possible nor desirable, omitting the grid eliminates visual noise without sacrificing readability. For dense exhibits containing large data series, however, muted grid lines help readers compare individual data points. When using grid lines, always draw them in a light color (20 to 25% gray) or in black as sparse dots. They should not intrude on the data and should sit in the background.
In fact, other than those required to plot the data, good charts contain no lines other than the x- and y-axes, and (perhaps) some muted grid lines. Even the axis lines can be muted further: try choosing a thin line (1-point) and softer color (50% gray).
The cumulative effect of these erasures is a crisp chart with few distracting lines. Although my recommendations may seem Spartan—severe, even—the results are worth it.
Reconsider Technicolor
Make no mistake—when used judiciously and appropriately, color can add tremendous depth and richness to charts and graphs. The eye's ability to make sense of, and discern between, wide ranges of colors is one of the great wonders of human physiology. It is what enables us to discern objects in our peripheral vision or spot a blazer-wearing deer hunter from a long distance.
Tufte has previously noted that small, saturated spots of color are often the best way to draw attention to key points or to outliers in data sets. By that rationale, it stands to reason that many large swatches of saturated color are almost certainly overwhelming to the human eye.
In that light, the default Technicolor palette for Excel charts is less than ideal; the colors are far too saturated for most uses. The default palette includes Lemon Pledge Yellow, Kermit the Frog Green, Ticket-Me-First Red, and Cobalt Blue. For charts with multiple data series, that is quite an eyesore.
To prevent your exhibits from looking like an irradiated piece of luggage as it goes through an airport X-ray scanner, consider these two suggestions:
- Mute the color palette. Reds, blues, greens—beautiful colors, all. But they need not saturate the screen. Consider replacing red with burgundy, blue with navy, and "Kermit" green with hunter or forest green. Readers will thank you for it; their eyes will relax rather than twitch. That said, if you need to emphasize a particular data point or series, use a small, focused swatch of saturated color.
- Use a monochromatic palette. An alternative to a less saturated palette is one that uses only black, white, and shades of gray. Monochromatic palettes work well when the target output device cannot be guaranteed, and when the number of data series is about five or fewer. A reasonable monochromatic palette includes white (with a black border), 20 to 25% gray, 50% gray, 75% gray, and black. Use pure colors; avoid fill patterns because they tend to "vibrate." On a related note, because photocopies of good exhibits (like the ones you will produce after reading this book!) tend to proliferate mysteriously into unforeseen hands, get into the habit of printing all exhibits in black and white first, before finalizing designs. By "proofing" exhibits this way, you can catch potential reproduction problems before they become an issue.
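For readers who generate exhibits programmatically, the monochromatic palette above can be expressed as hex color codes. This is a minimal sketch, assuming that "N% gray" means N percent black coverage mapped linearly onto the 0 to 255 range of an RGB channel; the `gray_hex` function is a hypothetical helper, not part of any charting package.

```python
def gray_hex(percent_black: int) -> str:
    """Convert a print-style gray percentage (0 = white, 100 = black) to hex.

    Assumption: the percentage maps linearly onto an 8-bit RGB channel.
    """
    if not 0 <= percent_black <= 100:
        raise ValueError("percent_black must be between 0 and 100")
    # Integer arithmetic, rounded to the nearest of the 256 channel levels.
    level = (255 * (100 - percent_black) + 50) // 100
    return f"#{level:02x}{level:02x}{level:02x}"


# White, 20% gray, 50% gray, 75% gray, and black, as suggested in the text.
MONOCHROME_PALETTE = [gray_hex(p) for p in (0, 20, 50, 75, 100)]
print(MONOCHROME_PALETTE)
# ['#ffffff', '#cccccc', '#808080', '#404040', '#000000']
```

The same helper yields the light gray suggested earlier for muted grid lines (`gray_hex(20)` or `gray_hex(25)`) and the 50% gray suggested for softened axis lines.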
While I'm on the subject of color, be careful with yellow. There is nothing intrinsically wrong with yellow, but it tends to wash out in printed work and presentations. Use it as a "highlighter pen" accent color, but not as a data series color unless the background is very dark.
Label Honestly and Without Contortions
Labels matter. Labels convey an exhibit's intent; lack of proper labels leads to loss of clarity and meaning. Label honestly so that readers understand the units of measure, time intervals, and data series—and do it in a professional manner that does not cause torticollis.
A few guidelines are in order. First, pick a meaningful title that summarizes the exhibit's main point. A plain title like "Application Security Defects" is fine. More-forceful titles can help too; for example, "Decreased Risk from Applications" succinctly provides the main takeaway message. For charts that display data over a range of time, subtitles help establish the data source and context. For example, a good subtitle might be "Defects reported per application, 2001–2004."
Second, label units of measure clearly. Although this sounds simple enough, you might be surprised to see how many people forget to label either the independent or dependent axes, as if the thing being measured were somehow self-evident. Nothing is worse than a beautifully formatted line chart that insightfully points out that over time, a company observed a clear and definitive increase in the number of . . . uh, something.
Axis labels should succinctly describe the unit of measure and scope of each data point and should typically include one of these magic words: "of," "per," "by," or "from." For example:
- Number of defects per application
- Percentage of passwords
- External attacks, by source
- Median number of days per patch
Exception: axes containing units expressed in years do not require labels, since the unit of measure is self-evident.
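When exhibits are generated automatically, the "magic words" guideline can be applied as a simple sanity check. The sketch below is a heuristic only: the function name and the `unit_is_years` flag are hypothetical, and passing the check does not guarantee a good label.

```python
import re

# The four "magic words" named in the text.
MAGIC_WORDS = {"of", "per", "by", "from"}


def is_descriptive_axis_label(label: str, unit_is_years: bool = False) -> bool:
    """Heuristic check that an axis label states its unit and scope.

    Axes measured in years are exempt, per the exception in the text.
    """
    if unit_is_years:
        return True
    # Compare whole words only, so "byte" does not count as "by".
    words = set(re.findall(r"[a-z]+", label.lower()))
    return bool(words & MAGIC_WORDS)


for label in [
    "Number of defects per application",
    "Percentage of passwords",
    "External attacks, by source",
    "Median number of days per patch",
    "Defects",
]:
    print(f"{label!r}: {is_descriptive_axis_label(label)}")
```

The four example labels from the list pass the check; the bare label "Defects" does not.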
Third, do not tilt text toward the vertical if you're running out of axis room, or, in fact, for any other reason. With apologies to my East Asian and Middle Eastern readers, Western-language text was meant to be read left to right. Slanting x-axis labels or turning them 90 degrees forces viewers to crane their necks. You don't want to be responsible for unwanted chiropractor bills, do you? Of course not. In all seriousness, though, tilted text tends to indicate deeper problems with the exhibit format itself, generally in the orientation. In such cases, try switching the x- and y-axes.
Spreadsheet software (Excel is a notorious offender) often rotates text by default because it believes it is being helpful. Do not let it. Instead, always position chart axis labels with 0° rotation—that is, exactly horizontal.
Fourth, for multiseries charts, consider eliminating series legends if you can get away with it. Place the series labels directly on or near the data series themselves—that is, at the point of use. This practice works especially well with line charts.
Fifth, do not abbreviate. Although it may seem more efficient to label axes with "nmbr.," "app.," and "bus," doing so forces readers to unconsciously pause while reading the chart, an unnecessary distraction from the data. Also, abbreviations look sloppy. Of course, any rule has exceptions. For example, most people understand that % stands for "percentage" and that IT denotes "information technology." In most cases, though, try expanding all abbreviations. If narrow space on the y-axis forces an abbreviation, try giving the axis more breathing room by widening the left margin.
Sixth, use simple and consistent fonts. Charts are not the place to trot out that new typeface downloaded from the Internet. Use classic sans-serif typefaces like Helvetica, Franklin Gothic, or plain old Arial. In addition, keeping text the same size throughout the chart helps readers focus on the data, rather than the labels. Therefore, as a general rule, all labels other than the title (axes, data, subtitles) should be the same size and font. For printed documents, I recommend 9-point Helvetica plain or 9-point Arial plain. For space-constrained exhibits, the "narrow" versions of these fonts work pretty well, too. Opinions differ on correct formatting of titles; I prefer to make them the same size and font as the other labels, but in boldface.
Finally, cite any data sources used to make exhibits. To make a citation, place a small, short caption at the bottom of the exhibit. A simple "Source: Security Metrics Study (1999–2004), Andrew Jaquith Institute" in 6-point type (or something similar) works nicely. In addition to making the exhibit look more official, the caption provides valuable information to readers about sources and methods.
Example
Although my suggested design guidelines may seem onerous, when followed they can dramatically improve the look and feel of metrics exhibits. For example, consider the very basic password-quality data set in Table 6-1.[3] The analyst has decided to create a graphical exhibit for management showing the results of the latest password audit. He fires up Excel and selects a standard bar chart (formatted in 3-D because it "looks cool"). Figure 6-1 shows what Excel disgorges when using default settings.
Table 6-1. Sample Password Data Series
Department | Value
IT | 230
Account. | 22
Ops. | 129
Sales | 40
Figure 6-1 Initial Exhibit for Password Data Series
What is wrong with this picture? All sorts of things:
- Gratuitous 3-D effect
- Abbreviated category names
- Unnecessary legend
- Grid lines add no value
- Distracting shadows and background
- No data labels
Let's clean this up. Figure 6-2 shows a redrawn version of the exhibit. I made quite a few changes:
- Specified a sensible chart title indicating what the exhibit signifies—"Results of Password Audit by Department"—and a relevant time interval—"March 2005."
- Added a y-axis label, "Number of Weak Passwords."
- Eliminated the horizontal grid lines.
- Removed the series legend.
- Added data labels above each bar.
- Removed the tick marks from both the x- and y-axes.
- Removed the series border around each bar and changed the color from lilac to navy blue.
- Harmonized all labels to use the same typeface (Arial instead of Verdana), size (9-point), and style (plain, except for the title in boldface). Also, cleared the "auto-scale" check box for all text items.
- Removed the plot area border and background fill.
- Removed the chart area border and background fill.
Figure 6-2 Redrawn Exhibit for Password Data Series
We can still improve this exhibit further by making some additional changes to the format. First, switching the axes provides additional flexibility for the department names and looks more professional. In addition, sorting the departments in descending order of the data points strengthens the exhibit's message. Finally, reducing the exhibit's overall size saves some space. Figure 6-3 shows the chart in its final form.
Figure 6-3 Redrawn Exhibit for Password Data Series (2)
As an alternative form, some business magazines substitute thin tick marks in place of the x-axis line. That looks good too, and proves that judicious use of tick marks can pay off.
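The final form of the exhibit can also be approximated without a spreadsheet. The following is a minimal sketch that writes a plain SVG image using only the Python standard library; it is not how the figures in this chapter were produced. The function name, the pixel dimensions, and the expanded department names ("Accounting" and "Operations" for the abbreviations in Table 6-1) are assumptions made for illustration.

```python
from xml.sax.saxutils import escape

# Data from Table 6-1. Assumption: "Account." and "Ops." stand for
# "Accounting" and "Operations"; the table itself abbreviates them.
WEAK_PASSWORDS = {"IT": 230, "Accounting": 22, "Operations": 129, "Sales": 40}

NAVY = "#000080"       # muted bar color
AXIS_GRAY = "#808080"  # 50% gray for the single axis line
FONT = 'font-family="Arial, Helvetica, sans-serif" font-size="9"'


def horizontal_bar_chart(data, title, subtitle, axis_label):
    """Return an SVG string: sorted flat bars with no ticks, grid, or frames."""
    # Sort in descending order of value to strengthen the message.
    rows = sorted(data.items(), key=lambda kv: kv[1], reverse=True)
    left, top, bar_h, gap, plot_w = 90, 40, 14, 8, 200
    scale = plot_w / max(value for _, value in rows)
    bottom = top + len(rows) * (bar_h + gap) - gap
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="340" height="{bottom + 30}">',
        f'<text x="0" y="12" {FONT} font-weight="bold">{escape(title)}</text>',
        f'<text x="0" y="26" {FONT}>{escape(subtitle)}</text>',
    ]
    for i, (name, value) in enumerate(rows):
        y = top + i * (bar_h + gap)
        text_y = y + bar_h - 3
        # Category label, written horizontally to the left of the bar.
        parts.append(
            f'<text x="{left - 6}" y="{text_y}" {FONT} text-anchor="end">'
            f"{escape(name)}</text>"
        )
        # Flat bar with no border.
        parts.append(
            f'<rect x="{left}" y="{y}" width="{value * scale:.1f}" '
            f'height="{bar_h}" fill="{NAVY}"/>'
        )
        # Data label at the end of the bar; no legend or grid is needed.
        parts.append(
            f'<text x="{left + value * scale + 4:.1f}" y="{text_y}" {FONT}>'
            f"{value}</text>"
        )
    # One thin, muted axis line at the base of the bars.
    parts.append(
        f'<line x1="{left}" y1="{top}" x2="{left}" y2="{bottom}" '
        f'stroke="{AXIS_GRAY}" stroke-width="1"/>'
    )
    parts.append(
        f'<text x="{left}" y="{bottom + 16}" {FONT}>{escape(axis_label)}</text>'
    )
    parts.append("</svg>")
    return "\n".join(parts)


svg = horizontal_bar_chart(
    WEAK_PASSWORDS,
    "Results of Password Audit by Department",
    "March 2005",
    "Number of Weak Passwords",
)
print(svg)
```

Saving the returned string to a file with an .svg extension should produce an image that most browsers can display. Every label is horizontal, all text shares one typeface and size, and the only line drawn besides the bars is the muted axis.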