Treemaps
The visualization methods I have described in this chapter show how to effectively display data over time, in cross sections, and using systems of one or more variables. The vast majority of the security metrics discussed in the preceding chapters lend themselves well to these methods. However, all these methods assume a data set whose records are structured in a relatively simple manner. For example, department X has value Y, or activity A has value B at time T. We assume that the independent variables—departments or activities—iterate through flat lists of values.
But what if the data set is not flat? Metrics visualization for security often requires the ability to roll up, or drill down into, a data set. In these cases, containment and hierarchy relationships establish vital context for the viewer. Perhaps the best way to view the data is to show the hierarchy as part of the exhibit. For example, one could show the roll-up structure for departments, sites, and business units, or the containment relationships of business processes.
"Hmm," says the IT analyst as he strokes his goatee. "Graphical displays of hierarchy... isn't this what network diagrams do?" Yes, in part. Technology architects have been drawing network diagrams for an eternity, and these show containment relationships quite well. However, network diagrams are rarely suitable for metrics visualization because they are
- Too technical: Managers don't care about IP addresses.
- Too literal: Only a small number of security metrics make sense on network diagrams.
- Space-inefficient: Lots of white space, low density of nodes per inch.
Fortunately, recent innovations in data visualization outside the information technology field mean that security analysts need not rely on network diagrams to show containment-oriented data sets. There is a better alternative: the treemap.
Little known outside of academe, treemaps are used with hierarchical data structures that can be aggregated. The core data elements of a treemap are rectangular nodes that, when rendered, appear as a patchwork of rectangles. The arrangement of the rectangles shows the containment hierarchy, in the same way a bento box does. The size (area) of each rectangle represents the node's "weight," while its color or brightness displays attributes such as relative importance, criticality, or membership within an arbitrary category. Treemaps possess four properties that make them extremely useful for large-scale data visualization:
- Simple visual paradigm
- Extremely space-efficient
- Naturally suited for aggregation
- Excellent for high-resolution data display
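The aggregation property follows from the containment rule: a container's rectangle is exactly as large as everything inside it. The following Java sketch illustrates that rule with a hypothetical `TreemapNode` class of my own devising (not part of JTreemap or any other package) and made-up weights. Interior nodes derive their area by summing their children, so rolling a level up never changes the size of the enclosing rectangle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a treemap node whose area is its own weight if it is a
// leaf, or the sum of its children's areas if it is a container.
class TreemapNode {
    final String name;
    private final double leafWeight;               // used only when there are no children
    final List<TreemapNode> children = new ArrayList<>();

    TreemapNode(String name, double leafWeight) {
        this.name = name;
        this.leafWeight = leafWeight;
    }

    // Convenience constructor for container (interior) nodes.
    TreemapNode(String name) {
        this(name, 0.0);
    }

    // Adds a child and returns this node so calls can be chained.
    TreemapNode add(TreemapNode child) {
        children.add(child);
        return this;
    }

    // A container's area is the total of everything it contains.
    double area() {
        if (children.isEmpty()) {
            return leafWeight;
        }
        double sum = 0.0;
        for (TreemapNode child : children) {
            sum += child.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up weights for illustration only.
        TreemapNode root = new TreemapNode("All business units")
            .add(new TreemapNode("Business unit A")
                .add(new TreemapNode("Site 1", 3))
                .add(new TreemapNode("Site 2", 5)))
            .add(new TreemapNode("Business unit B", 4));
        System.out.println(root.area());                  // prints 12.0
        System.out.println(root.children.get(0).area());  // prints 8.0
    }
}
```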
Originally developed by Ben Shneiderman, a professor in the University of Maryland's Department of Computer Science, treemaps are easily the most innovative data visualization technique to emerge from the research world in the last ten years. Although they are not yet mainstream, many companies have created compelling, rich information graphics with them. For example, SmartMoney.com's Java-based Map of the Market applet, shown in Figure 6-25, features a treemap that shows near-real-time stock activity. The size of each block represents the relative market capitalization of the sector or company; the color shows whether prices are increasing (green, rendered here as light grays) or decreasing (red, rendered as darker grays). What is particularly clever about this example is that it precisely illustrates the micro/macro visualization qualities of treemaps. The reader sees the overall sweep and scope of the market and how the different blocks relate to each other, and he or she can also dive into any of the individual data points.
Figure 6-25 SmartMoney.com Map of the Market
Copyright © 2005 Smartmoney.com. Reprinted with permission; all rights reserved.
Creating Treemaps
Standard office productivity suites cannot create treemaps; instead, analysts must rely on specialized toolkits. Many treemap packages exist, including an open-source implementation I wrote for my own use called JTreemap. Let's walk through a simple treemap example using this tool, available on my website at http://www.freshcookies.org/jtreemap.
To construct a treemap, the security analyst identifies data attributes that supply:
- The size of each rectangle (size of deployed base, dollar value of asset, number of lines of code)
- The saturation value for each rectangle (criticality, priority, business impact)
and optionally
- The containment hierarchy (top-level category, department, business unit)
Next, the analyst formats a data set, loads it into JTreemap, and plots the results. JTreemap accepts a tab-delimited input file; after parsing the input, it renders the treemap. Table 6-3 shows a sample input file containing action plan data for an assessment of an e-commerce application. The fields appear in the following order:
- NAME: Displayed as text within the node. Here, we'll use the name of the action item.
- DESCRIPTION: Displayed in the tool tip when the mouse pointer hovers over the node.
- BRIGHTNESS: The node's relative saturation, ranging from 0.0 (transparent) to 1.0 (fully saturated). In this example, we'll supply the item's priority, with the highest values representing the most important items.
- AREA: The amount of space to allocate to the node, relative to all others in the treemap. For the area, we will specify the amount of effort required to implement the action item.
- CATEGORIES[0..n]: The categories to use for this node (separated by tabs), with the highest-level categories first. An arbitrary number of categories may be specified, although in practice most simple applications will not need more than three or four. Each top-level category will be given its own color; in this example, there is only one top-level category. For this example, we will simply supply the name of the responsible business group ("E-commerce security").
Table 6-3. Sample Treemap Data File
| Name (Action Item) | Description | Brightness (Priority) | Area (Effort) | Categories (Business Group) |
|---|---|---|---|---|
| Password policy | For end users | 0.9 | 4 | E-commerce security |
| Secure coding practices | For developers | 1 | 8 | E-commerce security |
| Identity management | Centralized account management | 0.6 | 12 | E-commerce security |
| Website server configuration | To be done by the systems group | 0.7 | 5 | E-commerce security |
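The field order above maps naturally onto a small record type. The following Java sketch parses one line of such a file. `TreemapRecord` and `parseLine` are hypothetical names; the sketch assumes the file has no header row and that categories simply fill the remaining columns. It is an illustration, not JTreemap's actual parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT JTreemap's parser: reads one tab-separated line in the
// field order NAME, DESCRIPTION, BRIGHTNESS, AREA, CATEGORIES[0..n].
// Assumes no header row; the validation rules are illustrative choices.
class TreemapRecord {
    final String name;
    final String description;
    final double brightness;          // 0.0 (transparent) to 1.0 (fully saturated)
    final double area;                // relative size of the rectangle
    final List<String> categories;    // highest-level category first; may be empty

    TreemapRecord(String name, String description, double brightness,
                  double area, List<String> categories) {
        this.name = name;
        this.description = description;
        this.brightness = brightness;
        this.area = area;
        this.categories = categories;
    }

    static TreemapRecord parseLine(String line) {
        String[] f = line.split("\t", -1);        // -1 keeps trailing empty fields
        if (f.length < 4) {
            throw new IllegalArgumentException("Expected at least 4 fields: " + line);
        }
        double brightness = Double.parseDouble(f[2].trim());
        double area = Double.parseDouble(f[3].trim());
        if (brightness < 0.0 || brightness > 1.0) {
            throw new IllegalArgumentException("Brightness must be in [0, 1]: " + brightness);
        }
        if (area < 0.0) {
            throw new IllegalArgumentException("Area must be non-negative: " + area);
        }
        List<String> categories = new ArrayList<>();
        for (int i = 4; i < f.length; i++) {
            if (!f[i].isEmpty()) {
                categories.add(f[i]);             // skip empty trailing columns
            }
        }
        return new TreemapRecord(f[0], f[1], brightness, area, categories);
    }

    public static void main(String[] args) {
        String row = "Identity management\tCentralized account management\t0.6\t12\tE-commerce security";
        TreemapRecord r = parseLine(row);
        // prints: Identity management area=12.0 categories=[E-commerce security]
        System.out.println(r.name + " area=" + r.area + " categories=" + r.categories);
    }
}
```

Rejecting out-of-range brightness values at load time is one design choice; a more forgiving tool might clamp them to the 0.0 to 1.0 range instead.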
To ensure that nodes are arranged sensibly and in a manner pleasing to the eye, treemaps typically support one or more layout algorithms. One algorithm developed at the University of Maryland is the "strip" layout. At present, however, the prevailing consensus holds that J.J. van Wijk's "squarified" layout algorithm[13] provides the best balance of structural fidelity and aesthetics. This is the one I use in my own package.
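For readers curious about what the squarified algorithm does, here is a simplified sketch of one level of it. Items are taken largest first and added to the current row only while doing so does not worsen the row's worst aspect ratio; otherwise the row is frozen along the shorter side of the remaining space and a new row begins. This is my own illustrative Java rendering under those assumptions (the `Squarify` class and its methods are hypothetical), not the code in JTreemap. A complete treemap applies the same procedure recursively inside each container.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the squarified layout for ONE level of a treemap.
// Not JTreemap's code; names are hypothetical. Weights must be strictly positive.
class Squarify {
    static final class Rect {
        final double x, y, w, h;
        Rect(double x, double y, double w, double h) {
            this.x = x; this.y = y; this.w = w; this.h = h;
        }
        @Override public String toString() {
            return String.format("[x=%.2f y=%.2f w=%.2f h=%.2f]", x, y, w, h);
        }
    }

    // Worst (largest) aspect ratio of a row of areas laid along a side of this length.
    static double worst(List<Double> row, double side) {
        double sum = 0, max = 0, min = Double.POSITIVE_INFINITY;
        for (double a : row) {
            sum += a;
            max = Math.max(max, a);
            min = Math.min(min, a);
        }
        double s2 = sum * sum, side2 = side * side;
        return Math.max(side2 * max / s2, s2 / (side2 * min));
    }

    // Places a finished row along the shorter side of the free rectangle
    // {x, y, w, h} and shrinks the free rectangle to what remains.
    static void placeRow(List<Double> row, double[] free, List<Rect> out) {
        double sum = 0;
        for (double a : row) sum += a;
        if (free[2] >= free[3]) {                 // wide: fill a column at the left
            double colW = sum / free[3];
            double cy = free[1];
            for (double a : row) {
                out.add(new Rect(free[0], cy, colW, a / colW));
                cy += a / colW;
            }
            free[0] += colW;
            free[2] -= colW;
        } else {                                  // tall: fill a strip along the top
            double rowH = sum / free[2];
            double cx = free[0];
            for (double a : row) {
                out.add(new Rect(cx, free[1], a / rowH, rowH));
                cx += a / rowH;
            }
            free[1] += rowH;
            free[3] -= rowH;
        }
    }

    // Lays out the weights inside 'bounds'; rectangles come back largest first.
    static List<Rect> layout(double[] weights, Rect bounds) {
        double[] sorted = weights.clone();
        Arrays.sort(sorted);                      // ascending; iterated in reverse below
        double total = 0;
        for (double wt : sorted) total += wt;
        double scale = bounds.w * bounds.h / total;   // weight units to screen area
        double[] free = {bounds.x, bounds.y, bounds.w, bounds.h};
        List<Rect> out = new ArrayList<>();
        List<Double> row = new ArrayList<>();
        for (int i = sorted.length - 1; i >= 0; i--) {
            double a = sorted[i] * scale;
            double side = Math.min(free[2], free[3]);
            List<Double> candidate = new ArrayList<>(row);
            candidate.add(a);
            if (row.isEmpty() || worst(candidate, side) <= worst(row, side)) {
                row = candidate;                  // row stays at least as square
            } else {
                placeRow(row, free, out);         // freeze the row, start a new one
                row = new ArrayList<>();
                row.add(a);
            }
        }
        if (!row.isEmpty()) placeRow(row, free, out);
        return out;
    }

    public static void main(String[] args) {
        double[] weights = {6, 6, 4, 3, 2, 2, 1};  // illustrative weights
        for (Rect r : layout(weights, new Rect(0, 0, 6, 4))) {
            System.out.println(r);
        }
    }
}
```

In this example the total weight (24) equals the 6 x 4 canvas area, so each rectangle's area equals its weight, which makes the output easy to check by hand.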
A single command from the console produces an interactive dialog box containing the treemap:
java -jar freshcookies-treemap-0.3.jar test.tab
Figure 6-26 shows the resulting JTreemap dialog for our sample data set.
Figure 6-26 Sample Treemap
The preceding example, while simplistic, shows how treemaps work. The "Identity management" rectangle dominates because it requires the most effort to implement; the saturated color of "Secure coding practices" (rendered as a lighter gray) shows that it has the highest priority. Since the business group "E-commerce security" is responsible for every action item, all items receive the same color (red, rendered as gray here).
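The arithmetic behind Figure 6-26 is simple. The total effort in Table 6-3 is 4 + 8 + 12 + 5 = 29, so "Identity management" receives 12/29, or about 41 percent, of the area. The Java sketch below computes each item's share and a shade for it. The shading rule (blending a hypothetical base red toward white in proportion to brightness, so that 0.0 looks transparent against a white background) and the `ActionPlanShading` class are assumptions for illustration, not JTreemap's documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: area share = effort / total effort; brightness blends an
// assumed base color toward white (1.0 = full color, 0.0 = plain white).
class ActionPlanShading {
    // Blend one RGB channel of the base color toward white (255).
    static int shade(int baseChannel, double brightness) {
        return (int) Math.round(255 - brightness * (255 - baseChannel));
    }

    static String hexColor(int r, int g, int b, double brightness) {
        return String.format("#%02X%02X%02X",
            shade(r, brightness), shade(g, brightness), shade(b, brightness));
    }

    public static void main(String[] args) {
        // {effort (area), priority (brightness)} from Table 6-3.
        Map<String, double[]> items = new LinkedHashMap<>();
        items.put("Password policy", new double[] {4, 0.9});
        items.put("Secure coding practices", new double[] {8, 1.0});
        items.put("Identity management", new double[] {12, 0.6});
        items.put("Website server configuration", new double[] {5, 0.7});

        double totalEffort = 0;
        for (double[] v : items.values()) totalEffort += v[0];   // 29

        for (Map.Entry<String, double[]> e : items.entrySet()) {
            double share = e.getValue()[0] / totalEffort;
            String color = hexColor(255, 0, 0, e.getValue()[1]);  // assumed base red
            System.out.printf("%-30s %5.1f%% of area  %s%n", e.getKey(), share * 100, color);
        }
    }
}
```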
Treemaps can support much higher data densities than our simple example shows. Figure 6-27 shows action items mapped to the ISO 17799 security framework. In contrast to the previous example, which contained only one level of containment (the group name, E-commerce security), this example contains three, corresponding to the first three levels of the ISO topic hierarchy. Each rectangle is equally weighted (all have areas of 1), but each has its own saturation value. In all, Figure 6-27 displays 556 data attributes (139 topics times 4 attributes: area, saturation, top-level grouping, and name).
Figure 6-27 ISO 17799 Treemap (Displaying Three Levels of Hierarchy)
Figure 6-28 shows the same data again, but aggregated so that the lowest level "rolls up" to the top two.
Figure 6-28 ISO 17799 Treemap (Displaying Two Levels of Hierarchy)
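The roll-up from three levels to two can be expressed as grouping leaves by a prefix of their category path. In the Java sketch below, areas are summed and saturation is combined as an area-weighted average. The `RollUp` class, the sample paths, and the weighted-average rule are all assumptions for illustration; another reasonable choice is taking the maximum, so that one critical item keeps its parent saturated.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a roll-up: group leaves by the first `depth` levels of
// their category path, sum areas, and combine brightness as an area-weighted
// average (an assumed rule, chosen for illustration).
class RollUp {
    static final class Leaf {
        final List<String> path;          // highest-level category first
        final double area, brightness;
        Leaf(List<String> path, double area, double brightness) {
            this.path = path;
            this.area = area;
            this.brightness = brightness;
        }
    }

    // Returns joined path prefix -> {total area, weighted brightness}.
    static Map<String, double[]> rollUp(List<Leaf> leaves, int depth) {
        Map<String, double[]> acc = new LinkedHashMap<>();  // key -> {area, area * brightness}
        for (Leaf leaf : leaves) {
            List<String> prefix = leaf.path.subList(0, Math.min(depth, leaf.path.size()));
            String key = String.join(" / ", prefix);
            double[] sums = acc.computeIfAbsent(key, k -> new double[2]);
            sums[0] += leaf.area;
            sums[1] += leaf.area * leaf.brightness;
        }
        Map<String, double[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            double area = e.getValue()[0];
            double brightness = area > 0 ? e.getValue()[1] / area : 0.0;
            result.put(e.getKey(), new double[] {area, brightness});
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical three-level paths; each leaf weighted 1, as in Figure 6-27.
        List<Leaf> leaves = List.of(
            new Leaf(List.of("Section A", "A.1", "A.1.1"), 1, 0.2),
            new Leaf(List.of("Section A", "A.1", "A.1.2"), 1, 0.6),
            new Leaf(List.of("Section A", "A.2", "A.2.1"), 1, 1.0));
        for (Map.Entry<String, double[]> e : rollUp(leaves, 2).entrySet()) {
            System.out.printf("%s  area=%.0f  brightness=%.2f%n",
                e.getKey(), e.getValue()[0], e.getValue()[1]);
        }
    }
}
```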
Other treemap implementations often differ in style from the one I have presented here, which is my own. Some include text in individual nodes; others do not. Some feature clever shading or border-rendering algorithms, drilldown capabilities, and more. The University of Maryland's treemap website (http://www.cs.umd.edu/hcil/treemap-history) contains links to other implementations, including a wide variety of commercial packages.
In summary, treemaps add another tool to the security analyst's bag of tricks. Treemaps provide an effective way of visualizing highly dense, hierarchically structured data. Although treemaps are not yet implemented in commercial office productivity packages, implementations exist that can help you today. Get to know them, and watch your colleagues' jaws drop.