Bivariate (X-Y) Charts
As noted earlier, time series charts are the most common information security visualization technique. The unadorned time series chart is like an old reliable friend who speaks plainly and always shows up on time. Everyone knows what to expect, and he rarely disappoints. But he is not too bright, and his insights are rarely very penetrating. His slightly more flashily clad brothers—the indexed and quartile-based time series charts—offer a bit more excitement but not necessarily any extra insight.
In contrast, bivariate charts—which show how two variables interrelate—resemble cranky uncles more than old friends. They offer lots more insight and wisdom but require readers to take more time to understand their unique qualities (or eccentricities, if you prefer). You can't just inflict an uncle upon the uninitiated.
Bivariate charts gain their power from the often-unexpected linkages one finds by plotting two variables from the same data set on the same page. Each variable corresponds to an attribute in the data records being analyzed. When the two variables are charted together, possible cause-and-effect relationships often emerge. Note that bivariate charts are time-independent; that is, they do not show temporal relations in data the way time series charts do. In most cases, a bivariate chart displays data for a single time interval, so make sure to note the relevant interval in the chart's title.
Let us try a security example. Recall the application security defects data set discussed in the preceding chapter (in Table 5-2). The data set contained instances of security defects in a developed application. Each record in the data set contained these attributes:
- Application
- Owner
- Defect
- Date
- Exploitability
- Business impact
- Business-adjusted risk (BAR)
- Engineering fix hours
An analyst could use a bivariate chart to show how two of these attributes relate. A hypothetical chart might show one of the following:
- Exploitability versus business impact
- Business-adjusted risk versus engineering fix hours
- Business impact versus engineering fix hours
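As a minimal sketch of how an analyst might pull such pairs out of the records, the following Python assumes a hypothetical DefectRecord type whose fields mirror the attribute list above. The field names, the numeric scales, and the three sample records are illustrative assumptions; they are not the contents of Table 5-2.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DefectRecord:
    """Hypothetical record; fields mirror the attribute list in the text."""
    application: str
    owner: str
    defect: str
    found: date              # the "Date" attribute
    exploitability: float
    business_impact: float
    bar: float               # business-adjusted risk
    fix_hours: float         # engineering fix hours


def xy_pairs(records, x_attr, y_attr, start, end):
    """Return (x, y) pairs for records dated within [start, end]."""
    return [
        (getattr(r, x_attr), getattr(r, y_attr))
        for r in records
        if start <= r.found <= end
    ]


# Illustrative values only (assumed, not taken from Table 5-2).
records = [
    DefectRecord("Billing", "alice", "SQL injection", date(2006, 1, 10), 4, 5, 20, 16),
    DefectRecord("Billing", "alice", "Verbose errors", date(2006, 2, 3), 2, 1, 2, 3),
    DefectRecord("Portal", "bob", "Weak session IDs", date(2006, 7, 21), 3, 4, 12, 10),
]

# Business-adjusted risk versus engineering fix hours, first quarter only.
pairs = xy_pairs(records, "bar", "fix_hours", date(2006, 1, 1), date(2006, 3, 31))
print(pairs)  # [(20, 16), (2, 3)]
```

Filtering by date before pairing keeps the chart tied to one interval, which is the interval that belongs in the chart's title.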
Figure 6-11 shows all three of these charts using the sample data set from Table 5-2. To show the relationship between the x- and y-axis variables explicitly, I have added a logarithmic regression line for the latter two charts.
Figure 6-11 Sample Bivariate Charts
Casual examination of the two rightmost charts suggests that a weak relationship exists between remediation effort and either business impact or business-adjusted risk. That fits; one can reasonably expect that more serious security flaws will take longer to fix. In contrast, the left chart suggests that no obvious relationship exists between business impact and exploitability. This, too, seems to align with expectations—exploitable security holes do not possess any intrinsic qualities that would cause them to also be high-impact.
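For readers who want to reproduce a logarithmic regression line, here is a minimal sketch. It assumes the common form y = a + b ln(x) fitted by ordinary least squares; the function name fit_log_trend and the sample values are hypothetical and are not the data behind Figure 6-11.

```python
import math


def fit_log_trend(xs, ys):
    """Least-squares fit of y = a + b * ln(x); returns (a, b, r_squared)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    lx = [math.log(x) for x in xs]  # requires every x > 0
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (y - mean_y) for u, y in zip(lx, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b = sxy / sxx                   # slope against ln(x)
    a = mean_y - b * mean_lx        # intercept
    # R^2 for a one-variable fit is the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return a, b, r_squared


# Hypothetical (business-adjusted risk, fix hours) pairs.
bar = [2, 4, 6, 9, 12, 16, 20, 25]
hours = [3, 10, 4, 14, 6, 18, 9, 15]

a, b, r2 = fit_log_trend(bar, hours)
print(f"hours ~ {a:.2f} + {b:.2f} * ln(BAR), R^2 = {r2:.2f}")
```

An R-squared value well below 1 is one way to put a number on a "weak relationship": the trend line slopes upward, but most of the scatter remains unexplained.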
Exploring relationships between variables in a graphical way can help confirm or refute an existing hypothesis. For example, an analyst reviewing the exhibits in Figure 6-11 would not be able to make strong, definitive statements about cause-and-effect relationships among business impact, exploitability, and remediation effort; the charts offer only weak support for the hypothesis that more serious flaws take longer to fix.
Some bivariate charts show much stronger relationships. Figure 6-12 shows a fictitious bivariate chart that displays the relationship between end-user training and password strength, as measured by a password cracker such as John the Ripper. In this case, the relationship between the cause (the time elapsed since the last user training session) and the effect (the relative security of passwords) is much clearer. I have added a logarithmic trend line to highlight the relationship; a linear trend line also works well.
Figure 6-12 Password Effectiveness Bivariate Chart
Two-Period Bivariate Charts
Although bivariate charts cannot display temporal relationships as well as time series charts can, they can show comparative data in a limited way. A variation on the standard bivariate chart, the "two-period" bivariate chart, plots each period's data series and connects interperiod points with thin lines. Different markers distinguish the "before" and "after" points. The overall effect resembles a football or basketball chalkboard diagram. Figure 6-13 shows an excellent two-period chart from The Economist of a Boston Consulting Group study of investment banking revenues and corresponding value at risk (VAR).5
Figure 6-13 Sample Two-Period Bivariate Chart (Redrawn)
Copyright © The Economist Newspaper Ltd. All rights reserved. Reprinted with permission. Further reproduction prohibited.
Two-period bivariate charts are a relatively specialized breed; they do not work well with sets larger than about a dozen pairs of data points. In addition, mainstream desktop packages like Excel cannot create them, so they must be hand-drawn.
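Scripting is one alternative to drawing by hand. As a rough sketch, the following Python builds a two-period chart as an SVG string using only the standard library. The function name two_period_svg, the marker choices (hollow circle for "before," filled square for "after"), and the sample values are all assumptions made for illustration; axes and gridlines are omitted for brevity.

```python
from xml.sax.saxutils import escape


def two_period_svg(points, size=300, pad=30):
    """Render a two-period bivariate chart as an SVG string.

    points maps a label to ((x_before, y_before), (x_after, y_after)).
    """
    xs = [p[0] for pair in points.values() for p in pair]
    ys = [p[1] for pair in points.values() for p in pair]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def scale(v, lo, hi):
        # Map a data value onto the drawable span; avoid dividing by zero.
        span = (hi - lo) or 1
        return pad + (v - lo) / span * (size - 2 * pad)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size}" height="{size}">'
    ]
    for label, ((x0, y0), (x1, y1)) in points.items():
        # SVG's y-axis grows downward, so flip the vertical coordinate.
        px0, py0 = scale(x0, x_min, x_max), size - scale(y0, y_min, y_max)
        px1, py1 = scale(x1, x_min, x_max), size - scale(y1, y_min, y_max)
        # Thin connector between the two periods.
        parts.append(
            f'<line x1="{px0:.1f}" y1="{py0:.1f}" x2="{px1:.1f}" '
            f'y2="{py1:.1f}" stroke="gray" stroke-width="0.5"/>'
        )
        # Hollow circle marks "before"; filled square marks "after".
        parts.append(
            f'<circle cx="{px0:.1f}" cy="{py0:.1f}" r="4" '
            f'fill="white" stroke="black"/>'
        )
        parts.append(
            f'<rect x="{px1 - 4:.1f}" y="{py1 - 4:.1f}" '
            f'width="8" height="8" fill="black"/>'
        )
        parts.append(
            f'<text x="{px1 + 6:.1f}" y="{py1:.1f}" '
            f'font-size="10">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical data: (average business impact, average fix hours) per unit,
# for a "before" period and an "after" period.
sample = {
    "Unit A": ((2.0, 6.0), (3.5, 9.0)),
    "Unit B": ((4.0, 14.0), (3.0, 8.0)),
    "Unit C": ((1.5, 4.0), (2.5, 5.0)),
}
svg = two_period_svg(sample)
```

Saving the returned string to a file with an .svg extension and opening it in a browser should display the chart. With more than about a dozen pairs, the connectors will likely start to overlap, which matches the size limit noted above.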