It’s Already Designed
Throughout this chapter—and indeed in much of this book—we have assumed you would be largely working on new projects, rather than assessing or continuing prior work efforts. Well, that can be a rather unrealistic assumption these days. So let’s take some time to dive a bit deeper into what sorts of design security activities can be reasonably accomplished even after a system is in production.
One immediate disadvantage we see here is that ex post facto design changes tend to be costly,5 so you will doubtless encounter more than a little initial resistance in trying to make any substantive changes to the design. This is particularly true for projects that are widely deployed and not (just) run from centralized data centers. This means that any design changes that might come out of an ex post facto review need to be rigorously researched and cost-justified in order to be successful. We believe that this inertia is more likely to result in workarounds and compromises than in real design changes, and those workarounds will no doubt present their own challenges over time. Moreover, some fundamental security issues in legacy applications simply cannot be fixed without a complete rewrite of the application. This might happen due to inherently insecure architectural choices or simply because of underlying platform or language shortcomings (more on that in Chapter 8, “Maintaining Software Securely”). But we should still press on.
Another common hurdle to clear is “simply” finding the design itself. Many software products and systems are built and deployed without any sign of design documentation. Often, the design “documentation” lives in the brains of the people who designed the system in the first place, and quite possibly those people have moved on to other jobs, perhaps in different companies. We feel that documenting a system’s design, even if it is done only ex post facto, is in and of itself a huge benefit that will prove its value over time even if no changes are made by way of a security assessment. Indeed, two of your authors have amassed considerable experience doing just this at a major corporation. A further pitfall of a system that lacks a clearly documented design is that seemingly innocuous changes can often lead to spectacular failures. It is vital for the entire team to have a clear, comprehensive, and deep understanding of the underlying system.
So enough with the impediments; let’s look at how to do design review of a deployed system. Here are some general steps to consider, along with some tips and recommendations in accomplishing them.
Document the design.
Start by looking at whatever existing design documentation is available, of course. Depending on your organization’s software development process, this could range from nonexistent to voluminous. The important thing is to “get your head around” the design and thoroughly understand what the software is doing. In our experience, this is best accomplished through a combination of documentation, visualization, and human interviews. So start by sitting down, turning off all interruptions, and studying whatever documentation you have. Care should be taken to validate that the derived design accurately depicts the running system.
Then, if at all possible, seek out the design team and spend some quality time with them and a large whiteboard. Draw the top-level design in the form of the major software elements and dependencies, their functions, communications channels, data, and so on. Next, overlay a state diagram onto that design, and do your best to understand the software’s different states of operation, including initialization, steady-state operation, exception modes, shutdown, backup, availability features, and so on. Take this discussion as deep as you’re able to. Look, for example, at each component interconnection and ask how it is identified, authenticated, protected, and so on, as well as what network protocols and data types are being passed. Be sure to document any assumptions at this point as well.
Perhaps most important during this whiteboard exercise, ask questions. Assume nothing. Look for unanticipated failure states. And document all the answers you get. You might well find that your own documentation has details in it that surpass the existing design documentation, such as it might be.
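One way to keep the whiteboard answers from evaporating is to record each component interconnection in a small structured form and let the gaps show themselves. The following is only a minimal sketch: the `Interconnection` record, its field names, and the example components are our own illustrative assumptions, not part of any particular methodology or tool.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Interconnection:
    """One link between two components, as captured at the whiteboard."""
    source: str
    destination: str
    # None means "nobody has answered this yet", which is itself a finding.
    identification: Optional[str] = None
    authentication: Optional[str] = None
    protection: Optional[str] = None
    protocol: Optional[str] = None
    data_types: Optional[str] = None
    assumptions: Optional[str] = None


def open_questions(link: Interconnection) -> List[str]:
    """Return the names of the fields that still have no documented answer."""
    return [f.name for f in fields(link) if getattr(link, f.name) is None]


# Hypothetical example: a web front end talking to an order database.
link = Interconnection(
    source="web-frontend",
    destination="orders-db",
    protocol="TCP/5432",
    data_types="customer orders",
)
print(open_questions(link))
# prints ['identification', 'authentication', 'protection', 'assumptions']
```

Every field left unanswered becomes a question to take back to the design team, which is the "assume nothing" rule applied mechanically.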
If you’re lucky, much of this will already be in place. But even if that is the case, your priority right now has to be attaining a high degree of understanding of what the documentation says. So even in that fortunate case of having ample documentation, it is still worthwhile to interview the design team and have them explain things in their own words. This will help solidify your understanding of the system, and it might also expose ambiguities and downright errors in the documentation itself.
Perform threat modeling.
After you are confident that you have a deep and thorough understanding of the documentation, go through the threat modeling process as we’ve described. You might well have collected ample fodder for this in documenting the design (or studying the extant documentation).
Assess the risks and costs.
Either directly during the threat modeling process or separately, it’s vital to assess the risks and prioritize them.
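As a rough illustration of prioritizing, the sketch below scores each threat as likelihood times impact and sorts the results. The 1-to-5 scales, the product formula, and the sample findings are assumptions made for the example; your organization may well use a different scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (expected)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def risk(self) -> int:
        # Simple product scoring; an assumption, not the only option.
        return self.likelihood * self.impact


def prioritize(threats):
    """Return threats sorted from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)


# Hypothetical findings from a threat modeling session.
findings = [
    Threat("verbose error pages", likelihood=4, impact=1),
    Threat("unauthenticated admin channel", likelihood=3, impact=5),
    Threat("weak session timeout", likelihood=2, impact=3),
]
for threat in prioritize(findings):
    print(f"{threat.risk:>2}  {threat.name}")
```

The numbers matter less than the ordering they produce; the point is to make the ranking explicit so that it can be argued about.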
Decide on a remediation strategy.
The toughest part at this point could well be deciding what the right threshold of risk is for the system you’re reviewing. The biggest difference in doing this now versus during the initial product development is that remediation costs are likely to be substantially higher. And not just the direct costs, but the indirect ones. For example, how will deploying a new design inconvenience your customers? How will it affect backward compatibility? The answers to these questions are extremely important.
The remediation strategy, then, must take all these answers into account, in addition to the normal business impact justifications. For example, a fairly low-impact design defect might well get remediated because its costs are relatively low, whereas a higher impact problem might be delayed until a major release cycle because the costs do not justify the value.
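The trade-off in that example can be sketched as a simple comparison between the value of fixing a defect and the total cost of doing so, indirect costs included. This is a deliberately naive model: the `DesignDefect` record, the single-number estimates, and the "fix when impact covers cost" rule are all assumptions for illustration, and real decisions will weigh factors that do not reduce to one number.

```python
from dataclasses import dataclass


@dataclass
class DesignDefect:
    name: str
    impact: float         # estimated loss if left unfixed (hypothetical units)
    direct_cost: float    # engineering effort to change the design
    indirect_cost: float  # customer inconvenience, compatibility breakage

    @property
    def total_cost(self) -> float:
        return self.direct_cost + self.indirect_cost


def decide(defect: DesignDefect) -> str:
    """Fix now when the estimated impact covers the total cost; otherwise defer."""
    if defect.impact >= defect.total_cost:
        return "fix now"
    return "defer to major release"


# Hypothetical defects mirroring the two cases described above.
cheap_low_impact = DesignDefect("log file permissions", impact=10,
                                direct_cost=3, indirect_cost=1)
costly_high_impact = DesignDefect("replace wire protocol", impact=50,
                                  direct_cost=40, indirect_cost=30)

print(cheap_low_impact.name, "->", decide(cheap_low_impact))
print(costly_high_impact.name, "->", decide(costly_high_impact))
```

Note that the higher-impact defect is the one that gets deferred here, purely because its indirect costs push the total past the value of the fix.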
Fix the (justified) problems.
Any issues found that meet your remediation strategy should now be fixed. And, as you might well imagine, it’s never just as simple as coding a fix and then checking off the issue as being finished. Because any issue that has made it this far is, by definition, an important one, it deserves a deep dive.
Verify the fixes.
It’s of course not enough to say something is fixed; you have to prove it as well. Especially for security weaknesses, whenever possible you should build test cases that explicitly test for each weakness. Try also to consider similar classes of weaknesses that might exist elsewhere in the system.
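To make this concrete, here is a minimal sketch of weakness-specific regression tests using Python’s standard `unittest` module. The weakness (a path traversal), the `safe_join` function standing in for the fixed code, and the directory names are all hypothetical; note also that the check is purely lexical and does not account for symbolic links.

```python
import posixpath
import unittest


def safe_join(base: str, user_path: str) -> str:
    """Hypothetical fixed function: join paths, refusing to escape `base`."""
    candidate = posixpath.normpath(posixpath.join(base, user_path))
    if candidate != base and not candidate.startswith(base.rstrip("/") + "/"):
        raise ValueError("path escapes base directory")
    return candidate


class PathTraversalRegression(unittest.TestCase):
    BASE = "/srv/app/uploads"

    def test_original_weakness(self):
        # The exact input from the (hypothetical) original report.
        with self.assertRaises(ValueError):
            safe_join(self.BASE, "../../etc/passwd")

    def test_similar_class_absolute_path(self):
        # Same class of weakness, different route to it.
        with self.assertRaises(ValueError):
            safe_join(self.BASE, "/etc/passwd")

    def test_similar_class_sibling_prefix(self):
        # A sibling directory that merely shares the base's name prefix.
        with self.assertRaises(ValueError):
            safe_join(self.BASE, "../uploads-private/key")

    def test_legitimate_use_still_works(self):
        self.assertEqual(safe_join(self.BASE, "a/b.txt"),
                         "/srv/app/uploads/a/b.txt")


# Run the suite without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PathTraversalRegression)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

One test pins down the reported weakness, two probe neighboring variants of the same class, and one confirms that the fix did not break legitimate use.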
Lather, rinse, repeat.
Essentially, keep iterating through this until you’re finished. Mind you, defining that end point isn’t easy. What’s important is to give each application a level of scrutiny that its value warrants, and to keep at it until that level has been met.
With luck, you won’t have unearthed any truly catastrophic design flaws during this process on an application that is already deployed. And if you did, hopefully you found them before your adversaries did. As we said, major design flaws can be hugely costly to fix after an application is deployed.
We should add that it has been our experience that applications already deployed rarely get ex post facto design review scrutiny. That sort of review would usually happen only for the most security-critical software and, even then, usually only if other security flaws have been discovered. Another factor that makes this difficult is that staff are often reassigned to other projects once a given project has drawn to completion.