- Presentation Is Everything
- Management's Perception of Risk May Differ From Yours
- Visuals that Communicate Disaster Recovery Concepts
- Assembling Your Tools
- Summary
Management's Perception of Risk May Differ From Yours
Suppose 10 people in a room have identical incomes, bills, homes, cars, numbers of dependents, etc. Now ask those 10 people how much life insurance they should carry. You're likely to get 10 different answers. Some people are risk-averse and want the maximum amount of protection they can afford. Others want the least they can get by with.
The same behaviors apply to decision-making executives. Since you probably won't know in advance how risk-averse your audience is, concentrate on presenting accurate probability data that spells out precisely how exposed your organization is to a catastrophic event.
How can you know the probability that a given piece of equipment will fail? This is why you have to be a technologist. You have a pretty good idea of the risks, but you need to be able to communicate those risks to management in terms that they can understand. After all, management needs to know the probability of a failure in order to commit to funding your effort. They also need to understand where you got your figures in order to back them up with cash.
This is where failure mode and effects analysis (FMEA) comes in, giving you a way to quantify the probability of a catastrophic failure in a piece of equipment or a system. I became familiar with this method in part from a conversation with a former U.S. Air Force general who was then the CIO of a $75 billion financial services organization. Essentially, everything looking down at Iraq today from orbit was put up there by this guy's subordinates when he was still on active duty. He summed up the issue this way: "When we send up a military satellite, everything has to be perfect the first time. This is because no one has yet invented a 23,000-mile-long screwdriver to fix it."
I got the feeling from his tone that this was the voice of experience. Being a veteran myself, however (a sergeant, not a general), I could only imagine how many butts would be on the line if someone botched a $100 million satellite launch. Therefore, in order to enhance the odds of getting it right the first time, the military uses FMEA to compute the probability that something really pear-shaped might happen. It goes something like this:
- Identify every mission-critical component that could fail.
- Compute or acquire from the manufacturer a mean time between failures (MTBF) figure for each identified mission-critical component.
- Combine the failure probabilities into a single mathematical factor that describes the probability of failure of a given system.
- Use these figures to justify and prioritize expenditures to "harden" equipment rooms, networks, or other facilities.
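One way to picture the "combine" step is the minimal sketch below. It rests on assumptions the method described here does not spell out: each component fails at a constant rate (so its MTBF maps to an exponential failure probability), failures are independent, and the system goes down if any single component does. The component names and MTBF figures are hypothetical, chosen only for illustration.

```python
import math

HOURS_PER_YEAR = 8760


def failure_probability(mtbf_hours, period_hours):
    """Chance of at least one failure during the period.

    Assumes a constant failure rate of 1 / MTBF (exponential model).
    """
    return 1.0 - math.exp(-period_hours / mtbf_hours)


def system_failure_probability(mtbf_by_component, period_hours):
    """Chance that any one component fails during the period.

    Assumes independent failures and no redundancy (a series system).
    """
    survival = 1.0
    for mtbf_hours in mtbf_by_component.values():
        survival *= 1.0 - failure_probability(mtbf_hours, period_hours)
    return 1.0 - survival


# Hypothetical MTBF figures in hours; substitute manufacturer data.
components = {
    "power supply": 100_000,
    "disk controller": 250_000,
    "core switch": 200_000,
}

p_system = system_failure_probability(components, HOURS_PER_YEAR)
print(f"Chance of at least one failure in a year: {p_system:.1%}")
```

A single figure like this is easier to put in front of management than a table of MTBF values, as long as you can also explain the assumptions behind it.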
It's possible for corporate contingency planners to adapt this military FMEA methodology to our own uses. We adopted the Internet, right? As with many other aspects of recovery planning, here's another military invention we can dust off for commercial use.
Suppose you're tasked with performing a detailed analysis of your equipment room and associated networks. The objective of this analysis is to identify single points of failure and critical components whose failure affects multiple users, and to produce an overall assessment of past performance or incidents. To organize this information, your FMEA will be broken down into three steps:
- Identify the problem. Determine what can possibly go wrong, including failure rates of the equipment itself as well as external factors (heat, water, air, people, etc.) that could affect it. Refer to the previous articles in this series for tips on how to complete this step.
- Assign the ratings behind the risk priority number (RPN) for each of the issues identified in step 1. (See part 2 of this series for examples.) Score each factor from 1 to 10, where the higher the number, the greater the associated risk (except in the case of problem resolution, where the higher the number, the better your speed in fixing the problem, as we'll show shortly).
- Quantify the reaction process. Think "How fast can we fix the problem?" You can shorten the reaction time by modifying your company's operating and security standards to moderate the risk and by changing the environment to make the selected systems more likely to survive. Here's a useful formula:
S x O x D = RPN
That is, severity (S) times occurrence (O) times detection (D) equals the risk priority number (RPN).
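To see the formula at work, here is a small sketch that multiplies the three ratings and sorts the results so the largest RPN comes first. The failure modes and their scores are hypothetical, and the sketch assumes each rating is an integer from 1 to 10; how you score each factor depends on the scale your team agrees on.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # S, rated 1 to 10
    occurrence: int  # O, rated 1 to 10
    detection: int   # D, rated 1 to 10

    def rpn(self):
        """Return S x O x D after checking each rating is in range."""
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError(f"{self.name}: ratings must be 1 to 10")
        return self.severity * self.occurrence * self.detection


# Hypothetical issues and ratings for an equipment room.
modes = [
    FailureMode("UPS battery failure", severity=9, occurrence=4, detection=3),
    FailureMode("Water leak above equipment room", severity=10, occurrence=2, detection=7),
    FailureMode("Single network uplink cut", severity=8, occurrence=5, detection=2),
]

# Largest RPN first, so the list doubles as a spending priority order.
ranked = sorted(modes, key=lambda mode: mode.rpn(), reverse=True)
for mode in ranked:
    print(f"{mode.rpn():4d}  {mode.name}")
```

Sorting by RPN gives you a ready-made priority list to bring to a funding discussion.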
Once you have the "ingredients" you need to cook (in the form of the FMEA), you can borrow the "recipe" (presentation techniques) of an expensive French chef (consultant) to whip it into something delicious that management will swallow and enjoy. A few techniques for doing this follow.