What Is the Probability That It Will Happen?
Here’s where a lot of people stumble when making their case to management. Let’s say that you jolt management with the "What can happen?" slide, and you’re really rolling. A manager says, "I agree about the fire issue, but where did you get that 5% probability figure?" You reply, "Well, it seemed like a good figure to me."
The meeting just ended.
What seemed to be a good figure to you isn’t necessarily a good figure to management. As I said earlier, people perceive risk differently. Here’s a much better answer to the manager’s question: "I checked with the National Fire Protection Association (NFPA) and with our insurance carrier. They said that a company like ours has a 5% probability in a given year of having a catastrophic fire, absent the protective systems I’m recommending." That answer will get you much further with management.
I also suggest conducting a failure mode and effects analysis (FMEA). An FMEA takes the probabilities of specific events, weights them, and then presents them in a format that management can understand. Consider the following four exhibits as an example.
Step 1: Consider What Can Happen and Approximate Its Probability
What kinds of events can you imagine fitting into each category in Figure 1?
- High probability, high damage. Probably already addressed, for example with an uninterruptible power supply (UPS).
- Low probability, low damage. Who cares if it happens?
Where would you spend your effort? Probably in the middle of the curve, on medium-probability, preventable disasters.
Step 2: Assign a Severity Number Based on the Event
When a component fails, its severity is rated on a 10-point scale according to how much of the network the failure affects.
| Description | Rating |
|---|---|
| If one user is affected | 1 |
| If a workgroup is affected | 2 |
| If an entire bay is affected | 4 |
| If an entire floor is affected | 6 |
| If an entire building is affected | 8 |
| If the entire backbone is affected | 10 |
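The severity scale can be encoded as a simple lookup table. The following Python sketch is one minimal illustration. The scope keys ("user", "workgroup", and so on) and the function name `severity_rating` are hypothetical names chosen for this example. They are not part of any standard FMEA tool.

```python
# Severity (S) ratings from the Step 2 table, keyed by the scope of the outage.
# The key names are illustrative labels chosen for this sketch, not a standard.
SEVERITY_BY_SCOPE = {
    "user": 1,       # one user is affected
    "workgroup": 2,  # a workgroup is affected
    "bay": 4,        # an entire bay is affected
    "floor": 6,      # an entire floor is affected
    "building": 8,   # an entire building is affected
    "backbone": 10,  # the entire backbone is affected
}


def severity_rating(scope: str) -> int:
    """Return the severity (S) rating for the scope of a component failure."""
    try:
        return SEVERITY_BY_SCOPE[scope]
    except KeyError:
        raise ValueError(f"unknown scope: {scope!r}") from None


print(severity_rating("floor"))  # 6
```

If a scope is not in the table, the function raises an error instead of guessing a rating. This forces someone to decide explicitly how to rate an unlisted case.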
Step 3: Assign a Frequency Number
An event that occurs often enough is a concern regardless of its severity.
| Occurrence Level or Frequency | Rating |
|---|---|
| Every day | 10 |
| Weekly intervals | 8 |
| Monthly intervals | 6 |
| Quarterly intervals | 4 |
| Every 12 months or longer | 2 |
Events that recur daily receive the highest rating. Occasional interruptions are rated lower.
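The frequency table can also be turned into a small rating function. This Python sketch assumes you know roughly how many times per year an event occurs; the parameter `events_per_year` is a hypothetical input chosen for this illustration. The table does not say how to rate an event that falls between two rows, such as one that happens six times a year. The sketch assumes such an event gets the rating of the highest row whose frequency it meets or exceeds. A more conservative team might round up to the next row instead.

```python
# Occurrence (O) ratings from the Step 3 table, expressed as a minimum number
# of occurrences per year. Assumption: an event that falls between two rows
# gets the rating of the highest row whose frequency it meets or exceeds.
OCCURRENCE_BANDS = [
    (365, 10),  # every day
    (52, 8),    # weekly intervals
    (12, 6),    # monthly intervals
    (4, 4),     # quarterly intervals
]
YEARLY_OR_RARER = 2  # every 12 months or longer


def occurrence_rating(events_per_year: float) -> int:
    """Return the occurrence (O) rating for an event's yearly frequency."""
    if events_per_year < 0:
        raise ValueError("events_per_year cannot be negative")
    for min_per_year, rating in OCCURRENCE_BANDS:
        if events_per_year >= min_per_year:
            return rating
    return YEARLY_OR_RARER


print(occurrence_rating(52))  # 8 (weekly)
print(occurrence_rating(6))   # 4 (more often than quarterly, less than monthly)
```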
Step 4: Factor in Difficulty of Detection and Repair
Detection is rated by how quickly a problem is found and fixed. The easier it is to detect the problem and begin corrective measures, the lower the rating; the longer it takes to detect and repair, the higher the rating.
| Time to Detect and Resolve | Rating |
|---|---|
| Resolution achieved within 1 hour | 2 |
| Greater than 1 hour but less than 4 hours | 4 |
| Greater than 4 hours but less than 8 hours | 6 |
| Greater than 8 hours but less than 24 hours | 8 |
| Greater than 24 hours | 10 |
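The following Python sketch turns the detection table into a function. It takes a hypothetical input, `hours_to_resolve`, which is the time needed to detect and repair a problem. The table's "greater than ... but less than" wording leaves exact boundaries undefined; for example, it does not say how to rate exactly 4 hours. This sketch assumes a boundary value belongs to the lower band. Adjust that choice if your organization reads the table differently.

```python
# Detection (D) ratings from the Step 4 table, keyed by the hours needed to
# detect and resolve the problem. Assumption: a value that lands exactly on a
# boundary (for example, exactly 4 hours) falls into the lower band.
DETECTION_BANDS = [
    (1, 2),   # resolution achieved within 1 hour
    (4, 4),   # more than 1 hour but less than 4 hours
    (8, 6),   # more than 4 hours but less than 8 hours
    (24, 8),  # more than 8 hours but less than 24 hours
]
OVER_24_HOURS = 10  # greater than 24 hours


def detection_rating(hours_to_resolve: float) -> int:
    """Return the detection (D) rating for the time to detect and repair."""
    if hours_to_resolve < 0:
        raise ValueError("hours_to_resolve cannot be negative")
    for max_hours, rating in DETECTION_BANDS:
        if hours_to_resolve <= max_hours:
            return rating
    return OVER_24_HOURS


print(detection_rating(0.5))  # 2
print(detection_rating(4))    # 4 (boundary assigned to the lower band)
print(detection_rating(30))   # 10
```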
Tally the Final Figure
Voilà! With the help of the tables in the preceding steps, you actually get figures that management can understand—and believe!
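The final product is a risk priority number (RPN). You calculate it by multiplying the severity (S), occurrence (O), and detection (D) ratings from the preceding steps:
RPN = S × O × D
For example, a component that receives the maximum rating on all three scales scores:
10 × 10 × 10 = 1,000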
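The calculation itself is a straightforward product. The following Python sketch checks each input against the ratings that the tables in Steps 2 through 4 can actually produce, and then it multiplies them. The function name `risk_priority_number` and the validation step are illustrative choices. They are not part of the FMEA method as described here.

```python
# Ratings that the tables in Steps 2-4 can produce for each factor.
VALID_SEVERITY = {1, 2, 4, 6, 8, 10}
VALID_OCCURRENCE = {2, 4, 6, 8, 10}
VALID_DETECTION = {2, 4, 6, 8, 10}


def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Compute RPN = S x O x D, rejecting ratings the tables cannot produce."""
    if severity not in VALID_SEVERITY:
        raise ValueError(f"invalid severity rating: {severity}")
    if occurrence not in VALID_OCCURRENCE:
        raise ValueError(f"invalid occurrence rating: {occurrence}")
    if detection not in VALID_DETECTION:
        raise ValueError(f"invalid detection rating: {detection}")
    return severity * occurrence * detection


print(risk_priority_number(10, 10, 10))  # 1000, the maximum possible score
print(risk_priority_number(1, 2, 2))     # 4, the minimum possible score
```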
The higher the RPN, the higher the risk associated with a specific component, and the higher the priority of solving the problem. A score of 720, for example, is worse than a score of 450. This gives management (and you) a critical frame of reference. If a support system were composed of several critical components, each with a very high RPN, you would have to show where the system could be improved to support the expected service levels.
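To build that frame of reference, you might score every component of a support system and sort the results by RPN, so the riskiest components appear first. The component names and ratings in this Python sketch are hypothetical examples invented for illustration. They do not come from any real assessment.

```python
from typing import NamedTuple


class Component(NamedTuple):
    """A component scored with the S, O, and D ratings from Steps 2-4."""
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # Risk priority number: S x O x D
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(components: list[Component]) -> list[Component]:
    """Return components ordered from highest to lowest RPN (worst first)."""
    return sorted(components, key=lambda c: c.rpn, reverse=True)


# Hypothetical components and ratings, for illustration only.
system = [
    Component("floor switch", severity=6, occurrence=6, detection=4),
    Component("backbone router", severity=10, occurrence=4, detection=8),
    Component("workgroup file server", severity=2, occurrence=8, detection=6),
    Component("building UPS", severity=8, occurrence=2, detection=10),
]

for c in rank_by_rpn(system):
    print(f"{c.name:<22} RPN {c.rpn:>4}")
```

With these made-up ratings, the backbone router comes first (RPN 320) and the workgroup file server comes last (RPN 96). That ordering tells you where to focus your improvement proposals. The sketch uses the built-in `list[...]` annotation syntax, which requires Python 3.9 or later.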