- Moving Toward High Availability
- Step 1—Launching a Phase 0 (Zero) HA Assessment
- Step 2—HA Primary Variables Gauge
- Step 3—Determining the Optimal HA Solution
- Summary
Step 1: Launching a Phase 0 (Zero) HA Assessment
The hardest part of getting a Phase 0 HA assessment started is rounding up the right resources to do it well. You are going to want to use your best folks for this effort; it is that critical to your company's existence. Timing also matters. Ideally, you will launch the Phase 0 HA assessment before you have gone too far down the path of a new system's development. If the system already exists, put all the attention you can on completing the assessment as accurately and as completely as possible.
Resources for a Phase 0 HA Assessment
A Phase 0 (Zero) HA assessment requires two or three resources (professionals) who can properly understand and capture the technical components of your environment, along with the business drivers behind the application being assessed. Again, these should be some of the best folks you have. If your best folks don't have enough bandwidth to take this on, get outside help; don't settle for less (such as your lower-skilled employees). The time and budget this assessment costs will be minimal compared to the far-reaching impact of its results. The types of people and skill sets needed are as follows:
- A system architect/data architect (SA/DA): Someone with extensive system design and data design experience who can understand the hardware, software, and database aspects of high availability.
- A very senior business analyst (SBA): This person must be thoroughly versed in development methodologies and in the business requirements targeted by the application (and by the assessment).
- A part-time senior technical lead (STL): A software engineer with good overall system development skills who can help assess the coding standards being followed, the completeness of the system testing tasks, and the general software configuration that has been (or will be) implemented.
The Phase 0 HA Assessment Tasks
Once this team is assembled, the assessment itself is broken down into several tasks that yield the critical pieces of information needed to determine the correct high availability solution. Some tasks apply only when you are assessing an existing system and might not apply to a brand-new one.
Nine out of ten Phase 0 assessments are conducted for existing systems. This suggests that most folks retrofit their applications to be more highly available after they have been implemented. Of course, it would have been best to identify and analyze an application's high availability requirements during development in the first place.
A few of the tasks described here may not be needed to determine the correct HA solution. However, we have included them for the sake of completeness, and they often help form a more complete picture of the environment and processing being implemented. Remember, this type of assessment becomes a valuable record of what you were trying to achieve based on what you were asked to support. Salient areas (points) within each task are outlined as well. Let's dig into these tasks:
- Task 1: Describe the current state of the application.
  [NOTE: If this is a new application, this task is skipped!]
  - Data (data usage and physical implementation)
  - Process (business processes being supported)
  - Technology (hardware/software platform/configuration)
  - Backup/recovery procedures
  - Standards/guidelines used
  - Testing/QA process employed
  - Service level agreement (SLA) currently defined
  - Level of expertise of personnel administering the system
  - Level of expertise of personnel developing/testing the system
- Task 2: Describe the future state of the application.
  - Data (data usage and physical implementation, data volume growth, data resilience)
  - Process (business processes being supported, expanded functionality anticipated, and application resilience)
  - Technology (hardware/software platform/configuration, new technology being acquired)
  - Backup/recovery procedures being planned
  - Standards/guidelines used or being enhanced
  - Testing/QA process being changed or enhanced
  - Service level agreement (SLA) desired (from here on out)
  - Level of expertise of personnel administering the system (planned training and hiring)
  - Level of expertise of personnel developing/testing the system (planned training and hiring)
- Task 3: Describe the reasons for unplanned downtime at different intervals (last seven days, last month, last quarter, last six months, last year).
  [NOTE: If this is a new application, this task is an estimate of the future month, quarter, six-month, and one-year intervals.]
- Task 4: Describe the reasons for planned downtime at different intervals (last seven days, last month, last quarter, last six months, last year).
  [NOTE: If this is a new application, this task is an estimate of the future month, quarter, six-month, and one-year intervals.]
- Task 5: Calculate the availability percentage across different time intervals (last seven days, last month, last quarter, last six months, last year). Refer back to Chapter 1, "Essential Elements of High Availability," for the complete calculation.
  [NOTE: If this is a new application, this task is an estimate of the future month, quarter, six-month, and one-year intervals.]
- Task 6: Calculate the losses due to downtime.
  - Revenue loss (per hour of unavailability): For example, in an online order entry system, take any peak order entry hour and total the order amounts for that hour. This total is your revenue loss per hour.
  - Productivity dollar loss (per hour of unavailability): For example, for an internal financial data warehouse used for executive decision support, determine how long the data mart/warehouse was unavailable within the last month or two and multiply that by the number of executives/managers who were supposed to be querying it during that period. This is the "productivity effect." Then multiply the result by the average hourly salary of these executives/managers to get a rough estimate of the productivity dollar loss. This estimate does not account for bad business decisions made without the data mart/warehouse, or for the dollar loss those decisions caused. Calculating a productivity dollar loss might be a bit aggressive to include in this assessment, but there needs to be something to measure against and to help justify the return on investment. For applications that are not productivity applications, this value is not calculated.
  - Goodwill dollar loss (in terms of customers lost per hour of unavailability): It's extremely important to include this component. You can measure goodwill loss by taking the average number of customers for a period of time (such as last month's online order customer average) and comparing it with a period of processing following a system failure that caused a significant amount of downtime. Chances are there was a drop-off in customers that can be attributed to goodwill loss (the online customers didn't come back to you; they went to the competition). Take that percentage drop-off (such as 2%) and multiply it by the peak order amount averages for the defined period. The resulting period loss is a recurring loss overhead that should be included in the ROI calculation for every month.
  [NOTE: If this is a new application, this task is an estimate of the losses.]
This might be a little difficult to calculate, but it will help in justifying the purchase of HA-enabling products and in measuring ROI.
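The three Task 6 calculations are simple arithmetic, so a small helper can keep them consistent from one assessment to the next. The Python below is a minimal sketch under stated assumptions: the function names and sample figures are hypothetical, productivity loss uses an average *hourly* salary (multiplying hours by an annual salary would mix units), and goodwill loss is the percentage drop-off in average customers times the peak order amount average, as described above.

```python
def revenue_loss_per_hour(peak_hour_order_amounts):
    """Revenue loss per hour of unavailability: the total of all order
    amounts taken during a representative peak order entry hour."""
    return sum(peak_hour_order_amounts)


def productivity_dollar_loss(outage_hours, affected_users, avg_hourly_salary):
    """Rough productivity loss. outage_hours x affected_users is the
    'productivity effect' (person-hours lost); multiplying by an assumed
    average hourly salary converts it to dollars."""
    return outage_hours * affected_users * avg_hourly_salary


def goodwill_dollar_loss(avg_customers_before, avg_customers_after,
                         peak_order_amount_avg):
    """Recurring per-period goodwill loss: percentage drop-off in average
    customers after an outage x peak order amount average for the period."""
    if avg_customers_before <= 0:
        raise ValueError("baseline customer count must be positive")
    drop_off = (avg_customers_before - avg_customers_after) / avg_customers_before
    # A rise in customers is treated as no goodwill loss, not a gain.
    return max(0.0, drop_off) * peak_order_amount_avg


if __name__ == "__main__":
    # Hypothetical sample figures, for illustration only.
    print(f"{revenue_loss_per_hour([1200.0, 850.5, 3100.0]):.2f}")  # 5150.50
    print(f"{productivity_dollar_loss(6, 20, 100.0):.2f}")          # 12000.00
    print(f"{goodwill_dollar_loss(1000, 980, 400000.0):.2f}")       # 8000.00
```

In the sample run, a 2% customer drop-off (1,000 to 980) against a 400,000 peak order amount average yields a recurring goodwill loss of 8,000 per period, which is the value that would be carried into each month's ROI calculation.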
Once you have completed these tasks, you will be able to complete the HA Primary Variables Gauge without much trouble.
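Task 5 defers to Chapter 1 for the complete availability calculation. As a rough cross-check only, the sketch below assumes the simpler ratio of available time to elapsed time for each assessment interval. The interval lengths in days are approximations (real months and quarters vary), the function names are hypothetical, and which downtime you count (planned, unplanned, or both) is left to you.

```python
from datetime import timedelta

# Assumed interval lengths: months and quarters vary in the real calendar,
# so these day counts are simplifying approximations.
INTERVALS = {
    "last seven days": timedelta(days=7),
    "last month": timedelta(days=30),
    "last quarter": timedelta(days=91),
    "last six months": timedelta(days=182),
    "last year": timedelta(days=365),
}


def availability_pct(interval: timedelta, downtime: timedelta) -> float:
    """Percentage of the interval during which the application was available
    (simple uptime / elapsed-time ratio, an assumption for this sketch)."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if not timedelta(0) <= downtime <= interval:
        raise ValueError("downtime must be between zero and the interval length")
    # timedelta / timedelta yields a float ratio in Python 3.
    return 100.0 * ((interval - downtime) / interval)


def availability_report(downtime_by_interval: dict) -> dict:
    """Map each named interval to its availability percentage."""
    return {
        name: availability_pct(INTERVALS[name], downtime)
        for name, downtime in downtime_by_interval.items()
    }


if __name__ == "__main__":
    # Hypothetical downtime totals, for illustration only.
    downtime = {
        "last seven days": timedelta(minutes=30),
        "last month": timedelta(hours=2),
        "last year": timedelta(hours=20),
    }
    for name, pct in availability_report(downtime).items():
        print(f"{name}: {pct:.3f}%")
    # last seven days: 99.702%
    # last month: 99.722%
    # last year: 99.772%
```

Using `timedelta` keeps the units explicit, so downtime recorded in minutes and intervals measured in days can be compared without manual conversion.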