Availability: A User Metric
Keep in mind that availability is measured from the user's point of view. A system is available if the user can use the application he or she needs; otherwise it's unavailable. Accordingly, availability must be measured end-to-end: all components needed to run the application must be available. Many IT organizations mistakenly believe that availability is simply equal to main server or network availability. Some may only measure the availability of critical system components. These are grave mistakes. A user may be prevented from using an application because his PC is broken, or his data is unavailable, or his PC is infected with a computer virus.
IT organizations that subscribe to a narrow or undisciplined availability mindset go through several stages of alienation from their users:
User unhappiness is the first and least severe stage. Users simply express unhappiness with poor system availability. The IT organization may either recognize a problem or deny it, citing their host or network availability statistics as proof. Those who deny the problem's existence bring their organization to the next stage of user alienation.
User distrust is characterized by user disbelief in much of what the IT organization says. Users may begin to view IT's action plans as insufficient, or view the IT organization as incapable of implementing its plans. They gradually lose interest in helping IT with end-user surveys and consultations. IT organizations that can deliver on promises and provide better availability from the user's point of view can prevent users from moving to the next stage of user alienation.
User opposition is the third stage of alienation. Here, users don't merely ignore IT plans; they begin to actively oppose them, suggesting alternatives that may not align with IT's overall plans. Users start to take matters into their own hands, researching alternatives that might help solve their problems. The challenge for the IT organization is to convince users that the IT plan is superior. The best way to meet this challenge is to conduct a pilot test of the user's suggested alternative, and then evaluate the results hand-in-hand with users. In contrast, we have seen some IT organizations react arrogantly, telling users, "Do what you want, but don't come crying to us for help." These organizations find themselves facing the final stage of user alienation.
User outsourcing is the final stage of user alienation. Users convince management that the best solution lies outside the IT organization. Outsourcing can take the form of hiring an outside consultant to design their system, going directly to an outside system supplier, or even setting up their own IT organization. At this stage, users have completely broken off from the IT organization, and reduced, if not totally eliminated, the need to fund it.
Beyond user alienation, there are other serious side effects of insisting on narrow-minded availability measurement:
Failure to identify root causes of availability problems. If only a few components are considered when system availability is evaluated, the root causes of the outages may well lie in components whose availability is not monitored. We have seen several banking IT organizations that have denied the existence of automated teller machine (ATM) problems by pointing out that their mainframes, switches, and network are always available. They fail to observe that the ATMs themselves cause most ATM outages.
Conflicts between IT divisions. Many IT organizations delegate critical elements of their systems to individual groups within IT. Each then measures the availability of its assigned area, without correlating it with the availability of other areas. This leads to territorial disputes, where one group blames others for poor system availability. "Don't blame my group; our network was up 100% of the time."
Expensive and ineffective remedial measures. If you don't know what the root cause of a problem is, you'll probably spend money on the wrong "solution." Or you'll concentrate on improving only your assigned system component, without regard to overall system availability.
Inability to determine true system health. Availability measurements of each component cannot easily be "added up" to reveal true system availability. 99% host availability plus 99% network availability plus 99% database availability doesn't necessarily equal 99% system availability. Outages in each area usually occur at different times, and an outage in any component brings down the entire system. In this example, actual system availability can be anywhere from 97% (when the three components' outages never overlap) to 99% (when they all coincide).
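The arithmetic behind the 97% to 99% range can be sketched in a few lines of Python. This is only an illustration, assuming every component must be up for the user to work; the function name `serial_availability_bounds` is hypothetical, and the "independent" figure is an extra reference point that assumes outages are statistically independent, which the text does not claim.

```python
from math import prod


def serial_availability_bounds(availabilities):
    """Return (worst, independent, best) end-to-end availability.

    Assumes the system is usable only when every component is up.
    Each availability is a fraction between 0.0 and 1.0.
    """
    downtimes = [1.0 - a for a in availabilities]
    # Worst case: outages never overlap, so downtime simply adds up.
    worst = max(0.0, 1.0 - sum(downtimes))
    # Reference point: outages are statistically independent (an assumption).
    independent = prod(availabilities)
    # Best case: every outage falls inside the least available component's outages.
    best = min(availabilities)
    return worst, independent, best


# Host, network, and database each at 99%, as in the example above.
worst, independent, best = serial_availability_bounds([0.99, 0.99, 0.99])
print(f"worst {worst:.2%}, independent {independent:.2%}, best {best:.2%}")
```

Whichever case applies, the end-to-end figure can never be better than that of the least available component, which is why no single component's statistics can stand in for system health.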
Why do many IT organizations fall into the trap of measuring only a few system components and not actual end-to-end availability? There are two reasons:
It's easier to measure a few system components. Few tools are available for analyzing and monitoring end-to-end system availability. Many tools measure network or host availability, but few actually check for application outages from the perspective of the user.
It's easier to achieve higher availability on a per-component basis, since outages rarely occur repeatedly on the same component. Outages for different components usually occur at different times, but may all affect the availability of the system to the user, resulting in far worse availability statistics.
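To see how outages that occur at different times combine into worse end-to-end figures, consider the following minimal sketch. It assumes outage records are available as (start, end) pairs in minutes for each component; the function name `end_to_end_availability`, the component names, and the sample data are all hypothetical, chosen only to illustrate the effect.

```python
def end_to_end_availability(outages_by_component, period_start, period_end):
    """Availability as the user sees it: down whenever any component is down.

    outages_by_component maps a component name to a list of (start, end)
    outage windows, in the same time unit as the measurement period.
    """
    # Collect every outage window, clipped to the measurement period.
    windows = sorted(
        (max(start, period_start), min(end, period_end))
        for component_windows in outages_by_component.values()
        for start, end in component_windows
        if end > period_start and start < period_end
    )
    downtime = 0.0
    current_start = current_end = None
    for start, end in windows:
        if current_end is None or start > current_end:
            # A gap before this window: close out the previous merged outage.
            if current_end is not None:
                downtime += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping outages count only once from the user's viewpoint.
            current_end = max(current_end, end)
    if current_end is not None:
        downtime += current_end - current_start
    return 1.0 - downtime / (period_end - period_start)


# Hypothetical outage log for a 1,000-minute measurement period.
outages = {
    "host": [(100, 110)],
    "network": [(105, 115), (500, 510)],
    "desktop": [(700, 710)],
}
for name, component_windows in outages.items():
    per_component = end_to_end_availability({name: component_windows}, 0, 1000)
    print(f"{name}: {per_component:.1%}")
print(f"end-to-end: {end_to_end_availability(outages, 0, 1000):.1%}")
```

In this made-up log, each group can report 98% or better for its own component, yet the user was unable to work for 35 of the 1,000 minutes, an end-to-end availability of 96.5%.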