Designing High-Availability Windows Systems
- What Exactly Is High Availability?
- Eliminate Single Points of Failure
- High-Availability Servers
- Dealing with Disaster
- Between Paranoia and Penny-Pinching
There was a time when your computer going down was merely annoying. In many cases it still is. But in a growing number of businesses and situations, a failed computer is anywhere from crippling to disastrous.
As a result, designing systems that keep critical business processes running has become more important to IT professionals everywhere. Everyone from Microsoft to hardware vendors recognizes this new reality, which is why systems today have the potential for far less downtime than they did 10 years ago. Everything from the Windows operating system to hard disks and processors has been redesigned to be more reliable and to keep systems more available.
But while the improvements are wonderful, they’re not enough for some jobs. Sometimes it’s necessary to design computer systems for high availability.
What Exactly Is "High Availability"?
Usually the demand for a high-availability system doesn't originate with the IT department, or at least not entirely. It's typically a response to a business need, driven by requests from users and decision makers. The users are the "customers" within the organization, and the decision makers are the ones who hold the purse strings. Both groups are important stakeholders in any high-availability project. And therein lies the first complication of designing high-availability Windows systems.
The first job in designing a high-availability system with Windows (or any operating system) is to make sure that everyone understands what you're talking about. The IT department has to understand what the customers and decision makers want, and the customers and decision makers have to understand the options and their associated costs. To that end, it's important for the customers and the IT department to define exactly, in numbers, what they mean by "acceptable availability." In addition, everyone must understand that high availability comes in several flavors and can get very expensive very rapidly. This matters because one of the most common scenarios in designing a high-availability system is for the customers to start out wanting very high availability; then, once you've spent the time doing the preliminary design and come back with an initial cost estimate, the customers (or the decision makers) balk, and you have to go through the whole process again with redefined goals.
One of the first issues you have to settle in designing a high-availability system is determining what needs to be protected. Next, you have to agree on how well it needs to be protected.
This isn't an idle exercise. Much of the time you'll find that what your customers really want is simply a more available system, not necessarily one engineered to meet a specific numeric goal. Often just providing some form of fast file restore, such as Microsoft's Data Protection Manager, is all that's needed to satisfy them. If they need more than that, you can guide them into weighing costs against benefits early on.
Technically there's nothing mysterious about high availability: it's simply the proportion of a given time period during which the system is available to users. It can be expressed as "x minutes of unavailability per month," but it's most commonly expressed as a percentage of uptime over the period.
However, high availability is a continuum, not a fixed point; there are degrees of high availability. Although the most common single definition of "high availability" is 99.999 percent availability, which translates into 5.3 minutes of downtime per year, the difference in cost between five 9s and four 9s (99.99 percent, or 52.6 minutes of downtime per year) can be substantial. It's a good idea to settle early on how much downtime is actually acceptable for the application at hand.
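To make those figures concrete, here's a minimal sketch that converts an availability percentage into allowed downtime per year. The function name and the set of "nines" shown are illustrative assumptions, not part of any standard tool:

```python
# Minimal sketch: convert an availability percentage into allowed downtime.
# The names and the availability levels chosen here are illustrative only.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # roughly 525,960 minutes

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% availability allows "
          f"{downtime_minutes_per_year(pct):.1f} minutes/year of downtime")

# Approximate results:
# 99.9%   -> ~526 minutes/year (about 8.8 hours)
# 99.99%  -> ~52.6 minutes/year
# 99.999% -> ~5.3 minutes/year
```

This is the arithmetic behind the cost conversation: each additional 9 cuts the allowed downtime by a factor of ten, while the cost of achieving it typically climbs much faster.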