Dealing with Disaster
Of course, none of this interesting information matters if your high-availability system is sitting somewhere like southern New Orleans with Hurricane Katrina barreling down on you. Disaster recovery is perhaps the most extreme test of high availability, and, as with most extreme tests, a passing grade can be very expensive.
There are several approaches to high-availability disaster recovery, and all of them rely on not keeping all your eggs in one basket. The fundamental problems for every strategy are time and bandwidth. Resources remote enough to be useful in a major disaster will introduce far more latency than resources sitting on the same LAN or SAN, and the connection to those remote resources is likely to be a narrower pipe than the one linking systems in the enterprise datacenter. These problems can certainly be solved, but they add a layer of complexity to your planning.
The simplest solution is a standby system at the remote location. To ensure availability, however, that server needs its own local storage, along with some method of keeping its data up to date. Mirroring is the logical solution: everything written to the primary server's main storage is also written over the network to the standby server. Because of the performance penalties of remote connections, though, this setup usually means some kind of asynchronous scheme rather than synchronous mirroring, which requires every write to be confirmed by the remote system before it is considered complete.

Because near-real-time replication is expensive in both hardware and bandwidth, it might make more sense to update relatively infrequently, say once every few minutes or even once an hour, and accept rolling back to the most recent update in the event of a disaster. Alternatively, you can keep a cold server and load the data from backups, if you can afford the downtime. How much downtime and data loss are acceptable is a question that users and decision makers need to weigh carefully against the costs.
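To make that tradeoff concrete, here is a minimal sketch in Python of how an asynchronous, batched replication scheme behaves. The names are assumptions for illustration only (local_store stands in for the primary server's storage, send_to_standby for whatever carries data over the WAN link); a real deployment would use the replication features of your storage or database product rather than hand-rolled code. Writes return as soon as local storage is updated, and a background thread ships queued changes to the standby on a timer, so the remote copy always lags by up to one batch interval.

import queue
import threading
import time

# Hypothetical stand-ins for illustration: local_store is the primary
# server's storage, send_to_standby is the transport over the WAN link.
replication_queue = queue.Queue()
local_store = {}

def write(key, value):
    # Acknowledge the write as soon as local storage is updated;
    # the standby copy lags by whatever is still waiting in the queue.
    local_store[key] = value
    replication_queue.put((key, value))

def replicate_forever(send_to_standby, batch_interval=60):
    # Every batch_interval seconds, drain the queue and push the batch
    # over the (narrower) WAN link to the standby server.
    while True:
        time.sleep(batch_interval)
        batch = []
        while not replication_queue.empty():
            batch.append(replication_queue.get())
        if batch:
            send_to_standby(batch)

# Run the replicator in the background; the primary never waits on it.
threading.Thread(
    target=replicate_forever,
    args=(lambda batch: print(f"shipping {len(batch)} writes"),),
    kwargs={"batch_interval": 60},
    daemon=True,
).start()

Shortening batch_interval shrinks the window of writes that can be lost if the primary site is destroyed, but it consumes more bandwidth; lengthening it does the opposite, which is exactly the cost question raised above.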