The Risk/Benefit Ratio
If this chapter has pointed out one thing to you, it's probably that any type of change can dink up your network. It's enough to make you run screaming away from any upgrade or project, much less the installation of new software. Software, though, tends to be so problematic that it's a good jumping-off point for talking about the risk/benefit ratio: the risk of a change in relation to the benefit that the change will introduce.
The breakneck speed of Internet time means that software developers have unbearable pressure on them to be first to market. This usually translates into quick/nonexistent product testing, which means that the programs are released with at least a few bugs. (The practice of shipping software with problems so serious that it might not even perform its intended function used to be so common that software developers themselves coined a phrase for it: "shipping a brick.")
Experience also shows that for every new feature introduced, a product probably picks up two new bugs. When do you decide to risk the problems and upgrade?
And at what point is it worth it to you to make your network even more complex than it already is by introducing a new application?
TIP
If you don't see any bug fixes or service packs posted on a new product's support site, it probably means that the product is not mature. Wait a while to deploy.
Because you have better things to do with your day than report bugs to the software vendor, it's a good idea not to be the first one on your block to put a new application or operating system on your network. Unless you desperately need the features of a new product, you should wait until six months have passed or the first service pack has been released (or perhaps both) before you seriously contemplate rolling it out. (For example, many industry observers were pretty sure that nobody would roll out Windows 2000 before the advent of the first service pack.)
This is a clear example of the risk/benefit ratio: the amount of risk compared to the potential benefits. Unless there is a very clear benefit (the old product doesn't work well anyway, you're losing money by not having a working product, and the new product promises to clear up the old problems), taking the risk of green software isn't too appealing.
Again, the risk/benefit ratio applies to any kind of change, not just software. It's pretty easy to think about the benefits, but thinking about risk is harder. When weighing risk, consider the following attributes of a proposed change (a rough scoring sketch follows the list):
Scope: How far-reaching is the new system or the change to the system? For end-user software, for example, affecting one person, a department, or the entire organization represents increasing levels of risk. Minimizing scope is the primary reason for an incremental rollout.
Distribution: Is this a centralized change (for example, one server), or is it a change that gets replicated to many servers? Although it is easier to roll back a change to a centralized system, it is less risky to modify a decentralized system; if one component of a decentralized system fails, other parts of the system still function.
Inspection: How well inspected is the proposed system? A system becomes more perfect as it is widely studied, criticized, and improved. (The cryptography world, where no cipher is trusted until it has been pounded on for several years, is a good example.) Hidden systems tend to have more hidden flaws, and thus are riskier to use; well-used and inspected ones generally have fewer flaws and are less risky.
Reversibility: How difficult is it to undo the change? Some system upgrades are easy to undo; these are the least risky. Others are a one-way trip; clearly, these are the riskiest. One way to reduce the risk of low reversibility is to enact the change in a lab: that is, a "mini" reproduction of the larger network that includes most aspects of the system before the change. This, obviously, is pretty expensive in terms of both time and money, but it can be worth it when the scope of the change is large.
Interactivity: How much does the system interact with other network components? Upgrading a word processor would have a very low score here, and thus low risk; upgrading a Windows 2000 domain controller or the firmware for an Ethernet switch would have a higher score, and thus higher risk.
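To make the checklist a bit more concrete, here is a minimal sketch in Python of how you might turn those attribute ratings into a rough risk score to weigh against the expected benefit. The 1-to-5 scale and the example ratings are invented for illustration; adjust them for your own environment.

    # Rough risk/benefit sketch: rate each attribute 1 (low risk) to 5 (high risk).
    # The attribute names come from the checklist above; the scale and the
    # example ratings below are invented for illustration only.

    RISK_ATTRIBUTES = ["scope", "distribution", "inspection", "reversibility", "interactivity"]

    def risk_score(ratings):
        """Average the 1-5 ratings into a single risk figure."""
        return sum(ratings[attr] for attr in RISK_ATTRIBUTES) / len(RISK_ATTRIBUTES)

    # Example: an organization-wide rollout of new, lightly tested software
    # that is hard to undo but doesn't touch much else on the network.
    proposed_change = {
        "scope": 5,          # organization-wide
        "distribution": 3,   # replicated to several servers
        "inspection": 4,     # brand new, few bug fixes posted yet
        "reversibility": 4,  # difficult to roll back
        "interactivity": 2,  # mostly standalone application
    }

    expected_benefit = 3     # 1 (nice to have) to 5 (old product is costing money)

    risk = risk_score(proposed_change)
    print(f"risk {risk:.1f} vs. benefit {expected_benefit}")
    if risk > expected_benefit:
        print("Risk outweighs benefit: wait, pilot, or stage the rollout.")

The numbers themselves aren't the point; forcing yourself to rate each attribute is what surfaces the risks you'd otherwise gloss over.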
The interactivity of the change in question can be quite serious and hard to fathom: Even if you don't have problems during a rollout, a new application or device can produce secondary effects in another item that don't seem to be related to the new item. Accordingly, a good rule of thumb is to reverse recent changes (if feasible) during network or communications trouble. (Perhaps you can shut down that brand-new Cisco switch that you're test-driving or your pilot-test installation of Active Directory.) The trouble might not be related to the new device or program that you've installed, but if you shut it down, you've ruled it out as the source of the trouble.
If the trouble goes away, you can then kick the problem back to the vendor you bought the offending item from (or to the manufacturer). However, make sure that the problem is reproducible (that is, make sure that it happens repeatedly when you reintroduce the program or device back into the network) before going to your vendor, or you will likely not get taken seriously.
For example, where I work, a certain Nortel switch started crashing on a regular basis. Nortel support couldn't help us; they recommended patches, which we applied to no avail. We then realized that the most recent piece of gear to be put on the network was a new model Cisco switch. When we turned the Cisco switch off, we discovered that the Nortel switch would stay up all the time. When we told Nortel, they admitted that there was indeed an interoperability problem with Cisco's CDP (Cisco Discovery Protocol); it did, in fact, kill certain Nortel gear. After we disabled CDP on the port shared with the Nortel gear, all was well.
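If you ever need to push the same kind of fix, one way to script it is shown below. This is a minimal sketch, not what we actually ran at the time: it assumes a Cisco IOS switch reachable over SSH, placeholder credentials and addresses, a hypothetical interface name, and the third-party Netmiko library.

    # Minimal sketch: disable CDP on the one port facing the affected gear.
    # Host, credentials, and the interface name are placeholders; adjust them
    # for your own switch. Requires the third-party "netmiko" package.
    from netmiko import ConnectHandler

    switch = {
        "device_type": "cisco_ios",
        "host": "10.0.0.2",      # placeholder management address
        "username": "admin",
        "password": "example",
    }

    conn = ConnectHandler(**switch)
    output = conn.send_config_set([
        "interface FastEthernet0/12",  # hypothetical port shared with the Nortel switch
        "no cdp enable",               # stop sending CDP frames on that port only
    ])
    print(output)
    conn.disconnect()

Note that this disables CDP only on the shared port, so the rest of the Cisco gear keeps whatever benefit CDP gives you elsewhere.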
Discovering change in a large-staff environment is no accident. Informal means, such as quick morning meetings, are good ways for everyone to stay tuned in to what's happening on the network at large, but documentation such as logs, work orders, and incident reports is also highly important.
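If you don't have a formal work-order system yet, even a tiny shared change log is better than nothing. Here is a minimal sketch in Python; the filename, field layout, and example entry are invented for illustration.

    # Minimal change-journal sketch: one timestamped line per change, appended
    # to a shared log file. Filename and fields are invented for illustration.
    import datetime
    import getpass

    LOG_FILE = "network-changes.log"

    def log_change(device, description):
        """Record who changed what, and when, so trouble can be traced back to a change."""
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        who = getpass.getuser()
        with open(LOG_FILE, "a") as log:
            log.write(f"{stamp}\t{who}\t{device}\t{description}\n")

    log_change("nortel-switch-3", "disabled CDP on the port facing the new Cisco switch")

When something breaks, the first question is "what changed recently?"; a log like this answers it in seconds instead of a round of hallway interviews.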