Availability: Requirements and Measurement
The first step in designing for availability is to discover your users' true requirements for availability and for IT services in general. This requires close consultation with as many users as possible, and at a minimum with the users of critical applications. Most users' initial response is that the system must be available all the time. You need to explain that the cost of providing availability rises steeply as the required level of availability increases. You also need to explain that these costs will be passed on to users in some form: either directly, as IT chargeback for services, or indirectly (as in most small to mid-sized companies), as the IT organization takes a larger share of the corporate budget.
The Service Level Agreement
These consultations with users form the basis of a Service Level Agreement between the provider of IT services and the users. The agreement can be limited to system availability alone, or it can be expanded to cover response time, help desk availability, turnaround time for new feature requests, and many other performance and quality issues. If you're starting from scratch, we recommend including only the system availability portion. Then, as the system becomes more stable and your IT organization matures, you can expand the agreement. This approach has several benefits:
The users don't expect too much too soon. The final judges of the IT organization's performance are the users, so it's crucial to manage their expectations.
It buys the IT organization time to improve its services. This is an opportunity for the IT organization to stay one step ahead of user requirements. It also gives the organization a better feel for the resource demands of meeting availability requirements, which allows for better planning.
It allows for a less demanding agreement. Since users know that the agreement will be improved later, they're more willing to settle for a realistic short-term target.
Never commit to a target you know you can't achieve. Agree on a target that you can achieve in the short term, and establish a timetable for achieving higher system availability in the future. Pilot the availability target internally within the IT organization, or with one small department. Once you've demonstrated that you can meet the target, roll out the new service level standards to the rest of the organization.
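To make "a target you can achieve" more concrete, the following Python sketch converts an availability percentage into the downtime it permits over a period, and checks a pilot's observed downtime against a target. It assumes availability is defined as the fraction of the measurement period during which the system was up; this is one common definition, and an actual agreement might, for example, exclude scheduled maintenance windows. The function names, the 99%, 99.5%, and 99.9% targets, and the 30-day pilot are hypothetical values chosen for illustration, not figures from this section.

```python
"""Sketch: translate availability targets into downtime budgets.

Assumes availability = uptime / total time in the measurement period.
All targets and periods below are hypothetical examples.
"""

HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored for simplicity


def allowed_downtime_hours(target_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget (in hours) permitted by an availability target over a period."""
    if not 0.0 <= target_pct <= 100.0:
        raise ValueError("target_pct must be between 0 and 100")
    return period_hours * (1.0 - target_pct / 100.0)


def measured_availability_pct(downtime_hours: float, period_hours: float) -> float:
    """Availability actually achieved, given the downtime observed in a period."""
    if period_hours <= 0 or not 0.0 <= downtime_hours <= period_hours:
        raise ValueError("need period_hours > 0 and 0 <= downtime_hours <= period_hours")
    return 100.0 * (period_hours - downtime_hours) / period_hours


def meets_target(downtime_hours: float, period_hours: float, target_pct: float) -> bool:
    """True if a pilot period's measured availability reaches the agreed target."""
    return measured_availability_pct(downtime_hours, period_hours) >= target_pct


if __name__ == "__main__":
    # Hypothetical timetable: start with a modest target and tighten it later.
    for target in (99.0, 99.5, 99.9):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours of downtime per year")

    # Hypothetical 30-day pilot with one department: 5 hours of observed downtime.
    pilot_hours = 30 * 24
    print(meets_target(5.0, pilot_hours, 99.0))  # 715/720 is about 99.31%, so True
```

Under these assumptions, a 99% target allows about 87.6 hours of downtime per year, while 99.9% allows only about 8.76 hours. Each additional "nine" cuts the downtime budget by a factor of ten, which helps explain why each step up in the timetable tends to cost more than the last.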