1.4 Reliability
Reliability is the ability of a system to provide a service without interruption. Although related to fault tolerance, performance, and scalability, reliability is specifically the ability to provide continual service regardless of the load on the system or of failures within it. Reliability is closely related to availability: a system demonstrates its reliability by remaining available.
To return to the earlier example of the automobile, a new automobile is expected to be reliable. When the key is turned, the engine should turn over. A racecar driver knows the necessity of reliability. Mechanics go through the racecar in detail before the driver is allowed to race. The failure of one part within the vehicle could cause the driver to lose the race or to lose his or her life.
Similarly, to ensure reliability, a systems designer must inspect every part of the system. A failure within any part of the system could leave the application or service unable to respond to requests, which makes the system unreliable. To build reliability into a system, the designer must analyze all of the points previously discussed:
Fault tolerance of the system
Performance of the system
Scalability of the system
A failure in a hard drive or network card could cause the server to quit servicing clients. Too many users could cause the system to respond slowly to requests, or to quit responding altogether. If the application cannot scale to meet the needs of the clients as those needs change from day to day, the system could experience hours or even days of downtime caused by frequent hardware upgrades or replacements.
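The point that a failure in any one part can make the whole service unavailable can be illustrated with a rough calculation. The sketch below assumes that the system is up only when every component is up and that component failures are independent of one another; under those assumptions, the availability of the system is the product of the availabilities of its components. The function names and the availability figures are hypothetical and chosen only for illustration; they are not measurements of any real hardware.

```python
from math import prod


def series_availability(component_availabilities):
    """Availability of a system that is up only when every component is up.

    Assumes component failures are independent (an illustrative assumption).
    """
    values = list(component_availabilities)
    for a in values:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
    return prod(values)


def expected_downtime_hours(availability, period_hours=24 * 365):
    """Expected hours of downtime over a period (default: one 365-day year)."""
    return (1.0 - availability) * period_hours


# Hypothetical availability figures, for illustration only.
components = {
    "hard drive": 0.999,
    "network card": 0.999,
    "application": 0.995,
}

system = series_availability(components.values())
downtime = expected_downtime_hours(system)

print(f"System availability: {system:.4f}")
print(f"Expected downtime per year: {downtime:.1f} hours")
```

With these figures the system as a whole is less available than its weakest component, which is why the designer must examine every part of the system rather than only the one that seems most likely to fail.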