- 1 Fault Tolerance
- 2 Performance
- 3 Scalability
- 4 Reliability
- 5 Designing Reliable Data Access
- 6 Summary
1.5 Designing Reliable Data Access
Throughout the corporate world, systems and services provide a communications and data storage channel that helps companies generate revenue. The loss of productivity due to systems failures numbers in the billions of dollars per year. Client/server systems of the past did little to reduce downtime, but things are changing. Now, more than ever, there are products and services available to keep systems up and running in spite of component failures, power failures, or user error. Although no software or hardware product can guarantee a zero percent failure rate, today's technologies can "virtually" eliminate the downtime caused by hardware and software failures.
There are two scenarios that cause a system or service to be unavailable:
- A component that is part of the system, or a dependency of the system, malfunctions, leaving the application or service (or the path to the application or service) in an unusable or unavailable state.
- The network load on an application or service is so great that the server cannot respond to client requests in a timely fashion.
To (virtually) prevent these two events, a systems engineer or designer must design systems that meet the following goals:
- Failures within components of the system, or within components that the system depends upon, do not affect the operation of the application or service being provided.
- The application or service responds in a timely manner to all requests, regardless of network load or the number of requests being processed.
NOTE
It is important to note that no system can guarantee 100 percent uptime. There are many products and services on the market that tout figures such as 99.9995 percent uptime, but these are statistics typically taken from a controlled environment with minimal load on the server(s). Realistically, an engineer should never hope to achieve 100 percent uptime, but should strive to minimize unnecessary downtime due to component failures or software upgrades. The focus should always be the quality and reliability of service provided to the client, not the amount of uptime.
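To put advertised uptime figures in perspective, it can help to convert a percentage into the downtime it still allows. The following is a minimal sketch, not a vendor formula; the function name `allowed_downtime_minutes` and the 365-day year are assumptions made for illustration.

```python
# Convert an advertised uptime percentage into the downtime it still permits.
HOURS_PER_YEAR = 365 * 24  # assumption: a 365-day year, leap years ignored


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the minutes of downtime per year implied by an uptime figure."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    downtime_fraction = 1.0 - uptime_percent / 100.0
    return downtime_fraction * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # 99.9995 is the figure quoted in the note; the others are for comparison.
    for figure in (99.0, 99.9, 99.9995):
        minutes = allowed_downtime_minutes(figure)
        print(f"{figure}% uptime -> {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.9995 percent uptime leaves only about 2.6 minutes of downtime per year, which suggests why such figures are hard to reproduce outside a controlled environment.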
Let's take a brief look at how a designer might approach the construction of a reliable data access solution. To approach the design correctly, the designer must seek to provide the following:
- Fault tolerance
- High performance
- Scalability
- Reliability
The questions a designer must ask include the following: What are the needs of my application? Does the application need to operate 24 hours a day, 7 days a week? Where will the server physically be placed, and how will clients access it? Does the application need to be accessed by clients on the Internet? These questions help the designer discern what connectivity must be engineered to allow clients to access the application.
The designer continues the planning by envisioning the completed system and systematically tracing the connectivity from the client to the server, listing each component that is encountered as a client might access the system. Assuming this is an Internet application, let's start with the typical Internet client.
The client machine itself has many points of failure, but because they are beyond the control of the systems designer, they are ignored for the moment. The client machine is connected to the Internet through a dial-up line or a network connection of some sort. This is a critical point of failure, but again, beyond the control of the designer. The client makes a request for our application through an ISP (Internet service provider). The request travels across the Internet (through many routers and data lines) until it reaches the designer's Internet connection. At that point the request will most likely hit a Channel Service Unit/Data Service Unit (CSU/DSU), then a router, which will pass the request to a firewall. The firewall will pass the request through a cable to a switch or hub, which will pass the request on to the server, which is also connected with a network cable. The server will access the application by reading data from the hard drive or memory and send the desired information back to the client through the same route.
If you count all of the devices between the client and the server, you will easily see that there are many points of failure that could prevent the application from being accessible. Some of these items are not within the control of the designer, but many of them are. Take a look at the items that are within the control of the designer:
- The Internet circuit
- The DSU/CSU
- The router
- The firewall
- The switch or hub
- The server network card
- The server power supply
- The server hard drive
- The server memory
- All cables used to attach all network devices
All of the items listed can be controlled by the designer and must be evaluated to determine optimal fault tolerance. The failure of any one of these items will cause the application or service to be unavailable. Some of the items typically cannot be made fault-tolerant without great expense, but each item should be scrutinized to see what can be done to boost the reliability of the system. Table 1-1 lists each item and some of the fault tolerance options that are typically used.
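Because the failure of any one item takes the application offline, the items behave like links in a chain. One common way to reason about this, assuming failures are independent, is to multiply the availability of each link. The sketch below does that; every availability figure in it is a made-up illustrative value, not a measurement, and the function names are hypothetical.

```python
from math import prod

# Hypothetical availability of each item (fraction of time it is working).
# These numbers are illustrative assumptions only.
COMPONENTS = {
    "Internet circuit": 0.999,
    "DSU/CSU": 0.9995,
    "router": 0.9995,
    "firewall": 0.9995,
    "switch or hub": 0.9998,
    "server network card": 0.9998,
    "server power supply": 0.999,
    "server hard drive": 0.998,
    "server memory": 0.9999,
    "cabling": 0.9999,
}


def chain_availability(availabilities):
    """Availability of a path that needs every component to be up.

    Assumes component failures are independent of one another.
    """
    return prod(availabilities)


def weakest_component(components):
    """Return the name of the component with the lowest availability."""
    return min(components, key=components.get)


if __name__ == "__main__":
    overall = chain_availability(COMPONENTS.values())
    print(f"Overall availability of the path: {overall:.4%}")
    print(f"Weakest link: {weakest_component(COMPONENTS)}")
```

The overall figure is always lower than that of the weakest single item, which is one reason each item deserves individual scrutiny.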
Beyond fault tolerance, a designer must also consider the performance and scalability of the system. Hardware and software must be chosen that will provide high performance. The performance a system needs is determined by how it is used. If you are in a controlled environment where the number of users is constant, it can be easy to determine performance requirements, but if you are building an Internet application, determining the performance requirements can be a bit tricky. A good plan is to always overestimate your need. If you believe the servers within your system need 256 MB of RAM to operate efficiently, buy 512 MB of RAM for each server. Always leave plenty of room. Don't just meet the requirements; exceed them. This will allow room for future growth.
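The "overestimate your need" rule can be written down as a simple headroom calculation. This is a minimal sketch; the function name `provisioned_capacity` is hypothetical, and the default factor of 2 is an assumption taken from the 256 MB to 512 MB example rather than a general recommendation.

```python
def provisioned_capacity(estimated_need: float, headroom_factor: float = 2.0) -> float:
    """Return the capacity to buy, given an estimate and a headroom factor.

    A headroom_factor of 2.0 mirrors the 256 MB -> 512 MB example.
    """
    if estimated_need < 0:
        raise ValueError("estimated_need must not be negative")
    if headroom_factor < 1.0:
        raise ValueError("headroom_factor below 1.0 would under-provision")
    return estimated_need * headroom_factor


if __name__ == "__main__":
    estimate_mb = 256
    purchase_mb = provisioned_capacity(estimate_mb)
    print(f"Estimated need: {estimate_mb} MB, provision: {purchase_mb:.0f} MB")
```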
Table 1-1 Fault Tolerance Options
| Device/Item | Fault Tolerance Options |
| --- | --- |
| Internet Circuit | A secondary, low-bandwidth circuit could be installed to carry traffic in the event of a failure on the primary circuit. |
| DSU/CSU | A DSU/CSU can be purchased with redundant power supplies. It is also a good idea to keep a spare DSU/CSU on the shelf in case you need to swap one out. |
| Router | Routers can be purchased with redundant power supplies. Just as with the DSU/CSU, it is a good idea to have an extra router, configured exactly like the one you are using, in case you need to swap it. |
| Firewall | There are different types of firewalls, so make sure your firewall is reliable and can handle the traffic. Many firewalls also have redundant power supplies and failover functionality. |
| Switch/Hub | Switches and hubs can be purchased with redundant power supplies and redundant service. |
| Server Network Card | Within the server, you can use two network cards, or you can build server clusters. |
| Power Supply | Most devices can be purchased with redundant power supplies. Also, make sure all power supplies are connected to an uninterruptible power supply (UPS). |
| Hard Drive | RAID 5 controllers can be purchased to make hard disks more fault-tolerant. |
| Memory | Clustering is the only answer to memory failure. |
| Cabling | Make sure all cabling in use is certified for the data that will travel on it. |
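
Many of these options come down to running two of something. A rough way to see the benefit is to compute the availability of a redundant group in which any single working unit is enough. The sketch below assumes independent failures and instantaneous, flawless failover, which real equipment will not fully achieve; the function name and the 0.99 figure are hypothetical.

```python
def redundant_availability(single_availability: float, copies: int) -> float:
    """Availability of a group that works if at least one copy is up.

    Assumes independent failures and perfect failover (both are
    simplifications made for illustration).
    """
    if not 0.0 <= single_availability <= 1.0:
        raise ValueError("single_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    probability_all_down = (1.0 - single_availability) ** copies
    return 1.0 - probability_all_down


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one power supply
    for copies in (1, 2, 3):
        combined = redundant_availability(single, copies)
        print(f"{copies} unit(s): {combined:.6f}")
```

Under these assumptions, a second unit turns a 1 percent chance of being down into roughly 0.01 percent, which is why redundant power supplies and network cards appear so often in the table.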