- Cluster History
- Clustering for Fault Tolerance
- Distributed Systems Clusters
- Failover Clusters
- Load-Balanced Clusters
- Conclusion
Clustering for Fault Tolerance
Clustering is a means to an end. The final goal of clustering is fault tolerance: eliminating every single point of failure within a system. Fault tolerance is the ability of a system to respond to failure in a way that does not hinder the service the server provides. Clustering and load-balancing technologies provide fault tolerance by replicating the actions of a server. Before the appearance of affordable clustering software, many administrators would build a server and place it into production, and then create an exact duplicate of the server to keep as a "hot spare." This server would sit unused (and in many cases powered off) until it was needed. Once a failure occurred, the administrator would turn the production box off and turn the spare on. This provided some degree of manual fault tolerance, but lacked the high availability and reliability necessary to support mission-critical applications.
Clusters simplify this process. Instead of keeping a hot spare that is not running, clustering software groups two or more computers together to act as one virtual server. In this case, the "hot swap" is automated. If a server fails, the other servers in the cluster are notified and all traffic destined for that server is distributed across the remaining servers. This is ideal for Web servers and database servers alike.
The virtual server is not a physical server, but a server created using a network address that all of the servers in the cluster share. All servers within the cluster have a virtual connection to the virtual server and can respond to requests. If one of these machines were to quit functioning, the remaining servers would respond by eliminating that machine from the cluster and redistributing the load among themselves. If the failed machine were to come back online, the clustered servers would recognize it and dynamically allow it to join the cluster once again.
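The membership behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how any particular clustering product works: the class name `VirtualServer`, the node names, the address `10.0.0.100`, and the round-robin dispatch policy are all hypothetical choices made for illustration.

```python
class VirtualServer:
    """Toy model of a cluster that answers on one shared address."""

    def __init__(self, address, nodes):
        self.address = address      # shared network address (hypothetical value)
        self.members = list(nodes)  # nodes currently active in the cluster
        self._next = 0              # round-robin cursor (assumed policy)

    def mark_failed(self, node):
        """Remove a node that has stopped functioning."""
        if node in self.members:
            self.members.remove(node)

    def rejoin(self, node):
        """Let a recovered node join the cluster again."""
        if node not in self.members:
            self.members.append(node)

    def dispatch(self, request):
        """Hand a request to the next active node and return (node, request)."""
        if not self.members:
            raise RuntimeError("no active nodes left in the cluster")
        node = self.members[self._next % len(self.members)]
        self._next += 1
        return node, request


cluster = VirtualServer("10.0.0.100", ["node-a", "node-b", "node-c"])
cluster.mark_failed("node-b")   # node-b fails; traffic is spread over a and c
handled_by = [cluster.dispatch(f"req-{i}")[0] for i in range(4)]
cluster.rejoin("node-b")        # node-b recovers and takes requests again
```

Clients in this sketch only ever see the shared address. Which physical node answers changes as members leave and rejoin, which is the property that lets the service survive the loss of a single machine.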
This is an example of only one type of cluster, but you can see how a clustered server design virtually eliminates the possibility of downtime. It is important to note that the cluster does not prevent failure; it allows the remaining server(s) to continue responding to requests in spite of a failure that may occur.
There are many types of clusters in use today, but all of them have the same goal: fault tolerance. Some of them, like the early VAXcluster systems, share information and resources, while others act independently. There are different types of clusters because applications have different needs. One application may demand scalability, while another places higher priority on fault tolerance. In many cases, system designers use different types of clusters operating within the same system to provide the maximum amount of fault tolerance, scalability, and reliability.