Clustering Overview
With the release of the 2.4 kernel, Linux became much more appropriate for use in the corporate environment. Threading pervaded more of the subsystems, kernel locks became more finely grained, and there were many improvements to general scalability. The result is a faster, more capable system whose performance comes much closer to that of proprietary systems.
So, now that Linux is becoming a viable option for high-demand, large applications, it needs to be capable of participating effectively in a clustered environment. In scientific facilities, this usually means that computations are broken up and run in parallel across the network, using all available machines as a distributed CPU. In relatively simple commercial applications, such as Web hosting, clustering relies mainly on load balancing because each machine can service a request as easily as the next. Failure of one node means that future requests are answered by another node transparently, usually from the same network address. In more complicated application environments, failure of a node means that all application contexts that were present on that machine need to continue on another. In this situation, there usually needs to be a large amount of duplication between nodes so that when one does go down, the exact context of all work being done there can easily be picked up and run on another system.
The first two of these situations are fairly well known and handled in the Linux community. Scientific applications usually use a distributed approach such as Beowulf. With this method, programs are written so that small parts of the computations are distributed among nodes that are known to be members of the cluster. This method is known to be very scalable; some of the largest supercomputers in the world are made up of machines participating in a Beowulf cluster. The downside of this approach is that the programs using Beowulf need to be written specifically to use the cluster: it is not possible to drop a nonparallelized program into this kind of cluster and expect to get any kind of gain from it. In the arena of scientific problems, most research being done is known to be a custom problem, requiring custom code. Because the software is usually done in-house by those who understand the problem itself, the code can be parallelized with relative ease.
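To make the idea of explicitly parallelized code concrete, here is a minimal sketch of splitting one computation into pieces and combining the partial results. It is only an illustration under stated assumptions: worker threads stand in for cluster nodes, and the function names (split_range, partial_sum_of_squares, cluster_sum_of_squares) are hypothetical. A real Beowulf program would typically hand the chunks to other machines through a message-passing library such as MPI or PVM rather than a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks each take one leftover element.
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def partial_sum_of_squares(chunk):
    """The unit of work one (hypothetical) node would compute on its own."""
    lo, hi = chunk
    return sum(n * n for n in range(lo, hi))


def cluster_sum_of_squares(start, stop, nodes=4):
    """Distribute the chunks to workers, then combine the partial results."""
    chunks = split_range(start, stop, nodes)
    # Threads are a stand-in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)
```

The point of the sketch is the shape of the program: the split and the final combination are written by the programmer, which is why a nonparallelized program gains nothing from simply being dropped into such a cluster.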
Similar to Beowulf is the MOSIX approach. This uses a special version of the Linux kernel that allows programs that weren't written to be distributed to take advantage of a cluster. All nodes that will participate need to be running a MOSIX-enabled kernel. The cluster is configured on the nodes so that each knows which others are involved. Applications are run on top of one of the nodes, and, based on resource availability, the work will be distributed as needed. As mentioned, you don't need to write the application to be explicitly aware of clustering technology, although there are a few considerations. Nonthreaded applications, or those that are heavily dependent on a single point of data input, will see minimal gains from this approach. The kernel will balance resources that it can and will try to localize data accesses. Work that requires a lot of disk activity will be moved to the node where that data resides. If all of the data is on a single machine, it will become a choke point for activity, and other nodes will not be used to the full extent. If this system is used, you need to try to distribute I/O points as much as possible.
Next is the common failover approach, the Linux Virtual Server (LVS) system. If requests against your system can be distributed so that any node that services the request is as capable as any other, LVS might be perfect. The most common use for this is Web farms, in which each individual HTTP request can be handled by any node as long as each node has access to the same data.
LVS can be configured in a few different ways, depending on your needs. The most scalable is one in which each node in the cluster is connected to the requisite network; on the front end, a load balancer answers client requests by forwarding them on to a node in the cluster. Several scheduling mechanisms are possible, and the load balancer can track the capabilities of each node, taking into account the load limits of each while scheduling. There is a lot more to it than just this brief overview, and I urge you to look further into this solution if it might fit your needs. Depending on the configuration, the cluster can consist of 2 to 100 machines with relative ease, so it can handle a cluster of nearly any size.
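As a rough illustration of capacity-aware scheduling, the sketch below picks the node with the fewest active connections relative to its weight, which is the general idea behind weighted least-connection scheduling. It is not LVS code: the Node class, the pick_node and dispatch functions, and the weights are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: int      # relative capacity; higher means more capable (assumed)
    active: int = 0  # connections currently assigned to this node


def pick_node(nodes):
    """Return the node with the lowest active-connections-to-weight ratio."""
    # A weight of zero marks a node as out of rotation in this sketch.
    candidates = [n for n in nodes if n.weight > 0]
    if not candidates:
        raise RuntimeError("no available nodes")
    # On a tie, min() keeps the earliest candidate in the list.
    return min(candidates, key=lambda n: n.active / n.weight)


def dispatch(nodes):
    """Assign one incoming request and record it against the chosen node."""
    node = pick_node(nodes)
    node.active += 1
    return node.name
```

With one node of weight 1 and another of weight 3, four consecutive requests go to the first node once and to the second three times, so the more capable machine absorbs proportionally more of the load.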
These solutions have existed in the field for a while and have been proven effective for many users. But there are still some places where Linux clustering needs to expand. Some application writers have taken it upon themselves to build custom load balancing and failover into each app. While this works, it results in a large duplication of effort. So, some development is going on to offer a more advanced solution.
Egenera is one company working on a new solution. Many of those who used VMS clusters have been waiting for a similar solution to appear for Linux. The resource-sharing and failover capabilities present in VMS have taken a while to surface again, but a true contained failover solution is on the way. Egenera's solution consists of a rack of hardware constructed of several blades, each of which is running an instance of Linux. The rack tracks the status of each running instance and keeps a few spare instances ready to take over in the event of a failure. The work of any failed node is moved to a new node in a way that is transparent to the application and to its users. Egenera is starting to come forward with announcements for its system; while it is relatively expensive, it can yield a very stable environment for high-availability systems.
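The spare-pool idea can be sketched in a few lines. This is purely illustrative and makes no claim about how Egenera's system is implemented: the Rack class, its fail method, and the blade and workload names are hypothetical, and the sketch ignores the hard part, which is preserving the exact application context across the move.

```python
class Rack:
    """Tracks which blade runs which workload and holds idle spares."""

    def __init__(self, active, spares):
        self.assignments = dict(active)  # blade name -> workload name
        self.spares = list(spares)       # idle blades ready to take over

    def fail(self, blade):
        """Move the failed blade's workload to the next available spare."""
        if blade not in self.assignments:
            raise KeyError(f"{blade} is not running a workload")
        if not self.spares:
            # Check before touching state so the workload is not dropped.
            raise RuntimeError("no spare blades available")
        workload = self.assignments.pop(blade)
        replacement = self.spares.pop(0)
        self.assignments[replacement] = workload
        return replacement
```

For example, a rack created with blade1 running a database, blade2 running a Web server, and blade3 held as a spare would, on failure of blade1, reassign the database to blade3 and leave the Web server untouched.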
So, while there are some very solid technologies out there for Linux clusters, there is still room for growth and expansion in the field. Linux continues to scale higher into the enterprise market. To truly compete, the clustering technology needs to follow suit. These coming solutions are filling the gaps in commercial and scientific needs and, in turn, will allow Linux to effectively compete with the biggest players in the market.