Cluster Software Components
The software architecture for your cluster may fit somewhere between two extremes: an application-specific cluster that runs a single application or provides a single service, and a more general-purpose cluster that might support many different users and their own pet applications. The first case represents the best starting point for a cluster beginner; it simplifies the software infrastructure required for the cluster. Let's highlight some of those software components.
Linux: The Cluster Operating System
We're discussing Linux clusters, which narrows our choice of operating systems. As mentioned in the "why" portion of this series, we choose Linux for clusters because of its stability, manageability, and flexibility. But there's another really good reason: The client/server subsystems that come with Linux distributions, coupled with a wide range of freely available software packages, provide a full design palette. Because so much of this software is already built in and ready to go, you save money on the software portion of the cluster implementation.
Figures 2 and 3 showed a lot of general classes of software components that are found in clusters. Which components we choose will be driven by the cluster's primary application(s) and the environments with which the cluster must interact. An Oracle database cluster requires a different set of software services from that of a full-blown, job-based HPC cluster. Frankly, one of our most difficult tasks is to choose only the right tools and make the individual pieces behave in a cohesive way.
Software Infrastructure
We don't have the space to discuss all of the possible software components that might be present in a cluster, nor can we cover the proper configuration of each subsystem. There are multiple choices in each of the categories: user authentication, name resolution services, time synchronization services, system image installation, error logging, system performance monitoring, crash and core-dump management, system event and status monitoring, load balancing, job scheduling, and file serving. These categories fit into the highlighted areas of the cluster architecture shown in Figure 4.
Figure 4 Cluster architectural components—software elements.
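One way to keep track of these decisions is a simple checklist that records which tool you've picked for each category and reports what is still undecided. The following is a minimal sketch, not a prescribed method: the `ClusterPlan` class and its methods are hypothetical names invented for illustration, and the tools assigned (DNS, NTP, syslog, NFS) are common examples rather than recommendations from this article.

```python
from dataclasses import dataclass, field

# The infrastructure categories listed in the text above.
CATEGORIES = [
    "user authentication",
    "name resolution",
    "time synchronization",
    "system image installation",
    "error logging",
    "performance monitoring",
    "crash and core-dump management",
    "event and status monitoring",
    "load balancing",
    "job scheduling",
    "file serving",
]


@dataclass
class ClusterPlan:
    """Hypothetical checklist: one chosen tool per infrastructure category."""
    name: str
    choices: dict = field(default_factory=dict)

    def choose(self, category, tool):
        # Reject typos so every choice maps to a known category.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.choices[category] = tool

    def unassigned(self):
        # Categories that still need a decision, in the original order.
        return [c for c in CATEGORIES if c not in self.choices]


# Illustrative choices only; your cluster's application drives the real ones.
plan = ClusterPlan("single-application cluster")
plan.choose("name resolution", "DNS")
plan.choose("time synchronization", "NTP")
plan.choose("error logging", "syslog")
plan.choose("file serving", "NFS")

for category in plan.unassigned():
    print("still to decide:", category)
```

An application-specific cluster may legitimately leave some categories empty (job scheduling, for example), which is part of why it makes the simpler starting point.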
Uncomfortable with integrating the necessary Linux subsystems to create a cluster? There are alternatives to the do-it-yourself approach. First, vendors such as Scali offer commercially available cluster infrastructures. Freely available cluster toolkits also can drastically reduce the work of integrating a cluster's software. We'll look at the free toolkits in the next section.
Cluster Software Toolkits
Between the full do-it-yourself approach and the commercially available cluster packages are some well-designed cluster software toolkits. Two of those toolkits are the Open Source Cluster Application Resources, known as OSCAR, and Rocks, from the National Partnership for Advanced Computational Infrastructure (NPACI). If the toolkit's contents meet your cluster needs, you can drastically reduce the amount of effort required to create the cluster's software infrastructure.
OSCAR supports a number of Linux distributions, including Red Hat Linux 9, Red Hat Enterprise Linux (RHEL) 3, and Fedora Core 2. The OSCAR toolkit integrates a number of existing open-source packages that are commonly used in cluster construction, along with tools from the OSCAR team. There is ongoing work on multiple flavors of OSCAR, including thin-OSCAR, HA-OSCAR, and others. One important distinction about OSCAR is that you need to provide your own Linux distribution to the installer.
The Rocks cluster toolkit contains its own Linux distribution, so it's not an option if you need a licensed, supported Linux distribution for your cluster. Rocks also contains a number of optional rolls that add functionality to the cluster's basic software configuration. While its primary target is HPC clusters, there's no reason that the toolkit can't be used for more general-purpose cluster installations.
It's important to understand the physical architecture supported by each toolkit and the strengths and weaknesses of each. The toolkit approach is particularly useful if your organization wants to "kick the tires" on a cluster before committing to the technology. The chances of getting a functional cluster on the first try are much higher with the toolkit approach.
Finally, one cluster technology that isn't a toolkit still lets you test the cluster "waters" in your organization: openMosix allows you to boot existing hardware into a cluster from a CD-ROM.