Why Clusters?
I firmly believe that the adoption of Linux clusters for nonscientific applications will continue to grow and accelerate as more cluster-enabled software becomes available. This increase will require us to get better at making the cluster creation process predictable and repeatable, or the adoption rate will slow. Some examples will illustrate why interest in a clustered solution is becoming more prevalent in both technical computing and traditional IT environments.
In the scientific world, Linux clusters are enabling smaller research organizations to have supercomputing capabilities that would previously have been out of their reach due to limited budgets. This is especially true where a large, proprietary SMP system was formerly the only option. The advantage here is lower cost for a given amount of resources.
On the other hand, larger research institutions such as the U.S. National Laboratories are extending the upper limits of cluster size to tens of thousands of CPUs and terabytes of physical RAM, producing cluster supercomputers with aggregated resources that are simply not possible with single-box SMP solutions. The advantage of clusters in this situation is the ability to scale RAM and CPU resources far beyond the physical limitations of even the largest SMP system.
In commercial database applications like those using Oracle 9i RAC (Real Application Clusters), a clustered approach allows a single database or other service to run across multiple physical systems, providing two distinct advantages: performance scaling ("scaling up") and immunity to failure, because the service has multiple sources ("scaling out").
Scalable visualization clusters spread the 3D graphics rendering operations for large, complex datasets over multiple clustered systems and their graphics cards. One benefit of this approach is the ability to overcome the display-memory limitation of any individual graphics card. A parallel rendering approach also reduces the response time for real-time operations, such as rotation and panning, that are used to analyze the data being displayed. This is an example of parallel computation providing increased performance or throughput.
Let's sum up some possible advantages of a clustered solution:
- Built with commodity hardware and software
- Allows flexible scaling for performance ("scaling up")
- Provides immunity from failure ("scaling out")
- Cheaper than large SMP systems for the same number of CPUs
- Easier to expand and upgrade than large SMP systems
A cluster may not be the most cost-effective solution for every situation, however. Special cluster-aware software may be required to utilize the cluster's hardware, or your organization may not have the required expertise to create and operate a complete cluster solution. Check the details before you start buying or building. To be fair, we need to list some of the possible disadvantages of clustered solutions:
- More complex than similarly sized SMP systems
- Requires specialized knowledge to build and operate
- Requires specialized applications
One key benefit appears repeatedly with respect to clusters: Within a given budget, a cluster may be the most cost-effective, scalable way to provide large amounts of hardware resources (RAM and CPU) to special applications that can make use of them. A simple comparison between SMP and cluster hardware costs illustrates one of the reasons that clusters are candidates for replacing large, expensive SMP systems. Figure 1 shows this comparison.
Figure 1 Comparison of SMP and cluster hardware cost versus CPU scalability—logarithmic scale
In this diagram, we can compare the hardware costs for five different hardware configurations:
- SMP system utilizing proprietary CPU technology that can scale to 128 processors within the same complex (black square)
- SMP system using "Type 1" processors that can scale to 64 processors within the same complex (blue diamond)
- SMP system utilizing commodity processors (IA-32) that can scale to eight processors within the same chassis (red circle)
- Cluster built of 2-CPU "Type 1" compute slices (blue triangle)
- Cluster built of 2-CPU IA-32 compute slices (purple square)
The term SMP refers to a system that provides all processors with a consistent view of the memory contained within the system's physical scope. Saying that a system is an SMP system, however, doesn't mean that all memory visible within the system can be accessed with equal performance. Most of the largest SMP systems today are complexes built from smaller hardware cells or domains that contain their own processors, buses, and local RAM. The computing elements are tied together hierarchically with a high-speed interconnect (HSI) that allows access to remote RAM. Systems with this type of memory architecture are termed cache-coherent non-uniform memory access (ccNUMA) systems.
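On a running Linux system you can see this ccNUMA structure directly: the kernel exposes the topology under /sys/devices/system/node. The short Python sketch below is my own illustration (not tied to any particular vendor's hardware) and assumes the standard sysfs layout; it prints each NUMA node along with its local CPUs and local RAM.

```python
# Minimal sketch: print the ccNUMA topology that the Linux kernel exposes in sysfs.
# Assumes the standard /sys/devices/system/node layout; on a ccNUMA machine you
# will see several nodes, each with its own CPUs and its own slice of memory.
import glob
import os
import re

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node = os.path.basename(node_dir)                  # e.g. "node0"
    with open(os.path.join(node_dir, "cpulist")) as f:
        cpus = f.read().strip()                        # CPUs local to this node, e.g. "0-7"
    with open(os.path.join(node_dir, "meminfo")) as f:
        match = re.search(r"MemTotal:\s+(\d+)\s+kB", f.read())
    mem_gib = int(match.group(1)) / 2**20 if match else 0.0
    print(f"{node}: CPUs {cpus}, local RAM {mem_gib:.1f} GiB")
```

NUMA-aware tools such as numactl report the same layout; access to another node's RAM goes over the interconnect and is correspondingly slower, which is exactly what "non-uniform" means here.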
Notice that the SMP systems have a physical limitation to the number of CPUs they can hold. This is primarily due to the bus and memory bandwidth limitations within the particular chassis or complex. The dotted lines show that the IA-32, "Type 1," and proprietary SMP systems have 8-, 64-, and 128-CPU limits, respectively. Similar limitations exist with regard to the amount of physical RAM the systems can hold. The arrow in the upper right of the figure shows the cluster configuration's ability to continue scaling well beyond the 512 CPUs shown at the graph's extreme right.
Figure 1 shows price on a logarithmic scale, which visually distorts the price differences between the options. Figure 2 shows the same data on a linear scale.
Figure 2 Comparison of SMP and cluster hardware cost versus CPU scalability—linear scale
Let's take the 64-CPU point on the graph for our discussion. These are the raw hardware costs:
| Configuration | Cost |
| --- | --- |
| SMP proprietary | $3,373,000 |
| SMP "Type 1" | $2,743,000 |
| 2-CPU "Type 1" cluster | $1,121,000 |
| 2-CPU IA-32 cluster | $501,000 |
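To make the gap concrete, we can divide each figure by the 64 CPUs it buys. The short Python sketch below is my own illustration, using only the numbers from the table, and prints the per-CPU hardware cost for each configuration:

```python
# Per-CPU hardware cost at the 64-CPU point, computed from the table above.
costs = {
    "SMP proprietary":        3_373_000,
    'SMP "Type 1"':           2_743_000,
    '2-CPU "Type 1" cluster': 1_121_000,
    "2-CPU IA-32 cluster":      501_000,
}
for config, total in costs.items():
    print(f'{config:<24} ${total:>9,}  ->  ${total / 64:>8,.0f} per CPU')
```

The per-CPU cost works out to roughly $52,700 for the proprietary SMP system, $42,900 for the "Type 1" SMP system, $17,500 for the "Type 1" cluster, and about $7,800 for the IA-32 cluster.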
It doesn't take a genius to see that if your application and performance requirements allow it, the commodity, clustered approach with IA-32 systems is the hands-down cost winner for this number of CPUs. The cost savings may be even more significant if the calculations are done with compute slices that are not top of the line, as may be possible for applications that are not compute-intensive. And once the requirements push the number of CPUs beyond what an SMP chassis can hold, the SMP solutions no longer even enter into the calculations.