- What is a Compute Cluster?
- Different Types of Compute Jobs
- Building a Compute Cluster
- Computing Resources Needed
- Price Per CPU
- Optimal Solution Economics
- Beowulf Solution
- Beowulf Cluster on SPARC Hardware
- SUN Supported Beowulf Cluster
- How To Build Your Compute Cluster
- Advantages of a Sun Based Cluster
- Grid Computing
- Conclusion
- Compute Cluster Software
How To Build Your Compute Cluster
To build an optimal compute cluster, you must:
Identify your current computing needs. For each type of job you must know:
Type: single threaded, multithreaded, or multiprocess
Memory requirement per process
Execution time per process
If multiprocess, the communication intensity (do you need an SMP machine or a fast interconnect?)
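The job characteristics listed above can be recorded in a small inventory, one entry per job type. The following is a minimal sketch in Python; the class and field names (JobProfile, memory_mb_per_process, and so forth) and the sample values are illustrative assumptions, not part of any queuing system.

```python
from dataclasses import dataclass
from enum import Enum


class JobType(Enum):
    SINGLE_THREADED = "single threaded"
    MULTITHREADED = "multithreaded"
    MULTIPROCESS = "multiprocess"


class CommIntensity(Enum):
    NONE = "none"
    MODERATE = "moderate"
    HEAVY = "heavy"


@dataclass
class JobProfile:
    """One row of the job inventory (hypothetical structure)."""
    name: str
    job_type: JobType
    memory_mb_per_process: int
    hours_per_process: float
    processes: int = 1                       # more than 1 only for multiprocess jobs
    comm: CommIntensity = CommIntensity.NONE

    def total_memory_mb(self) -> int:
        # Memory needed if all processes of the job run at the same time.
        return self.memory_mb_per_process * self.processes

    def cpu_hours(self) -> float:
        # CPU time consumed by one run of the job.
        return self.hours_per_process * self.processes


# Example entry with made-up numbers.
solver = JobProfile("solver", JobType.MULTIPROCESS, 512, 2.5,
                    processes=4, comm=CommIntensity.MODERATE)
```

Summing cpu_hours() over the expected number of runs per day gives a first estimate of the capacity the cluster must provide.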
Estimate how the requirements will change in the near future.
Find out if any applications are available only on a specific platform (you can have multiple platforms in your cluster).
Find out if any existing machines (workstations, and so forth) can be used as a computing resource, either as is or with a slight modification (such as adding more memory).
Make the cluster manageable as a single resource:
Develop or configure tools to "soft update" or migrate the operating system.
Develop or configure tools to automatically manage patch installation.
Make any license server scalable and highly available.
Develop or configure tools to run applications from any compute node and other nodes from which you want users to be able to launch jobs.
Develop or configure tools to make your compute cluster manageable from a single point of control (that is, from a single display).
Decide on, install, and configure a queuing system (Sun Grid Engine software, PBS, and so forth). You need one machine as the queue master. If file sharing is not already in place, set it up between all machines, and make sure that every machine can reach the shared services (NIS, and so forth). Depending on your applications, you may also need to install other software such as MPI libraries.
Train the users to submit their jobs to a queue instead of running everything interactively.
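To make submitting to a queue as easy as running a job interactively, some sites give users a small wrapper. The following sketch only builds a Sun Grid Engine style qsub command line and does not run it. The -N (job name), -cwd (run in the current directory), and -pe (parallel environment) options are standard qsub options in Grid Engine, but the parallel environment name "smp", the job name, and the script name are assumptions that depend on how your site is configured.

```python
import shlex
from typing import List, Optional


def build_qsub_command(script: str, job_name: str,
                       slots: int = 1,
                       parallel_env: Optional[str] = None) -> List[str]:
    """Return the argument list for a Grid Engine style qsub call."""
    cmd = ["qsub", "-N", job_name, "-cwd"]
    if slots > 1:
        if parallel_env is None:
            raise ValueError("a parallel environment is required for slots > 1")
        # The parallel environment name is defined by the site administrator.
        cmd += ["-pe", parallel_env, str(slots)]
    cmd.append(script)
    return cmd


# Hypothetical example: a 4-slot job in a parallel environment called "smp".
command = build_qsub_command("run_sim.sh", "sim01", slots=4, parallel_env="smp")
command_line = shlex.join(command)  # requires Python 3.8 or later
```

Passing the list to subprocess.run on a submit host would queue the job; other queuing systems such as PBS use different options for the same purpose.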
Decide whether additional computing resources are needed, and try to fill the needs with the smallest machines possible. That is, make single CPU machines handle the single CPU workload (if they can have enough memory), use dual CPU machines for the dual CPU load, and so forth. To minimize administration when you add machines that are to be used as a computing resource only, the machines should be as similar as possible, network installed, and not have local data storage (other than temporary files).
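The "smallest machine that fits" rule can be expressed as a simple selection over the machine types you are willing to buy. The following is a minimal sketch; the catalog entries, their names, and their memory limits are made-up placeholders, not real product specifications.

```python
from typing import NamedTuple, Optional, Sequence


class MachineType(NamedTuple):
    name: str
    cpus: int
    max_memory_mb: int


# Hypothetical catalog; replace with the machine types you actually consider.
CATALOG = [
    MachineType("single-cpu", 1, 2048),
    MachineType("dual-cpu", 2, 4096),
    MachineType("smp-8", 8, 32768),
]


def smallest_fit(cpus_needed: int, memory_mb_needed: int,
                 catalog: Sequence[MachineType] = CATALOG) -> Optional[MachineType]:
    """Return the smallest machine type that satisfies both requirements."""
    candidates = [m for m in catalog
                  if m.cpus >= cpus_needed and m.max_memory_mb >= memory_mb_needed]
    if not candidates:
        return None  # no single machine in the catalog can run this job
    # Prefer fewer CPUs first, then less memory.
    return min(candidates, key=lambda m: (m.cpus, m.max_memory_mb))
```

With this catalog, a single CPU job that needs 3000 Mbytes is assigned to the dual CPU machine, because the single CPU machine cannot hold enough memory. This is the exception that the parenthetical remark in the step above allows for.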
If some applications are parallelized, but are only moderately communication intensive, consider connecting a few machines through a fast interconnect such as Myrinet, or including an SMP machine in the cluster.
If some applications are parallel, but are heavily communication intensive, add one or more larger machines to the cluster. The same is true if there are multithreaded applications in the application mix.
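The hardware guidance in the last steps can be summarized as a small decision function. This is a sketch of the rules as stated in this article, using hypothetical string labels for job type and communication intensity; it is not a substitute for measuring your applications.

```python
def recommend_hardware(job_type: str, comm: str = "none") -> str:
    """Map a job's type and communication intensity to a class of hardware.

    job_type: "single threaded", "multithreaded", or "multiprocess"
    comm:     "none", "moderate", or "heavy" (used for multiprocess jobs only)
    """
    job_type = job_type.lower()
    comm = comm.lower()
    if job_type == "multithreaded":
        # Multithreaded applications are treated like heavily communicating jobs.
        return "larger machine"
    if job_type == "multiprocess":
        if comm == "heavy":
            return "larger machine"
        if comm == "moderate":
            return "fast interconnect or SMP"
        if comm == "none":
            return "smallest machines that fit"
        raise ValueError(f"unknown communication intensity: {comm!r}")
    if job_type == "single threaded":
        return "smallest machines that fit"
    raise ValueError(f"unknown job type: {job_type!r}")
```

For example, recommend_hardware("multiprocess", "moderate") returns "fast interconnect or SMP", which corresponds to connecting a few machines through an interconnect such as Myrinet.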