- What is a Compute Cluster?
- Different Types of Compute Jobs
- Building a Compute Cluster
- Computing Resources Needed
- Price Per CPU
- Optimal Solution Economics
- Beowulf Solution
- Beowulf Cluster on SPARC Hardware
- SUN Supported Beowulf Cluster
- How To Build Your Compute Cluster
- Advantages of a Sun Based Cluster
- Grid Computing
- Conclusion
- Compute Cluster Software
Building a Compute Cluster
A production compute cluster combines a number of machines into a single computing resource. Instead of starting a job on a specific machine (or host), the user submits the job to a queue, and the queuing system runs it on the best available machine. It is also possible, although not very common, to run interactive jobs through the queuing system. For example, if a user submits the task of running an xterm, the queuing system starts it on a lightly loaded host.
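The dispatch behavior described above can be illustrated with a minimal sketch. It is not how Sun Grid Engine, PBS, or LSF are implemented; the `Host` and `JobQueue` classes, the host names, the load figures, and the rule that each job adds one unit of load are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    load: float  # hypothetical load figure, e.g. a load average
    jobs: list = field(default_factory=list)


class JobQueue:
    """Toy queuing system: users submit jobs to the queue, not to a host."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.pending = deque()

    def submit(self, command):
        # The user names only the command, never the machine.
        self.pending.append(command)

    def dispatch(self):
        # Place each pending job on the currently least-loaded host.
        placements = []
        while self.pending:
            command = self.pending.popleft()
            host = min(self.hosts, key=lambda h: h.load)
            host.jobs.append(command)
            host.load += 1.0  # assumption: each job adds one unit of load
            placements.append((command, host.name))
        return placements


# Hypothetical three-node cluster.
hosts = [Host("node1", 2.5), Host("node2", 0.2), Host("node3", 1.0)]
queue = JobQueue(hosts)
queue.submit("xterm")
queue.submit("simulate --input run42.dat")
placements = queue.dispatch()
for command, host_name in placements:
    print(f"{command!r} -> {host_name}")
```

In this sketch the interactive xterm lands on the most lightly loaded host, and the next job goes to whichever host is least loaded after that placement. A real queuing system weighs many more factors, such as memory, licenses, and user limits.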
A compute cluster consists of the following parts:
Network - Ethernet, Myrinet, and so forth
File sharing - Network File System (NFS), Andrew File System (AFS), or Distributed File System (DFS)
Queuing system - Sun Grid Engine software, Portable Batch System (PBS), or Platform Computing's LSF (Load Sharing Facility)
Message passing (if used) - an MPI library (Sun HPC ClusterTools software or MPICH)
Compiler (if needed) - Forte Developer software or GNU
Maintenance tools - JumpStart software, automatic patch installation tools, and so forth
Administration tools - hardware health checking tools (SunVTS software, Sun Management Center software, hereafter called Sun MC), resource allocation tools (Solaris Bandwidth Manager software, disk quotas, Network Information System (NIS)), and so forth
Terminal servers for the consoles
Job execution depends on how the queuing system is configured. You can optimize for the use of expensive software licenses, maximize total resource utilization, prioritize a certain group of users, and so forth.
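As a rough illustration of how configuration changes job execution, the sketch below orders the same pending jobs under three toy policies. The `Job` fields, the policy names (`fifo`, `license_first`, `group_priority`), and the sample jobs are hypothetical and do not correspond to the settings of any particular queuing system.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    group: str
    needs_license: bool
    submitted: int  # submission order; lower means earlier


def order_jobs(jobs, policy, favored_group=None):
    """Return jobs in the order a toy scheduler would start them."""
    if policy == "license_first":
        # Keep expensive licenses busy: licensed jobs start first.
        key = lambda j: (not j.needs_license, j.submitted)
    elif policy == "group_priority":
        # Jobs from the favored group jump ahead of everyone else.
        key = lambda j: (j.group != favored_group, j.submitted)
    elif policy == "fifo":
        # Plain first-come, first-served.
        key = lambda j: j.submitted
    else:
        raise ValueError(f"unknown policy: {policy}")
    return sorted(jobs, key=key)


# Hypothetical pending jobs.
jobs = [
    Job("render", "design", False, 0),
    Job("synth", "cad", True, 1),
    Job("regress", "qa", False, 2),
]

for policy, group in [("fifo", None), ("license_first", None), ("group_priority", "qa")]:
    names = [j.name for j in order_jobs(jobs, policy, favored_group=group)]
    print(policy, names)
```

The same three jobs start in a different order under each policy, which is the practical effect of tuning a queuing system toward license use or toward a particular group of users.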
Note that the optimal application development environment may differ from the optimal production environment. In a development environment, response time is more important than throughput.
To benefit users and system administrators the most, the cluster must have these characteristics:
- Powerful
- Simple to use
- Easy to program
- Simple to administer
- Easy to add more resources
- Good price and performance