- What is a Compute Cluster?
- Different Types of Compute Jobs
- Building a Compute Cluster
- Computing Resources Needed
- Price Per CPU
- Optimal Solution Economics
- Beowulf Solution
- Beowulf Cluster on SPARC Hardware
- SUN Supported Beowulf Cluster
- How To Build Your Compute Cluster
- Advantages of a Sun Based Cluster
- Grid Computing
- Conclusion
- Compute Cluster Software
Optimal Solution Economics
From an economic standpoint, the optimal solution is to run every job on the smallest, least expensive machine that can handle it. Single-threaded jobs should run on single-CPU machines. Multithreaded jobs that scale to only a few CPUs run most economically on a small or medium-size machine with several CPUs. Only jobs that scale to many CPUs, and for which the execution wall time (real time) is critical, such as weather simulations and large crash simulations, should be run on a large machine or cluster.
These requirements indicate that the optimal compute cluster for a large organization probably consists of many small single-CPU machines, a few medium-size machines, and one or more large parallel machines. FIGURE 3 shows the number of CPUs per job versus the number of jobs.
FIGURE 3 Optimal Computer Cluster Graph: No. of CPUs Per Job Versus No. of Jobs
CPU count is not the only sizing factor. For some applications, such as electronic design automation (EDA), memory is the most important factor. Most EDA simulation tools are single-threaded and can use only one CPU, but a large verification might need 10 Gbytes of memory or more.
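The placement rule from the preceding paragraphs, extended with the memory constraint, can be sketched as choosing the cheapest machine class that satisfies both a job's CPU scaling and its memory footprint. This is a minimal Python sketch, not a real scheduler: the machine classes, their sizes, and the `relative_cost` values are hypothetical, and the sketch ignores queueing and current machine load.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MachineClass:
    name: str
    cpus: int
    memory_gb: int
    relative_cost: float  # hypothetical cost units, not real prices


@dataclass(frozen=True)
class Job:
    name: str
    max_useful_cpus: int  # number of CPUs the job can actually scale to
    memory_gb: int


# Hypothetical machine mix: many small, a few medium, one large.
MACHINES = [
    MachineClass("small", cpus=1, memory_gb=4, relative_cost=1.0),
    MachineClass("medium", cpus=8, memory_gb=16, relative_cost=6.0),
    MachineClass("large", cpus=64, memory_gb=128, relative_cost=60.0),
]


def cheapest_fit(job: Job, machines: Sequence[MachineClass]) -> Optional[MachineClass]:
    """Return the least expensive machine class meeting the job's CPU and
    memory needs, or None if no class is big enough."""
    candidates = [
        m for m in machines
        if m.cpus >= job.max_useful_cpus and m.memory_gb >= job.memory_gb
    ]
    return min(candidates, key=lambda m: m.relative_cost, default=None)


if __name__ == "__main__":
    jobs = [
        Job("single-threaded test", max_useful_cpus=1, memory_gb=1),
        Job("EDA verification", max_useful_cpus=1, memory_gb=12),
        Job("threaded solver", max_useful_cpus=4, memory_gb=8),
        Job("crash simulation", max_useful_cpus=48, memory_gb=64),
    ]
    for job in jobs:
        fit = cheapest_fit(job, MACHINES)
        print(f"{job.name}: {fit.name if fit else 'no suitable machine'}")
```

With these assumed machine classes, the EDA-style job lands on the medium machine purely because of its memory footprint, even though it uses only one CPU.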
A very important consideration is the optimal use of expensive licenses. The annual license fee for an EDA tool is often on the order of $10,000 per CPU, so it is important to keep every license busy by running as many concurrent jobs as you have licenses, and to run those jobs on the fastest CPUs available.
When evaluating a solution for multiprocess applications, you must also consider their communication characteristics. An application with minimal communication needs, such as a CFD calculation with a static grid, might run very well over a slow interconnect like Ethernet. If the communication needs are greater, as with CFD on a moving grid (for example, when simulating the interior of an engine cylinder), a low-latency interconnect like Myricom's Myrinet, or an SMP in which the processes communicate through shared memory, will perform much better.
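A common first-order way to reason about this is a latency-plus-bandwidth cost model, in which each message costs a fixed latency plus its size divided by the link bandwidth. The Python sketch below applies that model to two hypothetical workloads. The latency and bandwidth values are illustrative assumptions, loosely representing a slow commodity network and a low-latency interconnect; they are not measured figures for Ethernet or Myrinet.

```python
def comm_time(n_messages: int, total_bytes: float,
              latency_s: float, bandwidth_bps: float) -> float:
    """Latency-bandwidth model: every message pays a fixed latency, and
    the data volume is limited by link bandwidth (bytes per second)."""
    return n_messages * latency_s + total_bytes / bandwidth_bps


def comm_fraction(compute_s: float, comm_s: float) -> float:
    """Fraction of each time step spent communicating rather than computing."""
    return comm_s / (compute_s + comm_s)


# Hypothetical interconnect parameters: (latency in seconds, bandwidth in bytes/s).
INTERCONNECTS = {
    "slow": (100e-6, 1e8),
    "low_latency": (10e-6, 2e8),
}

# Hypothetical per-step workloads: (messages, bytes, compute seconds).
WORKLOADS = {
    "static grid": (10, 1e6, 1.0),       # few large exchanges
    "moving grid": (10_000, 1e6, 0.1),   # many small exchanges
}

if __name__ == "__main__":
    for wname, (msgs, nbytes, compute) in WORKLOADS.items():
        for iname, (lat, bw) in INTERCONNECTS.items():
            t = comm_time(msgs, nbytes, lat, bw)
            print(f"{wname:12} {iname:12} comm={t:.4f}s "
                  f"fraction={comm_fraction(compute, t):.1%}")
```

With these assumed numbers, the static-grid case spends about 1 percent of each step communicating even on the slow link, while the moving-grid case is dominated by per-message latency: roughly 91 percent of each step on the slow link versus about 51 percent on the low-latency one.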
Organizations often have many unused computing resources. In a typical computer-aided design (CAD) or software development environment, workstation CPUs are busy only 5 to 10 percent of the time on average. These machines usually sit idle at night, on weekends, and on holidays, and even during normal working hours their users attend meetings, make phone calls, eat lunch, and so forth. Often, all of this CPU time is wasted. By setting up a compute cluster on these workstations, you can make a large portion of the wasted resources available to users in a very simple way. For the organization, this means its existing investment can yield 10 to 20 times more compute power. Note, however, that this configuration requires sufficient network bandwidth between the computing resources to transfer the data the computations need.
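The 10-to-20-times figure follows from the utilization numbers: if workstations are busy only a fraction u of the time, their total capacity is 1/u times what is currently used, so 5 percent gives 20 and 10 percent gives 10. The Python sketch below shows that arithmetic together with a simple admission check a workstation might use before accepting background work. The function names and the load-per-CPU threshold are hypothetical, and the sketch ignores interactive activity such as keyboard use, which an owner-friendly policy would probably also check.

```python
import os


def capacity_multiplier(avg_utilization: float) -> float:
    """Total capacity relative to the capacity actually used, assuming
    every idle cycle could be harvested (an upper bound)."""
    if not 0 < avg_utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / avg_utilization


def should_accept_background_job(load_1min: float, cpu_count: int,
                                 max_load_per_cpu: float = 0.25) -> bool:
    """Hypothetical admission policy: accept background work only while the
    1-minute load average per CPU stays below a threshold."""
    return load_1min / cpu_count < max_load_per_cpu


if __name__ == "__main__":
    for u in (0.05, 0.10):
        print(f"utilization {u:.0%}: up to {capacity_multiplier(u):.0f}x")
    if hasattr(os, "getloadavg"):  # available on Unix-like systems only
        load_1min, _, _ = os.getloadavg()
        cpus = os.cpu_count() or 1
        print("accept background job:",
              should_accept_background_job(load_1min, cpus))
```

In practice the recoverable fraction is lower than 1/u suggests, because owners reclaim their machines and data must cross the network, which is why the bandwidth caveat above matters.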
One very successful example is the compute cluster at Saab Automobile, where 130 CAD workstations run CFD simulations in the background.