SKIP THE SHIPPING
Use code NOSHIP during checkout to save 40% on eligible eBooks, now through January 5. Shop now.
Register your product to gain access to bonus material or receive a coupon.
1378E-5
A comprehensive guide to today's most advanced R&D in highly parallel programming and applications.
Volume 1 of this two-volume set collected today's best work on the systems aspects of high performance cluster computing. Now, in High Performance Cluster Computing: Programming and Application Issues, Volume 2, Rajkumar Buyya brings together the world's leading work on programming and applications for today's state-of-the-art "commodity supercomputers."
The book is organized into three areas: programming environments and development tools; Java(tm) as a language of choice for development in highly parallel systems; and state-of-the-art high performance algorithms and applications. All three areas have seen major advances in recent years-and in all three areas, this book offers unprecented breadth and depth. Coverage includes:
Preface.
I. PROGRAMMING ENVIRONMENTS AND DEVELOPMENT TOOLS.
1. Parallel Programming Models and Paradigms.Introduction. A Cluster Computer and its Architecture. Parallel Applications and Their Development.
Strategies for Developing Parallel Applications.
Code Granularity and Levels of Parallelism. Parallel Programming Models and Tools.
Parallelizing Compilers. Parallel Languages. High Performance Fortran. Message Passing. Virtual Shared Memory. Parallel Object-Oriented Programming. Programming Skeletons.
Methodical Design of Parallel Algorithms.
Partitioning. Communication. Agglomeration. Mapping.
Parallel Programming Paradigms.
Choice of Paradigms. Task-Farming (or Master/Slave). Single-Program Multiple-Data (SPMD). Data Pipelining. Divide and Conquer. Speculative Parallelism. Hybrid Models.
Programming Skeletons and Templates.
Programmability. Reusability. Portability. Efficiency.
Conclusions. Bibliography.
2. Parallel Programming Languages and Environments.Introduction. Early Mechanisms.
Semaphores. Conditional Critical Regions. Monitors. Sockets. Remote Procedure Calls. Rendezvous.
Shared Memory Environments.
Pure Shared Memory Environments. Virtual Shared Memory Environments.
Distributed Memory Environments.
Ada. Message Passing Interface. Parallel Virtual Machine. Distributed Computing Environment. Distributed Java.
Parallel Declarative Environments.
Parallel Logic Languages. Parallel Functional Languages.
Summary. Bibliography.
3. MPI and PVM Programming.Introduction. Comparison of MPI and PVM. The All Pairs Shortest Path Problem.
Description of the Problem. Dijkstra's Algorithm. Floyd's Algorithm. Parallel Algorithms.
The MPI Programming Environment.
Communication Models in MPI. Creating an MPI Program. Running an MPI Program. PDIJK SVIPI. PFLOYD_MPI. Tools for MPI Programs.
The PVM Programming Environment.
Communication Models in PVM. Creating a PVM Program. Running a PVM Program. PDIJK AVM. P FLOYD_PVM. Tools for PVM Programs.
Porting Hints.
Initial Environment Setup. Parallel Tasks Setup. Group Management. Intertask Communication. Utility Functions.
Summary. Bibliography.
4. Linking Message-Passing Environments.Interoperability Between Message-Passing Interfaces.
Message Passing Between Programming Environments. Access to Resource Management Systems.
An Overview of the PLUS Library.
Process Creation and Management.
System Architecture.
Library. Daemons.
Adding New Message-Passing Environments. Performance Results. Related Work. Summary. Bibliography.
5. Active Objects.Objects in Cluster-Based Parallel Systems. Active Versus Passive Objects. Objects and Atomicity. BaLinda K Objects. Speculative Processing. Summary. Bibliography.
6. Using Scoped Behavior to Optimize Data Sharing Idioms.Introduction. Motivation: Data Sharing Idioms. Aurora: A Distributed Shared Data System.
Process Models. Shared-Data Objects. Scoped Behavior. Illustrative Example: Matrix Multiplication. Discussion: Programming in Aurora.
Implementation Overview.
Handle-Body Composite Objects. Scoped Behavior Objects.
Experience with Parallel Programs.
Matrix Multiplication. 2-D Diffusion. Parallel Sorting by Regular Sampling.
Discussion and Related Work. Concluding Remarks. Bibliography.
7. Component-Based Development Approach.Introduction. Component-Based Application Development.
Overall Framework. Programmer Interface. Building Reusable Design Components. Assembling Reusable Design Components. Compiling and Running a Distributed Application. Alternative Design Styles. Consistency Checks.
Advanced Features.
Architecture Models as Reusable Components. Generation of Communication and Synchronization Code. Manipulating Messages. Repetitive Substructures in Architecture Models.
Reusing Simulation Software in a Distributed Setting. Comparison Between Approaches. Concluding Remarks. Bibliography.
8. Hypercomputing with LiPS.Generative Communication.
Tuple Space. Benefits of Generative Communication. Example Applications.
Using LiPS.
Writing and Generating a LIPS Application. Starting the System and its Application. Monitoring the System and its Applications. Some Remarks on Fault-Tolerant Applications.
The LIPS Runtime Systems.
The Fault-Tolerant Tuple Space Machine. Performance. The LIPS Daemon (lipsd()). The LIPS Daemon Controller (lipsdc()).
The LIPS Development System.
Programming in CWEB. The LIPS Test Environment.
Bibliography.
9. An Efficient Tuple Space Programming Environment.Introduction. Tuple Space Programming.
Fundamentals. Example Linda Program. Associative Memory Analysis.
Compilation Environment.
Basic Translation. Optimizing Compilers.
Run-time Environment.
Processor Location of Data. Data Structures for Efficient Data Access. Data Transfer Protocol. Process Creation. Cluster Execution Environment. Run-time Optimizations.
Extensions. Conclusions. Bibliography.
10. Debugging Parallelized Code.Introduction. Automatic Parallelization. The Debugging Problem. Debugging with Code Liberation.
Overview of the Technique. The Single Assignment Form via Global Renaming. Renaming Simple Variables in Structured Code. Array Renaming. Renaming Unstructured Code. Name Reclamation. Reclaiming the Names. The Debugging Interface.
Experimental Results. Conclusions. Bibliography.
11. WebOS: Operating System Services for Wide-Area Applications.Introduction. WebOS Overview. Naming. Persistent Shared State. Security and Authentication.
Validating and Revoking Statements. Processes and Roles. Authorization.
Process Control. Rent-A-Server.
Current Approaches. System Design. Performance.
Related Work. Conclusions. Bibliography.
II. JAVA FOR HIGH PERFORMANCE COMPUTING.
12. Distributed-Object Computing.Introduction. CORBA.
Basic Model - C ORBA 2.0 Architecture. OMG IDL(Interface Definition Language) and its Mapping. C ORBA Object Services. A Matrix Multiplication Example.
Java RMI.
Basic Model. RMI Features. Matrix Multiplication.
DCOM.
Basic Model. Identification of DCOM Interfaces and Classes. IDL and its Mapping. Creation of Objects. Using COM Interfaces and Object Lifetime. DCOM Programming in Java. Matrix Multiplication Example.
Voyager.
Basic Model. Voyager Features.
A Simple Performance Measurement.
Matrix Multiplication. Which Paradigm is Superior?
Bibliography.
13. Java and Different Flavors of Parallel Programming Models.Introduction. Java Threads-Built-in Support for Parallelism and Concurrency.
Are Java Threads the Correct Model? Java Support for Parallelism.
Parallel Programming Models.
Functional Models. Object-Oriented Models. Data Parallel Models. Message Passing Models. Shared Memory Models.
Summary. Bibliography.
14. The HPspmd Model and its Java Binding.Introduction. Java Language Binding.
Basic Concepts. Global Variables. Program Execution Control. Communication Library Functions.
Java Packages for HPspmd Programming. Programming Examples. Issues in the Language Design.
15. Web-Based Parallel Computing with Java.Introduction. Web-Based Parallel Computing. Comparing Cluster with Web-Based Parallel Computing. Examples of Internet-Based Parallel Computing. Can Java be Used for Web-Based Parallel Computing? Problems to be Solved in Web-Based Parallel Computing. A Case Study: The JET Platform. Some Performance Results. Conclusions. Bibliography.
III. ALGORITHMS AND APPLICATIONS.
16. Object-Oriented Implementation of Parallel Genetic Algorithms.Introduction. Short Overview of GA Systems. Object-Oriented Approach to PGAs. Classes Representing Individuals. Local Genetic Operations. Island Model. Global Population Model. Load Balancing. File and I/O Operations. PGA Application Framework. Sample Results. Concluding Remarks. Bibliography.
17. Application-Specific Load Balancing on Heterogeneous Systems.Introduction. System Overview. Implementation of a Complex FDTD Equation.
Implementation Details. Experimental Result.
Load Balancing.
Methodology. Processor Weights. Application Data Movement (ADM).
Analysis.
Perfect Load Balancing System. Experimental Results. Future Improvements.
Conclusions. Bibliography.
18. Time Management in Parallel Simulation.Introduction.
Different Types of Simulations.
Major Issues of Parallel Simulation. Principles of Parallel Simulation.
Distributed Programming: A Simulation Example. The Apparently Sequential Nature of Simulation. A Natural Correspondence Between Problem and Solution. Logical Process Simulation. Conservative Vs. Optimistic Simulation.
Conservative Synchronization Protocols.
Variants of the Chandy-Misra Null Message Protocol. Deadlock Detection and Recovery. Global Vs. Local Lookahead. Conservative Time Windows.
Conclusion. Bibliography.
19. Hardware System Simulation.Introduction. NEPSi.
ASIMUT. NEPSi as a Client Server Architecture. Overview of the NEPSi Network. Communication Model. VHDL Models Used. Tests for TNP. Initial Tests. Varying Numbers of PEs and Simulators. Matrix Multiplication on TNP. Variable Machines, Processes and PEs. Rollback. Multi Campus Simulation. Effect of Migration / Load Balancing. Network Traffic Effects.
Discussion. Bibliography.
20. Real-Time Resource Management Middleware: Open Systems and Applications.Introduction. Architecture of Dynamic (aoS Management Middleware. Adaptive Resource Allocation.
Automatic Survivability. Automatic Scalability. Assessment of Real-Time QoS. Resource Allocation and Management.
Experiences with the Adaptive Resource Management Services.
The Navy Testbed. A Real-Time Control System Benchmark. Experiments.
Conclusions. Bibliography.
21. Data Placement in Shared-Nothing Database Systems.Introduction. Data Placement. Declustering.
Evaluation Time Reduction for Select and Project. Relation CoLocation for Join.
Placement. Re-Distribution.
Update Re-Distribution. Query Frequency Change Re-Distribution.
Dynamic Re-Organization. Summary. Bibliography.
22. Parallel Inference with Very Large Knowledge Bases.Introduction.
Parallel Reflexive Reasoning on Cluster Computers. Contributions.
SHRUTI: A Structured Connectionist Reasoning System.
Terminology. Knowledge Encoding and Inference in Backward Reasoning. Constraints on Rules and Inferences.
Mapping SHRUTI onto Parallel Machines. SHRUTI on the CM-5-Design and Implementation.
The Connection Machine CM-5. The SHRUTI-CM5 Knowledge Base. Design Considerations. Encoding the Knowledge Base. Spreading Activation and Inference.
SHRUTI-CM5-A Mathematical Analysis.
Motivation. A Summary of the Analysis. Implications of the Analysis.
SHRUTI-CM5-Experiments with Large Knowledge Bases.
Experiments with Random Knowledge Bases. Experiments with WordNet. Empirical Validation of the Analysis.
Conclusion. Bibliography.
23. MaRT: Lazy Evaluation for Parallel Ray Tracing.Introduction.
Principles of Ray Tracing. Accelerated Ray Tracing Techniques. The Time-Memory Balance.
On Ray Tracing Parallelization Techniques.
The Image Parallel Approach. The Object Parallel Approach. Parallel Computation Analysis. Synthesis.
MaRT: a Lazy Ray Tracer.
On Lazy Evaluation. The Lazy Octree. Lazy Construction of Polygon Data.
Parallel MaRT.
Implementation of Parallel MaRT. Results.
Concluding Remarks. Bibliography.
24. Fast Content-Based Image Retrieval.Introduction. Image Feature Extraction.
AnOverview. Wavelet Based Multiple Image Feature Extraction.
Dynamic Image Indexing. Image Similarity Measurement. Image Searching. Parallel Implementation.
Divide-and-Conquer Parallel Algorithms.
Experimental Results.
Parallel Wavelet Transform Using PVM.
Parallel Image Feature Extraction - PVM Vs. DSM.
Hierarchical Image Matching in a PVM Environment.
Conclusion. Bibliography.
25. Climate Ocean Modeling.Introduction. Model Description. Parallel Partition on Irregular Geometries.
A Simple Partition. An Efficient Partition.
Ocean Modeling on Various Systems.
Overview of Parallel Systems. Cray T3D. Cray T3E. HP SPP2000. Beowulf System-PC Clusters. Code Performance and Intercomparisons.
Scientific Results. Summary. Bibliography.
26. Computational Electromagnetics.Introduction. Physical Optics Method. Finite-Difference Time-Domain Method. Finite-Element Integral-Equation Coupled Method. Conclusions. Bibliography.
27. CFD Simulation: A Case Study in Software Engineering.Introduction. TfC - a State-of-the-Art Industrial CFD Package. Requirements for Parallel CFD Simulation. Design and Implementation of ParTfC. Object Oriented Design of Scientific Software. Productive Use of Parallel Scientific Computing Software.
Resource Management in Workstation Clusters. The Pre- and Post-Processing Bottlenecks. Network Based Collaborative Design Environments.
Bibliography.
28. Quantum Reactive Scattering Calculations.Introduction. The Many Body Problem: Description, Decomposition, and Solutions.
The Decomposition of the Problem and the Mathematical Formalism. The Integration of Scattering Equations.
Parallelization Strategies.
Task Farm Model. Pipeline Model.
Parallel Implementations on CRAY T3E.
Task Farm Model. Pipeline Model. Performance Evaluation.
Parallel Implemention on SGI Origin 2000.
Task Farm Model. Performance Evaluation.
Parallel Implementation on a Metacomputer.
Implementation Issues. Performance Evaluation.
Concluding Remarks. Bibliography.
29. Biomedical Applications Modeling.Introduction. The Chromosome Reconstruction Problem.
Physical Mapping via Clone Ordering.
PVM Algorithms for Chromosome Reconstruction.
Simulated Annealing and Microcanonical Annealing. Parallel SA and MCA Using PVM.
Heart Rate Variability and Kolmogorov Entropy.
K2 Entropy. Serial Computation of the Correlation Integral.
A Parallel Algorithm for K2 Entropy Computation using PVM. Optimal Scaling Region Determination Algorithm. Experimental Results.
Chromosome Reconstruction. K2 Entropy Computation.
Conclusions. Bibliography.
Appendix A. Glossary.
The initial idea leading to clusters computing was developed in the 1960s by IBM as a way of linking large mainframes to provide a cost-effective form of commercial parallelism. During those days, IBM's HASP (Houston Automatic Spooling Priority) system and its successor, JES (Job Entry System), provided a way of distributing work to a user-constructed mainframe cluster. IBM still supports clustering of mainframes through their Parallel Sysplex system, which allows the hardware, operating system, middleware, and system management software to provide dramatic performance and cost improvements while permitting large mainframe users to continue to run their existing applications.
However, cluster computing did not gain momentum until three trends converged in the 1980s: high performance microprocessors, high-speed networks, and standard tools for high performance distributed computing. A possible fourth trend is the increased need of computing power for computational science and commercial applications coupled with the high cost and low accessibility of traditional supercomputers. These building blocks are also known as killer-microprocessors, killer-networks, killer-tools, and killer-applications, respectively. The recent advances in these technologies and their availability as cheap and commodity components are making clusters or networks of computers (PCs, workstations, and SMPs) an appealing vehicle for cost-effective parallel computing. Clusters, built using commodity-off-the-shelf (COTS) hardware components as well as free, or commonly used, software, are playing a major role in redefining the concept of supercomputing.
The trend in parallel computing is to move away from specialized traditional supercomputing platforms, such as the Cray/SGI T3E, to cheaper and general purpose systems consisting of loosely coupled components built up from single or multiprocessor PCs or workstations. This approach has a number of advantages, including being able to build a platform for a given budget which is suitable for a large class of applications and workloads.
This book is motivated by the fact that parallel computing on a network of computers using commodity components has received increased attention recently, and noticeable progress towards usable systems has been made. A number of researchers in academia and industry have been active in this field of research. Although research in this area is still in its early stage, promising results have been demonstrated by experimental systems built in academic and industrial laboratories. There is a need for better understanding of what cluster computing can offer, how cluster computers can be constructed, and what the impacts of clustering on high performance computing will be.
Though a significant number of research articles have been published in various conference proceedings and journals, the results are scattered in many places, are hard to obtain, and are difficult to understand, especially for beginners. This book, the first of its kind, gathers in one place the current and comprehensive technical coverage of the field and presents it in a tutorial form. The book's coverage reflects the state of the art in high-level architecture, design, and development, and points out possible directions for further research and development.
This book is a collection of chapters written by leading scientists active in the area of parallel computing using networked computers. The primary purpose of the book is to provide an authoritative overview of this field's state of the art. The emphasis is on the following aspects of cluster computing:
The work on High Performance Cluster Computing appears in two volumes:
This book, Volume 2, consists of 29 chapters, which are grouped into the following three parts:
Part I focuses on various programming paradigms, models, and environments, including MPI, PVM, tuple space programming, component based programming, debuggers, and OS services for wide area applications. Part II covers Java for high performance computing, focusing on Java variants supporting MPI, JVM, SPMD paradigm, and web-based computing. Part III discusses various parallel algorithms and applications designed for your cluster programming environments. The application areas discussed include the use of clusters in image processing, electromagnetics, ocean modeling, CFD simulation, and biological applications modeling.
The book is primarily written for graduate students and researchers interested in the area of parallel and distributed computing. However, it is also suitable for practitioners in industry and government laboratories.
The interdisciplinary nature of the book is likely to appeal to a wide audience. They will find this book to be a valuable source of information on recent advances and future directions of parallel computation using networked computers. This is the first book addressing various technological aspects of cluster computing in-depth, and we expect that the book will be an informative and useful reference in this new and fast growing research area.
The organization of this book makes it particularly useful for graduate courses. It can be used as a text for a research-oriented or seminar-based advanced graduate course. Graduate students will find the material covered by this book to be stimulating and inspiring. Using this book, they can identify interesting and important research topics for their Master's and Ph.D. work. It can also serve as a supplementary book for regular courses taught in Computer Science, Computer Engineering, Electrical Engineering, and Computational Science and Informatics Departments, including:
The various software systems discussed in this book are freely available for download through the Internet. Please visit this book's website,
for pointers/links to further information on downloading Educational Resources, Cluster Computing Environments, and Cluster Management Systems.
Acknowledgments
First and foremost, I am grateful to all the contributing authors for their time, effort, and understanding during the preparation of the book.
I thank Albert Zomaya (University of Western Australia) for his advice and encouragement while starting this book project.
I would like to thank Kennith Birman (Cornell University), Marcin Paprzycki (University of Southern Mississippi), and Hamid R. Arabnia (The University of Georgia) for their critical comments and suggestions on improving the book.
I thank Toni Cortes (Universitat Politecnica de Catalunya) for his consistent support and invaluable LaTeX expertise.
I thank Mark Baker (University of Portsmouth), Erich Schikuta (Universitaet Wien), Dror G. Feitelson (Hebrew University of Jerusalem), Daniel F. Savarese and Thomas Sterling (California Institute of Technology), Ira Pramanick (Silicon Graphics Inc), and Daniel S. Katz (Jet Propulsion Laboratory, California Institute of Technology) for writing overviews for various parts of the book.
I thank my wife, Smrithi, and my daughter, Soumya, for their love and understanding (my long absences from home) during the preparation of the book.
I acknowledge the support of the Australian Government Overseas Postgraduate Research Scholarship, the Queensland University of Technology Postgraduate Research Award (Programming Languages and Systems Research Centre Scholarship), the Monash University Graduate Scholarship, and the Distributed Systems and Software Engineering Centre Scholarship.
I thank Clemens Szyperski (Queensland University of Technology) and David Abramson (Monash University) for advising my Ph.D research program.
Finally, I would like to thank the staff at Prentice Hall, particularly Greg Doench, Mary Treacy, Joan L. McNamara, Barbara Cotton, Mary Loudin, Lisa Iarkowski, Anne Trowbridge, and Bryan Gambrel. They were wonderful to work with!
Rajkumar Buyya
Monash University, Melbourne, Australia
rajkumar@dgs.monash.edu.au, rajkumar@ieee.org
March, 1999