The Algorithm Structure Design Space in Parallel Programming
4.1 INTRODUCTION
4.2 CHOOSING AN ALGORITHM STRUCTURE PATTERN
4.3 EXAMPLES
4.4 THE TASK PARALLELISM PATTERN
4.5 THE DIVIDE AND CONQUER PATTERN
4.6 THE GEOMETRIC DECOMPOSITION PATTERN
4.7 THE RECURSIVE DATA PATTERN
4.8 THE PIPELINE PATTERN
4.9 THE EVENT-BASED COORDINATION PATTERN
4.1 INTRODUCTION
The first phase of designing a parallel algorithm consists of analyzing the problem to identify exploitable concurrency, usually by using the patterns of the Finding Concurrency design space. The output from the Finding Concurrency design space is a decomposition of the problem into design elements:
- A task decomposition that identifies tasks that can execute concurrently
- A data decomposition that identifies data local to each task
- A way of grouping tasks and ordering the groups to satisfy temporal constraints
- An analysis of dependencies among tasks
These elements provide the connection from the Finding Concurrency design space to the Algorithm Structure design space. Our goal in the Algorithm Structure design space is to refine the design and move it closer to a program that can execute tasks concurrently by mapping the concurrency onto multiple units of execution (UEs) running on a parallel computer.
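To make these design elements concrete, the following is a minimal sketch (Java 16+ records, with hypothetical names not taken from the pattern language) of one way a designer might record the output of the Finding Concurrency phase before choosing an Algorithm Structure pattern.

```java
// Hypothetical sketch: recording the output of the Finding Concurrency phase.
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DecompositionSketch {

    // A task together with the data it owns locally
    // (task decomposition + data decomposition).
    record Task(String name, Set<String> localData) {}

    // Tasks grouped and ordered to satisfy temporal constraints,
    // plus the dependencies found among tasks.
    record Decomposition(List<List<Task>> orderedGroups,
                         Map<String, Set<String>> dependencies) {}

    public static void main(String[] args) {
        Task a = new Task("updateInterior", Set.of("gridInterior"));
        Task b = new Task("exchangeBoundary", Set.of("gridBoundary"));

        Decomposition d = new Decomposition(
            List.of(List.of(a), List.of(b)),                        // group ordering
            Map.of("exchangeBoundary", Set.of("updateInterior")));  // b depends on a

        System.out.println(d);
    }
}
```

A record like this is only a bookkeeping device; the Algorithm Structure patterns that follow are concerned with how such tasks, groups, and dependencies are mapped onto UEs.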
Of the countless ways to define an algorithm structure, most follow one of six basic design patterns. These patterns make up the Algorithm Structure design space. An overview of this design space and its place in the pattern language is shown in Fig. 4.1.
Figure 4.1: Overview of the Algorithm Structure design space and its place in the pattern language.
The key issue at this stage is to decide which pattern or patterns are most appropriate for the problem.
First of all, different aspects of the analysis can pull the design in different directions: one aspect might suggest one structure while another suggests a different one. In nearly every case, however, the following forces should be kept in mind.
- Efficiency. It is crucial that a parallel program run quickly and make good use of the computer's resources.
- Simplicity. A simple algorithm resulting in easy-to-understand code is easier to develop, debug, verify, and modify.
- Portability. Ideally, programs should run on the widest range of parallel computers. This maximizes the "market" for a particular program. More importantly, a program is used for many years, while any particular computer system is used for only a few years; portable programs protect a software investment.
- Scalability. Ideally, an algorithm should be effective on a wide range of numbers of processing elements (PEs), from a few up to hundreds or even thousands.
These forces conflict in several ways, however.
Efficiency conflicts with portability: making a program efficient almost always requires that the code take into account the characteristics of the specific system on which it will run, which limits portability. A design that exploits the special features of a particular system or programming environment may yield an efficient program for that environment, but be unusable on a different platform, either because it performs poorly there or because it is difficult or even impossible to implement for the new platform.
Efficiency can also conflict with simplicity: for example, writing efficient programs that use the Task Parallelism pattern sometimes requires complicated scheduling algorithms, which in many cases make the program much harder to understand. The sketch below illustrates this trade-off.
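As an illustration only (hypothetical workload and names, not code from the pattern language), the following Java sketch contrasts a simple static assignment of tasks to threads with a dynamic schedule built on a shared thread pool. The dynamic version tolerates uneven task costs better, but it introduces extra machinery that a reader of the program must understand.

```java
// Hypothetical sketch of the efficiency/simplicity tension in Task Parallelism.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

public class SchedulingSketch {

    static void work(int task) { /* stand-in for an unevenly sized task */ }

    // Simple: give each thread a fixed, contiguous block of tasks.
    static void staticSchedule(int nTasks, int nThreads) throws InterruptedException {
        Thread[] threads = new Thread[nThreads];
        int chunk = (nTasks + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            int lo = t * chunk, hi = Math.min(nTasks, lo + chunk);
            threads[t] = new Thread(() -> { for (int i = lo; i < hi; i++) work(i); });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
    }

    // Potentially more efficient when task costs vary, but more to understand:
    // a shared pool hands out tasks as workers become free.
    static void dynamicSchedule(int nTasks, int nThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        IntStream.range(0, nTasks).forEach(i -> pool.submit(() -> work(i)));
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        staticSchedule(1000, 4);
        dynamicSchedule(1000, 4);
    }
}
```

Even this small example shows the pattern of the trade-off: the static version is shorter and easier to reason about, while the dynamic version adds a pool, a queue, and a shutdown protocol in exchange for better load balance.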
Thus, a good algorithm design must strike a balance between (1) abstraction and portability and (2) suitability for a particular target architecture. The challenge faced by the designer, especially at this early phase of the algorithm design, is to leave the parallel algorithm design abstract enough to support portability while ensuring that it can eventually be implemented effectively for the parallel systems on which it will be executed.