4.2 CHOOSING AN ALGORITHM STRUCTURE PATTERN
Finding an effective Algorithm Structure pattern for a given problem can be accomplished by considering the questions in the following sections.
4.2.1 Target Platform
What constraints are placed on the parallel algorithm by the target machine or programming environment?
In an ideal world, it would not be necessary to consider the details of the target platform at this stage of the design, because doing so works against keeping the program portable and scalable. This is not an ideal world, however, and software designed without considering the major features of the target platform is unlikely to run efficiently.
The primary issue is how many units of execution (UEs) the system will effectively support, because an algorithm that works well for ten UEs may not work well for hundreds of UEs. It is not necessary to decide on a specific number (in fact to do so would overly constrain the applicability of the design), but it is important to have in mind at this point an order of magnitude for the number of UEs.
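As a concrete illustration, a Java program can ask the runtime how many hardware threads it can see, which gives exactly this kind of order-of-magnitude estimate (a minimal sketch; the class name is ours):

    public class CountUEs {
        public static void main(String[] args) {
            // The JVM reports the number of hardware threads it can see;
            // this gives an order-of-magnitude estimate of how many UEs
            // the platform will effectively support.
            int nUEs = Runtime.getRuntime().availableProcessors();
            System.out.println("This platform supports about " + nUEs + " UEs");
        }
    }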
Another issue is how expensive it is to share information among UEs. If there is hardware support for shared memory, information exchange takes place through shared access to common memory, and frequent data sharing makes sense. If the target is a collection of nodes connected by a slow network, however, the communication required to share information is very expensive and must be avoided wherever possible.
When thinking about both of these issues (the number of UEs and the cost of sharing information), avoid the tendency to over-constrain the design. Software typically outlives hardware, so over the course of a program's life it may be used on a tremendous range of target platforms. The goal is to obtain a design that works well on the original target platform, but at the same time is flexible enough to adapt to different classes of hardware.
Finally, in addition to multiple UEs and some way to share information among them, a parallel computer has one or more programming environments that can be used to implement parallel algorithms. Different programming environments provide different ways to create tasks and share information among UEs, and a design that does not map well onto the characteristics of the target programming environment will be difficult to implement.
4.2.2 Major Organizing Principle
When considering the concurrency in the problem, is there a particular way of looking at it that stands out and provides a high-level mechanism for organizing this concurrency?
The analysis carried out using the patterns of the Finding Concurrency design space describes the potential concurrency in terms of tasks and groups of tasks, data (both shared and task-local), and ordering constraints among task groups. The next step is to find an algorithm structure that represents how this concurrency maps onto the UEs. There is usually a major organizing principle implied by the concurrency, and it typically falls into one of three categories: organization by tasks, organization by data decomposition, and organization by flow of data. We now consider each of these in more detail.
For some problems, there is really only one group of tasks active at one time, and the way the tasks within this group interact is the major feature of the concurrency. Examples include so-called embarrassingly parallel programs in which the tasks are completely independent, as well as programs in which the tasks in a single group cooperate to compute a result.
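As an illustration of this first case, the following minimal Java sketch runs a set of completely independent tasks (the embarrassingly parallel situation) on a pool of UEs; the task count and the process method are hypothetical placeholders for real per-task work:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class IndependentTasks {
        public static void main(String[] args) throws InterruptedException {
            int nTasks = 100;  // hypothetical number of independent tasks
            ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            for (int i = 0; i < nTasks; i++) {
                final int taskId = i;
                // The tasks are completely independent, so the pool may
                // run them on any available UE, in any order.
                pool.submit(() -> process(taskId));
            }
            pool.shutdown();                          // no new tasks accepted
            pool.awaitTermination(1, TimeUnit.HOURS); // wait for the tasks to finish
        }

        static void process(int taskId) {
            // placeholder for the real, task-local work
        }
    }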
For other problems, the way data is decomposed and shared among tasks stands out as the major way to organize the concurrency. For example, many problems focus on the update of a few large data structures, and the most productive way to think about the concurrency is in terms of how these structures are decomposed and distributed among UEs. Programs to solve differential equations or carry out linear algebra computations often fall into this category because they are frequently based on updating large data structures.
Finally, for some problems, the major feature of the concurrency is the presence of well-defined interacting groups of tasks, and the key issue is how the data flows among the tasks. For example, in a signal-processing application, data may flow through a sequence of tasks organized as a pipeline, each performing a transformation on successive data elements. Or a discrete-event simulation might be parallelized by decomposing it into tasks interacting via "events". Here, the major feature of the concurrency is the way in which these distinct task groups interact.
Notice also that the most effective parallel algorithm design might make use of multiple algorithm structures (combined hierarchically, compositionally, or in sequence), and this is the point at which to consider whether such a design makes sense. For example, it often happens that the very top level of the design is a sequential composition of one or more Algorithm Structure patterns. Other designs might be organized hierarchically, with one pattern used to organize the interaction of the major task groups and other patterns used to organize tasks within the groups; for example, an instance of the Pipeline pattern in which individual stages are instances of the Task Parallelism pattern.
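The following minimal Java sketch suggests what such a composition might look like: a two-stage pipeline in which the second stage internally applies the Task Parallelism pattern to each batch it receives. The batch sizes and the squaring computation are assumptions of this illustration, and an empty batch serves as the end-of-stream marker:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.stream.IntStream;

    public class ComposedPatterns {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<int[]> queue = new ArrayBlockingQueue<>(8);
            int[] poison = new int[0];             // empty batch signals end of stream
            Thread stage1 = new Thread(() -> {
                try {
                    for (int b = 0; b < 10; b++)
                        queue.put(new int[1024]);  // hypothetical batch production
                    queue.put(poison);
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            Thread stage2 = new Thread(() -> {
                try {
                    for (int[] batch; (batch = queue.take()).length > 0; ) {
                        int[] b = batch;
                        // Within this stage, the elements of a batch are
                        // independent and can be processed in parallel.
                        IntStream.range(0, b.length).parallel()
                                 .forEach(i -> b[i] = b[i] * b[i]);
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            stage1.start(); stage2.start();
            stage1.join(); stage2.join();
        }
    }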
4.2.3 The Algorithm Structure Decision Tree
For each subset of tasks, which Algorithm Structure design pattern most effectively defines how to map the tasks onto UEs?
Having considered the questions raised in the preceding sections, we are now ready to select an algorithm structure, drawing on an understanding of the constraints imposed by the target platform, an appreciation of the role of hierarchy and composition, and a major organizing principle for the problem. The decision is guided by the decision tree shown in Fig. 4.2. Starting at the top of the tree, consider the concurrency and the major organizing principle, and use this information to select one of the three branches of the tree; then follow the discussion below for the appropriate subtree. Notice again that for some problems, the final design might combine more than one algorithm structure: If no single structure seems suitable, it might be necessary to divide the tasks making up the problem into two or more groups, work through this procedure separately for each group, and then determine how to combine the resulting algorithm structures.
Figure 4.2 Decision tree for the Algorithm Structure design space
Organize By Tasks
Select the Organize By Tasks branch when the execution of the tasks themselves is the best organizing principle. Then determine how the tasks are enumerated. If they can be gathered into a set linear in any number of dimensions, choose the Task Parallelism pattern. This pattern includes both situations in which the tasks are independent of each other (so-called embarrassingly parallel algorithms) and situations in which there are some dependencies among the tasks in the form of access to shared data or a need to exchange messages. If the tasks are enumerated by a recursive procedure, choose the Divide and Conquer pattern. In this pattern, the problem is solved by recursively dividing it into subproblems, solving each subproblem independently, and then recombining the subsolutions into a solution to the original problem.
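The following is a minimal sketch of the Divide and Conquer pattern using Java's fork/join framework. Parallel summation of an array stands in for a real recursive problem, and THRESHOLD is an assumed tuning parameter:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Divide and Conquer: recursively split the index range, solve the two
    // halves concurrently, then combine the subsolutions.
    class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000; // assumed sequential cutoff
        private final long[] data;
        private final int lo, hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data; this.lo = lo; this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {              // base case: solve directly
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;               // divide into two subproblems
            SumTask left = new SumTask(data, lo, mid);
            left.fork();                             // solve one half concurrently
            long rightSum = new SumTask(data, mid, hi).compute();
            return rightSum + left.join();           // combine the subsolutions
        }
    }

A caller would invoke it with, for example, ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length)).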
Organize By Data Decomposition
Select the Organize By Data Decomposition branch when the decomposition of the data is the major organizing principle in understanding the concurrency. There are two patterns in this group, differing in how the decomposition is structured: linearly in each dimension or recursively. Choose the Geometric Decomposition pattern when the problem space is decomposed into discrete subspaces and the problem is solved by computing solutions for the subspaces, with the solution for each subspace typically requiring data from a small number of other subspaces. Many instances of this pattern can be found in scientific computing, where it is useful in parallelizing grid-based computations, for example. Choose the Recursive Data pattern when the problem is defined in terms of following links through a recursive data structure (for example, a binary tree).
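As a sketch of the Geometric Decomposition pattern, the following Java fragment splits a one-dimensional grid into contiguous chunks and updates each chunk on its own UE. The grid size and the three-point averaging stencil are assumptions of this illustration; the essential point is that each chunk's update needs data only from the points adjacent to its boundaries:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class GridUpdate {
        public static void main(String[] args) throws InterruptedException {
            double[] oldGrid = new double[1_000_000]; // hypothetical grid size
            double[] newGrid = new double[oldGrid.length];
            int nChunks = Runtime.getRuntime().availableProcessors();
            int chunkSize = (oldGrid.length - 2) / nChunks;
            ExecutorService pool = Executors.newFixedThreadPool(nChunks);
            for (int c = 0; c < nChunks; c++) {
                final int lo = 1 + c * chunkSize;
                final int hi = (c == nChunks - 1) ? oldGrid.length - 1 : lo + chunkSize;
                // Each UE updates its own contiguous chunk; data is shared
                // with other chunks only at the boundary points.
                pool.submit(() -> {
                    for (int i = lo; i < hi; i++)     // simple 3-point stencil
                        newGrid[i] = (oldGrid[i - 1] + oldGrid[i] + oldGrid[i + 1]) / 3.0;
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }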
Organize By Flow of Data
Select the Organize By Flow of Data branch when the major organizing principle is how the flow of data imposes an ordering on the groups of tasks. This pattern group has two members, one that applies when this ordering is regular and static and one that applies when it is irregular and/or dynamic. Choose the Pipeline pattern when the flow of data among task groups is regular, one-way, and does not change during the algorithm (that is, the task groups can be arranged into a pipeline through which the data flows). Choose the Event-Based Coordination pattern when the flow of data is irregular, dynamic, and/or unpredictable (that is, when the task groups can be thought of as interacting via asynchronous events).
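A minimal two-stage sketch of the Pipeline pattern in Java follows, with the stages connected by a bounded queue through which data flows one way. The squaring computation and the use of -1 as an end-of-stream marker are assumptions of this sketch:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class TwoStagePipeline {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(64);
            Thread stage1 = new Thread(() -> {
                try {
                    for (int i = 0; i < 1000; i++)
                        queue.put(i * i);          // stage 1: transform and pass on
                    queue.put(-1);                 // signal end of stream
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            Thread stage2 = new Thread(() -> {
                try {
                    for (int v; (v = queue.take()) != -1; )
                        System.out.println(v);     // stage 2: consume each element
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            stage1.start(); stage2.start();
            stage1.join(); stage2.join();
        }
    }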
4.2.4 Re-evaluation
Is the Algorithm Structure pattern (or patterns) suitable for the target platform? It is important to review the decisions made so far frequently, to be sure the chosen pattern(s) remain a good fit for that platform.
After choosing one or more Algorithm Structure patterns to be used in the design, skim through their descriptions to be sure they are reasonably suitable for the target platform. (For example, if the target platform consists of a large number of workstations connected by a slow network, and one of the chosen Algorithm Structure patterns requires frequent communication among tasks, it might be difficult to implement the design efficiently.) If the chosen patterns seem wildly unsuitable for the target platform, try identifying a secondary organizing principle and working through the preceding step again.