The Joys of Concurrent Programming
“I suspect that concurrency is best supported by a library and that such a library can be implemented without major language extensions.”
—Bjarne Stroustrup, inventor of C++
In this Chapter
- What is Concurrency?
- The Benefits of Parallel Programming
- The Benefits of Distributed Programming
- The Minimal Effort Required
- The Basic Layers of Software Concurrency
- No Keyword Support for Parallelism in C++
- Programming Environments for Parallel and Distributed Programming
- Summary—Toward Concurrency
The software development process now requires a working knowledge of parallel and distributed programming. The requirement for a piece of software to work properly over the Internet, on an intranet, or over some other network is almost universal. Once the software is deployed in one or more of these environments, it is subjected to the most rigorous of performance demands. The user wants instantaneous and reliable results. In many situations the user wants the software to satisfy many requests at the same time. The capability to perform multiple simultaneous downloads of software and data from the Internet is a typical expectation. Software designed to broadcast video must also be able to render graphics and digitally process sound seamlessly and without interruption. Web server software is often subjected to hundreds of thousands of hits per day. It is not uncommon for frequently used e-mail servers to be forced to survive the stress of a million sent and received messages during business hours. And it’s not just the quantity of the messages that can require tremendous work; it’s also the content. For instance, data transmissions containing digitized music, movies, or graphics devour network bandwidth and can inflict a serious penalty on server software that has not been properly designed.

The typical computing environment is networked, and the computers involved have multiple processors. The more the software does, the more it is required to do. To meet even the user’s minimal requirements, today’s software must work harder and smarter. Software must be designed to take advantage of computers that have multiple processors. Since networked computers are more the rule than the exception, software must be designed to run correctly and effectively with some of its pieces executing simultaneously on different computers. In some cases, the different computers have totally different operating systems with different network protocols! To accommodate these realities, a software development repertoire must include techniques for implementing concurrency through parallel and distributed programming.
1.1 What is Concurrency?
Two events are said to be concurrent if they occur within the same time interval. Two or more tasks executing over the same time interval are said to execute concurrently. For our purposes, concurrent doesn’t necessarily mean at the same exact instant. For example, two tasks may occur concurrently within the same second, but with each task executing within different fractions of the second. The first task may execute for the first tenth of the second and pause, the second task may execute for the next tenth of the second and pause, the first task may execute again for the third tenth of the second, and so on. The tasks may continue to alternate in this way. However, a second is so short that it appears that both tasks are executing simultaneously. We may extend this notion to longer time intervals. Two programs performing some task within the same hour continuously make progress on the task during that hour, although they may or may not be executing at the same exact instant. We say that the two programs are executing concurrently for that hour.

Tasks that exist at the same time and perform in the same time period are concurrent. Concurrent tasks can execute in a single-processor or multiprocessor environment. In a single-processor environment, concurrent tasks exist at the same time and execute within the same time period by context switching. In a multiprocessor environment, if enough processors are free, concurrent tasks may execute at the same instant over the same time period. The determining factor for what makes an acceptable time period for concurrency is relative to the application.
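To make the idea concrete, the sketch below shows two tasks that exist at the same time and execute over the same time period. It is a minimal illustration, assuming a POSIX threads environment (threads are covered in detail in Chapter 4); the task names and loop counts are arbitrary, chosen only for illustration. On a single processor the two tasks share the processor by context switching; on a multiprocessor they may execute at the same instant.

// A minimal sketch of two concurrently executing tasks using POSIX threads.
// Compile with something like: g++ concurrent_tasks.cpp -lpthread
#include <pthread.h>
#include <iostream>

// Each task simply counts; the work itself is a placeholder.
void *taskA(void *arg)
{
   for (int i = 0; i < 5; i++) {
      std::cout << "task A: step " << i << std::endl;
   }
   return 0;
}

void *taskB(void *arg)
{
   for (int i = 0; i < 5; i++) {
      std::cout << "task B: step " << i << std::endl;
   }
   return 0;
}

int main()
{
   pthread_t threadA, threadB;

   // Both tasks exist at the same time and execute over the same time period.
   pthread_create(&threadA, NULL, taskA, NULL);
   pthread_create(&threadB, NULL, taskB, NULL);

   // Wait for both tasks to complete.
   pthread_join(threadA, NULL);
   pthread_join(threadB, NULL);
   return 0;
}

Because the operating system decides when each task runs, the output of the two tasks may interleave differently from one run to the next; that variability is precisely the alternation described above.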
Concurrency techniques are used to allow a computer program to do more work over the same time interval. Rather than designing the program to do one task at a time, the program is broken down in such a way that some of the tasks can be executed concurrently. In some situations, doing more work over the same time interval is not the goal; rather, simplifying the programming solution is the goal. Sometimes it makes more sense to think of the solution to a problem as a set of concurrently executed tasks. For instance, the solution to the problem of losing weight is best thought of as two concurrently executed tasks: diet and exercise. That is, the improved diet and the exercise regimen are supposed to occur over the same time interval (not necessarily at the same instant). It is typically not very beneficial to do one during one time period and the other within a totally different time period. The concurrency of both processes is the natural form of the solution.

Sometimes concurrency is used to make software faster, that is, to finish its work sooner. Sometimes concurrency is used to make software do more work over the same interval, where speed is secondary to capacity. For instance, some web sites want customers to stay logged on as long as possible. The concern is not how fast the site can get customers on and off, but how many customers the site can support concurrently. So the goal of the software design is to handle as many connections as possible for as long a time period as possible. Finally, concurrency can be used to make software simpler. Often, one long, complicated sequence of operations can be implemented more easily as a series of small, concurrently executing operations. Whether concurrency is used to make the software faster, handle larger loads, or simplify the programming solution, the main objective is to use concurrency to make the software better.
1.1.1 The Two Basic Approaches to Achieving Concurrency
Parallel programming and distributed programming are two basic approaches for achieving concurrency with a piece of software. They are two different programming paradigms that sometimes intersect. Parallel programming techniques assign the work a program has to do to two or more processors within a single physical or a single virtual computer. Distributed programming techniques assign the work a program has to do to two or more processes—where the processes may or may not exist on the same computer. That is, the parts of a distributed program often run on different computers connected by a network, or at least run in different processes. A program that contains parallelism executes on the same physical or virtual computer. The parallelism within a program may be divided into processes or threads. We discuss processes in Chapter 3 and threads in Chapter 4. For our purposes, distributed programs can only be divided into processes. Multithreading is restricted to parallelism.

Technically, parallel programs are sometimes distributed, as is the case with PVM (Parallel Virtual Machine) programming. Distributed programming is sometimes used to implement parallelism, as is the case with MPI (Message Passing Interface) programming. However, not all distributed programs involve parallelism. The parts of a distributed program may execute at different times and over different time periods. For instance, a software calendar program might be divided into two parts: One part provides the user with a calendar and a method for recording important appointments, and the other part provides the user with a set of alarms for each different type of appointment. The user schedules the appointments using one part of the software, and the other part of the software executes separately at a different time. The alarms and the scheduling component together make a single application, but they are divided into two separately executing parts. In pure parallelism, the concurrently executing parts are all components of the same program. In distributed programs, the parts are usually implemented as separate programs. A sketch of such a division into separate processes follows; Figure 1-1 then shows the typical architecture for a parallel and a distributed program.
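As a concrete illustration of a program divided into separately executing processes, the sketch below uses the POSIX fork() call (processes are covered in detail in Chapter 3). The division of labor between parent and child loosely mirrors the calendar example; the scheduler and alarm roles are hypothetical and chosen only for illustration.

// A minimal sketch of dividing one application into two processes with fork().
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <iostream>

int main()
{
   pid_t pid = fork();   // create a second process

   if (pid == 0) {
      // Child process: stands in for the alarm component.
      std::cout << "alarm process " << getpid() << " running" << std::endl;
   }
   else if (pid > 0) {
      // Parent process: stands in for the scheduling component.
      std::cout << "scheduler process " << getpid() << " running" << std::endl;
      wait(NULL);        // wait for the child process to exit
   }
   else {
      std::cerr << "fork failed" << std::endl;
      return 1;
   }
   return 0;
}

Here the two parts happen to run over the same time period; in the calendar example proper, the scheduling component and the alarm component would typically execute at different times.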
Figure 1-1. Typical architecture for a parallel and distributed program.
The parallel application in Figure 1-1 consists of one program divided into four tasks. Each task executes on a separate processor; therefore, the tasks may execute simultaneously. The tasks can be implemented by either processes or threads. On the other hand, the distributed application in Figure 1-1 consists of three separate programs, with each program executing on a separate computer. Program 3 consists of two separate parts that execute on the same computer. Although Tasks A and D of Program 3 are on the same computer, they are distributed because they are implemented by two separate processes. Tasks within a parallel program are more tightly coupled than tasks within a distributed program. In general, the processors associated with distributed programs are on different computers, whereas the processors associated with programs that involve parallelism are on the same computer. Of course, there are hybrid programs that are both parallel and distributed. These hybrid combinations are becoming the norm.
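The distributed side of Figure 1-1 can be approximated with the MPI library mentioned earlier. The sketch below is a minimal illustration, assuming an MPI implementation is installed: the same program is started as several processes, possibly on different computers, and each process uses its rank to select its task. The rank-based task assignment is our own illustrative convention, not a requirement of MPI.

// A minimal MPI sketch: one program started as several cooperating processes.
// Built with an MPI compiler wrapper (e.g., mpicxx) and launched with
// something like: mpirun -np 3 ./a.out
#include <mpi.h>
#include <iostream>

int main(int argc, char *argv[])
{
   int rank, size;

   MPI_Init(&argc, &argv);                 // join the distributed computation
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // which process am I?
   MPI_Comm_size(MPI_COMM_WORLD, &size);   // how many processes in all?

   // Each process performs the task that corresponds to its rank,
   // much as Figure 1-1 divides tasks among separate programs.
   std::cout << "process " << rank << " of " << size
             << " performing its task" << std::endl;

   MPI_Finalize();                         // leave the computation
   return 0;
}

Whether the processes land on one computer or several is decided at launch time, which is part of what makes MPI usable for both parallel and distributed configurations.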