Software Pipelines Example
To show you how Software Pipelines work, we’ll use a banking example. A large bank has a distributed network of ATMs, which access a centralized resource to process account transactions. Transaction volume is highly variable, response times are critical, and key business rules must be enforced—all of which make the bank’s back-end application an ideal use case for parallel pipelines. We must apply the following business requirements:
- Make sure each transaction is performed by an authorized user.
- Make sure each transaction is valid. For example, if the transaction is a withdrawal, make sure the account has sufficient funds to handle the transaction.
- Guarantee that multiple transactions on each account are performed sequentially. The bank wants to prevent any customer from overdrawing his or her account by using near-simultaneous transactions. Therefore, FIFO order is mandatory for withdrawal transactions.
Before we cover pipeline design, let’s take a look at the traditional design for a monolithic, tightly coupled, centralized software component. You can see the main flow for this design in Figure 1.3.
Figure 1.3 Traditional design for an ATM application
The simplicity of this design has several benefits:
- It’s very easy to implement.
- All business rules are in a single set of code.
- Sequence of transactions is guaranteed.
However, this design forces every user transaction to wait for any previous transactions to complete. If the volume of transactions shoots up (as it does in peak periods) and the input flow outstrips the load capacity of this single component, a lot of customers end up waiting for their transactions to process. All too often, waiting customers mean lost customers—an intolerable condition for a successful bank.
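To get a rough feel for how quickly waiting builds up, here is a back-of-the-envelope sketch. It assumes a simple model in which transactions arrive and are served at steady rates; the function names and the rates in the usage note are hypothetical and are not measurements from any real bank.

```python
def backlog_after(arrival_rate, service_rate, seconds):
    """Transactions still waiting after `seconds`, starting from an empty queue.

    Assumes steady rates (transactions per second); real traffic is burstier.
    """
    return max(0, (arrival_rate - service_rate) * seconds)


def wait_for_newest(arrival_rate, service_rate, seconds):
    """Approximate wait, in seconds, for a transaction that arrives at `seconds`."""
    return backlog_after(arrival_rate, service_rate, seconds) / service_rate
```

Under these assumptions, a component that can serve 100 transactions per second but receives 120 per second accumulates a backlog of 1,200 transactions after one minute, and a customer arriving at that point waits roughly 12 seconds. As long as the arrival rate stays below the service rate, the backlog in this model stays at zero.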
To use Software Pipelines to solve this problem, we’ll do a pipeline analysis. The first step is to divide the process into logical units of parallel work. We’ll start by decomposing the steps required for processing. Figure 1.4 shows the steps of the ATM process.
Figure 1.4 Steps in the ATM process
The steps are:
- Authenticate the user (customer).
- Ensure the transaction is valid. For example, if the transaction is a withdrawal, make sure the account has sufficient funds to handle the transaction.
- Process the transaction and update the ATM daily record for the account.
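The three steps above can be sketched as three small functions called one after another. This is only an illustration: the `Transaction` fields, the in-memory `PINS`, `BALANCES`, and `DAILY_RECORD` tables, and the `TransactionRejected` exception are all hypothetical stand-ins for the bank's real systems.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    account_id: str
    pin: str
    kind: str    # "withdrawal" or "deposit"
    amount: int  # in cents


class TransactionRejected(Exception):
    """Raised when a step refuses the transaction."""


# Hypothetical in-memory data standing in for the bank's back-end systems.
PINS = {"A1001": "1234"}
BALANCES = {"A1001": 50_000}
DAILY_RECORD = {}


def authenticate(txn):
    """Step 1: make sure the user is authorized."""
    if PINS.get(txn.account_id) != txn.pin:
        raise TransactionRejected("authentication failed")


def validate(txn):
    """Step 2: make sure the transaction is valid."""
    if txn.amount <= 0:
        raise TransactionRejected("amount must be positive")
    if txn.kind == "withdrawal" and BALANCES.get(txn.account_id, 0) < txn.amount:
        raise TransactionRejected("insufficient funds")


def process(txn):
    """Step 3: apply the transaction and update the ATM daily record."""
    delta = -txn.amount if txn.kind == "withdrawal" else txn.amount
    BALANCES[txn.account_id] += delta
    DAILY_RECORD.setdefault(txn.account_id, []).append((txn.kind, txn.amount))


def handle(txn):
    """Run the three steps in order and return the new balance."""
    authenticate(txn)
    validate(txn)
    process(txn)
    return BALANCES[txn.account_id]
```

Calling `handle` for one transaction at a time reproduces the serial behavior of the traditional design. The pipeline analysis that follows looks at which of these functions can safely run in parallel.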
Now that we understand the steps of the business process, we can identify the pipelines we’ll use for parallel processing. To do this, we determine which portions of the business process can execute in parallel.
For the initial ATM design (Figure 1.5), it seems safe to authenticate users in a separate pipeline. This task performs its work in a separate system, and after it returns the authentication, the process can perform the next two steps. In fact, because we’re not concerned with ordering at this stage, it’s safe to use multiple pipelines for this single task. Our goal is simply to process as many authentications as we can per unit of time, regardless of order.
Figure 1.5 Initial pipeline design: Distribute the authentication step.
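Because order doesn't matter for authentication, one way to sketch this step is with a pool of worker threads that handle requests as they arrive. The example below uses Python's standard `concurrent.futures` module; the request tuple layout, the `PINS` table, and the function names are hypothetical, and a real system would call out to a separate authentication service.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical credential store standing in for the separate authentication system.
PINS = {"A1001": "1234", "A1002": "9876"}


def authenticate(request):
    """Check one request; returns (transaction id, authorized?)."""
    txn_id, account_id, pin = request
    return txn_id, PINS.get(account_id) == pin


def authenticate_all(requests, workers=4):
    """Authenticate requests on several workers, in whatever order they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(authenticate, request) for request in requests]
        for future in as_completed(futures):  # completion order, not submission order
            txn_id, authorized = future.result()
            results[txn_id] = authorized
    return results
```

The results are keyed by transaction ID rather than by position, which reflects the point made above: at this stage we only care how many authentications complete per unit of time, not the order in which they complete.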
This design speeds up the process, but most of the work—updating the ATM accounts—is still a serial process. You’ll still get bottlenecks, because the updating step is downstream from the authentication step. To improve performance by an order of magnitude, we’ll analyze the process further. We want to find other places where the process can be optimized, while still enforcing the key business rules.
After authenticating a user, the next step is to validate the requested transaction. The application does this by evaluating the user’s current account information. Business requirements allow us to perform multiple validations at the same time, as long as we don’t process any two transactions for the same account at the same time or do them out of sequence. This is a FIFO requirement, a key bottleneck in parallel business applications. Our first configuration with the single pipeline guarantees compliance with this requirement; but we want to distribute the process, so we need a parallel solution that also supports the FIFO requirement.
The key to the solution is the use of multiple pipelines, as shown in Figure 1.6. We assign a segment of the incoming transactions to each of several pipelines. Each pipeline maintains FIFO order, but we use content-based distribution so that each pipeline handles only a small subset of all transactions.
Figure 1.6 Distribute the validation step.
To implement the new design, we create a pipeline for each branch of the bank (named branch_1 through branch_5), so that each pipeline controls a subset of accounts. We want the pipelines to handle delegated transactions sequentially, so we specify FIFO order for the new pipelines.
The Pipeline Distributor checks the branch ID in each transaction (which is an example of content-based distribution), then sends the transaction to the matching pipeline.
Now, by processing many branches in parallel, the system completes many more transactions per unit of time.
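Here is a minimal sketch of that arrangement, assuming each pipeline is a FIFO queue drained by a single worker thread. The `Pipeline` and `PipelineDistributor` classes, the dictionary-based transaction format, and the `run_demo` helper are hypothetical and written for illustration; they are not the API of any particular pipelines framework.

```python
import queue
import threading


class Pipeline:
    """One FIFO queue drained by one worker, so order within the pipeline is kept."""

    def __init__(self, name, handler):
        self.name = name
        self._handler = handler
        self._queue = queue.Queue()  # queue.Queue is first in, first out
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, txn):
        self._queue.put(txn)

    def _run(self):
        while True:
            txn = self._queue.get()
            if txn is None:  # sentinel: no more work
                break
            self._handler(self.name, txn)

    def close(self):
        self._queue.put(None)
        self._worker.join()


class PipelineDistributor:
    """Content-based distribution: route each transaction by its branch ID."""

    def __init__(self, branch_ids, handler):
        self._pipelines = {b: Pipeline(b, handler) for b in branch_ids}

    def distribute(self, txn):
        self._pipelines[txn["branch_id"]].submit(txn)

    def close(self):
        for pipeline in self._pipelines.values():
            pipeline.close()


def run_demo(transactions):
    """Send transactions through five branch pipelines; return what each one saw."""
    branches = [f"branch_{i}" for i in range(1, 6)]
    processed = {b: [] for b in branches}

    def handler(pipeline_name, txn):
        # Each list is appended to by exactly one worker thread.
        processed[pipeline_name].append(txn["seq"])

    distributor = PipelineDistributor(branches, handler)
    for txn in transactions:
        distributor.distribute(txn)
    distributor.close()  # wait for every pipeline to drain
    return processed
```

In this sketch the five pipelines run concurrently, but each one processes its own transactions strictly in arrival order. Because all transactions for a given account carry the same branch ID, they always land in the same pipeline, which is what preserves the FIFO requirement.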
You can use this approach to scale the application up even further, as shown in Figure 1.7. Let’s assume the bank has a very large branch with more than 100,000 accounts. The branch’s peak transaction volume overloads the previous pipeline configuration, so we create additional downstream pipelines. The distributor divides the transactions by account-number range (A1000_1999, A2000_2999, etc.).
Figure 1.7 Scale the application further by adding downstream pipelines.
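The downstream routing rule can be sketched as a lookup from account number to range. The example assumes account numbers are plain integers and that ranges are 1,000 accounts wide, matching the pipeline names A1000_1999 and A2000_2999; the third range, the upper limit, and the function name are hypothetical.

```python
from bisect import bisect_right

# First account number of each downstream pipeline's range (assumed layout).
RANGE_STARTS = [1000, 2000, 3000]
RANGE_END = 4000  # exclusive upper bound of the last range (hypothetical)


def pipeline_for_account(account_number):
    """Return the name of the downstream pipeline that owns this account."""
    if not RANGE_STARTS[0] <= account_number < RANGE_END:
        raise ValueError(f"no pipeline configured for account {account_number}")
    # Find the last range whose start is <= account_number.
    index = bisect_right(RANGE_STARTS, account_number) - 1
    start = RANGE_STARTS[index]
    if index + 1 < len(RANGE_STARTS):
        end = RANGE_STARTS[index + 1] - 1
    else:
        end = RANGE_END - 1
    return f"A{start}_{end}"
```

Adding capacity under this scheme means adding another entry to `RANGE_STARTS` (or splitting an existing range) and starting a pipeline for it. Every account still maps to exactly one pipeline, so per-account FIFO order is preserved.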
At this point, whenever the bank’s business increases, it’s a simple matter to build additional pipeline structures to accommodate the increased volume.
To sum up, the ATM example illustrates how you can use Software Pipelines to increase process performance by an order of magnitude. It’s a simple example, but the basic principles can be used in many other applications.