1.2 The Importance of Performance
It's fair to ask at the outset: "Why is performance important?" The following anecdotes illustrate the answer:
NASA was forced to delay the launch of a satellite for at least eight months. The satellite and the Flight Operations Segment (FOS) software running it are a key component of the multibillion-dollar Earth Science Enterprise, an international research effort to study the interdependence of the Earth's ecosystems. The delay occurred because the FOS software had unacceptable response times for developing satellite schedules and performed poorly when analyzing satellite status and telemetry data. There were also problems with the implementation of a control language used to automate operations. The cost of this rework and the resulting delay has not yet been determined. Nevertheless, it is clearly significant, and the high visibility and bad press are potentially damaging to the overall mission. Members of Congress also questioned NASA's ability to manage the program. [Harreld 1998a], [Harreld 1998b]
And, on a lighter note:
The lingerie retailer Victoria's Secret used its new Web site to broadcast its spring fashion show over the Internet. To make sure that there would be a large number of viewers, the company announced the show in a 30-second advertisement during the Super Bowl. A total of 1.5 million people logged on to the Web site to view the broadcast, which used concurrent video streams. Despite extensive pre-planning and the addition of more servers and load-balancing software, viewers experienced jerky video and interrupted audio. At least five percent of those trying to view the show were unable to access it. [Trott 1999]
Both of these anecdotes illustrate performance failures: the inability of a software product to meet its overall objectives due to inadequate performance. Additional instances of performance failures appear in Example 1-1.
Example 1-1: Performance Failures
Distributed Order Management System
One Fortune 100 company attempted to implement a new distributed order management system that would integrate several legacy systems with new software to track the status of orders and trigger actions in the other systems at the proper time. Performance problems in the initial version prevented its timely deployment. After three significant schedule delays, a comprehensive study of the end-to-end performance of critical use cases identified significant architectural problems that could not be corrected with either tuning or additional hardware. The system was nevertheless deployed in an attempt to meet schedule requirements. Users were disgruntled and, because of the performance problems, did not use features intended to improve order management. The system was ineffective because it failed to meet its performance requirements.
Accounting System
Another large company attempted a reimplementation of its accounting system. The original schedule estimated a two-year completion. After seven years, the system had been reimplemented three times; none of these implementations met the performance objectives. The third attempt used 60 times the CPU time of the original attempt.
Dynamic Reporting with COTS
A large bank attempted to avoid risk by using a commercial off-the-shelf (COTS) package that provided most of the functions it required. The interactive portion of the system performed acceptably, but the bank experienced serious problems with a desired dynamic reporting function. The internal database organization of the COTS package did not match the order required by the reports, and the processing needed to produce the "roll-ups" for the desired totals was excessive. The bank was forced to create a new reporting function that would run at night, thus losing the benefit of producing the reports at the time they were desired.
Call Processing
A re-implementation of a call processing system for a telecommunication switch also did not consider performance early in development. The initial object-oriented design required several hundred times the allotted time to complete a call.
Automated Teller Machine
An object-oriented design for teller machines focused on reuse to streamline the customization required for each bank that purchased the machines. Developers were not worried about performance because the hardware speeds were far greater than typically required for user interactions, and they had never had performance problems with previous implementations. The first implementation was unusable due to performance problems and required substantial rework to correct them.
Electronic Trading
Several online brokerage houses experienced unusually large numbers of hits on their Web sites following a stock market dip on October 27, 1997. The Web sites could not scale to meet the demand, so customers experienced long delays in using the sites, if they could get in at all. As a result, investors lost hundreds of thousands of dollars. At least one lawsuit has been filed alleging that online capacity was insufficient to meet users' needs.
Performance is an essential quality attribute of every software system. Many software systems, however, both object-oriented and non-object-oriented, cannot be used as they are initially implemented due to performance problems. For example, if the system is an end-user application, it may not respond rapidly enough to user actions, or handle the number of transactions that occur during peak load conditions. Or, if it is an embedded system, it may not respond rapidly enough to an external stimulus, or be able to process events that occur with a high frequency.
1.2.1 Consequences of Performance Failures
As the anecdotes above and in Example 1-1 illustrate, performance failures can have a variety of negative consequences. These include:
Damaged customer relations: Your organization's image suffers because of poor performance. Even if the problem is fixed later, users will continue to associate poor performance with the product.
Business failures: Poor performance means that your staff needs more time to complete key tasks, or that you need more staff to complete these tasks in the same amount of time. This may mean that you are unable to operate on a peak business day, to respond to customer inquiries, or to generate bills or payments in a timely fashion.
Lost income: You lose revenue due to late delivery. In some cases, you may find yourself paying penalties for late delivery or failure to meet performance objectives.
Additional project resources: Project costs rise as additional resources are allocated for "tuning" or redesign.
Reduced competitiveness: "Tuning" or redesign results in late delivery that can mean missed market windows.
Project failure: In some cases it will be impossible to meet performance objectives by tuning, and too expensive to redesign the system late in the process. These projects will be canceled.
1.2.2 Causes of Performance Failures
How and why does this happen? Our experience is that performance problems are most often due to fundamental architecture or design factors rather than inefficient coding. As Clements and Northrup point out:
Whether or not a system will be able to exhibit its desired (or required) quality attributes is largely determined by the time the architecture is chosen. [Clements and Northrup 1996]
This means that performance problems are introduced early in the development process. However, most organizations ignore performance until integration testing or later. With pressure to deliver finished software in shorter and shorter times, their attitude is: "Let's get it done. If there is a performance problem, we'll fix it later." Thus, performance problems are not discovered until late in the development process, when they are more difficult (and more expensive) to fix.
This "fix-it-later" attitude is actually encouraged by many in the object-oriented community. The following quote from Auer and Beck illustrates this misinformation:
Fix-It-Later Attitude
Ignore efficiency through most of the development cycle. Tune performance once the program is running correctly and the design reflects your best understanding of how the code should be structured. The needed changes will be limited in scope or will illuminate opportunities for better design. [Auer and Beck 1996]
Reliance on "fix-it-later" has its origins in three performance myths.
Performance Myth #1
This myth is based on the assumption that you need something to measure before you can begin to manage performance:
Performance Myth
It is not possible to do anything about performance until you have something executing to measure.
The following quote from Jacobson et al. is typical of this misconception:
Changes in the system architecture to improve performance should as a rule be postponed until the system is being (partly) built. Experience shows that one frequently makes the wrong guesses, at least in large and complex systems, when it comes to the location of the bottlenecks critical to performance. To make correct assessments regarding necessary performance optimization, in most cases, we need something to measure ... [Jacobson et al. 1999]
Performance Reality
The reality is that you don't need to wait to address performance until you have some running code to measure. Performance models can predict performance during the architectural and early design phases of the project. Performance estimation and uncertainty-management techniques can compensate for the lack of precise measurements. The models are sufficient to allow evaluation of architectural or design alternatives. It is not necessary to "guess" the location of bottlenecks or to wait until measurements are available to begin the modeling.
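To make the idea of estimation with uncertainty management concrete, one common approach is to bracket every estimate with an optimistic and a pessimistic value and compare both totals with the objective. The sketch below assumes a use case whose steps run strictly one after another; the `Step` class, the step names, and all numbers are hypothetical and are not this book's notation or data.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One processing step in a use case, with bracketed time estimates."""
    name: str
    calls: int        # executions of this step per use case
    best_ms: float    # optimistic time per execution
    worst_ms: float   # pessimistic time per execution


def response_time_bounds(steps):
    """Return (best, worst) response time in ms, assuming the steps run one after another."""
    best = sum(s.calls * s.best_ms for s in steps)
    worst = sum(s.calls * s.worst_ms for s in steps)
    return best, worst


def assess(best, worst, objective):
    """Compare the bounds with a response-time objective."""
    if worst <= objective:
        return "meets objective even in the worst case"
    if best > objective:
        return "misses objective even in the best case"
    return "uncertain: refine the estimates"


# Hypothetical use case; every name and number here is illustrative.
use_case = [
    Step("validate input", 1, 2.0, 5.0),
    Step("query database", 3, 10.0, 40.0),
    Step("format reply", 1, 1.0, 3.0),
]
best, worst = response_time_bounds(use_case)
print(best, worst, assess(best, worst, objective=150.0))
```

Only an "uncertain" result calls for more precise estimates, and the step that contributes most to the gap between the bounds is the natural place to spend that effort.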
In fact, waiting until there is sufficient code to make detailed measurements can be dangerous. As Clements notes:
Performance is largely a function of the frequency and nature of intercomponent communication, in addition to the performance characteristics of the components themselves, and hence can be predicted by studying the architecture of a system. [Clements 1996]
As we noted earlier, the architectural decisions, which have the greatest impact on performance, are made early in the project. Waiting until there is running code to evaluate the performance impact of these decisions means that you don't find problems until much later, when the architectural decisions are more difficult and expensive to change, if they can be changed at all.
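Clements' point about intercomponent communication can be illustrated with a deliberately simple linear estimate: time per use case equals the number of messages times a per-message overhead, plus the component's own processing. The sketch compares two hypothetical interfaces to the same component, one that sends a request per item and one that batches all items into one request. The function name and every constant are assumptions chosen for illustration.

```python
def use_case_time_ms(items, messages, per_message_ms, per_item_ms):
    """Linear estimate: communication overhead plus component processing."""
    return messages * per_message_ms + items * per_item_ms


ITEMS = 200            # items handled per use case (assumed)
PER_MESSAGE_MS = 5.0   # overhead per remote request (assumed)
PER_ITEM_MS = 0.5      # processing per item inside the component (assumed)

# Architecture A: one remote request per item ("chatty" interface).
chatty = use_case_time_ms(ITEMS, ITEMS, PER_MESSAGE_MS, PER_ITEM_MS)
# Architecture B: one remote request carrying all items (batched interface).
batched = use_case_time_ms(ITEMS, 1, PER_MESSAGE_MS, PER_ITEM_MS)
print(f"chatty: {chatty} ms, batched: {batched} ms")
```

Under these assumptions the component processing is the same 100 ms in both designs; the roughly tenfold difference comes entirely from the number of messages, which is fixed when the architecture is chosen.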
Performance Myth #2
This myth is based on the fear that adding performance management to the software process will delay project completion:
Performance Myth
Managing performance takes too much time.
Performance Reality
The reality is that performance management efforts do not automatically require significant amounts of time. The level of effort devoted to performance management depends on the level of risk. If there is little or no risk of a performance failure, then there is no need for an elaborate performance management program. If the risk of performance failure is high, then a higher level of effort is needed. In these cases, however, managing performance from the beginning of the software development process can actually reduce the overall project time by eliminating the need for time-consuming redesign and tuning.
Performance Myth #3
This myth is based on the fear that performance modeling will take too much time and consume too many project resources:
Performance Myth
Performance models are complex and expensive to construct.
Performance Reality
The reality is that simple models can provide the information required to identify performance problems and evaluate alternatives for correcting them. These models are inexpensive to construct and evaluate. It is no longer necessary to build a complex simulation model that is as difficult to write as the software itself. Numerous examples throughout this book illustrate these models and the results they provide. Using simple models in conjunction with the modeling principles discussed later in this chapter ensures that you get the information you need to make software architecture and design decisions when you need it, and in a cost-effective manner.
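As an example of how little a useful model can require, the sketch below uses the standard single-server queue approximation (M/M/1): utilization is U = X × D for arrival rate X and service demand D, and residence time is R = D / (1 − U). This is a generic textbook formula offered for illustration, not necessarily the model used elsewhere in this book, and the 20 ms demand and arrival rates are hypothetical.

```python
def single_queue(arrival_rate_per_s, demand_s):
    """Open single-server queue (M/M/1 approximation).

    Returns (utilization, residence time in seconds) for one device.
    """
    utilization = arrival_rate_per_s * demand_s
    if utilization >= 1.0:
        raise ValueError("device saturated: utilization >= 100%")
    residence_s = demand_s / (1.0 - utilization)
    return utilization, residence_s


# Hypothetical: each request needs 20 ms of CPU; try three workload intensities.
for rate in (10, 40, 45):
    u, r = single_queue(rate, 0.020)
    print(f"{rate:>3} req/s: utilization {u:.0%}, residence time {r * 1000:.0f} ms")
```

Under this approximation, raising utilization from 20 percent to 80 percent quadruples the residence time (25 ms to 100 ms), the kind of early warning that takes minutes to compute rather than weeks of simulation.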
1.2.3 Getting It Right
Managing performance throughout the development process can reduce the risk of performance failure and lead to performance successes such as these:
An airline reservation service bureau revised its airfare quote system to improve the accuracy of the "lowest fare" quotes. Performance engineers worked closely with developers throughout the project. The result was a system with 100 percent accurate quotes and improved performance.
A major insurance company designed a system to provide Web access for its own agents as well as independent agents. The first version of the design called for a large amount of code (in the form of ActiveX agents) to be downloaded to client machines. Performance models of this approach showed that, if the downloaded code underwent a significant upgrade, it would take approximately three days at full bandwidth to download the changes to all of the client machines. The design was changed to rely less on downloaded code, and the system was deployed successfully.
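The text gives the model's result (about three days) but not its inputs. A back-of-the-envelope version of such a download estimate might look like the sketch below, assuming every client's upgrade flows through a single shared link running at full bandwidth with no protocol overhead. The client count, payload size, and link speed are hypothetical, not the insurance company's figures.

```python
def download_time_hours(clients, bytes_per_client, link_bits_per_s):
    """Hours to push one upgrade to every client over a single shared link at full bandwidth."""
    total_bits = clients * bytes_per_client * 8
    return total_bits / link_bits_per_s / 3600


# Hypothetical figures, not those from the insurance study.
hours = download_time_hours(clients=2_000, bytes_per_client=10_000_000, link_bits_per_s=2_000_000)
print(f"{hours:.1f} hours")
```

Because the result scales linearly with both the number of clients and the size of the downloaded code, reducing the amount of downloaded code, as the insurance company did, shrinks the estimate proportionally.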
These anecdotes illustrate that managing performance from the initial stages of the project can pay off in systems that meet performance objectives. Additional performance success stories appear in Example 1-2.
Example 1-2: Performance Successes
Event Update
Performance engineers conducted a study of a new system early in the requirements analysis phase. Initial requirements called for events to be posted to an online relational database within three minutes of occurrence. The analysts estimated the size of the hardware required to support the requirement (assuming a streamlined software system) to be 20 mainframes!
Three minutes appeared to be a reasonable goal, but the tremendous data volume had dramatic consequences for hardware capacity. The performance engineering analysis allowed a quantitative assessment of the impact of this requirement, and made it possible to redefine the requirements to meet the underlying business goal with a more reasonable hardware configuration.
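The study's sizing calculation is not given in the text. A throughput-based sketch of the same kind of reasoning divides the CPU demand arriving per second by the usable capacity of one machine. It assumes, purely for illustration, that a tight posting deadline forces sizing for the peak event rate, whereas a relaxed requirement would let processing catch up at the average rate. The function name, the 70 percent utilization ceiling, and all workload numbers are hypothetical.

```python
import math


def machines_needed(events_per_s, cpu_s_per_event, cpus_per_machine, max_utilization=0.7):
    """Smallest machine count that keeps average CPU utilization at or below max_utilization."""
    demand_cpus = events_per_s * cpu_s_per_event      # CPU-seconds of work arriving per second
    usable_per_machine = cpus_per_machine * max_utilization
    return math.ceil(demand_cpus / usable_per_machine)


# Hypothetical workload: 10 ms of CPU per event, 4-CPU machines.
peak = machines_needed(events_per_s=5_000, cpu_s_per_event=0.010, cpus_per_machine=4)
average = machines_needed(events_per_s=1_000, cpu_s_per_event=0.010, cpus_per_machine=4)
print(f"sized for peak rate: {peak} machines; sized for average rate: {average} machines")
```

Even this crude model makes the cost of a requirement visible before any code exists, which is what allowed the requirement in the study to be renegotiated.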
Distributed Data Access
In another study, performance engineers studied the architecture for a new distributed system to provide customers with data about their telecommunication usage. The analysis showed that three of the use cases would meet their performance objectives, but one would require significant configuration upgrades to handle the stated workload intensity.
Developers evaluated trade-offs in the frequency of requests, the hardware and network configuration, and the software architecture and design. They selected a software architecture alternative that handled the required workload without hardware upgrades. It was easy to make the required changes before code was written.
This example shows the power of addressing performance early in development. Without the early performance analysis, the problems would have been discovered much later in the development process. Many of the software alternatives would no longer have been cost-effective, and customer pricing would have been fixed, so configuration upgrades would adversely affect the bottom line. It was easy to prevent these problems with early performance management.
Data Acquisition and Reporting
In this case, performance analysis helped prevent a bad situation from becoming worse. An existing system failed to meet performance objectives because it did not scale up to support the number of users specified in the contract. As a result, the organization incurred substantial monetary penalties for failure to meet contract terms. After failing to correct problems through tuning, developers proposed replacing key portions of the system with a new object-oriented subsystem. In the process of constructing performance models of the original system (for model calibration, verification, and validation), two key problems were detected in the process synchronization strategy and in the system's technical architecture. Correcting these problems resolved the contract performance failure and provided time to address scalability in a more systematic manner. The proposed new subsystem would not have met performance requirements; in fact, the performance models predicted worse performance than with the original system!
Airline Reservations
An airline reservation system included a component to recover and restore the state of the reservations data after a major outage. The project employed good performance management techniques throughout the development process. While the developers could do partial performance tests on the recovery component, the only way to determine actual success or failure was to experience an outage. It was not possible to run an end-to-end test. The performance models predicted that the system could handle the recovery in the required time, and, the first time that a recovery was needed, the performance goals were met.
Two significant morals emerge from these success stories.
NOTE
If you intend to rely on hardware to solve performance problems, use performance models early (before other options are closed) to verify that this is a cost-effective solution.
There may be other, more cost-effective solutions than hardware upgrades. As the "distributed data access" anecdote in Example 1-2 shows, there are typically more alternatives early in the process. As more and more architecture and design decisions are made, some options may become prohibitively expensive or simply unavailable.
For its part, the "airline reservations" system story in Example 1-2 shows that
NOTE
In some cases, end-to-end performance testing is not possible, so performance models are the only option.