Introduction to Software Performance Engineering
In This Chapter:
- Performance failures and their consequences
- Managing performance
- Performance successes
- What is software performance engineering?
- SPE models and modeling strategies
1.1 Software and Performance
This book is about developing software systems that meet performance objectives. Performance is an indicator of how well a software system or component meets its requirements for timeliness. Timeliness is measured in terms of response time or throughput. The response time is the time required to respond to a request. It may be the time required for a single transaction, or the end-to-end time for a user task. For example, we may require that an online system provide a result within one-half second after the user presses the "enter" key. For embedded systems, timeliness is measured as the time required to respond to events, or as the number of events processed in a time interval. The throughput of a system is the number of requests that can be processed in some specified time interval. For example, a telephony switch may be required to process 100,000 calls per hour.
NOTE
Performance is the degree to which a software system or component meets its objectives for timeliness.
Thus, performance is any characteristic of a software product that you could, in principle, measure by sitting at the computer with a stopwatch in your hand.
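The stopwatch view of performance can be sketched in code. The following is a minimal illustration, not a measurement method prescribed by this book: `handle_request` is a hypothetical stand-in for whatever work your system does per request, and the request count and work size are arbitrary assumptions. It records the response time of each request and derives throughput from the total elapsed time.

```python
import time


def handle_request(n):
    """Hypothetical stand-in for the work done to serve one request."""
    return sum(i * i for i in range(n))


def measure(requests, work_size=10_000):
    """Time each request and return (response_times, throughput)."""
    response_times = []
    start = time.perf_counter()
    for _ in range(requests):
        t0 = time.perf_counter()
        handle_request(work_size)
        response_times.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = requests / elapsed  # requests completed per second
    return response_times, throughput


if __name__ == "__main__":
    times, throughput = measure(200)
    print(f"mean response time:  {sum(times) / len(times) * 1000:.3f} ms")
    print(f"worst response time: {max(times) * 1000:.3f} ms")
    print(f"throughput:          {throughput:.1f} requests/second")
```

Because the requests here run one at a time, throughput is close to the reciprocal of the mean response time. In a system that serves requests concurrently the two measures diverge, which is why objectives are usually stated for each separately.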
NOTE
Other definitions of performance include additional characteristics such as footprint or memory usage. In this book, however, we are concerned primarily with issues of timeliness.
There are two important dimensions to the timeliness of software: responsiveness and scalability.
1.1.1 Responsiveness
Responsiveness is the ability of a system to meet its objectives for response time or throughput. In end-user systems, responsiveness is typically defined from a user perspective. For example, responsiveness might refer to the amount of time it takes to complete a user task, or the number of transactions that can be processed in a given amount of time. In real-time systems, responsiveness is a measure of how fast the system responds to an event, or the number of events that can be processed in a given time.
In end-user applications, responsiveness has both an objective and a subjective component. For example, we may require that the end-to-end time for a withdrawal transaction at an ATM be one minute. However, that minute may feel very different to different users. To a user in Santa Fe in the summer, it may seem quite reasonable. To a user in Minneapolis in January, a minute may seem excessively long. Both objective and user-perceived (subjective) responsiveness must be addressed when performance objectives are specified. For example, you can improve the perceived responsiveness of a Web application by presenting the user-writable fields first and then building the rest of the page (e.g., the fancy graphics) while the user is filling in those fields.
NOTE
Responsiveness is the ability of a system to meet its objectives for response time or throughput.
1.1.2 Scalability
Scalability is the ability of a system to continue to meet its response time or throughput objectives as the demand for the software functions increases. The graph in Figure 1-1 illustrates how increasing use of a system affects its response time.
Figure 1-1: Scalability Curve
In Figure 1-1, we've plotted response time against the load on the system, as measured by the number of requests per unit time. As you can see from the curve, as long as you are below a certain threshold, increasing the load does not have a great effect on response time. In this region, the response time increases roughly linearly with the load. At some point, however, a small increase in load begins to have a great effect on response time. In this region (at the right of the curve), the response time increases steeply and nonlinearly with the load, a rise that is often described as exponential. This change from gradual to steep growth in response time is usually due to some resource in the system (e.g., the CPU, a disk, the network, sockets, or threads) nearing one hundred percent utilization. This resource is known as the "bottleneck" resource. The region where the curve changes from gradual to steep growth is known as the "knee" because of its resemblance to a bent knee.
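The shape of the curve can be reproduced with a very simple model. The sketch below is an illustration under stated assumptions, not the model behind Figure 1-1: it treats the bottleneck resource as a single-server queue (the classic M/M/1 approximation), for which mean response time is the service time divided by one minus the utilization. The 10 ms service time and the list of loads are assumed values chosen only to make the knee visible.

```python
def response_time(arrival_rate, service_time):
    """Mean response time of a single-server queue (M/M/1 assumption).

    arrival_rate: requests per second offered to the resource
    service_time: seconds of service each request needs at the resource
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        # The resource is saturated; the queue grows without bound.
        return float("inf")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    service_time = 0.010  # assumed: 10 ms of service per request
    for load in (10, 30, 50, 70, 80, 90, 95, 99):
        r = response_time(load, service_time)
        print(f"{load:3d} req/s  utilization {load * service_time:4.0%}  "
              f"response time {r * 1000:7.1f} ms")
```

With these assumed numbers, the model gives a response time of about 20 ms at 50 percent utilization, about 100 ms at 90 percent, and about 1,000 ms at 99 percent. The last few percent of utilization cost far more than the first fifty, which is the behavior the knee describes.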
Web applications are discussed in Chapters 5, 7, and 13.
NOTE
Scalability is the ability of a system to continue to meet its response time or throughput objectives as the demand for the software functions increases.
Scalability is an increasingly important aspect of today's software systems. Web applications are a case in point. It is important to maintain the responsiveness of a Web application as more and more users converge on a site. In today's competitive environment, users will go elsewhere rather than endure slow response times.
To build scalability into your system, you must know where the "knee" of the scalability curve falls for your hardware/software environment. If the "knee" occurs at a load below your target load, you must either reduce the utilization of the bottleneck resource by streamlining the processing, or add hardware (e.g., a faster CPU or an extra disk) to remove the bottleneck.
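A rough capacity check along these lines can be sketched with the utilization law, which states that a resource's utilization equals the throughput multiplied by the service demand each request places on that resource. The resource names, per-request demands, and target load below are invented for illustration, and the sketch assumes one server per resource.

```python
def bottleneck(demands):
    """Return (resource, demand) for the resource with the largest
    service demand per request, in seconds."""
    name = max(demands, key=demands.get)
    return name, demands[name]


def utilizations(demands, load):
    """Utilization law: utilization = throughput x service demand."""
    return {name: load * d for name, d in demands.items()}


if __name__ == "__main__":
    # Assumed per-request service demands in seconds (illustrative only).
    demands = {"cpu": 0.004, "disk": 0.012, "network": 0.002}
    target_load = 100  # assumed target, in requests per second

    name, demand = bottleneck(demands)
    max_throughput = 1.0 / demand  # load at which the bottleneck saturates
    print(f"bottleneck: {name}, saturates at {max_throughput:.1f} req/s")
    for res, u in utilizations(demands, target_load).items():
        print(f"{res}: {u:.0%} utilized at target load")
    if target_load >= max_throughput:
        print("target load exceeds the bottleneck's capacity: "
              "reduce its demand or add capacity")
```

In this made-up case the disk saturates at roughly 83 requests per second, below the target of 100. Either the disk demand per request must drop below 10 ms or disk capacity must be added. Since the knee appears before saturation, a practical target would leave additional headroom.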
This book presents an integrated set of solutions that you can use to build responsiveness and scalability into your software systems. These solutions include a combination of modeling, measurement, and other techniques, as well as a systematic process for applying them. They also include principles, patterns, and antipatterns that help you design responsiveness and scalability into your software. These techniques focus primarily on early life cycle phases to maximize your ability to economically build performance into your software. However, we also present solutions for systems that already exhibit performance problems.