- The Emergence of Web Applications
- Basic Definitions
- The Nature of the Web and Its Challenges
- Performance and Scalability
- Performance
- Scalability
- The Internet Medium
- Wide Audience
- Interactive
- Dynamic
- Always On
- Integrated
- Lack of Complete Control
- Measuring Performance and Scalability
- Measuring Performance
- Beyond Benchmarking
- Measuring Scalability
- Throughput and Price/Performance
- Scalability and Performance Hints
- Think End-to-End
- Scalability Doesn't Equal Performance
- Summary
Measuring Scalability
Scalability is almost as easy to measure as performance is. We know that scalability refers to an application's ability to accommodate rising resource demand gracefully, without a noticeable loss in QoS. To measure scalability, it would seem that we need to calculate how well increasing demand is handled. But how exactly do we do this?
Let's consider a simple example. Suppose that we deploy an online banking application. One type of request that clients can make is to view recent bank transactions. Suppose that when a single client connects to the system, it takes a speedy 10 ms of server-side time to process this request. Note that network latency and other client or network issues affecting the delivery of the response will increase the end-to-end response time; for example, maybe end-to-end response time will be 1,000 ms for a single client. But, to keep our example simple, let's consider just server-side time.
Next, suppose that 50 users simultaneously want to view their recent transactions, and that it takes an average of 500 ms of server-side time to process each of these 50 concurrent requests. Obviously, our server-side response time has slowed because of the concurrency of demands. That is to be expected.
Our next question might be: How well does our application scale? To answer this, we need some scalability metrics, such as the following:
- Throughput: the rate at which transactions are processed by the system
- Resource usage: the usage levels for the various resources involved (CPU, memory, disk, bandwidth)
- Cost: the price per transaction
A more detailed discussion of these and other metrics can be found in Scaling for E-Business: Technologies, Models, Performance, and Capacity Planning (Menasce and Almeida, 2000). Measuring resource use is fairly easy; measuring throughput and cost requires a bit more explanation.
What is the throughput in both of the cases described, with one user and with 50 users? To calculate this, we can take advantage of something called Little's law, a simple but very useful measure that can be applied very broadly. Consider the simple black box shown in Figure 13. Little's law says that if this box contains an average of N users, and the average user spends R seconds in that box, then the throughput X of that box is roughly
X = N/R.
Little's law can be applied to almost any device: a server, a disk, a system, or a Web application. Indeed, any system that employs a notion of input and output and that can be considered a black box is a candidate for this kind of analysis.
Figure 13 Little's law
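To make the law concrete, here is a minimal sketch of Little's law as code, applied to the single-user and 50-user cases from our banking example. The class and method names are mine, invented for illustration; this is not a library API.

```java
/**
 * Little's law: X = N / R, where N is the average number of users inside
 * the "black box" and R is the average time (in seconds) each user spends
 * there. X is then the throughput of the box.
 */
public final class LittlesLaw {

    /** Returns throughput in transactions per second (tps). */
    static double throughput(double avgUsersInBox, double avgSecondsInBox) {
        return avgUsersInBox / avgSecondsInBox;
    }

    public static void main(String[] args) {
        System.out.println(throughput(1, 0.010));   // 1 user,   10 ms -> 100.0 tps
        System.out.println(throughput(50, 0.500));  // 50 users, 500 ms -> 100.0 tps
    }
}
```

Notice that both cases yield the same 100 tps: the 50-user case is slower per request, but the box is completing just as much work per second.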
Armed with this knowledge, we can now apply it to our example. Specifically, we can calculate application throughput for different numbers of concurrent users. Here N counts transactions in progress (one per concurrent user), and since R is measured in seconds, we will measure throughput in transactions per second (tps). At the same time, let's add some data to our banking example. Table 13 summarizes what we might observe, along with throughputs calculated using Little's law. Again, keep in mind that this is just an example; I pulled these response times from thin air. Even so, they are not unreasonable.
Based on these numbers, how well does our application scale? It's still hard to say. We can quote numbers, but do they mean anything? Not really. The problem here is that we need a comparison: something to hold up against our mythical application so we can judge how well or how poorly our example scales.
Table 13: Sample Application Response and Throughput Times

| Concurrent Users | Average Response Time (ms) | Throughput (tps) |
|------------------|----------------------------|------------------|
| 1                | 10                         | 100.000          |
| 50               | 500                        | 100.000          |
| 100              | 1200                       | 83.333           |
| 150              | 2200                       | 68.182           |
| 200              | 4000                       | 50.000           |
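To show the arithmetic behind Table 13, here is a short sketch (again with hypothetical names) that recomputes the throughput column from the measured response times. A loop like this is handy whenever you need to fill in a scalability table from raw measurements.

```java
public final class Table13Throughput {
    public static void main(String[] args) {
        int[] concurrentUsers = {  1,  50,  100,  150,  200 };
        int[] avgResponseMs   = { 10, 500, 1200, 2200, 4000 };

        for (int i = 0; i < concurrentUsers.length; i++) {
            // X = N / R, converting R from milliseconds to seconds
            double tps = concurrentUsers[i] / (avgResponseMs[i] / 1000.0);
            System.out.printf("%3d users: %7.3f tps%n", concurrentUsers[i], tps);
        }
        // Prints 100.000, 100.000, 83.333, 68.182, and 50.000 tps,
        // matching the throughput column of Table 13.
    }
}
```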
One good comparison is against a "linearly scalable" version of our application, by which I mean an application that continues to do exactly the same amount of work per second no matter how many clients use it. This is not to say the average response time will remain constant. No way. In fact, it will increase, but in a perfectly predictable manner. Our throughput, however, will remain constant. Linearly scalable applications are perfectly scalable in the sense that their response times degrade at a constant rate, in direct proportion to demand, while throughput holds steady.
If our application is indeed linearly scalable, we'll see the numbers shown in Table 14. Notice that our performance degrades in a constant manner: The average response time is ten times the number of concurrent users (R = 10N ms). Our throughput, however, is constant: By Little's law, X = N/(0.010N) = 100 tps, no matter what N is.
To understand this data better, and how we can use it in a comparison with our original mythical application results, let's view their trends in graph form. Figure 14 illustrates average response time as a function of the number of concurrent users; Figure 15 shows throughput as a function of the number of users. These graphs also compare our results with results for an idealized system whose response time increases linearly with the number of concurrent users.
Figure 14 Scalability from the client's point of view
Figure 15 Scalability from the server's point of view
Figure 14 shows that our application starts to deviate from linear scalability after about 50 concurrent users. At higher numbers of concurrent sessions, the line bends toward an exponential trend. Notice that I'm drawing attention to the shape of the line, not the numbers to which it corresponds. As we discussed earlier, scalability analysis is not the same as performance analysis (that is, a slow application is not necessarily one that scales poorly). From a performance standpoint, we care about the average time per request; for scalability, we care about the trend in that time as concurrent demand rises, that is, how well the application deals with increased load.
Figure 15 shows that our idealized, linearly scalable application maintains a constant number of transactions per second. This makes sense: Even though the average response time may increase, the amount of work done per unit time remains the same. (Think of a kitchen faucet: Even though it takes longer to wash 100 dishes than to wash one, the number of dishes washed per second should remain constant.) Notice that our mythical application becomes less productive after 50 concurrent users. In this sense, if we want maximum throughput, it would be better to replicate our application and limit each instance to 50 concurrent users.
Table 14: Linearly Scalable Application Response and Throughput Times

| Concurrent Users | Average Response Time (ms) | Throughput (tps) |
|------------------|----------------------------|------------------|
| 1                | 10                         | 100              |
| 50               | 500                        | 100              |
| 100              | 1000                       | 100              |
| 150              | 1500                       | 100              |
| 200              | 2000                       | 100              |
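One rough way to automate the comparison between Tables 13 and 14 is to compute each measurement's efficiency relative to the linear baseline. The sketch below uses my own naming, and it assumes the 10 ms per-user baseline from Table 14; it is an illustration, not a general-purpose tool.

```java
public final class LinearBaselineComparison {
    public static void main(String[] args) {
        int[] users      = {  1,  50,  100,  150,  200 };
        int[] measuredMs = { 10, 500, 1200, 2200, 4000 };

        for (int i = 0; i < users.length; i++) {
            double measuredTps = users[i] / (measuredMs[i] / 1000.0);
            // Linear baseline: R = 10 ms per user, so X = N / (0.010 * N) = 100 tps
            double baselineTps = users[i] / (0.010 * users[i]);
            double efficiency  = measuredTps / baselineTps;
            System.out.printf("%3d users: %6.2f%% of linear throughput%n",
                              users[i], efficiency * 100);
        }
        // Efficiency holds at 100% through 50 users, then falls to 83.33%,
        // 68.18%, and 50%: the deviation point Figure 14 shows at about
        // 50 concurrent users.
    }
}
```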
Analyzing response time and throughput trends, as we have done here, is important for gauging the scalability of your system. Figures 14 and 15 show how to compare an application against its theoretical potential. Figure 14 illustrates efficiency from the client's point of view, where the focus is on latency; Figure 15 shows application efficiency from the server's point of view, where the focus is on productivity (work done per unit time).