3.2 Measurement
The next phase of the performance management methodology involves measuring actual system or application performance. In this phase, performance tools are used to collect and present the data.
Measurement is driven by several interrelated questions that must be answered before any data is collected.
- Which performance tools are available?
- What is the purpose of the measurement?
- Is the measurement baseline- or crisis-oriented?
- How long should the data be collected, and at what intervals?
- What data metrics are to be collected?
- How well are the metrics documented?
- How accurate are the data presented by the tool?
- Are certain system resources already saturated?
- How do these measurements relate to the organization's business needs?
3.2.1 Tool Availability
Some performance tools come standard with the HP-UX Operating System. Others are available as separately purchasable products. The availability of the various tools on the system being measured will constrain the answers to the other questions. Tool familiarity also plays a role: comfort and experience with one tool often discourage the use of others, even when another tool is more useful in a given situation. However, the best tool for a given purpose should be used to make the measurements, even if that tool must be purchased.
3.2.2 Purpose of the Measurement: Baseline versus Crisis
The purpose of the data collection must be determined in advance in order to select the correct set of tools for gathering the data. It is important to measure performance on the system when performance is acceptable. This type of measurement is called a baseline measurement. Think of a baseline as a signature or profile that can be used for comparison when performance is degrading or no longer acceptable. Baseline measurements require fewer metrics and a longer sampling interval, because no particular problem is being investigated. Instead, the goals are simply to characterize system performance and to watch for trends.
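To make this concrete, here is a minimal sketch of a baseline collector in Python: it samples a few coarse metrics at a long interval and appends them to an archive for later trend review. The archive path, the interval, and the choice of os.getloadavg() as the metric source are illustrative assumptions, not a prescribed implementation; in practice a dedicated collection product would fill this role.

```python
import os
import time

ARCHIVE = "/var/tmp/baseline.csv"   # hypothetical archive location
INTERVAL = 3600                     # hourly samples: baselines need few, coarse points

# Append one coarse sample per hour; over months this stays small enough
# to archive and review for trends.
while True:
    load1, load5, load15 = os.getloadavg()   # 1-, 5-, 15-minute run-queue averages
    with open(ARCHIVE, "a") as f:
        f.write(f"{time.time():.0f},{load1:.2f},{load5:.2f},{load15:.2f}\n")
    time.sleep(INTERVAL)
```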
An analogy to baseline measurements is a routine physical exam. The physician records vital signs such as blood pressure, pulse rate, and temperature, and orders blood tests, including cholesterol counts. A visual inspection by the physician is correlated with the internal measurements, and a historical record is kept to monitor trends over time. Any unusual symptoms are investigated immediately so that the physician can treat the problem before it becomes chronic.
Baseline measurements should be reviewed to develop conclusions about system performance without the immediate goal of altering performance by tuning the system or application. Baseline measurements should be archived for historical purposes. They can then be used to:
- Review performance trends over time
- Compare against current performance when investigating or diagnosing current performance problems
- Provide data for performance forecasting
- Develop and monitor service level agreements
- Provide data for capacity planning
Data collected and archived over time does not typically need to be as voluminous and detailed as data collected for performance problem resolution.
Performance crises usually result from failing to manage performance. Crisis measurements (those typically made during performance problem diagnosis) require much more detail so that performance problems can be adequately investigated. The additional detail involves additional metrics as well as more frequent measurement, resulting in a much larger volume of data. The purpose of crisis measurements is to characterize current system performance in detail so that appropriate tuning can be done. Managing a performance crisis is much more difficult when there are no baselines against which a comparison can be made.
Baseline measurements should be made, archived, and reviewed periodically so that future performance crises can be prevented, and dangerous performance trends acted upon before they become serious. As the data ages, less and less detail is required. As changes in the system occur, for example, adding users or making changes in the application, new baseline measurements should be taken. Another tactic is to compare baselines from similarly configured systems, to help understand the variances before problems occur.
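As a sketch of how an archived baseline supports crisis diagnosis, the following compares a current reading against the archived profile and flags readings that fall well outside it. The file format matches the hypothetical collector shown earlier, and the three-sigma threshold is an illustrative choice, not a recommended policy.

```python
import csv
import os
import statistics

def load_baseline(path):
    """Read archived samples (timestamp, load1, load5, load15) into columns."""
    with open(path) as f:
        rows = [list(map(float, r)) for r in csv.reader(f)]
    return list(zip(*rows))

def deviates(history, current, n_sigma=3.0):
    """True if the current reading is more than n_sigma standard deviations
    from the baseline mean (requires at least two archived samples)."""
    return abs(current - statistics.mean(history)) > n_sigma * statistics.stdev(history)

# Compare the current 5-minute load average with its archived baseline.
_, _, load5_hist, _ = load_baseline("/var/tmp/baseline.csv")
current = os.getloadavg()[1]
if deviates(load5_hist, current):
    print(f"load5 {current:.2f} is well outside the baseline profile")
```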
Baseline measurements can help provide the necessary translation between the language of performance tools and the needs of users. If the data can be presented and reviewed prior to a crisis, then communication during the crisis should be easier.
Baseline measurements should also capture work that is not attributed to known sources. For instance, reviewing the "Other Application" category in MeasureWare can indicate whether new work is being added, or whether the trend is to move work away from existing applications.
3.2.3 Duration and Interval of the Measurement
Some tools are good for displaying performance metrics in real time. Others are better suited to collecting performance metrics in the background over a long period for later analysis. If performance problem diagnosis is the goal of the data collection, and the problem manifests itself consistently or after a short time, then real-time performance tools are the best choice. Real-time tools produce a large amount of data, so this type of data should not be gathered for long periods: the storage requirement is large, and large volumes of data are difficult to review. Performance problems that cannot be readily reproduced, or that occur unpredictably or only occasionally, warrant longer-term data collection; for these, tools that provide both summarization and detail capabilities are the best choice.
If performance characterization or forecasting is the goal, longer-term trending tools would be the tools of choice. Chapter 5, "Survey of Unix Performance Tools" on page 51 will discuss in detail the individual tools and when they are best used. The duration of the measurement should greatly influence selection of the tool or tools that will be used to collect the data.
The measurement interval must also be determined. Sampling intervals for baseline measurements should be measured in minutes or hours rather than seconds to reduce the volume of data that will be collected. Sampling intervals for crisis measurements or for performance problem resolution need to be shorter and are measured in seconds. If performance problems are of short duration, i.e., tend to spike, sampling intervals must also be short: typically one to five seconds. The shorter the sampling interval, the higher the overhead of making the measurement. This is of particular concern if one or more of the system resources are already saturated.
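The interaction between interval, metric count, and data volume is simple arithmetic, and it is worth estimating before starting a collection. The following sketch uses hypothetical record sizes to show why crisis-style collection accumulates data orders of magnitude faster than baseline collection.

```python
def collection_volume_bytes(n_metrics, bytes_per_metric, interval_s, duration_s):
    """Rough storage estimate: number of samples times record size."""
    return (duration_s // interval_s) * n_metrics * bytes_per_metric

# Baseline: 20 metrics every 5 minutes for 30 days -> about 1.4 MB in total
print(collection_volume_bytes(20, 8, 300, 30 * 24 * 3600))
# Crisis: 200 metrics every 2 seconds -> about 2.9 MB per hour
print(collection_volume_bytes(200, 8, 2, 3600))
```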
3.2.4 Particular Metric Needs
Performance tools come from various sources, and people grow accustomed to particular ones depending on their background. These become their favorite tools, whether or not they are best for the job. Some tools were written for specific purposes; for example, vmstat was written specifically to display virtual memory and CPU metrics, but it reports no disk statistics. The tool chosen should supply the metrics needed to diagnose the performance issue at hand.
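When a familiar tool does report the needed metrics, it can still be worth scripting around it rather than reading screens by eye. The sketch below drives vmstat from Python; the interval, sample count, and field names are illustrative, since column layout varies between Unix variants.

```python
import subprocess

# Three samples at a 5-second interval; on most systems the first sample
# reports averages since boot and is usually discarded.
out = subprocess.run(["vmstat", "5", "3"], capture_output=True, text=True).stdout
lines = out.strip().splitlines()
columns = lines[1].split()                  # the second header row names the fields
for sample in lines[2:]:
    fields = dict(zip(columns, sample.split()))
    # run queue, free memory, and CPU idle; names vary by vmstat variant
    print(fields.get("r"), fields.get("free"), fields.get("id"))
```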
3.2.5 Metric Documentation
There are several hundred performance metrics available from the kernel. These metrics were developed over time, and some are better documented than others. Some of the metrics may have a one-line description that is incomprehensible. Only by reviewing kernel source code can one hope to determine the meaning of some of the more esoteric metrics. Of course, the availability of kernel source code is limited, as is the desire to review it.
For example, the manual page for the tool vmstat defines the field at as the number of address translation faults. Those of us who are familiar with hardware components of modern computers might readily conclude that this field counts the number of Translation Lookaside Buffer (TLB) faults. This would be a very desirable metric, since it would give one indication of how well the CPU address translation cache is performing. Unfortunately, only by reviewing the kernel source code can one determine that the at field in the vmstat report is really referring to the number of page faults, a virtual memory system metric.
Good documentation is needed to determine what metrics are important to the purpose of the measurement and to learn how to interpret them.
3.2.6 Metric Accuracy
In order to understand completely why some performance metrics are inaccurate, one must understand how the kernel is designed. For instance, although most CPU hardware clocks measure time in microseconds, the granularity of the HP-UX system clock is 10 milliseconds. Most Unix-based operating systems record CPU consumption on a per-process basis by noting which process was running when the clock ticked. That process is charged for consuming CPU time during the entire clock tick, whether or not it used all of the tick. So the saying "garbage in, garbage out" applies to performance tools in a loose sense: if the source data are inaccurate, the metrics presented will be inaccurate.
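A small simulation makes the distortion visible. In the sketch below, process A always yields the CPU before the clock fires, so tick-based accounting charges it nothing despite a steady 30 percent actual utilization; the workload shape is invented purely to illustrate the sampling bias.

```python
TICK_MS = 10      # system clock granularity described above
N_TICKS = 1000    # ten seconds of simulated time

# Process A runs the first 3 ms of every tick and then sleeps; process B
# runs the remaining 7 ms. The clock fires at the end of each tick and
# charges whichever process is running for the entire 10 ms.
actual = {"A": 0, "B": 0}
charged = {"A": 0, "B": 0}
for _ in range(N_TICKS):
    actual["A"] += 3
    actual["B"] += 7
    charged["B"] += TICK_MS   # B is always on the CPU when the tick fires

print("actual :", actual)     # {'A': 3000, 'B': 7000}
print("charged:", charged)    # {'A': 0, 'B': 10000} -- A's usage is invisible
```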
The standard Unix performance tools use the kernel's standard sources of data. This method of determining global and per-process CPU utilization was fine in the early days of Unix when systems were much slower. Newer methods have been developed by Hewlett-Packard to more accurately characterize system performance. Tools developed by HP get their data from the new sources in the kernel, which provides for greater metric accuracy. The IEEE POSIX (P1004) committee, the Open Group, and the Performance Working Group (PWG) are all reviewing HP's implementation for adoption into a standard that could be implemented in other versions of Unix.
3.2.7 Saturation of Certain System Resources
When one or more system resources are saturated, it may be necessary to concentrate on particular metrics in order to determine the source of the saturation. However, merely invoking a performance tool causes additional overhead. The tool chosen should be one that does not exacerbate the saturation of the resource of concern. Tool overhead will be discussed more fully in Chapter 4, "Kernel Instrumentation and Performance Metrics" on page 39 and Chapter 5, "Survey of Unix Performance Tools" on page 51.
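One way to keep a collector honest about its own cost is to measure it. The sketch below times a sampling loop against the process's own CPU consumption; the sample function and parameters are hypothetical stand-ins for whatever measurement is actually being made.

```python
import os
import time

def collector_cpu_fraction(sample_fn, n_samples, interval_s):
    """Estimate the CPU cost of a measurement loop by comparing this
    process's own CPU time before and after the run."""
    before = os.times()
    for _ in range(n_samples):
        sample_fn()
        time.sleep(interval_s)
    after = os.times()
    cpu_s = (after.user - before.user) + (after.system - before.system)
    return cpu_s / (n_samples * interval_s)   # fraction of one CPU consumed

# Illustrative check: what does sampling the load average itself cost?
print(f"collector overhead: {collector_cpu_fraction(os.getloadavg, 10, 1.0):.4%}")
```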
3.2.8 Relationship Between the Metric and the Application
Metrics by their nature usually count how many times some piece of code or hardware is used. The relationship of a given metric to a particular application and to the needs of its users must be established by analysis and interpretation of the data, plus a complete understanding of the application.
3.2.9 Qualitative versus Quantitative Measurements
One last point about measurements: the preceding discussion involved quantitative measures of how the system is performing. Other, equally important measures of system performance are qualitative in nature. There is a management style called "Management by Wandering Around," or MBWA, in which the manager visits a wide variety of personnel and asks questions that help to take the pulse of the organization. In the performance arena, MBWA becomes "Measuring by Wandering Around." Talk to the users. Ask them how they perceive system and application performance. Find out why they might feel that performance is bad. Watch how they interact with the system. Users may be interacting with the system in ways that the application designers never imagined, causing the application to behave poorly.
Another type of qualitative measurement can be made by interacting with the system directly and noting its responsiveness or lack thereof. Logging on to the system, issuing a simple command such as ls(1) for a directory listing, and noting the response time yields a qualitative measure of interactive performance.
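If desired, that spot check can be made repeatable. The sketch below times a trivial command several times and reports the spread; the command and repetition count are arbitrary choices, and the result remains a rough qualitative signal rather than a controlled benchmark.

```python
import subprocess
import time

# Time a trivial command a few times; a widening gap between the best and
# worst case is itself a useful signal of degrading interactive response.
timings = []
for _ in range(5):
    start = time.perf_counter()
    subprocess.run(["ls", "/tmp"], stdout=subprocess.DEVNULL)
    timings.append(time.perf_counter() - start)
print(f"ls latency: min {min(timings)*1000:.1f} ms, max {max(timings)*1000:.1f} ms")
```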
3.2.10 Summary
In summary, the reasons for making measurements are:
- Measurements are key to understanding what is happening in the system.
- Measurements are the only way to know what to tune.
In addition, it is necessary to measure periodically in order to proactively monitor system and application performance so that problems can be prevented or resolved quickly.