Lab Tests and Record Keeping
Patients typically spend less time with the doctor than the doctor's organization spends on the patient's file. Indeed, patients are not generally allowed to actually see a doctor until their file is complete.
Lab Tests
Because medical diagnostic tests have real costs associated with them, doctors order them sparingly, based on clinical signs. In computing, tests are cheap and easily automated, so vast quantities of data can be gathered.
Notwithstanding the low direct cost of collecting performance data, it should still be done only as clinical signs warrant. The main hazard of computer performance monitoring is that it is not totally free. The mechanics of measurement can significantly skew the system being measured. This is sometimes called the probe effect or the Heisenberg effect. Just ask anyone who has ever been wired up for a sleep study, and they will likely testify that it was among the worst nights of sleep they ever had!
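The probe effect can be observed directly on a small scale. The following is a minimal sketch, not a benchmark methodology: it runs the same loop twice, once bare and once with a per-iteration probe that records a timestamp, and reports both elapsed times. The function names (`workload`, `probe_effect`) and the loop itself are hypothetical stand-ins for a real workload and a real monitoring tool.

```python
import time


def workload(n, probe=None):
    """A stand-in workload: sum of squares, optionally calling a probe each iteration."""
    total = 0
    for i in range(n):
        total += i * i
        if probe is not None:
            probe(i, total)
    return total


def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def probe_effect(n=200_000):
    """Run the workload without and with instrumentation; return both timings."""
    samples = []

    def probe(i, total):
        # The act of measuring: take a timestamp and store a sample.
        samples.append((time.perf_counter(), i, total))

    plain_result, plain_s = timed(workload, n)
    probed_result, probed_s = timed(workload, n, probe=probe)
    # The probe must not change what the workload computes, only how long it takes.
    assert plain_result == probed_result
    return plain_s, probed_s, len(samples)


if __name__ == "__main__":
    plain_s, probed_s, count = probe_effect()
    print(f"unmonitored: {plain_s:.4f}s, monitored: {probed_s:.4f}s, samples: {count}")
```

The absolute numbers vary from machine to machine and run to run. The point is only that the monitored run does extra work for every sample it takes, so finer-grained probing costs proportionally more.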
The degree to which a workload can be skewed by monitoring depends on the intrusiveness of the tools being used and the degree to which they are used. The impact of monitoring particular components of a very complex system varies depending on how the monitored component relates to other components in the overall work flow. Keeping the effect of measurement overhead aligned with strategic goals and tradeoffs is part of the art of choosing appropriate instrumentation. There are various strategic motivations for use of performance monitoring tools:
- Health monitoring (establishing norms for a workload and using them to detect when operations become abnormal)
- Capacity planning (gathering data to help forecast when additional capacity will be needed)
- Gaining insight (discovering opportunities for optimization or formulating hypotheses for making high-level architectural decisions)
- Diagnosing (accurately discovering the root cause of a performance problem)
Tools and techniques used for health monitoring and capacity planning purposes should not be expected to provide much value in the pursuit of gaining insight or diagnosing. Conversely, incorrect or excessive use of tools for gaining insight or diagnosis can seriously skew the data used for health monitoring and capacity planning. It is not uncommon to diagnose that excessive monitoring lies at the root of performance complaints. Benchmarks tend to yield their best results with little or no monitoring activities competing for resources.
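As an illustration of the health-monitoring motivation listed above, the following minimal sketch establishes a norm for a single metric from historical samples and flags a new reading that falls far outside it. The use of mean and standard deviation, the threshold of three standard deviations, and the sample values are all assumptions chosen for illustration; real workloads often need norms that account for time of day or business cycles.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize a history of one metric as (mean, standard deviation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to establish a norm")
    return mean(samples), stdev(samples)


def is_abnormal(value, baseline, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the norm."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any different value is a departure from the norm.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical history of a metric (for example, percent CPU busy at a fixed hour).
history = [41.0, 39.5, 40.2, 40.8, 39.9, 40.6]
baseline = build_baseline(history)

print(is_abnormal(40.4, baseline))  # False: well within the established norm
print(is_abnormal(55.0, baseline))  # True: far outside the established norm
```

A check like this needs only infrequent, coarse samples, which is consistent with keeping health monitoring lightweight.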
Record Keeping
Every item in a medical patient folder has a clear purpose:
- Basic patient data, including contact and billing information
- Drug allergies and current medications
- History of complaints, diagnoses, treatments, and ongoing conditions
- Results of routine physicals
- Lab reports
- Referrals and reports from other doctors
In contrast, there is a noteworthy lack of standard practices in building a similar folder for computer systems. Perhaps such a file should contain, at a minimum:
- Basic customer data, including contact and billing information
- Configuration information, not only for the system itself, but also for key applications
- History of complaints, diagnoses, treatments, and ongoing conditions
- Routinely monitored data
- Special test reports, including the conditions under which they were obtained
- Reports and findings of consultants and system administrators
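One way to make such a folder concrete is as a simple record type. The sketch below mirrors the six items listed above; the class and field names (`SystemFolder`, `HistoryEntry`, and so on) are hypothetical and are not drawn from any existing tool or standard.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class HistoryEntry:
    """One complaint, with its diagnosis and treatment once they are known."""
    when: datetime
    complaint: str
    diagnosis: str = ""
    treatment: str = ""
    ongoing: bool = False


@dataclass
class SystemFolder:
    """A per-system record, analogous to a patient folder."""
    customer: dict                                       # contact and billing information
    configuration: dict = field(default_factory=dict)    # system and key applications
    history: list = field(default_factory=list)          # HistoryEntry items
    monitored_data: list = field(default_factory=list)   # routinely collected samples
    test_reports: list = field(default_factory=list)     # include collection conditions
    findings: list = field(default_factory=list)         # consultant and administrator reports

    def ongoing_conditions(self):
        """Return the history entries that are still unresolved."""
        return [entry for entry in self.history if entry.ongoing]


folder = SystemFolder(customer={"name": "Example Corp"})
folder.history.append(
    HistoryEntry(datetime(2005, 3, 1), "slow nightly batch", ongoing=True)
)
print(len(folder.ongoing_conditions()))  # 1
```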
A frequently recommended practice in systems management is to record any and all system changes in a central change log. This is a good way to capture the system history, but it can be difficult to determine the system's state at any given point in time without keeping periodic snapshots of its configuration state. One principal method of acquiring system configuration data from the Solaris OS is the Sun™ Explorer tool, which is freely downloadable from the SunSolve Online℠ program site at http://sunsolve.sun.com.
When run with its defaults, the Sun Explorer software can be quite intrusive on a system running a production workload, but its usage can be tailored to be less intrusive as the occasion requires. Among the data it collects are:
- /etc/system and other /etc files
- Details of the storage stack, down to the disk firmware level, when possible
- showrev -p and pkginfo -l output, which can be fed into the patchdiag tool for patch analysis (the patchdiag tool is available on the SunSolve Online program site)
The Sun Explorer software collects its outputs in a simple file hierarchy, which is easily navigated both by humans and automated system-analysis tools.
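Because each snapshot is an ordinary file hierarchy, two snapshots taken at different times can be compared mechanically to see what changed between them. The following is a minimal sketch of that idea using only the Python standard library. It makes no assumptions about the actual layout of Sun Explorer output: it compares any two directory trees, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def index_tree(root):
    """Map each file's path (relative to root) to a SHA-256 digest of its contents."""
    root = Path(root)
    index = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            index[path.relative_to(root).as_posix()] = digest
    return index


def diff_snapshots(old_root, new_root):
    """Report files added, removed, or changed between two snapshot directories."""
    old, new = index_tree(old_root), index_tree(new_root)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Calling `diff_snapshots("snapshots/older", "snapshots/newer")` (hypothetical paths) returns three sorted lists of relative paths. Cross-checking the "changed" list against the central change log is one way to spot changes that were never recorded.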
One of the biggest challenges in data collection is observing and logging real business metrics and first-order indicators of business performance. The value of archival configuration and operational data is limited if it cannot be correlated with actual business throughput.
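To illustrate what correlating operational data with business throughput can mean in practice, the sketch below pairs two time series by shared timestamp and computes their Pearson correlation coefficient. The choice of Pearson correlation, the metric names, and the sample values are assumptions for illustration only; the right business metric and the right statistical treatment depend on the workload.

```python
from math import sqrt


def align(operational, business):
    """Pair values that share a timestamp key; unmatched samples are dropped."""
    shared = sorted(operational.keys() & business.keys())
    return [operational[t] for t in shared], [business[t] for t in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two series of equal length with at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples keyed by collection time.
cpu_busy_pct = {"09:00": 35, "09:05": 50, "09:10": 80, "09:15": 95}
orders_per_min = {"09:00": 120, "09:05": 150, "09:10": 210, "09:20": 90}

xs, ys = align(cpu_busy_pct, orders_per_min)
print(len(xs), round(pearson(xs, ys), 3))
```

Only the three timestamps present in both series are used here, which shows why business metrics need to be logged on a schedule that lines up with the operational data.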