STEP 4. Monitoring CPUs
Having identified any memory, disk, and network bottlenecks, we are finally ready to look at CPU utilization.
One of the reasons for leaving CPU until last is that there is more to monitor with CPUs. If you start here, you risk getting bogged down in detail and losing sight of the big picture. But the main reason for monitoring CPU last is that it isn't necessarily bad if your CPUs are heavily utilized.
Why would server CPUs be less than heavily utilized? CPUs on a well-tuned server (one with no memory, disk, or network bottlenecks) will either be idle because there is no work to do or will be busy much of the time.
If there is work to do, you should expect the CPUs to be doing it. If there is work to do and the CPUs aren't busy, it is probably because there is a bottleneck somewhere else, perhaps in the I/O or memory subsystems. Non-CPU bottlenecks should be resolved if at all possible to allow work to proceed uninterrupted.
The aim, then, is to ensure that your workload is CPU-limited rather than limited by memory availability, disk performance, or network performance. Once you have achieved that, optimization remains important, especially application optimization, to ensure that the CPUs are not wasting cycles.
If CPU power were infinite, a server would never be CPU-bound. In the real world, however, consistently idle CPUs suggest the system has been oversized.
That said, on a multiuser system there are nearly always periods when some users are idle. If CPUs are heavily utilized doing useful work all or most of the time, check user response times and batch job completion times. If response and completion times prove barely acceptable during periods of normal processing load, the server is unlikely to be able to handle peak periods gracefully. On a large multiuser SMP system, a reasonable average CPU utilization is 70%, increasing to 90% during peak periods.
Don't immediately assume that a CPU-limited system is behaving normally, though. To monitor the health of a system with respect to CPU, start by looking at system utilization.
What to Look For: System Utilization
First, use vmstat to check how busy the CPUs are. We're not looking for detail initially; the aim at this point is to get the view from 20,000 feet. The relevant statistics to look at are CPU user% (us) and system% (sy), and the size of the run queue (r).
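For example, the following invocation samples every five seconds (the interval is an arbitrary choice); ignore the first line of output, which reports averages since boot:

    # Sample CPU and run queue statistics every 5 seconds.
    # Watch r (run queue), us (user%), sy (system%), and id (idle%).
    vmstat 5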
Consider the vmstat trace in Figure 6.
Figure 6 vmstat trace of a lightly loaded system
The CPU is only lightly utilized: id (CPU idle%) is significantly greater than zero. Not surprisingly, the run queue (r under procs) is zero, meaning no runnable processes are waiting for CPU time.
By contrast, the vmstat trace in Figure 7 shows a fully utilized system.
Figure 7 vmstat trace of a fully utilized system
The run queue shows between 30 and 50 runnable processes and 4 or 5 blocked for I/O, and no idle CPU at all. The run queue does not include processes currently executing on the CPUs, only processes waiting for CPU time. A large number of processes blocked for I/O (the b column under procs) can suggest a disk bottleneck.
Is it a problem to have an average of 40 processes waiting on the run queue for a turn on the CPUs? That depends entirely on the number of CPUs in the system: on a 64-CPU system, that situation may not be an issue; on a single-CPU server, it is likely to be a major problem.
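A quick sanity check is to compare the run queue against the number of online CPUs; a minimal sketch (psrinfo lists one line per processor):

    # Count online CPUs, then watch vmstat's r column against that
    # count; a sustained r well above it means processes are queuing.
    psrinfo | wc -l
    vmstat 5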
The us/sy (user/system) ratio in Figure 7 is over 4.5/1, which typically indicates a very healthy balance between CPU time spent on user applications and on kernel activity (including I/O). If system% approaches or exceeds user%, a lot of time is being spent processing system calls and interrupts, possibly indicating that excessive time is being spent on disk or network I/O.
What to Look For: Kernel Statistics
The information in Figure 8 is extracted from a statit trace monitoring system activity over a 30-second period (run with statit sleep 30). Most of the disk information has been removed to reduce the size of the output.
Figure 8 statit trace
statit shows a lot of information, including CPU, memory paging, and network and disk statistics. Part of the attraction of statit is the comprehensiveness of the information it provides. Let's look at a few highlights.
When looking at CPU utilization, don't be confused by I/O wait time. I/O wait time is highly misleading and should be regarded simply as idle time, so add wait time and idle time together to determine the true idle time. Do the same for sar (add the wio and idl columns to determine true idle time).
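For example (the interval and count are arbitrary choices):

    # Report CPU utilization 10 times at 30-second intervals; treat
    # the sum of the wio and idl columns as the true idle percentage.
    sar -u 30 10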
Check context switches and involuntary context switches. A context switch occurs when a process or thread is moved onto or off a CPU. An involuntary context switch occurs when a running process or thread has consumed its allocated time quantum or is preempted by a higher-priority thread. If the ratio of total context switches to involuntary context switches is significantly less than about 3/1, it can indicate that processes are being preempted before they have completed their processing (usually processes voluntarily yield, that is, give up, the CPU when they request an I/O). A high level of involuntary context switching suggests there might be a benefit from using a modified TS dispatch table if your server is not a Starfire server (refer to "The TS Class" on page 220 of Configuring and Tuning Databases on the Solaris Platform for more information).
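statit reports both counters directly; mpstat also shows them per CPU (the csw and icsw columns). A rough sketch of computing the ratio from mpstat output, assuming csw and icsw are the seventh and eighth columns (verify against the header on your release):

    # Compute the ratio of total to involuntary context switches over
    # one 30-second interval, using only the second mpstat report
    # (the first covers the period since boot).
    mpstat 30 2 | nawk '/^CPU/ { n++ }
        n == 2 && $1 ~ /^[0-9]/ { csw += $7; icsw += $8 }
        END { if (icsw) printf("csw/icsw = %.1f\n", csw/icsw) }'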
Semaphore operations (semop() calls) and message queue calls (msgrcv()+msgsnd() calls) are the typical mechanisms used by databases for Interprocess Communication (IPC) and indicate the degree of synchronization traffic between database processes (usually primarily for internal locks and latches).
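You can list the IPC resources a database instance has allocated with ipcs (a quick inventory check, not a traffic measure):

    # List active semaphore sets and message queues.
    ipcs -s
    ipcs -q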
TIP
Semaphore operations can increase exponentially when a database server becomes significantly overloaded. Such behavior is a symptom rather than a cause of poor performance, but it is a good indication that the CPU is unable to effectively complete the work it is doing and that more CPU resource is required.
For the sake of reference, pageouts and pgs xmnd by pgout daemon are equivalent to po and sr, respectively, in a vmstat trace.
A high level of faults due to s/w locking reqs can suggest that ISM is not being used when shared memory is attached (ISM is described in "Intimate Shared Memory" on page 24 of Configuring and Tuning Databases on the Solaris Platform). Oracle and Sybase, for example, will try to attach shared memory as ISM, but if unsuccessful will attach shared memory without ISM. In each case an advisory message is placed in the database log file, but the onus is on you to notice it. A method of determining whether ISM is being used is discussed in "EXTRA STEP: Checking for ISM" on page 24.
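One quick check, assuming you know the process ID of a database process (12345 here is hypothetical): pmap labels an ISM attach as ism shmid, whereas a plain attach shows only shmid.

    # Look for the shared memory segment in the process address map.
    /usr/proc/bin/pmap 12345 | grep shmid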
Drilling Down Further
Monitoring Processes
Sometimes individual processes hog CPU resource, causing poor performance for other users. Use /usr/ucb/ps -aux, or prstat as of Solaris 8, to find the processes consuming the most CPU (note that the ps process is itself a reasonably heavy consumer of CPU cycles, especially on systems with many processes). The sdtprocess utility (shipped as part of the CDE package within Solaris) offers a useful X11-based representation of the same data.
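For example (the counts passed to head and prstat are arbitrary):

    # BSD-style ps sorts its output by CPU usage; show the top consumers.
    /usr/ucb/ps -aux | head -10
    # On Solaris 8 and later, prstat gives a self-updating view
    # sorted by CPU usage.
    prstat -s cpu -n 10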
Figure 9 shows a trace where no particular process is hogging CPU.
Figure 9 ps trace of multiple Oracle shadow processes
In this example, a lot of processes are running, but none are taking more than 0.4% of all available CPU.
In Figure 10 a couple of processes are consuming many times more CPU than other processes.
Figure 10 ps trace of CPU hogging processes
The TIME column also shows that the CPU hogs have each consumed 5 minutes of CPU time. How much performance impact they will have depends on the number of online CPUs and other active processes. The %CPU column shows the percentage of all available CPUs, not the percentage of a single CPU. In this example, the two rogue processes are each consuming one full CPU out of eight (hence 12.5 %CPU). You can use pstack and truss to get an indication of what these CPU-hogging processes are doing.
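For example, a few stack snapshots taken seconds apart often show where a hog is spending its time (12345 is a hypothetical PID taken from the ps output):

    # Print the user-level stack(s) of the process; repeat a few
    # times. Near-identical stacks each time suggest a tight loop.
    /usr/proc/bin/pstack 12345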
Figure 11 illustrates a method of finding the process consuming the most CPU (with ps), then listing the system calls the process is running (with truss). I stopped truss after about 10 seconds with Control-C; at that point the system call stats were printed.
Figure 11 truss system call trace of single process
Read system calls dominated, with lseek close behind. The use of lseek indicates that the application is not using the pread(2) system call, which saves a system call by eliminating the need for lseek(2).
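A minimal version of this procedure, assuming ps identified PID 12345 as the top consumer:

    # Count the process's system calls; stop truss with Control-C
    # after 10 seconds or so to print the summary. Roughly equal
    # read and lseek counts suggest pread(2) could halve those calls.
    truss -c -p 12345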
Monitoring Interrupts
The trace in Figure 12 is from mpstat on an 8-CPU server.
Figure 12 mpstat trace for an 8-CPU server
Note that interrupts (intr) are not evenly spread across all CPUs. CPUs 11, 14, and 18 are processing more interrupts than the other CPUs. These CPUs show the highest system (sys) activity, but not the highest user (usr) activity.
Disk array and network drivers are bound to specific CPUs when Solaris boots. These CPUs handle interrupts related to these devices on behalf of all other CPUs. Notice, too, that Solaris has scheduled the running processes on the CPUs that are not busy servicing interrupts (CPUs 10, 1, and 19 show significantly greater user activity).
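To see which devices are generating the interrupts, vmstat offers a complementary per-device view (it reports counts per device, not which CPU services them):

    # Report interrupt counts and rates per device.
    vmstat -i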
What You Can Do to Optimize CPU Usage
Following is a list of some common causes of heavy CPU utilization, ways to track them down, and remedies to apply.
An unusually high level of system calls. If the ratio of user% to system% is low, try to find out the causes by using truss -c to identify the main system calls for a few of the most CPU-intensive processes. If read I/Os are a major factor, increasing the size of the database buffer cache might reduce the number of read I/Os.
A low ratio of context switches to involuntary context switches. On non-Starfire platforms, load the Starfire TimeShare Dispatch Table (see Chapter 15 of Configuring and Tuning Databases on the Solaris Platform; a dispadmin sketch appears after this list).
One or more inefficient applications. Monitor processes with ps to identify the applications consuming the most CPU. Poorly written applications can have a major impact on CPU requirements and therefore offer one of the most fruitful places to begin looking when you are trying to free up CPU resources.
Poor database tuning. The next chapters of Configuring and Tuning Databases on the Solaris Platform might help you identify inefficient database behavior. A high level of latch contention, for example, can cause CPU to be consumed for little benefit. If latch contention is the cause of heavy CPU utilization, adding more CPUs may not always help.
Insufficient CPU resources. If your CPUs are consistently fully or heavily utilized and there are many more processes on the run queue than there are CPUs, you may simply need to add more CPUs. Fortunately, Solaris scales well enough that adding CPUs is likely to help when CPU resources are scarce.
An alternative might be to remove applications from the database server onto another server to use the database's client/server capabilities. Removing applications can make a big difference to CPU utilization on the database server.
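For reference, here is how a modified TS dispatch table might be loaded with dispadmin, as mentioned in the second item above (a sketch; ts_table.new is a hypothetical table file in the format produced by the -g option):

    # Save the current time-sharing dispatch table, then load a
    # modified one.
    dispadmin -c TS -g > ts_table.cur
    dispadmin -c TS -s ts_table.new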