Measuring CPU Load
Various tools are available for monitoring CPU load, but it's usually the combination of these tools that provides the most useful data.
Other than just showing how long the system has been up, the uptime command can be used to give you a rough estimate of the system load. The uptime command prints the current time, the length of time that the system has been up, and the average number of jobs in the run queue over the last 1, 5, and 15 minutes. When I type the uptime command, the system responds with the following:
11:23am up 2 day(s), 19:15, 1 user, load average: 0.01, 0.02, 0.04
Let's look at the load average numbers. The load average is the number of jobs in the run queue plus the number of jobs currently running on the CPUs, averaged over each interval. In short, it's a rough estimate of CPU demand. The three figures show the averages over the last 1, 5, and 15 minutes. High load averages mean that the system is being used heavily and response time is likely to be sluggish. What is a high load average? It depends on your system. If you've been keeping an eye on the load average, the system's history will tell you which values are normal and which signal trouble. Normally, I would say a load average of 3 or less is good, but I've seen systems with a load average of 5 in which performance is still good. Different system configurations, including systems with different numbers of CPUs, behave differently under the same load average.
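To make the per-system judgment above a little more concrete, the Python sketch below extracts the three load averages from an uptime line like the one shown earlier and divides each by the number of CPUs, because the same load average means much less on a multiprocessor than on a single-CPU system. The function names, the two-CPU count, and the per-CPU threshold of 1.0 are hypothetical values chosen for illustration, not Solaris defaults; as noted above, a sensible threshold comes from your own system's history.

```python
import re

# Matches the "load average: a, b, c" portion of uptime output.
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load_averages(uptime_line: str) -> tuple[float, float, float]:
    """Return the 1-, 5-, and 15-minute load averages from an uptime line."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError(f"no load average found in: {uptime_line!r}")
    one, five, fifteen = (float(group) for group in match.groups())
    return one, five, fifteen


def per_cpu_load(loads: tuple[float, ...], ncpus: int) -> tuple[float, ...]:
    """Divide each load average by the CPU count supplied by the caller."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return tuple(round(load / ncpus, 3) for load in loads)


if __name__ == "__main__":
    line = "11:23am up 2 day(s), 19:15, 1 user, load average: 0.01, 0.02, 0.04"
    loads = parse_load_averages(line)
    # Hypothetical two-CPU system and threshold; substitute your own values.
    normalized = per_cpu_load(loads, ncpus=2)
    print(loads, normalized)  # (0.01, 0.02, 0.04) (0.005, 0.01, 0.02)
    print("heavily loaded" if normalized[0] > 1.0 else "load looks normal")
```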
Keep in mind that the load average is simply a starting point. A low load average does not guarantee fast response times.
The ps command will give you more useful information regarding what is going on with your system. Use the following options with the ps command to get a complete picture of all the processes running on your system:
ps -elf
The system responds with the following:
 F S    UID   PID  PPID  C PRI NI ADDR   SZ WCHAN    STIME TTY  TIME CMD
19 T   root     0     0  0   0 SY    ?    0         Apr 30 ?    0:18 sched
 8 S   root     1     0  0  40 20    ?  150     ?   Apr 30 ?    0:00 /etc/init -
19 S   root     2     0  0   0 SY    ?    0     ?   Apr 30 ?    0:00 pageout
19 S   root     3     0  0   0 SY    ?    0     ?   Apr 30 ?    1:01 fsflush
 8 S   root   333     1  0  40 20    ?  217     ?   Apr 30 ?    0:00 /usr/lib/saf/sac -t 300
 8 S   root  2087     1  0  40 20    ?  239     ? 10:42:32 ?    0:00 /bin/ksh /usr/dt/bin/sdtvolcheck -d
 8 S   root   144     1  0  40 20    ?  273     ?   Apr 30 ?    0:00 /usr/sbin/rpcbind
 8 S   root    52     1  0  40 20    ?  268     ?   Apr 30 ?    0:00 /usr/lib/sysevent/syseventd
 8 S   root    62     1  0  40 20    ?  343     ?   Apr 30 ?    0:01 /usr/lib/picl/picld
 8 S   root   190     1  0  40 20    ?  562     ?   Apr 30 ?    0:00 /usr/lib/autofs/automountd
 8 S   root   233     1  0  40 20    ?  173     ?   Apr 30 ?    0:00 /usr/lib/power/powerd
 8 S   root   166     1  0  40 20    ?  292     ?   Apr 30 ?    0:00 /usr/sbin/inetd -s
 8 S daemon   183     1  0  40 20    ?  306     ?   Apr 30 ?    0:00 /usr/lib/nfs/statd
 8 S   root   201     1  0  40 20    ?  410     ?   Apr 30 ?    0:00 /usr/sbin/syslogd
 8 S   root   220     1  0  40 20    ?  394     ?   Apr 30 ?    0:00 /usr/lib/lpsched
 8 S   root   180     1  0  40 20    ?  266     ?   Apr 30 ?    0:00 /usr/lib/nfs/lockd
 8 S   root   215     1  0  40 20    ?  449     ?   Apr 30 ?    0:01 /usr/openwin/bin/fbconsole -d :0
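When all you have is a saved ps -elf listing, a short script can pick out the processes that have accumulated the most CPU time. The Python sketch below, run against a few of the rows above, extracts the PID, the cumulative TIME, and the command from each row and sorts by TIME. It assumes the column order shown above and a TIME format of [DD-][HH:]MM:SS; the function names are hypothetical. Because TIME is cumulative since the process started, a large value does not by itself mean the process is busy right now.

```python
import re

# A few rows copied from the ps -elf listing above (header omitted).
PS_ROWS = """\
19 T   root     0     0  0   0 SY    ?    0         Apr 30 ?    0:18 sched
 8 S   root     1     0  0  40 20    ?  150     ?   Apr 30 ?    0:00 /etc/init -
19 S   root     3     0  0   0 SY    ?    0     ?   Apr 30 ?    1:01 fsflush
 8 S   root  2087     1  0  40 20    ?  239     ? 10:42:32 ?    0:00 /bin/ksh /usr/dt/bin/sdtvolcheck -d
 8 S   root    62     1  0  40 20    ?  343     ?   Apr 30 ?    0:01 /usr/lib/picl/picld
"""

# STIME is either "Mon DD" or "HH:MM:SS" and is followed by TTY, TIME, and CMD.
# Anchoring on STIME keeps a clock-style STIME from being mistaken for TIME.
ROW_RE = re.compile(
    r"(?:[A-Z][a-z]{2} +\d{1,2}|\d{2}:\d{2}:\d{2})"  # STIME
    r"\s+(\S+)"                                      # TTY
    r"\s+((?:\d+-)?(?:\d+:)?\d+:\d{2})"              # TIME, [DD-][HH:]MM:SS
    r"\s+(.*)$"                                      # CMD, may contain spaces
)


def cpu_seconds(time_field: str) -> int:
    """Convert a ps TIME value such as '1:01' or '2-03:04:05' to seconds."""
    days = 0
    if "-" in time_field:
        day_part, time_field = time_field.split("-", 1)
        days = int(day_part)
    seconds = 0
    for part in time_field.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def top_by_cpu_time(listing: str, n: int) -> list[tuple[int, int, str]]:
    """Return (cpu_seconds, pid, command) for the n rows with the most CPU time."""
    rows = []
    for line in listing.splitlines():
        match = ROW_RE.search(line)
        if match is None:
            continue  # header or a row in an unexpected format
        pid = int(line.split()[3])  # columns start F S UID PID
        rows.append((cpu_seconds(match.group(2)), pid, match.group(3)))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:n]


if __name__ == "__main__":
    for secs, pid, cmd in top_by_cpu_time(PS_ROWS, 3):
        print(f"{pid:>6} {secs:>6}s  {cmd}")
```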
The ps command was covered in detail in Chapter 15, "Managing Processes," so I won't go into detail on this command again.
The prstat command is similar to the ps command, except (as shown in Chapter 15) it continually updates the display of information on your screen. Use this command to watch processes on your system that might be eating up system resources. The sdtprocess GUI, also described in Chapter 15, provides a friendlier graphical version of this command.
vmstat provides a convenient summary of system activity as well. The first line of vmstat output always summarizes activity since boot time, so running vmstat without arguments tells you little about current conditions. To obtain useful real-time statistics, run vmstat with a time interval (in seconds) as follows:
vmstat 30
This tells vmstat to report statistics every 30 seconds, displaying the results on the screen as follows until you press Ctrl+C to interrupt the command:
 kthr    memory             page            disk          faults       cpu
 r b w   swap  free  re  mf pi po fr de sr dd f0 s0 --   in   sy   cs us sy id
 0 0 0 596704 31592   0   1  0  0  0  0  0  0  0  0  0  403   96   61  2  0 98
 0 0 0 595040 24624   2  12  0  0  0  0  0  1  0  0  0  404  104   62  0  0 99
 0 0 0 595040 24624   2  11  0  0  0  0  0  1  0  0  0  413  147   79  0  1 99
NOTE
Disregard the first line of output. This is a summary of information since the system was booted.
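Following the note above, a script that summarizes vmstat output should discard the first data row. The Python sketch below parses the sample report above (embedded as a string), skips both header lines and the since-boot row, and averages the cpu columns over the remaining interval samples. It assumes only that r is the first column and that us, sy, and id are the last three, which holds for the reports shown in this section; the disk columns vary from system to system, so the sketch does not rely on them. The function names are hypothetical.

```python
# The vmstat 30 report shown above, captured as text.
REPORT = """\
 kthr    memory             page            disk          faults       cpu
 r b w   swap  free  re  mf pi po fr de sr dd f0 s0 --   in   sy   cs us sy id
 0 0 0 596704 31592   0   1  0  0  0  0  0  0  0  0  0  403   96   61  2  0 98
 0 0 0 595040 24624   2  12  0  0  0  0  0  1  0  0  0  404  104   62  0  0 99
 0 0 0 595040 24624   2  11  0  0  0  0  0  1  0  0  0  413  147   79  0  1 99
"""


def parse_vmstat(report: str) -> list[dict[str, int]]:
    """Return one dict per interval sample, skipping the since-boot first row.

    Only r (first column) and us/sy/id (last three columns) are extracted,
    because the disk columns differ from system to system.
    """
    rows = []
    for line in report.splitlines():
        tokens = line.split()
        # Header lines contain names such as "kthr" and "--"; data rows are all digits.
        if not tokens or not all(t.isdigit() for t in tokens):
            continue
        values = [int(t) for t in tokens]
        rows.append({"r": values[0], "us": values[-3],
                     "sy": values[-2], "id": values[-1]})
    return rows[1:]  # first data row = averages since boot


def cpu_summary(samples: list[dict[str, int]]) -> dict[str, float]:
    """Average the us, sy, and id percentages across the interval samples."""
    if not samples:
        raise ValueError("need at least one interval sample after the first row")
    return {key: sum(s[key] for s in samples) / len(samples)
            for key in ("us", "sy", "id")}


if __name__ == "__main__":
    samples = parse_vmstat(REPORT)
    print(len(samples), cpu_summary(samples))  # 2 {'us': 0.0, 'sy': 0.5, 'id': 99.0}
```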
The vmstat command outputs columns of information with a header across the top. Each field of output is described in Table 19.1.
Table 19.1 vmstat Fields
Field       | Description
kthr/r      | Run queue length.
kthr/b      | Kernel threads blocked while waiting for I/O.
kthr/w      | Swapped-out lightweight processes (LWPs) that are waiting for resources.
memory/swap | Free, unreserved swap space (KB).
memory/free | Free memory (KB).
page/re     | Pages reclaimed from the free list.
page/mf     | Minor faults (page in memory but not mapped). If the page is still in memory, a minor fault remaps the page.
page/pi     | Paged in (KB/s), from swap or from file systems. (When a page must be brought back from the swap device, the process stops execution and waits, which can affect performance.)
page/po     | Paged out (KB/s), to swap or to file systems. The page has been written and freed.
page/fr     | Freed or destroyed (KB/s). This column reports the activity of the page scanner.
page/de     | Anticipated short-term memory shortfall (KB).
page/sr     | Scan rate (pages). This number is not reported as a "rate" but as a total number of pages scanned.
disk/s#     | Disk activity for disk # (disk operations per second). The column headers are abbreviated device names, such as dd, f0, s0, or m0 in the sample outputs.
faults/in   | Interrupts per second.
faults/sy   | System calls per second.
faults/cs   | Context switches per second.
cpu/us      | User CPU time (%).
cpu/sy      | System (kernel) CPU time (%).
cpu/id      | Idle + I/O wait CPU time (%).
NOTE
The free column in vmstat now really does mean memory that is free and not used by the page cache. In the past, it gave unreliable results.
The column labeled r under the kthr heading is the run queue: the number of kernel threads waiting to get onto the CPU(s). The id column is CPU idle time. If this column stays at 0 (zero) across samples, the system lacks the CPU resources to keep up with the process demand. Here's an example of a system that lacks CPU resources:
 kthr     memory                page              disk          faults       cpu
 r b w    swap   free re  mf  pi po   fr de  sr m0 m1 m2 m3   in   sy   cs us sy id
45 0 0 2887216 182104  3 707 449  6  455  0  80  2  6  1  0 1531 5797  983 61 30  9
58 0 0 2831312  46408  5 983 582 56 3211  0 492  0  0  0  0 1413 4797 1027 69 31  0
55 0 0 2830944  56064  2 649 656  3  806  0 121  0  0  0  0 1441 4627  989 69 31  0
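The idle-time rule described in the preceding paragraph can be expressed as a simple check. The Python sketch below flags each row of the report above in which idle time is 0 and the run queue (r) is larger than the number of CPUs. The second condition is an assumption added here (a common rule of thumb, not something this section states), and ncpus=4 is a hypothetical value; on a real system you would substitute the actual processor count, which psrinfo lists. The function name is also hypothetical.

```python
# The report from the CPU-starved system shown above (header lines included).
STARVED_REPORT = """\
 kthr     memory                page              disk          faults       cpu
 r b w    swap   free re  mf  pi po   fr de  sr m0 m1 m2 m3   in   sy   cs us sy id
45 0 0 2887216 182104  3 707 449  6  455  0  80  2  6  1  0 1531 5797  983 61 30  9
58 0 0 2831312  46408  5 983 582 56 3211  0 492  0  0  0  0 1413 4797 1027 69 31  0
55 0 0 2830944  56064  2 649 656  3  806  0 121  0  0  0  0 1441 4627  989 69 31  0
"""


def cpu_starved_rows(report: str, ncpus: int) -> list[tuple[int, int, int]]:
    """Return (r, us, sy) for rows with zero idle time and r greater than ncpus.

    Assumes r is the first column and us, sy, id are the last three,
    as in the Solaris vmstat reports shown in this section.
    """
    flagged = []
    for line in report.splitlines():
        tokens = line.split()
        if not tokens or not all(t.isdigit() for t in tokens):
            continue  # skip the two header lines
        values = [int(t) for t in tokens]
        r, us, sy, idle = values[0], values[-3], values[-2], values[-1]
        if idle == 0 and r > ncpus:
            flagged.append((r, us, sy))
    return flagged


if __name__ == "__main__":
    # ncpus=4 is a hypothetical CPU count; substitute the real value.
    for r, us, sy in cpu_starved_rows(STARVED_REPORT, ncpus=4):
        print(f"run queue {r}, user {us}%, system {sy}%: CPU-bound")
```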
In the output above, the CPU idle time drops to zero, the run queue (r) stays between 45 and 58, and the CPUs spend the majority of their time in user mode (see the us column). Two approaches can be taken here: add extra CPUs or look over the application code to determine if the application can be opti