4.3 Tools That Report Current System Status
4.3.1 Introduction
This section covers tools that report system-wide information, such as what processes are being run and how much the disk is being utilized.
4.3.2 Reporting Virtual Memory Utilization (vmstat)
vmstat is a very useful tool that ships with Solaris and reports the system's virtual memory and processor utilization. The information is aggregated over all the tasks of all the users of the system. Example 4.11 shows sample output from vmstat.
Example 4.11. Sample Output from vmstat
$ vmstat 1
procs     memory               page             disk         faults       cpu
r b w    swap    free  re  mf pi po fr de sr f0 sd sd --   in   sy   cs us sy id
0 0 0 5798208 1784568  25  61  1  1  1  0  0  0  1  0  0  120  170   94  9  6 85
0 0 0 5684752 1720704   0  15  0  0  0  0  0  0  0  0  0  155   35  135 50  0 50
0 0 0 5684752 1720688   0   0  0  0  0  0  0  0  0  0  0  117   10   98 50  0 50
0 0 0 5684560 1720496   0 493  0  0  0  0  0  0  0  0  0  114  260   91 49  1 50
0 0 0 5680816 1716744   2   2  0  0  0  0  0  0  0  0  0  118  196  103 50  0 50
0 0 0 5680816 1716648  18  18  0  0  0  0  0  0  0  0  0  148   23  116 50  0 50
0 0 0 5680816 1716584   0   0  0  0  0  0  0  0  0  0  0  115   19  100 50  0 50
0 0 0 5680752 1716520   0  40  0  0  0  0  0  0 22  0  0  129   14   99 50  4 46
0 0 0 5680496 1716264   0   0  0  0  0  0  0  0  0  0  0  109   24  100 50  0 50
0 0 0 5680496 1716184  11  11  0  0  0  0  0  0  0  0  0  140   23  107 50  0 50
Each column of the output shown in Example 4.11 represents a different metric; the command-line argument of 1 requested that vmstat report status at one-second intervals. The first row reports averages since the system was booted; subsequent rows report the activity during each one-second interval.
The columns that vmstat reports are as follows.
- procs: The first three columns report the status of processes on the system. The r column lists the number of processes in the run queue (i.e., waiting for CPU resources to run on), the b column lists the number of processes blocked (e.g., waiting on I/O, or waiting for memory to be paged in from disk), and the w column lists the number of processes swapped out to disk. If the number of processes in the run queue is greater than the number of virtual processors, the system may have too many active tasks or too few CPUs.
- memory: The two columns referring to memory show the amount of swap space available and the amount of memory on the free list, both reported in kilobytes. The swap space corresponds to how much data the processor can map before it runs out of virtual memory to hold it. The free list corresponds to how much data can fit into physical memory at one time. A low value for remaining swap space may cause processes to report out-of-memory errors. You can get additional information about the available swap space through the swap command (covered in Section 4.3.3).
- page: The columns labeled re to sr refer to paging information. The re column lists the number of pages containing data from files, either executables or data, that have been accessed again and therefore reclaimed from the list of free pages. The mf column lists the number of minor page faults, in which a page was mapped into the process that needed it. The pi column lists the number of kilobytes paged in from disk and the po column lists the number of kilobytes paged out to disk. The fr column lists the number of kilobytes freed. The de column lists the anticipated short-term memory shortfall in kilobytes, which gives the page scanner a target number of pages to free up. The sr column lists the number of pages scanned per second. A high scan rate (sr) indicates that the system is low on memory and is having to search through memory to find pages to send to disk. The solution is to either run fewer applications or put more memory into the machine. Continuously high values of pi and po indicate significant disk activity, due either to a high volume of I/O or to paging of data to and from disk when the system runs low on memory.
- disk: There is space to report on up to four disk drives, and these columns show the number of disk operations per second for each of the four drives.
- faults: There are three columns on faults (i.e., traps and interrupts). The in column lists the number of interrupts; these are used for tasks such as handling a packet of data from the network interface card. The sy column lists the number of system calls; these are calls into the kernel for the system to perform a task. The cs column lists the number of context switches, whereby one thread leaves the CPU and another is placed on the CPU.
- cpu: The final three columns are the percentage of user, system, and idle time. This is an aggregate over all the processors. Example 4.11 shows output from a two-CPU machine. With an idle time of 50%, this can mean that both CPUs are busy, but each only half the time, or that only one of the two CPUs is busy. In an ideal world, most of the time should be spent in user code, performing computations, rather than in the system, managing resources. Of course, this does not mean that the time in user code is being spent efficiently, just that the time isn't spent in the kernel or being idle. High levels of system time mean something is wrong, or the application is making many system calls. Investigating the cause of high system time is always worthwhile.
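The rules of thumb above can be applied to captured vmstat output with a short script. The following Python sketch is illustrative only: the key names d0 to d3 and cpu_sy are invented here to keep dictionary keys unique, and the scan_rate_limit threshold is an arbitrary assumption rather than a documented Solaris value. The sample row is the second row of Example 4.11.

```python
# Column names in the order shown in the header of Example 4.11.
# The four disk columns are renamed d0..d3, and the second "sy" column
# is renamed cpu_sy, so that every key is unique (names chosen here).
VMSTAT_COLUMNS = [
    "r", "b", "w", "swap", "free",
    "re", "mf", "pi", "po", "fr", "de", "sr",
    "d0", "d1", "d2", "d3",
    "in", "sy", "cs", "us", "cpu_sy", "id",
]


def parse_vmstat_line(line):
    """Turn one data row of vmstat output into a dict of integers."""
    values = [int(field) for field in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("unexpected number of vmstat columns")
    return dict(zip(VMSTAT_COLUMNS, values))


def vmstat_warnings(sample, ncpus, scan_rate_limit=200):
    """Return hints based on the run-queue and scan-rate rules of thumb.

    scan_rate_limit is an arbitrary illustrative threshold.
    """
    hints = []
    if sample["r"] > ncpus:
        hints.append("run queue longer than CPU count")
    if sample["sr"] > scan_rate_limit:
        hints.append("high page scan rate")
    return hints


# Second row of Example 4.11 (the first row is the since-boot average).
row = "0 0 0 5684752 1720704 0 15 0 0 0 0 0 0 0 0 0 155 35 135 50 0 50"
sample = parse_vmstat_line(row)
print(sample["free"], sample["us"], sample["id"])
print(vmstat_warnings(sample, ncpus=2))
```

For this row the list of hints is empty, which matches the text: the machine has free memory, no scanning, and an empty run queue.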
4.3.3 Reporting Swap File Usage (swap)
Swap space is disk space reserved for anonymous data (data that is not otherwise held on a filesystem). You can use the swap command to add and delete swap space from a system. It can also list the locations of swap space using the -l flag, and report a summary of swap space usage using the -s flag. Examples of output from both of these flags are shown in Example 4.12.
Example 4.12. Output from the swap Command
% swap -l
swapfile             dev  swaplo   blocks     free
/dev/dsk/c1t0d0s1   118,33     16 25175408 25175408
% swap -s
total: 2062392k bytes allocated + 1655952k reserved = 3718344k used, 36500448k available
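The summary line from swap -s is easy to check by script. The following Python sketch assumes the line has exactly the layout shown in Example 4.12; the function name and the percentage calculation are illustrative choices, not part of the swap command.

```python
import re


def parse_swap_summary(line):
    """Extract the four kilobyte figures from a `swap -s` summary line."""
    pattern = (
        r"total:\s*(\d+)k bytes allocated \+ (\d+)k reserved = "
        r"(\d+)k used, (\d+)k available"
    )
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("unrecognized swap -s output")
    allocated, reserved, used, available = (int(g) for g in match.groups())
    return {
        "allocated": allocated,
        "reserved": reserved,
        "used": used,
        "available": available,
    }


# Summary line copied from Example 4.12.
line = ("total: 2062392k bytes allocated + 1655952k reserved = "
        "3718344k used, 36500448k available")
summary = parse_swap_summary(line)

# "used" is the sum of the allocated and reserved figures.
assert summary["allocated"] + summary["reserved"] == summary["used"]

# Fraction of the total (used plus available) that is currently used.
percent_used = 100.0 * summary["used"] / (summary["used"] + summary["available"])
print(round(percent_used, 1))
```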
4.3.4 Reporting Process Resource Utilization (prstat)
prstat was a very useful addition to Solaris 8. It prints out a list of the processes that are consuming the most processor time, which can be helpful in identifying processes that are consuming excessive amounts of CPU resources. It also reports useful values, such as the amount of memory used.
Example 4.13 shows the first few lines of output from prstat. It reports a screen of information, each line representing a particular process. By default, the processes are listed starting with the one that is consuming the most CPU time.
Example 4.13. Sample Output from prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 29013 martin   4904K 1944K cpu0    40    0   0:01:15  44% myapplication/1
   210 root     4504K 2008K sleep   59    0   0:27:34 0.1% automountd/2
 29029 martin   4544K 4256K cpu1    59    0   0:00:00 0.1% prstat/1
   261 root     2072K    0K sleep   59    0   0:00:00 0.0% smcboot/1
...
The columns are as follows.
- PID: The process ID (PID), which is a unique number assigned to identify a particular process.
- USERNAME: The ID of the user owning the process.
- SIZE: The total size of the process. This is a measure of how much virtual address space has been allocated to the process. It does not measure how much physical memory the process is currently using.
- RSS: The resident set size (RSS) of the process, that is, how much of the process is actually in memory. The RSS of an application can fluctuate depending on how much data the application is currently using, and how much of the application has been swapped out to disk.
- STATE: The state of the process, that is, whether it is sleeping, on a CPU (as the two processes for "martin" are in the example), or waiting for a processor to run on.
- PRI: The priority of the process, which is a measure of how important it is for CPU time to be allocated to a particular process. The higher the priority, the more time the kernel will allow the process to be on a CPU.
- NICE: The nice value for the process, which allows the user to reduce the priority of an application to allow other applications to run. The higher the nice value, the less CPU time will be allocated to it.
- TIME: The CPU time that the process has accumulated since it started.
- CPU: The percentage of the CPU that the process has recently consumed.
- PROCESS/NLWP: The name of the executable, together with the number of lightweight processes (LWPs) in the process. From Solaris 9 onward, LWPs are equivalent to threads. prstat can also report activity on a per-thread basis using the -L flag.
You can obtain a more accurate view of system utilization by using the prstat command with the -m flag. This flag reports processor utilization using microstate accounting information. Microstate accounting is a more accurate breakdown of where the process spends its time. Solaris 10 collects microstate accounting data by default. Example 4.14 shows example output from this command.
Example 4.14. Output from prstat -m
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
  1946 martin   0.1 0.3 0.0 0.0 0.0 0.0 100 0.0  23   0 280   0 prstat/1
  5063 martin   0.2 0.0 0.0 0.0 0.0 0.0 100 0.0  24   0  95   0 gnome-panel/1
  5065 martin   0.2 0.0 0.0 0.0 0.0 0.0 100 0.0  13   0  22   0 nautilus/3
  7743 martin   0.1 0.0 0.0 0.0 0.0 0.0 100 0.0  61   0  76   0 soffice1.bin/6
  5202 martin   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  24   2  40   0 gnome-termin/1
...
Total: 115 processes, 207 lwps, load averages: 0.00, 0.01, 0.02
The columns in Example 4.14 are as follows.
- PID: The PID of the process.
- USERNAME: The User ID of the process owner.
- USR to LAT: The percentage of time spent by the process in the various modes: user mode (USR), system mode (SYS), system traps (TRP), text (i.e., program instruction) page faults (TFL), data page faults (DFL), user locks (LCK), sleeping (SLP), and waiting for the CPU (LAT).
- VCX and ICX: The number of context switches, voluntary (VCX) and involuntary (ICX). A voluntary context switch is one in which the process either completes its task and yields the CPU, or enters a wait state (such as waiting for data from disk). An involuntary context switch is one in which another higher-priority task is assigned to the CPU, or the process uses up its allocation of time on the CPU.
- SCL: The number of system calls.
- SIG: The number of signals received.
- PROCESS/NLWP: The name of the process (PROCESS) and the number of LWPs (NLWP).
It is possible to use the -s <column> flag to sort by a particular column. In Example 4.15, this is used to sort the processes by RSS.
Example 4.15. Output from prstat Sorted by RSS
$ prstat -s rss
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  8453 root      403M  222M sleep   49    0  13:17:50 0.0% Xsun/1
 28059 robin     218M  133M sleep   49    0   0:06:04 0.1% soffice2.bin/5
 28182 robin     193M   88M sleep   49    0   0:00:54 0.0% soffice1.bin/7
 26704 robin      87M   72M sleep   49    0   0:06:35 0.0% firefox-bin/4
...
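When prstat output has been captured to a file, the same ordering can be reproduced offline. The following Python sketch parses the rows of Example 4.13 and sorts them by RSS. It assumes that sizes are whole numbers with a K, M, or G suffix, as in the examples; the helper names are invented for illustration.

```python
# Multipliers to convert prstat size suffixes to kilobytes.
KB_PER_UNIT = {"K": 1, "M": 1024, "G": 1024 * 1024}


def size_to_kb(text):
    """Convert a prstat size such as '4904K' or '403M' to kilobytes."""
    return int(text[:-1]) * KB_PER_UNIT[text[-1]]


def parse_prstat_line(line):
    """Pick out the PID, memory columns, and process name from one row."""
    fields = line.split()
    return {
        "pid": int(fields[0]),
        "size_kb": size_to_kb(fields[2]),
        "rss_kb": size_to_kb(fields[3]),
        "process": fields[-1],
    }


# Data rows copied from Example 4.13 (default ordering, by CPU use).
rows = [
    "29013 martin 4904K 1944K cpu0 40 0 0:01:15 44% myapplication/1",
    "210 root 4504K 2008K sleep 59 0 0:27:34 0.1% automountd/2",
    "29029 martin 4544K 4256K cpu1 59 0 0:00:00 0.1% prstat/1",
    "261 root 2072K 0K sleep 59 0 0:00:00 0.0% smcboot/1",
]

processes = [parse_prstat_line(row) for row in rows]

# Largest resident set first, as prstat -s rss would list them.
by_rss = sorted(processes, key=lambda p: p["rss_kb"], reverse=True)
for p in by_rss:
    print(p["pid"], p["process"], p["rss_kb"])
```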
4.3.5 Listing Processes (ps)
ps displays a list of all the processes in the system. It is a very flexible tool and has many options. The output in Example 4.16 shows one example of what ps can report.
Example 4.16. Sample Output from ps
$ ps -ef
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root     0     0  0   Jul 06 ?        0:00 sched
    root     1     0  0   Jul 06 ?        0:01 /etc/init -
    root     2     0  0   Jul 06 ?        0:13 pageout
...
The options passed to the ps command in Example 4.16 are -e, to list all the processes; and -f, to give a "full" listing, which is a particular set of columns (in particular, it gives more information about how the application was invoked than the alternative -l "long" listing).
The columns in the output are as follows.
- UID: The UID of the user who owns the process. A large number of processes are going to be owned by root.
- PID: The PID of the process.
- PPID: The PID of the parent process.
- C: This column is obsolete. It used to report processor utilization used in scheduling.
- STIME: The start date/time of the application.
- TTY: The controlling terminal for the process (where the commands that go to the process are being typed). A question mark indicates that the process does not have a controlling terminal.
- TIME: The accumulated CPU time of the process.
- CMD: The command being executed (truncated to 80 characters). Under the -f flag, the arguments are printed as well, which can be useful for distinguishing between two processes of the same name.
One of the most useful columns is the total accumulated CPU time for a process, which is the amount of time it has been running on a CPU since it started. This column is worth watching to check that the critical programs are not being starved of time by the noncritical programs.
Most of the time it is best to pipe the output of ps to some other utility (e.g., grep), because even on an idle system there can be many processes.
4.3.6 Locating the Process ID of an Application (pgrep)
It is often necessary to find out the PID of a process to examine the process further. It is possible to do this using the ps command, but it is often more convenient to use the pgrep command. This command returns processes with names that match a given text string, or processes that are owned by a given user. Example 4.17 shows two examples of the use of this command. The first example shows the tool being used to match the name of an executable. In the example, the -l flag specifies that the long output format should be generated, which includes the name of the program. The second example shows the -U flag, which takes a username and returns a list of processes owned by that particular user—in this case, the processes owned by root.
Example 4.17. Output from pgrep
% pgrep -l soff
28059 soffice2.bin
28182 soffice1.bin
% pgrep -lU root
    0 sched
    1 init
    2 pageout
    3 fsflush
  760 sac
...
4.3.7 Reporting Activity for All Processors (mpstat)
The mpstat tool reports activity on a per-processor basis. It reports a number of useful measures of activity that may indicate issues at the system level. Like vmstat, mpstat takes an interval parameter that specifies how frequently the data should be reported. The first lines of output reported give the data since boot time; the rates are reported in events per second. Sample output from mpstat is shown in Example 4.18.
Example 4.18. Sample Output from mpstat
$ mpstat 1
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0   29   1   38  214  108 288   10    6   14   0   562  36   2  0  62
  1   27   1   27   44   29 177    9    6   67   0   516  33   2  0  65
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    7   0   11  207  103  64    9   10    0   0     7  39   1  0  60
  1    0   0    4   14    2  54   11   11    0   0     5  61   0  0  39
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0    6  208  106  60    7    8    0   0    14  47   0  0  53
  1    0   0   65   16    9  46    6    7    0   0     4  53   2  0  45
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0    6  204  103  36    6    3    0   0     5  68   0  0  32
  1    0   0    1    9    2  64    6    7    0   0     5  32   0  0  68
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0    2  205  104  14   10    2    0   0     4  98   0  0   2
  1    0   0    1   34   31  93    2    2    0   0    15   2   0  0  98
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0    8  204  104  40    2    6    0   0     5  51   0  0  49
  1    0   0    0   11    2  58    8    6    0   0     5  49   0  0  51
....
Each line of output corresponds to a particular CPU for the previous second. The columns in the mpstat output are as follows.
- CPU: The ID of the CPU to which the data belongs. Data is reported on a per-CPU basis.
- minf: The number of minor page faults per second. These occur when a page of memory is mapped into a process.
- mjf: The number of major page faults per second. These occur when the requested page of data has to be brought in from disk.
- xcal: The number of interprocessor cross-calls per second. A cross-call occurs when one CPU requests action from another. An example of this is where memory is unmapped through a call to munmap. The munmap call will use a cross-call to ensure that other CPUs also remove the mapping to the target memory range from their TLB.
- intr: The number of interrupts per second.
- ithr: The number of interrupt threads per second, not counting the clock interrupt. These are lower-priority interrupts that are handled by threads that are scheduled onto the processor to handle the interrupt event.
- csw: The number of context switches per second, where the process either voluntarily yields its time on the processor before the end of its allocated slot or is involuntarily displaced by a higher-priority process.
- icsw: The number of involuntary context switches per second, where the process is removed from the processor either to make way for a higher-priority thread or because it has fully utilized its time slot.
- migr: The number of thread migrations to another processor per second. Usually, best performance is obtained if the operating system keeps the process on the same CPU. In some instances, this may not be possible and the process is migrated to a different CPU.
- smtx: The number of times a mutex lock was not acquired on the first try.
- srw: The number of times a read/write lock was not acquired on the first try.
- syscl: The number of system calls per second.
- usr: The percentage of time spent in user code.
- sys: The percentage of time spent in system code.
- wt: The percentage of time spent waiting on I/O. From Solaris 10 onward, this will report zero because the method of calculating wait time has changed.
- idl: The percentage of time spent idle.
In the output in Example 4.18, the two processors are spending about 50% of their time in user code and 50% of their time idle. In fact, just a single process is running. What is interesting is that this process is migrating between the two processors (you can see this in the migrations per second). It is also apparent that processor 0 is handling most of the interrupts.
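These observations can be checked numerically. The following Python sketch takes two of the one-second intervals from Example 4.18 and, for each, sums the user time and migrations across both CPUs and computes the share of interrupts handled by the first CPU listed. The function and key names are invented for illustration.

```python
MPSTAT_COLUMNS = (
    "CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl"
).split()


def parse_mpstat_line(line):
    """Turn one per-CPU row of mpstat output into a dict of integers."""
    return dict(zip(MPSTAT_COLUMNS, (int(f) for f in line.split())))


def summarize(interval):
    """Aggregate one reporting interval (one row per CPU)."""
    cpus = [parse_mpstat_line(line) for line in interval]
    total_intr = sum(cpu["intr"] for cpu in cpus)
    return {
        "total_usr": sum(cpu["usr"] for cpu in cpus),
        "migrations": sum(cpu["migr"] for cpu in cpus),
        # Share of interrupts handled by the first row (CPU 0 here).
        "cpu0_intr_share": cpus[0]["intr"] / total_intr if total_intr else 0.0,
    }


# The second and fifth intervals of Example 4.18.
intervals = [
    ["0 7 0 11 207 103 64 9 10 0 0 7 39 1 0 60",
     "1 0 0 4 14 2 54 11 11 0 0 5 61 0 0 39"],
    ["0 0 0 2 205 104 14 10 2 0 0 4 98 0 0 2",
     "1 0 0 1 34 31 93 2 2 0 0 15 2 0 0 98"],
]

summaries = [summarize(interval) for interval in intervals]
for s in summaries:
    print(s["total_usr"], s["migrations"], round(s["cpu0_intr_share"], 2))
```

In both intervals the user time across the two CPUs sums to 100%, which is consistent with one busy thread moving between processors, and the first CPU takes well over half of the interrupts.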
4.3.8 Reporting Kernel Statistics (kstat)
kstat is a very powerful tool for returning information about the kernel. The counts it produces are the number of events since the system was booted. So, to determine the number of events that an application causes, it is necessary to run kstat before and after the application, and determine the difference between the two values. Of course, this is accurate only if just one process is running on the system. Otherwise, the other processes can change the numbers.
One of the metrics that kstat reports is the number of emulated floating-point instructions. Not all floating-point operations are performed in hardware; some have been left to software. Obviously, software is more flexible, but it is slower than hardware, so determining whether an application is doing any floating-point operations in software can be useful. An example of checking for unfinished floating-point traps is shown in Example 4.19. The -p option tells kstat to report statistics in a parsable format; the -s option selects the statistic of interest.
Example 4.19. Using kstat to Check for Unfinished Floating-Point Traps
$ kstat -p -s 'fpu_unfinished_traps'
unix:0:fpu_traps:fpu_unfinished_traps   32044940
$ a.out
$ kstat -p -s 'fpu_unfinished_traps'
unix:0:fpu_traps:fpu_unfinished_traps   32044991
Example 4.19 shows the number of unfinished floating-point operations reported by kstat before the application ran, and the number afterward. The difference between the two values is 51, which means that 51 unfinished floating-point operations were handled by trapping to software between the two calls to kstat. It is likely that these traps were caused by the application a.out, but if there was other activity on the system, these traps cannot be confidently attributed to any one particular process. To have some degree of confidence in the number of traps on a busy system, it is best to repeat the measurement several times, and to measure the number of traps that occur when the application is not running.
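The before-and-after subtraction is simple to script. The following Python sketch works on the two lines of kstat -p output from Example 4.19; the function names are invented for illustration, and the sketch assumes each line holds a statistic name followed by a single integer.

```python
def parse_kstat_line(line):
    """Split a `kstat -p` line into (statistic name, integer value)."""
    name, value = line.split()
    return name, int(value)


def kstat_delta(before, after):
    """Return how much a counter grew between two kstat -p readings."""
    name_before, value_before = parse_kstat_line(before)
    name_after, value_after = parse_kstat_line(after)
    if name_before != name_after:
        raise ValueError("readings are for different statistics")
    return value_after - value_before


# The two readings from Example 4.19.
before = "unix:0:fpu_traps:fpu_unfinished_traps 32044940"
after = "unix:0:fpu_traps:fpu_unfinished_traps 32044991"

traps = kstat_delta(before, after)
print(traps)
```

On a busy system, the same function could be applied to a pair of readings taken while the application is not running, to estimate the background rate.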
4.3.9 Generating a Report of System Activity (sar)
The sar utility records system activity over a period of time into an archive for later analysis. It is possible to select which aspects of system performance are recorded. Once an archive of data has been recorded, sar is also used to extract the particular activities of interest.
To record a sar data file it is necessary to specify which system events should be recorded, the name of the file in which to record the events, the interval between samples, and the number of samples you want. An example command line for sar is shown in Example 4.20.
Example 4.20. Example Command Line for sar
$ sar -A -o /tmp/sar.dat 5 10
This instructs sar to do the following.
- Record all types of events (-A).
- Store the events in the file /tmp/sar.dat.
- Record a sample at 5-second intervals.
- Record a total of 10 samples.
When sar runs it will output data to the screen as well as to the data file, as shown in Example 4.21.
Example 4.21. Output from sar as It Runs
$ sar -A -o /tmp/sar.dat 5 10

SunOS machinename 5.9 Generic_112233-01 sun4u    08/26/2003

21:07:39    %usr    %sys    %wio   %idle
            device        %busy   avque   r+w/s  blks/s  avwait  avserv
            runq-sz %runocc swpq-sz %swpocc
            bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
            swpin/s bswin/s swpot/s bswot/s pswch/s
            scall/s sread/s swrit/s  fork/s  exec/s rchar/s wchar/s
            iget/s namei/s dirbk/s
            rawch/s canch/s outch/s rcvin/s xmtin/s mdmin/s
            proc-sz    ov  inod-sz    ov  file-sz    ov   lock-sz
            msg/s  sema/s
            atch/s  pgin/s ppgin/s  pflt/s  vflt/s slock/s
            pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
            freemem freeswap
            sml_mem   alloc  fail  lg_mem   alloc  fail  ovsz_alloc  fail

21:07:44      50       1       0      50
            fd0               0     0.0       0       0     0.0     0.0
            ssd0              0     0.0       0       0     0.0     0.0
            ssd0,a            0     0.0       0       0     0.0     0.0
            ssd0,b            0     0.0       0       0     0.0     0.0
            ssd0,c            0     0.0       0       0     0.0     0.0
            ssd0,h            0     0.0       0       0     0.0     0.0
            ssd1              0     0.0       0       0     0.0     0.0
            ssd1,a            0     0.0       0       0     0.0     0.0
            ssd1,b            0     0.0       0       0     0.0     0.0
            ssd1,c            0     0.0       0       0     0.0     0.0
            ssd1,h            0     0.0       0       0     0.0     0.0
            0.0       0     0.0       0
            0       0     100       0       0     100       0       0
            0.00     0.0    0.00     0.0      99
            76       6      14    0.00    0.00    1550    2850
            0       0       0
            0       0     161       0       0       0
            65/30000    0  157574/157574    0  0/0    0  0/0
            0.00    0.00
            0.00    0.00    0.00    0.60    2.20    0.00
            0.00    0.00    0.00    0.00    0.00
            247682 17041644
            0       0     0       0       0     0    17858560     0
Example 4.21 presents a lot of information. The text at the beginning supplies a template that indicates what the counters represent. The information is as follows.
- First it reports the time the system spent in user (%usr), system (%sys), waiting for block I/O (%wio), and idle (%idle).
- Next is a section for each device that reports the device name, percentage of time busy (%busy), average queue length while the device was busy (avque), number of reads and writes per second (r+w/s), number of 512-byte blocks transferred per second (blks/s), average wait time in ms (avwait), and average service time in ms (avserv).
- The length of the queue of runnable processes (runq-sz) and the percentage of time occupied (%runocc) are listed next. The fields swpq-sz and %swpocc no longer have values reported for them.
- Next is the number of transfers per second of buffers to disk or other block devices. Read transfers per second (bread/s), reads of system buffers (lread/s), cache hit rate for reads (%rcache), write transfers per second (bwrit/s), writes of system buffers (lwrit/s), cache hit rate for writes (%wcache), raw physical device reads (pread/s), and raw physical device writes (pwrit/s) are included.
- Swapping activity is recorded as the number of swap-ins per second (swpin/s), number of 512-byte blocks swapped in (bswin/s), number of swap-outs per second (swpot/s), number of 512-byte blocks swapped out (bswot/s), and number of process switches per second (pswch/s). The number of 512-byte blocks transferred includes the loading of programs.
- System calls are reported as the total number of system calls per second (scall/s), number of read calls per second (sread/s), number of write calls per second (swrit/s), number of forks per second (fork/s), number of execs per second (exec/s), number of characters transferred by read (rchar/s), and number of characters transferred by write (wchar/s).
- Next is a report of file access system routines called per second. The number of files located by inode entry (iget/s), number of file system pathname searches (namei/s), and number of directory block reads (dirbk/s) are included.
- TTY I/O reports stats on character I/O to the controlling terminal. This includes raw character rate (rawch/s), processed character rate (canch/s), output character rate (outch/s), receive rate (rcvin/s), transmit rate (xmtin/s), and modem interrupts per second (mdmin/s).
- Process, inode, file, and lock table sizes are reported as proc-sz, inod-sz, file-sz, and lock-sz. The associated overflow (ov) fields report the overflows that occur between samples for each table.
- The number of messages and semaphores per second is reported as msg/s and sema/s.
- Paging to memory is reported as the number of page faults per second that were satisfied by reclaiming a page from memory (atch/s), the number of page-in requests per second (pgin/s), the number of pages paged in per second (ppgin/s), the number of copy-on-write page faults per second (pflt/s), the number of page-not-in-memory faults per second (vflt/s), and the number of faults per second caused by software locks requiring physical I/O (slock/s).
- Paging to disk is reported as the number of requested page-outs per second (pgout/s), the number of page-outs per second (ppgout/s), the number of pages placed on the free list per second (pgfree/s), the number of pages scanned per second (pgscan/s), and the percentage of igets that required a page flush (%ufs_ipf).
- Free memory is reported as the average number of pages available to user processes (freemem), and the number of disk blocks available for swapping (freeswap).
- Kernel memory allocation is reported as a small memory pool of free memory (sml_mem), the number of bytes allocated from the small memory pool (alloc), and the number of requests for small memory that failed (fail). Similar counters exist for the large pool (lg_mem, alloc, fail). The amount of memory allocated for oversize requests is reported as ovsz_alloc, and the number of times this failed as fail.
The command to read an existing sar output file is shown in Example 4.22.
Example 4.22. Command Line to Instruct sar to Read an Existing Data File
$ sar -A -f /tmp/sar.dat
This asks sar to output all (-A) the information from the sar archive (/tmp/sar.dat). It is possible to request that sar output only a subset of counters.
In the sar output shown in Example 4.21, it is apparent that the CPU is 50% busy (in fact, it is a two-CPU system, and one CPU is busy compiling an application), and that there is some character output and some read and write system calls. It is reasonably apparent that the system is CPU-bound, although it has additional CPU resources which could potentially be used to do more work.
4.3.10 Reporting I/O Activity (iostat)
The iostat utility is very similar to vmstat, except that it reports I/O activity rather than memory statistics.
The first line of output from iostat is the activity since boot. Subsequent lines represent the activity over the time interval between reports. Example output from iostat is shown in Example 4.23.
Example 4.23. Example of iostat Output
% iostat 1
   tty         ssd0          ssd1          nfs1          nfs58          cpu
 tin tout  kps tps serv  kps tps serv  kps tps serv  kps tps serv  us sy wt id
   0    2   17   1   90   22   1   45    0   0    0    0   0   27  20  1  0 79
   0  234    0   0    0    8   1    6    0   0    0    0   0    0  50  2  0 48
   0   80    0   0    0    0   0    0    0   0    0    0   0    0  50  0  0 50
   0   80    0   0    0  560   4   16    0   0    0    0   0    0  46  2  1 50
   0   80    0   0    0  352   4   13    0   0    0    0   0    0  48  8  0 44
   0   80    0   0    0  560  15   13    0   0    0    0   0    0  42  6  2 50
The information is as follows.
- The first two columns give the number of characters read (tin) and written (tout) for the tty devices.
- The next four sets of three columns give information for four disks. The kps column lists the number of kilobytes per second, tps the number of transfers per second, and serv the average service time in ms.
- CPU time is reported as a percentage in user (us), system (sy), waiting for I/O (wt), and idle (id).
Another view of I/O statistics is provided by passing the -Cnx option to iostat. Output from this is shown in Example 4.24.
Example 4.24. Output from iostat -Cnx 1
$ iostat -Cnx 1
....
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   25.0    0.0  594.0  0.0  0.3    0.0   12.7   0   4 c0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0   25.0    0.0  594.0  0.0  0.3    0.0   12.7   0   8 c0t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t2d0
...
In Example 4.24, each disk gets a separate row in the output. The output comprises the following columns:
- r/s: Reads per second
- w/s: Writes per second
- kr/s: Kilobytes read per second
- kw/s: Kilobytes written per second
- wait: Average number of transactions waiting for service
- actv: Average number of transactions actively being serviced
- wsvc_t: Average service time in wait queue, in milliseconds
- asvc_t: Average time actively being serviced, in milliseconds
- %w: Percentage of time the waiting queue is nonempty
- %b: Percentage of time the disk is busy
- device: The device that this applies to
In the example shown in Example 4.24, the busy device is c0t1d0, which is writing out about 600KB/s from 25 writes (about 24KB/write), each write taking about 13 ms. The device is busy about 8% of the time and has an average of about 0.3 writes going on at any one time.
If a disk is continuously busy more than about 20% of the time, it is worth checking the average service time, or the time spent waiting in the queue, to ensure that these are low. Once the disk starts to become busy, the service times may increase significantly. If this is the case, it may be worth investigating whether to spread file activity over multiple disks. The iostat options -e and -E will report the errors that have occurred for each device since boot.
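The arithmetic in this analysis can be reproduced from a single row of iostat -Cnx output. The following Python sketch parses the c0t1d0 row of Example 4.24, computes the average size of each write, and applies the 20% busy rule of thumb. The function names are invented for illustration, and the check looks at one sample only, whereas the guideline concerns a disk that is continuously busy.

```python
IOSTAT_COLUMNS = [
    "r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
    "wsvc_t", "asvc_t", "%w", "%b",
]


def parse_iostat_line(line):
    """Turn one row of `iostat -Cnx` output into a dict."""
    fields = line.split()
    stats = dict(zip(IOSTAT_COLUMNS, (float(f) for f in fields[:10])))
    stats["device"] = fields[10]
    return stats


def kb_per_write(stats):
    """Average kilobytes transferred per write operation."""
    return stats["kw/s"] / stats["w/s"] if stats["w/s"] else 0.0


def needs_attention(stats, busy_limit=20.0):
    """Apply the 'busy more than about 20% of the time' rule of thumb."""
    return stats["%b"] > busy_limit


# The c0t1d0 row from Example 4.24.
row = "0.0 25.0 0.0 594.0 0.0 0.3 0.0 12.7 0 8 c0t1d0"
stats = parse_iostat_line(row)
print(stats["device"], round(kb_per_write(stats), 1), needs_attention(stats))
```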
4.3.11 Reporting Network Activity (netstat)
netstat can provide a variety of different reports. The -s flag shows statistics per protocol. A sample of the output showing the statistics for the IPv4 protocol is shown in Example 4.25.
Example 4.25. Example of netstat -s Output
% netstat -s
...
IPv4    ipForwarding        =     2     ipDefaultTTL        =   255
        ipInReceives        =8332869    ipInHdrErrors       =     0
        ipInAddrErrors      =     0     ipInCksumErrs       =     0
        ipForwDatagrams     =     0     ipForwProhibits     =     0
        ipInUnknownProtos   =     2     ipInDiscards        =     0
        ipInDelivers        =8316558    ipOutRequests       =13089344
        ipOutDiscards       =     0     ipOutNoRoutes       =     0
        ipReasmTimeout      =    60     ipReasmReqds        =     0
        ipReasmOKs          =     0     ipReasmFails        =     0
        ipReasmDuplicates   =     0     ipReasmPartDups     =     0
        ipFragOKs           =     0     ipFragFails         =     0
        ipFragCreates       =     0     ipRoutingDiscards   =     0
        tcpInErrs           =     0     udpNoPorts          = 17125
        udpInCksumErrs      =     0     udpInOverflows      =     0
        rawipInOverflows    =     0     ipsecInSucceeded    =     0
        ipsecInFailed       =     0     ipInIPv6            =     0
        ipOutIPv6           =     0     ipOutSwitchIPv6     =   213
...
You can obtain a report showing input and output from netstat -i; an example is shown in Example 4.26. This output shows the number of packets sent and received, the number of errors, and finally the number of collisions.
Example 4.26. Example of netstat -i Output
% netstat -i 1
    input   eri0      output           input  (Total)    output
packets   errs  packets   errs  colls  packets   errs  packets   errs  colls
486408174 5     499073054 3     0      530744745 5     543409625 3     0
5         0     9         0     0      12        0     16        0     0
6         0     10        0     0      13        0     17        0     0
6         0     10        0     0      14        0     18        0     0
The collision rate is the number of collisions divided by the number of output packets. A value greater than about 5% to 10% may indicate a problem. Similarly, you can calculate error rates by dividing the number of errors by the total input or output packets. An error rate greater than about one-fourth of a percent may indicate a problem.
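These rates are straightforward to compute from the counters. The following Python sketch uses the cumulative eri0 figures from the first row of Example 4.26. The limits encode the rules of thumb from the text, taking the lower end of the 5% to 10% range for collisions; the names are invented for illustration.

```python
def rate_percent(events, packets):
    """Return events as a percentage of packets (0.0 if no packets)."""
    return 100.0 * events / packets if packets else 0.0


# Rules of thumb from the text.
COLLISION_LIMIT = 5.0   # percent; lower end of the 5% to 10% range
ERROR_LIMIT = 0.25      # percent; one-fourth of a percent

# Cumulative eri0 counters from the first row of Example 4.26.
in_packets, in_errs = 486408174, 5
out_packets, out_errs, colls = 499073054, 3, 0

collision_rate = rate_percent(colls, out_packets)
input_error_rate = rate_percent(in_errs, in_packets)
output_error_rate = rate_percent(out_errs, out_packets)

print("collisions suspicious:", collision_rate > COLLISION_LIMIT)
print("input errors suspicious:", input_error_rate > ERROR_LIMIT)
print("output errors suspicious:", output_error_rate > ERROR_LIMIT)
```

For this interface all three rates are far below the limits, so the counters give no sign of a network problem.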
4.3.12 The snoop Command
The snoop command, which you must run with superuser privileges, gathers information on the packets that are passed over the network. It is a very powerful way of examining what the network is doing, and consequently the command has a large number of options. In "promiscuous" mode, it gathers all packets that the local machine can see. In nonpromiscuous mode (enabled using the -P flag), it gathers information only on packets that are addressed to the local machine. It is also possible to gather a trace of the packets (using the -o flag) for later processing by the snoop command (using the -i flag). The packets collected (or examined) can be filtered in various ways, perhaps most usefully by the machines communicating; alternatively, individual packets can be printed out. An example of the output from the snoop command is shown in Example 4.27.
Example 4.27. Output from the snoop Command
$ snoop
Using device /dev/eri (promiscuous mode)
        here -> mc1          TCP D=1460 S=5901 Ack=2068723218 Seq=3477475694 Len=0 Win=50400
        here -> mc2          TCP D=2049 S=809 Ack=3715747853 Seq=3916150345 Len=0 Win=49640
         mc1 -> here         TCP D=22 S=1451 Ack=3432082168 Seq=2253017191 Len=0 Win=33078
...
Note that snoop can capture and display unencrypted data being passed over the network. As such, use of this tool may have privacy, policy, or legal issues in some domains.
4.3.13 Reporting Disk Space Utilization (df)
The df command reports the amount of space used on the disk drives. Example output from the df command is shown in Example 4.28.
Example 4.28. Example Output from the df Command
% df -kl
Filesystem            kbytes     used    avail capacity  Mounted on
/dev/dsk/c0t1d0s0    3096423  1172450  1862045    39%    /
/proc                      0        0        0     0%    /proc
mnttab                     0        0        0     0%    /etc/mnttab
fd                         0        0        0     0%    /dev/fd
swap                 9475568       48  9475520     1%    /var/run
swap                 9738072   262552  9475520     3%    /tmp
/dev/dsk/c0t1d0s7   28358357 26823065  1251709    96%    /data
/dev/dsk/c0t2d0s7   28814842 23970250  4556444    85%    /export/home
The -kl option tells df to report disk space in kilobytes (rather than as the number of 512-byte blocks), and to only report data for local drives. The columns are reasonably self-explanatory and include the name of the disk, the size, the amount used, the amount remaining, and the percentage amount used. The final column shows the mount point. In this example, both the /data and the /export/home file systems are running low on available space. On Solaris 9 and later there is a -h option to produce the output in a more human-readable format.
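Spotting nearly full file systems can be automated from captured df -k output. The following Python sketch parses the data rows of Example 4.28 and lists the mount points at or above a capacity limit. The limit of 85% is an arbitrary illustrative choice, and the sketch assumes that no file system name or mount point contains spaces.

```python
def parse_df_line(line):
    """Turn one data row of `df -k` output into a dict."""
    fields = line.split()
    return {
        "filesystem": fields[0],
        "kbytes": int(fields[1]),
        "used": int(fields[2]),
        "avail": int(fields[3]),
        "capacity": int(fields[4].rstrip("%")),
        "mount": fields[5],
    }


def nearly_full(lines, limit=85):
    """Return mount points whose capacity is at or above limit percent."""
    entries = [parse_df_line(line) for line in lines]
    return [e["mount"] for e in entries if e["capacity"] >= limit]


# Data rows copied from Example 4.28.
df_rows = [
    "/dev/dsk/c0t1d0s0 3096423 1172450 1862045 39% /",
    "/proc 0 0 0 0% /proc",
    "mnttab 0 0 0 0% /etc/mnttab",
    "fd 0 0 0 0% /dev/fd",
    "swap 9475568 48 9475520 1% /var/run",
    "swap 9738072 262552 9475520 3% /tmp",
    "/dev/dsk/c0t1d0s7 28358357 26823065 1251709 96% /data",
    "/dev/dsk/c0t2d0s7 28814842 23970250 4556444 85% /export/home",
]

low_on_space = nearly_full(df_rows)
print(low_on_space)
```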
4.3.14 Reporting Disk Space Used by Files (du)
The du utility reports the disk space used by a given directory and its subdirectories. Once again, there is a -k option to report usage in kilobytes. On Solaris 9 and later, there is also a -h option to report in a human-readable format. Example output from the du command is shown in Example 4.29.
Example 4.29. Example of Output from the du Command
% du -k
8       ./.X11-unix
8       ./.X11-pipe
3704    .
% du -h
   8K   ./.X11-unix
   8K   ./.X11-pipe
 3.6M   .
The du command in Example 4.29 reported that two directories consume 8KB each, and there is about 3.6MB of other data in the current directory.
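The relationship between the -k and -h figures can be illustrated with a small conversion routine. The following Python sketch scales a kilobyte count by factors of 1,024 and keeps one decimal place for the larger units. It only approximates the formatting and makes no claim to match the rounding rules of du in every case.

```python
def human_readable(kilobytes):
    """Format a kilobyte count with a K, M, G, or T suffix."""
    units = ["K", "M", "G", "T"]
    value = float(kilobytes)
    index = 0
    # Step up one unit for every factor of 1,024.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return f"{int(value)}{units[index]}"
    return f"{value:.1f}{units[index]}"


# The three sizes reported by du -k in Example 4.29.
for kb in (8, 8, 3704):
    print(kb, human_readable(kb))
```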