Other Tools
In addition to the general tools and the subsystem-specific ones, you have a variety of other performance-monitoring tools at your disposal. The next sections look at these in more detail.
ps
The ps command is another heavily used tool where performance is concerned. Most often it is used to isolate a particular process, but its numerous options can help you get more out of it and save some time while tracking that process down.
The ps command basically reports process status. When invoked without any options, the output looks something like this:
$ ps
  PID TTY          TIME CMD
 3220 pts/0    00:00:00 bash
 3251 pts/0    00:00:00 ps
This shows only the processes that belong to the current session of the user who invoked ps.
Obviously, just seeing what you are doing in your current session is not always all that helpful, unless, of course, you are doing something very detrimental in the background!
To look at other users or the system as a whole, ps requires some further options. The ps command's options on Linux are actually grouped into sections based on selection criteria.
Let's look at these sections and what they can do.
Simple Process Selection
Using simple process selection, you can be a little selective about what you see. For example, if you want to see only processes that are attached to your current terminal, you would use the -T option:
[jfink@kerry jfink]$ ps -T
  PID TTY      STAT   TIME COMMAND
 1668 pts/0    S      0:00 login -- jfink
 1669 pts/0    S      0:00 -bash
 1708 pts/0    R      0:00 ps -T
Process Selection by List
Another way to control what you see with ps is to select processes by a list. As an example, if you want to see all the identd processes running, you would use the -C option from this group, which selects by command name:
[jfink@kerry jfink]$ ps -C identd
  PID TTY          TIME CMD
  535 ?        00:00:00 identd
  542 ?        00:00:00 identd
  545 ?        00:00:00 identd
  546 ?        00:00:00 identd
  550 ?        00:00:00 identd
Output Format Control
Following process selection is output control. This is helpful when you want to see information in a particular format. A good example is using the jobs format with the -j option:
[jfink@kerry jfink]$ ps -j
  PID  PGID   SID TTY          TIME CMD
 1669  1669  1668 pts/0    00:00:00 bash
 1729  1729  1668 pts/0    00:00:00 ps
Output Modifiers
Output modifiers can apply high-level changes to the output. The following output uses the e option (BSD style, without a dash) to show each command's environment after the command itself:
[jfink@kerry jfink]$ ps ae
  PID TTY      STAT   TIME COMMAND
 1668 pts/0    S      0:00 login -- jfink
 1669 pts/0    S      0:00 -bash TERM=ansi REMOTEHOST=172.16.14.102 HOME=/home/j
 1754 pts/0    R      0:00 ps ae LESSOPEN=|/usr/bin/lesspipe.sh %s
The remaining sections are INFORMATION, which provides versioning information and help, and OBSOLETE options. The next three sections give some specific cases of using ps with certain options.
Some Sample ps Output
Of course, reading the man page helps, but a few practical applied examples always light the way a little better.
The most commonly used ps switch on Linux and BSD systems is this:
$ ps aux
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  1116  380 ?        S    Jan27   0:01 init [3]
root         2  0.0  0.0     0    0 ?        SW   Jan27   0:03 [kflushd]
root         3  0.0  0.0     0    0 ?        SW   Jan27   0:18 [kupdate]
root         4  0.0  0.0     0    0 ?        SW   Jan27   0:00 [kpiod]
root         5  0.0  0.0     0    0 ?        SW   Jan27   0:38 [kswapd]
bin        260  0.0  0.0  1112  452 ?        S    Jan27   0:00 portmap
root       283  0.0  0.0  1292  564 ?        S    Jan27   0:00 syslogd -m 0
root       294  0.0  0.0  1480  700 ?        S    Jan27   0:00 klogd
daemon     308  0.0  0.0  1132  460 ?        S    Jan27   0:00 /usr/sbin/atd
root       322  0.0  0.0  1316  460 ?        S    Jan27   0:00 crond
root       336  0.0  0.0  1260  412 ?        S    Jan27   0:00 inetd
root       371  0.0  0.0  1096  408 ?        S    Jan27   0:00 rpc.rquotad
root       382  0.0  0.0  1464  160 ?        S    Jan27   0:00 [rpc.mountd]
root       393  0.0  0.0     0    0 ?        SW   Jan27   2:15 [nfsd]
root       394  0.0  0.0     0    0 ?        SW   Jan27   2:13 [nfsd]
root       395  0.0  0.0     0    0 ?        SW   Jan27   2:13 [nfsd]
root       396  0.0  0.0     0    0 ?        SW   Jan27   2:12 [nfsd]
root       397  0.0  0.0     0    0 ?        SW   Jan27   2:12 [nfsd]
root       398  0.0  0.0     0    0 ?        SW   Jan27   2:12 [nfsd]
root       399  0.0  0.0     0    0 ?        SW   Jan27   2:11 [nfsd]
root       400  0.0  0.0     0    0 ?        SW   Jan27   2:14 [nfsd]
root       428  0.0  0.0  1144  488 ?        S    Jan27   0:00 gpm -t ps/2
root       466  0.0  0.0  1080  408 tty1     S    Jan27   0:00 /sbin/mingetty tt
root       467  0.0  0.0  1080  408 tty2     S    Jan27   0:00 /sbin/mingetty tt
root       468  0.0  0.0  1080  408 tty3     S    Jan27   0:00 /sbin/mingetty tt
root       469  0.0  0.0  1080  408 tty4     S    Jan27   0:00 /sbin/mingetty tt
root       470  0.0  0.0  1080  408 tty5     S    Jan27   0:00 /sbin/mingetty tt
root       471  0.0  0.0  1080  408 tty6     S    Jan27   0:00 /sbin/mingetty tt
root      3326  0.0  0.0  1708  892 ?        R    Jan30   0:00 in.telnetd
root      3327  0.0  0.1  2196 1096 pts/0    S    Jan30   0:00 login -- jfink
jfink     3328  0.0  0.0  1764 1012 pts/0    S    Jan30   0:00 -bash
jfink     3372  0.0  0.0  2692 1008 pts/0    R    Jan30   0:00 ps aux
The output implies that this system's main job is to serve files via NFS, and indeed it is. It also doubles as an FTP server, but no connections were active when this output was captured.
The output of ps can tell you a lot more, sometimes just simple things that can improve performance. Looking at this NFS server again, you can see that it is not too busy; actually, it gets used only a few times a day. So what are some simple things that could be done to make it run even faster? Well, for starters, you could reduce the number of virtual consoles that are accessible via the system console. I like to have a minimum of three running (in case I lock one or two). A total of six are shown in the output (the mingetty processes). There are also eight nfsd processes available; if the system is not used very often and only by a few users, that number can be reduced to something a little more reasonable.
Now you can see where tuning can be applied outside the kernel. Sometimes entire processes simply do not need to be running, and services that run multiple instances (such as NFS, MySQL, or HTTP) can be trimmed to the number required for good operation.
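Spotting services that run several instances can be scripted rather than done by eye. The following is a minimal sketch, assuming input in the ps aux column layout shown earlier (command name in the 11th field); the function name count_instances is made up for illustration.

```shell
# Count how many processes share each command name in `ps aux`-style input.
# Reads from stdin, so it works on live output or on a saved capture.
# Assumption: the command name is the 11th whitespace-separated field.
count_instances() {
    awk 'NR > 1 { count[$11]++ }      # skip the header line
         END {
             # Report only commands that appear more than once.
             for (cmd in count)
                 if (count[cmd] > 1)
                     print count[cmd], cmd
         }' |
        sort -k1,1nr -k2,2            # busiest command names first
}
```

You would typically run it as ps aux | count_instances. On the server shown earlier, the nfsd and mingetty entries would be expected to lead the list.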
The Process Forest
The process forest is a great way of seeing exactly how processes and their parents are related. The following output is a portion of the same system used in the previous section:
...
root       336  0.0  0.0  1260  412 ?        S    Jan27   0:00 inetd
root      3326  0.0  0.0  1708  892 ?        S    Jan30   0:00  \_ in.telnetd
root      3327  0.0  0.1  2196 1096 pts/0    S    Jan30   0:00      \_ login -- jfink
jfink     3328  0.0  0.0  1768 1016 pts/0    S    Jan30   0:00          \_ -bash
jfink     3384  0.0  0.0  2680  976 pts/0    R    Jan30   0:00              \_ ps
...
Based on that output, you can easily see how the system call fork got its name.
The application here is great. Sometimes the process you see misbehaving is not the one to blame: what if you kill an offending process only to find it respawned? The tree view can help you track down the parent process that keeps starting it and kill that instead.
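Tracing a process back to its origin can also be done without reading the tree by eye. The sketch below assumes a table of "PID PPID COMMAND" lines on standard input, which on procps-based systems can usually be produced with ps -eo pid=,ppid=,comm=. The function name ancestors is hypothetical.

```shell
# Print a process and each of its ancestors, one per line, starting from the
# PID given as the first argument. Input: "PID PPID COMMAND" lines on stdin.
ancestors() {
    awk -v pid="$1" '
        { parent[$1] = $2; name[$1] = $3 }
        END {
            # Walk upward until the PID is no longer in the table
            # (the parent of init is normally 0, which has no entry).
            while (pid in parent) {
                print pid, name[pid]
                up = parent[pid]
                delete parent[pid]    # guard against looping on bad input
                pid = up
            }
        }'
}
```

For the telnet session shown above, ps -eo pid=,ppid=,comm= | ancestors 3328 would be expected to list the shell, then login, in.telnetd, inetd, and finally init.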
Singling Out a User
Last but definitely not least, you might need (or want) to look at a particular user's activities. On this particular system, my user account is the only userland account that does anything. I have chosen root to be the user to look at:
$ ps u --User root
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  1116  380 ?        S    Jan27   0:01 init [3]
root         2  0.0  0.0     0    0 ?        SW   Jan27   0:03 [kflushd]
root         3  0.0  0.0     0    0 ?        SW   Jan27   0:18 [kupdate]
root         4  0.0  0.0     0    0 ?        SW   Jan27   0:00 [kpiod]
root         5  0.0  0.0     0    0 ?        SW   Jan27   0:38 [kswapd]
root       283  0.0  0.0  1292  564 ?        S    Jan27   0:00 syslogd -m 0
root       294  0.0  0.0  1480  700 ?        S    Jan27   0:00 klogd
daemon     308  0.0  0.0  1132  460 ?        S    Jan27   0:00 /usr/sbin/atd
root       322  0.0  0.0  1316  460 ?        S    Jan27   0:00 crond
root       336  0.0  0.0  1260  412 ?        S    Jan27   0:00 inetd
root       350  0.0  0.0  1312  512 ?        S    Jan27   0:00 lpd
root       371  0.0  0.0  1096  408 ?        S    Jan27   0:00 rpc.rquotad
root       382  0.0  0.0  1464  160 ?        S    Jan27   0:00 [rpc.mountd]
root       393  0.0  0.0     0    0 ?        SW   Jan27   2:15 [nfsd]
root       394  0.0  0.0     0    0 ?        SW   Jan27   2:13 [nfsd]
root       395  0.0  0.0     0    0 ?        SW   Jan27   2:13 [nfsd]
root       396  0.0  0.0     0    0 ?        SW   Jan27   2:12 [nfsd]
root       397  0.0  0.0     0    0 ?        SW   Jan27   2:12 [nfsd]
root       398  0.0  0.0     0    0 ?        SW   Jan27   2:12 [nfsd]
root       399  0.0  0.0     0    0 ?        SW   Jan27   2:11 [nfsd]
root       400  0.0  0.0     0    0 ?        SW   Jan27   2:14 [nfsd]
root       428  0.0  0.0  1144  488 ?        S    Jan27   0:00 gpm -t ps/2
root       466  0.0  0.0  1080  408 tty1     S    Jan27   0:00 /sbin/mingetty tty
root       467  0.0  0.0  1080  408 tty2     S    Jan27   0:00 /sbin/mingetty tty
root       468  0.0  0.0  1080  408 tty3     S    Jan27   0:00 /sbin/mingetty tty
root       469  0.0  0.0  1080  408 tty4     S    Jan27   0:00 /sbin/mingetty tty
root       470  0.0  0.0  1080  408 tty5     S    Jan27   0:00 /sbin/mingetty tty
root       471  0.0  0.0  1080  408 tty6     S    Jan27   0:00 /sbin/mingetty tty
root      3326  0.0  0.0  1708  892 ?        R    Jan30   0:00 in.telnetd
root      3327  0.0  0.1  2196 1096 pts/0    S    Jan30   0:00 login -- jfink
Restricting the view to a single user's processes is helpful when that user might have a runaway process. Here's a quick example: A particular piece of software used by the company for which I work did not properly die when an attached terminal disappeared (it has been cleaned up since then). It collected error messages into memory until it was killed. To make matters worse, these error messages went into shared memory queues.
The only solution was for the system administrator to log in and kill the offending process. Of course, after a period of time, a script was written that would allow users to do this in a safe manner. On this particular system, there were thousands of concurrent processes. Only by filtering based on the user or doing a grep from the whole process table was it possible to figure out which process it was and any other processes that it might be affecting.
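Filtering a large process table by user, as described above, might look something like the following. This is only a sketch: it assumes input in the ps aux (or ps u) column layout, with the user in field 1, the PID in field 2, RSS in field 6, and the command name in field 11. The function name top_rss_for_user is invented for this example.

```shell
# List one user's processes as "RSS PID COMMAND", largest resident set first.
# Input: `ps aux`-style lines on stdin; first argument: the user name.
top_rss_for_user() {
    awk -v user="$1" 'NR > 1 && $1 == user { print $6, $2, $11 }' |
        sort -k1,1nr
}
```

Running ps aux | top_rss_for_user jfink puts the biggest memory consumers for that account at the top, which is usually where a runaway process that keeps collecting data in memory ends up.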
free
The free command rapidly snags information about the state of memory on your Linux system. The syntax for free is pretty straightforward:
$ free
The following is an example of free's output:
$ free
             total       used       free     shared    buffers     cached
Mem:       1036152    1033560       2592       8596      84848     932080
-/+ buffers/cache:      16632    1019520
Swap:       265064        380     264684
The first line of output shows the physical memory, and the last line shows similar information about swap. The middle line restates the used and free figures with buffers and cache counted as free rather than used. Table 3.9 explains the output of free.
Table 3.9 free Command Output Fields
Field | Description |
---|---|
total | Total amount of user available memory, excluding the kernel memory. (Don't be alarmed when this is lower than the memory on the machine.) |
used | Total amount of used memory. |
free | Total amount of memory that is free. |
shared | Total amount of shared memory that is in use. |
buffers | Current size of the disk buffer cache. |
cached | Amount of memory used to cache file data read from disk (the page cache). |
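The -/+ buffers/cache line in the sample output can be reproduced from the Mem: line by simple arithmetic: used minus buffers minus cached, and free plus buffers plus cached. The following sketch assumes the older free layout shown here, with separate buffers and cached columns; newer versions of free print different columns, so treat it as an illustration of the calculation rather than a general-purpose parser. The function name buffers_cache_line is hypothetical.

```shell
# Recompute the "-/+ buffers/cache" figures from the Mem: line of free output.
# Assumed column order: Mem: total used free shared buffers cached
buffers_cache_line() {
    awk '$1 == "Mem:" {
        used = $3; unused = $4; buffers = $6; cached = $7
        # Memory used by programs, and memory that is free or reclaimable.
        printf "%d %d\n", used - buffers - cached, unused + buffers + cached
    }'
}
```

Fed the Mem: line from the sample output, this prints 16632 1019520, which matches the second line that free reported.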
An analysis of the sample output shows that this system seems to be pretty healthy: although almost all physical memory is in use, most of it is buffers and cache, and swap is barely touched. Of course, this is only one measurement. What if you want to watch the memory usage over time? The free command provides an option to do just that: the -s option. The -s option activates polling at a specified interval, given in seconds. The following is an example:
[jfink@kerry jfink]$ free -s 60
             total       used       free     shared    buffers     cached
Mem:        257584      65244     192340      12132      40064       4576
-/+ buffers/cache:      20604     236980
Swap:      1028120          0    1028120

             total       used       free     shared    buffers     cached
Mem:        257584      66424     191160      12200      40084       5728
-/+ buffers/cache:      20612     236972
Swap:      1028120          0    1028120

             total       used       free     shared    buffers     cached
Mem:        257584      66528     191056      12200      40084       5812
-/+ buffers/cache:      20632     236952
Swap:      1028120          0    1028120
...
To stop free from polling, press the interrupt key (usually Ctrl+C).
These measurements show a pretty quiet system, but the free command can come in handy if you want to see the effect of one particular command on the system. Run the command when the system is idling, and poll memory with free. free is well suited for this because of the granularity that you get in the output.
time
One very simple tool for examining the system is the time command. The time command comes in handy for relatively quick checks of how the system performs when a certain command is invoked. The way this works is simple: time runs the command you give it and, when that command finishes, prints a summary of the resources it used. It is launched like this:
$ time <command_name> [options]
Here is an example:
$ time cc hello.c -o hello
The output from the time command looks like this:
$ time cc hello.c -o hello
0.08user 0.04system 0:00.11elapsed 107%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (985major+522minor)pagefaults 0swaps
Even though this output is quite low-level, the time command can return very enlightening information about a particular command or program. It becomes very helpful in large environments in which operations normally take a long time. An example of this is comparing kernel compile times between different machines.
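When comparing runs across machines, it can help to reduce the output to a single number. The sketch below adds the user and system figures to give total CPU seconds. It assumes the output format shown above (the default format of GNU time); a shell's built-in time usually prints a different layout, so this would need adjusting there. The function name cpu_seconds is made up for this example.

```shell
# Sum the user and system CPU times from output in the format
# "0.08user 0.04system 0:00.11elapsed ...", read from stdin.
cpu_seconds() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9.]+user$/)   { sub(/user$/, "", $i);   user = $i }
            if ($i ~ /^[0-9.]+system$/) { sub(/system$/, "", $i); sys = $i }
        }
    }
    END { printf "%.2f\n", user + sys }'
}
```

Because time writes its report to standard error, an invocation would look something like /usr/bin/time cc hello.c -o hello 2>&1 | cpu_seconds, assuming the GNU version is installed at that path. For the sample output above, the result is 0.12.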