Scripts
Table 9-3 summarizes the scripts that follow and the providers they use.
Table 9-3. Application Script Summary
Script         Description                                                Provider
procsnoop.d    Snoop process execution                                    proc
procsystime    System call time statistics by process                     syscall
uoncpu.d       Profile application on-CPU user stacks                     profile
uoffcpu.d      Count application off-CPU user stacks by time              sched
plockstat      User-level mutex and read/write lock statistics            plockstat
kill.d         Snoop process signals                                      syscall
sigdist.d      Signal distribution by source and destination processes    proc
threaded.d     Sample multithreaded CPU usage                             profile
procsnoop.d
This is a script version of the "New Processes" one-liner shown earlier. Tracing the execution of new processes provides important visibility for applications that call the command line; some applications can call shell commands so frequently that it becomes a performance issue—one that is difficult to spot in traditional tools (such as prstat(1M) and top(1)) because the processes are so short-lived.
Script
1  #!/usr/sbin/dtrace -s
2
3  #pragma D option quiet
4  #pragma D option switchrate=10hz
5
6  dtrace:::BEGIN
7  {
8          printf("%-8s %5s %6s %6s %s\n", "TIME(ms)", "UID", "PID", "PPID",
9              "COMMAND");
10         start = timestamp;
11 }
12
13 proc:::exec-success
14 {
15         printf("%-8d %5d %6d %6d %s\n", (timestamp - start) / 1000000,
16             uid, pid, ppid, curpsinfo->pr_psargs);
17 }

Script procsnoop.d
Example
The following shows the Oracle Solaris commands executed as a consequence of restarting the cron daemon via svcadm(1M):
solaris# procsnoop.d
TIME(ms)   UID    PID   PPID COMMAND
3227         0  13273  12224 svcadm restart cron
3709         0  13274    106 /sbin/sh -c exec /lib/svc/method/svc-cron
3763         0  13274    106 /sbin/sh /lib/svc/method/svc-cron
3773         0  13275  13274 /usr/bin/rm -f /var/run/cron_fifo
3782         0  13276  13274 /usr/sbin/cron
The TIME(ms) column is printed so that the output can be postsorted if desired (DTrace may shuffle the output slightly because it collects buffers from multiple CPUs).
See Also: execsnoop
The DTraceToolkit includes a program called execsnoop, which has similar functionality to procsnoop.d. It was written originally for Oracle Solaris and is now shipped on Mac OS X by default. execsnoop wraps the D script in a shell script so that command-line options are available:
macosx# execsnoop -h
USAGE: execsnoop [-a|-A|-ehjsvZ] [-c command]
       execsnoop                # default output
                -a              # print all data
                -A              # dump all data, space delimited
                -e              # safe output, parseable
                -j              # print project ID
                -s              # print start time, us
                -v              # print start time, string
                -Z              # print zonename
                -c command      # command name to snoop
  eg,
       execsnoop -v             # human readable timestamps
       execsnoop -Z             # print zonename
       execsnoop -c ls          # snoop ls commands only
execsnoop traces process execution by tracing the exec() system call (and its variants), which differ slightly between operating systems. Unfortunately, system calls are not a stable interface, even across different versions of the same operating system: small changes to execsnoop have been necessary to keep it working across versions of Oracle Solaris because of subtle changes to the names of the exec() system calls. The lesson here is to prefer stable providers, such as proc, over unstable ones, such as syscall.
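To illustrate the difference, here are two one-liners (sketches, not from the DTraceToolkit) that both snoop process execution. The first depends on the platform's exec() syscall names and may need adjusting between operating systems and releases; the second uses the proc provider, whose probe names and arguments are documented and stable:

# unstable: the exec() syscall family varies by OS (exec, exece, execve, ...)
dtrace -n 'syscall::exec*:return { trace(curpsinfo->pr_psargs); }'

# stable: proc provider probe names and arguments are documented and fixed
dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'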
procsystime
procsystime is a generic system call time reporter. It can count system call executions, measure their elapsed and on-CPU times, and produce a report showing the system call type and process details. It is from the DTraceToolkit and is shipped on Mac OS X by default in /usr/bin.
Script
The essence of the script is explained here; the actual script is too long and too uninteresting (mostly dealing with command-line options) to list; see the DTraceToolkit for the full listing.
1  syscall:::entry
2  /self->ok/
3  {
4          @Counts[probefunc] = count();
5          self->start = timestamp;
6          self->vstart = vtimestamp;
7  }
8
9  syscall:::return
10 /self->start/
11 {
12         this->elapsed = timestamp - self->start;
13         this->oncpu = vtimestamp - self->vstart;
14         @Elapsed[probefunc] = sum(this->elapsed);
15         @CPU[probefunc] = sum(this->oncpu);
16         self->start = 0;
17         self->vstart = 0;
18 }
A self->ok variable is set beforehand to true if the current process is supposed to be traced. The code is then straightforward: Time stamps are set on the entry to syscalls so that deltas can be calculated on the return.
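The full script builds this self->ok test from its command-line options. As a minimal sketch of the idea (illustrative only, not the toolkit's actual code), a clause placed before the ones above could mark syscalls from a process of interest; clauses for the same probe fire in program order, so self->ok is set before the predicate above is evaluated:

syscall:::entry
/execname == "tar"/     /* or pid == $target when using -p PID */
{
        self->ok = 1;
}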
Examples
The following examples demonstrate the usage message and tracing a file system archive.
Usage
Command-line options can be listed using -h:
solaris# procsystime -h
USAGE: procsystime [-aceho] [ -p PID | -n name | command ]
          -p PID          # examine this PID
          -n name         # examine this process name
          -a              # print all details
          -e              # print elapsed times
          -c              # print syscall counts
          -o              # print CPU times
          -T              # print totals
  eg,
       procsystime -p 1871      # examine PID 1871
       procsystime -n tar       # examine processes called "tar"
       procsystime -aTn bash    # print all details for bash
       procsystime df -h        # run and examine "df -h"
File System Archive
The tar(1) command was used to archive a file system, with procsystime tracing elapsed times (which is the default) for processes named tar:
solaris# procsystime -n tar
Tracing... Hit Ctrl-C to end...
^C

Elapsed Times for processes tar,

         SYSCALL          TIME (ns)
           fcntl              58138
         fstat64              96490
          openat             280246
           chdir            1444153
           write            8922505
          open64           15294117
        openat64           16804949
           close           17855422
      getdents64           46679462
       fstatat64           98011589
            read         1551039139
Most of the elapsed time for the tar(1) command was in the read() syscall, which is expected because tar(1) is reading files from disk (which is slow I/O). The total time spent waiting for read() syscalls during the procsystime trace was 1.55 seconds.
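To drill down further, the read() calls can be examined on their own. This one-liner (a sketch, not part of procsystime) shows the distribution of read() elapsed times for processes named tar, which distinguishes many fast reads (for example, from cache) from a few slow disk reads:

dtrace -n '
    syscall::read:entry /execname == "tar"/ { self->ts = timestamp; }
    syscall::read:return /self->ts/
    {
            @["read time (ns)"] = quantize(timestamp - self->ts);
            self->ts = 0;
    }'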
uoncpu.d
This is a script version of the DTrace one-liner to profile the user stack trace of a given application process name. Since this is one of the most useful one-liners, providing it as a script saves typing and makes it easier to enhance.
Script
1  #!/usr/sbin/dtrace -s
2
3  profile:::profile-1001
4  /execname == $$1/
5  {
6          @["\n on-cpu (count @1001hz):", ustack()] = count();
7  }

Script uoncpu.d
Example
Here the uoncpu.d script is used to frequency count the user stack trace of all currently running Perl programs. Note perl is passed as a command-line argument, evaluated in the predicate (line 4):
# uoncpu.d perl
dtrace: script 'uoncpu.d' matched 1 probe
^C
[...output truncated...]

 on-cpu (count @1001hz):
              libperl.so.1`Perl_sv_setnv+0xc8
              libperl.so.1`Perl_pp_multiply+0x3fe
              libperl.so.1`Perl_runops_standard+0x3b
              libperl.so.1`S_run_body+0xfa
              libperl.so.1`perl_run+0x1eb
              perl`main+0x8a
              perl`_start+0x7d
              105

 on-cpu (count @1001hz):
              libperl.so.1`Perl_pp_multiply+0x3f7
              libperl.so.1`Perl_runops_standard+0x3b
              libperl.so.1`S_run_body+0xfa
              libperl.so.1`perl_run+0x1eb
              perl`main+0x8a
              perl`_start+0x7d
              111
The hottest stacks identified include the Perl_pp_multiply() function, suggesting that Perl is spending most of its time doing multiplications. Further analysis of those functions, or use of the perl provider if it is available (see Chapter 8), could confirm this.
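As a quick follow-up (a sketch, not from the toolkit), the pid provider can count which Perl interpreter opcode functions are called most frequently in the running process, where PID is the process ID of the Perl program:

dtrace -n 'pid$target::Perl_pp_*:entry { @[probefunc] = count(); }' -p PID

Note that enabling many pid provider probes adds overhead, so this is best kept to short investigations.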
uoffcpu.d
As a companion to uoncpu.d, the uoffcpu.d script measures the time spent off-CPU by user stack trace. This time includes device I/O, lock wait, and dispatcher queue latency.
Script
1  #!/usr/sbin/dtrace -s
2
3  sched:::off-cpu
4  /execname == $$1/
5  {
6          self->start = timestamp;
7  }
8
9  sched:::on-cpu
10 /self->start/
11 {
12         this->delta = (timestamp - self->start) / 1000;
13         @["off-cpu (us):", ustack()] = quantize(this->delta);
14         self->start = 0;
15 }

Script uoffcpu.d
Example
Here the uoffcpu.d script was used to trace the off-CPU time of bash shell processes:
# uoffcpu.d bash
dtrace: script 'uoffcpu.d' matched 6 probes
^C
[...]

  off-cpu (us):
              libc.so.1`__waitid+0x7
              libc.so.1`waitpid+0x65
              bash`0x8090627
              bash`wait_for+0x1a4
              bash`execute_command_internal+0x6f1
              bash`execute_command+0x5b
              bash`reader_loop+0x1bf
              bash`main+0x7df
              bash`_start+0x7d

           value  ------------- Distribution ------------- count
          262144 |                                         0
          524288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
         1048576 |                                         0

  off-cpu (us):
              libc.so.1`__read+0x7
              bash`rl_getc+0x47
              bash`rl_read_key+0xeb
              bash`readline_internal_char+0x99
              bash`0x80d945a
              bash`0x80d9481
              bash`readline+0x55
              bash`0x806e11f
              bash`0x806dff4
              bash`0x806ed06
              bash`0x806f9b4
              bash`0x806f3a4
              bash`yyparse+0x4b9
              bash`parse_command+0x80
              bash`read_command+0xd9
              bash`reader_loop+0x147
              bash`main+0x7df
              bash`_start+0x7d

           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            5
          131072 |@@@@@@@@@@@                              2
          262144 |                                         0
While tracing, in another bash shell, the command sleep 1 was typed and executed. The previous output shows the keystroke latency (mostly 65 ms to 131 ms) of the read commands, as well as the time spent waiting for the sleep(1) command to complete (in the 524 to 1048 ms range, which matches expectation: 1000 ms).
Note that the user stack frames generated by the ustack() function contain a mix of symbol names and hex values (for example, bash`0x806dff4) in the output. This can happen whenever ustack() is used, for one of several reasons. DTrace actually collects and stores the stack frames as hex values; user addresses are resolved to symbol names as a postprocessing step before the output is generated. DTrace may be unable to resolve a user address to a symbol name if any of the following is true:
- The user process being traced has exited before the processing can be done.
- The symbol table has been stripped, either from the user process binary or from the shared object libraries it has linked.
- User code is being executed out of data, such as via jump tables.
plockstat
plockstat(1M) is a powerful tool to examine user-level lock events, providing details on contention and hold time. It uses the DTrace plockstat provider, which is available for developing custom user-land lock analysis scripts. The plockstat provider (and the plockstat(1M) tool) is available on Solaris and Mac OS X and is currently being developed for FreeBSD.
Script
plockstat(1M) is a binary executable that dynamically builds a D script and sends it to libdtrace, rather than passing a static D script to libdtrace via dtrace(1M). If it is of interest, this D script can be examined using the -V option:
solaris# plockstat -V -p 12219
plockstat: vvvv D program vvvv
plockstat$target:::rw-block
{
        self->rwblock[arg0] = timestamp;
}

plockstat$target:::mutex-block
{
        self->mtxblock[arg0] = timestamp;
}

plockstat$target:::mutex-spin
{
        self->mtxspin[arg0] = timestamp;
}

plockstat$target:::rw-blocked
/self->rwblock[arg0] && arg1 == 1 && arg2 != 0/
{
        @rw_w_block[arg0, ustack(5)] =
            sum(timestamp - self->rwblock[arg0]);
        @rw_w_block_count[arg0, ustack(5)] = count();
        self->rwblock[arg0] = 0;
        rw_w_block_found = 1;
}
[...output truncated...]
Example
Here the plockstat(1M) command traced all lock events (-A for both hold and contention) on the Name Service Cache Daemon (nscd) for 60 seconds:
solaris# plockstat -A -e 60 -p `pgrep nscd`
Mutex hold

Count     nsec Lock                          Caller
-------------------------------------------------------------------------------
   30  1302583 0x814c08c                     libnsl.so.1`rpc_fd_unlock+0x4d
  326    15687 0x8089ab8                     nscd`_nscd_restart_if_cfgfile_changed+0x6c
    7   709342 libumem.so.1`umem_cache_lock  libumem.so.1`umem_cache_applyall+0x5e
  112    16702 0x80b67b8                     nscd`lookup_int+0x611
    3   570898 0x81a0548                     libscf.so.1`scf_handle_bind+0x231
   60    24592 0x80b20e8                     nscd`_nscd_mutex_unlock+0x8d
   50    24306 0x80b2868                     nscd`_nscd_mutex_unlock+0x8d
   30    19839 libnsl.so.1`_ti_userlock      libnsl.so.1`sig_mutex_unlock+0x1e
    7    83100 libumem.so.1`umem_update_lock libumem.so.1`umem_update_thread+0x129
[...output truncated...]

R/W reader hold

Count     nsec Lock                          Caller
-------------------------------------------------------------------------------
   30    95341 0x80c0e14                     nscd`_nscd_get+0xb8
  120    15586 nscd`nscd_nsw_state_base_lock nscd`_get_nsw_state_int+0x19c
   60    20256 0x80e0a7c                     nscd`_nscd_get+0xb8
  120     9806 nscd`addrDB_rwlock            nscd`_nscd_is_int_addr+0xd1
   30    39155 0x8145944                     nscd`_nscd_get+0xb8
[...output truncated...]

R/W writer hold

Count     nsec Lock                          Caller
-------------------------------------------------------------------------------
   30    16293 nscd`addrDB_rwlock            nscd`_nscd_del_int_addr+0xeb
   30    15440 nscd`addrDB_rwlock            nscd`_nscd_add_int_addr+0x9c
    3    14279 nscd`nscd_smf_service_state_lock nscd`query_smf_state+0x17b

Mutex block

Count     nsec Lock                          Caller
-------------------------------------------------------------------------------
    2   119957 0x8089ab8                     nscd`_nscd_restart_if_cfgfile_changed+0x3e

Mutex spin

Count     nsec Lock                          Caller
-------------------------------------------------------------------------------
    1    37959 0x8089ab8                     nscd`_nscd_restart_if_cfgfile_changed+0x3e

Mutex unsuccessful spin

Count     nsec Lock                          Caller
-------------------------------------------------------------------------------
    2    42988 0x8089ab8                     nscd`_nscd_restart_if_cfgfile_changed+0x3e
While tracing, there were very few contention events and many hold events. Hold events are normal for software execution and are ideally as short as possible, while contention events can cause performance issues as threads are waiting for locks.
The output has caught a spin event for the lock at address 0x8089ab8 (no symbol name) from the code path location nscd`_nscd_restart_if_cfgfile_changed+0x3e, which was for 38 us. This means a thread spun on-CPU for 38 us before being able to grab the lock. On two other occasions, the thread gave up spinning after an average of 43 us (unsuccessful spin) and was blocked for 120 us (block), both also shown in the output.
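The same plockstat provider probes can be used directly in custom scripts. The following is a minimal sketch (the filename and the choice to trace only mutex contention are illustrative) that quantizes how long threads block on each mutex, keyed by lock address and a short user stack:

#!/usr/sbin/dtrace -s

plockstat$target:::mutex-block
{
        /* arg0 is the user-land address of the mutex */
        self->ts[arg0] = timestamp;
}

plockstat$target:::mutex-blocked
/self->ts[arg0]/
{
        @["mutex block time (ns):", arg0, ustack(5)] =
            quantize(timestamp - self->ts[arg0]);
        self->ts[arg0] = 0;
}

Since the plockstat provider requires a target process, this could be run as, for example, dtrace -s mutexblock.d -p PID (mutexblock.d being a hypothetical filename).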
kill.d
The kill.d script prints details of process signals as they are sent, such as the source and destination PIDs, the signal number, and the result. It's named kill.d after the kill() system call that it traces, which is used by processes to send signals.
Script
This is based on the kill.d script from the DTraceToolkit, which uses the syscall provider to trace the kill() syscall. The proc provider could also be used via its signal-* probes, which match signals sent by other means as well as those sent via kill() (see sigdist.d, next).
1  #!/usr/sbin/dtrace -s
2
3  #pragma D option quiet
4
5  dtrace:::BEGIN
6  {
7          printf("%-6s %12s %6s %-8s %s\n",
8              "FROM", "COMMAND", "SIG", "TO", "RESULT");
9  }
10
11 syscall::kill:entry
12 {
13         self->target = (int)arg0;
14         self->signal = arg1;
15 }
16
17 syscall::kill:return
18 {
19         printf("%-6d %12s %6d %-8d %d\n",
20             pid, execname, self->signal, self->target, (int)arg0);
21         self->target = 0;
22         self->signal = 0;
23 }

Script kill.d
Note that the target PID is cast to a signed integer on line 13; this is because the kill() syscall can also send signals to process groups by providing the process group ID as a negative number instead of the PID. The cast ensures that it is printed correctly as a signed integer on line 19.
Example
Here the kill.d script has traced the bash shell sending signal 9 (SIGKILL) to PID 12838 and sending signal 2 (SIGINT) to itself, which was a Ctrl-C. kill.d has also traced utmpd sending a 0 signal (the null signal) to various processes: This signal is used to check that PIDs are still valid, without signaling them to do anything (see kill(2)).
# kill.d
FROM        COMMAND    SIG TO       RESULT
12224          bash      9 12838    0
3728          utmpd      0 4174     0
3728          utmpd      0 3949     0
3728          utmpd      0 10621    0
3728          utmpd      0 12221    0
12224          bash      2 12224    0
sigdist.d
The sigdist.d script shows which processes are sending which signals to other processes, including the process names. This traces all signals: those sent via the kill() system call as well as those sent by the kernel (for example, alarms).
Script
This script is based on /usr/demo/dtrace/sig.d from Oracle Solaris and uses the proc provider signal-send probe.
1  #!/usr/sbin/dtrace -s
[...]
45 #pragma D option quiet
46
47 dtrace:::BEGIN
48 {
49         printf("Tracing... Hit Ctrl-C to end.\n");
50 }
51
52 proc:::signal-send
53 {
54         @Count[execname, stringof(args[1]->pr_fname), args[2]] = count();
55 }
56
57 dtrace:::END
58 {
59         printf("%16s %16s %6s %6s\n", "SENDER", "RECIPIENT", "SIG", "COUNT");
60         printa("%16s %16s %6d %6@d\n", @Count);
61 }

Script sigdist.d
Example
The sigdist.d script has traced the bash shell sending signal 9 (SIGKILL) to a sleep process and also signal 2 (SIGINT, Ctrl-C) to itself. It's also picked up sshd sending bash the SIGINT, which happened via a syscall write() of the Ctrl-C to the ptm (STREAMS pseudo-tty master driver) device for bash, not via the kill() syscall.
# sigdist.d
Tracing... Hit Ctrl-C to end.
^C
          SENDER        RECIPIENT    SIG  COUNT
            bash             bash      2      1
            bash            sleep      9      1
            sshd             bash      2      1
            sshd           dtrace      2      1
           sched             bash     18      2
            bash             bash     20      3
           sched         sendmail     14      3
           sched         sendmail     18      3
           sched          proftpd     14      7
           sched        in.mpathd     14     10
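To see the code path by which a particular signal is being sent (for example, to confirm that sshd delivers SIGINT through the tty subsystem rather than via kill()), the kernel stack can be aggregated at the same probe. This one-liner is a sketch along those lines, using 2 as the signal number for SIGINT:

dtrace -n 'proc:::signal-send /args[2] == 2/ { @[execname, stack()] = count(); }'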
threaded.d
The threaded.d script provides data for quantifying how well multithreaded applications are performing, in terms of parallel execution across CPUs. If an application has sufficient CPU-bound work and is running on a system with multiple CPUs, then ideally the application would have multiple threads running on those CPUs to process the work in parallel.
Script
This is based on the threaded.d script from the DTraceToolkit.
1  #!/usr/sbin/dtrace -s
2
3  #pragma D option quiet
4
5  profile:::profile-101
6  /pid != 0/
7  {
8          @sample[pid, execname] = lquantize(tid, 0, 128, 1);
9  }
10
11 profile:::tick-1sec
12 {
13         printf("%Y,\n", walltimestamp);
14         printa("\n @101hz PID: %-8d CMD: %s\n%@d", @sample);
15         printf("\n");
16         trunc(@sample);
17 }

Script threaded.d
Example
To demonstrate threaded.d, two programs were written (called test0 and test1) that perform work on multiple threads in parallel. One of the programs was coded with a lock "serialization" issue, where only the thread holding the lock can really make forward progress. See whether you can tell which one:
# threaded.d
2010 Jul 4 05:17:09,

 @101hz PID: 12974    CMD: test0
           value  ------------- Distribution ------------- count
               1 |                                         0
               2 |@@@@@@@@@                                28
               3 |@@                                       6
               4 |@@@@@@@@@@@                              32
               5 |@@@@@                                    14
               6 |@@@@@                                    15
               7 |@@@                                      8
               8 |@@                                       5
               9 |@@@                                      10
              10 |                                         0

 @101hz PID: 12977    CMD: test1
           value  ------------- Distribution ------------- count
               1 |                                         0
               2 |@@@@                                     77
               3 |@@@@@@                                   97
               4 |@@@@                                     77
               5 |@@@@@                                    87
               6 |@@@@                                     76
               7 |@@@@@@                                   101
               8 |@@@@                                     76
               9 |@@@@@@                                   100
              10 |                                         0
[...]
threaded.d prints output every second, showing a distribution plot where value is the thread ID and count is the number of samples during that second. At a glance, both programs had every thread sampled on-CPU during the one-second interval, so the issue may not be obvious. The clue is in the counts: threaded.d samples at 101 Hertz (101 times per second), and the sample counts for test0 add up to only 118 (a little over one second's worth of samples on one CPU), whereas those for test1 add up to 691. The program with the issue is test0, which is using a fraction of the CPU cycles that test1 is able to consume in the same interval.
This was a simple way to analyze the CPU execution of a multithreaded application. A more sophisticated approach would be to trace kernel scheduling events (the sched provider) as the application threads stepped on- and off-CPU.
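As a sketch of that approach (assuming, as with uoncpu.d and uoffcpu.d, that the process name is passed as the first argument), the following sums on-CPU time per thread, which shows directly how evenly the work is spread across threads:

#!/usr/sbin/dtrace -s

sched:::on-cpu
/execname == $$1/
{
        self->ts = timestamp;
}

sched:::off-cpu
/self->ts/
{
        /* total on-CPU nanoseconds, keyed by process and thread ID */
        @oncpu[pid, tid] = sum(timestamp - self->ts);
        self->ts = 0;
}

dtrace:::END
{
        printf("%8s %8s %16s\n", "PID", "TID", "ON-CPU(ns)");
        printa("%8d %8d %16@d\n", @oncpu);
}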