- Capabilities
- Strategy
- Checklist
- Providers
- Scripts
- Case Studies
- Summary
Providers
Table 9-2 shows providers of most interest when tracing applications.
Table 9-2. Providers for Applications
Provider |
Description |
proc |
Trace application process and thread creation and destruction and signals. |
syscall |
Trace entry and return of operating system calls, arguments, and return values. |
profile |
Sample application CPU activity at a custom rate. |
sched |
Trace application thread scheduling events. |
vminfo |
Virtual memory statistic probes, based on vmstat(1M) statistics. |
sysinfo |
Kernel statistics probes, based on mpstat(1M) statistics. |
plockstat |
Trace user-land lock events. |
cpc |
CPU Performance Counters provider, for CPU cache hit/miss by function. |
pid |
Trace internals of the application including calls to system libraries. |
language |
Specific language provider: See Chapter 8. |
You can find complete lists of provider probes and arguments in the DTrace Guide.1
pid Provider
The Process ID (pid) provider instruments user-land function execution, providing probes for function entry and return points and for every instruction in the function. It also provides access to function arguments, return codes, return instruction offsets, and register values. By tracing function entry and return, the elapsed time and on-CPU time during function execution can also be measured. It is available on Solaris and Mac OS X and is currently being developed for FreeBSD.2
The pid provider is associated with a particular process ID, which is part of the provider name: pid<PID>. The PID can be written literally, such as pid123, or specified using the macro variable $target, which provides the PID when either the -p PID or -c command option is used.
Listing pid provider function entry probes for the bash shell (running as PID 1122) yields the following:
#dtrace -ln 'pid$target:::entry' -p 1122 ID PROVIDER MODULE FUNCTION NAME 12539 pid1122 bash _start entry 12540 pid1122 bash __fsr entry 12541 pid1122 bash main entry 12542 pid1122 bash parse_long_options entry 12543 pid1122 bash parse_shell_options entry 12544 pid1122 bash exit_shell entry 12545 pid1122 bash sh_exit entry 12546 pid1122 bash execute_env_file entry 12547 pid1122 bash run_startup_files entry 12548 pid1122 bash shell_is_restricted entry 12549 pid1122 bash maybe_make_restricted entry 12550 pid1122 bash uidget entry 12551 pid1122 bash disable_priv_mode entry 12552 pid1122 bash run_wordexp entry 12553 pid1122 bash run_one_command entry [...] 15144 pid1122 libcurses.so.1 addstr entry 15145 pid1122 libcurses.so.1 attroff entry 15146 pid1122 libcurses.so.1 attron entry 15147 pid1122 libcurses.so.1 attrset entry 15148 pid1122 libcurses.so.1 beep entry 15149 pid1122 libcurses.so.1 bkgd entry [...] 15704 pid1122 libsocket.so.1 endnetent entry 15705 pid1122 libsocket.so.1 getnetent_r entry 15706 pid1122 libsocket.so.1 str2netent entry 15707 pid1122 libsocket.so.1 getprotobyname entry 15708 pid1122 libsocket.so.1 getprotobynumber entry 15709 pid1122 libsocket.so.1 getprotoent entry [...] 19019 pid1122 libc.so.1 fopen entry 19020 pid1122 libc.so.1 _freopen_null entry 19021 pid1122 libc.so.1 freopen entry 19022 pid1122 libc.so.1 fgetpos entry 19023 pid1122 libc.so.1 fsetpos entry 19024 pid1122 libc.so.1 fputc entry [...] |
There were 8,003 entry probes listed. The previous truncated output shows a sample of the available probes from the bash code segment and three libraries: libcurses, libsocket, and libc. The probe module name is the segment name.
Listing all pid provider probes for the libc function fputc() yields the following:
#dtrace -ln 'pid$target::fputc:' -p 1122 ID PROVIDER MODULE FUNCTION NAME 19024 pid1122 libc.so.1 fputc entry 20542 pid1122 libc.so.1 fputc return 20543 pid1122 libc.so.1 fputc 0 20544 pid1122 libc.so.1 fputc 1 20545 pid1122 libc.so.1 fputc 3 20546 pid1122 libc.so.1 fputc 4 20547 pid1122 libc.so.1 fputc 7 20548 pid1122 libc.so.1 fputc c 20549 pid1122 libc.so.1 fputc d 20550 pid1122 libc.so.1 fputc 13 20551 pid1122 libc.so.1 fputc 16 20552 pid1122 libc.so.1 fputc 19 20553 pid1122 libc.so.1 fputc 1c 20554 pid1122 libc.so.1 fputc 21 20555 pid1122 libc.so.1 fputc 24 20556 pid1122 libc.so.1 fputc 25 20557 pid1122 libc.so.1 fputc 26 |
The probes listed are the entry and return probes for the fputc() function, as well as probes for each instruction offset in hexadecimal (0, 1, 3, 4, 7, c, d, and so on).
Be careful when using the pid provider, especially in production environments. Application processes vary greatly in size, and many production applications have large text segments with a large number of instrumentable functions, each with tens to hundreds of instructions and with each instruction another potential probe target for the pid provider. The invocation dtrace -n 'pid1234::::' will instruct DTrace to instrument every function entry and return and to instrument every instruction in process PID 1234. Here's an example:
solaris#dtrace -n 'pid1471:::' dtrace: invalid probe specifier pid1471:::: failed to create offset probes in '__1cFStateM_sub_Op_ConI6MpknENode__v_': Not enough space solaris# dtrace -n 'pid1471:::entry' dtrace: description 'pid1471:::entry' matched 26847 probes |
Process PID 1471 was a Java JVM process. The first DTrace command attempted to insert a probe at every instruction location in the JVM but was unable to complete. The Not enough space error means the default number of 250,000 pid provider probes was not enough to complete the instrumentation. The second invocation in the example instruments the same process, but this time with the entry string in the name component of the probe, instructing DTrace to insert a probe at the entry point of every function in the process. In this case, DTrace found 26,847 instrumentation points.
Once a process is instrumented with the pid provider, depending on the number of probes and how busy the process is, using the pid provider will induce some probe effect, meaning it can slow the execution speed of the target process, in some cases dramatically.
Stability
The pid provider is considered an unstable interface, meaning that the provider interface (which consists of the probe names and arguments) may be subject to change between application software versions. This is because the interface is dynamically constructed based on the thousands of compiled functions that make up a software application. It is these functions that are subject to change, and when they do, so does the pid provider. This means that any DTrace scripts or one-liners based on the pid provider may be dependent on the application software version they were written for.
Although application software can and is likely to change between versions, many library interfaces are likely to remain unchanged, such as libc, libsocket, libpthread, and many others, especially those exporting standard interfaces such as POSIX. These can make good targets for tracing with the pid provider, because one-liners and scripts will have a higher degree of stability than when tracing application-specific software.
If a pid-based script has stopped working because of minor software changes, then ideally the script can be repaired with equivalent minor changes to match the newer software. If the software has changed significantly, then the pid-based script may need to be rewritten entirely. Because of this instability, it is recommended to use pid only when needed. If there are stable providers available that can serve a similar role, they should be used instead, and the scripts that use them will not need to be rewritten as the software changes.
Since pid is an unstable interface, the pid provider one-liners and scripts in this book are not guaranteed to work or be supported by software vendors.
The pid provider scripts in this book serve not just as examples of using the pid provider in D programs but also as example data that DTrace can make available and why that can be useful. If these scripts stop working, you can try fixing them or check for updated versions on the Web (try this book's Web site, www.dtracebook.com).
Arguments and Return Value
The arguments and return value for functions can be inspected on the pid entry and return probes.
- pid<PID>:::entry: The function arguments is (uint64_t) arg0 ... argn.
- pid<PID>:::return: The program counter is (uint64_t) arg0; the return value is (uint64_t) arg1.
The uregs[] array can also be accessed to examine individual user registers.
cpc Provider
The CPU Performance Counter (cpc) provider provides probes for profiling CPU events, such as instructions, cache misses, and stall cycles. These CPU events are based on the performance counters that the CPUs provide, which vary between manufacturers, types, and sometimes versions of the same type of CPU. A generic interface for the performance counters has been developed, the Performance Application Programming Interface (PAPI),3 which is supported by the cpc provider in addition to the platform-specific counters. The cpc provider is fully documented in the cpc provider section of the DTrace Guide and is currently available only in Solaris Nevada.4
The cpc provider probe names have the following format:
cpc:::<event name>-<mode>-<optional mask-><count> |
The event name may be a PAPI name or a platform-specific event name. On Solaris, events for the current CPU type can be listed using cpustat(1M):
solaris#cpustat -h Usage: cpustat [-c events] [-p period] [-nstD] [-T d|u] [interval [count]] [...] Generic Events: event[0-3]: PAPI_br_ins PAPI_br_msp PAPI_br_tkn PAPI_fp_ops PAPI_fad_ins PAPI_fml_ins PAPI_fpu_idl PAPI_tot_cyc PAPI_tot_ins PAPI_l1_dca PAPI_l1_dcm PAPI_l1_ldm PAPI_l1_stm PAPI_l1_ica PAPI_l1_icm PAPI_l1_icr PAPI_l2_dch PAPI_l2_dcm PAPI_l2_dcr PAPI_l2_dcw PAPI_l2_ich PAPI_l2_icm PAPI_l2_ldm PAPI_l2_stm PAPI_res_stl PAPI_stl_icy PAPI_hw_int PAPI_tlb_dm PAPI_tlb_im PAPI_l3_dcr PAPI_l3_icr PAPI_l3_tcr PAPI_l3_stm PAPI_l3_ldm PAPI_l3_tcm See generic_events(3CPC) for descriptions of these events Platform Specific Events: event[0-3]: FP_dispatched_fpu_ops FP_cycles_no_fpu_ops_retired FP_dispatched_fpu_ops_ff LS_seg_reg_load LS_uarch_resync_self_modify LS_uarch_resync_snoop LS_buffer_2_full LS_locked_operation LS_retired_cflush LS_retired_cpuid DC_access DC_miss DC_refill_from_L2 DC_refill_from_system DC_copyback DC_dtlb_L1_miss_L2_hit DC_dtlb_L1_miss_L2_miss DC_misaligned_data_ref [...] See "BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors" (AMD publication 31116) |
The first group, Generic Events, is the PAPI events and is documented on Solaris in the generic_events(3CPC) man page. The second group, Platform Specific Events, is from the CPU manufacturer and is typically documented in the CPU user guide referenced in the cpustat(1M) output.
The mode component of the probe name can be user for profiling user-mode, kernel for kernel-mode, or all for both.
The optional mask component is sometimes used by platform-specific events, as directed by the CPU user guide.
The final component of the probe name is the overflow count: Once this many of the specified event has occurred on the CPU, the probe fires on that CPU. For frequent events, such as cycle and instruction counts, this can be set to a high number to reduce the rate that the probe fires and therefore reduce the impact on target application performance.
cpc provider probes have two arguments: arg0 is the kernel program counter or 0 if not executing in the kernel, and arg1 is the user-level program counter or 0 if not executing in user-mode.
Depending on the CPU type, it may not be possible to enable more than one cpc probe simultaneously. Subsequent enablings will encounter a Failed to enable probe error. This behavior is similar to, and for the same reason as, the operating system, allowing only one invocation of cpustat(1M) at a time. There is a finite number of performance counter registers available for each CPU type.
The sections that follow have example cpc provider one-liners and output.
See Also
There are many topics relevant to application analysis, most of which are covered fully in separate chapters of this book.
- Chapter 3: System View
- Chapter 4: Disk I/O
- Chapter 5: File Systems
- Chapter 6: Network Lower-Level Protocols
- Chapter 7: Application-Level Protocols
- Chapter 8: Languages
All of these can be considered part of this chapter. The one-liners and scripts that follow summarize application analysis with DTrace and introduce some remaining topics such as signals, thread scaling, and the cpc provider.
One-Liners
For many of these, a Web server with processes named httpd is used as the target application. Modify httpd to be the name of the application process of interest.
proc provider
Trace new processes:
dtrace -n 'proc:::exec-success { trace(execname); }' |
Trace new processes (current FreeBSD5):
dtrace -n 'proc:::exec_success { trace(execname); }' |
New processes (with arguments):
dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }' |
New threads created, by process:
dtrace -n 'proc:::lwp-create { @[pid, execname] = count(); }' |
Successful signal details:
dtrace -n 'proc:::signal-send { printf("%s -%d %d", execname, args[2], args[1]->pr_pid); }' |
syscall provider
System call counts for processes named httpd:
dtrace -n 'syscall:::entry /execname == "httpd"/ { @[probefunc] = count(); }' |
System calls with non-zero errno (errors):
dtrace -n 'syscall:::return /errno/ { @[probefunc, errno] = count(); }' |
profile provider
User stack trace profile at 101 Hertz, showing process name and stack:
dtrace -n 'profile-101 { @[execname, ustack()] = count(); }' |
User stack trace profile at 101 Hertz, showing process name and top five stack frames:
dtrace -n 'profile-101 { @[execname, ustack(5)] = count(); }' |
User stack trace profile at 101 Hertz, showing process name and stack, top ten only:
dtrace -n 'profile-101 { @[execname, ustack()] = count(); } END { trunc(@, 10); }' |
User stack trace profile at 101 Hertz for processes named httpd:
dtrace -n 'profile-101 /execname == "httpd"/ { @[ustack()] = count(); }' |
User function name profile at 101 Hertz for processes named httpd:
dtrace -n 'profile-101 /execname == "httpd"/ { @[ufunc(arg1)] = count(); }' |
User module name profile at 101 Hertz for processes named httpd:
dtrace -n 'profile-101 /execname == "httpd"/ { @[umod(arg1)] = count(); }' |
sched provider
Count user stack traces when processes named httpd leave CPU:
dtrace -n 'sched:::off-cpu /execname == "httpd"/ { @[ustack()] = count(); }' |
pid provider
The pid provider instruments functions from a particular software version; these example one-liners may therefore require modifications to match the software version you are running. They can be executed on an existing process by using -p PID or by running a new process using -c command.
Count process segment function calls:
dtrace -n 'pid$target:a.out::entry { @[probefunc] = count(); }' -p PID |
Count libc function calls:
dtrace -n 'pid$target:libc::entry { @[probefunc] = count(); }' -p PID |
Count libc string function calls:
dtrace -n 'pid$target:libc:str*:entry { @[probefunc] = count(); }' -p PID |
Trace libc fsync() calls showing file descriptor:
dtrace -n 'pid$target:libc:fsync:entry { trace(arg0); }' -p PID |
Trace libc fsync() calls showing file path name:
dtrace -n 'pid$target:libc:fsync:entry { trace(fds[arg0].fi_pathname); }' -p PID |
Count requested malloc() bytes by user stack trace:
dtrace -n 'pid$target::malloc:entry { @[ustack()] = sum(arg0); }' -p PID |
Trace failed malloc() requests:
dtrace -n 'pid$target::malloc:return /arg1 == NULL/ { ustack(); }' -p PID |
See the "C" section of Chapter 8 for more pid provider one-liners.
plockstat provider
As with the pid provider, these can also be run using the -c command.
Mutex blocks by user-level stack trace:
dtrace -n 'plockstat$target:::mutex-block { @[ustack()] = count(); }' -p PID |
Mutex spin counts by user-level stack trace:
dtrace -n 'plockstat$target:::mutex-acquire /arg2/ { @[ustack()] = sum(arg2); }' -p PID |
Reader/writer blocks by user-level stack trace:
dtrace -n 'plockstat$target:::rw-block { @[ustack()] = count(); }' -p PID |
cpc provider
These cpc provider one-liners are dependent on the availability of both the cpc provider and the event probes (for Solaris, see cpustat(1M) to learn what events are available on your system). The following overflow counts (200,000; 50,000; and 10,000) have been picked to balance between the rate of events and fired DTrace probes.
User-mode instructions by process name:
dtrace -n 'cpc:::PAPI_tot_ins-user-200000 { @[execname] = count(); }' |
User-mode instructions by process name and function name:
dtrace -n 'cpc:::PAPI_tot_ins-user-200000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode instructions for processes named httpd by function name:
dtrace -n 'cpc:::PAPI_tot_ins-user-200000 /execname == "httpd"/ { @[ufunc(arg1)] = count(); }' |
User-mode CPU cycles by process name and function name:
dtrace -n 'cpc:::PAPI_tot_cyc-user-200000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode level-one cache misses by process name and function name:
dtrace -n 'cpc:::PAPI_l1_tcm-user-10000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode level-one instruction cache misses by process name and function name:
dtrace -n 'cpc:::PAPI_l1_icm-user-10000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode level-one data cache misses by process name and function name:
dtrace -n 'cpc:::PAPI_l1_dcm-user-10000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode level-two cache misses by process name and function name:
dtrace -n 'cpc:::PAPI_l2_tcm-user-10000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode level-three cache misses by process name and function name:
dtrace -n 'cpc:::PAPI_l3_tcm-user-10000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode conditional branch misprediction by process name and function name:
dtrace -n 'cpc:::PAPI_br_msp-user-10000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode resource stall cycles by process name and function name:
dtrace -n 'cpc:::PAPI_res_stl-user-50000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode floating-point operations by process name and function name:
dtrace -n 'cpc:::PAPI_fp_ops-user-10000 { @[execname, ufunc(arg1)] = count(); }' |
User-mode TLB misses by process name and function name:
dtrace -n 'cpc:::PAPI_tlb_tl-user-10000 { @[execname, ufunc(arg1)] = count(); }' |
One-Liner Selected Examples
There are additional examples of one-liners in the "Case Study" section.
New Processes (with Arguments)
New processes were traced on Solaris while the man ls command was executed:
solaris#dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }' dtrace: description 'proc:::exec-success ' matched 1 probe CPU ID FUNCTION:NAME 0 13487 exec_common:exec-success man ls 0 13487 exec_common:exec-success sh -c cd /usr/share/man; tbl /usr/share/ man/man1/ls.1 |neqn /usr/share/lib/pub/ 0 13487 exec_common:exec-success tbl /usr/share/man/man1/ls.1 0 13487 exec_common:exec-success neqn /usr/share/lib/pub/eqnchar - 0 13487 exec_common:exec-success nroff -u0 -Tlp -man - 0 13487 exec_common:exec-success col -x 0 13487 exec_common:exec-success sh -c trap '' 1 15; /usr/bin/mv -f /tmp/ mpcJaP5g /usr/share/man/cat1/ls.1 2> /d 0 13487 exec_common:exec-success /usr/bin/mv -f /tmp/mpcJaP5g /usr/share/ man/cat1/ls.1 0 13487 exec_common:exec-success sh -c more -s /tmp/mpcJaP5g 0 13487 exec_common:exec-success more -s /tmp/mpcJaP5g ^C |
The variety of programs that are executed to process man ls are visible, ending with the more(1) command that shows the man page.
Mac OS X currently doesn't provide the full argument list in pr_psargs, which is noted in the comments of the curpsinfo translator:
macosx#grep pr_psargs /usr/lib/dtrace/darwin.d char pr_psargs[80]; /* initial characters of arg list */ pr_psargs = P->p_comm; /* XXX omits command line arguments XXX */ pr_psargs = xlate <psinfo_t> ((struct proc *)(T->task->bsd_info)).pr_psargs; /* XXX omits command line arguments XXX */ |
And using pr_psargs in trace() on Mac OS X can trigger tracemem() behavior, printing hex dumps from the address, which makes reading the output a little difficult. It may be easier to just use the execname for this one-liner for now. Here's an example of tracing man ls on Mac OS X:
macosx#dtrace -n 'proc:::exec-success { trace(execname); }' dtrace: description 'proc:::exec-success ' matched 2 probes CPU ID FUNCTION:NAME 0 19374 posix_spawn:exec-success sh 0 19374 posix_spawn:exec-success sh 0 19368 __mac_execve:exec-success sh 0 19368 __mac_execve:exec-success tbl 0 19368 __mac_execve:exec-success sh 0 19368 __mac_execve:exec-success grotty 0 19368 __mac_execve:exec-success more 1 19368 __mac_execve:exec-success man 1 19368 __mac_execve:exec-success sh 1 19368 __mac_execve:exec-success gzip 1 19368 __mac_execve:exec-success gzip 1 19374 posix_spawn:exec-success sh 1 19368 __mac_execve:exec-success groff 1 19368 __mac_execve:exec-success troff 1 19368 __mac_execve:exec-success gzip ^C |
Note that the output is shuffled (the CPU ID change is a hint). For the correct order, include a time stamp in the output and postsort.
System Call Counts for Processes Called httpd
The Apache Web server runs multiple httpd processes to serve Web traffic. This can be a problem for traditional system call debuggers (such as truss(1)), which can examine only one process at a time, usually by providing a process ID. DTrace can examine all processes simultaneously, making it especially useful for multiprocess applications such as Apache.
This one-liner frequency counts system calls from all running Apache httpd processes:
solaris#dtrace -n 'syscall:::entry /execname == "httpd"/ { @[probefunc] = count(); }' dtrace: description 'syscall:::entry ' matched 225 probes ^C accept 1 getpid 1 lwp_mutex_timedlock 1 lwp_mutex_unlock 1 shutdown 1 brk 4 gtime 5 portfs 7 mmap64 10 waitsys 30 munmap 33 doorfs 39 openat 49 writev 51 stat64 60 close 61 fcntl 73 read 74 lwp_sigmask 78 getdents64 98 pollsys 100 fstat64 109 open64 207 lstat64 245 |
The most frequently called system call was lstat64(), called 245 times.
User Stack Trace Profile at 101 Hertz, Showing Process Name and Top Five Stack Frames
This one-liner is a quick way to see not just who is on-CPU but what they are doing:
solaris#dtrace -n 'profile-101 { @[execname, ustack(5)] = count(); }' dtrace: description 'profile-101 ' matched 1 probe ^C [...] mpstat libc.so.1`p_online+0x7 mpstat`acquire_snapshot+0x131 mpstat`main+0x27d mpstat`_start+0x7d 13 httpd libc.so.1`__forkx+0xb libc.so.1`fork+0x1d mod_php5.2.so`zif_proc_open+0x970 mod_php5.2.so`execute_internal+0x45 mod_php5.2.so`dtrace_execute_internal+0x59 42 sched 541 |
No stack trace was shown for sched (the kernel), since this one-liner is examining user-mode stacks (ustack()), not kernel stacks (stack()). This could be eliminated from the output by adding the predicate /arg1/ (check that the user-mode program counter is nonzero) to ensure that only user stacks are sampled.
User-Mode Instructions by Process Name
To introduce this one-liner, a couple of test applications were written and executed called app1 and app2, each single-threaded and running a continuous loop of code. Examining these applications using top(1) shows the following:
last pid: 4378; load avg: 2.13, 2.00, 1.62; up 4+02:53:19 06:24:05 98 processes: 95 sleeping, 3 on cpu CPU states: 73.9% idle, 25.2% user, 0.9% kernel, 0.0% iowait, 0.0% swap Kernel: 866 ctxsw, 19 trap, 1884 intr, 2671 syscall Memory: 32G phys mem, 1298M free mem, 4096M total swap, 4096M free swap PID USERNAME NLWP PRI NICE SIZE RES STATE TIME CPU COMMAND 4319 root 1 10 0 1026M 513M cpu/3 10:50 12.50% app2 4318 root 1 10 0 1580K 808K cpu/7 10:56 12.50% app1 [...] |
top(1) reports that each application is using 12.5 percent of the total CPU capacity, which is a single core on this eight-core system. The Solaris prstat -mL breaks down the CPU time into microstates and shows this in terms of a single thread:
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 4318 root 100 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 8 0 0 app1/1 4319 root 100 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 8 0 0 app2/1 [...] |
prstat(1M) shows that each thread is running at 100 percent user time (USR). This is a little more information than simply %CPU from top(1), and it indicates that these applications are both spending time executing their own code.
The cpc provider allows %CPU time to be understood in greater depth. This one-liner uses the cpc provider to profile instructions by process name. The probe specified fires for every 200,000th user-level instruction, counting the current process name at the time:
solaris#dtrace -n 'cpc:::PAPI_tot_ins-user-200000 { @[execname] = count(); }' dtrace: description 'cpc:::PAPI_tot_ins-user-200000 ' matched 1 probe ^C sendmail 1 dtrace 2 mysqld 6 sshd 7 nscd 14 httpd 16 prstat 23 mpstat 52 app2 498 app1 154801 |
So, although the output from top(1) and prstat(1M) suggests that both applications are very similar in terms of CPU usage, the cpc provider shows that they are in fact very different. During the same interval, app1 executed roughly 300 times more CPU instructions than app2.
The other cpc one-liners can explain this further; app1 was written to continually execute fast register-based instructions, while app2 continually performs much slower main memory I/O.
User-Mode Instructions for Processes Named httpd by Function Name
This one-liner matches processes named httpd and profiles instructions by function, counting on every 200,000th instruction:
solaris#dtrace -n 'cpc:::PAPI_tot_ins-user-200000 /execname == "httpd"/ { @[ufunc(arg1)] = count(); }' dtrace: description 'cpc:::PAPI_tot_ins-user-200000 ' matched 1 probe ^C httpd`ap_invoke_handler 1 httpd`pcre_exec 1 libcrypto.so.0.9.8`SHA1_Update 1 [...] libcrypto.so.0.9.8`bn_sqr_comba8 39 libz.so.1`crc32_little 41 libcrypto.so.0.9.8`sha1_block_data_order 50 libcrypto.so.0.9.8`_x86_AES_encrypt 88 libz.so.1`compress_block 103 libcrypto.so.0.9.8`bn_mul_add_words 117 libcrypto.so.0.9.8`bn_mul_add_words 127 libcrypto.so.0.9.8`bn_mul_add_words 133 libcrypto.so.0.9.8`bn_mul_add_words 134 libz.so.1`fill_window 222 libz.so.1`deflate_slow 374 libz.so.1`longest_match 1022 |
The functions executing the most instructions are in the libz library, which performs compression.
User-Mode Level-Two Cache Misses by Process Name and Function Name
This example is included to suggest what to do when encountering this error:
solaris#dtrace -n 'cpc:::PAPI_l2_tcm-user-10000 { @[execname, ufunc(arg1)] = count(); }' dtrace: invalid probe specifier cpc:::PAPI_l2_tcm-user-10000 { @[execname, ufunc(arg1)] = count(); }: probe description cpc:::PAPI_l2_tcm-user-10000 does not match any probes |
This system does have the cpc provider; however, this probe is invalid. After checking for typos, check whether the event name is supported on this system using cpustat(1M) (Solaris):
solaris#cpustat -h Usage: cpustat [-c events] [-p period] [-nstD] [-T d|u] [interval [count]] [...] Generic Events: event[0-3]: PAPI_br_ins PAPI_br_msp PAPI_br_tkn PAPI_fp_ops PAPI_fad_ins PAPI_fml_ins PAPI_fpu_idl PAPI_tot_cyc PAPI_tot_ins PAPI_l1_dca PAPI_l1_dcm PAPI_l1_ldm PAPI_l1_stm PAPI_l1_ica PAPI_l1_icm PAPI_l1_icr PAPI_l2_dch PAPI_l2_dcm PAPI_l2_dcr PAPI_l2_dcw PAPI_l2_ich PAPI_l2_icm PAPI_l2_ldm PAPI_l2_stm PAPI_res_stl PAPI_stl_icy PAPI_hw_int PAPI_tlb_dm PAPI_tlb_im PAPI_l3_dcr PAPI_l3_icr PAPI_l3_tcr PAPI_l3_stm PAPI_l3_ldm PAPI_l3_tcm See generic_events(3CPC) for descriptions of these events Platform Specific Events: event[0-3]: FP_dispatched_fpu_ops FP_cycles_no_fpu_ops_retired [...] |
This output shows that the PAPI_l2_tcm event (level-two cache miss) is not supported on this system. However, it also shows that PAPI_l2_dcm (level-two data cache miss) and PAPI_l2_icm (level-two instruction cache miss) are supported. Adjusting the one-liner for, say, data cache misses only is demonstrated by the following one-liner:
solaris#dtrace -n 'cpc:::PAPI_l2_dcm-user-10000 { @[execname, ufunc(arg1)] = count(); }' dtrace: description 'cpc:::PAPI_l2_dcm-user-10000 ' matched 1 probe ^C dtrace libproc.so.1`byaddr_cmp 1 dtrace libproc.so.1`symtab_getsym 1 dtrace libc.so.1`memset 1 mysqld mysqld`srv_lock_timeout_and_monitor_thread 1 mysqld mysqld`sync_array_print_long_waits 1 dtrace libproc.so.1`byaddr_cmp_common 2 dtrace libc.so.1`qsort 2 dtrace libproc.so.1`optimize_symtab 3 dtrace libproc.so.1`byname_cmp 6 dtrace libc.so.1`strcmp 17 app2 app2`main 399 |
This one-liner can then be run for instruction cache misses so that both types of misses can be considered.
Should the generic PAPI events be unavailable or unsuitable, the platform-specific events (as listed by cpustat(1M)) may allow the event to be examined, albeit in a way that is tied to the current CPU version.