Case Studies
In this section, we apply the scripts and methods discussed in this chapter to observe and measure applications with DTrace.
Firefox idle
This case study examines the Mozilla Firefox Web browser version 3, running on Oracle Solaris.
The Problem
Firefox is 8.9 percent on-CPU yet has not been used for hours. What is costing 8.9 percent CPU?
# prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 27060 brendan   856M  668M sleep   59    0   7:30:44 8.9% firefox-bin/17
 27035 brendan   150M  136M sleep   59    0   0:20:51 0.4% opera/3
 18722 brendan   164M   38M sleep   59    0   0:57:53 0.1% java/18
  1748 brendan  6396K 4936K sleep   59    0   0:03:13 0.1% screen-4.0.2/1
 17303 brendan   305M  247M sleep   59    0  34:16:57 0.1% Xorg/1
 27754 brendan  9564K 3772K sleep   59    0   0:00:00 0.0% sshd/1
 19998 brendan    68M 7008K sleep   59    0   2:41:34 0.0% gnome-netstatus/1
 27871 root     3360K 2792K cpu0    49    0   0:00:00 0.0% prstat/1
 29805 brendan    54M   46M sleep   59    0   1:53:23 0.0% elinks/1
[...]
Profiling User Stacks
The uoncpu.d script (from the "Scripts" section) was run for ten seconds:
# uoncpu.d firefox-bin
dtrace: script 'uoncpu.d' matched 1 probe
^C
[...output truncated...]

  on-cpu (count @1001hz):

              libmozjs.so`js_FlushPropertyCacheForScript+0xe6
              libmozjs.so`js_DestroyScript+0xc1
              libmozjs.so`JS_EvaluateUCScriptForPrincipals+0x87
              libxul.so`__1cLnsJSContextOEvaluateString6MrknSnsAString_internal_pvpnMnsIPrincip8
              libxul.so`__1cOnsGlobalWindowKRunTimeout6MpnJnsTimeout__v_+0x59c
              libxul.so`__1cOnsGlobalWindowNTimerCallback6FpnInsITimer_pv_v_+0x2e
              libxul.so`__1cLnsTimerImplEFire6M_v_+0x144
              libxul.so`__1cMnsTimerEventDRun6M_I_+0x51
              libxul.so`__1cInsThreadQProcessNextEvent6Mipi_I_+0x143
              libxul.so`__1cVNS_ProcessNextEvent_P6FpnJnsIThread_i_i_+0x44
              libxul.so`__1cOnsBaseAppShellDRun6M_I_+0x3a
              libxul.so`__1cMnsAppStartupDRun6M_I_+0x34
              libxul.so`XRE_main+0x35e3
              firefox-bin`main+0x223
              firefox-bin`_start+0x7d
               42
The output was many pages long and includes C++ signatures as function names (these can be passed through c++filt to improve readability). The hottest stack is in libmozjs, which is SpiderMonkey, the Firefox JavaScript engine. However, the count for this hot stack is only 42, which, when the counts from the many truncated pages are tallied, is likely to represent only a fraction of the CPU samples. (uoncpu.d could be enhanced to print a total sample count at the end to make this ratio calculation easy.)
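As a quick sketch of such an enhancement (this assumes uoncpu.d aggregates profile samples keyed on ustack(), as its output suggests; see the "Scripts" section for the actual script, and note that the aggregation names here are illustrative), a second, keyless aggregation can count every sample, and an END clause can print it:

profile-1001
/execname == $$1/
{
        @stacks[ustack()] = count();     /* existing per-stack aggregation */
        @total = count();                /* new: count every sample */
}

END
{
        printa("total samples: %@d\n", @total);
}

With the total printed, each stack count can be read directly as a fraction of all samples.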
Profiling User Modules
Perhaps an easier way to find the origin of the CPU usage is to aggregate not on the entire user stack trace but just on the top-level user module. This won't be as accurate (a user module may be consuming CPU by calling functions from a generic library such as libc), but it is worth a try:
# dtrace -n 'profile-1001 /execname == "firefox-bin"/ { @[umod(arg1)] = count(); } tick-60sec { exit(0); }'
dtrace: description 'profile-1001 ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1  63284                      :tick-60sec

  libsqlite3.so                                                     1
  0xf0800000                                                        2
  libplds4.so                                                       2
  libORBit-2.so.0.0.0                                               5
  0xf1600000                                                        8
  libgthread-2.0.so.0.1400.4                                       10
  libgdk-x11-2.0.so.0.1200.3                                       14
  libplc4.so                                                       16
  libm.so.2                                                        19
  libX11.so.4                                                      50
  libnspr4.so                                                     314
  libglib-2.0.so.0.1400.4                                         527
  0x0                                                             533
  libflashplayer.so                                              1143
  libc.so.1                                                      1444
  libmozjs.so                                                    2671
  libxul.so                                                      4143
The hottest module was libxul, which is the core Firefox library. The next was libmozjs (JavaScript) and then libc (generic system library). It is possible that libmozjs is responsible for the CPU time in both libc and libxul, by calling functions from them. We'll investigate libmozjs (JavaScript) first; if this turns out to be a dead end, we'll return to libxul.
Function Counts and Stacks
To investigate JavaScript, the DTrace JavaScript provider can be used (see Chapter 8). For the purposes of this case study, let's assume that such a convenient provider is not available. To understand what the libmozjs library is doing, we'll first frequency count its function calls:
# dtrace -n 'pid$target:libmozjs::entry { @[probefunc] = count(); }' -p `pgrep firefox-bin`
dtrace: description 'pid$target:libmozjs::entry ' matched 1617 probes
^C

  CloseNativeIterators                                              1
  DestroyGCArenas                                                   1
  JS_CompareValues                                                  1
  JS_DefineElement                                                  1
  JS_FloorLog2                                                      1
  JS_GC                                                             1
[...]
  JS_free                                                       90312
  js_IsAboutToBeFinalized                                       92414
  js_GetToken                                                   99666
  JS_DHashTableOperate                                         102908
  GetChar                                                      109323
  fun_trace                                                    132924
  JS_GetPrivate                                                197322
  js_TraceObject                                               213983
  JS_TraceChildren                                             228323
  js_SearchScope                                               267826
  js_TraceScopeProperty                                        505450
  JS_CallTracer                                               1923784
The most frequently called function was JS_CallTracer(), which was called almost two million times during the ten seconds that this one-liner was tracing. To see what it does, the source code could be examined; but before doing that, we can get more information from DTrace, including a frequency count of the user stack traces to see who is calling this function:
# dtrace -n 'pid$target:libmozjs:JS_CallTracer:entry { @[ustack()] = count(); }' -p `pgrep firefox-bin`
[...]
              libmozjs.so`JS_CallTracer
              libmozjs.so`js_TraceScopeProperty+0x54
              libmozjs.so`js_TraceObject+0xd5
              libmozjs.so`JS_TraceChildren+0x351
              libxul.so`__1cLnsXPConnectITraverse6MpvrnbInsCycleCollectionTraversalCallback__I_+0xc7
              libxul.so`__1cQnsCycleCollectorJMarkRoots6MrnOGCGraphBuilder__v_+0x96
              libxul.so`__1cQnsCycleCollectorPBeginCollection6M_i_+0xf1
              libxul.so`__1cbGnsCycleCollector_beginCollection6F_i_+0x26
              libxul.so`__1cZXPCCycleCollectGCCallback6FpnJJSContext_nKJSGCStatus__i_+0xd8
              libmozjs.so`js_GC+0x5ef
              libmozjs.so`JS_GC+0x4e
              libxul.so`__1cLnsXPConnectHCollect6M_i_+0xaf
              libxul.so`__1cQnsCycleCollectorHCollect6MI_I_+0xee
              libxul.so`__1cYnsCycleCollector_collect6F_I_+0x28
              libxul.so`__1cLnsJSContextGNotify6MpnInsITimer__I_+0x375
              libxul.so`__1cLnsTimerImplEFire6M_v_+0x12d
              libxul.so`__1cMnsTimerEventDRun6M_I_+0x51
              libxul.so`__1cInsThreadQProcessNextEvent6Mipi_I_+0x143
              libxul.so`__1cVNS_ProcessNextEvent_P6FpnJnsIThread_i_i_+0x44
              libxul.so`__1cOnsBaseAppShellDRun6M_I_+0x3a
            40190
The stack trace here has been truncated (increase the ustackframes tunable to see all of it); however, enough is visible in this and the other truncated stack traces to see that they originate from JS_GC(). A quick look at the source code confirms that this is JavaScript garbage collection.
Function CPU Time
Given the name of the garbage collect function, a script can be quickly written to check the CPU time spent in it (named jsgc.d):
#!/usr/sbin/dtrace -s

#pragma D option quiet

pid$target::JS_GC:entry
{
        self->vstart = vtimestamp;
}

pid$target::JS_GC:return
/self->vstart/
{
        this->oncpu = (vtimestamp - self->vstart) / 1000000;
        printf("%Y GC: %d CPU ms\n", walltimestamp, this->oncpu);
        self->vstart = 0;
}

Script jsgc.d
This measures the on-CPU time (vtimestamp) spent in JS_GC() and prints it in milliseconds. (Another approach would be to use the profile provider and count stack traces that include JS_GC(); a sketch of this follows the output below.)
Here we execute jsgc.d:
# jsgc.d -p `pgrep firefox-bin`
2010 Jul  4 01:06:57 GC: 331 CPU ms
2010 Jul  4 01:07:38 GC: 316 CPU ms
2010 Jul  4 01:08:18 GC: 315 CPU ms
^C
So, although GC is on-CPU for a significant time, more than 300 ms per call, it's not happening frequently enough to explain the 9 percent CPU average of Firefox. This may be a problem, but it's not the problem. (This is included here for completeness; this is the exact approach used to study this issue.)
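For completeness, here is a minimal sketch of the profile-based alternative mentioned earlier: a thread-local flag is set while JS_GC() is executing, and profile samples are counted against it, giving a rough ratio of CPU samples spent in garbage collection (the probe names follow those already used above; the output labels are illustrative):

pid$target::JS_GC:entry  { self->ingc = 1; }
pid$target::JS_GC:return { self->ingc = 0; }

profile-1001
/pid == $target/
{
        @[self->ingc ? "sampled in JS_GC" : "sampled elsewhere"] = count();
}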
Another frequently called function was js_SearchScope(); its stack trace is also worth checking:
# dtrace -n 'pid$target:libmozjs:js_SearchScope:entry { @[ustack()] = count(); }' -p `pgrep firefox-bin`
dtrace: description 'pid$target:libmozjs:js_SearchScope:entry ' matched 1 probe
^C
[...output truncated...]
              libmozjs.so`js_SearchScope
              libmozjs.so`js_DefineNativeProperty+0x2f1
              libmozjs.so`call_resolve+0x1e7
              libmozjs.so`js_LookupProperty+0x3d3
              libmozjs.so`js_PutCallObject+0x164
              libmozjs.so`js_Interpret+0x9cd4
              libmozjs.so`js_Execute+0x3b4
              libmozjs.so`JS_EvaluateUCScriptForPrincipals+0x58
              libxul.so`__1cLnsJSContextOEvaluateString6MrknSnsAString_internal_pvpnMnsIPrincipal_pkcIIp1pi_I_+0x2e8
              libxul.so`__1cOnsGlobalWindowKRunTimeout6MpnJnsTimeout__v_+0x59c
              libxul.so`__1cOnsGlobalWindowNTimerCallback6FpnInsITimer_pv_v_+0x2e
              libxul.so`__1cLnsTimerImplEFire6M_v_+0x144
              libxul.so`__1cMnsTimerEventDRun6M_I_+0x51
              libxul.so`__1cInsThreadQProcessNextEvent6Mipi_I_+0x143
              libxul.so`__1cVNS_ProcessNextEvent_P6FpnJnsIThread_i_i_+0x44
              libxul.so`__1cOnsBaseAppShellDRun6M_I_+0x3a
              libxul.so`__1cMnsAppStartupDRun6M_I_+0x34
              libxul.so`XRE_main+0x35e3
              firefox-bin`main+0x223
              firefox-bin`_start+0x7d
             9287
This time, the function is being called by js_Execute(), the entry point for JavaScript code execution (itself called by JS_EvaluateUCScriptForPrincipals()). Here the earlier script is modified to measure the on-CPU time in js_Execute() (jsexecute.d):
#!/usr/sbin/dtrace -s

pid$target::js_Execute:entry
{
        self->vstart = vtimestamp;
}

pid$target::js_Execute:return
/self->vstart/
{
        this->oncpu = vtimestamp - self->vstart;
        @["js_Execute Total(ns):"] = sum(this->oncpu);
        self->vstart = 0;
}

Script jsexecute.d
Here we run it for ten seconds:
# jsexecute.d -p `pgrep firefox-bin` -n 'tick-10sec { exit(0); }'
dtrace: script 'jsexecute.d' matched 2 probes
dtrace: description 'tick-10sec ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  64907                      :tick-10sec

  js_Execute Total(ns):                                     427936779
This shows 428 ms of time in js_Execute() during those ten seconds, and so this CPU cost can explain about half of the Firefox CPU time (this is a single-CPU system; therefore, there is 10,000 ms of available CPU time every 10 seconds, so this is about 4.3 percent of CPU).
The JavaScript functions could be further examined with DTrace to find out why this JavaScript program is hot on-CPU, in other words, what exactly it is doing (the DTrace JavaScript provider would help here, or a Firefox add-on could be tried).
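For example, on a Firefox build with the JavaScript provider enabled, a one-liner along the following lines could frequency count JavaScript function calls by filename and function name (a sketch only; the javascript provider and its function-entry arguments are described in Chapter 8, and -Z allows the command to start even if the probes are not yet present):

# dtrace -Zn 'javascript*:::function-entry { @[copyinstr(arg0), copyinstr(arg2)] = count(); }' -p `pgrep firefox-bin`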
Fetching Context
Here we will find what is being executed, preferably identifying the URL. Examining the earlier stack trace along with the Firefox source (which is publicly available) showed that the JavaScript filename is the sixth argument (arg5) to the JS_EvaluateUCScriptForPrincipals() function. Here we pull this in and frequency count it:
# dtrace -n 'pid$target::*EvaluateUCScriptForPrincipals*:entry { @[copyinstr(arg5)] = count(); } tick-10sec { exit(0); }' -p `pgrep firefox-bin`
dtrace: description 'pid$target::*EvaluateUCScriptForPrincipals*:entry ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1  64907                      :tick-10sec

  http://www.example.com/js/st188.js                             7056
The URL has been modified in this output (to avoid embarrassing anyone); it pointed to a site that I didn't think I was using, yet its script was being executed more than 700 times per second, consuming (wasting!) at least 4 percent of the CPU on this system.
The Fix
An add-on was already available that could help at this point: SaveMemory, which allows browser tabs to be paused. The DTrace one-liner was modified to print continual one-second summaries, while all tabs were paused as an experiment:
# dtrace -n 'pid$target::*EvaluateUCScriptForPrincipals*:entry { @[copyinstr(arg5)] = count(); } tick-1sec { printa(@); trunc(@); }' -p `pgrep firefox-bin`
[...]
  1  63140                       :tick-1sec
  http://www.example.com/js/st188.js                              697

  1  63140                       :tick-1sec
  http://www.example.com/js/st188.js                              703

  1  63140                       :tick-1sec
  file:///export/home/brendan/.mozilla/firefox/3c8k4kh0.default/extensions/%7Be4a8a97b-f2ed-450b-b12d-ee082ba24781%7D/components/greasemonkey.js        1
  http://www.example.com/js/st188.js                              126

  1  63140                       :tick-1sec

  1  63140                       :tick-1sec
The execution count for the JavaScript program begins at around 700 executions per second and then vanishes when pausing all tabs. (The output has also caught the execution of greasemonkey.js, executed as the add-on was used.)
prstat(1M) shows the CPU problem is no longer there (shown after waiting a few minutes for the %CPU decayed average to settle):
# prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 27035 brendan   150M  136M sleep   49    0   0:27:15 0.2% opera/4
 27060 brendan   407M  304M sleep   59    0   7:35:12 0.1% firefox-bin/17
 28424 root     3392K 2824K cpu1    49    0   0:00:00 0.0% prstat/1
[...]
Next, the browser tabs were unpaused one by one to identify the culprit, while still running the DTrace one-liner to track JavaScript execution by file. This showed that there were seven tabs open on the same Web site that was running the JavaScript program—each of them executing it about 100 times per second. The Web site is a popular blogging platform, and the JavaScript was being executed by what appears to be an inert icon that links to a different Web site (but as we found out—it is not inert).7 The exact operation of that JavaScript program can now be investigated using the DTrace JavaScript provider or a Firefox add-on debugger.
Conclusion
A large component of this issue turned out to be a rogue JavaScript program, an issue that could also have been identified with Firefox add-ons. The advantage of using DTrace is that if there is an issue, the root cause can be identified no matter where it lives in the software stack. As an example of this,8 about a year ago a performance issue in Firefox was investigated in the same way and found to be a bug in a kernel frame buffer (video) driver; this would have been extremely difficult to identify from the application layer alone.
Xvnc
Xvnc is a Virtual Network Computing (VNC) server that allows remote access to X server–based desktops. This case study examines an Xvnc process that is CPU bound, and it demonstrates use of the syscall and profile providers.
When performing a routine check of running processes on a Solaris system using prstat(1M), it was discovered that an Xvnc process was the top CPU consumer. Looking just at that process yields the following:
solaris# prstat -c -Lmp 5459
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  5459 nobody    86  14 0.0 0.0 0.0 0.0 0.0 0.0   0  36 .2M 166 Xvnc/1
We can see the Xvnc process is spending most of its time executing in user mode (USR, 86 percent) and some of its time in the kernel (SYS, 14 percent). Also worth noting is that it is executing about 200,000 system calls per second (an SCL value of .2M).
syscall Provider
Let's start by checking what those system calls are. This one-liner uses the syscall provider to frequency count system calls for this process and prints a summary every second:
solaris# dtrace -qn 'syscall:::entry /pid == 5459/ { @[probefunc] = count(); } tick-1sec { printa(@); trunc(@); }'

  read                                                              4
  lwp_sigmask                                                      34
  setcontext                                                       34
  setitimer                                                        68
  accept                                                        48439
  gtime                                                         48439
  pollsys                                                       48440
  write                                                         97382

  read                                                              4
  lwp_sigmask                                                      33
  setcontext                                                       33
  setitimer                                                        66
  gtime                                                         48307
  pollsys                                                       48307
  accept                                                        48308
  write                                                         97117
Because the rate of system calls was relatively high, as reported by prstat(1M), we opted to display per-second rates with DTrace. The output shows more than 97,000 write() system calls per second and just over 48,000 each of accept(), pollsys(), and gtime().
Let's take a look at the target of all the writes and the requested number of bytes to write:
solaris# dtrace -qn 'syscall::write:entry /pid == 5459/ { @[fds[arg0].fi_pathname, arg2] = count(); }'
^C
  /var/adm/X2msgs                                       26           8
  /devices/pseudo/mm@0:null                           8192        3752
  /var/adm/X2msgs                                       82      361594
  /var/adm/X2msgs                                       35      361595
The vast majority of the writes are to a single file, /var/adm/X2msgs, and most are either 82 bytes or 35 bytes in size (more than 361,000 of each). Checking that file yields the following:
solaris# ls -l /var/adm/X2msgs
-rw-r--r--   1 root     nobody   2147483647 Aug 13 15:05 /var/adm/X2msgs
solaris# tail /var/adm/X2msgs
connection: Invalid argument (22)
XserverDesktop: XserverDesktop::wakeupHandler: unable to accept new connection: Invalid argument (22)
XserverDesktop: XserverDesktop::wakeupHandler: unable to accept new connection: Invalid argument (22)
XserverDesktop: XserverDesktop::wakeupHandler: unable to accept new connection: Invalid argument (22)
XserverDesktop: XserverDesktop::wakeupHandler: unable to accept new connection: Invalid argument (22)
Looking at the file Xvnc is writing to, we can see that it has grown very large (more than 2GB) and that its contents appear to be error messages. We will explore that more closely in just a minute.
Given the rate of 97,000 writes per second, we can already extrapolate that each write is taking much less than 1 ms (1/97,000 ≈ 10 microseconds), so the data is probably being written to main memory: the file resides on a file system, and since the writes are not synchronous they are being satisfied by the in-memory file system cache. We can, of course, time these writes with DTrace:
solaris# dtrace -qn 'syscall::write:entry /pid == 5459/ { @[fds[arg0].fi_fs] = count(); }'
^C
  specfs                                                         2766
  zfs                                                          533090

solaris# cat -n w.d
     1  #!/usr/sbin/dtrace -qs
     2
     3  syscall::write:entry
     4  /pid == 5459 && fds[arg0].fi_fs == "zfs"/
     5  {
     6          self->st = timestamp;
     7  }
     8  syscall::write:return
     9  /self->st/
    10  {
    11          @ = quantize(timestamp - self->st);
    12          self->st = 0;
    13  }
solaris# ./w.d
^C

           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1477349
            1024 |                                         2312
            2048 |                                         3100
            4096 |                                         250
            8192 |                                         233
           16384 |                                         145
           32768 |                                         90
           65536 |                                         0
Before measuring the write time, we wanted to be sure we knew the target file system type of the file being written, which was ZFS. We used that in the predicate in the w.d script to measure write system calls for this process (along with the process PID test). The output of w.d is a quantize aggregation that displays wall clock time for all the write calls executed to a ZFS file system from that process during the sampling period. We see that most of the writes fall in the 512-nanosecond to 1024-nanosecond range, so these are most certainly writes to memory.
We can determine the user code path leading up to the writes by aggregating on the user stack when the write system call is called:
solaris# dtrace -qn 'syscall::write:entry /pid == 5459 && fds[arg0].fi_fs == "zfs"/ { @[ustack()] = count(); }'
^C
[...]
              libc.so.1`_write+0x7
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`fprintf+0x99
              Xvnc`_ZN3rfb11Logger_File5writeEiPKcS2_+0x1a5
              Xvnc`_ZN3rfb6Logger5writeEiPKcS2_Pc+0x36
              Xvnc`_ZN3rfb9LogWriter5errorEPKcz+0x2d
              Xvnc`_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x28b
              Xvnc`vncWakeupHandler+0x3d
              Xvnc`WakeupHandler+0x36
              Xvnc`WaitForSomething+0x28d
              Xvnc`Dispatch+0x76
              Xvnc`main+0x3e5
              Xvnc`_start+0x80
           430879

              libc.so.1`_write+0x7
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`fprintf+0x99
              Xvnc`_ZN3rfb11Logger_File5writeEiPKcS2_+0x1eb
              Xvnc`_ZN3rfb6Logger5writeEiPKcS2_Pc+0x36
              Xvnc`_ZN3rfb9LogWriter5errorEPKcz+0x2d
              Xvnc`_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x28b
              Xvnc`vncWakeupHandler+0x3d
              Xvnc`WakeupHandler+0x36
              Xvnc`WaitForSomething+0x28d
              Xvnc`Dispatch+0x76
              Xvnc`main+0x3e5
              Xvnc`_start+0x80
           430879
We see two very similar stack traces, indicating that a log event is causing the Xvnc process to write to its log file.
We can even use DTrace to observe what is being written to the file, by examining the contents of the buffer pointer from the write(2) system call. The buffer pointer (arg1) is passed to the copyinstr() function, which both copies the data from user-land into the kernel address space and treats it as a string:
solaris# dtrace -n 'syscall::write:entry /pid == 5459/ { @[copyinstr(arg1)] = count(); }'
dtrace: description 'syscall::write:entry ' matched 1 probe
^C

  Sun Aug 22 00:09:05 2010
   ent (22)
  keupHandler: unable to accept new
  st! Ltd.
  See http://www.realvnc.com for information on VNC.
                1
  Sun Aug 22 00:09:06 2010
   ent (22)
  keupHandler: unable to accept new
  st!
                2
[...]
  upHandler: unable to accept new connection: Invalid argument (22)XserverDesktop::wakeupHandler: unable to accept new connection: Invalid argument (22)XserverDesktop::wakeupHandler: unable to accept new connection: Invalid argument (22)XserverDesktop::wake
               59
  valid argument (22)XserverDesktop::wakeupHandler: unable to accept new connection: Invalid argument (22)XserverDesktop::wakeupHandler: unable to accept new connection: Invalid argument (22)XserverDesktop::wakeupHandler: unable to accept new connection: In
               59
This shows the text being written to the log file, which largely consists of errors describing invalid arguments for new connections. Recall that our initial one-liner discovered more than 48,000 accept() system calls per second; it would appear that these are failing because of invalid arguments, and each failure is being written as an error message to the /var/adm/X2msgs log.
DTrace can confirm that the accept() system calls are failing in this way, by examining the error number (errno) on syscall return:
solaris# dtrace -n 'syscall::accept:return /pid == 5459/ { @[errno] = count(); }'
dtrace: description 'syscall::accept:return ' matched 1 probe
^C
       22           566135

solaris# grep 22 /usr/include/sys/errno.h
#define EINVAL  22      /* Invalid argument */
All the accept() system calls are returning with errno 22, EINVAL (Invalid argument). The reason for this can be investigated by examining the arguments to the accept() system call:
solaris# dtrace -n 'syscall::accept:entry /execname == "Xvnc"/ { @[arg0, arg1, arg2] = count(); }'
dtrace: description 'syscall::accept:entry ' matched 1 probe
^C
        3                0                0           150059
We see the first argument to accept() is 3, which is the file descriptor for the socket. The second and third arguments are both NULL, which may be the cause of the EINVAL error return from accept(). It is possible that it is valid to call accept() with the second and third arguments as NULL values,9 in which case the Xvnc code is not handling the error return properly. In either case, the next step would be to look at the Xvnc source code and find the problem: the code is burning a lot of CPU with calls to accept(2) that return an error and generate a log file write each time.
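To help locate the failing call site before reading the source, the user stack can be aggregated only on those accept() returns that fail with errno 22 (EINVAL). This is a sketch; the resulting stacks would be expected to match the TcpListener::accept() path seen later in the profile data:

solaris# dtrace -n 'syscall::accept:return /pid == 5459 && errno == 22/ { @[ustack()] = count(); }'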
While still using the syscall provider, the user code path for another of the hot system calls can be examined:
solaris# dtrace -n 'syscall::gtime:entry /pid == 5459/ { @[ustack()] = count(); }'
dtrace: description 'syscall::gtime:entry ' matched 1 probe
^C
              libc.so.1`__time+0x7
              Xvnc`_ZN3rfb11Logger_File5writeEiPKcS2_+0xce
              Xvnc`_ZN3rfb6Logger5writeEiPKcS2_Pc+0x36
              Xvnc`_ZN3rfb9LogWriter5errorEPKcz+0x2d
              Xvnc`_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x28b
              Xvnc`vncWakeupHandler+0x3d
              Xvnc`WakeupHandler+0x36
              Xvnc`WaitForSomething+0x28d
              Xvnc`Dispatch+0x76
              Xvnc`main+0x3e5
              Xvnc`_start+0x80
           370156
This shows that calls to gtime(2) are part of the log file writes in the application, based on the user function names we see in the stack frames.
profile Provider
To further understand the performance of this process, we can sample the on-CPU code at a fixed rate (997 hertz here) using the profile provider:
solaris# dtrace -n 'profile-997hz /arg1 && pid == 5459/ { @[ufunc(arg1)] = count(); }'
dtrace: description 'profile-997hz ' matched 1 probe
^C
[...]
  libc.so.1`memcpy                                                905
  Xvnc`_ZN14XserverDesktop12blockHandlerEP6fd_set                 957
  libgcc_s.so.1`uw_update_context_1                              1155
  Xvnc`_ZN3rdr15SystemExceptionC2EPKci                           1205
  libgcc_s.so.1`execute_cfa_program                              1278
  libc.so.1`strncat                                              1418
  libc.so.1`pselect                                              1686
  libstdc++.so.6.0.3`_Z12read_uleb128PKhPj                       1700
  libstdc++.so.6.0.3`_Z28read_encoded_value_with_basehjPKhPj     2198
  libstdc++.so.6.0.3`__gxx_personality_v0                        2445
  libc.so.1`_ndoprnt                                             3918
This one-liner shows which user functions were on-CPU most frequently. It tests that a user-mode program counter is present (arg1) and that this is the process of interest, and it uses ufunc() to convert the user-mode program counter (arg1) into a function name. The most frequent is a libc function, _ndoprnt(), followed by several functions from the standard C++ library.
For a detailed look at the user-land code path that is responsible for consuming CPU cycles, aggregate on the user stack:
solaris# dtrace -n 'profile-997hz /arg1 && pid == 5459/ { @[ustack()] = count(); } tick-10sec { trunc(@, 20); exit(0); }'
^C
[...]
              libstdc++.so.6.0.3`__gxx_personality_v0+0x29f
              libgcc_s.so.1`_Unwind_RaiseException+0x88
              libstdc++.so.6.0.3`__cxa_throw+0x64
              Xvnc`_ZN7network11TcpListener6acceptEv+0xb3
              Xvnc`_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x13d
              Xvnc`vncWakeupHandler+0x3d
              Xvnc`WakeupHandler+0x36
              Xvnc`WaitForSomething+0x28d
              Xvnc`Dispatch+0x76
              Xvnc`main+0x3e5
              Xvnc`_start+0x80
              125

              libc.so.1`memset+0x10c
              libgcc_s.so.1`_Unwind_RaiseException+0xb7
              libstdc++.so.6.0.3`__cxa_throw+0x64
              Xvnc`_ZN7network11TcpListener6acceptEv+0xb3
              Xvnc`_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x13d
              Xvnc`vncWakeupHandler+0x3d
              Xvnc`WakeupHandler+0x36
              Xvnc`WaitForSomething+0x28d
              Xvnc`Dispatch+0x76
              Xvnc`main+0x3e5
              Xvnc`_start+0x80
              213
Note that only the two most frequent stacks are shown here. We see the event loop in the Xvnc code, and by visually decoding the mangled function names we can see a network TcpListener accept function. This makes sense for an application like Xvnc, which listens on a network socket for incoming requests and data. And we already know that there is an issue with the accept(2) calls, which are inducing a lot of looping on error returns.
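The _Unwind_RaiseException() and __cxa_throw() frames also suggest that each failed accept throws a C++ exception, which is itself an expensive code path. As a sketch (assuming __cxa_throw in libstdc++ is visible to the pid provider), the throw rate and its origins could be confirmed with:

solaris# dtrace -n 'pid5459::__cxa_throw:entry { @[ustack(5)] = count(); }'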
We can also take a look at the kernel component of the CPU cycles consumed by this process, again using the profile provider and aggregating on kernel stacks:
solaris# dtrace -n 'profile-997hz /pid == 5459 && arg0/ { @[stack()] = count(); }'
^C
[...]
              unix`mutex_enter+0x10
              genunix`pcache_poll+0x1a5
              genunix`poll_common+0x27f
              genunix`pollsys+0xbe
              unix`sys_syscall32+0x101
               31

              unix`tsc_read+0x3
              genunix`gethrtime+0xa
              unix`pc_gethrestime+0x31
              genunix`gethrestime+0xa
              unix`gethrestime_sec+0x11
              genunix`gtime+0x9
              unix`sys_syscall32+0x101
               41

              unix`tsc_read+0x3
              genunix`gethrtime_unscaled+0xa
              genunix`syscall_mstate+0x4f
              unix`sys_syscall32+0x11d
              111

              unix`lock_try+0x8
              genunix`post_syscall+0x3b6
              genunix`syscall_exit+0x59
              unix`sys_syscall32+0x1a0
              229
The kernel stacks are consistent with previously observed data: we see system call processing (remember, this process is performing about 200,000 system calls per second), the gtime system call stack, and the pollsys system call stack. We could measure kernel functions to get more detail, but the process spent only 14 percent of its time in the kernel, and given the rate and type of system calls being executed, doing so would add little to our understanding of this process's CPU consumption.
For a more connected view, we can trace code flow from user mode through the kernel by aggregating on both stacks:
solaris# dtrace -n 'profile-997hz /pid == 5459/ { @[stack(), ustack()] = count(); } tick-10sec { trunc(@, 2); exit(0); }'
dtrace: description 'profile-997hz ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1 122538                      :tick-10sec

              unix`lock_try+0x8
              genunix`post_syscall+0x3b6
              genunix`syscall_exit+0x59
              unix`sys_syscall32+0x1a0

              libc.so.1`_write+0x7
              libc.so.1`_ndoprnt+0x2816
              libc.so.1`fprintf+0x99
              Xvnc`_ZN3rfb11Logger_File5writeEiPKcS2_+0x1eb
              Xvnc`_ZN3rfb6Logger5writeEiPKcS2_Pc+0x36
              Xvnc`_ZN3rfb9LogWriter5errorEPKcz+0x2d
              Xvnc`_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x28b
              Xvnc`vncWakeupHandler+0x3d
              Xvnc`WakeupHandler+0x36
              Xvnc`WaitForSomething+0x28d
              Xvnc`Dispatch+0x76
              Xvnc`main+0x3e5
              Xvnc`_start+0x80
              211

              unix`lock_try+0x8
              genunix`post_syscall+0x3b6
              genunix`syscall_exit+0x59
              unix`sys_syscall32+0x1a0

              libc.so.1`_so_accept+0x7
              Xvnc`_ZN7network11TcpListener6acceptEv+0x18
              Xvnc`_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x13d
              Xvnc`vncWakeupHandler+0x3d
              Xvnc`WakeupHandler+0x36
              Xvnc`WaitForSomething+0x28d
              Xvnc`Dispatch+0x76
              Xvnc`main+0x3e5
              Xvnc`_start+0x80
              493
Here we see the event loop calling into the accept(3S) interface in libc and entering the system call entry point in the kernel. The second set of stack frames shows the log write path. One of the stacks has also caught _ndoprnt, which we know from earlier to be the hottest on-CPU function, calling write() as part of Xvnc logging.
Conclusions
The initial analysis with standard operating system tools showed that the single-threaded Xvnc process was CPU bound, spending most of its CPU cycles in user mode and performing more than 200,000 system calls per second. DTrace was used to discover that the application was continually encountering new-connection failures caused by invalid arguments to accept(2), and was writing an error message to a log file thousands of times per second.