5.3 File System Latency
When analyzing performance, treat the file system as a black box: look at the latency as it impacts the application, and then identify the causes of that latency. For example, if an application is making read() calls at the POSIX layer, your first interest should be how long each read() takes as a percentage of the overall application thread response time. Only when you want to dig deeper should you consider the I/O latency behind the read(), such as disk service times, which ironically is where performance investigation has historically begun. Figure 5.2 shows an example of how you can estimate the file system's performance impact by evaluating the percentage of time spent in the file system (Tfilesys) against the total elapsed time (Ttotal).
Figure 5.2 Estimating File System Performance Impact
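To make the estimate in Figure 5.2 concrete with a hypothetical example (the numbers here are purely illustrative): if a thread completes a unit of work in Ttotal = 5 seconds and spends Tfilesys = 1 second of that inside file system calls, then the file system accounts for

    100 * Tfilesys / Ttotal = 100 * 1 / 5 = 20%

of the thread's response time. That percentage also bounds the improvement available from tuning the file system and disk I/O alone.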
Using truss, you can examine the POSIX-level I/O calls, observing the file descriptor, size, and duration of each logical I/O. In the following example, the -D option prefixes each line with a time delta in seconds, showing the read() and write() calls issued during a dd between two files.
# dd if=filea of=fileb bs=1024k &
# truss -D -p 13092
13092:   0.0326 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0186 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0293 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0259 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0305 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0267 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0242 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0184 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0368 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0333 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0297 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0175 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0315 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0231 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0338 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0181 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0381 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0177 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0323 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0199 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
13092:   0.0364 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576)  = 1048576
13092:   0.0189 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0".., 1048576) = 1048576
...
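To verify the averages quoted below, you can post-process the trace with a short awk script. This is a rough sketch, not part of the original example: it assumes the truss output has been captured to a hypothetical file truss.out (truss reports on stderr, hence the 2> redirection) and that the per-call delta is the second whitespace-delimited field, as in the output above.

# truss -D -p 13092 2> truss.out     (capture for a while, then interrupt)
# awk '/ read\(3,/  { r += $2; nr++ }
       / write\(4,/ { w += $2; nw++ }
       END { printf("avg read: %.1f ms  avg write: %.1f ms\n",
                    r * 1000 / nr, w * 1000 / nw) }' truss.out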
The truss output shows that read() occurs on file descriptor 3 with an average response time of about 30 ms and that write() occurs on file descriptor 4 with an average response time of about 25 ms. This gives some insight into the high-level activity, but it provides no other process statistics from which to formulate a baseline.
By using DTrace, you can gather more information about the proportion of time spent performing I/O relative to the total execution time. The following excerpt from the pfilestat DTrace script shows how the time within each system call is sampled; the excerpt covers reads, and the full script contains analogous probes for writes. By tracing the entry and return of a file-system-related system call, you can observe the total latency as experienced by the application. You could then use probes within the file system to discover where the latency is being incurred.
/* sample reads */
syscall::read:entry,
syscall::pread*:entry
/pid == PID && OPT_read/
{
        runstate = READ;
        @logical["running", (uint64_t)0, ""] = sum(timestamp - last);
        totaltime += timestamp - last;
        last = timestamp;
        self->fd = arg0 + 1;
        self->bytes = arg2;
        totalbytes += arg2;
}

fbt::fop_read:entry,
fbt::fop_write:entry
/self->fd/
{
        self->vp = (vnode_t *)arg0;
        self->path = self->vp->v_path == 0 ? "<none>" :
            cleanpath(self->vp->v_path);
}

syscall::read:return,
syscall::pread*:return
/pid == PID && OPT_read/
{
        runstate = OTHER;
        @logical["read", self->fd - 1, self->path] = sum(timestamp - last);
        @bytes["read", self->fd - 1, self->path] = sum(self->bytes);
        totaltime += timestamp - last;
        last = timestamp;
}
Using an example target process (tar) with pfilestat, you can observe that, of the total elapsed sample time, tar spends 10% of its time in read() calls on /var/crash/rmcferrari/vmcore.0, 14% in write() calls on test.tar, and 75% of its time sleeping while waiting for file system read I/O.
# ./pfilestat 13092

     STATE   FDNUM      Time Filename
   waitcpu       0        4%
   running       0        9%
      read      11       10% /var/crash/rmcferrari/vmcore.0
     write       3       14% /export/home/rmc/book/examples/test.tar
   sleep-r       0       75%

     STATE   FDNUM      KB/s Filename
      read      11     53776 /var/crash/rmcferrari/vmcore.0
     write       3     53781 /export/home/rmc/book/examples/test.tar

Total event time (ms): 1840   Total Mbytes/sec: 89
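The pfilestat output shows where the process spends its time at the system call layer; the next step suggested earlier, discovering where that read latency is incurred, means probing inside the file system. The following stand-alone script is a minimal sketch along those lines, not part of pfilestat: the script name fopread.d is hypothetical, and it simply aggregates the latency of the same fop_read() routine that pfilestat hooks.

#!/usr/sbin/dtrace -s
/* fopread.d -- distribution of fop_read() latency for a given PID,
 * passed as the first argument: ./fopread.d <pid> */

fbt::fop_read:entry
/pid == $1/
{
        /* timestamp the start of this file system read */
        self->ts = timestamp;
}

fbt::fop_read:return
/self->ts/
{
        /* aggregate the elapsed time as a power-of-two histogram */
        @["fop_read latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}

Run against the tar process above, this would print a latency distribution on exit, making it easy to see whether reads are being satisfied from cache (microseconds) or from disk (milliseconds).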