STEP 2. Monitoring Disks
If you've made it to Step 2, then you know by now exactly what is going on with memory on your system. The next step is to find out whether you have any disk bottlenecks or disks that may soon become bottlenecks.
Use iostat, statit, or sar to check for a disk bottleneck. The statit utility is available on the book website.
Try iostat -xn 5 (the -n option, which displays disk names in the cntndn format, is only available from Solaris 2.6 on). If you have a lot of disks, you may be so overwhelmed by the output that you find it hard to make sense of all that data. You can use grep to remove idle disks from the display (after asking yourself why you have idle disks!):
iostat -xn 5 | grep -v "0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0"
Don't try to key this command in by hand; you need exactly the right delimiters between all the 0s. I simply extract the grep string shown above from an iostat trace and save the whole command as iostat2 for future use. Solaris 8 adds a new -z option to achieve the same thing (you can simply use iostat -xnz 5).
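A minimal sketch of such a wrapper, assuming you save it as an executable Bourne shell script named iostat2 (the optional interval argument defaulting to 5 seconds is my own embellishment):

#!/bin/sh
# iostat2: extended disk statistics with idle disks filtered out.
# The grep string must be pasted verbatim from a real iostat trace;
# the spacing between the zeros has to match exactly.
iostat -xn ${1:-5} | grep -v "0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0"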
If you want to save a disk activity report for later reference, you will find the sar binary file format useful since each data point has an associated timestamp.
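For example, the following pair of commands (the interval, count, and file name are illustrative choices of mine) collects disk activity every 60 seconds for an hour into a binary file and then replays it later:

pae280% sar -d -o /var/tmp/sar.disk 60 60
pae280% sar -d -f /var/tmp/sar.disk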
What to Look For
The statit extract in Figure 3 shows the behavior of three disks: the first is fully utilized, the second is almost idle, and the third is appropriately utilized. The key information is util% (the percentage utilization of the disk) and srv-ms (labeled as service time in milliseconds). Note that srv-ms is mislabeled: rather than service time (the time taken to complete the I/O at the disk), it actually reports response time, the time taken to complete the I/O from the moment it leaves the disk device driver on the host, including queuing effects at the controller and the disk. iostat reports the same values (util% appears as %b, and srv-ms as svc_t).
Figure 3 statit output for three disks
For OLTP workloads, if utilization is consistently greater than about 60% or response time is consistently greater than about 35 msecs, the load on this disk is likely to negatively affect application performance.
For DSS workloads, utilization may exceed 60% and response times may exceed 35 msecs; a single 1-Mbyte transfer from a 36-Gbyte disk could take 35 msecs. The wtqlen field in Figure 3 (wait in iostat) reports how many other I/O requests are queued and, therefore, how much of the response time is due to queuing time. The svqlen field (actv in iostat) shows the number of requests taken off the queue and actively being processed. With queue lengths consistently greater than 1.0 and response times consistently larger than 35 msecs, disk load is likely to negatively affect application performance.
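If you would rather see only the disks that cross these thresholds, a filter along the following lines can be bolted onto the iostat pipeline. This is a rough sketch: the field numbers assume the same column layout as the grep string shown earlier (svc_t in the seventh field, %b in the ninth); later releases split svc_t into wsvc_t and asvc_t, so check the header line of your own output and adjust the field numbers to match. The +0 simply forces a numeric comparison so that the header lines are suppressed.

iostat -xn 5 | awk '$7+0 > 35 || $9+0 > 60'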
For both workloads, the key issue is to check how busy the other disks are. You want to avoid the situation where some disks are busy and others are idle. In that respect, the disk utilization and service times shown in Figure 3 reveal a disk layout that is sadly lacking. Disk layout recommendations are discussed in Chapter 17 of Configuring and Tuning Databases on the Solaris Platform.
An extract from an iostat trace (iostat -xn 5) is shown in Figure 4 for reference.
Figure 4 iostat trace
The disk utilization shown in this trace is once again unbalanced, suggesting that improvements in the disk layout are needed.
Some utilities, sar for example, report disk names as sdn or ssdn rather than cntndn. Having identified a hot disk, you may then find it difficult to locate the disk in question. Thanks to the /etc/path_to_inst file, though, it is possible to convert the name to a more recognizable form. The procedure is illustrated below.
Suppose a sar trace on host pae280 identifies a hot disk with the name ssd4. First we need to find out more about ssd4.
pae280% grep " 4 " /etc/path_to_inst | grep ssd
"/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037e3d688,0" 4 "ssd"
The complicated string returned from the /etc/path_to_inst file (the first string surrounded by double quotes) corresponds to the details for the disk in the /devices tree. Entries in the /dev/rdsk directory (and also in /dev/dsk) are actually symlinks to the /devices tree, so we can search for the entry above in the /dev/rdsk directory:
pae280% ls -l /dev/rdsk/*s2 | grep \
"/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037e3d688,0"
lrwxrwxrwx   1 root   root   74 Jun 27 18:43 /dev/rdsk/c2t1d0s2 -> ../../devices/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100002037e3d688,0:c,raw
This final step shows that the disk corresponding to ssd4 is /dev/rdsk/c2t1d0s2.
You can use the same procedure for other disks by substituting the appropriate instance number and driver name for 4 and ssd in the first step. The string returned can then be substituted for the string beginning with /pci in the second step.
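The whole procedure can also be captured in a small script. The following sketch is a helper of my own (not a standard Solaris utility); it takes a driver name and instance number and prints the matching /dev/rdsk entry:

#!/bin/sh
# diskname: map a driver instance (e.g., "diskname ssd 4") to its
# /dev/rdsk entry via /etc/path_to_inst and the /dev/rdsk symlinks.
driver=$1
inst=$2
# /etc/path_to_inst lines look like: "physical-path" instance "driver"
devpath=`grep " $inst \"$driver\"" /etc/path_to_inst | awk '{print $1}' | tr -d '"'`
ls -l /dev/rdsk/*s2 | grep "$devpath"

For the example above, diskname ssd 4 should print the symlink for /dev/rdsk/c2t1d0s2 directly.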
Drilling Down Further
Note that recent versions of iostat have the -p option, which shows per-partition disk statistics. This option can be helpful in tracking down exactly which database device is responsible for a performance problem.
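For example, on releases that support the option, a command along the lines of iostat -xnp 5 should report extended statistics for each partition in addition to the disk as a whole (the exact option combinations accepted vary between Solaris releases).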
For systems using Veritas Volume Manager (Veritas), per-partition statistics are less useful because Veritas places all of its volumes in partition 4. However, Veritas provides the vxstat program to monitor I/O activity per volume. This program is invaluable for drilling down to find the volumes associated with heavy I/O, which is especially important when multiple volumes reside on the same disk.
The vxstat utility can be run as follows:
vxstat -g group -i interval -c iterations
BEWARE: Unlike vmstat and iostat, the statistics reported by vxstat represent totals for the whole interval period, not per second. Consequently, you need to divide the reported bytes read and written by the number of seconds in the interval (to get bytes read and written per second) and also by 1024 (to get kilobytes per second).
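As a worked example (the numbers are invented for illustration), suppose vxstat -g datadg -i 60 -c 10, where datadg is a hypothetical disk group name, reports 6,144,000 bytes read for a volume during one 60-second interval. The read rate for that volume is then 6144000 / 60 / 1024 = 100 Kbytes per second.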
What You Can Do to Avoid Bottlenecks
To overcome a disk bottleneck, try one of the following:
Stripe the data on the disk across a greater number of disks. Take into account, though, the recommendations in "Deciding How Broad to Make a Single Stripe" on page 249 of Configuring and Tuning Databases on the Solaris Platform. Bear in mind, too, that the wider the stripe, the greater the number of disks that will be affected by the failure of a disk within the stripe.
If there is more than one database volume on the disk, move one or more volumes to other disks.
Increase the size of the database buffer cache to try to reduce the number of reads to the disk.
Add more spindles and disk controllers.
An effective disk layout will avoid most disk bottlenecks. If you see uneven disk utilization, revisit the disk layout recommendations in Chapter 17 of Configuring and Tuning Databases on the Solaris Platform.