Stage 4: Optimizing Disk IO
At this point, we've normalized the application workload—in particular the amount of logical IO demanded by the application. We've eliminated contention that might be blocking—and therefore masking—those logical IO requests. Finally, we've configured available memory to minimize the amount of logical IO that ends up causing physical IO. Now—and only now—it makes sense to make sure that our disk IO subsystem is up to the challenge.
To be sure, optimizing disk IO subsystems can be a complex and specialized task, but the basic principles are straightforward:
- Ensure the IO subsystem has enough bandwidth to cope with the physical IO demand. This is primarily determined by the number of distinct disk devices you have allocated. Disks vary in performance, but the average disk device might be able to perform approximately 100 random IOs per second before becoming saturated. Good response time usually requires that the disk be less than 100 percent utilized—say 50 percent to 75 percent. For most databases, meeting IO requirements means acquiring much more disk than simple storage requirements dictate. You need to acquire enough disks to sustain your IO rate with acceptable response time, not just enough disks to store all your data.
- Spread your load evenly across the disks you have allocated. The best way to do this is RAID 0 (Striping). The worst way—for most databases—is RAID 5, which incurs a heavy penalty on write IO.
The obvious symptom of an overly stressed IO subsystem is excessive delays responding to IO requests. The expected delay—called service time—varies from disk to disk, but even on the slowest disks should not exceed approximately 10ms. Disk arrays boasting large memory caches and Solid State Disk (SSD) devices might provide much lower latencies. Network Attached Storage (NAS) devices might also have a high network-related component to the service time.
Spreading the load across spindles is best done by hardware or software striping. Oracle's ASM technology provides a simple and universally available method of doing this for ordinary disk devices. Alternating datafiles across multiple disks is usually less effective, though still better than no striping at all. Most high-end databases employ the striping capabilities of a hardware disk array.
For most databases, optimizing the datafiles for read activity makes the most sense because Oracle sessions do not normally wait for datafile writes; the database writer process (DBWR) writes to disk asynchronously. However, if the DBWR cannot keep up with database activity, sessions need to wait for the DBWR to catch up. Likewise, we need to ensure that the flashback and redo log writer processes can keep up; otherwise, user sessions need to wait for these processes as well.
Chapters 21 and 22 covers the issues associated with optimizing disk IO:
- Understanding the Oracle IO mechanisms—buffered IO and direct IO, redo and archive log IO, flashback IO, and other mechanisms.
- Measuring IO performance and calculating optimal disk configurations.
- Using mechanisms for striping data, including RAID levels.
- Utilizing specific IO-related technologies such as ASM and SSD.