Oracle Performance Tuning: A Methodical Approach
- A Brief History of Oracle Performance Tuning
- Moving Beyond a Symptomatic Approach
- Stage 1: Minimizing the Application Workload
- Stage 2: Reducing Contention and Bottlenecks
- Stage 3: Reducing Physical IO
- Stage 4: Optimizing Disk IO
- Summary
Oracle performance tuning has come a long way over the years, but it is too often still approached in a haphazard or inefficient manner. Consider the following cautionary tale:
A mission-critical application system is experiencing unsatisfactory performance. As an experienced Oracle performance specialist, you are called in to diagnose the problem. The first - you do is examine the database wait times to see where the database is spending the majority of execution time. As we'll see later, this information can easily be found by looking in the V$SYSTEM_EVENT and V$SYS_TIME_MODEL views.
Looking at these views, two things stand out. First, the vast majority of database time is spent reading from disk devices. Second, the average time to read a single block from disk is much higher than you would expect given the capabilities of the disk hardware involved.
You suspect that the disk array might have insufficient IO bandwidth to support the application's demands. In other words, not enough physical disks are in the disk array to support the IO rate required. After a quick calculation, you recommend increasing the number of disk devices in the array by a factor of four. The dollar cost is substantial, as is the downtime required to redistribute data across the new disks within the array.1 Nevertheless, something needs to be done, so management approves the expense and the downtime. Following the implementation, users report they are satisfied with performance, and you modestly take all the credit.
A successful outcome? You think so, until . . .
- Within a few months performance is again a problem and disk IO is again the culprit.
- Another Oracle performance expert is called in and reports that a single indexing change would have fixed the original problem with no dollar cost and no downtime.
- The new index is implemented, following which the IO rate is reduced to one-tenth of that observed during your original engagement. Management prepares to sell the now-surplus disk devices on eBay and marks your consulting record with a "do not reengage" stamp.
- Your significant other leaves you for an Oracle salesperson, and you end up shaving your head and becoming a monk.
After years of silent mediation, you realize that while your tuning efforts correctly focused on the activities consuming the most time within the database, they failed to differentiate between causes and effects. Consequently, you mistakenly dealt with an effect—the high disk IO rate—while neglecting the cause (a missing index).
In this chapter we consider a methodology that ensures that you focus on the root causes of Oracle performance problems. This approach avoids the repetitive trial-and-error process that is characteristic of a lot of performance-tuning efforts and ensures that you get the biggest performance gains for your tuning efforts.
A Brief History of Oracle Performance Tuning
In the early '90s, the discipline of tuning an Oracle server was nowhere near as well established as today. In fact, performance tuning was mostly limited to a couple of well-known "rules of thumb."
The most notorious of these guidelines was that you should tune the Buffer Cache Hit Ratio: the ratio that describes the proportion of blocks of data requested by a SQL that are found in memory. If ten blocks of data are requested, and nine of them are found in memory, the hit ratio is 90 percent. Increasing the buffer cache size until the ratio reached 90 percent to 95 percent was often suggested. Similar target values were suggested for other ratios, such as the latch hit ratio.
The problem with these "ratio-based" techniques was that although the ratios usually reflected some measure of internal Oracle efficiency, they were often only loosely associated with the performance experienced by an application using the database. For example, although it is obviously better for a block of data to be found in memory—resulting in a high hit rate—SQL statements that inefficiently read the same data over and over again would often result in a high hit rate. Indeed, a very high hit ratio is often a symptom of badly tuned SQL.
The emergence of wait information in Oracle version 7.1 provided an alternative method of approaching tuning. This wait information showed the amount of time Oracle sessions spent waiting for various events, such as a lock becoming available or a disk IO completing. By concentrating on the wait events that accounted for the greatest amount of total wait time, Oracle performance tuners could target their tuning efforts more effectively.
Pioneers of systematic Oracle performance tuning, such as Anjo Kolk, author of the famous "Yet Another Performance Profiling" (YAPP) methodology, promoted this technique vigorously.
Wait-based tuning took a surprisingly long time to reach the mainstream: 5–10 years passed between the original release of the wait information and widespread acceptance of the technique; however, today almost all Oracle professionals are familiar with wait-based tuning.