- Solid-State Disk Technology
- Exadata Flash Hardware
- Exadata Smart Flash Cache
- Exadata Smart Flash Logging
- Smart Flash Cache WriteBack
- Summary
Exadata Smart Flash Logging
Exadata Storage Software 11.2.2.4 introduced the Smart Flash Logging feature. The intent of this feature is to reduce overall redo log sync times by allowing the Exadata Flash storage to serve as a secondary destination for redo log writes. During a redo log sync, Oracle writes to the disk and Flash simultaneously and allows the redo log sync operation to complete as soon as either device completes its write.
In the event that the Flash Cache wins the race to write, the data need be held for only a short time until the storage server is certain that all writes have made it to the redo log. Since the Smart Flash Log is only a temporary store, only a small amount of Flash storage is required—512MB per cell (out of 3.2TB on an X4, 1.6TB on an X3, or 365GB on an X2 system).
Figure 15.14 illustrates the essential flow of control. Oracle processes performing DML generate redo entries which are written to the redo buffer (1). Periodically or upon COMMIT the LGWR flushes the buffer (2), resulting in an I/O request to the CELLSRV process (3). CELLSRV writes to Flash and grid disk simultaneously (4), and when either I/O completes, it returns control to the LGWR (5).
Figure 15.14 Exadata Smart Flash Logging
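The "first write wins" step (4 and 5 in the flow above) can be sketched as a small simulation. This is only an illustration under stated assumptions, not how CELLSRV is implemented: `simulated_write` and `dual_write` are hypothetical names, and `time.sleep` stands in for device latency.

```python
import concurrent.futures
import time


def simulated_write(device: str, latency_s: float) -> str:
    """Pretend to persist a redo record; the sleep stands in for device latency."""
    time.sleep(latency_s)
    return device


def dual_write(disk_latency_s: float, flash_latency_s: float) -> tuple[str, float]:
    """Issue both writes at once; return the first device to finish and the
    time the caller had to wait for that acknowledgment."""
    started = time.perf_counter()
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    futures = [
        pool.submit(simulated_write, "disk", disk_latency_s),
        pool.submit(simulated_write, "flash", flash_latency_s),
    ]
    done, _pending = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )
    elapsed_s = time.perf_counter() - started
    # Do not block on the slower write; it still completes in the background.
    pool.shutdown(wait=False)
    return next(iter(done)).result(), elapsed_s


if __name__ == "__main__":
    # A disk write delayed by competing I/O versus a normal Flash write.
    winner, elapsed_s = dual_write(disk_latency_s=0.5, flash_latency_s=0.01)
    print(f"{winner} completed first; caller waited {elapsed_s * 1000:.0f} ms")
```

The caller's wait is governed by the faster of the two writes, which is why the feature helps most when one device is temporarily slow.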
The use of Flash SSD to optimize redo log operations has been a somewhat contentious topic. Many—including this author—have argued that Flash SSD is a poor choice for redo log workloads. The nature of sequential redo log I/O tends to favor the spinning magnetic disk since sequential I/O minimizes seek latency, while penalizing Flash-based SSD, since the continual overwriting of existing blocks makes the probability of a block erase very high.
However, the Exadata Smart Flash Logging feature is not predicated on some theoretical write I/O advantage for Flash SSD. Rather it aims to “smooth out” redo log writes by running redo log writes out through two channels (grid disk and Flash SSD) and allowing the redo log write to complete when either of the two completes.
Redo log sync waits—which occur whenever a COMMIT occurs—generally involve only a couple of milliseconds of wait time since they involve only a small sequential write operation on an (ideally) relatively lightly loaded disk subsystem. Keeping redo logs on separate ASM disk groups from data files helps ensure that heavy data file I/O loads do not affect the time taken for redo operations.
However, it’s inevitable that from time to time a redo log sync operation will conflict with some other I/O—an archive read or Data Guard operation, for instance. In these circumstances some redo log sync operations may take a very long time indeed.
Following is some Oracle trace log data that shows some redo log sync waits:
    WAIT #4..648: nam='log file sync' ela= 710
    WAIT #4..648: nam='log file sync' ela= 733
    WAIT #4...648: nam='log file sync' ela= 621
    WAIT #4...648: nam='log file sync' ela= 507
    WAIT #4...648: nam='log file sync' ela= 683
    WAIT #4...648: nam='log file sync' ela= 2084
    WAIT #4...648: nam='log file sync' ela= 798
    WAIT #4...648: nam='log file sync' ela= 1043
    WAIT #4...648: nam='log file sync' ela= 2394
    WAIT #4...648: nam='log file sync' ela= 932
    WAIT #4...648: nam='log file sync' ela= 291780
    WAIT #4...648: nam='log file sync' ela= 671
    WAIT #4...648: nam='log file sync' ela= 957
    WAIT #4...648: nam='log file sync' ela= 852
    WAIT #4...648: nam='log file sync' ela= 639
    WAIT #4...648: nam='log file sync' ela= 699
    WAIT #4...648: nam='log file sync' ela= 819
The ela entry shows the elapsed time in microseconds. Most of the waits are less than 1 millisecond (1000 microseconds), but in the middle we see an anomalous wait of 291,780 microseconds (about one-third of a second!).
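Picking such outliers out of a trace file by eye is tedious. The following is a minimal sketch of how the `ela=` values might be extracted and screened. It assumes trace lines shaped like the ones shown above. The function names and the 100,000-microsecond threshold are hypothetical choices for illustration, not part of any Oracle tool.

```python
import re

# Matches the wait name and captures the elapsed time in microseconds.
ELA_PATTERN = re.compile(r"nam='log file sync'\s+ela=\s*(\d+)")

# A few lines in the same shape as the trace excerpt above.
SAMPLE_TRACE = """\
WAIT #4..648: nam='log file sync' ela= 710
WAIT #4..648: nam='log file sync' ela= 733
WAIT #4...648: nam='log file sync' ela= 291780
WAIT #4...648: nam='log file sync' ela= 671
"""


def log_file_sync_waits(trace_text: str) -> list[int]:
    """Return every log file sync elapsed time (microseconds) found in the text."""
    return [int(match.group(1)) for match in ELA_PATTERN.finditer(trace_text)]


def outliers(waits_us: list[int], threshold_us: int = 100_000) -> list[int]:
    """Return the waits longer than threshold_us (an arbitrary, assumed cutoff)."""
    return [wait for wait in waits_us if wait > threshold_us]


if __name__ == "__main__":
    waits = log_file_sync_waits(SAMPLE_TRACE)
    print(f"{len(waits)} waits, outliers: {outliers(waits)}")
```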
Occasional very high redo log sync waits like the one just shown might not seem too disturbing until you remember that redo log sync waits are frequently included in the most critical application transactions. Online operations such as saving a shopping cart, confirming an order, and saving a profile change all generally involve some sort of commit operation, and it’s well known that today’s online consumers rapidly lose patience when operations delay even by fractions of a second. So even occasional high redo log wait times are cause for concern. It’s the intent of Exadata Smart Flash Logging to smooth out these disturbing outliers.
Controlling and Monitoring Smart Flash Logging
Exadata Smart Flash Logging is enabled by default and you don’t have to do anything specifically to enable it—other than to make sure your Storage Cells are running at least Exadata Storage Software 11.2.2.4.
You can confirm your Flash Log status by issuing a LIST FLASHLOG command at a CellCLI prompt:
    CellCLI> list flashlog detail
             name:                exa1cel01_FLASHLOG
             cellDisk:            FD_09_exa1cel01,FD_02_exa1cel01,
             creationTime:        2012-07-07T06:56:23-07:00
             degradedCelldisks:
             effectiveSize:       512M
             efficiency:          100.0
             id:                  3c08cfe1-ea43-4fde-85c2-0bbd5cbd11ec
             size:                512M
             status:              normal
You can control the behavior of Exadata Smart Flash Logging by using a resource management plan. This allows you to turn Exadata Smart Flash Logging on or off for individual databases.
So, for instance, this command will turn Exadata Smart Flash Logging off for database GUY and leave it on for all other databases:
    ALTER IORMPLAN dbplan=((name=GUY, flashLog=off), (name=other, flashLog=on))
You can monitor the behavior of Exadata Smart Flash Logging by using the following CellCLI command:
    CellCLI> list metriccurrent where objectType='FLASHLOG';
    FL_ACTUAL_OUTLIERS       FLASHLOG    1 IO requests
    FL_BY_KEEP               FLASHLOG    0
    FL_DISK_FIRST            FLASHLOG    253,540,190 IO requests
    ......                   ......
    FL_FLASH_FIRST           FLASHLOG    11,881,503 IO requests
    ......                   ......
    FL_PREVENTED_OUTLIERS    FLASHLOG    275,125 IO requests
These are probably the most interesting CellCLI metrics generated by this command:
- FL_DISK_FIRST: the number of redo log writes for which the grid disk write completed first
- FL_FLASH_FIRST: the number of redo log writes for which the Flash SSD write completed first
- FL_PREVENTED_OUTLIERS: the number of redo log writes that were optimized by Flash Logging and that would otherwise have taken longer than 500 milliseconds to complete
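To turn these counters into a single figure, one option is to parse the listing and compute the share of redo writes that Flash completed first. The sketch below assumes output laid out like the metric listing above (name, `FLASHLOG`, then a value with thousands separators). `parse_flashlog_metrics` and `flash_first_share` are hypothetical helper names, not CellCLI features.

```python
SAMPLE_OUTPUT = """\
FL_ACTUAL_OUTLIERS       FLASHLOG    1 IO requests
FL_BY_KEEP               FLASHLOG    0
FL_DISK_FIRST            FLASHLOG    253,540,190 IO requests
FL_FLASH_FIRST           FLASHLOG    11,881,503 IO requests
FL_PREVENTED_OUTLIERS    FLASHLOG    275,125 IO requests
"""


def parse_flashlog_metrics(cellcli_output: str) -> dict[str, int]:
    """Map each FL_* metric name to its integer value, skipping other lines."""
    metrics: dict[str, int] = {}
    for line in cellcli_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("FL_") and fields[1] == "FLASHLOG":
            metrics[fields[0]] = int(fields[2].replace(",", ""))
    return metrics


def flash_first_share(metrics: dict[str, int]) -> float:
    """Fraction of redo log writes in which Flash finished before the grid disk."""
    disk_first = metrics["FL_DISK_FIRST"]
    flash_first = metrics["FL_FLASH_FIRST"]
    return flash_first / (disk_first + flash_first)


if __name__ == "__main__":
    metrics = parse_flashlog_metrics(SAMPLE_OUTPUT)
    print(f"Flash completed first in {flash_first_share(metrics):.1%} of redo writes")
```

For the sample values shown, Flash completed first in roughly 4.5% of redo writes.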
Testing Exadata Smart Flash Logging
Let’s look at an example. We test Exadata Smart Flash Logging by running 20 concurrent processes, each of which performs 200,000 updates and commits, for a total of 4 million redo log sync operations. We then disable Exadata Smart Flash Logging using a resource plan (see the ALTER IORMPLAN statement in the previous section) and repeat the test. Every redo log sync wait is captured in a DBMS_MONITOR trace file for analysis using the R statistical package.
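The analysis described here was done in R. As a rough equivalent, the same kind of summary (minimum, median, mean, 99th percentile, maximum) can be sketched with Python's standard `statistics` module. This is a swapped-in illustration rather than the analysis actually used. `summarize` is a hypothetical helper, and the choice of the "inclusive" percentile method is an assumption.

```python
import statistics


def summarize(waits_us: list[float]) -> dict[str, float]:
    """Summarize wait times (microseconds) with the statistics discussed here."""
    # quantiles(n=100) returns the 99 cut points between percentiles;
    # index 98 is the 99th percentile.
    percentiles = statistics.quantiles(waits_us, n=100, method="inclusive")
    return {
        "min": min(waits_us),
        "median": statistics.median(waits_us),
        "mean": statistics.fmean(waits_us),
        "p99": percentiles[98],
        "max": max(waits_us),
    }


if __name__ == "__main__":
    # Synthetic data only: the values 1..101 stand in for captured wait times.
    for name, value in summarize(list(range(1, 102))).items():
        print(f"{name:>6}: {value}")
```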
With Exadata Smart Flash Logging disabled, our key CellCLI metrics look like this:
    FL_DISK_FIRST            32,669,310 IO requests
    FL_FLASH_FIRST           7,318,741 IO requests
    FL_PREVENTED_OUTLIERS    774,146 IO requests
With Exadata Smart Flash Logging enabled, the metrics look like this:
    FL_DISK_FIRST            33,201,462 IO requests
    FL_FLASH_FIRST           7,337,931 IO requests
    FL_PREVENTED_OUTLIERS    774,146 IO requests
So for this particular cell the Flash disk “won” only 3.8% of the time (the ratio of FL_FLASH_FIRST to FL_DISK_FIRST) and prevented no outliers. (Outliers are redo log syncs that take longer than 500 milliseconds to complete.) So on the surface, it would seem that very little has been achieved.
However, statistical analysis of the redo log sync times provides a somewhat different interpretation. Table 15.1 summarizes the key statistics for the two tests.
Table 15.1 Effect of Exadata Smart Flash Logging on Redo Log Sync Waits
Redo log sync time in microseconds:

| Smart Flash Logging | Min | Median | Mean | 99% | Max |
|---|---|---|---|---|---|
| On | 1.0 | 650 | 723 | 1656 | 75,740 |
| Off | 1.0 | 627 | 878 | 4662 | 291,800 |
Exadata Smart Flash Logging reduced the mean log file sync wait time by over 15%, and this difference was statistically significant. There was also a significant reduction in the 99th percentile: the minimum wait time for the top 1% of waits was reduced from about 4.7 milliseconds to about 1.7 milliseconds.
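The arithmetic behind these comparisons can be checked directly against Table 15.1. The short sketch below uses only the values from the table. `pct_reduction` and `us_to_ms` are hypothetical helper names introduced for illustration.

```python
# Values copied from Table 15.1 (all in microseconds).
TABLE_US = {
    "on": {"median": 650, "mean": 723, "p99": 1656, "max": 75_740},
    "off": {"median": 627, "mean": 878, "p99": 4662, "max": 291_800},
}


def pct_reduction(before: float, after: float) -> float:
    """Percentage by which 'after' is lower than 'before'."""
    return 100.0 * (before - after) / before


def us_to_ms(microseconds: float) -> float:
    """Convert microseconds to milliseconds."""
    return microseconds / 1000.0


if __name__ == "__main__":
    for stat in ("mean", "p99", "max"):
        off_us = TABLE_US["off"][stat]
        on_us = TABLE_US["on"][stat]
        print(
            f"{stat:>4}: {us_to_ms(off_us):.3f} ms -> {us_to_ms(on_us):.3f} ms "
            f"({pct_reduction(off_us, on_us):.1f}% lower)"
        )
```

On these figures the mean falls by about 17.7%, the 99th percentile by about 64.5%, and the maximum by about 74%. The median is slightly higher with the feature on, so the gain comes from the tail of the distribution rather than from typical waits.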
Figure 15.15 shows the distribution of log file sync waits with the Exadata Smart Flash Logging feature enabled and disabled. Turning Exadata Smart Flash Logging on created a strange hump on the high side of what otherwise looks like a normal bell curve distribution. Understanding that hump requires that we take a look at the distribution of very high outlier redo log waits.
Figure 15.15 Distribution of log file sync waits with Exadata Smart Flash Logging
Figure 15.16 shows the distribution of the top 10,000 waits. This shows far more clearly how Exadata Smart Flash Logging worked to reduce high outlier log file sync waits. These waits have been pulled back, but to a point that is still above the average wait time for other log file sync waits. This creates the hump in Figure 15.15 and represents a significant reduction in the time taken for outlying redo log waits.
Figure 15.16 Distribution of top 10,000 log file sync waits with Exadata Smart Flash Logging
While Flash SSD is not necessarily an ideal storage medium for redo write I/O, Exadata Smart Flash Logging does reduce the impact of very high outlier redo log writes.