- Solid-State Disk Technology
- Exadata Flash Hardware
- Exadata Smart Flash Cache
- Exadata Smart Flash Logging
- Smart Flash Cache WriteBack
- Summary
Exadata Smart Flash Cache
The default configuration in an Exadata system is to use all of the Flash on the system as cache—in the Exadata Smart Flash Cache (ESFC). We’ll see in the next chapter how we can allocate that Flash for other purposes, but out of the box, it’s the ESFC that will deliver most of the Flash advantage.
The ESFC has a similar architecture to the Database Flash Cache that we looked at earlier in this chapter. However, there are some significant differences that you’ll want to be aware of, so avoid the mistake of assuming that the ESFC is just the Database Flash Cache for Exadata.
Exadata Smart Flash Cache Architecture
The ESFC is managed by CELLSRV, the Exadata Storage Cell Server software. In general, when a Database Node requests a block of data from an ASM disk, CELLSRV issues asynchronous requests to both the ESFC and the grid disks that underlie the ASM disk group. If the data is in the Flash Cache, the request is satisfied from the cache; if not, it is satisfied from the grid disk. After forwarding the block to the Database Node, CELLSRV then stores any blocks retrieved from the grid disks into the Flash Cache—provided that the blocks are “eligible.”
Eligibility for caching is determined by metadata sent to the Storage Cell by the database server. This includes the size and type of I/O, as well as the segment’s CELL_FLASH_CACHE storage clause.
While it’s possible to configure an Oracle Database on Exadata as a single instance, most Exadata databases are configured as RAC clusters. In normal circumstances, therefore, a request arrives at the Storage Cell only when the block has not been found in the buffer cache of the requesting node or in the buffer cache of any other node in the cluster.
Figure 15.9 represents the data flow for simple Exadata reads:
- The database looks for the blocks in the local cache.
- If not found in local cache, the database uses Cache Fusion to find the block in the Global Cache across the cluster.
- If not found in the Global Cache, the database requests the block from the storage server.
- The storage server reads the block from both the Flash and the disk system.
- The storage server returns the block from whichever source satisfies the request faster.
- The storage server places the block into the Exadata Smart Flash Cache, if it was not already present.
Figure 15.9 Exadata read I/O lifecycle
It may seem unnecessary to belabor the Global Cache architecture of RAC in conjunction with our description of ESFC. However, the relationship between ESFC and the RAC Global Cache is critical to setting our expectations for ESFC performance. The ESFC is actually a third-level cache, resorted to only when Oracle fails to find the required data in the local buffer cache and in the global RAC cache. In some circumstances, the effectiveness of the local buffer cache and the Global Cache are so great that the additional caching of the ESFC offers only incremental advantage.
What the Exadata Smart Flash Cache Stores
Not everything that is sent from the Storage Cell to the database server gets placed in the Flash Cache. The storage server software can differentiate between different types of I/O requests—backup, Data Pump, archive logs, and so on. Only data file and control file blocks are cached in the ESFC. The CELLSRV also differentiates between database blocks accessed via single-block reads and those retrieved via full or Smart table scans.
By default, Exadata stores only small I/Os in the Exadata Smart Flash Cache; in most cases these are single-block reads. During a full table scan, Oracle requests blocks in multiblock chunks (by default 16 blocks), and these are not stored in the Exadata Smart Flash Cache unless you change the CELL_FLASH_CACHE clause for the segment.
Flash Cache Compression
The F40 and F80 Flash SSD devices—provided on Exadata X3 and X4 machines respectively—can provide hardware-expedited compression of data within the Flash Cache. Depending on the nature of the data being stored in the cache, this can increase the effective capacity from two to four times. The compression is implemented in the Flash drives themselves, so it places virtually no load on the system. The feature requires the Advanced Compression option.
Flash Cache compression is disabled by default; you enable it by issuing an ALTER CELL flashCacheCompress=TRUE command and, on an X3 system, ALTER CELL flashCacheCompX3Support=TRUE. These commands must be issued before the Flash Cache is created, so you need to drop and re-create the Flash Cache to take advantage of this feature. See Oracle Support Note 1664257.1 for full details.
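Based on the support note referenced above, the overall sequence looks roughly like this. This is a sketch only; exact attribute support varies by Exadata storage software version and hardware generation, so verify against Note 1664257.1 before running it on a real cell:

```
CellCLI> DROP FLASHCACHE
CellCLI> ALTER CELL flashCacheCompress=TRUE
CellCLI> ALTER CELL flashCacheCompX3Support=TRUE
CellCLI> CREATE FLASHCACHE ALL
```

The second ALTER CELL command applies only to X3 cells; on an X4 system it is unnecessary. Remember that dropping the Flash Cache discards its contents, so expect a period of degraded I/O performance while the re-created cache warms up.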
CELL_FLASH_CACHE Storage Clause
The segment STORAGE clause CELL_FLASH_CACHE controls prioritization of blocks within the ESFC and also the treatment of Smart Scan blocks. It has three possible settings:
- If set to NONE, no blocks for the segment are ever stored in the Exadata Smart Flash Cache.
- If set to DEFAULT, small I/Os (single-block reads) are stored in the Exadata Smart Flash Cache.
- If set to KEEP, Smart Scan and full table scan blocks are stored in the Exadata Smart Flash Cache. Furthermore, when the storage server needs to evict blocks from the ESFC, blocks with the setting KEEP are evicted last.
We can examine the current settings for the CELL_FLASH_CACHE clause by querying USER_SEGMENTS or DBA_SEGMENTS:
```
SQL> l
  1* SELECT segment_name, segment_type, cell_flash_cache
       FROM user_segments
      WHERE segment_name LIKE 'EXA%'
SQL> /

SEGMENT_NAME             SEGMENT_TYPE       CELL_FLASH_CACHE
------------------------ ------------------ ----------------
EXA_TXN_DATA             TABLE              KEEP
EXA_TXN_DATA_EIGHT_PK    INDEX              KEEP
EXA_TXN_DATA_EIGTH       TABLE              KEEP
EXA_TXN_DATA_HALF        TABLE              NONE
EXA_TXN_DATA_HALF_PK     INDEX              KEEP
EXA_TXN_DATA_PK          INDEX              DEFAULT
EXA_TXN_DATA_SAS         TABLE              KEEP
```
We can adjust the setting for CELL_FLASH_CACHE during a CREATE TABLE or CREATE INDEX statement or after the fact using ALTER TABLE or ALTER INDEX:
SQL> ALTER TABLE exa_txn_data STORAGE (CELL_FLASH_CACHE none); Table altered.
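The same clause can be supplied when the segment is created. A minimal sketch, with an illustrative table name and columns (the CELL_FLASH_CACHE keyword sits inside the standard STORAGE clause):

```sql
CREATE TABLE exa_sales_hist
  ( sale_id   NUMBER PRIMARY KEY,
    sale_date DATE,
    amount    NUMBER )
  STORAGE (CELL_FLASH_CACHE KEEP);
```

On a non-Exadata system the clause is accepted but has no effect, which makes it safe to carry in DDL scripts shared across environments.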
Flash Cache KEEP Expiration
Some Oracle documentation describes the KEEP clause as pinning blocks into the ESFC, but this is not completely accurate. The KEEP clause prioritizes blocks but does not guarantee that all of an object’s blocks will be in the ESFC. Oracle reserves at most 80% of the Exadata Smart Flash Cache for KEEP blocks. KEEP blocks are less likely to be aged out than DEFAULT blocks, but they will eventually leave the cache if they are not accessed while other blocks are introduced—especially if those other blocks are also KEEP blocks.
Additionally, blocks marked for the KEEP segment of the cache are not privileged indefinitely; by default, a block’s KEEP privilege expires after 24 hours. You can observe this behavior by issuing a LIST FLASHCACHECONTENT command.
Here we see blocks in the Exadata Smart Flash Cache introduced as part of a scan of a table that had the CELL_FLASH_CACHE KEEP attribute:
```
CellCLI> list flashcachecontent where objectNumber=139536 detail
         cachedKeepSize:       2855739392
         cachedSize:           2855936000
         dbID:                 325854467
         dbUniqueName:
         hitCount:             0
         hoursToExpiration:    24
         missCount:            2729
         objectNumber:         139536
         tableSpaceNumber:     5
```
About 2.8GB of data is shown as both cachedSize and cachedKeepSize, and hoursToExpiration shows how long the KEEP attribute is maintained. After 24 hours the entry for this object looks like this:
```
CellCLI> list flashcachecontent where objectNumber=139536 detail
         cachedKeepSize:       0
         cachedSize:           2855936000
         dbID:                 325854467
         dbUniqueName:
         hitCount:             0
         missCount:            2729
         objectNumber:         139536
         tableSpaceNumber:     5
```
After the expiration period, blocks are still in the cache but are no longer marked as KEEP and can be evicted to make way for other non-KEEP blocks that may be introduced.
Monitoring Exadata Smart Flash Cache
In Chapter 16, we’ll look in detail at Exadata Smart Flash Cache monitoring using CellCLI statistics and other tools. But since these techniques are fairly complex—more suited to benchmarking and research projects than day-to-day practical tuning—let’s look at some simpler ways of determining the effectiveness of the Exadata Smart Flash Cache.
Clearly, the bottom line for any Flash technology is the reduction in overall I/O time. Therefore, the most effective technique is to alternate between various CELL_FLASH_CACHE settings and measure the difference in observed execution time and wait times in V$SYSTEM_EVENT and V$SESSION_EVENT. However, changing CELL_FLASH_CACHE on a production system is going to be somewhat disruptive, and you’re not always going to be able to perform side-by-side tests of different options.
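One low-impact way to capture those wait times is to snapshot the cell-related wait events in V$SYSTEM_EVENT before and after a test run and compare the deltas. A sketch using the standard Exadata cell wait event names:

```sql
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000) AS time_waited_ms,
       ROUND(time_waited_micro / NULLIF(total_waits, 0)) AS avg_wait_us
  FROM v$system_event
 WHERE event IN ('cell single block physical read',
                 'cell multiblock physical read',
                 'cell smart table scan')
 ORDER BY time_waited_micro DESC;
```

A low average for `cell single block physical read` (well under typical spinning-disk service times) is usually a sign that a healthy proportion of single-block reads are being satisfied from the ESFC rather than from the grid disks.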
V$SYSSTAT and V$SESSTAT contain two statistics that provide quick insight into Exadata Smart Flash Cache performance:
- cell flash cache read hits—This records the number of read requests that found a match in the Exadata Smart Flash Cache.
- physical read requests optimized—This records the number of read requests that were “optimized” either by the Exadata Smart Flash Cache or through Storage Indexes. While this is less directly applicable to the Exadata Smart Flash Cache than the cell flash cache read hits statistic, it has the advantage of having an analogous column in V$SQL, as we will see below.
Comparing these statistics to the physical read total IO requests statistic gives us some indication, as shown in Listing 15.1, of how many I/Os are being optimized (esfc_sessstat_qry.sql).
Listing 15.1 Optimized Cell I/O Statistics
```
SQL> l
  1  SELECT name, value
  2    FROM v$mystat JOIN v$statname
  3   USING (statistic#)
  4   WHERE name IN ('cell flash cache read hits',
  5                  'physical read requests optimized',
  6*                 'physical read total IO requests')
SQL> /

NAME                                             VALUE
---------------------------------------- -------------
physical read total IO requests                117,246
physical read requests optimized                58,916
cell flash cache read hits                      58,916
```
V$SQL records optimized read requests—Flash Cache and/or Storage Index I/O—in the column OPTIMIZED_PHY_READ_REQUESTS. We can therefore identify the cached SQL statements with the highest amount of optimized I/O, which are likely the heaviest users of the Exadata Smart Flash Cache. Listing 15.2 (esfc_vsql.sql) shows the top five SQL statements in terms of optimized I/O.
Listing 15.2 Top Five Optimized I/O SQL Statements
```
SQL> l
  1  SELECT sql_id,
  2         sql_text,
  3         optimized_phy_read_requests,
  4         physical_read_requests,
  5         optimized_hit_pct,
  6         pct_total_optimized
  7    FROM (SELECT sql_id,
  8                 SUBSTR(sql_text, 1, 40) sql_text,
  9                 physical_read_requests,
 10                 optimized_phy_read_requests,
 11                 optimized_phy_read_requests * 100
 12                    / physical_read_requests
 13                    AS optimized_hit_pct,
 14                 optimized_phy_read_requests
 15                    * 100
 16                    / SUM (optimized_phy_read_requests)
 17                         OVER ()
 18                    pct_total_optimized,
 19                 RANK () OVER (ORDER BY
 20                    optimized_phy_read_requests DESC)
 21                    AS optimized_rank
 22            FROM v$sql
 23           WHERE optimized_phy_read_requests > 0
 24           ORDER BY optimized_phy_read_requests DESC)
 25*  WHERE optimized_rank <= 5
SQL> /

                  Optimized       Total Optimized Pct Total
SQL_ID              Read IO     Read IO   Hit Pct Optimized
--------------- ----------- ----------- --------- ---------
77kphjxam5akb       270,098     296,398     91.13     12.19
4mnz7k87ymgur       269,773     296,398     91.02     12.18
8mw2xhnu943jn       176,596     176,596    100.00      7.97
4xt8y8qs3gcca       117,228     117,228    100.00      5.29
bnypjf1kb37p1       117,228     117,228    100.00      5.29
```
Exadata Smart Flash Cache Performance
The performance gains you can expect from the ESFC vary depending on your workload and configuration. Let’s look at a few examples.
Exadata Smart Flash Cache and Smart Scans
As mentioned earlier, Smart Scans are generally not cached in the Flash Cache, unless the CELL_FLASH_CACHE STORAGE setting is set to KEEP. Figure 15.10 illustrates this effect: successive scan operations on a large (50-million-row) table (using exactly the same SELECT and WHERE clauses) are unaffected by the Flash Cache, unless the table is associated with the CELL_FLASH_CACHE KEEP clause.
Figure 15.10 Effect of CELL_FLASH_CACHE storage setting on Exadata Smart Scans
Full (Not So Smart) Scans
Full table scans are treated very similarly to Smart Scans by the Exadata Smart Flash Cache. Consider the full table scan process: we read the first batch of blocks from the table, place them in the Flash Cache, read and cache the next batch, and repeat until all the blocks have been read. By the time we reach the end of the table, the first blocks read have been pushed down the least-recently-used chain and are now relatively “cold.” Indeed, by the time the last blocks have been read, the first blocks may already have aged out.
If this has happened, when we read the table again we find few or no blocks in the cache. Even worse, we’ve “polluted” the cache by filling it with large numbers of blocks from the full table scan that may never be read again. This is one of the reasons Oracle over time has almost completely eliminated caching of table scan blocks from the buffer cache and why by default Exadata does not cache full table scan blocks in the Exadata Smart Flash Cache.
Figure 15.11 illustrates exactly this phenomenon. When a large (50-million-row) full table scan is repeated with DEFAULT Flash Caching, it finds few blocks present in the Flash Cache and observes essentially no performance advantage over the case in which table storage is defined as CELL_FLASH_CACHE NONE. When the table has the CELL_FLASH_CACHE KEEP clause applied, its blocks are prioritized for retention in the ESFC, and as a result a very high Flash hit rate is obtained and consequently there is a large reduction in scan time.
Figure 15.11 Example of ESFC on a large full (non-Smart) table scan
So, then, how should we set CELL_FLASH_CACHE for segments subjected to frequent full table scans? Again, it depends on your workload and transaction priorities, but a setting of KEEP is probably a good idea for tables that are small enough to fit comfortably into the Exadata Smart Flash Cache, that are subject to frequent full scans, and whose scans you are motivated to optimize. Typical candidates are smaller tables involved in joins, profile and authentication tables, or anything else that is read constantly via full table scan.
Smart Flash Cache KEEP Overhead
We’ve been trained to regard the cost of storing data in a cache as negligible. After all, it takes only nanoseconds to store data in a RAM-based cache, and that’s the sort of cache we’re most used to—as in the Oracle Database buffer cache. However, the performance dynamics are substantially different for a Flash-based cache. Adding an element to the Exadata Smart Flash Cache is normally faster than writing to disk, but it’s a lot slower than writing to memory (about 95 microseconds versus 10 nanoseconds, roughly four orders of magnitude).
As we discussed earlier in this chapter, write latency for Flash devices can degrade significantly if garbage collection algorithms cannot keep up with a high rate of block updates. In a worst-case scenario, write operations can experience an order-of-magnitude degradation when entire pages of Flash storage require an erase operation prior to a new write. This situation is most likely to occur when large sequential writes are applied to Flash devices.
When we apply the CELL_FLASH_CACHE KEEP clause in order to optimize a full table scan or Smart Scan, we are effectively asking the Flash Cache to store the entire contents of a potentially very large table. The first time this happens we need to apply a large number of potentially sequential writes to the Flash Cache, and this can incur substantial overhead.
Figure 15.12 illustrates this overhead in practice. The first two bars represent the repeatable profile for full scan with a default value for CELL_FLASH_CACHE. When an ALTER TABLE statement is issued setting CELL_FLASH_CACHE to KEEP, performance initially worsens markedly as shown in the third bar. The additional time represents the time it takes the Storage Cell to populate the Exadata Smart Flash Cache with the entire contents of the table being scanned.
Figure 15.12 Overhead of full scans with CELL_FLASH_CACHE set to KEEP
Subsequent scans—as represented in the fourth bar—show a performance improvement since they can be satisfied from data held in the Smart Flash Cache. However, we might anticipate from time to time that the table might age out of the cache and that consequently a costly repopulation of the Exadata Smart Flash Cache would be required.
The overhead of initially populating the Exadata Smart Flash Cache varies depending on the version of Exadata (and hence the version of Flash hardware) but is additive to the overhead of reading from disk. In other words, the first full table scan with CELL_FLASH_CACHE set to KEEP is actually worse than a full table scan with CELL_FLASH_CACHE set to NONE.
Weigh this possibility—and the possibility of pushing other more promising blocks out of the smart cache—before setting CELL_FLASH_CACHE to KEEP for a large table. You should apply the CELL_FLASH_CACHE KEEP setting very judiciously.
Index Lookups and the ESFC
Index lookups have a completely different pattern of interaction with the ESFC compared to both Smart and “dumb” scans.
First, since indexed single-block reads are subject to caching in the buffer cache of each instance, there’s a reduced chance that a disk read will occur at all. Hit rates in the buffer cache of 90% or more are commonplace, so only one in ten logical read requests or fewer might pass through to the storage server.
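You can check that ratio on your own system with the standard buffer cache hit calculation from V$SYSSTAT. A sketch, using the instance-wide cumulative statistics (since-startup figures, so interpret with the usual caution):

```sql
SELECT ROUND(100 * (1 - phy.value
                        / NULLIF(cg.value + dbg.value, 0)), 2)
         AS buffer_cache_hit_pct
  FROM v$sysstat phy, v$sysstat cg, v$sysstat dbg
 WHERE phy.name = 'physical reads cache'
   AND cg.name  = 'consistent gets from cache'
   AND dbg.name = 'db block gets from cache';
```

The higher this percentage, the smaller the fraction of logical reads that ever reach the Storage Cell, and hence the smaller the incremental benefit the ESFC can offer for indexed reads.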
Second, because Exadata databases are usually RAC databases, the Database Nodes use the RAC interconnect to obtain the required block from another instance that might have the block in its buffer cache if it can’t be found in the local buffer cache.
Blocks that cannot be found in the local buffer cache or Global Cache may be found in the Exadata Smart Flash Cache, but given the relatively large amounts of memory available on the Database Nodes, it’s quite possible that such a block either never has been requested before or has aged out of the cache anyway.
Nevertheless, for many tables, supplementing the buffer cache and Global Cache with the ESFC leads to substantial improvements by reducing the cost of a buffer or Global Cache “miss.” Figure 15.13 shows such a situation. Disabling the ESFC by setting the CELL_FLASH_CACHE clause to NONE results in a significant increase in the time taken to perform random single-block reads (500,000 reads over a random range of 500,000 key values in a 100-million-row table).
Figure 15.13 ESFC and primary key lookups (500,000 primary key lookups)
Setting CELL_FLASH_CACHE to KEEP is often unnecessary and possibly counterproductive for indexed single-block reads. While KEEP tends to retain the blocks read for a longer period (at the expense of other blocks, of course), the LRU aging of blocks in the DEFAULT cache probably leads to a more effective cache overall. In other words, you may see some small improvement in indexed reads for a specific table if you set CELL_FLASH_CACHE to KEEP, but you’ll be doing so at the cost of a less efficient ESFC overall, hurting the performance of queries on segments where CELL_FLASH_CACHE is set to DEFAULT. And remember, KEEP affects the caching of blocks accessed via full scans in ways that can harm performance, as discussed earlier in this chapter.