Sizing the Buffer Cache
Data skew and access patterns vary considerably between workloads, making it difficult to predict caching behavior. Consequently, sizing buffer caches is a challenging exercise. As a simple rule of thumb, some people suggest sizing the buffer cache as a proportion of the size of the database. I have read recommendations ranging from 1% to 20% of total database size; the breadth of that range highlights just how difficult such sizing is.
Some time ago, Performance and Availability Engineering conducted a study to better understand the relationship between cache sizes and hit rates and the impact of both on performance. As we will see, the optimal buffer cache size was between 10% and 15% of the database size for the workload used in the study.
The TPC-C schema and workload provided the test environment, and DB2 for Solaris was used as the test database.
The investigation was divided into two phases. In the first phase, the number of CPUs was held constant while the database size and the buffer cache size were varied. In the second phase, the number of CPUs was varied while the buffer cache size and database size were held constant.
Measurements were taken with buffer cache sizes ranging from 199 Mbytes to 3.1 Gbytes, and database scales of 100 (equivalent to 8.7 Gbytes), 250, 375, 425, and 525 (equivalent to 45.7 Gbytes).
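The two sizes quoted above imply a consistent ratio of roughly 0.087 Gbytes per scale unit, so the sizes of the intermediate scales can be estimated by simple interpolation. The following sketch is a back-of-the-envelope calculation of my own, not a figure taken from the study:

```python
# Estimate database size from TPC-C scale, interpolating from the two
# data points quoted above (scale 100 = 8.7 Gbytes, scale 525 = 45.7).
GBYTES_PER_SCALE = 8.7 / 100   # roughly 0.087 Gbytes per scale unit

for scale in (100, 250, 375, 425, 525):
    print(f"scale {scale}: ~{scale * GBYTES_PER_SCALE:.1f} Gbytes")
# scale 250 comes to about 21.8 Gbytes, scale 375 to 32.6 Gbytes,
# scale 425 to 37.0 Gbytes, and scale 525 to 45.7 Gbytes
```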
The findings of the study included the observations outlined in the following sections.
Influence of Buffer Cache Size on Throughput
Transaction throughput is dependent on the ratio of the database size to the buffer cache size. When the buffer cache is small relative to the size of the database, throughput is far below the peak value achievable for the configuration. Figure 7.1 shows the relationship between the throughput and the size of the buffer cache.
Figure 7.1 Throughput versus buffer cache size
As the buffer cache is gradually increased in size, throughput rises steeply at first, then reaches a "knee point," after which it continues to increase, but at a greatly reduced rate. The 100 scale curve clearly shows this phenomenon: as the buffer cache increases from 50,000 to 300,000 pages, the throughput increases rapidly. Beyond that point, large increases in the size of the buffer cache only yield small gains in throughput. Larger databases (scale 250 and above) also exhibit this behavior.
The 525 scale result highlights another key characteristic: as buffer cache sizes grow, more memory is needed to achieve the same throughput increase. In this case, adding 200,000 buffers to a 200,000-buffer cache doubled the throughput, but beyond 400,000 buffers, twice as much memory (a further 400,000 buffers) was required to achieve the same throughput gain again.
Figure 7.1 also clearly demonstrates that smaller databases benefit more than larger databases when a small buffer cache is increased in size. This behavior may seem counterintuitive, but it results from the fact that a high proportion of a small database can be cached quickly, whereas the same memory increase caches only a small proportion of a large database. At the same time, throughput growth levels off much sooner for small databases, whereas the benefits continue longer for larger databases. In every case, though, a point is eventually reached beyond which further increases in buffer cache size yield little additional benefit.
The study demonstrated that, for this workload, the optimal buffer cache size is between 10% and 15% of the database size. Table 7-1 shows the buffer cache sizes required to reach the "knee point," the point beyond which further cache size increases result in diminishing returns.
Table 7-1 Buffer size and data hit rate at knee points
Scale | Buffer Size (Gbytes) | % of Database Size | Data Hit Rate
------|----------------------|--------------------|--------------
100   | 1.24                 | 14.2               | 95.7%
150   | 1.59                 | 12.2               | 96.7%
200   | 1.97                 | 11.3               | 96.3%
250   | 2.19                 | 10.1               | 96.3%
Results for larger database scales are not reported since the knee point was not reached (larger buffer caches would have been required). Note, too, that the cache hit rate shown in the table applies only to data pages (DB2 for Solaris separately reports cache hit rate statistics for data pages, index pages, and all pages).
As Table 7-1 shows, the smaller databases used slightly larger proportions of buffer cache memory to reach their knee points.
A more recent study on a different hardware platform with a different database suggested an optimal buffer cache size of between 5% and 10% of the database size. The difference between these results and those from the earlier study demonstrates that no single answer fits all situations.
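Still, the two studies together suggest a reasonable starting range. The sketch below is a minimal illustration of applying such a rule of thumb; the function name and default fraction are my own, and any value chosen this way should be validated against measured cache hit rates rather than treated as definitive:

```python
# Minimal sizing helper based on the rule-of-thumb ranges above:
# 10-15% of database size in the first study, 5-10% in the later one.
GBYTE = 1024 ** 3

def suggested_cache_bytes(db_size_bytes, fraction=0.10):
    """Return a starting-point buffer cache size as a fraction of the
    database size. Treat the result as a first estimate only."""
    return int(db_size_bytes * fraction)

db_size = 45.7 * GBYTE   # e.g., the scale 525 database from the study
low = suggested_cache_bytes(db_size, 0.10)
high = suggested_cache_bytes(db_size, 0.15)
print(f"{low / GBYTE:.1f} to {high / GBYTE:.1f} Gbytes")  # 4.6 to 6.9 Gbytes
```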
Influence of Buffer Cache Size on Data Cache Hit Rate
Figure 7.2 shows the relationship between the data cache hit rate and the buffer cache size.
Figure 7.2 Data cache hit rate versus buffer cache size
As with throughput, the data cache hit rate initially climbs steeply as the buffer cache increases in size, then reaches a knee point and gradually settles onto a plateau. As before, the 100 scale test illustrates the behavior most clearly, whereas the larger database sizes needed more memory to reach the cache hit rate upper limit.
The practical limit to the data cache hit rate for this workload was around 98%. The overall cache hit rate was higher, though; including the index cache hit rate in this figure would have increased the upper limit to around 99% since the index cache hit rate was consistently higher than the data cache hit rate.
Further tests confirmed that the behavior described above for 12 CPUs also applied with 4 and 8 CPUs.
In concluding this discussion, it is worth noting that the memory capacities of Sun servers have increased significantly in recent years, a trend that is likely to continue. Although systems with 1 Gbyte of memory per CPU were typical in the past, current generation systems ship with 4 or 8 Gbytes per CPU. Larger memory configurations, combined with 64-bit databases on 64-bit Solaris, should allow buffer caches to be sized generously.
Influence of Page Size on Buffer Cache Effectiveness
Choosing the database page size (or block size) has important implications for database performance. Larger page sizes are usually chosen for DSS applications since they reduce overhead when large volumes of data are processed sequentially. Smaller page sizes are usually chosen for OLTP databases since I/O access tends to be random and row sizes are relatively small.
One of the main reasons why the database page size impacts performance is due to its influence on the effectiveness of the database buffer cache. The page size determines both the size of pages in the buffer cache and the size of I/Os to and from the database tables on disk.
Consider a table with an average row length of 500 bytes: up to four rows can be accommodated in a 2-Kbyte page. The unit of database I/O is the page size, so every time a row is read into the buffer cache on behalf of an application, up to three other rows will accompany it. If the other three rows are required at the same time by that application or by other users, the buffer space is being used effectively and future I/Os will be saved. If the I/O access is random, though, then the other three rows are unlikely to be used.
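The arithmetic is straightforward to check. The following sketch (illustrative only; the names are arbitrary) computes the rows per page and, under the purely random access just described, the fraction of each page that is likely to go unused:

```python
# Rows per page for the 500-byte rows described above, with 2-Kbyte pages.
ROW_SIZE = 500           # average row length in bytes
PAGE_SIZE = 2 * 1024     # 2-Kbyte page

rows_per_page = PAGE_SIZE // ROW_SIZE        # 4 rows fit in one page
unused_fraction = (rows_per_page - 1) / rows_per_page

print(rows_per_page)     # 4
print(unused_fraction)   # 0.75: under random access, roughly three of
                         # the four rows read with each page go unused
```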
The key issue here is access locality: locality of data access is high if data that is physically contiguous is accessed at or around the same time. Remember that in most cases, data is not stored in any particular order within a database table, so locality of access is hard to predict (although for some types of clustered index, data is stored in key sequence). When an entire table is scanned, such as for batch jobs or DSS queries, effective access locality is high since all rows will be processed within each page.
By contrast, up to 64 of the 500-byte rows can be accommodated in a 32-Kbyte page. If access locality is high and most or all of the other 63 rows can be used at the same time, many I/Os have been saved.
The worst case for large pages, though, is where data access is random and access locality is low. With 32-Kbyte pages, for example, each time a 500-byte row is read or written, a full 32-Kbyte I/O must be completed. Worse still, a 2-Gbyte buffer cache holds only 65,536 32-Kbyte buffers, compared to 1,048,576 2-Kbyte buffers. So fewer pages can be cached, while little benefit is derived from the large number of rows carried along with each page.
Very large buffer caches with 64-bit databases introduce other considerations. A 32-Gbyte buffer cache, for example, will support 16,777,216 2-Kbyte pages but only 1,048,576 32-Kbyte pages. The overhead associated with managing the buffer cache increases with the number of pages, so a point will probably be reached beyond which it is more efficient to deal with fewer, larger pages.
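The buffer counts quoted in the last two paragraphs follow directly from dividing the cache size by the page size, assuming 1 Gbyte = 2^30 bytes (consistent with the figures above). A quick sketch:

```python
# Number of buffers available for a given cache size and page size.
GBYTE = 1024 ** 3
KBYTE = 1024

def buffer_count(cache_bytes, page_bytes):
    return cache_bytes // page_bytes

for cache_gb in (2, 32):
    for page_kb in (2, 32):
        n = buffer_count(cache_gb * GBYTE, page_kb * KBYTE)
        print(f"{cache_gb}-Gbyte cache, {page_kb}-Kbyte pages: {n:,} buffers")
# 2-Gbyte cache:  1,048,576 buffers at 2 Kbytes, 65,536 at 32 Kbytes
# 32-Gbyte cache: 16,777,216 buffers at 2 Kbytes, 1,048,576 at 32 Kbytes
```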
To summarize, use smaller page sizes (2 to 8 Kbytes) for OLTP applications to maximize the effectiveness of the cache for randomly accessed data. Use larger page sizes for DSS applications to reduce the number of pages involved during access to large tables.