- Overview of In-Memory OLTP
- In-Memory Optimization Requirements
- Limitations of In-Memory OLTP
- Using In-Memory OLTP
- Using Memory-Optimized Tables
- Logging, Checkpoint, and Recovery for In-Memory OLTP
- Managing Memory for In-Memory OLTP
- Backup and Recovery of Memory-Optimized Databases
- Migrating to In-Memory OLTP
- Dynamic Management Views for In-Memory OLTP
- The Buffer Pool Extension
- Summary
Using In-Memory OLTP
To use memory-optimized tables, you must first configure the database to support In-Memory OLTP and then create your tables, specifying that they are to be memory-optimized.
Enabling a Database for In-Memory OLTP
Any database that will contain memory-optimized tables must have a MEMORY_OPTIMIZED_DATA filegroup defined, which is used for storing the checkpoint files SQL Server needs to recover the memory-optimized tables. The syntax for creating the filegroup is similar to that for a regular filestream filegroup, but it must also specify the CONTAINS MEMORY_OPTIMIZED_DATA option. (For more information on creating and defining filegroups, see Chapter 20, “Creating and Managing Databases.”)
To create a new database with a MEMORY_OPTIMIZED_DATA filegroup, you would execute a command similar to the following:
CREATE DATABASE InMemory
ON PRIMARY
   (NAME = N'InMemory_Data',
    FILENAME = N'D:\SQL2014\Data\InMemoryData.mdf',
    SIZE = 2048MB),
FILEGROUP [InMem_FG] CONTAINS MEMORY_OPTIMIZED_DATA
   (NAME = N'InMemory_FG_dir',
    FILENAME = N'D:\SQL2014\Data\InMemory_FG_dir')
LOG ON
   (NAME = N'InMemory_log',
    FILENAME = N'D:\SQL2014\Data\InMemory.ldf',
    SIZE = 1024MB)
COLLATE Latin1_General_100_BIN2
GO
This command creates a database named InMemory with a memory-optimized filegroup named InMem_FG and its container. Notice that the MEMORY_OPTIMIZED_DATA filegroup specifies a directory, not a specific file name. Also notice that the Latin1_General_100_BIN2 binary collation was specified. Currently, indexes on memory-optimized tables can only be created on columns using a Windows (non-SQL) BIN2 collation, and natively compiled procedures only support comparisons, sorting, and grouping on those same collations. If you don’t specify this as the default collation for the entire database, you’ll need to specify the Latin1_General_100_BIN2 collation for any character columns within the CREATE TABLE statement of your memory-optimized tables.
It is also possible to add a MEMORY_OPTIMIZED_DATA filegroup to an existing database with the ALTER DATABASE command and then add a directory to that filegroup:
ALTER DATABASE AdventureWorks2012
   ADD FILEGROUP AW_inmem_FG CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE AdventureWorks2012
   ADD FILE (NAME = 'AW_inmem_FG_dir',
             FILENAME = N'D:\SQL2014\Data\AW_inmem_FG_dir')
   TO FILEGROUP AW_inmem_FG;
GO
SSMS also supports adding a memory-optimized filegroup to a database on the Filegroups page, as shown in Figure 33.2.
Figure 33.2 Configuring a MEMORY_OPTIMIZED_DATA filegroup in SSMS.
Once you have configured a MEMORY_OPTIMIZED_DATA filegroup for the database, you can create memory-optimized tables.
Creating Memory-Optimized Tables
Creating a memory-optimized table is similar to creating a regular table, with a few differences. The first is that you need to specify the MEMORY_OPTIMIZED = ON option, which tells SQL Server that the table is to be memory-optimized. This causes a table DLL to be created and loaded into memory.
A memory-optimized table can be defined with one of two DURABILITY values:
- SCHEMA_AND_DATA—When this option is specified, both the schema and the data will be persisted on disk. With this option, there are also two possible levels of durability, which are controlled at the database level: full durability and delayed durability. The full durability option writes the transaction to disk upon commit, whereas the delayed durability option stores the transaction in memory and writes it to disk later. Delayed durability can improve performance but can also result in data loss in the event of a server crash or power outage. For more information on the delayed durability option, see Chapter 28, “Transaction Management and the Transaction Log.”
- SCHEMA_ONLY—When this option is specified, only the table definition is persisted; the data is not. Only the empty table will be available after a server restart. This option provides the best performance, but it should only be used for transient data or in situations where data loss is acceptable.
Also, a memory-optimized table must always have at least one index, and all tables except those created with the SCHEMA_ONLY option must have a declared primary key. Thus, the single-index requirement can be satisfied by defining a primary key constraint in the CREATE TABLE statement.
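For example, the following sketch (the table name and structure are hypothetical) creates a nondurable table suitable for transient session data. Because the table is defined with SCHEMA_ONLY, a primary key is not required, but at least one index must still be defined:

-- Transient table: the schema survives a restart, but the data does not
CREATE TABLE SessionState
(
   [SessionID] int not null,
   [SessionData] varbinary(4000) null,
   [LastAccess] datetime2 not null,
   INDEX IDX_SessionState_ID
      NONCLUSTERED HASH ([SessionID]) WITH (BUCKET_COUNT = 100000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);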
The following example shows the creation of a memory-optimized table with a nonclustered PRIMARY KEY index created on the ID column:
CREATE TABLE InMemTab
(
   [ID] int IDENTITY
      PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
   [Name] nvarchar(32) not null,
   [City] nvarchar(50) null,
   [State] char(2) null,
   [LastModified] datetime2 not null
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
As you can see, creating a memory-optimized table is similar to creating a regular table, with a few minor differences. Here, the first column, ID, is the primary key and is defined using the new hash index type. The BUCKET_COUNT keyword specifies the size of the hash table. The general recommendation is that the BUCKET_COUNT value be set to between one and two times the maximum expected number of distinct values in the index (more on this in the “Indexes on Memory-Optimized Tables” section).
When a memory-optimized table is created, the In-Memory OLTP engine generates and compiles DML routines for accessing that table and loads the routines as DLLs. SQL Server itself does not perform the actual data manipulation on memory-optimized tables; instead, it calls the appropriate DLL for the required operation when a memory-optimized table is accessed.
Currently, no schema changes are allowed once a memory-optimized table is created. If you need to make any changes to the table definition, including any index changes, you will need to drop and recreate the table. Indexes can only be specified as part of the CREATE TABLE syntax (inline specification of indexes is a new feature in SQL Server 2014). For example, if you wanted to create an additional nonclustered index on the Name column, you’d have to drop and recreate the table:
DROP TABLE InMemTab
GO
CREATE TABLE InMemTab
(
   [ID] int IDENTITY
      PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
   [Name] nvarchar(32) COLLATE Latin1_General_100_BIN2 not null,
   [City] nvarchar(50) null,
   [State] char(2) null,
   [LastModified] datetime2 not null,
   INDEX IDX_InMemTab_Name NONCLUSTERED (Name)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
Memory-Optimized Tables Row Structure
As mentioned previously, memory-optimized tables are not stored like disk-based tables. The main difference is that rows for memory-optimized tables are not stored on pages like disk-based tables, and space is not allocated from extents. The design principle behind the storage structure of memory-optimized tables is to optimize access for byte-addressable memory instead of block-addressable disk.
Rows for a single table are not stored in any specific order, and not necessarily near other rows from the same table. SQL Server knows which rows belong to the same table via the table’s indexes, which keep the rows linked together. This is why memory-optimized tables require that at least one index be created on them; the index provides the structure for the table.
The rows themselves also have a structure that is very different from the row structures used for disk-based tables. Each row consists of a header and a payload area containing the data values. Figure 33.3 provides a diagram of the memory-optimized table data row.
Figure 33.3 The structure of a memory-optimized table data row.
The row header contains the following values:
- Two 8-byte fields holding In-Memory OLTP timestamps: a Begin-Ts and an End-Ts. The value of Begin-Ts is the timestamp of the transaction that inserted the row. The End-Ts value is the timestamp of the transaction that deleted the row. A special value (referred to as ‘infinity’) is used as the End-Ts value for rows that have not been deleted.
- A four-byte statement ID value—Every statement within a transaction has a unique StmtId value. When a row is created, it stores the StmtId of the statement that created the row, which allows the statement to skip rows it has already inserted.
- A two-byte value (idxLinkCount)—A reference count indicating how many indexes reference this row.
- An index pointer array—Contains a pointer for each index on the table. It is these pointers plus the index data structures that connect the rows of a table together. The number of pointers is equal to the number of indexes defined on the table.
Every database that supports memory-optimized tables manages two internal counters that are used to generate the Begin-Ts and End-Ts timestamps:
- The Transaction-ID counter—A global, unique value that is reset when the SQL Server instance is restarted and is incremented every time a new transaction starts.
- The Global Transaction Timestamp—A global, unique value that is not reset on a restart. It is incremented each time a transaction ends and begins validation processing and the new value is the timestamp for the current transaction.
When a row is first inserted into a memory-optimized table, the transaction’s timestamp is not yet known because the insert transaction has not completed, so the global Transaction-ID counter value is used for Begin-Ts until the transaction commits. Similarly, for a delete operation, the transaction timestamp is not known, so the End-Ts value for the deleted rows uses the Transaction-ID value, which is replaced once the real Global Transaction Timestamp is known. The Begin-Ts and End-Ts values are the mechanism used to determine which other transactions are able to see this row.
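The In-Memory OLTP transactions active in the current database, along with their internal transaction IDs and begin/end timestamps, can be viewed via the sys.dm_db_xtp_transactions DMV. A minimal sketch (selecting all columns, since the exact column list is best confirmed against your release):

-- View active In-Memory OLTP transactions and their internal timestamps
SELECT * FROM sys.dm_db_xtp_transactions;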
The payload area contains the row data, including the key columns plus all the other columns in the row. With this structure, all indexes on a memory-optimized table are essentially covering indexes. The payload format varies depending on the table. When the table is created, the In-Memory OLTP compiler generates the DLLs for table operations. Based upon the payload format used initially when inserting rows into a table, it generates the appropriate commands for all row operations.
Indexes on Memory-Optimized Tables
As mentioned previously, each memory-optimized table must have at least one index because it is the indexes that connect the rows together (more on this later). The indexes on memory-optimized tables must be created as a part of the CREATE TABLE statement. You can’t use the CREATE INDEX statement to create an index for a memory-optimized table after the table has already been created. Memory-optimized tables support a maximum of eight indexes, including the primary key index, and unique indexes are not allowed except for the primary key index.
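For example, attempting to add an index to the previously created InMemTab table using the standard CREATE INDEX statement fails with an error (a sketch illustrating the restriction):

-- This statement fails: CREATE INDEX is not supported
-- for memory-optimized tables
CREATE INDEX IDX_InMemTab_City ON InMemTab (City);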
Indexes on memory-optimized tables are never written to disk, not even to the transaction log. Only the data rows, and changes to the data, are written to the transaction log for durable tables defined with the SCHEMA_AND_DATA option. Instead, indexes on memory-optimized tables are populated based on the index definitions when SQL Server starts.
Memory-optimized indexes are also inherently covering indexes. Covering means that all memory-optimized table columns are virtually included in the index, so bookmark lookups are not needed to read the data. Rather than a reference to the primary key, memory-optimized indexes simply contain a memory pointer to the actual row in the table data structure. (If you are unfamiliar with these terms or concepts, you may want to review Chapters 31, “Understanding SQL Server Data Structures,” and 32, “Indexes and Performance.”)
In addition, fragmentation and fillfactor do not apply to memory-optimized indexes. In disk-based indexes, fragmentation refers to pages in the B-tree being stored out-of-order on disk. This is not an issue for memory-optimized indexes since they are not written to or read from disk. Fillfactor in disk-based B-tree indexes refers to the degree to which the physical page structures are filled with index rows. Memory-optimized index structures do not have fixed-size pages so there’s no degree of free space to consider.
There are two types of indexes available for memory-optimized tables:
- Nonclustered hash indexes which are designed for point lookups.
- Memory-optimized nonclustered indexes which are intended for range scans and ordered scans.
Hash Indexes
A hash index consists of an array of pointers, with each element of the array referred to as a hash bucket. A hash function is applied against the index key column of each row, and the result of the function determines which bucket the row is stored in. All key values that hash to the same value are accessed from the same pointer in the hash index and are linked together in a chain. When a row is added to the table, the hash function is applied to the index key value in the row, and the row is linked into the chain of rows for the corresponding hash bucket. Figure 33.4 provides a simplified example of a hash index.
Figure 33.4 A simplified diagram of a hash index.
The figure shows three buckets in use, pointing to four data rows. The top bucket contains one row. The second bucket from the top contains two rows. Notice how these rows are linked together: they have the same hash value and are connected through the chain of index pointers in the data rows. The sixth bucket contains one row.
Figure 33.5 shows a diagram of a memory-optimized table with two hash indexes defined on it. The solid arrows represent pointers for Hash Index 1, while the dotted lines represent pointers for Hash Index 2. Notice how the linking of rows for Hash Index 2 is different from the linking for Hash Index 1.
Figure 33.5 A diagram of a memory-optimized table with two hash indexes.
The performance of queries that scan an index chain greatly depends on the number of records in the chain: the greater the number of records, the slower the query. Two factors affect the length of the index chains. The first is the selectivity of the index key, essentially the number of distinct values in the table compared to the total number of rows. If there are a large number of duplicate values, resulting in low selectivity, the hash index will be less efficient.
The other factor is the number of hash buckets defined for the index, which also determines the size and amount of memory used by the hash index. The number of buckets is a fixed amount based on the bucket count specified when the table is created. It is recommended that the number of buckets be set to a value equal to or greater than the estimated number of unique values to be stored in the index key column. This increases the likelihood that each bucket will only have rows with a single value in its chain. Be careful not to choose a number that is too big, however, because each bucket uses memory, and the number of buckets you supply is rounded up to the next power of two. For example, a value of 50,000 will be rounded up to 65,536. Having extra buckets wastes memory without improving lookup performance, and can also reduce the performance of index scans, which have to check each bucket for matching rows.
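You can observe this rounding behavior yourself. The following sketch (the table name is hypothetical) creates a hash index with BUCKET_COUNT = 50000 and then checks the actual bucket count via the sys.dm_db_xtp_hash_index_stats DMV, which should report 65,536:

-- Hypothetical table to demonstrate bucket count rounding
CREATE TABLE BucketDemo
(
   [OrderID] int not null
      PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 50000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- total_bucket_count should show 65536, the next power of two
SELECT i.name AS 'index name', hs.total_bucket_count
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
   ON hs.object_id = i.object_id AND hs.index_id = i.index_id
WHERE hs.object_id = object_id('BucketDemo');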
Hash indexes are optimized for index seeks on equality predicates and also support full index scans. Also, keep in mind that the hash function actually used is based on ALL the key columns, not a subset. For example, if you have a hash index on the columns lastname and firstname, a row with the values “Smith” and “John” will likely hash to a different bucket than a row with the values “Smith” and “Josh”. A query that supplies just a lastname value, or one with an inexact firstname value (such as “Jo%”), will not be able to use the hash index to find the matching rows.
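The following sketch illustrates this limitation (the table, index, and data values are hypothetical). Only the first query supplies equality predicates on all the hash index key columns and can perform a hash lookup; the other two will instead be resolved with a scan:

CREATE TABLE Customer
(
   [CustID] int IDENTITY
      PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
   [LastName] nvarchar(32) COLLATE Latin1_General_100_BIN2 not null,
   [FirstName] nvarchar(32) COLLATE Latin1_General_100_BIN2 not null,
   INDEX IDX_Customer_Name
      NONCLUSTERED HASH (LastName, FirstName) WITH (BUCKET_COUNT = 100000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- Hash lookup possible: equality predicates on all index key columns
SELECT * FROM Customer WHERE LastName = N'Smith' AND FirstName = N'John';
-- No hash lookup: only a subset of the index key is supplied
SELECT * FROM Customer WHERE LastName = N'Smith';
-- No hash lookup: inexact (range) predicate on a key column
SELECT * FROM Customer WHERE LastName = N'Smith' AND FirstName LIKE N'Jo%';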
If you have no idea of the number of buckets you’ll need for a particular column, or for columns that will participate in searches over a range of values, you may want to consider creating a memory-optimized nonclustered index instead of a hash index.
Memory-Optimized Nonclustered Indexes
Memory-optimized nonclustered indexes are implemented using a new SQL Server data structure called a Bw-tree which is a lock- and latch-free variation of a B-tree index used for disk-based nonclustered indexes.
The general structure of a Bw-tree nonclustered index is similar to SQL Server’s regular B-trees, but the index pages are not a fixed size, and once built they cannot be modified. The maximum size of an index page is 8KB. Bw-tree pages are similar to regular B-tree pages in that each index page contains a set of ordered key values, and for each value there is a corresponding pointer. However, unlike regular B-tree indexes, where each index row stores the minimum value on the page it points to at the next level down, the key value stored in a Bw-tree index row is the highest value possible on the page at the next level down. (For a refresher on nonclustered B-tree index structures in SQL Server, see Chapter 31.)
The upper levels of the index, the root and intermediate pages, are referred to as the internal pages and these pages contain pointers which point to an index page at the next level of the tree. However, unlike a SQL Server B-tree index, the page pointer is a logical page ID (PID) instead of a physical page number. The PID represents a position in a mapping table, which links each PID with a physical memory address where the index page resides. Also, unlike a B-tree index, the pages at the same level are not linked in a doubly linked list: each index page knows the PID of the next page at the same level but not the PID of the previous page. Linking to the next page at the same level is done by looking up the PID in the mapping table.
At the leaf level, the pointers point to the actual chain of data rows with the same index key values. Multiple data rows with the same index key are linked together like In-Memory OLTP hash indexes, via a pointer in the index pointer array. Figure 33.6 shows a simplified example of a memory-optimized nonclustered index.
Figure 33.6 A diagram of memory-optimized nonclustered indexes.
Only a subset of the PID values are represented in Figure 33.6. The index pages are showing the key values that the index references. Each index row in the internal index pages contains a key value and the corresponding PID of the page at the next level down which contains that value as the highest ordered key value on that page. For example, the index key value ”New York” on the root page has PID of 2, which maps to the index page with ”New York” as the highest ordered key value. The leaf level index pages also contain index key values, but instead of a PID, they hold the memory address of a data row, which could be the first data row in a chain of data rows.
When modifications occur to an internal index page, rather than updating the index page, it is replaced with a new page and the mapping table is updated so that the PID now points to the new physical memory address of the index page. This avoids the need to change internal pages with references to obsolete pages.
At the leaf level, data changes are tracked using a set of delta records; the leaf pages themselves are not replaced for every change. The delta records create a chain of memory pointers, with the last pointer referencing the actual leaf index page. Each update to a page, which can be an insert or delete of a key value on that page, produces a page containing a delta record indicating the change that was made. An update is represented by two new delta records: one for the delete of the original value and one for the insert of the new value. When each delta record is added, the mapping table is updated with the physical address of the page containing the newly added delta record.
Figure 33.7 illustrates this behavior. The mapping table shows only a single page, with logical page ID 4. The physical address in the mapping table originally was the memory address of the corresponding leaf-level index page, shown as PID 4. After an update of R1, two delta records are added to PID 4 to reflect the delete followed by the insert, and the physical address in the mapping table is updated to point to the Delta: Insert R1 record. Then record R2 is deleted; another delta record is added for this delete, and the mapping table is updated to point to this delta record. Finally, a new record R3 is added, a delta record is created to reflect this, and the mapping table is again updated to point to this delta record.
Figure 33.7 A single index leaf page and associated delta records.
When searching the nonclustered range index, the In-Memory OLTP engine must combine the delta records with the base index leaf page which makes the search operation a bit more expensive, but not having to completely replace the leaf page for every modification provides an overall performance savings. However, the delta pages are not kept around forever. Eventually, as discussed later in this chapter, the original page and the chain of delta records are consolidated into a new base page.
Memory-Optimized Nonclustered Index Page Structure
As mentioned previously, In-Memory OLTP nonclustered index pages are not a fixed size as they are for indexes on disk-based tables, although the maximum index page size is still 8KB. Memory-optimized nonclustered index pages all have a header area which contains the following information:
- PID—The pointer into the mapping table
- Page Type—leaf, internal, delta, or special
- Right PID—The PID of the next page to the right of the current page
- Height—The number of levels from the current page to the leaf
- The number of index key values stored on the page
- Delta records statistics—The number of delta records plus the space used by the delta key values
- Max Key—The upper limit of the index key values contained on the page
In addition, both leaf and internal pages contain two or three fixed length arrays:
- Values—A pointer array with entries that are each 8 bytes long. For internal pages, the entry contains the PID of the page at the next level down. For a leaf page, the entry contains the memory address of the first row in a chain of rows having equal key values.
- Keys—This is the array of the key values. For internal pages, the key represents the first value on the page referenced by the PID. For leaf pages, the key is the value in the chain of rows.
- Offsets—This array exists only for indexes with variable length keys. Each entry is 2 bytes and contains the offset where the corresponding key starts in the key array on the page.
The smallest pages in a memory-optimized nonclustered index are typically the delta pages. A delta page has a header containing most of the same information as an internal or leaf page, but it doesn’t include the arrays described for leaf or internal pages. A delta page contains only an operation code (insert or delete), a single key value, and the memory address of the first row in a chain of records.
Internal Maintenance of Memory-Optimized Nonclustered Indexes
As discussed in the previous section, a long chain of delta records can eventually degrade search performance, so eventually the delta records need to be consolidated. When the In-Memory OLTP engine attempts to add a new delta record to a chain that already has 16 elements, the changes in the delta records will be consolidated into the referenced index page, and the page will then be rebuilt, including the changes indicated by the new delta record that triggered the consolidation. The newly rebuilt page will have the same PID but a new memory address, and the page mapping table is updated with this new address. The old pages (index page plus delta pages) will be marked for garbage collection.
Two other type of index maintenance operations take place in addition to consolidation of delta records: split and merge.
Index pages grow as needed, starting from storing a single row up to a maximum of 8KB. Once an index page grows to 8KB in size, an attempt to insert a new row into that page will trigger a page split. For internal pages, this occurs when there is no more room to add another key value and pointer. For leaf pages, it occurs when SQL Server determines that there is insufficient space on the leaf index page to accommodate the new index value once the delta records are consolidated. The space needed is determined from the delta records statistics information in the page header of the leaf page, which is adjusted as each new delta record is added.
A split operation is done in two atomic steps. In the first step, SQL Server creates two new leaf-level pages (let’s call them P1 and P2) and splits the old page values between them. It then adds an entry to the mapping table for the second newly created page (P2), adds the logical pointer to P2 to page P1, repoints the mapping table to the first newly created page (P1), and marks the old page and the delta records for garbage collection. This is all done as a single atomic operation, and until it completes, P1 and P2 are not accessible to any concurrent operations. Also, at this point, P2 is only accessible via the logical link from P1; it has no entry in, and is not accessible via, a parent internal page.
In the second step, SQL Server creates a new internal page and populates it with key values that represent the new leaf-level pages. When the new internal page is built, the pointer in the mapping table is updated to point to the new page and the old internal page is marked for garbage collection.
Figure 33.8 illustrates both steps in a page split operation.
Figure 33.8 Page splitting of a memory-optimized nonclustered index.
The other process, page merging, occurs when a delete operation leaves an index page with records that use less than 10% of the maximum page size (currently 8KB), or results in an index page with a single row on it. When either of these two conditions occurs, the page will be merged with its neighboring page. Like page splitting, this is also a multi-step operation. When a row is deleted from a page, a delta record for the delete is added, and a check is made to determine whether the page qualifies for a merge (i.e., only one row remains, or the remaining space after deleting the row will be less than 10% of the maximum page size).
If the page qualifies for a merge, the merge is performed in three atomic steps. In the first step, a delete delta record is created with its pointer set to point to the leaf index page, and a special merge-delta record is created that points to the delete delta record. In one atomic step, the pointer in the page mapping table is updated to point to the merge-delta page. The results of this step are illustrated in Figure 33.9.
Figure 33.9 Step 1 of a page merge operation.
In the second step, a new internal page that does not reference the page to be merged is created and the mapping table is updated to point to the newly created internal page. The old internal page is marked for garbage collection. The results of step 2 are illustrated in Figure 33.10.
Figure 33.10 Step 2 of a page merge operation.
In the third and final step, a new leaf level page is created containing the merged contents of the two original leaf pages. The mapping table is updated to point to this new leaf page and the old pages and delta records are marked for garbage collection. The results of step 3 are illustrated in Figure 33.11.
Figure 33.11 Step 3 of a page merge operation.
Garbage Collection
Because SQL Server’s In-Memory OLTP is a row-versioning system, data modifications generate new versions of rows rather than updating row data in place. Eventually, older row versions become stale. As described earlier, every row has a Begin-Ts and End-Ts timestamp that indicate when the row was created and when it was deleted. Transactions only see the versions of rows that were valid at the time the transaction started; that is, the Global Transaction Timestamp value at the time the transaction started is between the Begin-Ts and End-Ts of the row. When the End-Ts of a row is older than the Global Transaction Timestamp of the oldest active transaction in the system, the row is considered stale. Since stale rows are invisible to any active transactions, they serve no purpose anymore; they only slow down scans of index structures and waste memory. Garbage collection is the process that cleans up these stale rows and reclaims memory.
The garbage collection system is designed to be non-blocking, cooperative, efficient, responsive, and scalable. Although there is a dedicated system thread for the garbage collection process, user threads actually do most of the work (hence the cooperative aspect of the design). If a user thread is scanning an index and comes across a stale row version, it will unlink that version from the current chain, adjust the pointers, and decrement the reference counter (idxLinkCount) in the row header. However, the user transaction does not deallocate the row. When the user thread completes its transaction, it adds information about the transaction (which rows it created or deleted) to a queue of transactions to be processed by the garbage collection process.
The garbage collector thread periodically (approximately once per minute) goes through the queue and analyzes stale rows to build work items, collections of rows that need to be deallocated. These work items are inserted into other queues partitioned on a logical CPU basis and user threads (and sometimes the system garbage collector) pick up one or more work items from a queue and deallocate the rows, freeing up the memory used by the rows.
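You can observe garbage collection activity through a pair of DMVs. As a minimal sketch (selecting all columns, since the exact column lists are best confirmed against your release), sys.dm_xtp_gc_stats reports the overall outcome of the garbage collection process, and sys.dm_xtp_gc_queue_stats shows the per-CPU work item queues:

-- Overall outcome of the garbage collection process
SELECT * FROM sys.dm_xtp_gc_stats;
-- Work item queues, partitioned per logical CPU
SELECT * FROM sys.dm_xtp_gc_queue_stats;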
The DMV sys.dm_db_xtp_index_stats returns a row for each index on each memory-optimized table, and the column rows_expired indicates how many rows have been detected as being stale during scans of that index. Another column, rows_expired_removed, indicates how many rows have been unlinked from the index. As mentioned previously, once rows have been unlinked from all indexes on a table, they can be removed by the garbage collection thread. The following is an example of a query using the sys.dm_db_xtp_index_stats DMV (the query joins the DMV to the sys.indexes catalog view to be able to return the index names):
SELECT name AS 'index_name',
       s.index_id,
       scans_started,
       rows_returned,
       rows_expired,
       rows_expired_removed
FROM sys.dm_db_xtp_index_stats s
JOIN sys.indexes i
   ON s.object_id = i.object_id
   AND s.index_id = i.index_id
WHERE object_id('InMemTab') = s.object_id;
Maintaining Statistics on Memory-Optimized Tables
SQL Server creates index- and column-level statistics on memory-optimized tables, but in the current release, there is no mechanism in place to update the statistics automatically. The main reason is that SQL Server In-Memory OLTP does not keep any row modification counters, so it has no way to determine when statistics should be auto-updated. Another reason is to eliminate any chance of dependency failures caused by waiting for statistics to be updated.
The problem with the lack of auto-updating of statistics is that indexes can only be created at the time the memory-optimized table is created, so the initial statistics are created when the table is empty. Since they are never updated automatically, it is up to you (or the database admins) to design and implement a statistics maintenance strategy for your system.
The general recommendation is to update the statistics after loading data into the table or whenever SQL Server or the database restarts. If the data in the memory-optimized table is volatile, you should develop a strategy that updates the statistics on a regular basis.
Statistics are updated on memory-optimized tables using the familiar UPDATE STATISTICS command, with one minor variation. When running UPDATE STATISTICS on memory-optimized tables, SQL Server always performs a full scan, unlike for disk-based tables where it samples the data by default.
You can also use the sp_updatestats stored procedure to update all statistics in the database. Unlike for disk-based tables, the sp_updatestats stored procedure always updates all statistics on memory-optimized tables. For disk-based tables, it will skip statistics that do not need to be updated.
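For example, a statistics maintenance step for the InMemTab table created earlier might look like the following sketch (note that UPDATE STATISTICS on a memory-optimized table must, to the best of our knowledge of SQL Server 2014, be run with the FULLSCAN or RESAMPLE option together with NORECOMPUTE):

-- Update statistics for a single memory-optimized table
-- (memory-optimized tables require FULLSCAN or RESAMPLE plus NORECOMPUTE)
UPDATE STATISTICS dbo.InMemTab WITH FULLSCAN, NORECOMPUTE;
GO
-- Or update statistics for all tables in the database
EXEC sp_updatestats;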
Memory-Optimized Index Design Guidelines
The goal of optimizing natively compiled stored procedures is similar to the goal of optimizing interpreted procedures: to find query plans for each of the statements in the procedure so that they can be executed as efficiently as possible. The formula the optimizer uses for assessing the relative cost of operations on memory-optimized tables is similar to the costing formula for operations on disk-based tables, with a few exceptions. Because of the differences in the way memory-optimized tables are organized and managed, and the different types of indexes available, the optimizer needs to be aware of the different choices it may need to make, as well as of various execution plan options that are not available when working with memory-optimized tables.
Some of the limitations pertain to the access paths available to the in-memory engine during query optimization with respect to its use of hash and range indexes. If the optimizer finds no useful hash or range index, it will essentially perform a table scan, although the concept of a table scan is somewhat misleading since, with memory-optimized tables, all data access is through indexes: the optimizer will choose one of the indexes through which all rows will be retrieved. Typically, the optimizer will choose a hash index over a range index if the cost estimations are the same.
If the query is performing a search of a range of values, or requires that rows be returned in a sorted order, a hash index will not be used as ordered scans cannot be performed against a hash index. Also, the optimizer cannot use a hash index unless the query provides search conditions for all columns contained in the index key. For example, if you have a hash index on two columns (city, state), a row for a customer from Albany, NY would hash to a completely different value than a row for a customer from Albany, GA. If a query only supplies a search condition on the city column, a hash value cannot be generated to search against the hash values stored in the index. For similar reasons, a hash index can only be used if the filter is based on an equality. If the query does not specify an exact value for one of the columns in the hash index key, the hash value cannot be determined. For a query with a condition of city LIKE ‘San%’, which is essentially a range search, a hash lookup is not possible. A hash index will revert to a scan given an inequality predicate or if not all index keys are provided as search arguments.
Memory-optimized nonclustered indexes are better at retrieving ranges of values. However, they only support retrieving the table rows in the order that was specified when the index was created. Range indexes cannot be scanned in reverse order because, as shown previously, there is no concept of “previous pointers” in a range index on a memory-optimized table. For data to be searched or sorted in reverse order, the index would have to be created as a descending index. If reverse order scans or sorting of results in descending order are needed, it is possible to have two indexes on the same column, one defined as ascending and one defined as descending. It is also possible to have both a range and a hash index on the same column to support both equality and range searches.
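The following sketch (table and index names are hypothetical) shows how both scan directions can be supported by defining an ascending and a descending nonclustered index on the same column:

CREATE TABLE OrderLog
(
   [OrderID] int IDENTITY
      PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
   [OrderDate] datetime2 not null,
   -- Supports ordered scans in ascending OrderDate order
   INDEX IDX_OrderLog_Date_Asc NONCLUSTERED (OrderDate ASC),
   -- Supports ordered scans in descending OrderDate order
   INDEX IDX_OrderLog_Date_Desc NONCLUSTERED (OrderDate DESC)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);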
When a column in a nonclustered index has the same value in many rows (i.e., there are a large number of duplicate values in the index key columns), performance can degrade for updates, inserts, and deletes. One way to improve performance in this situation is to add another, more distinctive column to the nonclustered index key.
Do not create indexes on your memory-optimized tables that will be rarely used. Garbage collection works best if all indexes on the table are used frequently. Rarely-used indexes may cause the garbage collection system to not perform optimally for old row versions.
Determining the Optimal Bucket Count for Hash Indexes
You must specify a value for the BUCKET_COUNT parameter when you create the memory-optimized table. If you cannot determine the correct bucket count, use a nonclustered index instead. An incorrect BUCKET_COUNT value, especially one that is too low, can significantly impact workload performance, as well as recovery time of the database. It is better to overestimate the bucket count.
In most cases the bucket count should be between 1 and 2 times the number of distinct values in the index key. If the index key contains a lot of duplicate values (for example, if on average there are more than 10 rows for each index key value), use a nonclustered index instead.
You may not always be able to predict how many values a particular index key may have or will have. Performance should be acceptable if the BUCKET_COUNT value is within 5 times the actual number of key values.
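If you are migrating an existing disk-based table, a simple count of the distinct key values gives a reasonable starting point for BUCKET_COUNT (a sketch, using the hypothetical table and column from the earlier examples):

-- Estimate the number of distinct index key values for BUCKET_COUNT sizing
SELECT COUNT(DISTINCT [Name]) AS distinct_name_values
FROM InMemTab;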
You can use the sys.dm_db_xtp_hash_index_stats DMV to obtain statistics about the number of empty buckets and the length of row chains in a hash index. The following example query can be used to obtain statistics about all the hash indexes in the current database (note, this query can take several minutes to run if there are large tables in the database):
SELECT object_name(hs.object_id) AS 'object name',
       i.name AS 'index name',
       hs.total_bucket_count,
       hs.empty_bucket_count,
       floor((cast(empty_bucket_count as float) / total_bucket_count) * 100)
          AS 'empty_bucket_percent',
       hs.avg_chain_length,
       hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
   ON hs.object_id = i.object_id
   AND hs.index_id = i.index_id;
The key values to focus on to evaluate your hash bucket configuration are empty_bucket_percent and avg_chain_length.
The empty_bucket_percent column indicates the percentage of empty buckets in the hash index. If empty_bucket_percent is less than 10 percent, the bucket count is likely too low. Ideally, empty_bucket_percent should be 33 percent or greater.
The avg_chain_length column indicates the average length of the row chains in the hash buckets. An average chain length of 1 is ideal. If avg_chain_length is greater than 10 and empty_bucket_percent is greater than 10 percent, there likely are many duplicate index key values. A nonclustered index on that index key would likely be more appropriate.