SNAPSHOT Isolation
SQL Server changes the state of a database by performing a transaction on it. Each transaction is a unit of work consisting of one or more steps. A "perfect" transaction is ACID, meaning it is atomic, consistent, isolated, and durable. In short, this means that the result of performing two transactions on a database, even if they are performed simultaneously by interleaving some of the steps that make them up, will not corrupt the database.
Atomic means that a transaction will perform all of its steps or fail and perform none of its steps. Consistent means that the transaction must not leave the results of a partial calculation in the database; for example, if a transaction is to move money from one account to another, it must not terminate after having subtracted money from one account but not having added it to another. Isolated means that none of the changes a transaction makes to a database become visible to other transactions until the transaction making the changes completes, and then they all appear simultaneously. Durable means that changes made to the database by a transaction that completes are permanent, typically by being written to a medium like a disk.
A transaction need not always be perfect. The isolation level of a transaction determines how close to perfect it is. Prior to SQL Server 2005, SQL Server provided four levels of isolation: READ UNCOMMITTED, REPEATABLE READ, READ COMMITTED, and SERIALIZABLE.
A SERIALIZABLE transaction is a perfect transaction. Functionally, a database could always use SERIALIZABLEthat is, perfect transactions, but doing so would typically adversely affect performance. Judicious use of isolation levels other than SERIALIZABLE, when analysis of an application shows that it does not require perfect transactions, will improve performance in these cases.
SQL Server uses the isolation level of a transaction to control concurrent access to data through a set of read and write locks. It applies these locks pessimistically; that is, they physically prevent any access to data that might compromise the required isolation level. In some cases, this will delay a transaction as it waits for a lock to be freed, or may even cause it to fail because of a timeout waiting for the lock.
SQL Server 2005 adds SNAPSHOT isolation that, in effect, provides alternate implementations of SERIALIZABLE and READ COMMITTED levels of isolation that use optimistic locking to control concurrent access rather than pessimistic locking. For some applications, SNAPSHOT isolation may provide better performance than preSQL Server 2005 implementations did. In addition, SNAPSHOT isolation makes it much easier to port database applications to SQL Server from database engines that make extensive use of SNAPSHOT isolation.
SQL Server 2005 has two kinds of SNAPSHOT isolation: transaction-level and statement level. Transaction-level SNAPSHOT isolation makes transactions perfect, the same as SERIALIZABLE does. Statement-level SNAPSHOT isolation makes transactions that have the same degree of isolation as READ COMMITTED does.
The transaction-level SNAPSHOT isolation optimistically assumes that if a transaction operates on an image of that database's committed data when the transaction started, the result will be the same as a transaction run at the SERIALIZABLE isolation level. Some time before the transaction completes, the optimistic assumption is tested, and if it proves not to be true, the transaction is rolled back.
Transaction-level SNAPSHOT isolation works by, in effect, making a version of the database by taking a snapshot of it when a transaction starts. Figure 7-1 shows this.
There are three transactions in Figure 7-1: transaction 1, transaction 2, and transaction 3. When transaction 1 starts, it is given a snapshot of the initial database. Transaction 2 starts before transaction 1 finishes, so it is also given a snapshot of the initial database. Transaction 3 starts after transaction 1 finishes but before transaction 2 does. Transaction 3 is given a snapshot of the initial database plus all the changes committed by transaction 1.
Figure 7-1: Snapshot Versioning
The result of using SERIALIZABLE or transaction-level SNAPSHOT isolation is the same; some transactions will fail and have to be retried, and may fail again, but the integrity of the database is always guaranteed.
Of course, SQL Server can't actually make a snapshot of the entire database, but it gets that effect by keeping track of each change to the database until all transactions that were started before the change was made are completed. This technique is called row versioning.
The row versioning model is built upon having multiple copies of the data. When reading data, the read happens against the copy, and no locks are held. When writing the data, the write happens against the "real" data, and it is protected with a write lock. For example, in a system implementing row versioning, user A starts a transaction and updates a column in a row. Before the transaction is committed, user B wants to read the same column in the same row. He is allowed to do the read but will read an older value. This is not the value that A is in the process of updating to, but the value A is updating from.
In statement-level SNAPSHOT isolation, the reader always reads the last committed value of a given row, just as READ COMMITTED does in a versioning database. Let's say we have a single-row table (called tab) with two columns: ID and name. Table 7-1 shows a versioning database at READ COMMITTED isolation.
Table 7-1: Versioning Database at READ COMMITTED Isolation
Step |
User 1 |
User 2 |
1 |
BEGIN TRAN SELECT name FROM tab WHERE id = 1 **value is 'Name' |
|
2 |
|
BEGIN TRAN -UPDATE tab SET name = WHERE id = 1 |
3 |
SELECT name FROM tab WHERE id = 1 **value is 'Name' |
|
4 |
|
COMMIT |
5 |
SELECT name FROM tab WHERE id = 1 **value is 'NewName' |
|
6 |
COMMIT |
|
7 |
SELECT name FROM tab WHERE id = 1 **value is 'NewName' |
|
The other transaction isolation level in a versioning database, SERIALIZABLE, is always implemented by the behavior that the reader always reads the row as of the beginning of the transaction, regardless of whether other users' changes are committed during the duration of the transaction or not. This was shown qualitatively in Figure 7-1. Table 7-2 shows a specific example of how two transactions interoperate when the SERIALIZABLE level of SNAPSHOT isolation is used.
The difference between this table and Table 7-1 occurs at step 5. Even though user 2 has updated a row and committed the update, user 1, using the SERIALIZABLE transaction isolation level, does not "see" the next value until user 1 commits his transaction. He sees the new value only in step 7. In SQL Server this is called "transaction-level SNAPSHOT isolation."
Table 7-2: Versioning Database at SERIALIZABLE Isolation
Step |
User 1 |
User 2 |
1 |
BEGIN TRAN SELECT name FROM tab WHERE id = 1 **value is 'Name' |
|
2 |
|
BEGIN TRAN -UPDATE tab SET name = 'Newname' WHERE id = 1 |
3 |
SELECT name FROM tab WHERE id = 1 **value is 'Name' |
|
4 |
|
COMMIT |
5 |
SELECT name FROM tab WHERE id = 1 **value is 'Name' |
|
6 |
COMMIT |
|
7 |
SELECT name FROM tab WHERE id = 1 **value is 'NewName' |
|
Both statement- and transaction-level SNAPSHOT isolation require that SNAPSHOT be enabled by using the SNAPSHOT isolation option of the ALTER DATABASE command. The following SQL batch does this for the pubs database.
ALTER DATABASE pubs SET ALLOW_SNAPSHOT_ISOLATION ON
SNAPSHOT isolation can be turned on or off as needed.
Once SNAPSHOT isolation has been enabled, transaction-level isolation is used by specifically setting the transaction isolation level to SNAPSHOT. The following SQL batch does this.
ALTER DATABASE pubs SET ALLOW_SNAPSHOT_ISOLATION ON GO USE pubs GO SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRANS SQL Expressions COMMIT TRANS
The SQL expression in the preceding batch will be executed, in effect, against a snapshot of the database that was taken when BEGIN TRANS was executed.
Statement-level SNAPSHOT isolation requires the use of an additional database option, READ_COMMITTED_SNAPSHOT. If this database option and ALLOW_ SNAPSHOT_ISOLATION are ON, all transactions done at the READ UNCOMMITTED or READ COMMITTED levels will be executed as READ COMMITTEDlevel transactions using versioning instead of locking. Both transactions shown in the SQL batch that follows will be executed as READ COMMITTED using versioning.
alter the database ALTER DATABASE pubs SET ALLOW_SNAPSHOT_ISOLATION ON SET READ_COMMITTED_SNAPSHOT ON GO USE pubs GO SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED BEGIN TRAN SQL expression will be executed as READ COMMITTED using versioning END TRAN SET TRANSACTION ISOLATION LEVEL READ COMMITTED BEGIN TRAN SQL expression will be executed as READ COMMITTED using versioning END TRAN
Whether ALLOW_SNAPSHOT_ISOLATION is ON or not can be checked for a particular database by the DATABASEPROPERTYEX command. This command returns the current database option or setting for a particular database. The setting to check is the SnapshotIsolationFramework setting, as in following code for the pubs database:
SELECT DATABASEPROPERTYEX ('pubs', 'SnapshotIsolationFramework')
Table 7-3: Example of SNAPSHOT Isolation
Step |
User 1 |
User 2 |
1 |
SET TRANSACTION ISOLATION BEGIN TRAN UPDATE snapTest SET col1 = 'NewNiels' WHERE id = 1 |
|
2 |
|
-SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN SELECT col1 FROM snapTest WHERE id = 1 ** receives value 'Niels' |
3 |
COMMIT TRAN |
|
4 |
|
SELECT col1 FROM snapTest WHERE id = 1 ** receives value 'Niels' |
5 |
|
COMMIT TRAN |
6 |
|
SELECT col1 FROM snapTest WHERE id = 1 -** receives value 'NewNiels' |
As stated earlier, SQL Server does not actually make a copy of a database when a SNAPSHOT transaction is started. Whenever a record is updated, SQL Server stores in TEMPDB a copy (version) of the previously committed value and maintains these changes. All the versions of a record are marked with a timestamp of the transactions that made the change, and the versions are chained in TEMPDB using a linked list. The newest record value is stored in a database page and linked to the version store in TEMPDB. For read access in a SNAPSHOT isolation transaction, SQL Server first accesses from the data page the last committed record. It then retrieves the record value from the version store by traversing the chain of pointers to the specific record version of the data.
The code in Table 7-3 shows an example of how SNAPSHOT isolation works. The example uses a table, snapTest, looking like this.
it is necessary to run SET ALLOW_SNAPSHOT_ISOLATION ON if that's not done already CREATE TABLE snapTest ([id] INT IDENTITY, col1 VARCHAR(15)) insert some data INSERT INTO snapTest VALUES(1,'Niels')
The steps in Table 7-3 do the following:
We start a transaction under SNAPSHOT isolation and update one column in one row. This causes SQL Server to store a copy of the original value in TEMPDB. Notice that we do not commit or roll back at this stage, so locks are held. If we were to run sp_lock, we would see an exclusive lock on the primary key.
We start a new transaction under a new session and try to read from the same row that is being updated at the moment. This is the row with an exclusive lock. If this had been previous versions of SQL Server (running under at least READ COMMITTED), we would be locked out. However, running in SNAPSHOT mode, SQL Server looks in the version store in TEMPDB to retrieve the latest committed value and returns "Niels".
We commit the transaction, so the value is updated in the database and another version is put into the version store.
User 2 does a new SELECT (from within his original transaction) and will now receive the original value, "Niels".
User 2 finally commits the transaction.
User 2 does a new SELECT (after his transaction commits) and will now receive the new value, "NewNiels".
SNAPSHOT isolation is useful for converting an application written for a versioning database to SQL Server. When an application is developed for a versioning database, the developer does not need to be concerned with locking. Converting such an application to SQL Server may result in diminished performance because more locking is done than is required. Prior to SQL Server 2005, this sort of conversion may have required rewriting the application. In version 2005, in many cases the only thing that will have to be done is to enable SNAPSHOT isolation and READ_COMMITTED_SNAPSHOT.
SNAPSHOT isolation is also beneficial for applications that mostly read and do few updates. It is also interesting to note that when SQL Server 2005 is installed, versioning is enabled in the MASTER and MSDB databases by default.
Drawbacks of Versioning
Versioning has the capability to increase concurrency but does come with a few drawbacks of its own. Before you write new applications to use versioning, you should be aware of these drawbacks. You can then assess the value of locking against the convenience of versioning.
It can be costly because record versions need to be maintained even if no read operations are executing. This has the capability of filling up TEMPDB. If a database is set up for versioning, versions are kept in TEMPDB whether or not anyone is running a SNAPSHOT isolationlevel transaction. Although a "garbage collector" algorithm will analyze the older versioning transaction and clean up TEMPDB eventually, you have no control over how often that cleanup in done. Plan the size of TEMPDB accordingly; it is used to keep versions for all databases with SNAPSHOT enabled. If you run out of space in TEMPDB, long-running transactions may fail.
In addition, reading data will sometimes cost more because of the need to traverse the version list. If you are doing versioning at the READ COMMITTED isolation level, the database may have to start at the beginning of the version list and read through it to attempt to read the last committed version.
There is also the possibility of update concurrency problems. Let's suppose that in Table 7-1 user 1 decides to update the row also. Table 7-4 shows how this would look.
Table 7-4: Versioning Database at SERIALIZABLE IsolationConcurrent Updates
Step |
User 1 |
User 2 |
1 |
BEGIN TRAN SELECT name FROM tab WHERE id = 1 **value is 'Name' |
|
2 |
|
BEGIN TRAN -UPDATE tab SET name = 'Newname' WHERE id = 1 |
3 |
|
COMMIT |
4 |
UPDATE tab SET name = WHERE id = 1 ** produces concurrency violation |
|
5 |
ROLLBACK (and try update again?) |
|
In this scenario, user 1 reads the value "Name" and may base his update on that value. If user 2 commits his transaction before user 1 commits his, and user 1 tries to update, he bases his update on possibly bad data (the old value he read in step 1). Rather than allowing this to happen, versioning databases produce an error. The error message in this case is as follows:
Msg 3960, Level 16, State 1, Line 1. Cannot use snapshot isolation to access table 'tab' in database 'pubs'. Snapshot transaction aborted due to update conflict. Retry transaction.
Obviously, retrying transactions often enough will slow down the overall throughput of the application. In addition, the window of time for a concurrency violation to occur increases the longer a transaction reads old values. Because, at the SERIALIZABLE isolation level, the user always reads the old value until he commits the transaction, the window is much biggerthat is, concurrency violations are statistically much more likely to occur. In fact, vendors of versioning databases recommend against using SERIALIZABLE isolation (SQL Server ISOLATION LEVEL SNAPSHOT) in most cases. READ COMMITTED is a better choice with versioning.
Finally, as we said before, in versioning databases reads don't lock writes, which might be what we want. Is this possible with a versioning database? Locking-database programmers, when using versioning, tend to lock too little, introducing subtle concurrency problems. In a versioning database, there must be a way to do insist on a lock on read. Ordinarily this is done by doing a SQL SELECT FOR UPDATE. But SQL Server does not support SELECT FOR UPDATE with the appropriate semantic. There is, however, a solution. Even when READ_COMMITTED_SNAPSHOT is on, you can ensure a read lock by using SQL Server's REPEATABLE READ isolation level, which never does versioning. The SQL Server equivalent of ANSI's SELECT FOR UPDATE is SELECT with (REPEATABLEREAD). Note that this is different from the SQL Server UPDLOCK (update lock), which is a special lock that has similar semantics but only works if all participants in all transactions are using UPDLOCK. This is one place where programs written for versioning databases may have to change their code in porting to SQL Server 2005.
Monitoring Versioning
Allowing versioning to achieve concurrency is a major change. We've already seen how it can affect monitoring and capacity planning for TEMPDB. Therefore, all the tools and techniques that we've used in the past must be updated to account for this new concurrency style. Here are some of the enhancements that make this possible.
There are the following new T-SQL properties and metadata views:
DATABASEPROPERTYEXTells us if SNAPSHOT is on
sys.fn_top_version_generators()Tables with most versions
sys.fn_transaction_snapshot()Transaction active when a SNAPSHOT transaction starts
sys.fn_transactions()Includes information about SNAPSHOT transaction (or not), if SNAPSHOT includes information about version chains and SNAPSHOT timestamps
There are new performance monitor counters for the following:
Average version store data-generation rate (kilobytes per minute)
Size of current version store (kilobytes)
Free space in TEMPDB (kilobytes)
Space used in the version store for each database (kilobytes)
Longest running time in any SNAPSHOT transaction (seconds)
SNAPSHOT isolation information is also available during event tracing. Because a SNAPSHOT transaction has to be aware of any updates committed by other users, other users' updates appear in SQL Profiler while tracing a SNAPSHOT isolation transaction. Beware, since this can significantly increase the amount of data collected by Profiler.