Transaction Logging
So just what is transaction logging and how does it work? Transaction logging is simply a process that is used to keep track of changes made to a database (by transactions), as they are made. Each time an update or a delete operation is performed, the page containing the record to be updated/deleted is retrieved from storage and copied to the appropriate buffer pool, where it is then modified by the update/delete operation (if a new record is created by an insert operation, that record is created directly in the appropriate buffer pool). Once the record has been modified (or inserted), a record reflecting the modification/insertion is written to the log buffer, which is simply another designated storage area in memory. (The actual amount of memory that is reserved for the log buffer is controlled by the logbufsiz database configuration parameter.) If an insert operation is performed, a record containing the new row is written to the log buffer; if a delete operation is performed, a record containing the row's original values is written to the log buffer; and if an update operation is performed, a record containing the row's original values, combined with the row's new values, is written to the log buffer. These kinds of records, along with records that indicate whether the transactions that were responsible for making changes were committed or rolled back, make up the majority of the records stored in the log buffer.
Whenever buffer pool I/O page cleaners are activated, the log buffer becomes full, or a transaction is terminated (by being committed or rolled back), all records stored in the log buffer are immediately written to one or more log files stored on disk. (This is done to minimize the number of log records that might get lost in the event a system failure occurs.) This process is referred to as write-ahead logging and it ensures that log records are always flushed to log files before data changes are recorded in the database (i.e., copied to the appropriate tablespace containers for permanent storage). Eventually, all changes made in the buffer pool are recorded in the database, but only after the corresponding log records have been externalized to one or more log files. The modified data pages themselves remain in memory, where they can be quickly accessed if necessary; eventually they will be overwritten. The transaction logging process is illustrated in Figure 7-4.
Figure 7-4. The transaction logging process.
Because multiple transactions may be working with a database at any given point in time, a single log file may contain log records that belong to several different transactions. Therefore, to keep track of which log records belong to which transactions, every log record is assigned a special "transaction identifier" that ties it to the transaction that created it. By using transaction IDs, log records associated with a particular transaction can be written to one or more log files at any time, without impacting data consistency. Eventually, the execution of the COMMIT or ROLLBACK statement that terminates the transaction will be logged as well.
Since log records are externalized frequently and since changes made by a particular transaction are only externalized to the database after all log records associated with the transaction have been recorded in one or more log files, the ability to return a database to a consistent state after a failure occurs is guaranteed. When the database is restarted, log records are analyzed and each record that has a corresponding COMMIT record is reapplied to the database; every record that does not have a corresponding COMMIT record is either ignored or backed out (which is why "before" and "after" information is recorded for all update operations).
Logging Strategies
When a database is first created, three log files, known as primary log files, are allocated as part of the creation process. On Linux and UNIX platforms, these log files are 1,000 4K (kilobyte) pages in size; on Windows platforms, these log files are 250 4K pages in size. However, the number of primary log files used, along with the amount of data each is capable of holding, is controlled by the logprimary and logfilsiz parameters in the database's configuration file. The way in which all primary log files created are used is determined by the logging strategy chosen for the database. Two very different strategies, known as circular logging and archival logging, are available.
Circular Logging
When circular logging is used, records stored in the log buffer are written to primary log files in a circular sequence. Log records are written to the current "active" log file and when that log file becomes full, it is marked as being "unavailable". At that point, DB2 makes the next log file in the sequence the active log file, and begins writing log records to it. And when that log file becomes full, the process is repeated. In the meantime, as transactions are terminated and their effects are externalized to the database, their corresponding log records are released because they are no longer needed. When all records stored in an individual log file are released, that file is marked as being "reusable" and the next time it becomes the active log file, its contents are overwritten with new log records.
Although primary log files are not marked reusable in any particular order (they are marked reusable when they are no longer needed), they must be written to in sequence. So what happens when the logging cycle gets back to a primary log file that is still marked "unavailable"? When this occurs, the DB2 Database Manager will allocate what is known as a secondary log file and begin writing records to it. As soon as this secondary log file becomes full, the DB2 Database Manager will poll the primary log file again and if its status is still "unavailable", another secondary log file is allocated and filled. This process will continue until either the desired primary log file becomes "reusable" or the number of secondary log files created matches the number of secondary log files allowed. If the former occurs, the DB2 Database Manager will begin writing log records to the appropriate primary log file and logging will pick up where it left off in the logging sequence. In the meantime, the records stored in the secondary log files are eventually released, and when all connections to the database have been terminated and a new connection is established, all secondary log files are destroyed. On the other hand, if the latter happens, all database activity will stop and the following message will be generated:
SQL0964C The transaction log for the database is full.
By default, up to two secondary log files will be created, if necessary, and their size will be the same as that of each primary log file used. However, the total number of secondary log files allowed is controlled by the logsecond parameter in the database configuration file. Circular logging is illustrated in Figure 7-5.
Figure 7-5. Circular logging.
By default, when a new database is first created, circular logging is the logging strategy used.
Archival Logging
Like circular logging, when archival logging (also known as log retention logging) is used, log records stored in the log buffer are written to the primary log files that have been pre-allocated. However, unlike with circular logging, these log files are never reused. Instead, when all records stored in an individual log file are released, that file is marked as being "archived" rather than as being "reusable" and the only time it is used again is if it is needed to support a roll-forward recovery operation. Each time a primary log file becomes full, another primary log file is allocated so that the desired number of primary log files (as specified by the logprimary database configuration parameter) are always available for use. This process continues as long as there is disk space available.
By default, all log records associated with a single transaction must fit within the active log space available (which is determined by the maximum number of primary and secondary log files allowed and the log file size used). Thus, in the event a long running transaction requires more log space than the primary log files provide, one or more secondary log files may be allocated and filled as well. If such a transaction causes the active log space to become full, all database activity will stop and the SQL0964C message we saw earlier will be produced.
Because any number of primary log files can exist when archival logging is used, they are classified according to their current state and location. Log files containing records associated with transactions that have not yet been committed or rolled back that reside in the active log directory (or device) are known as active log files; log files containing records associated with completed transactions (i.e., transactions that have been externalized to the database) that reside in the active log directory are known as online archive log files; and log files containing records that are associated with completed transactions that have been moved to a storage location other than the active log directory are known as offline archive log files. Offline archive files can be moved to their storage location either manually or automatically with a user exit program. Archival logging is illustrated in Figure 7-6.
Figure 7-6. Archival logging.
Infinite Active Logging. You would think that you could avoid running out of log space simply by configuring a database to use a large number of primary and/or secondary log files if needed. However, the maximum number of log files allowed (primary and secondary combined) is 256 and if the size of your log files is relatively small, you can still run out of log space quickly when transaction workloads become heavy or when transactions run for an inordinate amount of time. Furthermore, you want to avoid allocating a large number of secondary log files if possible because performance is affected each time a log file has to be allocated. Ideally, you want to allocate enough primary log files to handle most situations and you want to use just enough secondary log files to handle peaks in transaction workloads.
If you are concerned about running out of log space and you want to avoid allocating a large number of secondary log files, you can configure a database to perform what is known as infinite active logging or infinite logging. Infinite active logging allows an active transaction to span all primary logs and one or more archive logs, effectively allowing a transaction to use an infinite number of log files. To enable infinite active logging, you simply set the database configuration parameters userexit and logsecond to YES and 1, respectively. It is important to note that when the userexit database configuration parameter is set to YES, a user-supplied userexit program will be invoked each time a log file is closed and this program can move unneeded log files to another location for permanent storage (thus the risk of running out of log storage space on the server is eliminated).
When the logsecond database configuration parameter is set to -1, the logprimary and logfilsiz configuration parameters are still used to specify how many primary log files DB2 should keep in the active log path as well as with how big each file should be. If DB2 needs to read log data from a log file, but the file is not in the active log path, DB2 will invoke the userexit program provided to retrieve the log file from the archive and copy it to the active log location so that other reads of log data from the same file will be fast. DB2 manages the retrieval, copying, and removal of these log files as required.
Note
Although infinite active logging can be used to support environments with large jobs that require more log space than you would normally allocate to the primary logs, it does have its tradeoffs. Specifically, rollback operations (both at the savepoint level and at the transaction level) could be very slow due to the need to retrieve log files from the archive storage location. Likewise, crash recovery could be very slow for the same reason.
Log mirroring. With DB2 UDB Version 8.1, you have the ability to configure a database such that the DB2 Database Manager will create and update active log files in two different locations (which is sometimes referred to as dual logging). By storing active log files in one location and mirroring them in another, separate location, database activity can continue if a disk failure or human error causes log files in one location to be destroyed. (Mirroring log files may also aid in database recovery.) To enable log file mirroring, you simply assign the fully qualified name of the mirror log location to the mirrorlogpath database configuration parameter. It is important to note that if log mirroring is used, the primary log file location used must be a directory and not a raw device. And ideally, the mirror log file storage location used should be on a physical disk that is separate from the disk used to store primary log files and that does not have a large amount of I/O.