Recoverability and Deadlock Avoidance
The first reason listed for using transactions was recoverability. Any logical change to a database may require multiple changes to underlying data structures. For example, modifying a record in a Btree may require leaf and internal pages to split, so a single DB->put method call can potentially require that multiple physical database pages be written. If only some of those pages are written and then the system or application fails, the database is left inconsistent and cannot be used until it has been recovered; that is, until the partially completed changes have been undone.
Write-ahead logging is the technique Berkeley DB uses to ensure recoverability: before any change is made to a database, information about the change is written to a database log. During recovery, the log is read, and the databases are checked to ensure that changes described in the log for committed transactions appear in the database. Changes that appear in the database but belong to aborted or unfinished transactions in the log are undone.
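The recovery pass described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Berkeley DB's log format or recovery code: the "database" is an array of integers, each update record carries a before-image and an after-image, and every name here (toy_log_rec, toy_recover, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical): the "database" is an array of integer pages. */
#define TOY_NPAGES 4

enum toy_rec_type { TOY_UPDATE, TOY_COMMIT };

/* One log record. Update records carry before- and after-images. */
struct toy_log_rec {
    enum toy_rec_type type;
    int txn_id;
    int page;       /* Page index (TOY_UPDATE only). */
    int old_val;    /* Before-image, used to undo. */
    int new_val;    /* After-image, used to redo. */
};

/* Return 1 if the log holds a commit record for txn_id, else 0. */
static int
toy_committed(const struct toy_log_rec *recs, size_t n, int txn_id)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (recs[i].type == TOY_COMMIT && recs[i].txn_id == txn_id)
            return (1);
    return (0);
}

/* Bring pages into agreement with the log after a simulated failure. */
void
toy_recover(int *pages, const struct toy_log_rec *recs, size_t n)
{
    size_t i;

    /* Redo pass, oldest to newest: reapply committed updates. */
    for (i = 0; i < n; i++)
        if (recs[i].type == TOY_UPDATE &&
            toy_committed(recs, n, recs[i].txn_id)) {
            assert(recs[i].page >= 0 && recs[i].page < TOY_NPAGES);
            pages[recs[i].page] = recs[i].new_val;
        }

    /* Undo pass, newest to oldest: roll back uncommitted updates. */
    for (i = n; i-- > 0;)
        if (recs[i].type == TOY_UPDATE &&
            !toy_committed(recs, n, recs[i].txn_id)) {
            assert(recs[i].page >= 0 && recs[i].page < TOY_NPAGES);
            pages[recs[i].page] = recs[i].old_val;
        }
}
```

In this model, a committed update that never reached its page is reapplied, and an uncommitted update that did reach its page is rolled back. The sketch works only because each change was logged before the page was modified, which is the ordering that write-ahead logging enforces.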
For recoverability after application or system failure, operations that modify the database must be protected by transactions. More specifically, operations are not recoverable unless a transaction is begun, each operation is associated with that transaction via the Berkeley DB interfaces, and the transaction is then successfully committed. This is true even if logging is turned on in the database environment.
Here is an example function that updates a record in a database in a transactionally protected manner. The function takes a key and data items as arguments and then attempts to store them into the database.
    int
    main(int argc, char *argv[])
    {
        extern char *optarg;
        extern int optind;
        DB *db_cats, *db_color, *db_fruit;
        DB_ENV *dbenv;
        pthread_t ptid;
        int ch;

        while ((ch = getopt(argc, argv, "")) != EOF)
            switch (ch) {
            case '?':
            default:
                usage();
            }
        argc -= optind;
        argv += optind;

        env_dir_create();
        env_open(&dbenv);

        /* Open database: Key is fruit class; Data is specific type. */
        db_open(dbenv, &db_fruit, "fruit", 0);

        /* Open database: Key is a color; Data is an integer. */
        db_open(dbenv, &db_color, "color", 0);

        /*
         * Open database:
         * Key is a name; Data is: company name, address, cat breeds.
         */
        db_open(dbenv, &db_cats, "cats", 1);

        add_fruit(dbenv, db_fruit, "apple", "yellow delicious");

        return (0);
    }

    void
    add_fruit(DB_ENV *dbenv, DB *db, char *fruit, char *name)
    {
        DBT key, data;
        DB_TXN *tid;
        int ret;

        /* Initialization. */
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = fruit;
        key.size = strlen(fruit);
        data.data = name;
        data.size = strlen(name);

        for (;;) {
            /* Begin the transaction. */
            if ((ret = txn_begin(dbenv, NULL, &tid, 0)) != 0) {
                dbenv->err(dbenv, ret, "txn_begin");
                exit (1);
            }

            /* Store the value. */
            switch (ret = db->put(db, tid, &key, &data, 0)) {
            case 0:
                /* Success: commit the change. */
                if ((ret = txn_commit(tid, 0)) != 0) {
                    dbenv->err(dbenv, ret, "txn_commit");
                    exit (1);
                }
                return;
            case DB_LOCK_DEADLOCK:
                /* Deadlock: abort, then retry the operation. */
                if ((ret = txn_abort(tid)) != 0) {
                    dbenv->err(dbenv, ret, "txn_abort");
                    exit (1);
                }
                break;
            default:
                /* Error: run recovery. */
                dbenv->err(dbenv, ret, "db->put: %s/%s", fruit, name);
                exit (1);
            }
        }
    }
The second reason listed for using transactions was deadlock avoidance. Each database operation (that is, any call to a function underlying the handles returned by DB->open and DB->cursor) is normally performed on behalf of a unique locker. If multiple calls on behalf of the same locker are desired within a single thread of control, transactions must be used. For example, consider the case in which a cursor scan locates a record and then accesses some other item in the database, based on that record. If these operations are done using the default lockers for the handle, they may conflict. If the application wishes to guarantee that the operations do not conflict, locks must be obtained on behalf of a transaction instead of the default locker ID, and that transaction must be specified to subsequent DB->cursor and other Berkeley DB calls.
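To see why separate lockers can conflict within one thread while a shared locker cannot, consider a toy lock table. This is a minimal sketch, not Berkeley DB's lock manager: the types and the toy_lock_get function are hypothetical, locks are never released, and a conflict is simply reported rather than blocking the caller.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LOCKS 16

enum toy_mode { TOY_READ, TOY_WRITE };

/* One held lock: which locker holds which item, in which mode. */
struct toy_lock {
    int locker;
    int item;
    enum toy_mode mode;
};

struct toy_lock_table {
    struct toy_lock held[TOY_MAX_LOCKS];
    size_t nheld;
};

/*
 * Request a lock. Returns 1 if granted, 0 on conflict. A real lock
 * manager would block the caller or run deadlock detection instead.
 */
int
toy_lock_get(struct toy_lock_table *t, int locker, int item,
    enum toy_mode mode)
{
    size_t i;

    for (i = 0; i < t->nheld; i++) {
        if (t->held[i].item != item)
            continue;
        /* A locker never conflicts with locks it already holds. */
        if (t->held[i].locker == locker)
            continue;
        /* Different lockers conflict unless both are readers. */
        if (mode == TOY_WRITE || t->held[i].mode == TOY_WRITE)
            return (0);
    }

    assert(t->nheld < TOY_MAX_LOCKS);
    t->held[t->nheld].locker = locker;
    t->held[t->nheld].item = item;
    t->held[t->nheld].mode = mode;
    t->nheld++;
    return (1);
}
```

In this model, a cursor that reads item 42 as locker 1 prevents a later write to item 42 by locker 2, even when both requests come from the same thread. If both operations run under one locker, as they would when they share a transaction, the write is granted.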
The add_fruit function shown earlier handles an error return that you may not have seen before. In transactional (not Concurrent Data Store) applications supporting both readers and writers, or just multiple writers, Berkeley DB functions have an additional possible error return: DB_LOCK_DEADLOCK. This return means that our thread of control was deadlocked with another thread of control, and our thread was selected to discard all its Berkeley DB resources in order to resolve the problem. In the sample code, any time the DB->put method returns DB_LOCK_DEADLOCK, the transaction is aborted (by calling txn_abort, which releases the transaction's Berkeley DB resources and undoes any partial changes to the databases) and then the transaction is retried from the beginning.
There is no requirement that the transaction be attempted again, but that is a common course of action for applications. Applications may want to set an upper bound on the number of times an operation will be retried, because some operations on some data sets may simply be unable to succeed. For example, updating all the pages on a large Web site during prime business hours may simply be impossible because of the high access rate to the database.
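One way to bound the number of retries is to wrap the transaction body in a helper that gives up after a fixed number of attempts. The sketch below uses stand-ins rather than the Berkeley DB API: TOY_DEADLOCK takes the place of DB_LOCK_DEADLOCK (it is not the real value), and toy_retry, toy_txn_fn, and toy_flaky_body are hypothetical names. The body is assumed to begin its own transaction, commit on success, and abort before returning the deadlock code.

```c
#include <assert.h>

#define TOY_DEADLOCK (-1)   /* Stand-in for DB_LOCK_DEADLOCK. */
#define TOY_GAVE_UP  (-2)   /* Retry limit reached without success. */

/* One attempt at a transaction: 0 on success, TOY_DEADLOCK to retry. */
typedef int (*toy_txn_fn)(void *arg);

/*
 * Run body until it stops returning TOY_DEADLOCK, or until max_tries
 * attempts have been made. Any other return value, success or hard
 * error, is passed straight back to the caller.
 */
int
toy_retry(toy_txn_fn body, void *arg, int max_tries)
{
    int i, ret;

    assert(max_tries > 0);
    for (i = 0; i < max_tries; i++) {
        ret = body(arg);
        if (ret != TOY_DEADLOCK)
            return (ret);
    }
    return (TOY_GAVE_UP);
}

/* Demonstration body: reports deadlock until the counter reaches 0. */
int
toy_flaky_body(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return (TOY_DEADLOCK);
    }
    return (0);
}
```

With a counter of 2 and a limit of 5, toy_retry succeeds on the third attempt. With a counter of 10 and a limit of 3, it returns TOY_GAVE_UP, and the caller can then report the failure or reschedule the work.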