Scheduling and Locking Issues
The previous sections focus primarily on making individual queries faster. MySQL also allows you to affect the scheduling priorities of statements, which may allow queries arriving from several clients to cooperate better so that individual clients aren't locked out for a long time. Changing the priorities can also ensure that particular kinds of queries are processed more quickly. This section looks at MySQL's default scheduling policy and the options that are available to you for influencing this policy. It also describes the use of concurrent inserts and the effect that storage engine locking levels have on concurrency among clients. For the purposes of this discussion, a client performing a retrieval (a SELECT) is a reader. A client performing an operation that modifies a table (DELETE, INSERT, REPLACE, or UPDATE) is a writer.
MySQL's default scheduling policy can be summarized like this:
- Writes have higher priority than reads.
- Writes to a table must occur one at a time, and write requests are processed in the order in which they arrive.
- Multiple reads from a table can be processed simultaneously.
The MyISAM and MEMORY storage engines implement this scheduling policy with the aid of table locks. Whenever a client accesses a table, a lock for it must be acquired first. When the client is finished with a table, the lock on it can be released. It's possible to acquire and release locks explicitly by issuing LOCK TABLES and UNLOCK TABLES statements, but normally the server's lock manager automatically acquires locks as necessary and releases them when they no longer are needed. The type of lock required depends on whether a client is writing or reading.
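As a minimal sketch of explicit locking (the `scoreboard` table name is hypothetical), the statements below acquire and release the same kinds of locks the lock manager would otherwise handle automatically:

```sql
-- Exclusive (write) lock: no other client can read or write the table.
LOCK TABLES scoreboard WRITE;
UPDATE scoreboard SET points = points + 1 WHERE player = 'abe';
UNLOCK TABLES;

-- Shared (read) lock: other clients may read, but writers must wait.
LOCK TABLES scoreboard READ;
SELECT COUNT(*) FROM scoreboard;
UNLOCK TABLES;
```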
A client performing a write to a table must have a lock for exclusive table access. The table is in an inconsistent state while the operation is in progress because the data record is being deleted, added, or changed, and any indexes on the table may need to be updated to match. Allowing other clients to access the table while the table is in flux would cause problems. It's clearly a bad thing to allow two clients to write to the table at the same time because that would quickly corrupt the table into an unusable mess. But it's not good to allow a client to read from an in-flux table, either, because the table might be changing at the location being read, and the results would be inaccurate.
A client performing a read from a table must have a lock to prevent other clients from writing to the table and changing it during the read. The lock need not be for exclusive access, however. Reading doesn't change the table, so there is no reason one reader should prevent another from accessing the table. Therefore, a read lock allows other clients to read the table at the same time.
MySQL provides several statement modifiers that allow you to influence its scheduling policy:
- The LOW_PRIORITY keyword applies to DELETE, INSERT, LOAD DATA, REPLACE, and UPDATE statements.
- The HIGH_PRIORITY keyword applies to SELECT and INSERT statements.
- The DELAYED keyword applies to INSERT and REPLACE statements.
The LOW_PRIORITY and HIGH_PRIORITY modifiers have an effect for storage engines that use table locks, such as MyISAM and MEMORY. The DELAYED modifier works for MyISAM and MEMORY tables.
Changing Statement Scheduling Priorities
The LOW_PRIORITY keyword affects execution scheduling for DELETE, INSERT, LOAD DATA, REPLACE, and UPDATE statements. Normally, if a write operation for a table arrives while the table is being read, the writer blocks until the reader is done. (Once a query has begun it will not be interrupted, so the reader is allowed to finish.) If another read request arrives while the writer is waiting, the reader blocks, too, because the default scheduling policy is that writers have higher priority than readers. When the first reader finishes, the writer proceeds, and when the writer finishes, the second reader proceeds.
If the write request is a LOW_PRIORITY request, the write is not considered to have a higher priority than reads. In this case, if a second read request arrives while the writer is waiting, the second reader is allowed to slip in ahead of the writer. Only when there are no more readers is the writer allowed to proceed. One implication of this scheduling modification is that, theoretically, it's possible for LOW_PRIORITY writes to be blocked forever. If additional read requests keep arriving while previous ones are still in progress, the new requests are allowed to get in ahead of the LOW_PRIORITY write.
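A low-priority write is requested per statement. As a sketch (again using the hypothetical `scoreboard` table), either of the following writes yields to read requests that arrive while it waits:

```sql
-- These writes wait until no readers remain, including readers
-- that arrive after the write request was issued.
INSERT LOW_PRIORITY INTO scoreboard (player, points) VALUES ('abe', 10);
UPDATE LOW_PRIORITY scoreboard SET points = points + 1 WHERE player = 'abe';
```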
The HIGH_PRIORITY keyword for SELECT queries is similar. It allows a SELECT to slip in ahead of a waiting write, even if the write normally has higher priority. Another effect is that a high-priority SELECT executes ahead of normal SELECT statements, because those must wait for the write.
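For example (hypothetical table again), the following retrieval is scheduled ahead of any waiting write and ahead of normal-priority SELECT statements:

```sql
SELECT HIGH_PRIORITY player, points FROM scoreboard WHERE points > 100;
```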
If you want all statements that support the LOW_PRIORITY option to be treated as having low priority by default, start the server with the --low-priority-updates option. The effect of this option can be canceled for individual INSERT statements by using INSERT HIGH_PRIORITY to elevate them to the normal write priority.
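A sketch of both pieces, with the server option shown in a comment and the table name hypothetical:

```sql
-- Start the server with low-priority writes as the default, e.g.:
--   mysqld --low-priority-updates
-- (or put low-priority-updates in the [mysqld] group of an option file)

-- Restore an individual INSERT to the normal write priority:
INSERT HIGH_PRIORITY INTO scoreboard (player, points) VALUES ('abe', 10);
```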
Using Delayed Inserts
The DELAYED modifier applies to INSERT and REPLACE statements. When a DELAYED insert request arrives for a table, the server puts the rows in a queue and returns a status to the client immediately so that the client can proceed even before the rows have been inserted. If readers are reading from the table, the rows in the queue are held until there are no readers. Then the server begins inserting the rows in the delayed-row queue. Every now and then, the server checks whether any new read requests have arrived and are waiting. If so, the delayed-row queue is suspended and the readers are allowed to proceed. When there are no readers left, the server begins inserting delayed rows again. This process continues until the queue is empty.
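A minimal example, assuming a hypothetical `access_log` table used for logging:

```sql
-- The statement returns as soon as the row is queued; the server
-- inserts it later, when the table is not being read.
INSERT DELAYED INTO access_log (page, hit_time) VALUES ('/index.html', NOW());
```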
LOW_PRIORITY and DELAYED are similar in the sense that both allow row insertion to be deferred, but they are quite different in how they affect client operation. LOW_PRIORITY forces the client to wait until the rows can be inserted. DELAYED allows the client to continue, and the server buffers the rows in memory until it has time to process them.
INSERT DELAYED is useful if other clients may be running lengthy SELECT statements and you don't want to block waiting for completion of the insertion. The client issuing the INSERT DELAYED can proceed more quickly because the server simply queues the row to be inserted.
You should be aware of certain other differences between normal INSERT and INSERT DELAYED behavior, however. The client gets back an error if the INSERT DELAYED statement contains a syntax error, but other information that would normally be available is not. For example, you can't rely on getting the AUTO_INCREMENT value when the statement returns. Also, you won't get a count for the number of duplicates on unique indexes. This happens because the insert operation returns a status before the operation actually has been completed. Another implication is that because rows from INSERT DELAYED statements are queued in memory, the rows are lost if the server crashes or is killed with kill -9. (This doesn't happen with a normal kill -TERM; in that case, the server inserts the rows before exiting.)
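The AUTO_INCREMENT caveat looks like this in practice (same hypothetical logging table; the point is only that the value cannot be relied on):

```sql
INSERT DELAYED INTO access_log (page, hit_time) VALUES ('/index.html', NOW());
-- The row may not have been inserted yet, so the AUTO_INCREMENT value
-- generated for it cannot be relied on here:
SELECT LAST_INSERT_ID();
```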
Using Concurrent Inserts
The MyISAM storage engine allows an exception to the general principle that readers block writers. This exception occurs when a MyISAM table has no holes in the middle, such as can result from deleting or updating rows. When the table has no holes, any INSERT statements must necessarily add rows at the end rather than in the middle. Under such circumstances, MySQL allows clients to add rows to the table even while other clients are reading from it. These are known as "concurrent inserts" because they take place at the same time as retrievals without being blocked.
If you want to use concurrent inserts, note the following:
- Do not use the LOW_PRIORITY modifier with your INSERT statements. It causes INSERT always to block for readers and thus prevents concurrent inserts from being performed.
- Readers that need to lock the table explicitly but still want to allow concurrent inserts should use LOCK TABLES ... READ LOCAL rather than LOCK TABLES ... READ. The LOCAL keyword acquires a lock that allows concurrent inserts to proceed, because it applies only to existing rows in the table and does not block new rows from being added to the end.
- LOAD DATA operations should use the CONCURRENT modifier to allow SELECT statements for the table to take place at the same time.
- A MyISAM table that has holes in the middle cannot be used for concurrent inserts. However, you can defragment the table with the OPTIMIZE TABLE statement, as shown in the example after this list.
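A sketch that pulls these points together, using the hypothetical `access_log` table and a hypothetical data file path:

```sql
-- Explicit read lock that still permits concurrent inserts at the
-- end of the table:
LOCK TABLES access_log READ LOCAL;
SELECT COUNT(*) FROM access_log;
UNLOCK TABLES;

-- Load rows while allowing SELECT statements on the table to proceed:
LOAD DATA CONCURRENT INFILE '/tmp/hits.txt' INTO TABLE access_log;

-- Defragment a table that has holes so that concurrent inserts
-- become possible again:
OPTIMIZE TABLE access_log;
```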
Locking Levels and Concurrency
The scheduling modifiers discussed in the preceding sections allow you to influence the default scheduling policy. For the most part, these modifiers were introduced to deal with issues that arise from the use of table-level locks, which is what the MyISAM and MEMORY storage engines use to manage table contention.
The BDB and InnoDB storage engines implement locking at different levels and thus have differing performance characteristics in terms of contention management. The BDB engine uses page-level locks. The InnoDB engine uses row-level locks, but only as necessary. (In many cases, such as when only reads are done, InnoDB may use no locks at all.)
The locking level used by a storage engine has a significant effect on concurrency among clients. Suppose that two clients each want to update a row in a given table. To perform the update, each client requires a write lock. For a MyISAM table, the engine will acquire a table lock for the first client, which causes the second client to block until the first one has finished. With a BDB table, greater concurrency can be achieved: Both updates can proceed simultaneously unless both rows are located within the same page. With an InnoDB table, concurrency is even higher; both updates can happen at the same time as long as both clients aren't updating the same row.
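A sketch of that contention scenario, using a hypothetical `accounts` table and two clients issuing their statements at the same time:

```sql
-- Client 1:
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Client 2, issued while client 1's update is in progress:
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- MyISAM: client 2 waits for client 1's table lock to be released.
-- BDB:    both proceed unless rows 1 and 2 lie in the same page.
-- InnoDB: both proceed because the clients lock different rows.
```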
The general principle is that locking at a finer level allows better concurrency, because more clients can use a table at the same time if they use different parts of it. The practical implication is that different storage engines are better suited for different statement mixes:
- MyISAM is extremely fast for retrievals. However, the use of table-level locks can be a problem in environments with mixed retrievals and updates, especially if the retrievals tend to be long-running. Under these conditions, updates may need to wait a long time before they can proceed.
- BDB and InnoDB tables can provide better performance when there are many updates. Because locking is done at the page or row level rather than at the table level, the extent of the table that is locked is smaller. This reduces lock contention and improves concurrency.
Table locking does have an advantage over finer levels of locking in terms of deadlock prevention. With table locks, deadlock never occurs. The server can determine which tables are needed by looking at the statement and locking them all ahead of time. With InnoDB and BDB tables, deadlock can occur because these storage engines do not acquire all necessary locks at the beginning of a transaction. Instead, locks are acquired as they are determined to be necessary during the course of processing the transaction. It's possible that two statements will acquire locks and then try to acquire further locks that each depend on already-held locks being released. As a result, each client holds a lock that the other needs before it can continue. This results in deadlock, and the server must abort one of the transactions.
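A sketch of how such a deadlock arises, using a hypothetical InnoDB table `t` in which each client acquires one row lock and then waits for the other's:

```sql
-- Client 1:
START TRANSACTION;
UPDATE t SET val = 1 WHERE id = 1;   -- locks row 1
UPDATE t SET val = 1 WHERE id = 2;   -- must wait: row 2 is locked by client 2

-- Client 2 (running at the same time):
START TRANSACTION;
UPDATE t SET val = 2 WHERE id = 2;   -- locks row 2
UPDATE t SET val = 2 WHERE id = 1;   -- must wait: row 1 is locked by client 1
                                     -- deadlock: the server aborts one transaction
```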