Identifying Blocking
Identifying Blocking Using sys.dm_os_waiting_tasks
In SQL Server 2005, the sys.dm_os_waiting_tasks DMV is the fundamental repository for blocking information within the server. Before discussing how it can be used to detect the existence of blocking, we cover some of its more important columns.
sys.dm_os_waiting_tasks contains three groups of columns: those that identify the waiter, those that identify the blocker (if applicable), and those that provide information about the wait.
The columns that identify the waiting task are the waiting_task_address, session_id, and exec_context_id columns. The waiting_task_address contains the internal memory address of the object that represents the task. This uniquely identifies a task within SQL Server. The session_id represents the user session with which the task is currently associated. The association of a task with a session lasts only for the duration of the task, generally a batch. The blocking task is identified by the blocking_task_address, blocking_session_id, and blocking_exec_context_id. These have the same semantics as the columns identifying the waiting task.
The relationship between the task address and session id columns should be noted. Only sessions that are currently executing a statement within SQL Server have an associated task. However, a task need not be associated with a session ID: this occurs with various system tasks. Because all waiting in SQL Server is done by tasks, it is not possible for a session that is not currently associated with a task to be waiting or blocked. Therefore, the waiting_task_address is never null, but the session_id and exec_context_id columns may be null if the task is not associated with a session. On the flip side, a task can be blocked by either an active task or an inactive session. This occurs if the resource involved in the wait can be held by a session across tasks (that is, across batches). In SQL Server 2005, the only type of resource that can cause blocking and that is held across batches are locks. Note that although some other resources such as memory associated with a session are held across batches, these do not cause direct blocking but are rather represented through proxy waits such as the LOWFAIL_MEMMGR_QUEUE wait type. Locks are held by transactions, and because it is possible for a transaction to contain multiple batches, it is thus possible for locks to be held by a session that has a transaction active while performing operations on the client side and thus would not have an associated task within the server.
The identity of the blocking task is not always known. In these cases, the columns identifying the blocking task are null. This is rather common because blocker information is not available for most wait types—either because it does not make sense for the particular wait type or because the information is not tracked (due to, for example, performance concerns).
As its name implies, the wait_type column contains the current wait type for the waiting task. Similarly, the wait_duration_ms column contains the duration of the current wait. The resource_address column provides the memory address of the resource on which the task is waiting. This is mainly useful as an identifier to differentiate blocking on different instances of a resource when the wait type names are the same. The resource_description column can also provide differentiating information in such cases, but it is only populated with useful information by a handful of wait types—all lock and latch wait types, the CXPACKET and THREADPOOL wait types.
The simplest method for detecting blocking using sys.dm_os_waiting_tasks is simply to run the T-SQL query select * from sys.dm_os_waiting_tasks and treat any row in the output as signifying blocking. This, however, is not very useful, because there will almost always be at least a handful of task wait states from the categories that can generally be excluded from blocking investigations. For example, the deadlock monitor thread usually is listed as waiting with a wait type of REQUEST_FOR_DEADLOCK_SEARCH, which merely indicates that it is pausing between deadlock detection events and is not blocked by anything. The one time this category of waits should not be ignored is if the tasks in these waits are blocking others, which is almost never the case except with the WAITFOR wait type.
Although it is possible to memorize the set of excludable wait types and mentally exclude them from the result set, this is generally not efficient. A view built on top of sys.dm_os_waiting_tasks can help.
The amalgam.innocuous_wait_types view available on the accompanying CD produces a list of innocuous wait types that usually can be ignored. The amalgam.dm_os_waiting_tasks_filtered view uses the helper view to filter out these innocuous waits:
create view amalgam.dm_os_waiting_tasks_filtered as select * from sys.dm_os_waiting_tasks where wait_type not in ( select * from amalgam.innocuous_wait_types) go
Any rows left in the filtered rowset represent tasks that are truly blocked. This set can be further analyzed and narrowed based on the severity, cause, and nature of the blocking. You learn more about this analysis in the sections dealing with identifying the cause of blocking.
Why Not Use sysprocesses or sys.dm_exec_requests?
In previous versions of SQL Server, the DMV of choice for investigating waiting and blocking was sysprocesses. Although this DMV exists in SQL Server 2005, it has been deprecated and replaced by a set of new DMVs. The new DMV that contains the waiting and blocking information from sysprocesses is sys.dm_exec_requests. Although both of these DMVs contain information that you can use to investigate blocking and waiting, there are good reasons not to use them.
The main reason you should not use these views is that they do not provide the level of detailed information that is available from sys.dm_os_waiting_tasks. Sysprocesses and sys.dm_exec_requests display session-level information. This means that they do not contain system processes that are not associated with session IDs. Their session basis also makes it harder to handle parallel queries where multiple tasks are executing under the same session ID. sys.dm_exec_requests displays just one entry for each session. Because of this, sys.dm_exec_requests is not usable when investigating blocking involving parallel queries, because it displays only the blocked/waiting status of the parent task, not the child tasks.
The session-level focus also affects the display of blocker information. Both sysprocesses and sys.dm_os_waiting_tasks display only the session ID of the blocking task rather than the specific task in case of parallel queries.
Finally, the resource descriptions for sysprocesses and sys.dm_exec_requests are not as complete as those in sys.dm_os_waiting_tasks, which provides the most complete set of resource descriptions available. Further, the resource_address column in sys.dm_os_waiting_tasks can be used to differentiate between multiple resources that do not have resource descriptions. With the other two DMVs, these multiple resources could not be distinguished from each other.
Although sys.dm_exec_requests should not be used as the primary source of blocking information, it does contain information that is useful when you’re investigating certain types of blocking, so it should not be ignored.
Statistically Identifying Blocking
Sys.dm_os_waiting_task, although a useful resource, is not the only resource for detecting blocking. Statistical information on waiting and blocking is available from several sources. This information can at times prove more useful than current wait and blocking data from sys.dm_os_waiting_tasks because the statistical information can differentiate between occasional and frequent, and short and long waits. Frequent long blocking is generally of much more concern than occasional short blocking.
sys.dm_os_wait_stats
In addition to providing a list of wait types, as mentioned in the first section, sys.dm_os_wait_stats provides statistics for each wait type. The information provided includes the number of times a wait with a given wait type has occurred, the total duration of those waits, and the maximum wait time for a single wait.
A single query over this DMV can be used to identify wait types with long average waits. Keep in mind that no significance should be assigned to the absolute wait counts returned by a single snapshot of the view, because these counts are cumulative since the last reset, which might have occurred some time ago. The wait counts are interesting when measured over a known time span. The deltas between two snapshots indicate the severity of blocking during that interval. These snapshots can be generated manually by running a query over the DMV twice and calculating the difference, or, more conveniently, using the SQLDiag tool that ships with SQL Server 2005.
-- Create temporary tables to store the initial -- and final snapshots -- create table #StatsInitial ( wait_type sysname, waiting_tasks_count bigint, wait_time_ms bigint, signal_wait_time_ms bigint); create table #StatsFinal ( wait_type sysname, waiting_tasks_count bigint, wait_time_ms bigint, signal_wait_time_ms bigint); -- Create indexes for join performance -- create index idxInitialWaitType on #StatsInitial (wait_type); create index idxFinalWaitType on #StatsFinal (wait_type); -- Create an initial snapshot -- insert into #StatsInitial select wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms from amalgam.dm_os_wait_stats_filtered; -- Wait for a ten second delay -- This delay can be adjusted to suit your needs -- and preferences -- waitfor delay '00:00:10'; -- Create the final snapshot -- insert into #StatsFinal select wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms from amalgam.dm_os_wait_stats_filtered; -- Report any wait types that had waits during -- the wait delay, and the number and duration -- of the waits -- select f.wait_type, f.waiting_tasks_count - i.waiting_tasks_count as wait_tasks_count_delta, f.wait_time_ms - i.wait_time_ms as wait_time_ms_delta, f.signal_wait_time_ms - i.signal_wait_time_ms as signal_wait_time_delta from #StatsFinal f join #StatsInitial i on f.wait_type = i.wait_type where f.waiting_tasks_count - i.waiting_tasks_count > 0 order by f.waiting_tasks_count - i.waiting_tasks_count desc; -- Finally drop the tables -- drop table #StatsInitial; drop table #StatsFinal; go
Note that, as written, the preceding queries exclude the innocuous wait types by referencing the amalgam.dm_os_wait_stats_filtered view.
Note that although it is possible to simplify this process by clearing the stats and then running the query, this approach has several drawbacks, including the loss of historical statistics, and the negative effect it can have on other tools that are also using the DMV for wait statistics, including some ISV monitoring tools:
dbcc sqlperf ('sys.dm_os_wait_stats', CLEAR) waitfor delay '00:00:10'; select * from amalgam.dm_os_wait_stats_filtered;
When determining the severity of blocking based on these statistics, it is important to keep in mind that these are aggregated statistics for all waits of each type. For many waits, this might not point to a single resource. Prime examples of this are latch and lock waits. The page latch count waits cover all pages in all databases, files, and objects. The same is true for lock waits.
Similarly, comparing overall wait counts is not as important as the rate of waits (waits per second) or the average duration of the waits. A few long waits generally indicate a localized issue, whereas many short or medium-length waits are more indicative of hot spotting or throughput bottlenecks.
Performance Counters
You can also use performance monitor counters to detect blocking. Some of the information provided by the performance counters mirrors the information available from sys.dm_os_wait_stats, but other counters provide information data that is available only from performance monitor counters.
Although the easiest way of monitoring SQL Server performance counters is via the SQLDiag tool, SQL Server’s counters are also available in rowset form through the sys.dm_os_performance_counters DMV. The data in this DMV is the raw data reflected in the performance counters, so some of it requires some processing in order to be useful.
The Wait Statistics group accumulates counters for various types of waits. The Process blocked counter in the General Statistics group is a useful one that you can use to track the number of blocked processes without needing to query sys.dm_os_waiting_tasks.
Notification-Based Detection
SQL Server 2005 includes a built-in proactive method of blocking notification. This is the Blocked Process Report event, which can be captured in a trace and that can be used to trigger code to deal with the blocking or notify operators. This event is triggered when a wait exceeds the specified threshold. The usefulness of this event for monitoring general blocking is somewhat limited by the fact that it only detects blocking on resources that support deadlock detection. Resources that support deadlock detection are listed in Table 1-1 along with the corresponding wait types.
Table 1-1. Resources that support deadlock detection
Deadlock-Detectable Resource |
Corresponding Wait Types |
Locks |
LCK_M_* |
Worker threads |
Any deadlock detectable resource wait types, but generally lock waits |
Memory |
RESOURCE_SEMAPHORE |
Parallel query execution |
CXPACKET, EXCHANGE, EXECSYNC Multiple active result set (MARS) resources TRANSACTION_MUTEX, MSQL_DQ, MSQL_XP, ASYNC_NETWORK_IO |
CLR resources |
SQLCLR_APPDOMAIN, CLR_MONITOR, CLR_RWLOCK_READER, CLR_RWLOCK_WRITER |
The blocking threshold trigger is set using the 'blocked process threshold' option of the sp_configure command. Note that this is an advanced option, so the 'show advanced options' option must first be enabled. When the specified blocking threshold duration is crossed, the event is produced. As with other events, you can configure this event to produce an alert or run diagnostic scripts:
sp_configure 'show advanced options', 1; go reconfigure; go sp_configure 'blocked process threshold', <threshold-in-seconds>; go reconfigure;
The event includes a blocking graph similar to that produced for deadlocks. When the event fires because of blocking on a lock resource, identifying information for the lock resource is included in the event columns.
Using the Blocked Process Threshold event is particularly useful to capture unexpectedly long waits on lock and MARS resources. The threshold value should be established after having achieved an acceptable level of blocking by tuning the application and SQL Server. Under these conditions, this event provides a lightweight blocking monitoring mechanism for these types of blocking.