Monitoring Blocking
The preceding sections have focused first on detecting that blocking is occurring and second on identifying the cause and possible resolutions. They have been geared more toward interactive investigations. However, it is generally not possible to dedicate a database operator to continuous active monitoring of a system. It would also not be efficient. This calls for a way to monitor blocking where alerts can be raised when blocking is encountered or the proper information gets collected automatically. To achieve this, many of the scripts from earlier sections, and some additional ones, can be rolled into a collection that can be run via SQLDiag to monitor blocking and collect the appropriate data. I have included here several script snippets and explanations as to why I would include them in a monitoring script. These can be used to build monitoring stored procedures such as sp_blocker_pssNNN used by SQLDiag and also available from the Microsoft website. As mentioned previously, SQLDiag is now included as part of SQL Server. This tool can collect many of the data points included here out of the box and can be extended to include custom scripts. The level of monitoring can be customized based on specific needs and the availability of CPU cycles to execute the scripts. This is an important concern because some of these script snippets can be somewhat expensive to run or might produce a lot of output that must then be analyzed.
Wait Statistics
It’s always useful to collect wait statistics. These are low-impact queries. Filtering out the innocuous wait types and any zero statistics greatly reduces the output and makes it easier to review. Here’s a sample query:
select * from amalgam.dm_os_wait_stats_filtered select * from sys.dm_os_latch_stats where waiting_requests_count <> 0
Current Wait Information
Various queries can be run against sys.dm_os_waiting_tasks to collect information on current waiters. The cheapest option is to just include the entire contents of sys.dm_os_waiting_tasks, or amalgam.dm_os_waiting_tasks_filtered, like this:
select * from sys.dm_os_waiting_tasks select * from amalgam.dm_os_waiting_tasks_filtered select * from amalgam.dm_os_waiting_tasks_filtered2
A slightly enhanced version includes the current statements and plans for the waiting tasks:
select amalgam.current_statement ( st.dbid, st.objectid, st.encrypted, st.text, er.statement_start_offset, er.statement_end_offset) as current_statement, qp.query_plan, wt.* from amalgam.dm_os_waiting_tasks_filtered wt left join sys.dm_exec_requests er on wt.waiting_task_address = er.task_address outer apply sys.dm_exec_sql_text (er.sql_handle) st outer apply sys.dm_exec_query_plan (er.plan_handle) qp
And a further enhancement includes the blocking task’s current statement and plan. Remember: Locks might have been acquired by a statement other than the current statement:
select amalgam.current_statement ( st.dbid, st.objectid, st.encrypted, st.text, er.statement_start_offset, er.statement_end_offset) as waiters_current_statement, qp.query_plan, amalgam.current_statement ( stb.dbid, stb.objectid, stb.encrypted, stb.text, erb.statement_start_offset, erb.statement_end_offset) as blockers_current_statement, qp.query_plan, wt.* from amalgam.dm_os_waiting_tasks_filtered wt left join sys.dm_exec_requests er on wt.waiting_task_address = er.task_address outer apply sys.dm_exec_sql_text (er.sql_handle) st outer apply sys.dm_exec_query_plan (er.plan_handle) qp left join sys.dm_exec_requests erb on wt.blocking_task_address = erb.task_address outer apply sys.dm_exec_sql_text (erb.sql_handle) stb outer apply sys.dm_exec_query_plan (erb.plan_handle) qpb
These, however, require manual analysis of potentially verbose output. This can be eased by analyzing some of the information at the time of collection.
Often, it is useful to find the hottest resources or wait types. The following queries find all resources and wait types with at least five waiters:
select resource_description, additional_resource_description, count(*) from amalgam.dm_os_waiting_tasks_filtered2 where resource_description is not null group by resource_description, additional_resource_description having count (*) > 5 select wait_type, count(*) from amalgam.dm_os_waiting_tasks_filtered group by wait_type having count (*) > 5
Long waiters are generally of more concern than short-duration waiters, so it might be useful to call them out; let’s see all waiters that have been waiting more than 10 seconds:
select * from amalgam.dm_os_waiting_tasks_filtered where wait_duration_ms > 10000 order by wait_duration_ms desc
The blocking chain also has some interesting information available (for example, head blockers that are blocking a large number of other tasks, and the chains themselves):
select head_blocker_task_address, head_blocker_session_id, count(*) from amalgam.blocking_chain group by head_blocker_task_address, head_blocker_session_id having count(*) > 10 order by count(*) desc option (maxrecursion 128) select * from amalgam.blocking_chain option (maxrecursion 128)
The index operational statistics are also useful to have when looking for tables with high latch or lock waits:
select top 20 * from sys.dm_db_index_operational_stats ( null, null, null, null) order by page_latch_wait_count + page_io_latch_wait_count desc select top 20 * from sys.dm_exec_query_stats order by (total_physical_reads + total_logical_reads + total_logical_writes) / execution_count desc
These are a sampling of queries that can prove useful in monitoring and then investigating blocking. Again, much of this can be easily collected using SQL Server’s SQLDiag tool. Obviously, the more information that is available, the easier it is to investigate, but the costlier it is to monitor. The balance depends largely on the extra load that the system can handle without adversely affecting throughput and response times of actual application work.