- Overview of Latch and Mutex Architecture
- Measuring Latch and Mutex Contention
- Latch and Mutex Scenarios
- Intractable Latch Contention
- Summary
Measuring Latch and Mutex Contention
As with most contention scenarios, the wait interface and time model provide the best way to determine the extent of any contention that might exist. Time spent in latch or mutex sleeps is recorded in V$SYSTEM_EVENT and similar tables and usually is the primary indication that a problem exists.
However, be aware that the wait interface records only latch sleeps; latch misses do not result in a wait being recorded, even though they might consume CPU (if the session spins on the latch). Therefore, latch misses should be considered to be a lesser but still important aspect of latch contention.
Prior to Oracle Database 10g, a single latch free wait event was recorded for all latch sleeps. From Oracle 10g onward, certain latches now have their own event—such as latch: cache buffers chains. Not all latches have their own event, though, and those that do not continue to be included in the latch free wait.
Mutex waits are represented by waits such as library cache: mutex X, which represents a wait on an exclusive library cache mutex.
To break out mutex and latch waits and compare them to other high-level wait categories, we could issue a query such as that shown in Listing 16.1.
Listing 16.1 Latch Wait Times
SQL> WITH system_event AS 2 (SELECT CASE WHEN (event LIKE '%latch%' or event 3 LIKE '%mutex%' or event like 'cursor:%') 4 THEN event ELSE wait_class 5 END wait_type, e.* 6 FROM v$system_event e) 7 SELECT wait_type,SUM(total_waits) total_waits, 8 round(SUM(time_waited_micro)/1000000,2) time_waited_seconds, 9 ROUND( SUM(time_waited_micro) 10 * 100 11 / SUM(SUM(time_waited_micro)) OVER (), 2) pct 12 FROM (SELECT wait_type, event, total_waits, time_waited_micro 13 FROM system_event e 14 UNION 15 SELECT 'CPU', stat_name, NULL, VALUE 16 FROM v$sys_time_model 17 WHERE stat_name IN ('background cpu time', 'DB CPU')) l 18 WHERE wait_type <> 'Idle' 19 GROUP BY wait_type 20 ORDER BY 4 DESC 21 / WAIT_TYPE TOTAL_WAITS TIME_WAITED_SECONDS PCT --------------------------------- ------------ ------------------- ------ CPU 1,494.63 69.26 latch: shared pool 1,066,478 426.20 19.75 latch free 93,672 115.66 5.36 wait list latch free 336 58.91 2.73 User I/O 9,380 27.28 1.26 latch: cache buffers chains 2,058 8.74 .40 Other 50 7.26 .34 System I/O 6,166 6.37 .30 cursor: pin S 235 3.05 .14 Concurrency 60 3.11 .14 library cache: mutex X 257,469 2.52 .12
Of course, this query reports all waits since the database first started. To get a view over a specific period of time, you would need to run the query twice and compare totals. We can also observe the ongoing state of these statistics in the Oracle Cloud Control, in third-party tools such as Toad, or by using Automatic Workload Repository (AWR) or Statspack reports.
Identifying Individual Latches
If we’re lucky, the latch that is responsible for whatever latch contention exists will be identified by its specific wait event—latch: cache buffers chains, for instance. However, this won’t always be the case; some latches are included in the general-purpose latch free event, and some might be recorded against the event wait list latch free.
The wait list latch free event relates to Oracle’s latch wait posting algorithm. Oracle implements a latch wait list that allows sessions sleeping on a latch to be woken when the latch becomes available. When a session sleeps on a latch, it normally places itself on the latch wait list and is woken by the session that releases the latch. If there’s heavy contention on the wait list, then the wait list latch free event may occur.
If the specific latch waits are being obscured by these general-purpose latch free events, then you may need to examine V$LATCH, which includes latch statistics for each specific latch. The V$LATCH view records the number of gets, misses, sleeps, and wait times for each latch. The query in Listing 16.2 interrogates this view to identify the latches with the most sleeps and wait times.
Listing 16.2 Latch Miss Statistics
SQL> WITH latch AS ( 2 SELECT name, 3 ROUND(gets * 100 / SUM(gets) OVER (), 2) pct_of_gets, 4 ROUND(misses * 100 / SUM(misses) OVER (), 2) pct_of_misses, 5 ROUND(sleeps * 100 / SUM(sleeps) OVER (), 2) pct_of_sleeps, 6 ROUND(wait_time * 100 / SUM(wait_time) OVER (), 2) 7 pct_of_wait_time 8 FROM v$latch) 9 SELECT * 10 FROM latch 11 WHERE pct_of_wait_time > .1 OR pct_of_sleeps > .1 12 ORDER BY pct_of_wait_time DESC; Pct of Pct of Pct of Pct of NAME Gets Misses Sleeps Wait Time ------------------------------ ------ ------ ------ --------- cache buffers chains 99.59 99.91 70.59 89.75 shared pool .07 .03 16.69 7.78 session allocation .18 .05 11.39 1.88 row cache objects .07 .00 .78 .24 simulator lru latch .01 .00 .31 .18 parameter table management .00 .00 .08 .14 channel operations parent latc .00 .00 .16 .02
Drilling into Segments and SQLs
Determining the latches associated with contention is usually not enough to identify the root cause. We most likely need to identify the SQLs and segments involved.
If you have an Oracle diagnostic pack license, then you can query the Active Session History (ASH) and/or AWR tables to identify the SQLs and segments associated with particular wait conditions. The query in Listing 16.3 identifies entries in the ASH table associated with latch contention.
Listing 16.3 Finding Latch Contention with ASH
SQL> l 1 WITH ash_query AS ( 2 SELECT event, program, 3 h.module, h.action, object_name, 4 SUM(time_waited)/1000 reltime, COUNT( * ) waits, 5 username, sql_text, 6 RANK() OVER (ORDER BY COUNT(*) DESC) AS wait_rank 7 FROM v$active_session_history h 8 JOIN dba_users u USING (user_id) 9 LEFT OUTER JOIN dba_objects o 10 ON (o.object_id = h.current_obj#) 11 LEFT OUTER JOIN v$sql s USING (sql_id) 12 WHERE (event LIKE '%latch%' or event like '%mutex%') 13 GROUP BY event,program, h.module, h.action, 14 object_name, sql_text, username) 15 SELECT event,module, username, object_name, waits, 16 sql_text 17 FROM ash_query 18 WHERE wait_rank < 11 19* ORDER BY wait_rank SQL> / EVENT MODULE USERNAME OBJECT_NAME WAITS ------------------------- ------------ -------- ------------ ---------- SQL_TEXT ------------------------------------------------------------ library cache: mutex X SQL*Plus OPSG 13 latch: shared pool SQL*Plus OPSG 8 latch: shared pool SQL*Plus OPSG LT_SALES_PK 3 begin latch_test(10000,10000,1000000,10000); end; library cache: mutex X SQL*Plus OPSG LT_SALES_PK 2 library cache: mutex X SQL*Plus OPSG LT_SALES 1 begin latch_test(10000,1,10000,10000); end; latch: shared pool SQL*Plus OPSG 1 SELECT quantity_sold , amount_sold FROM lt_sales t539564 WH ERE id BETWEEN 124410 AND 360759 latch: shared pool SQL*Plus OPSG 1 SELECT quantity_sold , amount_sold FROM lt_sales t539571 WH ERE id BETWEEN 512313 AND 825315 library cache: mutex X SQL*Plus OPSG 1 SELECT quantity_sold , amount_sold FROM lt_sales t539563 WH ERE id BETWEEN 698302 AND 392634 latch: shared pool SQL*Plus OPSG LT_SALES_PK 1 SELECT quantity_sold , amount_sold FROM lt_sales t539555 WH ERE id BETWEEN 387009 AND 268338
If you don’t have a Diagnostic Pack license, then you can indirectly identify the SQLs by focusing on those SQLs with the highest concurrency wait times. The concurrency wait class includes most commonly encountered latch and mutex waits, although it also includes some internal locks and buffer waits. However, if you’re encountering high rates of latch contention, it’s a fair bet that the SQLs with the highest concurrency waits are the ones you want to look at.
Listing 16.4 pulls out the SQLs with the highest concurrency waits.
Listing 16.4 Identifying High Contention SQL
SQL> WITH sql_conc_waits AS 2 (SELECT sql_id, SUBSTR(sql_text, 1, 80) sql_text, 3 concurrency_wait_time/1000 con_time_ms, 4 elapsed_time, 5 ROUND(concurrency_wait_Time * 100 / 6 elapsed_time, 2) con_time_pct, 7 ROUND(concurrency_wait_Time* 100 / 8 SUM(concurrency_wait_Time) OVER (), 2) pct_of_con_time, 9 RANK() OVER (ORDER BY concurrency_wait_Time DESC) ranking 10 FROM v$sql 11 WHERE elapsed_time > 0) 12 SELECT sql_text, con_time_ms, con_time_pct, 13 pct_of_con_time 14 FROM sql_conc_waits 15 WHERE ranking <= 10 16 ORDER BY ranking ; SQL Conc % Tot SQL Text Conc Time(ms) Time% ConcTime ---------------------------------------- ------------- -------- -------- DECLARE job BINARY_INTEGER := :job; next 899 18.41 44.21 _date DATE := :mydate; broken BOOLEAN : select max(data) from log_data where id< 472 .01 23.18 :id begin query_loops ( run_seconds=>120 , 464 .01 22.80 hi_val =>1000 , use_ update sys.aud$ set action#=:2, returnco 143 75.46 7.02 de=:3, logoff$time=cast(SYS_EXTRACT_UTC (
As expected the SQL that generated the latch waits is found (the second and third entries are from a job that generated the latch waits). However, other SQLs—associated with waits for certain internal Oracle locks—are also shown. You’ll need to exercise judgment to determine which SQLs are most likely associated with your latch waits.