## Oracle Tuning 基础概述01 - Oracle 常见等待事件

2015-05-25 23:32  AlfredZhao  阅读(...)  评论(...编辑  收藏

# 官方文档对等待事件的分类

Classes of Wait Events

Every wait event belongs to a class of wait event. The following list describes each of the wait classes.

Waits resulting from DBA commands that cause users to wait (for example, an index rebuild)

Application

Waits resulting from user application code (for example, lock waits caused by row level locking or explicit lock commands)

Cluster

Waits related to Real Application Clusters resources (for example, global cache resources such as 'gc cr block busy')

Commit

This wait class only comprises one wait event - wait for redo log write confirmation after a commit (that is, 'log file sync')

Concurrency

Waits for internal database resources (for example, latches)

Configuration

Waits caused by inadequate configuration of database or instance resources (for example, undersized log file sizes, shared pool size)

Idle

Waits that signify the session is inactive, waiting for work (for example, 'SQL*Net message from client')

Network

Waits related to network messaging (for example, 'SQL*Net more data to dblink')

Other

Waits which should not typically occur on a system (for example, 'wait for EMON to spawn')

Queue

Contains events that signify delays in obtaining additional data in a pipelined environment. The time spent on these wait events indicates inefficiency or other problems in the pipeline. It affects features such as Oracle Streams, parallel queries, or DBMS_PIPE PL/SQL packages.

Scheduler

Resource Manager related waits (for example, 'resmgr: become active')

System I/O

Waits for background process I/O (for example, DBWR wait for 'db file parallel write')

User I/O

Waits for user I/O (for example 'db file sequential read')

# Oracle 常见等待事件

+ [log file sync](#2.1) + [log buffer space](#2.2) + [log file switch](#2.3) + [log file parallel write](#2.4) + [buffer busy waits](#2.5) + [free buffer waits](#2.6) + [library cache pin](#2.7) + [library cache lock](#2.8) + [latch events](#2.9) + [direct path read and direct path read temp](#2.10) + [direct path write and direct path write temp](#2.11) + [db file sequential read](#2.12) + [db file scattered read](#2.13) + [read by other session](#2.14) + [cursor: pin S wait on X](#2.15) + [SQL*Net Events](#2.16)

## log buffer space

log buffer空间问题。 > This event occurs when server processes are waiting for free space in the log buffer, because all the redo is generated faster than LGWR can write it out. > > **Actions** > > Modify the redo log buffer size. If the size of the log buffer is reasonable, then ensure that the disks on which the online redo logs reside do not suffer from I/O contention. The log buffer space wait event could be indicative of either disk I/O contention on the disks where the redo logs reside, or of a too-small log buffer. Check the I/O profile of the disks containing the redo logs to investigate whether the I/O system is the bottleneck. If the I/O system is not a problem, then the redo log buffer could be too small. Increase the size of the redo log buffer until this event is no longer significant.

## log file switch

redo日志文件切换。 > There are two wait events commonly encountered: > > log file switch (archiving needed) > > log file switch (checkpoint incomplete) > > In both of the events, the LGWR cannot switch into the next online redo log file. All the commit requests wait for this event. > > **Actions** > > For the log file switch (archiving needed) event, examine why the archiver cannot archive the logs in a timely fashion. It could be due to the following: > > Archive destination is running out of free space. > > Archiver is not able to read redo logs fast enough (contention with the LGWR). > > Archiver is not able to write fast enough (contention on the archive destination, or not enough ARCH processes). If you have ruled out other possibilities (such as slow disks or a full archive destination) consider increasing the number of ARCn processes. The default is 2. > > If you have mandatory remote shipped archive logs, check whether this process is slowing down because of network delays or the write is not completing because of errors. > > Depending on the nature of bottleneck, you might need to redistribute I/O or add more space to the archive destination to alleviate the problem. For the log file switch (checkpoint incomplete) event: > > Check if DBWR is slow, possibly due to an overloaded or slow I/O system. Check the DBWR write times, check the I/O system, and distribute I/O if necessary. See Chapter 8, "I/O Configuration and Design". > > Check if there are too few, or too small redo logs. If you have a few redo logs or small redo logs (for example, 2 x 100k logs), and your system produces enough redo to cycle through all of the logs before DBWR has been able to complete the checkpoint, then increase the size or number of redo logs. See "Sizing Redo Log Files".

## log file parallel write

redo日志文件并行写。 > This event involves writing redo records to the redo log files from the log buffer.

SGA中的buffer争用。 > This wait indicates that there are some buffers in the buffer cache that multiple processes are attempting to access concurrently. Query V$WAITSTAT for the wait statistics for each class of buffer. Common buffer classes that have buffer busy waits include data block, segment header, undo header, and undo block. > > Check the following V$SESSION_WAIT parameter columns: > > P1: File ID > > P2: Block ID > > P3: Class ID > > **10.3.1.1 Causes** > > To determine the possible causes, first query V$SESSION to identify the value of ROW_WAIT_OBJ# when the session waits for buffer busy waits. For example: > > SELECT row_wait_obj# > FROM V$SESSION > WHERE EVENT = 'buffer busy waits'; > > To identify the object and object type contended for, query DBA_OBJECTS using the value for ROW_WAIT_OBJ# that is returned from V$SESSION. For example: > > SELECT owner, object_name, subobject_name, object_type > FROM DBA_OBJECTS > WHERE data_object_id = &row_wait_obj; > > **10.3.1.2 Actions** > > The action required depends on the class of block contended for and the actual segment. > > **10.3.1.2.1 segment header** > > If the contention is on the segment header, then this is most likely free list contention. > > Automatic segment-space management in locally managed tablespaces eliminates the need to specify the PCTUSED, FREELISTS, and FREELIST GROUPS parameters. If possible, switch from manual space management to automatic segment-space management (ASSM). > > The following information is relevant if you are unable to use ASSM (for example, because the tablespace uses dictionary space management). > > A free list is a list of free data blocks that usually includes blocks existing in several different extents within the segment. Free lists are composed of blocks in which free space has not yet reached PCTFREE or used space has shrunk below PCTUSED. Specify the number of process free lists with the FREELISTS parameter. The default value of FREELISTS is one. The maximum value depends on the data block size. > > To find the current setting for free lists for that segment, run the following: > > SELECT SEGMENT_NAME, FREELISTS > FROM DBA_SEGMENTS > WHERE SEGMENT_NAME = segment name > AND SEGMENT_TYPE = segment type; > > Set free lists, or increase the number of free lists. If adding more free lists does not alleviate the problem, then use free list groups (even in single instance this can make a difference). If using Oracle RAC, then ensure that each instance has its own free list group(s). > > See Also: > Oracle Database Concepts for information about automatic segment-space management, free lists, PCTFREE, and PCTUSED > > **10.3.1.2.2 data block** > > If the contention is on tables or indexes (not the segment header): > > Check for right-hand indexes. These are indexes that are inserted into at the same point by many processes. For example, those that use sequence number generators for the key values. > > Consider using ASSM, global hash partitioned indexes, or increasing free lists to avoid multiple processes attempting to insert into the same block. > > **10.3.1.2.3 undo header** > > For contention on rollback segment header: > > If you are not using automatic undo management, then add more rollback segments. > > **10.3.1.2.4 undo block** > > For contention on rollback segment block: > > If you are not using automatic undo management, then consider making rollback segment sizes larger. > ## free buffer waits SGA中free buffer不足时触发的等待。 > This wait event indicates that a server process was unable to find a free buffer and has posted the database writer to make free buffers by writing out dirty buffers. A dirty buffer is a buffer whose contents have been modified. Dirty buffers are freed for reuse when DBWR has written the blocks to disk. > > **10.3.8.1 Causes** > > DBWR may not be keeping up with writing dirty buffers in the following situations: > > The I/O system is slow. > > There are resources it is waiting for, such as latches. > > The buffer cache is so small that DBWR spends most of its time cleaning out buffers for server processes. > > The buffer cache is so big that one DBWR process is not enough to free enough buffers in the cache to satisfy requests. > > **10.3.8.2 Actions** > > If this event occurs frequently, then examine the session waits for DBWR to see whether there is anything delaying DBWR. > > **10.3.8.2.1 Writes** > > If it is waiting for writes, then determine what is delaying the writes and fix it. Check the following: > > Examine V$FILESTAT to see where most of the writes are happening. > > Examine the host operating system statistics for the I/O system. Are the write times acceptable? > > If I/O is slow: > > Consider using faster I/O alternatives to speed up write times. > > Spread the I/O activity across large number of spindles (disks) and controllers. See Chapter 8, "I/O Configuration and Design" for information about balancing I/O. > > **10.3.8.2.2 Cache is Too Small** > > It is possible DBWR is very active because the cache is too small. Investigate whether this is a probable cause by looking to see if the buffer cache hit ratio is low. Also use the V$DB_CACHE_ADVICE view to determine whether a larger cache size would be advantageous. See "Sizing the Buffer Cache". > > **10.3.8.2.3 Cache Is Too Big for One DBWR** > > If the cache size is adequate and the I/O is evenly spread, then you can potentially modify the behavior of DBWR by using asynchronous I/O or by using multiple database writers. > > **10.3.8.3 Consider Multiple Database Writer (DBWR) Processes or I/O Slaves** > > Configuring multiple database writer processes, or using I/O slaves, is useful when the transaction rates are high or when the buffer cache size is so large that a single DBWn process cannot keep up with the load. > > **10.3.8.3.1 DB_WRITER_PROCESSES** > > The DB_WRITER_PROCESSES initialization parameter lets you configure multiple database writer processes (from DBW0 to DBW9 and from DBWa to DBWj). Configuring multiple DBWR processes distributes the work required to identify buffers to be written, and it also distributes the I/O load over these processes. Multiple db writer processes are highly recommended for systems with multiple CPUs (at least one db writer for every 8 CPUs) or multiple processor groups (at least as many db writers as processor groups). > > Based upon the number of CPUs and the number of processor groups, Oracle Database either selects an appropriate default setting for DB_WRITER_PROCESSES or adjusts a user-specified setting. > > **10.3.8.3.2 DBWR_IO_SLAVES** > > If it is not practical to use multiple DBWR processes, then Oracle Database provides a facility whereby the I/O load can be distributed over multiple slave processes. The DBWR process is the only process that scans the buffer cache LRU list for blocks to be written out. However, the I/O for those blocks is performed by the I/O slaves. The number of I/O slaves is determined by the parameter DBWR_IO_SLAVES. > > DBWR_IO_SLAVES is intended for scenarios where you cannot use multiple DB_WRITER_PROCESSES (for example, where you have a single CPU). I/O slaves are also useful when asynchronous I/O is not available, because the multiple I/O slaves simulate nonblocking, asynchronous requests by freeing DBWR to continue identifying blocks in the cache to be written. Asynchronous I/O at the operating system level, if you have it, is generally preferred. > > DBWR I/O slaves are allocated immediately following database open when the first I/O request is made. The DBWR continues to perform all of the DBWR-related work, apart from performing I/O. I/O slaves simply perform the I/O on behalf of DBWR. The writing of the batch is parallelized between the I/O slaves. > > Note: > Implementing DBWR_IO_SLAVES requires that extra shared memory be allocated for I/O buffers and request queues. Multiple DBWR processes cannot be used with I/O slaves. Configuring I/O slaves forces only one DBWR process to start. > > **10.3.8.3.3 Choosing Between Multiple DBWR Processes and I/O Slaves** > > Configuring multiple DBWR processes benefits performance when a single DBWR process cannot keep up with the required workload. However, before configuring multiple DBWR processes, check whether asynchronous I/O is available and configured on the system. If the system supports asynchronous I/O but it is not currently used, then enable asynchronous I/O to see if this alleviates the problem. If the system does not support asynchronous I/O, or if asynchronous I/O is configured and there is still a DBWR bottleneck, then configure multiple DBWR processes. > > Note: > If asynchronous I/O is not available on your platform, then asynchronous I/O can be disabled by setting the DISK_ASYNCH_IO initialization parameter to FALSE. > > Using multiple DBWRs parallelizes the gathering and writing of buffers. Therefore, multiple DBWn processes should deliver more throughput than one DBWR process with the same number of I/O slaves. For this reason, the use of I/O slaves has been deprecated in favor of multiple DBWR processes. I/O slaves should only be used if multiple DBWR processes cannot be configured. ## library cache pin This event manages library cache concurrency. Pinning an object causes the heaps to be loaded into memory. If a client wants to modify or examine the object, the client must acquire a pin after the lock. ## library cache lock This event controls the concurrency between clients of the library cache. It acquires a lock on the object handle so that either: One client can prevent other clients from accessing the same object The client can maintain a dependency for a long time which does not allow another client to change the object  This lock is also obtained to locate an object in the library cache. ## latch events latch相关的等待事件。 > A latch is a low-level internal lock used by Oracle Database to protect memory structures. The latch free event is updated when a server process attempts to get a latch, and the latch is unavailable on the first attempt. > > There is a dedicated latch-related wait event for the more popular latches that often generate significant contention. For those events, the name of the latch appears in the name of the wait event, such as latch: library cache or latch: cache buffers chains. This enables you to quickly figure out if a particular type of latch is responsible for most of the latch-related contention. Waits for all other latches are grouped in the generic latch free wait event. > > See Also: > Oracle Database Concepts for more information on latches and internal locks > > **10.3.10.1 Actions** > > This event should only be a concern if latch waits are a significant portion of the wait time on the system as a whole, or for individual users experiencing problems. > > Examine the resource usage for related resources. For example, if the library cache latch is heavily contended for, then examine the hard and soft parse rates. > > Examine the SQL statements for the sessions experiencing latch contention to see if there is any commonality. > > Check the following V$SESSION_WAIT parameter columns: > > P1: Address of the latch > > P2: Latch number > > P3: Number of times process has slept, waiting for the latch > > **10.3.10.2 Example: Find Latches Currently Waited For** > > SELECT EVENT, SUM(P3) SLEEPS, SUM(SECONDS_IN_WAIT) SECONDS_IN_WAIT > FROM V$SESSION_WAIT > WHERE EVENT LIKE 'latch%' > GROUP BY EVENT; > > A problem with the previous query is that it tells more about session tuning or instant instance tuning than instance or long-duration instance tuning. > > The following query provides more information about long duration instance tuning, showing whether the latch waits are significant in the overall database time. > > SELECT EVENT, TIME_WAITED_MICRO, > ROUND(TIME_WAITED_MICRO*100/S.DBTIME,1) PCT_DB_TIME > FROM V$SYSTEM_EVENT, > (SELECT VALUE DBTIME FROM V$SYS_TIME_MODEL WHERE STAT_NAME = 'DB time') S > WHERE EVENT LIKE 'latch%' > ORDER BY PCT_DB_TIME ASC; > > A more general query that is not specific to latch waits is the following: > > SELECT EVENT, WAIT_CLASS, > TIME_WAITED_MICRO,ROUND(TIME_WAITED_MICRO*100/S.DBTIME,1) PCT_DB_TIME > FROM V$SYSTEM_EVENT E, V$EVENT_NAME N, > (SELECT VALUE DBTIME FROM V$SYS_TIME_MODEL WHERE STAT_NAME = 'DB time') S > WHERE E.EVENT_ID = N.EVENT_ID > AND N.WAIT_CLASS NOT IN ('Idle', 'System I/O') > ORDER BY PCT_DB_TIME ASC; > > Table 10-3 Latch Wait Event > Latch SGA Area Possible Causes Look For: > > Shared pool, library cache > > > Shared pool > > > Lack of statement reuse > > Statements not using bind variables > > Insufficient size of application cursor cache > > Cursors closed explicitly after each execution > > Frequent logins and logoffs > > Underlying object structure being modified (for example truncate) > > Shared pool too small > > > Sessions (in V$SESSTAT) with high: > > parse time CPU > > parse time elapsed > > Ratio of parse count (hard) / execute count > > Ratio of parse count (total) / execute count > > Cursors (in V$SQLAREA/V$SQLSTATS) with: > > High ratio of PARSE_CALLS / EXECUTIONS > > EXECUTIONS = 1 differing only in literals in the WHERE clause (that is, no bind variables used) > > High RELOADS > > High INVALIDATIONS > > Large (1mb) SHARABLE_MEM > > cache buffers lru chain > > > Buffer cache LRU lists > > > Excessive buffer cache throughput. For example, inefficient SQL that accesses incorrect indexes iteratively (large index range scans) or many full table scans > > DBWR not keeping up with the dirty workload; hence, foreground process spends longer holding the latch looking for a free buffer > > Cache may be too small > > > Statements with very high logical I/O or physical I/O, using unselective indexes > > cache buffers chains > > > Buffer cache buffers > > > Repeated access to a block (or small number of blocks), known as a hot block > > > Sequence number generation code that updates a row in a table to generate the number, rather than using a sequence number generator > > Index leaf chasing from very many processes scanning the same unselective index with very similar predicate > > Identify the segment the hot block belongs to > > row cache objects > > > **10.3.10.3 Shared Pool and Library Cache Latch Contention** > > A main cause of shared pool or library cache latch contention is parsing. There are several techniques that you can use to identify unnecessary parsing and several types of unnecessary parsing: > > Unshared SQL > > Reparsed Sharable SQL > > By Session > > cache buffers lru chain > > cache buffers chains > > row cache objects > > **10.3.10.3.1 Unshared SQL** > > This method identifies similar SQL statements that could be shared if literals were replaced with bind variables. The idea is to either: > > Manually inspect SQL statements that have only one execution to see whether they are similar: > > SELECT SQL_TEXT > FROM V$SQLSTATS > WHERE EXECUTIONS < 4 > ORDER BY SQL_TEXT; > > Or, automate this process by grouping what may be similar statements. Estimate the number of bytes of a SQL statement that are likely the same, and group the SQL statements by this number of bytes. For example, the following example groups statements that differ only after the first 60 bytes. > > SELECT SUBSTR(SQL_TEXT, 1, 60), COUNT(*) > FROM V$SQLSTATS > WHERE EXECUTIONS < 4 > GROUP BY SUBSTR(SQL_TEXT, 1, 60) > HAVING COUNT(*) > 1; > > Or report distinct SQL statements that have the same execution plan. The following query selects distinct SQL statements that share the same execution plan at least four times. These SQL statements are likely to be using literals instead of bind variables. > > SELECT SQL_TEXT FROM V$SQLSTATS WHERE PLAN_HASH_VALUE IN > (SELECT PLAN_HASH_VALUE > FROM V$SQLSTATS > GROUP BY PLAN_HASH_VALUE HAVING COUNT(*) > 4) > ORDER BY PLAN_HASH_VALUE; > > **10.3.10.3.2 Reparsed Sharable SQL** > > Check the V$SQLSTATS view. Enter the following query: > > SELECT SQL_TEXT, PARSE_CALLS, EXECUTIONS > FROM V$SQLSTATS > ORDER BY PARSE_CALLS; > > When the PARSE_CALLS value is close to the EXECUTIONS value for a given statement, you might be continually reparsing that statement. Tune the statements with the higher numbers of parse calls. > > **10.3.10.3.3 By Session** > > Identify unnecessary parse calls by identifying the session in which they occur. It might be that particular batch programs or certain types of applications do most of the reparsing. To achieve this goal, run the following query: > > SELECT pa.SID, pa.VALUE "Hard Parses", ex.VALUE "Execute Count" > FROM V$SESSTAT pa, V$SESSTAT ex > WHERE pa.SID = ex.SID > AND pa.STATISTIC#=(SELECT STATISTIC# > FROM V$STATNAME WHERE NAME = 'parse count (hard)') > AND ex.STATISTIC#=(SELECT STATISTIC# > FROM V$STATNAME WHERE NAME = 'execute count') > AND pa.VALUE > 0; > > The result is a list of all sessions and the amount of reparsing they do. For each session identifier (SID), go to V$SESSION to find the name of the program that causes the reparsing. > > Note: > Because this query counts all parse calls since instance startup, it is best to look for sessions with high rates of parse. For example, a connection which has been up for 50 days might show a high parse figure, but a second connection might have been up for 10 minutes and be parsing at a much faster rate. > > The output is similar to the following: > > SID Hard Parses Execute Count > ------ ----------- ------------- > 7 1 20 > 8 3 12690 > 6 26 325 > 11 84 1619 > > **10.3.10.3.4 cache buffers lru chain** > > The cache buffers lru chain latches protect the lists of buffers in the cache. When adding, moving, or removing a buffer from a list, a latch must be obtained. > > For symmetric multiprocessor (SMP) systems, Oracle Database automatically sets the number of LRU latches to a value equal to one half the number of CPUs on the system. For non-SMP systems, one LRU latch is sufficient. > > Contention for the LRU latch can impede performance on SMP computers with a large number of CPUs. LRU latch contention is detected by querying V$LATCH, V$SESSION_EVENT, and V$SYSTEM_EVENT. To avoid contention, consider tuning the application, bypassing the buffer cache for DSS jobs, or redesigning the application. > > **10.3.10.3.5 cache buffers chains** > > The cache buffers chains latches are used to protect a buffer list in the buffer cache. These latches are used when searching for, adding, or removing a buffer from the buffer cache. Contention on this latch usually means that there is a block that is greatly contended for (known as a hot block). > > To identify the heavily accessed buffer chain, and hence the contended for block, look at latch statistics for the cache buffers chains latches using the view V$LATCH_CHILDREN. If there is a specific cache buffers chains child latch that has many more GETS, MISSES, and SLEEPS when compared with the other child latches, then this is the contended for child latch. > > This latch has a memory address, identified by the ADDR column. Use the value in the ADDR column joined with the X$BH table to identify the blocks protected by this latch. For example, given the address (V$LATCH_CHILDREN.ADDR) of a heavily contended latch, this queries the file and block numbers: > > SELECT OBJ data_object_id, FILE#, DBABLK,CLASS, STATE, TCH > FROM X$BH > WHERE HLADDR = 'address of latch' > ORDER BY TCH; > > X$BH.TCH is a touch count for the buffer. A high value for X$BH.TCH indicates a hot block. > > Many blocks are protected by each latch. One of these buffers will probably be the hot block. Any block with a high TCH value is a potential hot block. Perform this query several times, and identify the block that consistently appears in the output. After you have identified the hot block, query DBA_EXTENTS using the file number and block number, to identify the segment. > > After you have identified the hot block, you can identify the segment it belongs to with the following query: > > SELECT OBJECT_NAME, SUBOBJECT_NAME > FROM DBA_OBJECTS > WHERE DATA_OBJECT_ID = &obj; > > In the query, &obj is the value of the OBJ column in the previous query on X$BH. > > **10.3.10.3.6 row cache objects** > > The row cache objects latches protect the data dictionary.

This event occurs when a session requests a buffer that is currently being read into the buffer cache by another session. Prior to release 10.1, waits for this event were grouped with the other reasons for waiting for buffers under the 'buffer busy wait' event

Wait Time: Time waited for the buffer to be read by the other session (in microseconds)

## cursor: pin S wait on X

A session waits for this event when it is requesting a shared mutex pin and another session is holding an exclusive mutex pin on the same cursor object.

Wait Time: Microseconds

# Reference

《Performance Tuning Guide》-Wait Events Statistics 《Database Administration》-Descriptions of Wait Events