
ORA-1652: unable to extend temp segment by 128 in tablespace xxx Troubleshooting

2017-08-08 22:54  潇湘隐者

When you receive the alert ORA-01652: unable to extend temp segment by 128 in tablespace xxxx, how do you troubleshoot it? In most cases xxxx is a temporary tablespace, but it can also be a permanent user tablespace.

 

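Let's reproduce the problem first. Run the SQL statement below in two session windows. The view it queries is a bit special (I was too lazy to construct a statement that burns through temp segments, so I reused a script from a case I had at hand): it contains a DISTINCT operation that consumes a large number of temporary segments in the TEMP tablespace.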

 

 

SQL> select count(*) from v_ies_go_information;

 

 

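With two sessions running the statement above, both consume a large amount of temp segment space, and you can catch the statement with one of the queries below: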

 

 

For 8.1.7 to 9.2

 

SELECT A.USERNAME, A.SID, A.SERIAL#, A.OSUSER, B.TABLESPACE, B.BLOCKS, C.SQL_TEXT

FROM V$SESSION A, V$SORT_USAGE B, V$SQLAREA C

WHERE A.SADDR = B.SESSION_ADDR

AND C.ADDRESS= A.SQL_ADDRESS

AND C.HASH_VALUE = A.SQL_HASH_VALUE

ORDER BY B.TABLESPACE, B.BLOCKS;

 

 

For 10.1 and above:

COL  USERNAME FOR A16;

COL  OSUSER FOR A16;

COL  TABLESPACE FOR A10;

COL  SQL_TEXT FOR A160;

SELECT A.USERNAME, A.SID, A.SERIAL#, A.OSUSER, B.TABLESPACE, B.BLOCKS, C.SQL_TEXT

FROM GV$SESSION A, GV$TEMPSEG_USAGE B, GV$SQLAREA C

WHERE A.INST_ID = B.INST_ID

AND A.INST_ID = C.INST_ID

AND A.SADDR = B.SESSION_ADDR

AND C.ADDRESS= A.SQL_ADDRESS

AND C.HASH_VALUE = A.SQL_HASH_VALUE

ORDER BY B.TABLESPACE, B.BLOCKS;

 

 

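The number of temp blocks in use changes constantly; the screenshot below is just one sample of the query output.

[Screenshot: one sample of the query output]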

 

 

You can also use the queries below to find the SQL_ID currently consuming temp segments and how much space it is using. These figures change in real time.

 

 
 
SQL> SELECT SQL_ID,SUM(BLOCKS) FROM GV$TEMPSEG_USAGE GROUP BY SQL_ID ORDER BY 2 DESC;
 
SQL_ID        SUM(BLOCKS)
------------- -----------
cw4d8h5fudg6b      456704
 
SQL> SELECT TABLESPACE_NAME,TOTAL_BLOCKS,USED_BLOCKS,FREE_BLOCKS FROM V$SORT_SEGMENT;
 
TABLESPACE_NAME                 TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS
------------------------------- ------------ ----------- -----------
TEMPSCM2                             1048320      506368      541952
 
SQL> SELECT TABLESPACE_NAME,TOTAL_BLOCKS,USED_BLOCKS,FREE_BLOCKS FROM V$SORT_SEGMENT;
 
TABLESPACE_NAME                 TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS
------------------------------- ------------ ----------- -----------
TEMPSCM2                             1048320     1030144       18176
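
To read these block counts as sizes, multiply them by the block size. The following is a minimal Python sketch, assuming an 8 KB block size (the same 8192 that the usage query later in this post hardcodes); `blocks_to_mb` is a hypothetical helper name, not an Oracle function.

```python
BLOCK_SIZE = 8192  # assumption: 8 KB blocks, as hardcoded elsewhere in this post


def blocks_to_mb(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return blocks * block_size / (1024 * 1024)


# V$SORT_SEGMENT figures for TEMPSCM2 taken from the two samples above.
total_mb = blocks_to_mb(1048320)
first_used_mb = blocks_to_mb(506368)
second_used_mb = blocks_to_mb(1030144)
second_free_mb = blocks_to_mb(18176)

# The "by 128" in the error message is a request for 128 more blocks.
extend_request_mb = blocks_to_mb(128)

if __name__ == "__main__":
    print(total_mb)           # 8190.0
    print(first_used_mb)      # 3956.0
    print(second_used_mb)     # 8048.0
    print(second_free_mb)     # 142.0
    print(extend_request_mb)  # 1.0
```

Under that assumption the tablespace is about 8 GB, usage grew from roughly 3.9 GB to 7.9 GB between the two samples, and each failed extension asks for 1 MB.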

 


 

 

In another window, run the query below from time to time to watch temporary tablespace usage change:

 

 

SELECT D.TABLESPACE_NAME,
       SPACE                                      "SUM_SPACE(M)",
       BLOCKS                                     "SUM_BLOCKS",
       NVL(USED_SPACE, 0)                         "USED_SPACE(M)",
       ROUND(NVL(USED_SPACE, 0) / SPACE * 100, 2) "USED_RATE(%)",
       SPACE - NVL(USED_SPACE, 0)                 "FREE_SPACE(M)"
FROM   (SELECT TABLESPACE_NAME,
               ROUND(SUM(BYTES) / ( 1024 * 1024 ), 2) SPACE,
               SUM(BLOCKS)                            BLOCKS
        FROM   DBA_TEMP_FILES
        GROUP  BY TABLESPACE_NAME) D,
       -- 8192 assumes an 8 KB block size; adjust if yours differs
       (SELECT TABLESPACE,
               ROUND(SUM(BLOCKS * 8192) / ( 1024 * 1024 ), 2) USED_SPACE
        FROM   V$SORT_USAGE
        GROUP  BY TABLESPACE) F
WHERE  D.TABLESPACE_NAME = F.TABLESPACE(+)
  AND  D.TABLESPACE_NAME='TEMPSCM2';

 

[Screenshot: temporary tablespace usage returned by the query above]

 

 

 

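Very often, though, by the time the alert-log email arrives the SQL statement has already finished. In my test, for example, once ORA-1652 is raised the statement has already terminated and returned the error to the session.

[Screenshot: the test session returning the ORA-01652 error]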

 

Mon Aug 07 22:23:40 CST 2017

ORA-1652: unable to extend temp segment by 128 in tablespace                 TEMPSCM2

Mon Aug 07 22:23:40 CST 2017

ORA-1652: unable to extend temp segment by 128 in tablespace                 TEMPSCM2

 

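At this point the queries above can no longer capture the relevant information, because the temp segments those sessions used have already been released and reclaimed (by SMON, according to the examples quoted later in this post). As shown below, the test environment returns nothing at all. In a production environment you may get misleading results instead: the statements you find are not the ones that caused the problem. The queries above are suitable only for checking current temporary tablespace usage, not for tracing an ORA-1652 error that has already occurred.

[Screenshots: the queries above now return no rows]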

 

 

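So what should we do? We can use an ASH report to locate the SQL statement that consumed a large amount of temp space. After receiving an ORA-01652 alert, it is best to take a snapshot promptly and then generate an ASH report around the time ORA-01652 appears in the alert log. In this test the error occurred at 22:23:40, so we generate the ASH report for 22:20 to 22:25. Adjust the window as needed, but keep it as narrow as possible so that the problem SQL can be pinpointed.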

 

SQL> @?/rdbms/admin/ashrpt.sql
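
Picking the report window from the alert log can be scripted. The sketch below is a minimal Python illustration, assuming the alert log uses the timestamp layout shown in this post (a timestamp line followed by the ORA-1652 line). The function names `find_ora_1652` and `ash_window` are hypothetical, and the margins of 3 minutes before and 2 minutes after are assumptions chosen only to reproduce the 22:20 to 22:25 window used above.

```python
import re
from datetime import datetime, timedelta

MONTHS = {name: num for num, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# e.g. "Mon Aug 07 22:23:40 CST 2017" (the time zone token is optional)
TS_RE = re.compile(
    r"^\w{3} (\w{3}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (?:\w+ )?(\d{4})$")
ORA_RE = re.compile(
    r"ORA-0*1652: unable to extend temp segment by (\d+) in tablespace\s+(\S+)")


def find_ora_1652(lines):
    """Return (timestamp, blocks, tablespace) for each ORA-1652 entry."""
    events = []
    last_ts = None
    for raw in lines:
        line = raw.strip()
        m = TS_RE.match(line)
        if m and m.group(1) in MONTHS:
            mon, day, hh, mm, ss, year = m.groups()
            last_ts = datetime(int(year), MONTHS[mon], int(day),
                               int(hh), int(mm), int(ss))
            continue
        m = ORA_RE.search(line)
        if m and last_ts is not None:
            events.append((last_ts, int(m.group(1)), m.group(2)))
    return events


def ash_window(ts, before_min=3, after_min=2):
    """Return a (start, end) window around ts, truncated to whole minutes."""
    start = (ts - timedelta(minutes=before_min)).replace(second=0, microsecond=0)
    end = (ts + timedelta(minutes=after_min)).replace(second=0, microsecond=0)
    return start, end


if __name__ == "__main__":
    sample = [
        "Mon Aug 07 22:23:40 CST 2017",
        "ORA-1652: unable to extend temp segment by 128 in tablespace                 TEMPSCM2",
    ]
    for ts, blocks, tablespace in find_ora_1652(sample):
        print(tablespace, blocks, ash_window(ts))
```

The resulting start and end times are what you would type at the ashrpt.sql prompts; widen the margins if the statement is known to run for a long time before failing.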

 

[Screenshot: running ashrpt.sql]

 

 

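Next, find the SQL ID of the relevant statement in the Top SQL section of the ASH report, use the awrsqrpt report to get that statement's execution plan, as shown below, and check whether it used a large amount of temp space. A production environment will be far more complicated: Top SQL will list several statements, and you need to examine them carefully. This analysis is laborious, which is why the time window chosen for the ASH report matters so much.

[Screenshot: Top SQL section of the ASH report]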

 

 

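As shown below, the execution plan that awrsqrpt returns for this SQL_ID shows that the HASH UNIQUE step (the DISTINCT operation) used close to 3 GB of temp space. Other operations in the plan need temp space as well, so running the statement in two sessions at once triggered ORA-1652.

[Screenshot: execution plan from the awrsqrpt report]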

 

 

 

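An ASH report can usually pinpoint the SQL statement that consumed the temp space, but the analysis sometimes takes a long time. How Can Temporary Segment Usage Be Monitored Over Time? (Doc ID 364417.1) describes how to monitor temp segment usage, as follows: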

 

 

-- Create a table to hold your temporary space monitoring

-- Preferably place the table in a tablespace suited to your environment, not the SYSTEM tablespace

 

CREATE TABLE MONITOR_TEMP_SEG_USAGE

(

   DATE_TIME  DATE,

   USERNAME   VARCHAR2(30),

   SID        VARCHAR2(6),

   SERIAL#    VARCHAR2(6),

   OS_USER    VARCHAR2(30),

   SPACE_USED NUMBER,

   SQL_TEXT   VARCHAR2(1000)

);

 

 

-- Create a stored procedure that inserts SQL statements whose temp segment usage exceeds the threshold into MONITOR_TEMP_SEG_USAGE

 

CREATE OR REPLACE PROCEDURE MONITOR_TEMP_SEG_USAGE_INSERT IS

BEGIN

 

INSERT INTO MONITOR_TEMP_SEG_USAGE

SELECT sysdate,a.username, a.sid, a.serial#, a.osuser, b.blocks, c.sql_text

FROM v$session a, v$sort_usage b, v$sqlarea c

WHERE b.tablespace = 'TEMP' -- replace with your temporary tablespace name

AND a.saddr = b.session_addr

AND c.address= a.sql_address

AND c.hash_value = a.sql_hash_value

AND b.blocks*(select block_size from dba_tablespaces where tablespace_name = b.tablespace) > 1024; -- threshold in bytes; raise it to suit your environment

COMMIT;

 

END;

/

 

 

-- Create a job that runs every 5 minutes and captures SQL statements whose temp segment usage exceeds the threshold.

 

SQL> SELECT JOB FROM DBA_JOBS;

 

       JOB

----------

       141

       142

 

 

BEGIN

DBMS_JOB.ISUBMIT(JOB => 20,

WHAT => 'MONITOR_TEMP_SEG_USAGE_INSERT;',

NEXT_DATE => SYSDATE,

INTERVAL => 'SYSDATE + (5/1440)');

COMMIT;

END;

/

 

 

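In addition, on Oracle 11.2 and later you can use the query below to find SQL statements whose temp usage exceeded a given threshold. By tuning the conditions you can usually find the SQL that caused ORA-01652 this way: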

 

 

SELECT SQL_ID,MAX(TEMP_SPACE_ALLOCATED)/(1024*1024*1024) GIG 
FROM DBA_HIST_ACTIVE_SESS_HISTORY 
WHERE 
SAMPLE_TIME > SYSDATE-2 AND 
TEMP_SPACE_ALLOCATED > (1024*1024*1024) 
GROUP BY SQL_ID ORDER BY SQL_ID;
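
To make the logic of this query explicit, here is a minimal Python sketch of the same aggregation over in-memory rows: keep samples above the threshold, then take the peak per SQL_ID. The function name `top_temp_consumers` and all sample values are hypothetical (only the SQL_ID cw4d8h5fudg6b comes from this post), and unlike the query the sketch sorts by peak usage rather than by SQL_ID.

```python
from collections import defaultdict

GIB = 1024 ** 3  # bytes per GiB, matching 1024*1024*1024 in the query


def top_temp_consumers(samples, threshold_bytes=GIB):
    """samples: iterable of (sql_id, temp_space_allocated_in_bytes).

    Returns [(sql_id, peak_gib)] for samples above the threshold,
    largest peak first.
    """
    peak = defaultdict(int)
    for sql_id, temp_bytes in samples:
        # Mirrors WHERE TEMP_SPACE_ALLOCATED > threshold (NULLs never match).
        if temp_bytes is not None and temp_bytes > threshold_bytes:
            peak[sql_id] = max(peak[sql_id], temp_bytes)
    return sorted(((sql_id, b / GIB) for sql_id, b in peak.items()),
                  key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical ASH samples, for illustration only.
    rows = [
        ("cw4d8h5fudg6b", 2 * GIB),
        ("cw4d8h5fudg6b", 3 * GIB),
        ("aaaaaaaaaaaaa", GIB // 2),
        ("bbbbbbbbbbbbb", int(1.5 * GIB)),
        ("ccccccccccccc", None),
    ]
    for sql_id, gib in top_temp_consumers(rows):
        print(sql_id, gib)
```

Sorting by peak usage puts the most likely culprit first; in the SQL itself the equivalent change would be ordering by the second column in descending order.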

 

 

 

The above largely covers how to troubleshoot ORA-1652. Next, here are the scenarios in which ORA-1652 occurs; they are very helpful for understanding its causes and effects.

 

EXAMPLE 1:

Temporary tablespace TEMP is being used and is 50 GB in size (a recommended minimum for 11g)

TIME 1 : Session 1 starts a long running query
TIME 2 : Session 2 starts a query and at this point in time Session 1 has consumed 48 GB of TEMP's free space
TIME 3 : Session 1 and Session 2 receive an ORA-1652 because the tablespace has exhausted all of its free space
    Both sessions fail, and all temp space used by the sessions is freed (the segments used are marked FREE for reuse)
TIME 4 : SMON cleans up the temporary segments used by Session 1 and Session 2 (deallocates the storage)
TIME 5 : Queries are run against the views V$SORT_USAGE or V$TEMPSEG_USAGE and V$SORT_SEGMENT ... and it is found that no space is being used (this is normal)

EXAMPLE 2: 

Permanent tablespace INDEX_TBS is being used and has 20gb of space free 

TIME 1 : Session 1 begins a CREATE INDEX command with the index stored in INDEX_TBS 
TIME 2 : Session 1 exhausts all of the free space in INDEX_TBS; as a result the CREATE INDEX abends
TIME 3 : SMON cleans up the temporary segments that were used to attempt to create the index
TIME 4 : Queries are run against the views V$SORT_USAGE or V$TEMPSEG_USAGE ... and it is found that the INDEX_TBS has no space used (this is normal)

 

 

In some cases, you may find that the ORA-1652 is not reported for a temporary tablespace, but a permanent one. This is not an abnormal behaviour and it can occur for example while creating or dropping objects like tables and indexes in permanent tablespaces. Reference : Note 19047.1 - OERR: ORA 1652 unable to extend temp segment by %s in tablespace %s
In such cases the following note will be of use :
  Note 100492.1 - ORA-01652: Estimate Space Needed to CREATE INDEX

 


If the tablespace in which the TEMPORARY segment resides is of type PERMANENT, also check that the following events are not set in the initialization parameter file:

event="10061 trace name context forever, level 10"
event="10269 trace name context forever, level 10"

If they are set, unset them and restart database.

These two events prevent SMON from cleaning up.

Reference : Note 1039341.6 - Temporary Segments Are Not Being De-Allocated After a Sort

 

                                                                                                                                                  
                                                                                                                                                               
 

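Finally, the fix. There are two solutions to ORA-01652:

1: If the temporary tablespace really is too small, extend it: add a temp file, or enable autoextend on the existing temp files.

2: Tune the SQL statements that consume large amounts of temp space so that they use less.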

 

In a RAC environment things are somewhat different; see NOTE:280578.1 - Troubleshooting ORA-1652 Errors in RAC.

 

 

 

 

References:

 

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=144013041361565&id=793380.1&displayIndex=4&_afrWindowMode=0&_adf.ctrl-state=sucf6uzjm_402

 

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=117616124858929&id=317441.1&_afrWindowMode=0&_adf.ctrl-state=n218pg0kt_171

 

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=147317477297631&id=100492.1&_afrWindowMode=0&_adf.ctrl-state=sucf6uzjm_672

 


 

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=148488302921419&parent=DOCUMENT&sourceId=317441.1&id=364417.1&_afrWindowMode=0&_adf.ctrl-state=sucf6uzjm_936

 
