Bug 34885986 - Flashback log file was not reused even if db_flashback_retention_target is passed

0.案例概述

客户有一套11.2.0.4 的 RAC +DG环境,primary端的RAC数据库出现故障,检查日志发现:闪回区的空间使用率达到100%,最终导致了数据库故障。 故障解决后,需要分析该故障的具体原因。

 

1、案例分析

1.1 检查数据库闪回区的配置信息。

S0L> show parameter flash
NAME                              TYPE           VALUE
------------------------------  -------------   ---------
db_flash_cache_file               string   
db_flash_cache_size               integer        0
db_flashback_retention_target     integer        1440

S0L> show parameter recover
NAME                              TYPE           VALUE
------------------------------  -------------   ---------
db_recovery_file_dest             string          +DATA
db_recovery_file_dest_size        integer         100G
db_unrecoverable_scn_tracking     boolean         TRUE
recovery_parallelism              integer         0
S0L>

可见,当前的闪回区大小为100GB,闪回日志的保存时间为1440分钟,也即1天。

1.2  检查闪回区的当前使用情况。

SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE             PERCENT_SPACE_USED   PERCENT_SPACE_RECLAIMABLE    NUMBER_OF_FILES
----------            -------------------  --------------------------   ----------------
CONTROL FILE             0.04                  0                            1
REDO LOG                 2.94                  0                            6
ARHICVED LOG                0                  0                            0
BACKUP PIECE                0                  0                            0
IMAGE COPY                  0                  0                            0
FLASHBACK LOG           41.47                  0                           87
FOREIGN ARCHIVED LOG        0                  0                            0

SQL> select OLDEST_FLASHBACK_SCN, OLDEST_FLASHBACK_TIME, RETENTION_TARGET, ESTIMATED_FLASHBACK_SIZE from v$flashback_database_log;
OLDEST_FLASHBACK_SCN  OLDEST_FLASHBACK_TI   RETENTION_TARGET   ESTIMATED_FLASHBACK_SIZE
--------------------  --------------------  -----------------  -------------------------
676939705              08/16/2024 18:51:02    1440               407937024              

可见,闪回日志占用了整个闪回区41.47%的空间。虽然设置的闪回日志保存时间为1天,但实际上闪回日志重未被自动清理过,数据库可以闪回到一年前。

1.3  检查数据库是否设置了guaranteed restore point。但未发现guaranteed restore point。

1.4 事出反常必有妖。搜索到《Bug 34885986 - Flashback log file was not reused even if db_flashback_retention_target is passed (Doc ID 34885986.8)》这篇文章,果然是一个BUG。

 

2、解决方案

针对该故障,官方提供了两个Workaround。
 There are 2 possible workarounds:
  - workaround#1: deletes all the flashback logs (this is simplest)
  - workaround#2: deletes a subset of the flashback logs, preserves 
                  all the flashback logs within the retention period
                  (db_flashback_retention_target minutes),

note: these 2 workarounds don't prevent the problem from happening, they just fix the problem after it has happened.
  
 Workaround#1:
 Remove all the flashback logs by disabling then re-enabling flashback:
   alter database flashback off;
   alter database flashback on;
  
 Workaround#2:
 Remove the problematic flashback log (and all the old flashback logs which could have been reused in the past but were not) via the following steps:
  1. check what is the approx current disk space usage Mb of all the flashback logs that were generated within the past db_flashback_retention_target minutes:
         select sum((bytes)/(1024*1024)) "Mb"
           from v$flashback_database_logfile
          where first_time >
                  (sysdate - ( db_flashback_retention_target /(24*60)));
     if the archive logs are also stored in the db_recovery_file_dest area (check via: select * form v$flash_recovery_area_usage), then run a similar query against v$archived_log:
         select sum((blocks*block_size)/(1024*1024)) "Mb"
           from v$archived_log
          where first_time >
                  (sysdate - ( db_flashback_retention_target /(24*60)));
  2. Note down the current FRA limit:
       select value from v$parameter
         where name='db_recovery_file_dest_size';
  3. Take the combined total from step#1, then set the FRA limit to say 120% of that value, via:
       alter system set db_recovery_file_dest_size= ...M;
     note: the +20% will allow for a future peak workload not needing to immediately create new flashback logs for the thread.
  4. as soon as the parameter is lowered, the problematic flashback log with the null x$kccfle.flehtim should be deleted, check this via
     running the selects as described in the "Rediscovery"
  5. restore the FRA limit back to its original value from step#2
       alter system set db_recovery_file_dest_size= ...M;

 

3、回顾下闪回区的管理规则:

The following rules govern creating, retaining, overwriting and deleting of flashback logs in the fast recovery area:
*  If the fast recovery area has enough space, then a flashback log is created whenever necessary to satisfy the flashback retention target.
*  If a flashback log is old enough that it is no longer needed to satisfy the flashback retention target, then a flashback log is reused.
*  If the database must create a new flashback log and the fast recovery area is full or there is no disk space, then the oldest flashback log is reused instead.
Note:  Reusing the oldest flashback log shortens the flashback database window. If enough flashback logs are reused due to a lack of disk space, then the flashback retention target may not be satisfied.
*  If the fast recovery area is full, then an archived redo log that is reclaimable according to the fast recovery area rules may be automatically deleted by the fast recovery area to make space for other files. In this case, any flashback logs that would require the use of that redo log file for the use of FLASHBACK DATABASE are also deleted.
Note:  According to fast recovery area rules, a file is reclaimable when one of the following criteria is true:  The file is reported as obsolete and not needed by the flashback database. For example, the file is outside the DB_FLASHBACK_RETENTION_TARGET parameters. The file is backed up to tape.
*  No file in the fast recovery area is eligible for deletion if it is required to satisfy a guaranteed restore point. Thus, retention of flashback logs and other files required to satisfy the guaranteed restore point, in addition to files required to satisfy the backup retention policy, can cause the fast recovery area to fill completely. Consult "Responding to a Full Fast Recovery Area" if your fast recovery area becomes full.

 

posted @ 2025-08-08 12:37  石云华  阅读(18)  评论(0)    收藏  举报