Bug 34885986 - Flashback log file was not reused even if db_flashback_retention_target is passed
0.案例概述
客户有一套11.2.0.4 的 RAC +DG环境,primary端的RAC数据库出现故障,检查日志发现:闪回区的空间使用率达到100%,最终导致了数据库故障。 故障解决后,需要分析该故障的具体原因。
1、案例分析
1.1 检查数据库闪回区的配置信息。
S0L> show parameter flash
NAME TYPE VALUE
------------------------------ ------------- ---------
db_flash_cache_file string
db_flash_cache_size integer 0
db_flashback_retention_target integer 1440
S0L> show parameter recover
NAME TYPE VALUE
------------------------------ ------------- ---------
db_recovery_file_dest string +DATA
db_recovery_file_dest_size integer 100G
db_unrecoverable_scn_tracking boolean TRUE
recovery_parallelism integer 0
S0L>
可见,当前的闪回区大小为100GB,闪回日志的保存时间为1440分钟,也即1天。
1.2 检查闪回区的当前使用情况。
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
---------- ------------------- -------------------------- ----------------
CONTROL FILE 0.04 0 1
REDO LOG 2.94 0 6
ARHICVED LOG 0 0 0
BACKUP PIECE 0 0 0
IMAGE COPY 0 0 0
FLASHBACK LOG 41.47 0 87
FOREIGN ARCHIVED LOG 0 0 0
SQL> select OLDEST_FLASHBACK_SCN, OLDEST_FLASHBACK_TIME, RETENTION_TARGET, ESTIMATED_FLASHBACK_SIZE from v$flashback_database_log;
OLDEST_FLASHBACK_SCN OLDEST_FLASHBACK_TI RETENTION_TARGET ESTIMATED_FLASHBACK_SIZE
-------------------- -------------------- ----------------- -------------------------
676939705 08/16/2024 18:51:02 1440 407937024
可见,闪回日志占用了整个闪回区41.47%的空间。虽然设置的闪回日志保存时间为1天,但实际上闪回日志重未被自动清理过,数据库可以闪回到一年前。
1.3 检查数据库是否设置了guaranteed restore point。但未发现guaranteed restore point。
1.4 事出反常必有妖。搜索到《Bug 34885986 - Flashback log file was not reused even if db_flashback_retention_target is passed (Doc ID 34885986.8)》这篇文章,果然是一个BUG。
2、解决方案
针对该故障,官方提供了两个Workaround。
There are 2 possible workarounds:
- workaround#1: deletes all the flashback logs (this is simplest)
- workaround#2: deletes a subset of the flashback logs, preserves
all the flashback logs within the retention period
(db_flashback_retention_target minutes),
note: these 2 workarounds don't prevent the problem from happening, they just fix the problem after it has happened.
Workaround#1:
Remove all the flashback logs by disabling then re-enabling flashback:
alter database flashback off;
alter database flashback on;
Workaround#2:
Remove the problematic flashback log (and all the old flashback logs which could have been reused in the past but were not) via the following steps:
1. check what is the approx current disk space usage Mb of all the flashback logs that were generated within the past db_flashback_retention_target minutes:
select sum((bytes)/(1024*1024)) "Mb"
from v$flashback_database_logfile
where first_time >
(sysdate - ( db_flashback_retention_target /(24*60)));
if the archive logs are also stored in the db_recovery_file_dest area (check via: select * form v$flash_recovery_area_usage), then run a similar query against v$archived_log:
select sum((blocks*block_size)/(1024*1024)) "Mb"
from v$archived_log
where first_time >
(sysdate - ( db_flashback_retention_target /(24*60)));
2. Note down the current FRA limit:
select value from v$parameter
where name='db_recovery_file_dest_size';
3. Take the combined total from step#1, then set the FRA limit to say 120% of that value, via:
alter system set db_recovery_file_dest_size= ...M;
note: the +20% will allow for a future peak workload not needing to immediately create new flashback logs for the thread.
4. as soon as the parameter is lowered, the problematic flashback log with the null x$kccfle.flehtim should be deleted, check this via
running the selects as described in the "Rediscovery"
5. restore the FRA limit back to its original value from step#2
alter system set db_recovery_file_dest_size= ...M;
3、回顾下闪回区的管理规则:
The following rules govern creating, retaining, overwriting and deleting of flashback logs in the fast recovery area:
* If the fast recovery area has enough space, then a flashback log is created whenever necessary to satisfy the flashback retention target.
* If a flashback log is old enough that it is no longer needed to satisfy the flashback retention target, then a flashback log is reused.
* If the database must create a new flashback log and the fast recovery area is full or there is no disk space, then the oldest flashback log is reused instead.
Note: Reusing the oldest flashback log shortens the flashback database window. If enough flashback logs are reused due to a lack of disk space, then the flashback retention target may not be satisfied.
* If the fast recovery area is full, then an archived redo log that is reclaimable according to the fast recovery area rules may be automatically deleted by the fast recovery area to make space for other files. In this case, any flashback logs that would require the use of that redo log file for the use of FLASHBACK DATABASE are also deleted.
Note: According to fast recovery area rules, a file is reclaimable when one of the following criteria is true: The file is reported as obsolete and not needed by the flashback database. For example, the file is outside the DB_FLASHBACK_RETENTION_TARGET parameters. The file is backed up to tape.
* No file in the fast recovery area is eligible for deletion if it is required to satisfy a guaranteed restore point. Thus, retention of flashback logs and other files required to satisfy the guaranteed restore point, in addition to files required to satisfy the backup retention policy, can cause the fast recovery area to fill completely. Consult "Responding to a Full Fast Recovery Area" if your fast recovery area becomes full.
浙公网安备 33010602011771号