Excessive AWR Growth From Partitioned Objects Such as SYS.WRH$_EVENT_HISTOGRAM Causing Sysaux to Grow (Doc ID 1912201.1)

APPLIES TO:

Oracle Database Cloud Service - Version N/A and later
Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Information in this document applies to any platform.

SYMPTOMS

  • The AWR is growing on a nightly basis despite very little activity on the database
  • Awrinfo shows the following partitioned objects taking the most space:
    WRH$_OSSTAT.WRH$_OSSTAT_703264452_0 - 74% TABLE PARTITION
    EVENTS 580.0 WRH$_EVENT_HISTOGRAM_PK.WRH$_EVENT__703264452_0 - 97% INDEX PARTITION
    EVENTS 489.0 WRH$_EVENT_HISTOGRAM.WRH$_EVENT__703264452_0 - 98% TABLE PARTITION
    EVENTS 168.0 WRH$_SYSTEM_EVENT_PK.WRH$_SYSTEM_703264452_0 - 75% INDEX PARTITION
    EVENTS 164.0 WRH$_SYSTEM_EVENT.WRH$_SYSTEM_703264452_0 - 94% TABLE PARTITION  
    The awrinfo.sql script that produces this report is located under the Oracle Home directory in the rdbms/admin subdirectory. It can be executed as follows:
    SQL> @?/rdbms/admin/awrinfo.sql
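    As a quick cross-check (a minimal sketch, not part of the awrinfo report), the V$SYSAUX_OCCUPANTS view can be queried to confirm that the AWR occupant (SM/AWR) is the main consumer of SYSAUX space:
    SQL> select occupant_name, space_usage_kbytes
           from v$sysaux_occupants
          order by space_usage_kbytes desc;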

CAUSE

This is due to the following known issue:

Document 14084247.8 Bug 14084247 - Failed AWR purge can lead to continued SYSAUX space use

However, even after the patch for Bug 14084247 is applied, the fix does not take effect immediately and may take some time to "catch up".

Because the purge operation fails, the partition split does not occur. The partitions therefore grow larger and larger, degrading query performance, hindering tablespace reclamation, and further slowing the purge operation.

Once the AWR repository reaches the point where the purge job times out, the repository grows on a daily basis without its size ever being reduced.
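One way to see whether purging is keeping up (a simple diagnostic sketch, not part of the original note) is to compare the oldest snapshot still in the repository against the newest; if the oldest snapshot is much older than the retention period, the purge is falling behind:

SQL> select min(begin_interval_time) oldest_snap,
            max(begin_interval_time) newest_snap
       from dba_hist_snapshot;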

After the fix for Bug 14084247 is applied, the relevant purge steps run in this order:

1) drop expired partitions (DDL)
2) purge by callback (anything can happen here, but mostly no-ops or regular DELETE operations)
3) purge regular non-partitioned tables (i.e., DELETE)
4) split partitions (DDL)

This is the crux of the problem: if step 2 or step 3 is slow, the partitions do not split. Partitions that do not split never become expired, so step 1 cannot drop them the next time the purge runs. They therefore keep growing, which can itself contribute to making step 2 or step 3 slower, depending on how the data is related.
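As a rough indicator of whether splits are occurring (a diagnostic sketch, not from the original note), the number of partitions per WRH$ table can be counted; tables that never gain new partitions while snapshots continue to be taken are candidates for the behaviour described above:

SQL> select table_name, count(*) partition_count
       from dba_tab_partitions
      where table_owner = 'SYS'
        and table_name like 'WRH$%'
      group by table_name
      order by partition_count;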

The fix adds a separate partition split (step 4 above), performed at snapshot time when needed. This means that the next time the purge runs there will be one more partition. However, that does not necessarily mean it is time to purge the new partition yet; since it was created only a few hours earlier, it still contains active data.

In this case, the retention period is 8 days.
The partitions created by the snapshot will typically remain for the full retention period: if they are created on 1-JAN, they will not be purged (by step 1 of the regular purge) until 9-JAN, i.e., no later than the retention period (8 days in this case). The fix adds partitions for future snapshots, leaving the existing content in the first partition, where it has been accumulating from the beginning.
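The current retention and snapshot interval can be confirmed from DBA_HIST_WR_CONTROL (a minimal check; in this example the retention would show as 8 days):

SQL> select snap_interval, retention from dba_hist_wr_control;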

SOLUTION

1. Download and apply Patch 14084247.

2. Another way to speed up the purging is to temporarily lower the retention period to 1 day:

exec DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(1440,60);

If the above command results in the following error, alter the baseline window size:

ERROR at line 1:
ORA-13541: system moving window baseline size (691200) greater than retention
(86400)
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 174
ORA-06512: at "SYS.DBMS_WORKLOAD_REPOSITORY", line 222
ORA-06512: at line 1

To alter the baseline window size:

exec DBMS_WORKLOAD_REPOSITORY.modify_baseline_window_size(window_size => 1);

Then, rerun the snapshot setting:

exec DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(1440,60);
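Once the purge has caught up and SYSAUX space has been reclaimed, the settings can be restored. As an example, assuming the original settings were the 8-day retention (11520 minutes) and 60-minute interval used in this note (substitute the values previously in effect on your system):

exec DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(11520,60);
exec DBMS_WORKLOAD_REPOSITORY.modify_baseline_window_size(window_size => 8);

The retention is raised first; otherwise setting the moving window baseline back to 8 days would fail while the retention is still 1 day.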

 

Note: Before implementing this in production, consider that lowering the retention period means losing AWR data that may still be needed.