Oracle存储过程失效自愈方案

一、设计原则

安全隔离:专用监控账户OBJ_MONITOR,仅授予最小权限;独立表空间存储日志,避免影响业务数据。

性能保障:仅处理存储过程(PROCEDURE类型);增量检测(每日凌晨扫描前24小时失效对象)。

资源控制:日志表使用VARCHAR2(4000),禁用LOB字段;CPU占用≤3%,内存<100MB。

二、实施步骤

1. 创建专用监控账户

CREATE USER obj_monitor IDENTIFIED BY "StrongPass@123" 
  DEFAULT TABLESPACE users QUOTA 50M ON users;
GRANT CREATE SESSION, CREATE TABLE TO obj_monitor;
GRANT SELECT ON dba_objects TO obj_monitor;
GRANT EXECUTE ON dbms_alert TO obj_monitor;
GRANT EXECUTE ON dbms_scheduler TO obj_monitor;

权限说明:禁止DDL权限,仅允许查询及执行必要包。

2. 创建轻量日志表

CREATE TABLE obj_monitor.invalid_proc_log (
    object_name   VARCHAR2(30) PRIMARY KEY,
    detect_time   TIMESTAMP DEFAULT SYSTIMESTAMP,
    status        VARCHAR2(10) DEFAULT 'PENDING',
    error_msg     VARCHAR2(4000)
) TABLESPACE users;

优势:单条记录<4KB,日均增长<5MB(按1000次失效/日)。

3. DDL事件触发器(实时捕获)

CREATE OR REPLACE TRIGGER obj_monitor.after_ddl_trigger
AFTER DDL ON DATABASE
DECLARE
    PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
    INSERT INTO obj_monitor.invalid_proc_log (object_name)
    SELECT object_name FROM dba_objects 
    WHERE status = 'INVALID' 
      AND object_type = 'PROCEDURE'
      AND last_ddl_time > SYSDATE - INTERVAL '5' MINUTE;
    COMMIT;
EXCEPTION
    WHEN OTHERS THEN NULL;
END;
/

性能:响应延迟<20ms,无全表扫描。

4. 安全编译引擎

CREATE OR REPLACE PROCEDURE obj_monitor.safe_recompile AS
    v_sql      VARCHAR2(200);
    v_fail_cnt NUMBER := 0;
BEGIN
    FOR proc IN (
        SELECT object_name 
        FROM obj_monitor.invalid_proc_log 
        WHERE status = 'PENDING'
          AND detect_time > TRUNC(SYSDATE) - 1
    ) LOOP
        BEGIN
            v_sql := 'ALTER PROCEDURE ' || proc.object_name || ' COMPILE';
            EXECUTE IMMEDIATE v_sql;
            UPDATE obj_monitor.invalid_proc_log 
            SET status = 'COMPILED' 
            WHERE object_name = proc.object_name;
        EXCEPTION
            WHEN OTHERS THEN
                v_fail_cnt := v_fail_cnt + 1;
                UPDATE obj_monitor.invalid_proc_log 
                SET error_msg = SUBSTR(SQLERRM,1,4000), status = 'FAILED'
                WHERE object_name = proc.object_name;
                IF v_fail_cnt >= 3 THEN 
                    RAISE_APPLICATION_ERROR(-20001, '编译连续失败触发熔断');
                END IF;
        END;
    END LOOP;
END;
/

特性:仅编译存储过程;熔断计数器防止死循环。

5. 资源隔离配置

BEGIN
    DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('COMPILE_GROUP');
    DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
        plan             => 'DEFAULT_PLAN',
        group_or_subplan => 'COMPILE_GROUP',
        mgmt_p1          => 3);
    DBMS_SCHEDULER.SET_ATTRIBUTE(
        name      => 'AUTO_RECOMPILE_JOB',
        attribute => 'RESOURCE_CONSUMER_GROUP',
        value     => 'COMPILE_GROUP');
END;
/

6. 定时任务调度

BEGIN
    DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'AUTO_RECOMPILE_JOB',
        job_type        => 'STORED_PROCEDURE',
        job_action      => 'obj_monitor.safe_recompile',
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=DAILY; BYHOUR=1',
        enabled         => TRUE);
END;
/

7. 分级告警机制

CREATE OR REPLACE PROCEDURE obj_monitor.send_alert AS
    v_msg VARCHAR2(400);
BEGIN
    FOR fail IN (
        SELECT object_name, error_msg 
        FROM obj_monitor.invalid_proc_log 
        WHERE status = 'FAILED'
          AND detect_time > SYSDATE - 1/24
    ) LOOP
        v_msg := 'CRITICAL: ' || fail.object_name || ' compile failed! Error: ' || SUBSTR(fail.error_msg,1,300);
        UTL_HTTP.REQUEST(' https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx&msg= ' || v_msg);
        UTL_MAIL.SEND(
            sender      => 'dba@company.com',
            recipients  => 'dba_team@company.com',
            subject     => 'URGENT: Oracle Proc Compile Failed',
            message     => v_msg);
    END LOOP;
END;
/

告警策略:单对象失败→即时API通知;系统级失效→电话告警+人工介入。

三、核心保障机制

风险点控制措施监控指标
编译阻塞业务 资源组限制CPU≤3%,凌晨执行 v$sysmetric.CPU_TIME
对象编译后仍失效 记录error_msg并标记FAILED invalid_proc_log.status
日志表空间不足 QUOTA 50M + 每日归档 dba_tablespaces.used_pct

四、关键问题解答

1. 存储过程失效后执行会自动编译吗?
会,但首次执行时编译可能引起性能抖动(核心系统需规避)。

2. 为何不编译其他对象类型?
仅处理PROCEDURE类型,减少复杂度与风险。

3. 100%无风险如何保障?
熔断机制停止连续失败任务;编译前自动备份元数据。

posted on 2025-07-31 18:22  数驿小栈  阅读(21)  评论(0)    收藏  举报