Combining the Oracle 11g RAC architecture, the CRS-4124 fix described in the blog, and the coordination required between the two nodes, the following is a complete, reusable standard operating procedure covering the full startup flow from cluster services to database instances, state verification, troubleshooting, and core principles.

1. Core Startup Principle (Bottom Layer First, Order Is Not Reversible)

The RAC stack has strict component dependencies; starting components out of order leads to conflicts or startup failures. The core sequence is:
 
Server hardware → operating system → OHAS (Oracle High Availability Services) → CRS core services (CSSD/GPNPD/GIPCD, etc.) → ASM instance → ASM disk groups → RAC database instances
 
On a two-node cluster, always start the primary node first (e.g. 239), then the standby node (e.g. 238), to avoid split-brain or resource contention.
 

2. Step-by-Step Procedure (Including BUG Handling and Two-Node Coordination)

2.1. Phase 1: Preparation (run on both nodes)

  1. Confirm the server hardware is healthy (NICs and storage links up) and the operating system (Linux) has booted;
  2. Verify /etc/hosts consistency across both nodes (public, private, VIP, and SCAN IPs all correctly mapped);
  3. Check the Grid/Oracle user environment variables (ORACLE_HOME/GRID_HOME/ORACLE_SID) are set correctly (a consistency-check sketch follows this list);
  4. Clean up leftover processes and lock files (stale processes can block startup):
    # run as root
    kill -9 $(ps -ef | grep -E 'grid|oracle|ohasd|cssd|crsd' | grep -v grep | awk '{print $2}')
    rm -rf /var/tmp/.oracle/* /tmp/.oracle/* $GRID_HOME/log/$(hostname)/ohasd/*.lock
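
A minimal pre-check sketch (bash, run as root on one node) for steps 1-3 above; it assumes passwordless root SSH between the nodes and reuses the node names from this document, so adjust for your environment:

    #!/bin/bash
    # preflight.sh -- compare /etc/hosts and the grid environment across both
    # nodes before starting the cluster. Hostnames below are from this document.
    NODES="wx-ncdb-90-238 wx-ncdb-90-239"

    # 1) /etc/hosts should be identical on both nodes (public/private/VIP/SCAN entries)
    for n in $NODES; do ssh "$n" md5sum /etc/hosts; done

    # 2) grid user environment variables must resolve on every node
    for n in $NODES; do
      ssh "$n" 'su - grid -c "echo \$ORACLE_HOME \$ORACLE_SID"'
    done

    # 3) ASM devices must be visible (and owned by grid:oinstall) on every node
    for n in $NODES; do
      ssh "$n" 'ls -l /dev/asm* 2>/dev/null || echo "no /dev/asm* devices visible"'
    done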

2.2. Phase 2: Start the Cluster Services (CRS) (node by node)

Scenario 1: Normal startup (no CRS-4124 error)

  1. Start CRS on the primary node (e.g. 239):
    # run as root
    crsctl start crs
    # verify the CRS core service states (all 4 components must be online)
    crsctl check crs
    [root@wx-ncdb-90-239 ohasd]# crsctl check crs 
    CRS-4638: Oracle High Availability Services is online
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    [root@wx-ncdb-90-239 ohasd]# 
  2. Start CRS on the standby node (e.g. 238) once the primary's services are stable (about 3 minutes; see the polling sketch after this output block):
    crsctl start crs
    crsctl check crs
    [root@wx-ncdb-90-238 wx-ncdb-90-238]# crsctl check crs 
    CRS-4638: Oracle High Availability Services is online
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    [root@wx-ncdb-90-238 wx-ncdb-90-238]# 
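
Since the standby node should start only after the primary's services are stable, a small polling wrapper (a sketch, run as root) avoids guessing the wait time; it counts the four "is online" lines from the crsctl check crs output shown above:

    #!/bin/bash
    # start_crs_and_wait.sh -- start CRS locally, then poll until all four
    # services report online or a 10-minute timeout expires.
    crsctl start crs
    for i in $(seq 1 60); do
      online=$(crsctl check crs | grep -c "is online")
      if [ "$online" -eq 4 ]; then
        echo "All 4 CRS services online; safe to proceed to the next node."
        exit 0
      fi
      sleep 10
    done
    echo "CRS not fully online after 10 minutes; check ohasd.log" >&2
    exit 1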

Scenario 2: CRS-4124 is raised (OHAS fails to start; the blog's BUG workaround)

If crsctl start crs fails with CRS-4124, OHAS must be force-started via the init.ohasd script:


  1. Locate and run the init.ohasd script:
    # run as root
    find / -name init.ohasd  # locate the script (default: /etc/rc.d/init.d/init.ohasd)
    [root@wx-ncdb-90-238 ohasd]# find / -name init.ohasd
    /etc/rc.d/init.d/init.ohasd
    /u01/app/11.2.0/grid/crs/init/init.ohasd
    /u01/app/11.2.0/grid/crs/utl/init.ohasd
    [root@wx-ncdb-90-238 ohasd]# 
    cd /etc/rc.d/init.d/
    nohup ./init.ohasd run &  # start OHAS in the background; output goes to nohup.out
    tail -f -n 5 /u01/app/11.2.0/grid/log/wx-ncdb-90-238/ohasd/ohasd.log 


  2. Verify the OHAS state, then start CRS:
    crsctl check has  # should print "CRS-4638: Oracle High Availability Services is online"


    crsctl start crs  # CRS core services now start normally (be patient, it takes a few minutes); watch the logs:
    tail -f -n 5 /u01/app/11.2.0/grid/log/wx-ncdb-90-238/ohasd/ohasd.log 
    tail -f -n 5 /u01/app/11.2.0/grid/log/wx-ncdb-90-238/cssd/ocssd.log
    tail -f -n 5 /u01/app/11.2.0/grid/log/wx-ncdb-90-238/crsd/crsd.log 
    [root@wx-ncdb-90-239 ohasd]# tail -f -n 5 /u01/app/11.2.0/grid/log/wx-ncdb-90-239/ohasd/ohasd.log 
    2026-01-18 10:28:33.916: [UiServer][574560000]{0:0:1969} Done for ctx=0x7fdcd000ab30
    2026-01-18 10:29:33.900: [UiServer][572458752] CS(0x7fdcc40008e0)set Properties ( grid,0x7fdcf40a0aa0)
    2026-01-18 10:29:33.911: [UiServer][574560000]{0:0:1970} Sending message to PE. ctx= 0x7fdcd00097a0, Client PID: 75616
    2026-01-18 10:29:33.911: [   CRSPE][576661248]{0:0:1970} Processing PE command id=2011. Description: [Stat Resource : 0x7fdccc0a33c0]
    2026-01-18 10:29:33.914: [UiServer][574560000]{0:0:1970} Done for ctx=0x7fdcd00097a0
    ^C
    [root@wx-ncdb-90-239 ohasd]# 
    [root@wx-ncdb-90-239 ohasd]# 
    [root@wx-ncdb-90-239 ohasd]# tail -f -n 5 /u01/app/11.2.0/grid/log/wx-ncdb-90-239/cssd/ocssd.log 
    2026-01-18 10:30:39.783: [    CSSD][1793013504]clssnmSendingThread: sent 5 status msgs to all nodes
    2026-01-18 10:30:43.785: [    CSSD][1793013504]clssnmSendingThread: sending status msg to all nodes
    2026-01-18 10:30:43.785: [    CSSD][1793013504]clssnmSendingThread: sent 4 status msgs to all nodes
    2026-01-18 10:30:48.787: [    CSSD][1793013504]clssnmSendingThread: sending status msg to all nodes
    2026-01-18 10:30:48.787: [    CSSD][1793013504]clssnmSendingThread: sent 5 status msgs to all nodes
    ^C
    [root@wx-ncdb-90-239 ohasd]# 
    [root@wx-ncdb-90-239 ohasd]# tail -f -n 5 /u01/app/11.2.0/grid/log/wx-ncdb-90-239/crsd/crsd.log 
    2026-01-18 10:30:43.168: [UiServer][2730448640] CS(0x7f7e34009b10)set Properties ( grid,0x7f7e7815bec0)
    2026-01-18 10:30:43.178: [UiServer][2732549888]{2:54573:1507} Sending message to PE. ctx= 0x7f7e40008b00, Client PID: 75616
    2026-01-18 10:30:43.179: [   CRSPE][2734651136]{2:54573:1507} Processing PE command id=1558. Description: [Stat Resource : 0x7f7e3c0075f0]
    2026-01-18 10:30:43.179: [   CRSPE][2734651136]{2:54573:1507} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == wx-ncdb-90-239))
    2026-01-18 10:30:43.182: [UiServer][2732549888]{2:54573:1507} Done for ctx=0x7f7e40008b00
  3. Cluster service layer check:
    [root@wx-ncdb-90-239 ~]# crsctl check cluster -all
      **************************************************************
      wx-ncdb-90-238:
      CRS-4537: Cluster Ready Services is online
      CRS-4529: Cluster Synchronization Services is online
      CRS-4533: Event Manager is online
      **************************************************************
      wx-ncdb-90-239:
      CRS-4537: Cluster Ready Services is online
      CRS-4529: Cluster Synchronization Services is online
      CRS-4533: Event Manager is online
      **************************************************************
  4. Configure autostart at boot, so the error does not recur after a reboot (a verification snippet follows):
    vi /etc/rc.d/rc.local
    # append: nohup /etc/rc.d/init.d/init.ohasd run &
    chmod 744 /etc/rc.d/rc.local  # make it executable
    • Autostart, done properly: rather than writing into rc.local directly, prefer the cluster tool:

    crsctl enable crs  # enable CRS autostart
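
To confirm autostart is actually enabled after crsctl enable crs, a quick check (the CRS-4622 message is what 11.2 typically prints; verify on your own system):

    # run as root
    crsctl config crs
    # expected: "CRS-4622: Oracle High Availability Services autostart is enabled."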

2.3. Phase 3: Verify the Cluster Core Services (on both nodes)

Core commands (covering all CRS components):
# init-level resource states (key: everything except ora.diskmon should be ONLINE)
crsctl stat res -t -init
[root@wx-ncdb-90-238 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                                                   
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                                                   
ora.crf
      1        ONLINE  OFFLINE                                                   
ora.crsd
      1        ONLINE  OFFLINE                                                   
ora.cssd
      1        ONLINE  OFFLINE                                                   
ora.cssdmonitor
      1        ONLINE  OFFLINE                                                   
ora.ctssd
      1        ONLINE  OFFLINE                                                   
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  OFFLINE                                                   
ora.gipcd
      1        ONLINE  OFFLINE                                                   
ora.gpnpd
      1        ONLINE  OFFLINE                                                   
ora.mdnsd
      1        ONLINE  OFFLINE  
# full cluster resource states (including VIPs, listeners, etc.)
crsctl stat res -t    
[root@wx-ncdb-90-238 wx-ncdb-90-238]# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       wx-ncdb-90-238                               
               ONLINE  ONLINE       wx-ncdb-90-239                               
ora.DATA.dg
               ONLINE  ONLINE       wx-ncdb-90-238                               
               ONLINE  ONLINE       wx-ncdb-90-239                               
ora.LISTENER.lsnr
               ONLINE  ONLINE       wx-ncdb-90-238                               
               ONLINE  ONLINE       wx-ncdb-90-239                               
ora.OCRVOTE.dg
               ONLINE  ONLINE       wx-ncdb-90-238                               
               ONLINE  ONLINE       wx-ncdb-90-239                               
ora.SYSTEM.dg
               ONLINE  ONLINE       wx-ncdb-90-238                               
               ONLINE  ONLINE       wx-ncdb-90-239                               
ora.asm
               ONLINE  ONLINE       wx-ncdb-90-238           Started             
               ONLINE  ONLINE       wx-ncdb-90-239           Started             
ora.gsd
               OFFLINE OFFLINE      wx-ncdb-90-238                               
               OFFLINE OFFLINE      wx-ncdb-90-239                               
ora.net1.network
               ONLINE  ONLINE       wx-ncdb-90-238                               
               ONLINE  ONLINE       wx-ncdb-90-239                               
ora.ons
               ONLINE  ONLINE       wx-ncdb-90-238                               
               ONLINE  ONLINE       wx-ncdb-90-239                               
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       wx-ncdb-90-239                               
ora.cvu
      1        ONLINE  ONLINE       wx-ncdb-90-239                               
ora.oc4j
      1        ONLINE  ONLINE       wx-ncdb-90-239                               
ora.orcl.db
      1        ONLINE  ONLINE       wx-ncdb-90-238           Open                
      2        ONLINE  ONLINE       wx-ncdb-90-239           Open                
ora.scan1.vip
      1        ONLINE  ONLINE       wx-ncdb-90-239                               
ora.wx-ncdb-90-238.vip
      1        ONLINE  ONLINE       wx-ncdb-90-238                               
ora.wx-ncdb-90-239.vip
      1        ONLINE  ONLINE       wx-ncdb-90-239                      
# verify OCR (cluster registry) integrity (expect no corruption messages)
ocrcheck
[root@wx-ncdb-90-238 wx-ncdb-90-238]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2988
         Available space (kbytes) :     259132
         ID                       : 1283645573
         Device/File Name         :   +OCRVOTE
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

[root@wx-ncdb-90-238 wx-ncdb-90-238]# 
# verify inter-node cluster communication (both nodes must be Active)
su - grid
olsnodes -s -n
[root@wx-ncdb-90-238 wx-ncdb-90-238]# su - grid
[grid@wx-ncdb-90-238 ~]$ olsnodes -s -n
wx-ncdb-90-238  1       Active
wx-ncdb-90-239  2       Active
[grid@wx-ncdb-90-238 ~]$
Healthy criteria: ora.cssd, ora.crsd, ora.gpnpd, ora.gipcd, and ora.evmd are all ONLINE, and olsnodes reports both nodes as Active. A sketch automating the init-level check follows.
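
The init-level check can be scripted with a short sketch that flags any init resource other than ora.diskmon whose STATE is not ONLINE; it parses the NAME=/STATE= block output of crsctl stat res -init (an assumption about the 11.2 output format, confirm on your system):

    #!/bin/bash
    # check_init_res.sh -- report init-level CRS resources that are not ONLINE.
    crsctl stat res -init | awk -F= '
      $1 == "NAME"  { name = $2 }
      $1 == "STATE" {
        split($2, s, " ")                       # STATE can be "ONLINE on <node>"
        if (s[1] != "ONLINE" && name != "ora.diskmon")
          printf "NOT ONLINE: %-35s %s\n", name, $2
      }'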

2.4. Phase 4: Start and Verify ASM (as the grid user, on both nodes)

ASM is the storage core of RAC and is managed by the grid user; confirm both that the instance starts and that the disk groups mount (a recovery sketch for unmounted groups follows the asmcmd output below):
 
  1. Start the ASM instance and mount the disk groups:
    su - grid
    sqlplus / as sysasm
    SQL> startup;  -- start the ASM instance
    SQL> alter diskgroup all mount;  -- mount all ASM disk groups (OCR and data groups)
    SQL> select name, state from v$asm_diskgroup;  -- verify the groups (STATE=MOUNTED)
    SQL> exit;
  2. Verify the ASM resource at the CRS level:
    srvctl status asm -n <node_name>  # e.g. srvctl status asm -n wx-ncdb-90-239
    # expected output: "ASM is running on wx-ncdb-90-239"

  3. Verify with asmcmd:

[grid@wx-ncdb-90-239 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    204800   182511                0          182511              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576   1638400  1133900                0         1133900              0             N  DATA/
MOUNTED  NORMAL  N         512   4096  1048576     61440    60514            20480           20017              0             Y  OCRVOTE/
MOUNTED  EXTERN  N         512   4096  1048576    204800      720                0             720              0             N  SYSTEM/
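
If alter diskgroup all mount leaves a group dismounted, check the device ownership first, then mount the group individually (a sketch; the DATA group name comes from the lsdg output above, and the chown target is illustrative):

    # root: ASM devices must be owned by grid:oinstall, or the mount fails
    ls -l /dev/asm*
    # chown grid:oinstall /dev/asmdisk1   # example fix; device name is illustrative

    # grid: mount one disk group at a time and re-check its state
    su - grid
    sqlplus / as sysasm
    SQL> alter diskgroup DATA mount;
    SQL> select name, state from v$asm_diskgroup;  -- confirm STATE=MOUNTED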

2.5. Phase 5: Start and Verify the RAC Database Instances (as the oracle user)

Start with srvctl (recommended) so both instances come up in a coordinated way:
  1. Start the database (both instances at once):
    su - oracle
    # show the database configuration (to get the database name, e.g. orcl)
    srvctl config database
    # start the whole RAC database
    srvctl start database -d <db_name>
  2. Verify the database state (a scripted variant follows this list):
    # cluster-level check (both instances must be running)
    srvctl status database -d <db_name>
    # per-instance check on a node
    sqlplus / as sysdba
    SQL> select instance_name, status from v$instance;  -- STATUS=OPEN
    SQL> select inst_id, instance_name, status from gv$instance;  -- both instances OPEN
  3. (Optional) start a single instance on one node:
    srvctl start instance -d <db_name> -i <instance_name>  # e.g. srvctl start instance -d orcl -i orcl1
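
For scripted verification, the gv$instance check can run non-interactively by piping the query into sqlplus (a sketch, run as the oracle user):

    # healthy: two rows returned, both STATUS = OPEN
    echo 'select inst_id, instance_name, status from gv$instance order by inst_id;' | sqlplus -S / as sysdba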

  4. Verification checklist (the sketch after this list wraps these checks into one script):
    • Application layer: sqlplus / as sysdba -> select * from gv$instance; healthy: both instances OPEN
    • Cluster resource layer: crsctl stat res -t; healthy: no critical resource (ora.*.db / ora.*.vip / ora.*.lsnr) OFFLINE or UNKNOWN
    • Cluster service layer: crsctl check cluster -all; healthy: all services online on every node
    • Network layer: olsnodes -s -n; healthy: all nodes Active with consecutive node numbers
    • Storage layer: asmcmd lsdg; healthy: all disk groups State=MOUNTED
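
The five layers above can be checked in one pass (a sketch, run as root on either node; it assumes the grid and oracle OS users exist as in this document, and that ora.gsd being OFFLINE is expected on 11g):

    #!/bin/bash
    # rac_health.sh -- one-pass health check over the five layers above.
    echo "== Cluster service layer =="; crsctl check cluster -all
    echo "== Network layer ==";         su - grid -c "olsnodes -s -n"
    echo "== Storage layer ==";         su - grid -c "asmcmd lsdg"
    echo "== Cluster resource layer =="
    crsctl stat res | awk -F= '
      $1 == "NAME"  { name = $2 }
      $1 == "STATE" && $2 ~ /OFFLINE|UNKNOWN/ && name != "ora.gsd" {
        print "NOT ONLINE:", name, $2
      }'
    echo "== Application layer =="
    echo 'select inst_id, instance_name, status from gv$instance;' | \
      su - oracle -c "sqlplus -S / as sysdba"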

3. Key State Verification Commands (by component)

• CRS core services: crsctl check crs; healthy: all 4 components ONLINE
• Cluster resources: crsctl stat res -t / crsctl stat res -t -init; healthy: no core resource OFFLINE/INTERMEDIATE
• OCR integrity: ocrcheck; healthy: no "corrupt" or "unavailable" messages
• Node communication: olsnodes -s -n; healthy: both nodes Active
• ASM: sqlplus / as sysasm -> select name, state from v$asm_diskgroup; healthy: disk groups STATE=MOUNTED
• Database instances: srvctl status database -d <db_name> / gv$instance; healthy: both instances RUNNING/OPEN

4. Common Problems and Solutions (from the blog plus hands-on experience)

• CRS-4124, OHAS fails to start. Root cause: hung OHAS process or broken configuration. Fix: force-start OHAS with nohup /etc/rc.d/init.d/init.ohasd run &, then configure autostart.
• ASM startup raises ORA-29701. Root cause: CSSD is not running. Fix: start CSSD first with crsctl start crs, then start ASM.
• ASM disk group will not mount (STATE=DISMOUNTED). Root cause: storage link failure or wrong disk permissions. Fix: check /dev/asm* permissions (owner grid:oinstall) and verify storage array connectivity.
• Database instance fails to start. Root cause: ASM disk groups not mounted or OCR corrupted. Fix: remount the ASM disk groups; run ocrcheck to diagnose the OCR.
• Inter-node communication failure (abnormal olsnodes output). Root cause: private network misconfiguration or firewall blocking. Fix: verify the private network with oifcfg getif (see the sketch after this list); disable the firewall or open the cluster ports (e.g. 2484/42424).
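
For the interconnect issue above, compare the private network definition on both nodes and check the firewall state (a sketch; run oifcfg as the grid user and the firewall check as root, on an 11g-era RHEL/OEL system):

    # grid user, run on each node: the cluster_interconnect interface and
    # subnet must be identical across the nodes
    oifcfg getif

    # root: confirm the firewall is off (or that the cluster ports are open)
    service iptables status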
 

5. Core Principles and Pitfall Guide

  1. Never reorder the startup sequence: strictly follow OHAS → CRS → ASM → database, primary node before standby; never run it in reverse;
  2. Verify state throughout: check status after each phase starts, so later components do not fail on missing dependencies;
  3. Prefer the cluster tools: crsctl for CRS, srvctl for ASM and the database; starting instances manually with sqlplus leaves the cluster resource states out of sync;
  4. Prevent the BUG up front: on 11g RAC, configure init.ohasd to start at boot so CRS-4124 does not recur after reboots;
  5. Logs drive diagnosis: on a startup failure, check the failing component's log first (OHAS log: $GRID_HOME/log/<node_name>/ohasd/ohasd.log; ASM log: $GRID_HOME/log/<node_name>/asm/alert_asm.log).

6. Summary

Starting a 2-node Oracle 11g RAC comes down to "follow the dependency order + coordinate both nodes + verify state at every step". For typical BUGs such as CRS-4124, OHAS can be force-started via the init.ohasd script. Throughout the procedure, the CRS core services, ASM disk groups, and database instances must all reach ONLINE, and overall cluster health should be confirmed with crsctl stat res -t and gv$instance before relying on RAC's high-availability guarantees.