Case Study: Oracle 10g RAC Cluster Fails to Start

2019-12-12 15:05  AlfredZhao

Environment: RHEL 5.7 + Oracle 10.2.0.5 RAC

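This is a test environment built many years ago. Today I found that the cluster would not start. Starting CRS manually did not help, and the cluster logs showed no output at all. I went on to check the cluster configuration: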

[oracle@rac1-server rac1-server]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :      96144
         Used space (kbytes)      :       3852
         Available space (kbytes) :      92292
         ID                       : 1953645605
         Device/File Name         : /dev/raw/raw14
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw15
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

[oracle@rac1-server rac1-server]$ crsctl query css votedisk
 0.     0    jy2

located 1 votedisk(s).
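
A voting disk entry is normally a device or file path, so a listing like the one above can be screened mechanically. The following is a minimal sketch, assuming the three-column layout shown in this post; the function names and the "must be an absolute path" rule are my own illustrative heuristics, not part of any Oracle tool.

```python
import re

# Matches lines such as " 0.     0    /dev/raw/raw11" in the output of
# `crsctl query css votedisk` (layout assumed from the transcript in this post).
VOTEDISK_LINE = re.compile(r"^\s*(\d+)\.\s+(\d+)\s+(\S+)\s*$")


def parse_votedisks(output):
    """Return the last column (the path) of every numbered votedisk line."""
    paths = []
    for line in output.splitlines():
        match = VOTEDISK_LINE.match(line)
        if match:
            paths.append(match.group(3))
    return paths


def suspicious_votedisks(paths):
    """Return entries that are not absolute paths (an assumed heuristic)."""
    return [p for p in paths if not p.startswith("/")]


sample = """\
 0.     0    jy2

located 1 votedisk(s).
"""

print(suspicious_votedisks(parse_votedisks(sample)))  # prints ['jy2']
```

On a live system the text would come from running the real command and capturing its output; here it is pasted in so the sketch runs anywhere.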

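This confirms that the voting disk is the problem. I have no idea where this jy2 entry came from, but either way there is no valid voting disk. Based on the actual environment, I added valid voting disks, after which the cluster returned to normal: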

[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw11
Cluster is not in a ready state for online disk addition
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw11 -f
unrecognized parameter -f.
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw11 -force
Now formatting voting disk: /dev/raw/raw11
successful addition of votedisk /dev/raw/raw11.
[root@rac1-server ~]#
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw12 -force
Now formatting voting disk: /dev/raw/raw12
successful addition of votedisk /dev/raw/raw12.
[root@rac1-server ~]# 
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw13 -force
Now formatting voting disk: /dev/raw/raw13
Write failed: Broken pipe

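I reach this test environment through an ssh jump host, and the session dropped. After logging in again, I queried the voting disks: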

[oracle@rac1-server ~]$ crsctl query css votedisk
 0.     0    /dev/raw/raw13
 1.     0    /dev/raw/raw11
 2.     0    /dev/raw/raw12
 3.     0    /dev/raw/raw13
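
A listing like the one above is easy to check for repeated paths. This is a small sketch under the same assumption about the output layout; `duplicate_votedisks` is a hypothetical helper name.

```python
from collections import Counter


def duplicate_votedisks(output):
    """Return, sorted, every path that appears more than once in the listing."""
    paths = []
    for line in output.splitlines():
        fields = line.split()
        # A votedisk line has three fields: "N.", a number, and the path.
        is_index = len(fields) == 3 and fields[0].endswith(".") and fields[0][:-1].isdigit()
        if is_index:
            paths.append(fields[2])
    counts = Counter(paths)
    return sorted(path for path, count in counts.items() if count > 1)


listing = """\
 0.     0    /dev/raw/raw13
 1.     0    /dev/raw/raw11
 2.     0    /dev/raw/raw12
 3.     0    /dev/raw/raw13
"""

print(duplicate_votedisks(listing))  # prints ['/dev/raw/raw13']
```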

/dev/raw/raw13 is listed twice, so I tried deleting it:

[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl delete css votedisk /dev/raw/raw13 -force
successful deletion of votedisk /dev/raw/raw13.
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl query css votedisk
 0.     0    /dev/raw/raw11
 1.     0    /dev/raw/raw12
 2.     0    /dev/raw/raw13

located 3 votedisk(s).
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl delete css votedisk /dev/raw/raw13 -force
successful deletion of votedisk /dev/raw/raw13.
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl query css votedisk
 0.     0    /dev/raw/raw11
 1.     0    /dev/raw/raw12

located 2 votedisk(s).
[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl add css votedisk /dev/raw/raw13 -force
Now formatting voting disk: /dev/raw/raw13
Write failed: Broken pipe

[root@rac1-server ~]# /s01/oracle/product/10.2.0/crs_1/bin/crsctl query css votedisk
 0.     0    /dev/raw/raw13
 1.     0    /dev/raw/raw11
 2.     0    /dev/raw/raw12
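
If the Broken pipe message does come from the ssh hop dropping, as suspected earlier, one general way to keep a long-running command from depending on the terminal is to start it in its own session with output sent to a log file. The sketch below shows that idea with Python's `subprocess` on a POSIX system. It is not what was done in this post; the helper name and the log path are illustrative assumptions, and the demo runs a harmless `echo` rather than `crsctl`.

```python
import os
import subprocess
import tempfile


def run_detached(argv, log_path):
    """Start argv in a new session, appending stdout and stderr to log_path."""
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            # New session (setsid): the child has no controlling terminal,
            # so a hangup of the ssh terminal is not delivered to it.
            start_new_session=True,
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
    return proc


# Demo with a harmless command; a real run would pass the crsctl command line
# from the transcript above as a list of arguments instead.
log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
proc = run_detached(["echo", "votedisk demo"], log_path)
proc.wait()
with open(log_path) as fh:
    print(fh.read().strip())
```

The result can then be checked from a fresh login by reading the log file and re-running the votedisk query.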

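I am not sure whether the Write failed: Broken pipe here has any latent impact; in practice, both querying and using the voting disks work normally.
Starting CRS again now succeeds.
The cluster logs show that the voting disks we added are in use: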

-- Cluster alert log on node 1:
2019-12-12 13:27:37.806
[cssd(7734)]CRS-1603:CSSD on node rac1-server shutdown by user.
2019-12-12 13:28:15.035
[cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
2019-12-12 13:28:15.048
[cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw11. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
2019-12-12 13:28:15.058
[cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw12. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
2019-12-12 13:28:22.162
[cssd(13146)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server .
2019-12-12 13:28:22.610
[evmd(12526)]CRS-1401:EVMD started on node rac1-server.
2019-12-12 13:28:22.678
[crsd(12662)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870592 to 169870592. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/crsd/crsd.log.
2019-12-12 13:28:22.679
[crsd(12662)]CRS-1012:The OCR service started on node rac1-server.
2019-12-12 13:28:23.757
[crsd(12662)]CRS-1201:CRSD started on node rac1-server.
2019-12-12 13:28:24.172
[crsd(12662)]CRS-1205:Auto-start failed for the CRS resource ora.rac2-server.ASM2.asm. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/crsd/crsd.log.
2019-12-12 13:28:24.199
[crsd(12662)]CRS-1205:Auto-start failed for the CRS resource ora.jy.jy2.inst. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/crsd/crsd.log.
2019-12-12 13:28:36.180
[cssd(13146)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server rac2-server .

-- Cluster alert log on node 2:
2019-12-12 13:30:23.828
[cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
2019-12-12 13:30:23.845
[cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw11. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
2019-12-12 13:30:23.870
[cssd(6736)]CRS-1605:CSSD voting file is online: /dev/raw/raw12. Details in /s01/oracle/product/10.2.0/crs_1/log/rac2-server/cssd/ocssd.log.
2019-12-12 13:30:24.768
[cssd(6736)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1-server rac2-server .
2019-12-12 13:30:25.463
[crsd(6199)]CRS-1012:The OCR service started on node rac2-server.
2019-12-12 13:30:25.478
[evmd(6116)]CRS-1401:EVMD started on node rac2-server.
2019-12-12 13:30:27.101
[crsd(6199)]CRS-1201:CRSD started on node rac2-server.
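
The CRS-1605 lines in the logs above can also be pulled out programmatically and compared with the voting disks that were added. This is a minimal sketch, assuming the message format matches the lines shown in this post; the regular expression and the function name are my own.

```python
import re

# Captures the path in messages such as:
# [cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in ...
ONLINE_MSG = re.compile(r"CRS-1605:CSSD voting file is online: (\S+?)\.(?:\s|$)")


def online_voting_files(log_text):
    """Return voting file paths reported online, in log order."""
    return [m.group(1) for m in ONLINE_MSG.finditer(log_text)]


log_sample = """\
2019-12-12 13:28:15.035
[cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw13. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
2019-12-12 13:28:15.048
[cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw11. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
2019-12-12 13:28:15.058
[cssd(13146)]CRS-1605:CSSD voting file is online: /dev/raw/raw12. Details in /s01/oracle/product/10.2.0/crs_1/log/rac1-server/cssd/ocssd.log.
"""

expected = {"/dev/raw/raw11", "/dev/raw/raw12", "/dev/raw/raw13"}
online = online_voting_files(log_sample)
print(online)
print(sorted(expected - set(online)))  # voting disks not reported online
```

An empty second list means every expected voting disk was reported online in that log excerpt.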

Finally, check the cluster status to confirm that everything is normal:

[oracle@rac1-server ~]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.jy.db      application    ONLINE    ONLINE    rac2-server 
ora....y1.inst application    ONLINE    ONLINE    rac1-server 
ora....y2.inst application    ONLINE    ONLINE    rac2-server 
ora....SM1.asm application    ONLINE    ONLINE    rac1-server 
ora....ER.lsnr application    ONLINE    ONLINE    rac1-server 
ora....ver.gsd application    ONLINE    ONLINE    rac1-server 
ora....ver.ons application    ONLINE    ONLINE    rac1-server 
ora....ver.vip application    ONLINE    ONLINE    rac1-server 
ora....SM2.asm application    ONLINE    ONLINE    rac2-server 
ora....ER.lsnr application    ONLINE    ONLINE    rac2-server 
ora....ver.gsd application    ONLINE    ONLINE    rac2-server 
ora....ver.ons application    ONLINE    ONLINE    rac2-server 
ora....ver.vip application    ONLINE    ONLINE    rac2-server 
[oracle@rac1-server ~]$
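
With this many resources, the same check can be done by a script rather than by eye. The sketch below scans `crs_stat -t` style text for any resource whose Target or State is not ONLINE. It assumes the column layout shown above; the function name is hypothetical, and the OFFLINE line in the sample is invented purely to exercise the check.

```python
def not_online(crs_stat_output):
    """Return (name, target, state, host) for resources not fully ONLINE."""
    problems = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Resource rows start with "ora."; the host column may be empty.
        if len(fields) in (4, 5) and fields[0].startswith("ora."):
            name, _type, target, state = fields[:4]
            host = fields[4] if len(fields) == 5 else ""
            if target != "ONLINE" or state != "ONLINE":
                problems.append((name, target, state, host))
    return problems


status_sample = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora.jy.db      application    ONLINE    ONLINE    rac2-server
ora....y1.inst application    ONLINE    ONLINE    rac1-server
ora....y2.inst application    ONLINE    OFFLINE
"""

# The last row above is a made-up example, not taken from the real output.
print(not_online(status_sample))
```

For the real output shown in this post, every row is ONLINE/ONLINE, so the function would return an empty list.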