Recovering a Lost OLR File
Environment: Oracle 11.2.0.1 RAC with two nodes, rac1 and rac2.
I. Recovery when an OLR backup exists
1. Manually rename the OLR on rac1 to simulate its loss:
mv rac1.olr rac1.olr.test
2. Restart CRS:
./crsctl stop crs
The stack stops cleanly. Then:
./crsctl start crs
The startup fails, with the following in the OHASD log:
2016-01-12 11:08:58.249: [ OCROSD][3084149472]utopen:6m':failed in stat OCR file/disk /u01/app/11.2.0/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [ OCROSD][3084149472]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2016-01-12 11:08:58.249: [ OCRRAW][3084149472]proprinit: Could not open raw device
2016-01-12 11:08:58.249: [ OCRAPI][3084149472]a_init:16!: Backend init unsuccessful : [26]
2016-01-12 11:08:58.249: [ CRSOCR][3084149472] OCR context init failure. Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2016-01-12 11:08:58.249: [ default][3084149472] OLR initalization failured, rc=26
2016-01-12 11:08:58.250: [ default][3084149472]Created alert : (:OHAS00106:) : Failed to initialize Oracle Local Registry
2016-01-12 11:08:58.250: [ default][3084149472][PANIC] OHASD exiting; Could not init OLR
2016-01-12 11:08:58.250: [ default][3084149472] Done.
And the start command itself reports:
[root@rac1 bin]# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
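The OHASD log points straight at the missing file. On Linux, the registered OLR path is recorded in /etc/oracle/olr.loc. A minimal sketch of reading it; the file contents below are inlined sample data modeled on this environment, not read from a live node:

```shell
# Sample olr.loc contents (assumed, modeled on this environment);
# the real file lives at /etc/oracle/olr.loc on each node.
sample='olrconfig_loc=/u01/app/11.2.0/grid/cdata/rac1.olr
crs_home=/u01/app/11.2.0/grid'

# Extract the registered OLR path.
olr_path=$(printf '%s\n' "$sample" | awk -F= '/^olrconfig_loc=/ {print $2}')
echo "OLR registered at: $olr_path"
```

If the file named by olrconfig_loc is missing, OHASD fails exactly as shown above.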
3. Restore from the backup. An OLR backup taken earlier is available at /olr:
[root@rac1 olr]# pwd
/olr
[root@rac1 olr]# ll
total 6412
-rw------- 1 root root 6553600 Jan 12 09:01 backup_20160112_090125.olr
As root, stop the stack, recreate an empty OLR file with the correct ownership, and restore from the backup:
[root@rac1 bin]# ./crsctl stop crs -f
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac1 bin]# touch /u01/app/11.2.0/grid/cdata/rac1.olr
[root@rac1 bin]# chown root:oinstall /u01/app/11.2.0/grid/cdata/rac1.olr
[root@rac1 bin]# ./ocrconfig -local -restore /olr/backup_20160112_090125.olr
Then check the result:
[root@rac1 olr]# cd /u01/app/11.2.0/grid/cdata/
[root@rac1 cdata]# ll
total 8928
drwxr-xr-x 2 grid oinstall 4096 Jan 11 08:44 localhost
drwxr-xr-x 2 grid oinstall 4096 Jan 12 08:52 rac1
-rw-r--r-- 1 root oinstall 272756736 Jan 12 11:33 rac1.olr
-rw------- 1 root oinstall 272756736 Jan 12 11:02 rac1.olr.test
drwxrwxr-x 2 grid oinstall 4096 Jan 12 05:29 rac-cluster
The OLR file is back. Try starting the stack:
[root@rac1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
Check the database status:
SQL> select open_mode from gv$database;
OPEN_MODE
--------------------
READ WRITE
READ WRITE
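Part I only works because a backup existed; the OLR is backed up automatically only after installation or upgrade, not periodically, so taking one by hand is worth scripting. The real commands are `ocrconfig -local -manualbackup` (take an OLR backup) and `ocrconfig -local -showbackup` (list backups), both run as root. The helper below is a hypothetical sketch that just builds a file name in the same backup_YYYYMMDD_HHMMSS.olr style seen above:

```shell
# Real commands (run as root; not executed in this sketch):
#   $GRID_HOME/bin/ocrconfig -local -manualbackup   # take an OLR backup now
#   $GRID_HOME/bin/ocrconfig -local -showbackup     # list existing OLR backups
# Hypothetical helper: build a destination name in the backup_YYYYMMDD_HHMMSS.olr
# style used above; the timestamp is passed in so the result is deterministic.
olr_backup_name() {
  printf 'backup_%s.olr' "$1"
}

name=$(olr_backup_name 20160112_090125)
echo "$name"
```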
II. Recovery when no OLR backup exists
Now for the main event: recovering without a backup.
1. Manually rename the OLR on rac1 to simulate its loss:
mv rac1.olr rac1.olr.test
2. Restart CRS:
./crsctl stop crs
The stack stops cleanly.
./crsctl start crs
The startup fails with exactly the same OHASD errors (PROCL-26, OHAS00106, OHASD PANIC) and the same CRS-4124/CRS-4000 messages shown in Part I.
3. With no backup available, the only option is to deconfigure the clusterware stack on the node and rerun root.sh to recreate the OLR:
[root@rac1 bin]# /u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force
2016-01-12 14:12:34: Parsing the host name
2016-01-12 14:12:34: Checking for super user privileges
2016-01-12 14:12:34: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
PRCR-1035 : Failed to look up CRS resource ora.cluster_vip.type for 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.eons is registered
Cannot communicate with crsd
ACFS-9200: Supported
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
Run root.sh:
[root@rac1 bin]# /u01/app/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2016-01-12 14:14:26: Parsing the host name
2016-01-12 14:14:26: Checking for super user privileges
2016-01-12 14:14:26: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rac2, number 2, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
Timed out waiting for the CRS stack to start.
The OLR has been recreated:
[root@rac1 cdata]# ll
total 8716
drwxr-xr-x 2 grid oinstall 4096 Jan 11 08:44 localhost
drwxr-xr-x 2 grid oinstall 4096 Jan 12 08:52 rac1
-rw------- 1 root oinstall 272756736 Jan 12 11:59 rac1.olr
-rwxr-xr-x 1 grid oinstall 272756736 Jan 12 11:43 rac1.olr.bak
drwxrwxr-x 2 grid oinstall 4096 Jan 12 05:29 rac-cluster
[root@rac1 cdata]#
But wait: root.sh reported an error:
Timed out waiting for the CRS stack to start.
Check crsd.log:
2016-01-11 09:35:41.780: [ CRSD][4165444320] ENV Logging level for Module: UiServer 0
2016-01-11 09:35:41.780: [ CRSMAIN][4165444320] Checking the OCR device
2016-01-11 09:35:41.781: [ CRSMAIN][4165444320] Connecting to the CSS Daemon
2016-01-11 09:35:41.783: [ CSSCLNT][1099733312]clssnsquerymode: not connected to CSSD
2016-01-11 09:35:41.987: [ CRSMAIN][4165444320] Initializing OCR
ocrcheck on node 1:
[grid@rac1 crsd]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2760
Available space (kbytes) : 259360
ID : 729466762
Device/File Name : +crs
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
ocrcheck on node 2:
[grid@rac2 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2728
Available space (kbytes) : 259392
ID : 729466762
Device/File Name : +OCRNEW
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
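The mismatch is visible in the Device/File Name line of the two ocrcheck outputs above. A small sketch of extracting and comparing that value; the two lines are inlined here as sample data taken from the outputs above:

```shell
# Inlined "Device/File Name" lines from the ocrcheck output on each node.
rac1_line='Device/File Name         :       +crs'
rac2_line='Device/File Name         :    +OCRNEW'

# Pull out the value after the colon, stripping whitespace.
get_loc() { printf '%s\n' "$1" | awk -F: '{gsub(/[[:space:]]/, "", $2); print $2}'; }

loc1=$(get_loc "$rac1_line")
loc2=$(get_loc "$rac2_line")
if [ "$loc1" != "$loc2" ]; then
  echo "OCR location mismatch: rac1=$loc1 rac2=$loc2"
fi
```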
This is because I had previously moved the OCR and voting disks to different storage. First, check the disk group status:
GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB USABLE_FILE_MB
------------ ------------------------------ ----------- ------ ---------- ---------- --------------
0 CRS MOUNTED EXTERN 5120 4756 4756
0 OCRNEW MOUNTED NORMAL 15360 14436 7063
0 TMP DISMOUNTED 0 0 0
0 FRA DISMOUNTED 0 0 0
0 DATA DISMOUNTED 0 0 0
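The dismounted groups need to be mounted from the ASM instance (ALTER DISKGROUP ... MOUNT, run as sysasm). As a sketch, the statements can be generated from a listing like the one above; the listing text is inlined here as sample data:

```shell
# Dismounted groups, inlined from the v$asm_diskgroup listing above.
listing='0 TMP DISMOUNTED
0 FRA DISMOUNTED
0 DATA DISMOUNTED'

# Emit one MOUNT statement per dismounted group (to be run as sysasm).
sqltext=$(printf '%s\n' "$listing" | awk '$3 == "DISMOUNTED" {printf "alter diskgroup %s mount;\n", $2}')
printf '%s\n' "$sqltext"
```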
Mount them:
GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB USABLE_FILE_MB
------------ ------------------------------ ----------- ------ ---------- ---------- --------------
0 CRS MOUNTED EXTERN 5120 4756 4756
0 OCRNEW MOUNTED NORMAL 15360 14436 7063
5 TMP MOUNTED EXTERN 5120 4757 4757
5 FRA MOUNTED EXTERN 8192 6604 6604
5 DATA MOUNTED EXTERN 8192 5513 5513
Due to time constraints I shut the machine down. On the next boot, node rac1 came up, but node rac2 failed to start.
GROUP_NUMBER NAME STATE TYPE TOTAL_MB FREE_MB USABLE_FILE_MB
------------ ------------------------------ ----------- ------ ---------- ---------- --------------
0 CRS MOUNTED EXTERN 5120 4756 4756
0 DATA MOUNTED EXTERN 8192 5513 5513
5 FRA MOUNTED EXTERN 8192 6581 6581
5 OCRNEW MOUNTED NORMAL 15360 14436 7063
5 TMP MOUNTED EXTERN 5120 4757 4757
The disk groups are mounted on both nodes. Check the OCR on each:
[grid@rac1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2728
Available space (kbytes) : 259392
ID : 729466762
Device/File Name : +crs
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
[grid@rac2 crsd]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2728
Available space (kbytes) : 259392
ID : 729466762
Device/File Name : +OCRNEW
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
The two nodes reference different OCR locations. Change node 1's OCR configuration to match node 2's:
[root@rac1 bin]# ./ocrconfig -add +OCRNEW
[root@rac1 bin]# ./ocrconfig -delete +crs
[root@rac1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2728
Available space (kbytes) : 259392
ID : 729466762
Device/File Name : +OCRNEW
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
[root@rac1 bin]# ./crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 610033cee0c34ff3bf2269f62bbf7340 (/dev/raw/raw2) [OCRNEW]
2. ONLINE afa5da0d2a8f4f75bf05f1b72d979c4c (/dev/raw/raw3) [OCRNEW]
3. ONLINE 02d613656c1c4f99bf59a36d62b24c8b (/dev/raw/raw4) [OCRNEW]
Located 3 voting disk(s).
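For a quick scripted health check, the ONLINE voting disks can be counted straight out of the crsctl output; the listing is inlined here as sample data taken from the output above:

```shell
# Voting-disk listing, inlined from "crsctl query css votedisk" above.
vd='1. ONLINE 610033cee0c34ff3bf2269f62bbf7340 (/dev/raw/raw2) [OCRNEW]
2. ONLINE afa5da0d2a8f4f75bf05f1b72d979c4c (/dev/raw/raw3) [OCRNEW]
3. ONLINE 02d613656c1c4f99bf59a36d62b24c8b (/dev/raw/raw4) [OCRNEW]'

# Count the lines whose state column reads ONLINE.
online=$(printf '%s\n' "$vd" | grep -c ' ONLINE ')
echo "$online voting disk(s) ONLINE"
```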
The voting disk configuration on node 1 is unchanged. Now restart node 2:
Stop: ./crsctl stop crs -f
Start: ./crsctl start crs
Checking the cluster status right away, node 2 still was not up, and a long hunt through the logs turned up nothing. Checking again a while later:
[grid@rac2 rac2]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.FRA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.OCRNEW.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.TMP.dg
ONLINE OFFLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.eons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.registry.acfs
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1
ora.oc4j
1 OFFLINE OFFLINE
ora.orcl.db
1 ONLINE ONLINE rac1 Open
2 ONLINE ONLINE rac2 Open
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.rac2.vip
1 ONLINE ONLINE rac2
ora.scan1.vip
1 ONLINE ONLINE rac1
It turns out it just needed time. Patience is not only for humans; RAC needs it too!
Conclusions:
1. Comparing the with-backup and no-backup recovery procedures makes it starkly clear how important backups are.
2. Whenever a disk group changes, back up the OLR immediately.
3. Sometimes what we need is not just technique, but patience.
