异地clone RAC数据库 +ASM USE RMAN

 

 ###sample 

如何在本地生成数据库的备份,并复制到DG库新环境(高级)

1. 首先确定本地文件系统(存放备份集)足够大,可以使用如下语句查询当前数据库实际的使用总大小

 

 

Rman 备份进度:

select sum("已使用空间(M)") from

(

SELECT Upper(F.TABLESPACE_NAME)         "表空间名",

       D.TOT_GROOTTE_MB                 "表空间大小(M)",

       D.TOT_GROOTTE_MB - F.TOTAL_BYTES "已使用空间(M)",

       To_char(Round(( D.TOT_GROOTTE_MB - F.TOTAL_BYTES ) / D.TOT_GROOTTE_MB * 100, 2), '990.99')

       || '%'                           "使用比",

       F.TOTAL_BYTES                    "空闲空间(M)",

       F.MAX_BYTES                      "最大块(M)"

FROM   (SELECT TABLESPACE_NAME,

               Round(Sum(BYTES) / ( 1024 * 1024 ), 2) TOTAL_BYTES,

               Round(Max(BYTES) / ( 1024 * 1024 ), 2) MAX_BYTES

        FROM   SYS.DBA_FREE_SPACE

        GROUP  BY TABLESPACE_NAME) F,

       (SELECT DD.TABLESPACE_NAME,

               Round(Sum(DD.BYTES) / ( 1024 * 1024 ), 2) TOT_GROOTTE_MB

        FROM   SYS.DBA_DATA_FILES DD

        GROUP  BY DD.TABLESPACE_NAME) D

WHERE  D.TABLESPACE_NAME = F.TABLESPACE_NAME

ORDER  BY 1)

 

 

 

2. 开始在本地备份,备份目录需要自定义如/oracle/10g/backup

 

Rman target /

CONFIGURE DEVICE TYPE DISK PARALLELISM 4 BACKUP TYPE TO BACKUPSET;

 

 

2.1 将备份存放为脚本 backup.sh

 

man trace reco1.log <<eof

connect target /

CONFIGURE DEVICE TYPE DISK PARALLELISM 4 BACKUP TYPE TO BACKUPSET;

run{

allocate channel c1 type disk format '/oracle/10g/backup/%U';    

BACKUP INuserEMENTAL LEVEL 0 DATABASE;

     BACKUP ARCHIVELOG ALL;

backup current controlfile for standby;

}

exit

eof

 

 

2.2 oracle用户后台执行

Nohup sh backup.sh &

 

2.3.检查备份的进度:

SELECT SID,
       SERIAL#,
       CONTEXT,
       SOFAR,
       TOTALWORK,
       ROUND(SOFAR / TOTALWORK * 100, 2) "%_COMPLETE"
  FROM gV$SESSION_LONGOPS
WHERE OPNAME LIKE 'RMAN%' AND OPNAME NOT LIKE '%aggregate%' AND
       TOTALWORK != 0 AND SOFAR <> TOTALWORK;

 

 

3. 在本地创建pfile 文件(/tmp/dba/pfile.ora),使用生产的参数文件,检查以下参数是否要随着DG库实际环境调整

audit_file_dest, background_dump_dest, control_files, dd_dump_dest, log_archive_dest_1, user_dump_dest, local_listener, log_archive_config, db_unique_name, control_files, fal_client,fal_client ,log_archive_config etc

 

 

sqlplus / as sysdba

SQL> usereate pfile='/tmp/dba/pfile.ora' from spfile;

File usereated.

 

4

4.1 移动备份片和参数文件到新的主机,最好能新主机(DG) 和旧主机使用相同的目录。

(还有一种简单的方法,使用存储映射的方法,在新主机重新挂在相同的NAS device)

scp pfile.ora  root@58.2.104.11:/tmp/dba

 

4.2   移动密码文件到新主机 DG

Cd $ORACLE_HOME/dbs

scp  orapwdd*  root@58.2.104.11:/tmp/dba

 

登陆新主机DG,修改文件权限为oracle用户,并且移动到对应目录

Cd /tmp/dba

chown opdd:oinstall *

 

su – opdd

cd /tmp/dba

cp orapwdd* $ORACLE_HOME/dbs

cp pfile.ora  $ORACLE_HOME/dbs/initdd1.ora

 

5编辑新主机DG库的tnsnames.ora (在$ORACLE_HOME/network/admin 目录下)

 5.1 参考旧主机的tnsname.ora 加入网络信息,方便新主机 DG传日志

 

5.2 同时旧主机的tnsnames 也有加入新主机DG信息

 

6. 新主机上使用上面创建的 pfile 启动新主机(DG) 上的数据库


sqlplus "/ as sysdba"


SQL> startup nomount;
ORACLE instance started.

Total System Global Area  167772160 bytes
Fixed Size                  1218292 bytes
Variable Size              62916876 bytes
Database Buffers           96468992 bytes
Redo Buffers                7168000 bytes

 

 

 

7. 恢复备库的控制文件

(RMAN>backup current controlfile for standby format '备份路径‘;) (if backup db is primary)

 

7.1

旧主机上备份( if backup db is standby)

RMAN>backup current controlfile format ‘/tmp/dba/standby.ctl’;

 

cd /tmp/dba

scp standby.ctl  root@58.2.104.11:/tmp/dba

 

7.2

新主机DG上恢复控制文件

 

备库重启到nomount状态,恢复控制文件,启动到mount

(for fix 10g with nas bug )

SQL>alter system set events '10298 trace name context forever, level 32';

 

RMAN>restore standby controlfile from '/tmp/dba/standby.ctl'

 

SQL>alter database mount standby database;

 

注意:如果恢复的备份控制文件  在NAS上,可以会出现如下报错,(备份的数据文件不会有这个报错)这个时候

感谢ilmarkerm 

https://ilmarkerm.eu/blog/2016/05/rman-06172-when-restoring-a-controlfile-and-dnfs-is-not-turned-on/

(Although the error message does not say it, but I remembered that I had mounted the NFS using SOFT mount option and when trying to restore datafiles from soft mounted NFS shared you will usually get ORA-27054: NFS file system not mounted with correct options, unless you have turned on Direct-NFS on the database kernel. So I just wondered, maybe this is the real error message in this case also.
After turning on Direct NFS, restoring the control file worked as expected:)


Starting restore at 17-OCT-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1200 device type=DISK

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 10/17/2020 15:31:53
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece

RMAN> exit

 

这个时候,临时解决方案: 可以将nas 上的恢复文件移动 备机的物理位置比如/tmp

 

 

8. 新主机检查可以用来恢复的归档日志文件的最大序列号

 

export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';

 

RMAN>list backup of archivelog time between "to_date('2019-03-01 00:00:00','yyyy-mm-dd,hh24:mi:ss')"  and "to_date('2019-03-01 06:00:00','yyyy-mm-dd hh24:mi:ss')";

 

检查所有重做线程的最后一个存档序列, 并选择其中有 "下一个 scn(Next SCN)" 的最小存档序列。如38833,

 

选择完最小的存档序列,使用38833+1

(Device Type DISK)

Thrd Seq     Low SCN    Low Time            Next SCN   Next Time

  1    38833   114944541420 2019-03-01 00:38:59 114944807665 2019-03-01 00:41:40

  2    38198   114944543989 2019-03-01 00:39:01 114945682491 2019-03-01 00:55:11

 

9新主机准备restore.sh

###以下脚本是用来 将数据文件 指向新主机的新目录的拼凑

 

set linesize 999 linesize 999 head off feedback off

select 'set newname for datafile '||''||FILE#||''||' to '||chr(39)||replace(name,'+NEW_DATA/rdd/datafile','+NEW_DATA/rdd_new/datafile')||''||'.dbf'';' from v$datafile;

spool off

 

 

10在新环境下运行resotore.sh

10.1 resotore.sh (ASM newname的路径都要与db_uniq_name 一样,如果不一样,创建目录将会失败)

rman target / log='/tmp/rdd_new.log'  <<EOF

run{

sql 'alter session set NLS_DATE_FORMAT="YYYY-MM-DD HH24:MI:SS"';

set until sequence 38834 thread 1;

set newname for datafile 1 to '+NEW_DATA/rdd_new/datafile/system.422.983499857.dbf;

set newname for datafile 2 to '+NEW_DATA/rdd_new/datafile/undotbs1.454.983492257.dbf;

……

restore database;

switch datafile all;

switch tempfile all;

recover database;

release channel ch00;

}

EOF

 

10,2 运行

Nohup sh resotore.sh &

 

10.3 观察恢复的进度

 

 SELECT sid, spid, client_info  
     FROM v$process p, v$session s  
     WHERE p.addr = s.paddr  
     AND client_info LIKE '%rman%';
   

 

 

SELECT SID,
       SERIAL#,
       CONTEXT,
       SOFAR,
       TOTALWORK,
       ROUND(SOFAR / TOTALWORK * 100, 2) "%_COMPLETE"
  FROM gV$SESSION_LONGOPS
WHERE OPNAME LIKE 'RMAN%' AND OPNAME NOT LIKE '%aggregate%' AND
       TOTALWORK != 0 AND SOFAR <> TOTALWORK;

 

观察数据库alert.log

 

11恢复完成之后,rename online_log

 

11.1 生成rename logfile 并clear 脚本

 

set linesize 999 linesize 999 head off feedback off

select 'alter database rename file '||''''||member||''''||' to '||chr(39)||replace(member,'+NEW_FRA/rdd/onlinelog','+NEW_FRA/rdd_new/onlinelog')||''';' from v$logfile;

spool off

 

spool clear_log.sql

set linesize 999 linesize 999 head off feedback off

select 'alter database clear logfile group '|| GROUP#||';' from v$logfile;

spool off

 

11.2 After renaming the redolog files, the database can be opened

SQL>alter system set STANDBY_FILE_MANAGEMENT =MANUAL;

SQL> select member from v$logfile;

SQL>@rename_log.sql

 

 

11.3

SQL> alter database open;

alter database open

*

ERROR at line 1:

ORA-16004: backup database requires recovery

ORA-01152: file 1 was not restored from a sufficiently old backup

ORA-01110: data file 1: '+NEW_DATA/rdd_new/datafile/system.422.983499857.dbf'

 

 

12.旧主机追加归档日志,新主机应用归档日志  

12.1 旧主机追加归档日志

connect target /

run{

allocate channel c1 type disk format '/bakfs/rman/arhive_03_01%U';

allocate channel c2 type disk format '/bakfs/rman/arhive_03_01%U';

allocate channel c3 type disk format '/bakfs/rman/arhive_03_01%U';

allocate channel c4 type disk format '/bakfs/rman/arhive_03_01%U';

backup archivelog from time 'sysdate-2';

release channel c1;

release channel c2;

release channel c3;

release channel c4;

}

 

12,2 将备份文件拷贝到新主机,新主机应用归档日志

 

è 确定当前的最大归档:

SQL> select max(SEQUENCE#)-1,THREAD# from v$log group by THREAD#;

 

MAX(SEQUENCE#)-1    THREAD#

---------------- ----------

           38933          1

           38293          2

 

è 开始恢复到当前最大的归档

rman target /

Rman> catalog start with '/bakfs/rman';

 

###恢复实例1的归档

Rman>

run{

allocate channel c1 type disk;

allocate channel c2 type disk;

allocate channel c3 type disk;

allocate channel c4 type disk;

restore archivelog from logseq 38833 until logseq 38933 thread 1;

release channel c1;

release channel c2;

release channel c3;

release channel c4;

}

 

 

###恢复实例2的归档

Rman>

run{

allocate channel c1 type disk;

allocate channel c2 type disk;

allocate channel c3 type disk;

allocate channel c4 type disk;

restore archivelog from logseq 38198   until logseq 38293 thread 2;

release channel c1;

release channel c2;

release channel c3;

release channel c4;

}

 

è 开启恢复

SQL> alter database recover managed standby database  disconnect from session;

 

è 观察数据库日志alert*.log

 

è 停下恢复

alter database recover managed standby database cancel;

SQL〉@clear_log.sql

 

13. 调整参数 STANDBY_FILE_MANAGEMENT

 

SQL> alter database open

SQL〉alter system set STANDBY_FILE_MANAGEMENT =auto;

13.1 调整所有现有环境的数据库参数 log_archive_config

 

 

13.2. 检查ASM PFILE 和rdbms  spfile 设置  

-〉2个节点ASM 使用 pfile 配置

 

-〉RDBMS 使用spfile 配置 ,放在ASM盘上

Sqlplus / as sysdba

usereate spfile='+NEW_DATA/rdd_new/spfiledd.ora' from pfile;

配置init 文件一行:

spfile='+NEW_DATA/rdd_new/spfiledd.ora'

 

 

 

13.3 调整主库的参数log_archive_dest_*

alter system set log_archive_dest_5='SERVICE=new lgwr async noaffirm reopen=60 valid_for=(online_logfiles,primary_role) db_unique_name=new' sid='*';

 

14.调整tempfile

 

spool /tmp/dba/re-tempfile.sql

set linesize 999 linesize 999 head off feedback off

 

--FOR BYTES IS NOT 0M

select 'ALTER TABLESPACE '||t.name||' add tempfile '||''''||replace(f.name,'+NEW_DATA/dd','+NEW_DATA/rdd_new')||''''||' size '||f.bytes/1024/1024||'M REUSE AUTOEXTEND ON NEXT 655360 MAXSIZE 32767M;' from v$tempfile f ,(select ts#,name from v$tablespace t where included_in_database_backup='NO') t

where f.ts#= t.ts#

 

 

--For Bytes is 0M

select 'ALTER TABLESPACE '||t.name||' add tempfile '||''''||replace(f.name,'+NEW_DATA/dd','+NEW_DATA/rdd_new')||''''||' size '||'10'||'M REUSE AUTOEXTEND ON NEXT 655360 MAXSIZE 32767M;' from v$tempfile f ,(select ts#,name from v$tablespace t where included_in_database_backup='NO') t

where f.ts#= t.ts#

/

 

 

select 'alter database tempfile '||''''||name||''' drop;' from v$tempfile;

 

spool off

 

@/tmp/dba/re-tempfile.sql

 

 

è 开启恢复

SQL> alter database recover managed standby database  disconnect from session;

 

 

最后调整rman 策略 for standby

CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;

 

附录:手工挂载NAS device in AIX

umount  /bakfs

 

1.mount 格式 (for ORA-27054 ERRORS WHEN RUNNING RMAN WITH NFS) FOR AIX and  NAS device

 

mount -F nfs -o cio,rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,vers=3,timeo=600 "nas.sw1.userb":/ifs/data/bak   /bakfs

 

2.数据库参数设置 (只是在当前实例级别生效,重启instance丢失):(for ORA-27054 ERRORS WHEN RUNNING RMAN WITH NFS)FOR AIX

 

alter system set events '10298 trace name context forever, level 32';

 

 

3.change to asynch to directio,  for (RMAN-00600 [8083], LFI-00005 Free some memory failed in lfibrdt(), LFI-00004 Call to lfibgl() failed)  AIX and  NAS device

 

SQL> show parameter filesystemio_options

 

NAME

------------------------------------

TYPE

----------------------------------------------------------------

VALUE

------------------------------

filesystemio_options

string

asynch

 

 

alter system set filesystemio_options=directio scope=spfile;

 

 

4. ORA-19870: error reading backup piece (for restore in ASM disk issue)

 

export ORACLE_SID=+ASM1

show parameter shared_pool_size

show parameter large_pool

show parameter db_cache_size

 

edit init+ASM1.ora

shared_pool_size = 5G

large_pool_size = 1G

db_cache_size = 1G

sga_max_size=8192M

processes=70

sessions=80

 

( Inuserease the PROCESSES parameter in the ASM parameter file

Processes = 25 + 15n, where n is the number of instances on the box using ASM for their storage.)

 

5.

ORA-19870: error reading backup piece /bakfs/rman/dd_LVL0_20190228_pgtr15p1_s88880_p1

ORA-19504: failed to usereate file "+NEW_DATA/dd/datafile/rb_data01.dbf.dbf"

ORA-17502: ksfdusere:4 Failed to usereate file +NEW_DATA/dd/datafile/rb_data01.dbf.dbf

ORA-15173: entry 'dd' does not exist in directory '/'

 

ASM 只能自动创建跟db_uniq_name相同的目录。

Change restore.sh 脚本

 

5.10g restore to ASM file 即使

即使使用了set newname ,但是重命名OMF 时候会出现如下报错,但是恢复仍然成功,因为OMF FILE 还是会自动创建

原因是10g BUG

Note that the datafile name is not an OMF name. If it were an OMF name a different error is raised:

        Errors in file /db/db01/home/oracle/admin/dm2gp/bdump/dm2gp1_dbw0_26350.trc:
        ORA-01157: cannot identify/lock data file 45 - see DBWR trace file
        ORA-01110: data file 45: '+DISKGRP1/dm2gp/datafile/undotbs02.281.562308511'
        ORA-17503: ksfdopn:2 Failed to open file +DISKGRP1/dm2gp/datafile/undotbs02.281.562308511
        ORA-15012: ASM file '+diskgrp1.281.562308511' does not exist

But the restore succeeds because the file is an OMF file.

 

6.问题:RMAN恢复restore database中断后 ,再启动RMAN,会产出什么样的后果?

 

--==========================================================

答案:

 

如果使用上次恢复使用的控制文件,应该会跳过已经恢复的数据文件。。。

但是第二次restore并不减少你所预期的时间~

 

比如

你全备份 1T 数据库 1000各数据文件。 单通道备份,1000个数据文件产生一个备份集~

 

你第一次restore ,restore到999个文件,就差一个文件,被你中断了。假如这次使用了999分钟那么你第二次restore的时候,会自动跳过前999个数据文件,直接restore最后一个数据文件,但是会需要很长时间,而不是你期望了1分钟就能结束,因为就算你只恢复一个数据文件,rman也许要读取整个备份集~

但是综合来看,效率仍然高于 将文件全部删除在重新恢复的时间。

 

 

7. renaming the redolog files 出现报错

ORA-17503: ksfdopn: ASM file '+NEW_FRA/rdd_new/onlinelog/group_6.329.983526345'

解决办法1:进入ASM,手工创建目录

Asmcmd>cd NEW_FRA

Asmcmd>mkdir rdd_new

Asmcmd>cd rdd_new

Asmcmd>mkdir onlinelog

 

解决办法2:如果办法1不行,可以尝试解决办法2

alter database clear unarchived logfile group 9;

 

alter database rename file '+NEW_FRA/rdd/onlinelog/redolog09_1.log' to '+NEW_FRA/rdd_new/onlinelog/redolog09_1.log'

 

alter database rename file '+NEW_FRA/rdd/onlinelog/redolog09_1.log' to '+NEW_FRA/rdd_new/onlinelog/redolog09_1.log'

 

 

alter database clear unarchived logfile group 14;

alter database rename file '+NEW_FRA/rdd/onlinelog/group_14.444.983454527' to '+NEW_FRA/rdd_new/onlinelog/group_14.444.983454527'

附录:手工挂载NAS device in AIX

umount  /bakfs

 

1.mount 格式 (for ORA-27054 ERRORS WHEN RUNNING RMAN WITH NFS) FOR AIX and  NAS device

 

mount -F nfs -o cio,rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,vers=3,timeo=600 "nas.sw1.userb":/ifs/data/bak   /bakfs

 

2.数据库参数设置 (只是在当前实例级别生效,重启instance丢失):(for ORA-27054 ERRORS WHEN RUNNING RMAN WITH NFS)FOR AIX

 

RMAN-03009: failure of backup command on c6 channel at 10/17/2020 11:46:42
ORA-19504: failed to create file "/bakfs/dba_bak/ams/fms/full_bk_FMS_844_1_1054036001"
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options

 

alter system set events '10298 trace name context forever, level 32';

 

 

3.change to asynch to directio,  for (RMAN-00600 [8083], LFI-00005 Free some memory failed in lfibrdt(), LFI-00004 Call to lfibgl() failed)  AIX and  NAS device

 

SQL> show parameter filesystemio_options

 

NAME

------------------------------------

TYPE

----------------------------------------------------------------

VALUE

------------------------------

filesystemio_options

string

asynch

 

 

alter system set filesystemio_options=directio scope=spfile;

 

 

4. ORA-19870: error reading backup piece (for restore in ASM disk issue)

 

export ORACLE_SID=+ASM1

show parameter shared_pool_size

show parameter large_pool

show parameter db_cache_size

 

edit init+ASM1.ora

shared_pool_size = 5G

large_pool_size = 1G

db_cache_size = 1G

sga_max_size=8192M

processes=70

sessions=80

 

( Inuserease the PROCESSES parameter in the ASM parameter file

Processes = 25 + 15n, where n is the number of instances on the box using ASM for their storage.)

 

5.

ORA-19870: error reading backup piece /bakfs/rman/db_LVL0_20190228_pgtr15p1_s88880_p1

ORA-19504: failed to usereate file "+NEW_DATA/db/datafile/rb_data01.dbf.dbf"

ORA-17502: ksfdusere:4 Failed to usereate file +NEW_DATA/db/datafile/rb_data01.dbf.dbf

ORA-15173: entry 'db' does not exist in directory '/'

 

ASM 只能自动创建跟db_uniq_name相同的目录。

Change restore.sh 脚本

 

 

 

 

 

 

 

 

 

附录:

https://blog.csdn.net/hw_libo/article/details/6878885

如何kill掉RMAN备份进程

本文的目的就是在紧急状态下,需要立即终止正在进行的RMAN备份进程。

(1)查看RMAN分配的各个通道的进程号
SQL> SELECT sid, spid, client_info 
     FROM v$process p, v$session s 
     WHERE p.addr = s.paddr 
     AND client_info LIKE '%rman%';
  
       SID SPID       CLIENT_INFO
---------- ------------------------ -------------------------
       525 26244      rman channel=t1
      1023 26245      rman channel=t2
       699 26246      rman channel=t3

 

(2)根据第(1)中得到的进程号,终止RMAN备份
注:这里既要kill 掉RMAN备份脚本的PID,也要kill 掉RMAN中分配的各个通道的PID
subsdb1:~ # ps -ef | grep 26244
oracle   26244 26224  7 17:12 ?        00:01:49 oraclesubsdb (DESuserIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root      9877  9603  0 17:34 pts/11   00:00:00 grep 26244
subsdb1:~ # kill -9 26244
subsdb1:~ # ps -ef | grep 26245
oracle   26245 26224  5 17:12 ?        00:01:13 oraclesubsdb (DESuserIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root      9968  9603  0 17:35 pts/11   00:00:00 grep 26245
subsdb1:~ # kill -9 26245
subsdb1:~ # ps -ef | grep 26246
oracle   26246 26224  4 17:12 ?        00:01:03 oraclesubsdb (DESuserIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root     10009  9603  0 17:35 pts/11   00:00:00 grep 26246
subsdb1:~ # kill -9 26246
subsdb1:~ # ps -ef | grep rman
oracle   26224 25962  0 17:11 pts/3    00:00:03 rman target / nocatalog
root     10061  9603  0 17:35 pts/11   00:00:00 grep rman
subsdb1:~ # kill -9 26224
subsdb1:~ # ps -ef | grep rman
root     10102  9603  0 17:36 pts/11   00:00:00 grep rman
subsdb1:~ # ps -ef | grep 26246
root     10213  9603  0 17:36 pts/11   00:00:00 grep 26246

此时RMAN备份操作已经被终止。查看(1)中的SQL语句时,结果为空。

说明:如果单单kill掉RMAN的进程号,那么RMAN备份并没有停止,而是要连channel进程也一起掉才可以!

 
---------------------
作者:bosco1986
来源:CSDN
原文:https://blog.csdn.net/hw_libo/article/details/6878885
版权声明:本文为博主原创文章,转载请附上博文链接!

 

 

2. 10g RAC有一个BUG  ASMCMD  lsdg 显示的free_mb 不准确,只要在dg 磁盘组触发了rm -rf操作的话

 10g规避办法:

Login to ASM instance, Run the below command

SQL> ALTER DISKGROUP <DG_NAME> CHECK ALL  REPAIR;

Similar issue occurs using ASMCMD as well. If any of the files / directories deleted using "rm -rf" command in a diskgroup at asm command prompt will lead to this issue having diskgroup compatibility pre 11.2.

 v$asm_disk Shows FREE_MB as 0 After Datafiles/Tablespaces Dropped

 

 

v$asm_disk shows FREE_MB as 0MB for all the disks which are part of the affected diskgroup. 

When executed check all norepair on the diskgroup. it shows Warning messages as Disk directory mismatch on the disks.

===

Thu May 29 19:04:57 2014
SQL> /* ASMCMD */ALTER DISKGROUP P0302_DG_FLASH CHECK  NOREPAIR 
NOTE: starting check of diskgroup P0302_DG_FLASH
Thu May 29 18:04:58 2014
GMON checking disk 1 for group 2 at 13 for pid 35, osid 14440
NOTE: disk P0302_ASMDISK009, used AU total mismatch: DD={4294966447, 0} AT={216, 0} >>> 
SUCCESS: check of diskgroup P0302_DG_FLASH found no errors
SUCCESS: /* ASMCMD */ALTER DISKGROUP P0302_DG_FLASH CHECK  NOREPAIR
Thu May 29 19:05:35 2014
SQL> /* ASMCMD */ALTER DISKGROUP P0302_DG_FLASH CHECK  NOREPAIR 
NOTE: starting check of diskgroup P0302_DG_FLASH
Thu May 29 18:05:35 2014
GMON checking disk 1 for group 2 at 14 for pid 35, osid 14440
NOTE: disk P0302_ASMDISK009, used AU total mismatch: DD={4294966447, 0} AT={216, 0}
SUCCESS: check of diskgroup P0302_DG_FLASH found no errors
SUCCESS: /* ASMCMD */ALTER DISKGROUP P0302_DG_FLASH CHECK  NOREPAIR

CAUSE

 There is a mismatch between used space in the disk directory and the actual used space. This is a known issue in pre 11.2 asm-compatible diskgroups (where Used Space Directory does not exist). This is possible because we batch the update to used space in disk directory. So, if we perform allocations/de-allocations during rebalance, we will run into this. We usereated USD to fix this. Unfortunately, it cannot be backported.

The fix for that usereated USD. It cannot be backported. Users will have to advance the compatibility of the diskgroup to 11.2 to ensure that this will not happen again. 

Bug 8451024   - V$ASM_DISK REPORTS ZERO FOR FREE SPACE : SPACE MISMATCH FOR ALL DISKS 
Base Bug 5077325   - FREESPACE LIST TO TRACK THE FREE SPACE IN AN ASM DISK

 

NOTE: Similar issue occurs using ASMCMD as well. If any of the files / directories deleted using "rm -rf" command in a diskgroup at asm command prompt will lead to this issue having diskgroup compatibility pre 11.2.

SOLUTION

 Initially run the check all repair to fix the issue. 

Login to ASM instance, Run the below command

SQL> ALTER DISKGROUP <DG_NAME> CHECK ALL  REPAIR;

 

 

 

###########asm filesystem file 文件实际使用情况

check asm file usage:

ort ORACLE_SID=+ASM1
ARCHIVE_LOG/
------------------------- 
set linesize 240

col g_n format 999 
col f_n format 9999 
col bytes format 999,999,999,999 
col space format 999,999,999,999 
col a_i format 999 
col blocks format 999,999 
col block_size format 999,999 
col ftype format a16 
col dir format a3 
col s_usereated format a10 
col full_alias_path format a60 
col striped format a6 
col rdun format a6 
col au format 99999999 
col calculated format 999,999,999,999

select x.gnum g_n, 
x.filnum f_n, 
substr(x.full_alias_path,1, 60) full_alias_path, 
f.striped, 
f.redundancy rdun, 
f.bytes, 
f.space, 
case when calculated / x.au > 60 then calculated + 3 * x.au -- Over 60 direct extents consideration 
else calculated 
end calculated 
from ( SELECT gnum,filnum,au, concat('+'||gname, sys_connect_by_path(aname, '/')) full_alias_path 
from ( SELECT g.name gname, 
g.allocation_unit_size au, 
a.parent_index pindex, 
a.name aname, 
a.reference_index rindex, 
a.group_number gnum, 
a.file_number filnum 
FROM v$asm_alias a, v$asm_diskgroup g 
WHERE a.group_number = g.group_number) 
START WITH (mod(pindex, power(2, 24))) = 0 CONNECT BY PRIOR rindex = pindex) x, 
(select f.group_number gnum, 
f.file_number filnum, 
f.type ftype , 
f.bytes, 
f.space, 
f.blocks, 
f.block_size, 
f.striped, 
f.redundancy, 
case f.striped when 'FINE' 
then ceil(((f.blocks * f.block_size + 1) / g.allocation_unit_size) / 8) 
* decode(f.redundancy, 'HIGH', 3 , 'MIRROR', 2, 'UNPROT', 1 ) * 8 * g.allocation_unit_size 
else ceil((f.blocks * f.block_size + 1) / g.allocation_unit_size) 
* decode(f.redundancy, 'HIGH', 3 , 'MIRROR', 2, 'UNPROT', 1 ) * g.allocation_unit_size 
end calculated 
from v$asm_file f , v$asm_diskgroup g 
where f.group_number = g.group_number
and g.name='ARCHIVE_LOG' 
order by f.group_number,file_number) f 
where x.filnum != 4294967295 
and x.gnum=f.gnum and x.filnum=f.filnum 
order by full_alias_path 

 

3.RMAN backup AND RESOTER I/O 检查 和进度检查脚本

 

alter session set nls_date_format='dd-mon-rr hh24:mi:ss';
set lines 1500
set pages 100
col CLI_INFO format a10
col spid format a5
col ch format a20
col seconds format 999999.99
col filename format a65
col bfc format 9
col "% Complete" format 999.99
col event format a40
set numwidth 10

prompt  "disk i/o checking"

prompt   '02-28-19 14:00:00(sample)‘’

select s.inst_id, a.sid, CLIENT_INFO Ch, a.STATUS,
open_time, round(BYTES/1024/1024,2) "SOFAR Mb" , round(total_bytes/1024/1024,2)
TotMb, io_count,
round(BYTES/TOTAL_BYTES*100,2) "% Complete" , a.type, filename
from gv$backup_async_io a, gv$session s
where not a.STATUS in ('UNKNOWN')
and a.sid=s.sid and open_time > to_date('&1', 'dd-mm-rr hh24:mi:ss') order by 2,7;

 

prompt  "rman progress checking"

SELECT SID,
       SERIAL#,
       CONTEXT,
       SOFAR,
       TOTALWORK,
       ROUND(SOFAR / TOTALWORK * 100, 2) "%_COMPLETE"
  FROM gV$SESSION_LONGOPS
WHERE OPNAME LIKE 'RMAN%' AND OPNAME NOT LIKE '%aggregate%' AND
       TOTALWORK != 0 AND SOFAR <> TOTALWORK;

 

 

4.

问题:RMAN恢复restore database中断后 ,再启动RMAN,会产出什么样的后果?

--==========================================================
答案:

如果使用上次恢复使用的控制文件,应该会跳过已经恢复的数据文件。。。
但是第二次restore并不减少你所预期的时间~

比如
你全备份 1T 数据库 1000各数据文件。 单通道备份,1000个数据文件产生一个备份集~

你第一次restore ,restore到999个文件,就差一个文件,被你中断了。假如这次使用了999分钟
那么你第二次restore的时候,会自动跳过前999个数据文件,直接restore最后一个数据文件,但是会需要很长时间,而不是你期望了1分钟就能结束,因为就算你只恢复一个数据文件,rman也许要读取整个备份集~

 


datafile 8 is already restored to file +NEW_DATA/rdb_new/datafile/fm_data.272.983480425.dbf

 

处理方法:重新恢复控制文件,删除干净文件,重新发起恢复
---------------------
作者:没大没小
来源:CSDN
原文:https://blog.csdn.net/jiangzeqiang/article/details/8724725
版权声明:本文为博主原创文章,转载请附上博文链接!

 

 

6.

磁盘组删磁盘时报 ORA-15032 ORA-15054

SQL> alter diskgroup backup drop disk '/dev/mapper/backup6';    --删磁盘
alter diskgroup backup drop disk '/dev/mapper/backup6'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15054: disk "/DEV/MAPPER/BACKUP6" does not exist in diskgroup "BACKUP"

SQL> select name,path,state from v$asm_disk;


NAME                           PATH                                     STATE
------------------------------ ---------------------------------------- --------
DATA_0005                      /dev/mapper/data6p1                      NORMAL
DATA_0000                      /dev/mapper/data1p1                      NORMAL
BACKUP_0000                    /dev/mapper/backupp1                     NORMAL
BACKUP_0002                    /dev/mapper/backup1                      NORMAL
DATA_0001                      /dev/mapper/data2p1                      NORMAL
BACKUP_0003                    /dev/mapper/backup2                      NORMAL
DATA_0006                      /dev/mapper/data5p1                      NORMAL
BACKUP_0006                    /dev/mapper/backup5                      NORMAL
DATA_0002                      /dev/mapper/data3p1                      NORMAL
BACKUP_0005                    /dev/mapper/backup4                      NORMAL
BACKUP_0007                    /dev/mapper/backup6                      NORMAL
userS_0002                       /dev/mapper/users4p1                       NORMAL
BACKUP_0004                    /dev/mapper/backup3                      NORMAL
userS_0004                       /dev/mapper/users2p1                       NORMAL
userS_0000                       /dev/mapper/users6p1                       NORMAL
userS_0005                       /dev/mapper/users1p1                       NORMAL
userS_0003                       /dev/mapper/users3p1                       NORMAL
userS_0001                       /dev/mapper/users5p1                       NORMAL
 
SQL> select group_number,name,state,total_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB
------------ ------------------------------ ----------- ----------
           1 BACKUP                         MOUNTED        1433597
           2 userS                            MOUNTED           6096
           3 DATA                           MOUNTED        1021961


SQL> alter diskgroup backup drop disk backup_0007;
Diskgroup altered.
SQL> alter diskgroup backup drop disk backup_0006;
Diskgroup altered.

总结: 删除磁盘时只能跟磁盘的名字,不能跟path。
SQL> select group_number,name,state,total_mb from v$asm_diskgroup;
GROUP_NUMBER NAME                           STATE         TOTAL_MB
------------ ------------------------------ ----------- ----------
           1 BACKUP                         MOUNTED        1023997
           2 userS                            MOUNTED           6096
           3 DATA                           MOUNTED        1021961
删除两个磁盘后磁盘大小减少了400G。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22969361/viewspace-1084565/,如需转载,请注明出处,否则将追究法律责任。

 

 

###########asm filesystem file

check asm file usage:

ort ORACLE_SID=+ASM1
ARCHIVE_LOG/
------------------------- 
set linesize 240

col g_n format 999 
col f_n format 9999 
col bytes format 999,999,999,999 
col space format 999,999,999,999 
col a_i format 999 
col blocks format 999,999 
col block_size format 999,999 
col ftype format a16 
col dir format a3 
col s_usereated format a10 
col full_alias_path format a60 
col striped format a6 
col rdun format a6 
col au format 99999999 
col calculated format 999,999,999,999

select x.gnum g_n, 
x.filnum f_n, 
substr(x.full_alias_path,1, 60) full_alias_path, 
f.striped, 
f.redundancy rdun, 
f.bytes, 
f.space, 
case when calculated / x.au > 60 then calculated + 3 * x.au -- Over 60 direct extents consideration 
else calculated 
end calculated 
from ( SELECT gnum,filnum,au, concat('+'||gname, sys_connect_by_path(aname, '/')) full_alias_path 
from ( SELECT g.name gname, 
g.allocation_unit_size au, 
a.parent_index pindex, 
a.name aname, 
a.reference_index rindex, 
a.group_number gnum, 
a.file_number filnum 
FROM v$asm_alias a, v$asm_diskgroup g 
WHERE a.group_number = g.group_number) 
START WITH (mod(pindex, power(2, 24))) = 0 CONNECT BY PRIOR rindex = pindex) x, 
(select f.group_number gnum, 
f.file_number filnum, 
f.type ftype , 
f.bytes, 
f.space, 
f.blocks, 
f.block_size, 
f.striped, 
f.redundancy, 
case f.striped when 'FINE' 
then ceil(((f.blocks * f.block_size + 1) / g.allocation_unit_size) / 8) 
* decode(f.redundancy, 'HIGH', 3 , 'MIRROR', 2, 'UNPROT', 1 ) * 8 * g.allocation_unit_size 
else ceil((f.blocks * f.block_size + 1) / g.allocation_unit_size) 
* decode(f.redundancy, 'HIGH', 3 , 'MIRROR', 2, 'UNPROT', 1 ) * g.allocation_unit_size 
end calculated 
from v$asm_file f , v$asm_diskgroup g 
where f.group_number = g.group_number
and g.name='ARCHIVE_LOG' 
order by f.group_number,file_number) f 
where x.filnum != 4294967295 
and x.gnum=f.gnum and x.filnum=f.filnum 
order by full_alias_path 
/

 

 

#########sample 2 RAC 搭建DG

 

 环境如下:5.6T ,每天产生1T 归档日志, 现在搭建一个DG 环境,(RAC 对RAC )

 经过测试,每个通道

--备份一个文件5M/S , 使用压缩备份,压缩率 1/2
17:06:58 beign:
Sep 10 17:54 end

--通道数量决定可以达到多少速率,如果10个通道可以达到50M/s, 20个通道可以达到100M /s.

--通道只要不影响CPU 的使用率,问题不大,当前系统CPU 达到64个,可以支持20个通道


run{
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
configure device type disk backup type to compressed backupset parallelism 6;
backup as compressed backupset full database format '/bakfs/ams/aas/full_bk_%d_%s_%p_%t';
backup current controlfile format='/bakfs/ams/aas/aas_control.ctl';
backup current controlfile for standby format '/backup/suseript/bakdb/standby.ctl';
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
}


--test speed:
----------
+db_DATA/ndb/datafile/tbs_oml_e.364.1026403801
91
size:61,440.00M

92,93,94,95,96
--一个文件


run{
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
configure device type disk backup type to compressed backupset parallelism 6;
backup as compressed backupset DATAFILE 91 format '/bakfs/ams/db/t1_bk_%d_%s_%p_%t';
backup current controlfile format='/bakfs/ams/db/control.ctl';
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
}

--备份一个文件5M/S ,压缩率 1/2
17:06:58 beign:
Sep 10 17:54 end:


27G


--跑批时间: 晚上9点到第二天8点

--10个文件,并发测试
run{
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
allocate channel c7 type disk;
allocate channel c8 type disk;
allocate channel c9 type disk;
allocate channel c10 type disk;
configure device type disk backup type to compressed backupset parallelism 10;
backup as compressed backupset DATAFILE 91,92,93,94,95,96,97,98,99,100 format '/bakfs/ams/db/t1_bk_%d_%s_%p_%t';
backup current controlfile format='/bakfs/ams/db/control.ctl';
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
release channel c7;
release channel c8;
release channel c9;
release channel c10;
}

 


############### suseript 10个通道


rman target / <<eof1
run{
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
allocate channel c7 type disk;
allocate channel c8 type disk;
allocate channel c9 type disk;
allocate channel c10 type disk;
configure device type disk backup type to compressed backupset parallelism 10;
backup as compressed backupset DATAFILE 91,92,93,94,95,96,97,98,99,100 format '/bakfs/ams/db/t1_bk_%d_%s_%p_%t';
backup current controlfile format='/bakfs/ams/db/control.ctl';
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
release channel c7;
release channel c8;
release channel c9;
release channel c10;
}
eof1

#####

 

############### suseript 20个通道 备份20个文件


rman target / <<eof1
run{
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
allocate channel c7 type disk;
allocate channel c8 type disk;
allocate channel c9 type disk;
allocate channel c10 type disk;
allocate channel c11 type disk;
allocate channel c12 type disk;
allocate channel c13 type disk;
allocate channel c14 type disk;
allocate channel c15 type disk;
allocate channel c16 type disk;
allocate channel c17 type disk;
allocate channel c18 type disk;
allocate channel c19 type disk;
allocate channel c20 type disk;
configure device type disk backup type to compressed backupset parallelism 20;
backup as compressed backupset DATAFILE 80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 format '/bakfs/ams/db/t1_bk_%d_%s_%p_%t';
backup current controlfile format='/bakfs/ams/db/control.ctl';
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
release channel c7;
release channel c8;
release channel c9;
release channel c10;
release channel c11;
release channel c12;
release channel c13;
release channel c14;
release channel c15;
release channel c16;
release channel c17;
release channel c18;
release channel c19;
release channel c20;
}
eof1


##第一件事:
#####执行步骤,周六早上8点准时做,
##10.198.227.183 密码是标准密码

su - opdb
cd /backup/suseript/bakdb


nohup sh rman.sh &


观察:
1.过10分钟,然后观察/bakfs/ams/db 是否有文件产生,正常是有文件产生的
2.过10分钟,观察top 负载,正常负载会从个位数飙升到20% 左右,


等到晚上17点30, (实际完成时间begin 8:00 到 end 13:30)
tail -f /backup/suseript/bakdb/nohup.out

正常会有提示:Recovery Manager complete.


##第2件事:

机房110.198.227.183 已经挂载目录 ,放DB 备份文件
10.199.128.151:/ifs/data/dba_bak 8589934592 554931392 8035003200 7% /bakfs
[root@pdbdb05 ~]#

麻烦在机房2机器58.2.103.76 (堡垒机登陆)上挂在目录,方便该主机读取 DB 备份文件

####定时备份归档日志文件:


10 17,5 * * * su - opdb -c /backup/suseript/bakdb/rman_arc.sh >> /backup/suseript/bakdb/rman_arc.sh.out


##归档日志,再次发起时候,会忽略已经备份的归档日志
###300G 的归档日志,发起会压缩备份大概产生 200G 的文件
### 20个通道备份,备份时间大概30分钟

cd $HOME
OSNAME=`uname`
case $OSNAME in
SunOS) OSNAME=1 ;;
HP-UX) OSNAME=2 ;;
AIX) OSNAME=3 ;;
Linux) OSNAME=4 ;;
esac
if [ $OSNAME -eq 4 ]
then
. ./.bash_profile
else
. ./.profile
fi


$ORACLE_HOME/bin/rman target / <<eof2
RUN {
# backup all archive logs
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
allocate channel c7 type disk;
allocate channel c8 type disk;
allocate channel c9 type disk;
allocate channel c10 type disk;
allocate channel c11 type disk;
allocate channel c12 type disk;
allocate channel c13 type disk;
allocate channel c14 type disk;
allocate channel c15 type disk;
allocate channel c16 type disk;
allocate channel c17 type disk;
allocate channel c18 type disk;
allocate channel c19 type disk;
allocate channel c20 type disk;
backup as compressed backupset archivelog from time 'sysdate -0.7' format '/bakfs/ams/db/t1_arc_%d_%s_%p_%t';;

#
# Note: During the process of backing up the database, RMAN also backs up the
# control file. This version of the control file does not contain the
# information about the current backup because "nocatalog" has been specified.
# To include the information about the current backup, the control file should
# be backed up as the last step of the RMAN section. This step would not be
# necessary if we were using a recovery catalog or auto control file backups.
#

backup current controlfile format='/bakfs/ams/db/control_new_%T.ctl';
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
release channel c7;
release channel c8;
release channel c9;
release channel c10;
release channel c11;
release channel c12;
release channel c13;
release channel c14;
release channel c15;
release channel c16;
release channel c17;
release channel c18;
release channel c19;
release channel c20;
}
eof2


######

定时userontab

 

 

0.primary

cd /bakfs/ams/db/pass

Cd $ORACLE_HOME/dbs


cp $ORACLE_HOME/dbs/orapw* .

RMAN>backup current controlfile for standby format '/backup/suseript/bakdb/standby.ctl';

cd /bakfs/ams/db
mv /backup/suseript/bakdb/standby.ctl .

SQL> show parameter db_uni

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_unique_name string ndb


#####################
1.standby
SQL> usereate pfile='/tmp/dba/pfile.ora' from spfile;
SQL> show parameter spfile
SQL>show parameter db_uniq_name
cat tnsnames.ora

SQL> usereate pfile='/tmp/dba/pfile.ora' from spfile;

File usereated.

SQL> show parameter spfile

NAME TYPE
------------------------------------ ----------------------
VALUE
------------------------------
spfile string
+db_DATA/ndb/spfilendb.ora
SQL> show parameter db_uniq_name
db_unique_name string
ndb


add /tmp/dba/pfile.ora
*.db_unique_name=ndb_new

 

SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
+db_DATA/ndb/datafile/system.262.1018625725
+db_DATA/ndb/datafile/sysaux.263.1018625727
+db_DATA/ndb/datafile/undotbs1.264.1018625727
+db_DATA/ndb/datafile/tbs_idm_data.410.1025025807
+db_DATA/ndb/datafile/tbs_idm_data.411.1025025809


su - grid
ASMCMD>+db_DATA/ndb/datafile
ASMCMD> rm *
You may delete multiple files and/or directories.
Are you sure? (y/n) y


SQL> startup nomount pfile='/tmp/dba/pfile.ora';


RMAN> restore standby controlfile from '/bakfs/ams/db/standby.ctl';

Starting restore at 12-SEP-20
using channel ORA_DISK_1

channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
output file name=+db_DATA/ndb/controlfile/current.256.1018625719
output file name=+db_FRA/ndb/controlfile/current.256.1018625719
Finished restore at 12-SEP-20


SQL>alter database mount standby database;


set linesize 999 linesize 999 head off feedback off

select 'set newname for datafile '||''||FILE#||''||' to '||chr(39)||replace(name,'+db_DATA/ndb/datafile','+db_DATA/ndb_new/datafile')||''||'.dbf'';' from v$datafile;

spool off

 

 

restore.sh

rman target / log='/tmp/restore_new.log' <<EOF
run{
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
allocate channel c7 type disk;
allocate channel c8 type disk;
allocate channel c9 type disk;
allocate channel c10 type disk;
allocate channel c11 type disk;
allocate channel c12 type disk;
allocate channel c13 type disk;
allocate channel c14 type disk;
allocate channel c15 type disk;
allocate channel c16 type disk;
allocate channel c17 type disk;
allocate channel c18 type disk;
allocate channel c19 type disk;
allocate channel c20 type disk;
sql 'alter session set NLS_DATE_FORMAT="YYYY-MM-DD HH24:MI:SS"';
set newname for datafile 1 to '+db_DATA/ndb_new/datafile/system.263.1026400095.dbf';
set newname for datafile 2 to '+db_DATA/ndb_new/datafile/sysaux.264.1026400099.dbf';
set newname for datafile 3 to '+db_DATA/ndb_new/datafile/undotbs1.265.1026400103.dbf';

set newname for datafile 194 to '+db_DATA/ndb_new/datafile/tbs_ocl.490.1050466801.dbf';
set newname for datafile 195 to '+db_DATA/ndb_new/datafile/tbs_oml_e.491.1050466901.dbf';
set newname for datafile 196 to '+db_DATA/ndb_new/datafile/tbs_oml_e.492.1050466905.dbf';
restore database;
switch datafile all;
switch tempfile all;
recover database;
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
release channel c7;
release channel c8;
release channel c9;
release channel c10;
release channel c11;
release channel c12;
release channel c13;
release channel c14;
release channel c15;
release channel c16;
release channel c17;
release channel c18;
release channel c19;
release channel c20;
}
EOF

nohup sh restore.sh &

20个通道:
begin:22:10
end: 3:30

###20个通道实际CPU 消耗不高,只是 很多个ASM 进程在跑而已

##白天需要做的。standby 机房2机器58.2.103.76 (堡垒机登陆)

步骤1:观察/tmp/restore_new.log,正常会有提示:Recovery Manager complete.
tail -f /tmp/restore_new.log

步骤2:
su - opdb

2.1
rman target /
RMAN>CATALOG START WITH '/bakfs/ams/db/archivelog/' noprompt;


2.2 恢复归档日志
cd /backup/suseript/bakdb
nohup sh recover_arch.sh &

2.3 观察/tmp/rdd_new_arc.log 的输出,理论上不会有报错,正常会有提示:Recovery Manager complete.


11点发起
14点完成


3.1 晚上5点再次发起
su - opdb
cd /backup/suseript/bakdb
nohup sh recover_arch.sh &

 

rman target / log='/tmp/rdd_new_arc.log' <<EOF
CATALOG START WITH '/bakfs/ams/db/archivelog/' noprompt;
run{
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
allocate channel c7 type disk;
allocate channel c8 type disk;
allocate channel c9 type disk;
allocate channel c10 type disk;
allocate channel c11 type disk;
allocate channel c12 type disk;
allocate channel c13 type disk;
allocate channel c14 type disk;
allocate channel c15 type disk;
allocate channel c16 type disk;
allocate channel c17 type disk;
allocate channel c18 type disk;
allocate channel c19 type disk;
allocate channel c20 type disk;
sql 'alter session set NLS_DATE_FORMAT="YYYY-MM-DD HH24:MI:SS"';
recover database;
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
release channel c7;
release channel c8;
release channel c9;
release channel c10;
release channel c11;
release channel c12;
release channel c13;
release channel c14;
release channel c15;
release channel c16;
release channel c17;
release channel c18;
release channel c19;
release channel c20;
}
EOF

 

####

-〉2.拷贝pri密码文件
C:\Users\useradmin\Downloads

 

2.2

-〉处理恢复的控制文件

startup mount pfile='/home/opigfs/pfile.ora'

asmcmd ls +igfs_data/ligfs/CONTROLFILE/current.256.1044730271

asmcmd ls +igfs_fra/ligfs/CONTROLFILE/current.256.1044730271


*.control_files='+igfs_data/ligfs/CONTROLFILE/current.256.1044730271','+igfs_fra/ligfs/CONTROLFILE/current.256.1044730271'

startup mount pfile='/tmp/dba/pfile.ora';
SQL> usereate spfile from pfile;
shutdown immediate
startup mount


-〉处理恢复的日至文件


asmcmd ls +db_FRA/ndb/onlinelog

asmcmd ls +db_DATA/ndb/onlinelog

group_1.257.1018625719
group_1.638.1051096005
group_10.266.1018627289
group_2.258.1018625719
group_3.259.1018625721
group_4.260.1018625723
group_5.261.1018625723
group_6.262.1018627285
group_7.263.1018627285
group_8.264.1018627287
group_9.265.1018627287

 

问题:看上去1到10号日至文件存在,好像丢失了11到24的日至,重新open 会自动创建

经过咨询,如果ASM中的主库db_uniq_name和standby的db_uniq_name 都存在于ASM 盘里。理论上redo log file 会在db open时候,创建在主库db_uniq_name,这个时候
不需要rename 也可以。文件系统不是OMF ,不存在这个问题,

如果是文件系统,理论上在mount 情况下会创建 redo(也就是系统自动clear),某种特殊情况下,无法创建,可以设置standby_file_managment 为手工,在执行clear logfile 操作

确保 +db_FRA/ndb 和 +db_FRA/ndb_new 路径,
确保 +db_data/ndb 和 +db_data/ndb_new 路径,

 


11.恢复完成之后,rename online_log(如果主库db_uniq_name目录存在asm,就不需要rename,但是open后期增加的standby redo log 会在standby的db_uniq_name,因为asm
会自动进行转换)

 

-- 11.1 生成rename logfile 并clear 脚本, (clear log 可以做,也可以不做)


--spool rename_log.sql
-set linesize 999 linesize 999 head off feedback off

--select 'alter database rename file '||''''||member||''''||' to '||chr(39)||replace(member,'+db_DATA/ndb/onlinelog','+db_DATA/ndb_new/onlinelog')||''';' from v$logfile;
-select 'alter database rename file '||''''||member||''''||' to '||chr(39)||replace(member,'+db_FRA/ndb/onlinelog','+db_FRA/ndb_new/onlinelog')||''';' from v$logfile;

---spool off

 

spool clear_log.sql

set linesize 999 linesize 999 head off feedback off

select 'alter database clear logfile group '|| GROUP#||';' from v$logfile;

spool off

 

--11.2 After renaming the redolog files, the database can be opened

--SQL>alter system set STANDBY_FILE_MANAGEMENT =MANUAL;

--SQL> select member from v$logfile;
---SQL>@rename_log.sql
---##renem 可能有错误,忽略报错
SQL> select member from v$logfile;
SQL>@clear_log.sql

##renem 可能有错误,忽略报错


asmcmd ls +db_DATA/ndb/onlinelog

asmcmd ls +db_FRA/ndb/onlinelog

asmcmd ls +db_FRA/ndb_new/onlinelog

asmcmd ls +db_DATA/ndb_new/onlinelog

 

-〉处理恢复的临时文件
-在数据库open的时候,临时文件也是与redo文件一样,会创建创建在主库db_uniq_name)
--Re-usereating tempfile +db_DATA/ndb/tempfile/tbs_idm_sts.439.1026405277
asmcmd ls +db_DATA/ndb/tempfile
asmcmd du +db_DATA/ndb/tempfile

 

---change pfile standby file managemen.
--*.standby_file_management='MANUAL'

--try to clean ,report ORA-01624: log 2 needed for userash recovery of instance igfs1 (thread 1)., go on

--startup mount pfile='/home/opigfs/pfile.ora'

--all standby file is ok .no need to add

 

-〉处理归档日至(缺少每天的17:00 ~ 23:50的归档日至)

从nbu 恢复,因为说明了nbu 备份 只开了4个通道,那么恢复时候,也只能开4个通道

 

$ORACLE_HOME/bin/rman target / <<eof
run {
allocate channel c1 type 'sbt_tape';
allocate channel c2 type 'sbt_tape';
allocate channel c3 type 'sbt_tape';
allocate channel c4 type 'sbt_tape';
send 'NB_ORA_SERV=pnbumaster,NB_ORA_CLIENT=pdbdb05';
set archivelog destination to '/bakfs/ams/db/archivelog';
sql 'alter session set NLS_DATE_FORMAT="YYYY-MM-DD HH24:MI:SS"';
restore archivelog time between '2020/9/13 17:00:29' and '2020/9/13 23:50:29';
release channel c1;
release channel c2;
release channel c3;
release channel c4;
}
eof

 

 

再次发起恢复 (recover 是顺序进行的,无法做到并行,因为速度太慢,因此想分2步骤,使用中转盘NAS,先恢复归档,在恢复控制文件)
-su - opdb
-cd /backup/suseript/bakdb
-nohup sh recover_arch.sh &

--在dg 尝试首先恢复归档日至。在作resover ,准备一个足够大的临时盘;(16:00 ~ 18:42 size 1.6T 5:30 下班后速度明显加快)

-分配20个通道 restore_arch_disk.sh
rman target / log='/tmp/rdd_new_arc_test.log' <<eof1
CATALOG START WITH '/bakfs/ams/db/archivelog/' noprompt;
run{
allocate channel c1 type disk;
allocate channel c2 type disk;
allocate channel c3 type disk;
allocate channel c4 type disk;
allocate channel c5 type disk;
allocate channel c6 type disk;
allocate channel c7 type disk;
allocate channel c8 type disk;
allocate channel c9 type disk;
allocate channel c10 type disk;
allocate channel c11 type disk;
allocate channel c12 type disk;
allocate channel c13 type disk;
allocate channel c14 type disk;
allocate channel c15 type disk;
allocate channel c16 type disk;
allocate channel c17 type disk;
allocate channel c18 type disk;
allocate channel c19 type disk;
allocate channel c20 type disk;
sql 'alter session set NLS_DATE_FORMAT="YYYY-MM-DD HH24:MI:SS"';
set archivelog destination to '/test/archivelog';
restore archivelog from time 'sysdate -2';
release channel c1;
release channel c2;
release channel c3;
release channel c4;
release channel c5;
release channel c6;
release channel c7;
release channel c8;
release channel c9;
release channel c10;
release channel c11;
release channel c12;
release channel c13;
release channel c14;
release channel c15;
release channel c16;
release channel c17;
release channel c18;
release channel c19;
release channel c20;
}
eof1


--recover_arch.sh (1.6T归档日至 18:40 ~ 22:17 完成)
20个通道
recover database;

 

---配置参数文件


###below we need deploy network

 


-〉3. 在本地创建pfile 文件(/tmp/dba/pfile.ora),使用生产的参数文件,检查以下参数是否要随着DG库实际环境调整

audit_file_dest, background_dump_dest, control_files, dd_dump_dest, log_archive_dest_1, user_dump_dest,
local_listener, log_archive_config, db_unique_name, control_files, fal_client etc,log_archive_config,remote_listener


-for standby:
log_archive_config="DG_CONFIG=(ndb,ndb_new)"
db_unique_name=ndb_new
fal_client=ndb_new
fal_server=ndb
log_archive_dest_1="LOCATION=USE_DB_RECOVERY_FILE_DEST"
log_archive_dest_2="SERVICE=ndb lgwr async noaffirm reopen=60 valid_for=(online_logfiles,primary_role) db_unique_name=ndb"
service_names=ndb

-for primary:
log_archive_config="DG_CONFIG=(ndb,ndb_new)"
log_archive_dest_1="LOCATION=USE_DB_RECOVERY_FILE_DEST"
log_archive_dest_2="SERVICE=ndb_new lgwr async noaffirm reopen=60 valid_for=(online_logfiles,primary_role) db_unique_name=ndb_new"
fal_client=ndb
fal_server=ndb_new


--in primary:
usereate pfile='/tmp/dba/pfile.bak' from spfile;
alter system set log_archive_config='DG_CONFIG=(ndb,ndb_new)' sid='*' scope = both;
alter system set log_archive_dest_2 = 'SERVICE=ndb_new lgwr async noaffirm reopen=60 valid_for=(online_logfiles,primary_role) db_unique_name=ndb_new' sid='*' scope = both;
alter system set fal_server = ndb_new sid='*' scope = both;
alter system set fal_client = ndb scope = both;


--in standby

 

usereate SPFILE='+db_DATA/ndb/spfilendb.ora' from pfile='/tmp/dba/pfile.ora';
##use asm spfile to start db spile is +<dg_name>/<db_name>/spfile<db_name>.ora
shutdown immediate
startup mount
usereate pfile='/tmp/dba/pfile.ora_0915' from spfile;
alter system set log_archive_config='DG_CONFIG=(ndb,ndb_new)' sid='*' scope = both;
alter system set log_archive_dest_2 = 'SERVICE=ndb lgwr async noaffirm reopen=60 valid_for=(online_logfiles,primary_role) db_unique_name=ndb' sid='*' scope = both;
alter system set fal_server = ndb sid='*' scope = both;
alter system set fal_client = ndb_new sid='*' scope = both;
alter system set service_names=ndb sid='*' scope = both;

-〉5.编辑新主机DG库的tnsnames.ora (在$ORACLE_HOME/network/admin 目录下)
确保ndb和ndb_new和catalog11g 都在tnsnames.ora

for standby:


for primary:

 


-〉6. 检查参数

###in standby
alter system set standby_file_management = auto sid='*' scope = both;

show parameter db_unique_name
show parameter service_names

 

->7
开启恢复

SQL> alter database recover managed standby database disconnect from session;

 

观察数据库日志alert*.log,检查恢复归档的进展:
alter session set NLS_DATE_FORMAT="YYYY-MM-DD HH24:MI:SS";
select checkpoint_time from v$datafile_header;

停下恢复

alter database recover managed standby database cancel;

 

 

->7.证明userS 能自动拉起来
/userbank/db/grid/app/11.2.0/grid/bin/usersctl stop users
/userbank/db/grid/app/11.2.0/grid/bin/usersctl start users
/userbank/db/grid/app/11.2.0/grid/bin/usersctl status res -t
/userbank/db/grid/app/11.2.0/grid/bin/usersctl check users

->8,主库建立standby logfile

alter database add standby logfile thread 1 group 25 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 26 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 27 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 28 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 29 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 30 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 31 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 32 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 33 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 34 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 35 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 36 ('+db_DATA') size 1024M;


alter database add standby logfile thread 2 group 37 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 38 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 39 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 40 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 41 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 42 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 43 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 44 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 45 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 46 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 47 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 48 ('+db_DATA') size 1024M;

 

->8,standby建立standby logfile

alter system set standby_file_management =MANUAL sid='*' scope = both;
recover managed standby database cancel;

alter database add standby logfile thread 1 group 25 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 26 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 27 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 28 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 29 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 30 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 31 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 32 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 33 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 34 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 35 ('+db_DATA') size 1024M;
alter database add standby logfile thread 1 group 36 ('+db_DATA') size 1024M;


alter database add standby logfile thread 2 group 37 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 38 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 39 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 40 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 41 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 42 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 43 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 44 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 45 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 46 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 47 ('+db_DATA') size 1024M;
alter database add standby logfile thread 2 group 48 ('+db_DATA') size 1024M;
alter system set STANDBY_FILE_MANAGEMENT=auto sid='*' scope=both;;

->11.3

SQL> alter database open;

 

 


最后调整rman 策略 for standby

CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;

 


è 开启恢复

SQL> alter database recover managed standby database disconnect from session;
or

 

è 观察数据库日志alert*.log

 

--è 停下恢复

--alter database recover managed standby database cancel;

--SQL〉@clear_log.sql

 

13. 调整参数 STANDBY_FILE_MANAGEMENT

 

SQL> alter database open

SQL〉alter system set STANDBY_FILE_MANAGEMENT =auto;

13.1 .调整所有现有环境的数据库参数 log_archive_config

 

 

--13.2. 检查ASM PFILE 和rdbms spfile 设置

-- -〉2个节点ASM 使用 pfile 配置

 

-- -〉RDBMS 使用spfile 配置 ,放在ASM盘上

--Sqlplus / as sysdba

---usereate spfile='+NEW_DATA/rdd_new/spfiledd.ora' from pfile;

---配置init 文件一行:

---spfile='+NEW_DATA/rdd_new/spfiledd.ora'

 

 


14.调整tempfile,(临时文件会自动创建,跟主库保持ASM 路径一样,所以只有asmcmd ls 查找到这些文件,也可以不用rename,)

 

--spool /tmp/dba/re-tempfile.sql

--set linesize 999 linesize 999 head off feedback off

 

--FOR BYTES IS NOT 0M

---select 'ALTER TABLESPACE '||t.name||' add tempfile '||''''||replace(f.name,'+NEW_DATA/dd','+NEW_DATA/rdd_new')||''''||' size '||f.bytes/1024/1024||'M REUSE AUTOEXTEND ON NEXT 655360 MAXSIZE 32767M;' from v$tempfile f ,(select ts#,name from v$tablespace t where included_in_database_backup='NO') t

---where f.ts#= t.ts#

 

 

--For Bytes is 0M

---select 'ALTER TABLESPACE '||t.name||' add tempfile '||''''||replace(f.name,'+NEW_DATA/dd','+NEW_DATA/rdd_new')||''''||' size '||'10'||'M REUSE AUTOEXTEND ON NEXT 655360 MAXSIZE 32767M;' from v$tempfile f ,(select ts#,name from v$tablespace t where included_in_database_backup='NO') t

---where f.ts#= t.ts#

---/

 

 

--select 'alter database tempfile '||''''||name||''' drop;' from v$tempfile;

 

----spool off

 

----@/tmp/dba/re-tempfile.sql

 

 

è 开启恢复

SQL> alter database recover managed standby database using current logfile disconnect from session;

 

 

最后调整rman 策略 for standby

CONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;

 

 

 

#########
select session_recid,
input_bytes_per_sec_display,
output_bytes_per_sec_display,
time_taken_display,
end_time
from v$rman_backup_job_details
order by end_time;

select s.client_info,
sl.opname,
sl.message,
sl.sid,
sl.serial#,
p.spid,
sl.sofar,
sl.totalwork,
round(sl.sofar / sl.totalwork * 100, 2) "% Complete"
from v$session_longops sl, v$session s, v$process p
where p.addr = s.paddr
and sl.sid = s.sid
and sl.serial# = s.serial#
and opname LIKE 'RMAN%'
and opname NOT LIKE '%aggregate%'
and totalwork != 0
and sofar <> totalwork;

--backup as compressed backupset DATAFILE 1 format '/tmp/t1_bk_%d_%s_%p_%t';

10.198.227.183
10.198.227.184

58.2.103.78/79/80

ctags_bin=`which ls`
sed -i '' 's#let Tlist_Ctags_Cmd.*#let Tlist_Ctags_Cmd = '"\'${ctags_bin}\'"'#g' 1

 

##问题1:
[grid@sdbdb01 ~]$ lsnrctl status ndb

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 15-SEP-2020 23:45:25

Copyright (c) 1991, 2013, Oracle. All rights reserved.

Connecting to (DESuserIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=Ndb)))
STATUS of the LISTENER
------------------------
Alias Ndb
Version TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date 29-DEC-2019 10:24:54
Uptime 261 days 13 hr. 20 min. 30 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /userbank/db/grid/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /userbank/db/grid/app/grid/diag/tnslsnr/sdbdb01/ndb/alert/log.xml
Listening Endpoints Summary...
(DESuserIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=Ndb)))
(DESuserIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=58.2.103.78)(PORT=1528)))
(DESuserIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=58.2.103.76)(PORT=1528)))
Services Summary...
Service "+ASM" has 2 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Instance "+ASM2", status READY, has 1 handler(s) for this service...
Service "Ndb" has 1 instance(s).
Instance "ndb1", status READY, has 1 handler(s) for this service...
Service "ndb_new" has 1 instance(s).
Instance "ndb1", status READY, has 1 handler(s) for this service...
The command completed successfully

 

[opdb@sdbdb01 ~]$ sqlplus dbmgr/'Byscyrj8!ABD'@ndb_new

SQL*Plus: Release 11.2.0.4.0 Production on Tue Sep 15 23:47:59 2020

Copyright (c) 1982, 2013, Oracle. All rights reserved.

ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
desuseriptor


SQL> show parameter remote_listener

NAME TYPE
------------------------------------ ----------------------
VALUE
------------------------------
remote_listener string
(DESuserIPTION=(ADDRESS=(PROTOCO
L=tcp)(HOST=58.2.103.79)(PORT=


fix: should use scan-ip not vip 70. because tnsnames us scan-ip 80

--alter system set remote_listener='(DESuserIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=58.2.103.80)(PORT=1528)))' sid='*' scope = both;

注意:remote-listener 是注册在本地的,不能使用sid='*' 注册,因为使用这个选项导致spfile 会出现3个条目,只能使用sid='1' 或者sid='2' 本地注册

alter system set remote_listener='(DESuserIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=58.2.103.80)(PORT=1528)))' sid='ndb2' scope = both;
alter system set remote_listener='(DESuserIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=58.2.103.80)(PORT=1528)))' sid='ndb1' scope = both;

###问题2


--node2 (gi can't restart)

/userbank/db/grid/app/11.2.0/grid/log/sdbdb02/cssd/ocssd.log


10.950: [ CSSD][1327982336]clssnmvDHBValidateNcopy: node 1, sdbdb01, has a disk HB, but no network HB, DHB has rcfg 463248178, wrtcnt, 162142067, LATS 13575974, lastSeqNo 162142062, uniqueness 1577586213, timestamp 1600237270/1175974344
2020-09-16 14:21:10.950: [ CSSD][1332713216]clssnmvDHBValidateNcopy: node 1, sdbdb01, has a disk HB, but no network HB, DHB has rcfg 463248178, wrtcnt, 162142068, LATS 13575974, lastSeqNo 162142063, uniqueness 1577586213, timestamp 1600237270/1175974354
2020-09-16 14:21:10.953: [ CSSD][1323251456]clssnmvDHBValidateNcopy: node 1, sdbdb01, has a disk HB, but no network HB, DHB has rcfg 463248178, wrtcnt, 162142069, LATS 13575974, lastSeqNo 162142064, uniqueness 1577586213, timestamp 1600237270/1175974654

有磁盘心跳,没有网卡心跳


[cssd(56121)]userS-1605:CSSD voting file is online: /dev/mapper/ouser_db_lun03; details in /userbank/db/grid/app/11.2.0/grid/log/sdbdb02/cssd/ocssd.log.
2020-09-16 11:11:05.062:
[/userbank/db/grid/app/11.2.0/grid/bin/cssdagent(56110)]userS-5818:Aborted command 'start' for resource 'ora.cssd'. Details at (:userSAGF00113:) {0:0:2} in /userbank/db/grid/app/11.2.0/grid/log/sdbdb02/agent/ohasd/oracssdagent_root//oracssdagent_root.log.
2020-09-16 11:11:05.062:
[cssd(56121)]userS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /userbank/db/grid/app/11.2.0/grid/log/sdbdb02/cssd/ocssd.log
2020-09-16 11:11:05.062:

 

cat /userbank/db/grid/app/11.2.0/grid/log/sdbdb02/agent/ohasd/oracssdagent_root//oracssdagent_root.log


2020-09-16 11:11:05.062: [ CSSD][439494400](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally


ocssd.log

2020-09-16 10:40:59.616: [ CLSF][3357497088]checksum failed for disk:/dev/mapper/data_db_lun20:
2020-09-16 10:40:59.616: [ CLSF][3357497088]Error: obj 2147483648 blk 0 name 'check_kfbh' num1 3873730824 num2 2505139254

 

-node 1()
cluster', inf 'mcast://224.0.0.251:42424/190.2.103.76:26739'
2020-09-16 14:47:35.022: [GIPCHTHR][3647645440] gipchaWorkerusereateInterface: usereated remote bootstrap multicast interface for node 'sdbdb02', haName 'CSS_sdbdb-cluster', inf 'mcast://230.0.1.0:42424/190.2.103.76:26739'
2020-09-16 14:47:35.022: [GIPCHTHR][3647645440] gipchaWorkerusereateInterface: usereated remote bootstrap broadcast interface for node 'sdbdb02', haName 'CSS_sdbdb-cluster', inf 'udp://190.2.103.255:42424'
2020-09-16 14:47:35.022: [GIPCHALO][3647645440] gipchaLowerProcessNode: no valid interfaces found to node for 1177558774 ms, node 0x7f0dc01f3360 { host 'sdbdb02', haName 'CSS_sdbdb-cluster', srcLuid db1ed9b6-5f8af9b4, dstLuid 43edd30b-6f5200bb numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], usereateTime 1177557774, sentRegister 1, localMonitor 1, flags 0x4 }

 

solution:
1.try to start users

/userbank/db/grid/app/11.2.0/grid/bin/usersctl start res ora.usersd -init


userS-2674: Start of 'ora.cssd' on 'sdbdb02' failed


2. oifcfg getif


em3 190.2.103.0 global cluster_interconnect
bond0 58.2.103.0 global public

3.
cd /userbank/db/grid/app/11.2.0/grid/log/sdbdb02/cssd
grep gipc ocssd.log|more
gipcDestroy

4.
traceroute 190.2.103.76
traceroute 190.2.103.77
/userbank/db/grid/app/11.2.0/grid/bin/ousercheck


processes = 170
sessions = 320
sga_max_size = 8G
shared_pool_size = 5G
large_pool_size = 1G
instance_type = "asm"
cluster_interconnects = "190.2.103.76"
db_cache_size = 1G
remote_login_passwordfile= "EXCLUSIVE"
local_listener = "(DESuserIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=58.2.103.78)(PORT=1528))))"
remote_listener = "(DESuserIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=58.2.103.79)(PORT=1528))))"
asm_diskstring = "/dev/mapper"
asm_diskgroups = "db_FRA"
asm_power_limit = 1
_asm_hbeatiowait = 120
diagnostic_dest = "/userbank/db/grid/app/grid"

 


alter diskgroup db_Ouser mount;

 

5.red-hat 7 RAC node2 主机重启动,GI 无法使用

cat /var/log/messages|grep avahi-daemon

该damon avahi-daemon 作用是检测网路中其他设备,比如打印机,共享文件,这个服务不用到,可以关闭

FIX:
[root@sdbdb02 sysconfig]# chkconfig avahi-daemon off
Note: Forwarding request to 'systemctl disable avahi-daemon.service'.
Removed symlink /etc/systemd/system/multi-user.target.wants/avahi-daemon.service.
Removed symlink /etc/systemd/system/sockets.target.wants/avahi-daemon.socket.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.Avahi.service.


2.重新启动主机
shutdown -r now


###issue 3

###重新启动主机后发现grid listener 无法使用

###但是主库是由一个脚本,temp for dg usereate
18 5,17,23 * * * su - opdb -c /backup/suseript/bakdb/rman_arc.sh >> /backup/suseript/bakdb/rman_arc.sh.out

###监控实效原因是 重新启动mrp ,如果备用库没有检索到主库的日至,就会显示是空置

SQL> select substr(value,2,2)*24*3600+substr(value,5,2)*3600+substr(value,8,2)*60+substr(value,11,2) from v$dataguard_stats where name='apply lag';

SUBSTR(VALUE,2,2)*24*3600+SUBSTR(VALUE,5,2)*3600+SUBSTR(VALUE,8,2)*60+SUBSTR(VAL
--------------------------------------------------------------------------------


SQL> /

 

 

 

#######

posted @ 2019-03-01 14:22  feiyun8616  阅读(545)  评论(0编辑  收藏  举报