[转]Oracle数据文件损坏的模拟和修复(一) |ORA-01578 data block corrupted|
造成数据块损坏的原因通常是由于开启了异步I/O或者增加了写进程,还有可能是硬件引起的,今天模拟一下该问题的发生及修复方法。由于水平有限,那面疏漏,欢迎大家指正。
创建测试环境
建立测试表空间:
|
1
2
3
4
5
6
|
create tablespace testdatafile '/u02/oradata/logdw/test01.dbf' size 180K autoextend offloggingsegment space management autoextent management local; |
创建测试用户并设置相应的权限:
|
1
2
3
4
5
6
7
|
create user testidentified by testdefault tablespace test;grant connect to test;grant resource to test;revoke unlimited tablespace from test;alter user test quota unlimited on test; |
插入测试数据,直到填满 test01.dbf:
|
1
2
3
4
5
6
7
8
|
declare i number;begin for i in 1..99999 loop insert into test_table values(i,lpad('Q',30)); commit; end loop;end; |
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
SQL> declare 2 i number; 3 begin 4 for i in 1..99999 loop 5 insert into test_table values(i,lpad('Q',30)); 6 commit; 7 end loop; 8 end; 9 /declare*ERROR at line 1:ORA-01653: unable to extend table TEST.TEST_TABLE by 8 in tablespace TESTORA-06512: at line 5 |
test01.dbf 已经填满了数据后,看看一共插入了多少条数据:
|
1
2
3
4
5
|
select count(*) from test_table; COUNT(*)---------- 2356 |
如果我想再加一个索引是不会成功的,我为该表空间添加多一个数据文件来扩容。
|
1
|
alter tablespace test add datafile '/u02/oradata/logdw/test02.dbf' size 180K autoextend off; |
扩容后的 test 表空间可以创建索引了:
|
1
|
create index idx_test_table on test_table(id); |
通过这样一些步骤,现在数据的分布情况是test01.dbf这个数据文件装着数据,test02.dbf装着索引。可以使用查询一下dba_extents 确认一下:
|
1
2
3
|
select owner,segment_name,segment_type,extent_id,file_id,block_id,bytes from dba_extents where owner='TEST'; |
|
1
2
3
4
5
6
|
OWNER SEGMENT_NAME SEGMENT_TYPE EXTENT_ID FILE_ID BLOCK_ID BYTES----- --------------- ------------------ ---------- ---------- ---------- ----------TEST TEST_TABLE TABLE 0 15 8 65536TEST TEST_TABLE TABLE 1 15 16 65536TEST IDX_TEST_TABLE INDEX 0 16 8 65536TEST IDX_TEST_TABLE INDEX 1 16 16 65536 |
一条使用索引的典型SQL:
|
1
2
3
4
5
6
7
8
9
10
11
12
13
|
SQL> select count(id) from test_table; COUNT(ID)---------- 2356----------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |----------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)| 00:00:01 || 1 | SORT AGGREGATE | | 1 | 13 | | || 2 | INDEX FAST FULL SCAN| IDX_TEST_TABLE | 2356 | 30628 | 4 (0)| 00:00:01 |---------------------------------------------------------------------------------------- |
自此,我们的测试环境就建立完成了。
破坏并修复数据文件
在Linux修改二进制文件的方法可以参考我这篇blog《在 Linux下使用vim配合xxd查看并编辑二进制文件》,这里就不熬述了。不同位置的数据块遭到损坏就会报不同的错误,非常幸运,我一个下午就遇到了3个。
使用drop和create修复ORA-01578
关闭数据库后打开test02.dbf ,定位到偏移地址 001e9c0 左右的地方修改一下,要注意偏移的位置,不同的位置可能会引发不同的问题,保存并退出。
使用dbv工具检查一下数据文件:
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
|
[oracle@logserver logdw]$ dbv file=test02.dbf blocksize=8192DBVERIFY: Release 11.2.0.1.0 - Production on Wed Jun 23 09:16:35 2010Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.DBVERIFY - Verification starting : FILE = /u02/oradata/logdw/test02.dbfPage 15 is marked corruptCorrupt block relative dba: 0x0400000f (file 16, block 15)Bad check value found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x0400000f last change scn: 0x0000.0a71e746 seq: 0x1 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xe7460601 check value in block header: 0x46a7 computed block checksum: 0x12d1Page 16 is marked corruptCorrupt block relative dba: 0x04000010 (file 16, block 16)Bad check value found during dbv: Data in bad block: type: 6 format: 2 rdba: 0x04000010 last change scn: 0x0000.0a71e746 seq: 0x2 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0xe7460602 check value in block header: 0x99f0 computed block checksum: 0xadcaDBVERIFY - Verification completeTotal Pages Examined : 23Total Pages Processed (Data) : 0Total Pages Failing (Data) : 0Total Pages Processed (Index): 4Total Pages Failing (Index): 0Total Pages Processed (Other): 10Total Pages Processed (Seg) : 0Total Pages Failing (Seg) : 0Total Pages Empty : 7Total Pages Marked Corrupt : 2Total Pages Influx : 0Total Pages Encrypted : 0Highest block SCN : 175236934 (0.175236934) |
此时,数据库可以正常启动,但是当接触到坏块的时候就会报ORA-01578:
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
|
SQL> startup ; ORACLE instance started.Total System Global Area 1887350784 bytesFixed Size 2214456 bytesVariable Size 1560282568 bytesDatabase Buffers 301989888 bytesRedo Buffers 22863872 bytesDatabase mounted.Database opened.SQL> connect test/testSQL> show user;USER is "TEST"SQL> set autotrace on ; SQL> select count(id) from test_table; select count(id) from test_table *ERROR at line 1:ORA-01578: ORACLE data block corrupted (file # 16, block # 15)ORA-01110: data file 16: '/u02/oradata/logdw/test02.dbf'SQL> select count(*) from test_table; COUNT(*)---------- 2356-------------------------------------------------------------------------| Id | Operation | Name | Rows | Cost (%CPU)| Time |-------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 || 1 | SORT AGGREGATE | | 1 | | || 2 | TABLE ACCESS FULL| TEST_TABLE | 2356 | 5 (0)| 00:00:01 |------------------------------------------------------------------------- |
如果查询不用到索引,就不会有问题,证明我们的数据没有问题,仅是损失了索引。由于场景特殊,我们知道肯定是索引损坏了,数据没有丢失的,通常的场景是我们知道损坏所在的文件编号和块编号,但是不知道对应的是什么对象,可以用以下查询定位到对应的对象上:
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
sys$logdw@logserver SQL> SELECT tablespace_name, owner,segment_name,segment_type 2 FROM dba_extents 3 WHERE file_id = &file 4 and &block between block_id AND block_id + blocks - 1 ;Enter value for file: 16old 3: WHERE file_id = &filenew 3: WHERE file_id = 16Enter value for block: 15old 4: and &block between block_id AND block_id + blocks - 1new 4: and 15 between block_id AND block_id + blocks - 1TABLESPACE OWNER SEGMENT_NAME SEGMENT_TYPE---------- ---------- -------------------- ------------------TEST TEST IDX_TEST_TABLE INDEX |
由于数据没有丢失,我尝试重建索引。
|
1
2
3
4
5
6
|
SQL> alter index IDX_TEST_TABLE rebuild online ; alter index IDX_TEST_TABLE rebuild online*ERROR at line 1:ORA-00604: error occurred at recursive SQL level 1ORA-01658: unable to create INITIAL extent for segment in tablespace TEST |
空间不足……,于是drop后再create吧:
|
1
2
3
4
5
6
7
8
9
10
|
SQL> drop index IDX_TEST_TABLE ; Index dropped.Elapsed: 00:00:00.57SQL> create index IDX_TEST_TABLE on test_table(id);Index created.Elapsed: 00:00:00.10 |
我们再看看使用索引的查询:
|
1
2
3
4
5
6
7
8
9
10
11
12
13
|
SQL> select count(id) from test_table ; COUNT(ID)---------- 2356----------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |----------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 4 | 3 (0)| 00:00:01 || 1 | SORT AGGREGATE | | 1 | 4 | | || 2 | INDEX FAST FULL SCAN| IDX_TEST_TABLE | 2356 | 9424 | 3 (0)| 00:00:01 |---------------------------------------------------------------------------------------- |
索引恢复正常了。
该方法适用于坏块落在非数据区的情况。
使用RMAN修复ORA-01578
如果损坏了仅仅两个数据块(8K*2),导致一个几百兆的索引重建,这样的很不值,我们可以试一下Oracle的blockrecover,但前提是之前有备份。
|
1
2
3
4
5
|
RMAN> blockrecover datafile 16 block 15; RMAN> blockrecover datafile 16 block 16; -- 或者组合起来RMAN> blockrecover datafile 16 block 15 datafile 16 block 16; RMAN> blockrecover datafile 16 block 15,16; |
如果坏块很多,例如我们就要写很长很长的命令,此时可以试一下:
|
1
|
RMAN> blockrecover corruption list; |
这个命令是依靠 v$database_block_corruption 这个表进行工作的。Oracle每当发现一个损坏的数据块就会在这个表中多一条记录,相反地如果没有碰到,就自然不会在这表中显示出来的。
|
1
2
3
4
5
|
sys$logdw@logserver SQL> select * from v$database_block_corruption; FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO---------- ---------- ---------- ------------------ --------- 16 15 2 0 CHECKSUM |
我们可以借助rman对数据库、文件或表空间进行“扫描”,检查出所有的坏块。
|
1
2
3
|
RMAN> backup validate database;RMAN> backup validate tablespace test;RMAN> backup validate datafile 16; |
如果数据库比较大或者表空间也比较大,backup validate就会长时间占用大量I/O资源,因此范围应该尽量小。
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
RMAN> backup validate datafile 16;Starting backup at 2010-06-23 10:42:23using target database control file instead of recovery catalogallocated channel: ORA_DISK_1channel ORA_DISK_1: SID=71 device type=DISKchannel ORA_DISK_1: starting full datafile backup setchannel ORA_DISK_1: specifying datafile(s) in backup setinput datafile file number=00016 name=/u02/oradata/logdw/test02.dbfchannel ORA_DISK_1: backup set complete, elapsed time: 00:00:01List of Datafiles=================File Status Marked Corrupt Empty Blocks Blocks Examined High SCN---- ------ -------------- ------------ --------------- ----------16 FAILED 0 7 23 175356605 File Name: /u02/oradata/logdw/test02.dbf Block Type Blocks Failing Blocks Processed ---------- -------------- ---------------- Data 0 0 Index 2 6 Other 0 10 validate found one or more corrupt blocksSee trace file /u01/app/oracle/diag/rdbms/logdw/logdw/trace/logdw_ora_30130.trc for detailsFinished backup at 2010-06-23 10:42:26 |
只要让Oracle直到了哪些是坏块就好办了,是blockrecover即可修复坏块:
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
RMAN> blockrecover corruption list; Starting recover at 2010-06-23 11:00:49using target database control file instead of recovery catalogallocated channel: ORA_DISK_1channel ORA_DISK_1: SID=136 device type=DISKsearching flashback logs for block imagesfinished flashback log search, restored 2 blocksstarting media recoverymedia recovery complete, elapsed time: 00:00:01Finished recover at 2010-06-23 11:10:48 |
再使用索引查询一下数据:
|
1
2
3
4
5
6
7
8
9
10
11
12
13
|
test$logdw@logdw SQL> select count(id) from test_table ; COUNT(ID)---------- 2356----------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |----------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 4 | 3 (0)| 00:00:01 || 1 | SORT AGGREGATE | | 1 | 4 | | || 2 | INDEX FAST FULL SCAN| IDX_TEST_TABLE | 2356 | 9424 | 3 (0)| 00:00:01 |---------------------------------------------------------------------------------------- |
此时索引idx_test_table可以用了,一切都是在线做的。
该方法需要有rman的备份,如果日常有rman的备份就不用怕了。
使用exp和imp尽力挽救数据
如果没有rman备份呢?以下主要参考盖国强的《Oracle中模拟及修复数据块损坏》
我们再用同样的方法编辑test01.dbf,定位到最后,将数据破坏。
可以预见test_table的数据已经损坏,索引没有损坏。
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
test$logdw@logdw SQL> select count(*) from test_table ; select count(*) from test_table *ERROR at line 1:ORA-01578: ORACLE data block corrupted (file # 15, block # 23)ORA-01110: data file 15: '/u02/oradata/logdw/test01.dbf'Elapsed: 00:00:01.93test$logdw@logdw SQL> select count(id) from test_table ; COUNT(ID)---------- 2356----------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |----------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 4 | 3 (0)| 00:00:01 || 1 | SORT AGGREGATE | | 1 | 4 | | || 2 | INDEX FAST FULL SCAN| IDX_TEST_TABLE | 2356 | 9424 | 3 (0)| 00:00:01 |---------------------------------------------------------------------------------------- |
由于没有已经假设没有rman的备份,所以blockrecover已经不可能了,此时可以用vim+xxd重新对test01.dbf进行修改或者借助内部工具 bbed 将坏块修正也是可以的,据说这是Oracle 8i之前的做法,但几乎是mission impossible。
如果我们将任务定位为可以容忍数据丢失,希望能尽量挽救,减少数据丢失。
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
[oracle@logserver tmp]$ exp test/test file=test_table.dmp tables=test_table Export: Release 11.2.0.1.0 - Production on Wed Jun 23 14:08:29 2010Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing optionsExport done in ZHS16GBK character set and AL16UTF16 NCHAR character setAbout to export specified tables via Conventional Path .... . exporting table TEST_TABLEEXP-00056: ORACLE error 1578 encounteredORA-01578: ORACLE data block corrupted (file # 15, block # 23)ORA-01110: data file 15: '/u02/oradata/logdw/test01.dbf'Export terminated successfully with warnings. |
我们使用导出得到一个错误
通过设置内部事件,设置在全表扫描时跳过损坏的数据块
|
1
2
|
ALTER SYSTEM SET EVENTS='10231 trace name context forever,level 10' ;ALTER SYSTEM SET EVENTS='10231 trace name context off' ; |
exp可以成功执行,将损坏的数据表删除掉,在imp即可挽救部分数据。
在当前的场景中,没有设置内部事件的情况下挽救了 2356 条数据中的 1920 条;而设置了内部事件之后挽救了 2175 条数据。
该方法几乎成了最后的救命稻草了,如果你足够强悍可以试一下用bbed或者其他工具直接对数据文件进行二进制编辑从而手动修复数据文件。
小结
遇到ORA-01578后,第一步应首先确定rman备份情况,如果有可用备份,一切都不太糟糕,几条rman命令就可以修复;
如果没有再确定坏块对应的是什么对象,如果是索引等非数据对象,可以考虑drop后再create一次;
如果坏块落在数据上,设置10231内部事件,然后求神拜佛,能救回多少数据只能看人品了。
引用
https://www.cnblogs.com/killkill/archive/2010/06/24/1764067.html
微信赞赏
支付宝赞赏

浙公网安备 33010602011771号