oracle block

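Basically, each object corresponds to one segment (Segment); only partitioned objects correspond to multiple segments, one per partition. "Objects" here include tables, indexes, partitions and so on. A segment can span multiple data files.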

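Each segment in turn consists of one or more extents. An extent cannot span data files, and new extents are allocated automatically as the segment grows during use.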

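Finally there is the block: all data is stored in blocks. The block size is fixed when the database is created and is usually 8K. Oracle 9i and later also support tablespaces with non-default block sizes, which will come up in a later experiment. Back to the 8K block: to give Oracle reasonable throughput when it reads and writes data files, the Oracle block size is normally set to an integer multiple of the operating system block size. For example, on an NTFS-formatted disk each physical block (cluster) is 4K. With an 8K Oracle block, every Oracle block read physically reads two operating system blocks. This mainly applies to data files stored on a file system. In real production environments, many databases are installed on raw devices (RAW, also called raw partitions), which will be covered later.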
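As a quick check of the arithmetic above, here is a minimal Python sketch. The function name is hypothetical, sizes are in bytes, and it assumes aligned I/O, so that one Oracle block read maps onto whole OS blocks:

```python
def os_blocks_per_oracle_block(db_block_size: int, os_block_size: int) -> int:
    """Number of OS blocks touched by one Oracle block read.

    Assumes aligned I/O. Raises ValueError when the Oracle block size is not
    an integer multiple of the OS block size, the case the text advises against.
    """
    if db_block_size % os_block_size != 0:
        raise ValueError(
            f"{db_block_size} is not a multiple of {os_block_size}"
        )
    return db_block_size // os_block_size


# The example from the text: an 8K Oracle block on a 4K NTFS cluster.
print(os_blocks_per_oracle_block(8 * 1024, 4 * 1024))  # prints 2
```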
------------------------------------------------------------------------------------------------------------
The passage above gives a basic picture of Oracle's storage structure. The next post runs experiments on block size versus the amount of data stored.

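The previous section introduced Oracle's storage structure. This section looks at the relationship between block size and the data stored in it.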

In Oracle, all object data is stored in blocks. So how does the block size relate to the records it holds?

Let's run an experiment:

Create a tablespace named test

create tablespace test datafile '/oracle/oradata/test.dbf' size 100m;

Create a user

create user test identified by test default tablespace test quota unlimited on test;

Create a table

create table test.t (a1 number,a2 varchar2(100));

Check the segments. The view shows a segment named T of type TABLE, with 1 extent allocated, containing 8 blocks and 64K (65536) bytes.

select segment_name,blocks,extents,bytes,segment_type,tablespace_name from dba_segments where owner='TEST';

SEGMENT_NAME     BLOCKS    EXTENTS      BYTES SEGMENT_TYPE       TABLESPACE_NAME
------------ ---------- ---------- ---------- ------------------ ---------------
T                     8          1      65536 TABLE              TEST

---------------------------------------------------------------------------------------------------
Check the extents. The view shows a single extent, extent number 0, containing 8 blocks and 64K bytes.

select segment_name,segment_type,extent_id,blocks,bytes from dba_extents where owner='TEST';

SEGMENT_NAME SEGMENT_TYPE        EXTENT_ID     BLOCKS      BYTES
------------ ------------------ ---------- ---------- ----------
T            TABLE                       0          8      65536
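The BYTES column in both views is just the block count multiplied by the block size. A sketch, assuming the 8K (8192-byte) block size of this database; the function name is hypothetical:

```python
DB_BLOCK_SIZE = 8192  # assumption: the 8K block size used in this database


def extent_bytes(blocks: int, block_size: int = DB_BLOCK_SIZE) -> int:
    """Size in bytes of an extent made of `blocks` blocks."""
    return blocks * block_size


# Extent 0 of segment T has 8 blocks.
print(extent_bytes(8))          # prints 65536
print(extent_bytes(8) // 1024)  # prints 64, i.e. 64K
```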

------------------------------------
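Check the blocks. v$bh returns nothing for this tablespace, so before any data is written, no blocks holding the table's data have been loaded into memory.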

select file#,block#,class#,status,xnc,objd from v$bh where ts#=12;

no rows selected

Insert 10 rows as a test.

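SQL> begin
  -- the FOR loop declares its index i implicitly; no separate declaration is needed
  for i in 1..10 loop
    execute immediate 'insert into test.t values (:x,:y)' using i, i;
  end loop;
  -- no commit: the transaction is deliberately left open
end;
/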

PL/SQL procedure successfully completed.

Query v$bh again to see whether any blocks are now in use in memory.

select file#,block#,class#,status,xnc,objd from v$bh where ts#=12;

     FILE#     BLOCK#     CLASS# STATU        XNC       OBJD
---------- ---------- ---------- ----- ---------- ----------
         1      28089          4 xcur           0      11038
         1      28090          1 xcur           0      11038

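Sure enough, rows appear. When data is inserted into the table, the allocated blocks are loaded into memory and the data is written into them. Two blocks are in use here, numbers 28089 and 28090, and their CLASS# values show that they are of different types: CLASS# 4 is the segment header and CLASS# 1 is a data block.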
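To make the CLASS# column readable, a lookup like the following Python sketch can be used. It lists only the two classes seen in this experiment (1 = data block, 4 = segment header). The full numbering is covered by Note 33434.1, which the X$BH listing later in this post refers to, and any other value is simply reported as unknown. The names are hypothetical:

```python
# Only the two classes observed in this experiment; other classes exist.
BLOCK_CLASS = {
    1: "data block",
    4: "segment header",
}


def describe_class(class_no: int) -> str:
    """Human-readable name for a v$bh CLASS# value (partial map)."""
    return BLOCK_CLASS.get(class_no, f"unknown class {class_no}")


# The two buffers seen in v$bh after the insert: (block#, class#).
for block_no, class_no in [(28089, 4), (28090, 1)]:
    print(block_no, describe_class(class_no))
# prints:
# 28089 segment header
# 28090 data block
```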

This section follows on directly from the previous one.

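The experiment in the previous section showed how blocks are created and read, but only from a single session. Now let's see how the blocks are affected when one session inserts data and, at the same time, another session queries that data.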

Open a new session and run the following:

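Query the table. Because the inserting transaction has not been committed, the other session sees no data at all, a clear demonstration of Oracle's multi-version read consistency.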

select * from test.t;

no rows selected

Query v$bh to see whether anything has changed

select file#,block#,class#,status,xnc,objd from v$bh where ts#=12;

     FILE#     BLOCK#     CLASS# STATU        XNC       OBJD
---------- ---------- ---------- ----- ---------- ----------
         1      28089          4 xcur           0      11038
         1      28090          1 cr             0      11038
         1      28090          1 cr             0      11038
         1      28090          1 xcur           0      11038

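Sure enough, the result differs from the previous section. There are two extra rows, both with STATUS cr. What is cr? It stands for Consistent Read. This shows that block 28090 has been accessed by both sessions. Besides the current (xcur) buffer holding session 1's uncommitted change, there are now two consistent-read (cr) copies of the block, created when session 2 read it.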
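The observation that one block number can have several buffers in the cache is easy to check by grouping the v$bh rows by block. In SQL this is a GROUP BY block#, status over v$bh. The Python sketch below does the same over the four rows shown above, typed in by hand rather than fetched from a database; the function name is hypothetical:

```python
from collections import Counter, defaultdict

# Rows copied from the v$bh output above: (file#, block#, class#, status).
rows = [
    (1, 28089, 4, "xcur"),
    (1, 28090, 1, "cr"),
    (1, 28090, 1, "cr"),
    (1, 28090, 1, "xcur"),
]


def buffers_per_block(bh_rows):
    """Group buffer headers by (file#, block#) and count each status."""
    result = defaultdict(Counter)
    for file_no, block_no, _class_no, status in bh_rows:
        result[(file_no, block_no)][status] += 1
    return result


for (file_no, block_no), statuses in sorted(buffers_per_block(rows).items()):
    print(file_no, block_no, dict(statuses))
# prints:
# 1 28089 {'xcur': 1}
# 1 28090 {'cr': 2, 'xcur': 1}
```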

What happens if the transaction in the first session is rolled back? See below:

Session 1: run rollback

SQL> rollback;

Rollback complete.

Query v$bh again and see what happened

select file#,block#,class#,status,xnc,objd from v$bh where objd=11038;

     FILE#     BLOCK#     CLASS# STATU        XNC       OBJD
---------- ---------- ---------- ----- ---------- ----------
         1      28089          4 xcur           0      11038
         1      28090          1 cr             0      11038
         1      28090          1 cr             0      11038
         1      28090          1 xcur           0      11038

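The result is unchanged. The rollback undoes session 1's changes in the current (xcur) buffer of block 28090, but the two cr copies built earlier are not discarded. They simply stay in the buffer cache (presumably until they are aged out).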

++++++++++++++++++++++++
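By the way, a question: table t has only two columns, so why do 10 rows take up two blocks?
The two blocks are different. One (CLASS# 4, block 28089) is the segment header, which holds the segment's bookkeeping such as the extent map, free list information and the high water mark. The other (CLASS# 1, block 28090) is the data block that holds the rows; the table directory and row directory live in that data block's own header. The 10 rows themselves easily fit into the one data block.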
++++++++++++++++++++++++++
Let's continue with the previous section's topic.

Shut down the database instance

SQL> shutdown immediate

Database closed.
Database dismounted.

Restart the database

SQL> startup

ORACLE instance started.

Total System Global Area  253214492 bytes
Fixed Size                454428 bytes
Variable Size          117440512 bytes
Database Buffers       134217728 bytes
Redo Buffers             1101824 bytes
Database mounted.
Database opened.

Check v$bh

select file#,block#,class#,status,xnc,objd from v$bh where objd=11038;

no rows selected

This shows that until some operation touches the data in a block, the block is not read from the data file into memory.

Now run a SQL statement that touches the table (a query, insert or update). Here, an insert:

SQL> insert into test.t values (200,200);

1 row created.

Check v$bh again

SQL> select file#,block#,class#,status,xnc,objd from v$bh where objd=11038;

     FILE#     BLOCK#     CLASS# STATU        XNC       OBJD
---------- ---------- ---------- ----- ---------- ----------
         1      28089          4 xcur           0      11038
         1      28090          1 xcur           0      11038

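Summary: until a block has been read by physical I/O, v$bh shows no information about it. This also confirms that the view describes the buffers in memory that hold copies of data file blocks.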
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CR stands for consistent read. Why is it needed? It matters mainly in highly concurrent environments.

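Oracle has a feature called multi-version read consistency. Suppose transaction (1) is updating data while transaction (2) wants to query the same data. What happens?
With this feature, transaction 2 still gets correct results. While transaction 1 is updating, it holds locks on the rows it changes, but those locks only block other writers; in Oracle, readers are not blocked. When transaction 2 reads a block that contains changes it must not see (changes that are uncommitted, or that were committed after its query started), Oracle makes a copy of the block. It then applies undo from the undo (rollback) segment to that copy, reconstructing the data as it was when the query started. That copy is the cr buffer seen in v$bh, and it contains the correct, not-yet-updated data.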

I hope that explanation makes sense.
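The idea can be shown with a toy model in Python. This is a simplification, not Oracle's actual undo format: it works per row rather than per block and ignores ITLs and block cleanout, and all class and field names are hypothetical. Each row keeps its current value plus a chain of undo records, newest first. Each record holds the before image and the commit SCN of the change it undoes (None while uncommitted). A reader with a snapshot SCN walks back through the chain until it reaches a change it is allowed to see:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UndoRecord:
    scn: Optional[int]   # commit SCN of the change; None = not committed yet
    before_image: str    # value the row had before this change


@dataclass
class Row:
    current: str  # value in the current (xcur) buffer
    undo_chain: List[UndoRecord] = field(default_factory=list)  # newest first


def consistent_read(row: Row, snapshot_scn: int) -> str:
    """Undo, newest first, every change the reader must not see:
    uncommitted changes and changes committed after snapshot_scn."""
    value = row.current
    for undo in row.undo_chain:
        if undo.scn is not None and undo.scn <= snapshot_scn:
            break                  # this change and all older ones are visible
        value = undo.before_image  # apply undo: step back one version
    return value


# 'A' -> 'B' was committed at SCN 100; 'B' -> 'C' is still uncommitted.
row = Row(current="C", undo_chain=[UndoRecord(None, "B"), UndoRecord(100, "A")])
print(consistent_read(row, snapshot_scn=150))  # prints B
print(consistent_read(row, snapshot_scn=50))   # prints A
```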
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Section 5
In this section we dump the block to a trace file to look in detail at the block used in the previous sections.

To dump the block, use this command:

alter system dump datafile 1 block 28090;

System altered.

Find the trace file in the corresponding directory.

Trace files are usually written to $ORACLE_BASE/admin/SID/udump (the user_dump_dest directory). Here we find the file just produced, o9i_ora_300.trc.

Open the file. Part of its contents:

Dump file c:\oracle\admin\o9i\udump\o9i_ora_300.trc
Wed Jun 18 09:12:17 2008
ORACLE V9.2.0.4.0 - Production vsnsta=0
vsnsql=12 vsnxtr=3
Windows 2000 Version 5.1 Service Pack 2, CPU type 586
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
Windows 2000 Version 5.1 Service Pack 2, CPU type 586
Instance name: o9i

Redo thread mounted by this instance: 1

Oracle process number: 10

Windows thread id: 300, image: ORACLE.EXE

*** 2008-06-18 09:47:34.031
Start dump data blocks tsn: 0 file#: 1 minblk 28090 maxblk 28090
buffer tsn: 0 rdba: 0x00406dba (1/28090)
scn: 0x0000.000c0a29 seq: 0x01 flg: 0x00 tail: 0x0a290601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x00406dba
Object id on Block? Y
seg/obj: 0x2b1e  csc: 0x00.c0a29  itc: 2  flg: O  typ: 1 - DATA
   fsl: 0  fnx: 0x0 ver: 0x01

Itl          Xid                Uba       Flag  Lck       Scn/Fsc
0x01 0x0009.019.000000e5  0x00801ea9.0045.0d  C--- 0  scn 0x0000.000b9373
0x02 0x0006.005.000000e0  0x00804605.0051.2c  C--- 0  scn 0x0000.000badb9

data_block_dump,data header at 0x347105c
===============
tsiz: 0x1fa0
hsiz: 0x26
pbl: 0x0347105c
bdba: 0x00406dba
   76543210
flag=--------
ntab=1
nrow=10
frre=4
fsbo=0x26
fseo=0x1e96
avsp=0x1f56
tosp=0x1f56
0xe:pti[0] nrow=10 offs=0
0x12:pri[0] offs=0x1eb6
0x14:pri[1] offs=0x1eae
0x16:pri[2] offs=0x1ea6
0x18:pri[3] offs=0x1e96
0x1a:pri[4] sfll=5
0x1c:pri[5] sfll=6
0x1e:pri[6] sfll=7
0x20:pri[7] sfll=8
0x22:pri[8] sfll=9
0x24:pri[9] sfll=-1
block_row_dump:
tab 0, row 0, @0x1eb6
tl: 8 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 02
col  1: [ 1]  32
tab 0, row 1, @0x1eae
tl: 8 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 02
col  1: [ 1]  32
tab 0, row 2, @0x1ea6
tl: 8 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 02
col  1: [ 1]  32
tab 0, row 3, @0x1e96
tl: 8 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 2]  c1 02
col  1: [ 1]  33
end_of_block_dump
End dump data blocks tsn: 0 file#: 1 minblk 28090 maxblk 28090
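The rdba in the dump header (0x00406dba, printed as 1/28090) packs the relative file number and the block number into one 32-bit value. Below is a Python sketch of the split, assuming the usual layout of 10 bits for the file number and 22 bits for the block number. It confirms that the dumped block is file 1, block 28090. The function names are hypothetical:

```python
from typing import Tuple

BLOCK_BITS = 22  # assumption: low 22 bits = block#, upper 10 bits = relative file#


def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a relative data block address into (relative file#, block#)."""
    file_no = rdba >> BLOCK_BITS
    block_no = rdba & ((1 << BLOCK_BITS) - 1)
    return file_no, block_no


def encode_rdba(file_no: int, block_no: int) -> int:
    """Inverse of decode_rdba."""
    return (file_no << BLOCK_BITS) | block_no


print(decode_rdba(0x00406DBA))     # prints (1, 28090)
print(hex(encode_rdba(1, 28090)))  # prints 0x406dba
```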

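tsiz: 0x1fa0  total size of the block's data area, 8096 in decimal (the 8192-byte block minus the cache header, the transaction header with its two ITL slots, and the block tail)
hsiz: 0x26    size of the data header, 38 in decimal (data header, table directory and row directory)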
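Both numbers can be reproduced with a little arithmetic. In the Python sketch below, the data-header sizes (a 14-byte header, 4 bytes per table directory entry, 2 bytes per row directory entry) are read off the dump itself: pti[0] starts at 0xe, pri[0] at 0x12, and free space begins at fsbo=0x26. The block-level overheads (20-byte cache header, 24-byte fixed transaction header, 24 bytes per ITL slot, 4-byte tail) are commonly cited figures and are an assumption here. They do add up to the 96-byte gap between 8192 and tsiz:

```python
BLOCK_SIZE = 8192

# Assumed block-level overheads for this 8K data block (commonly cited figures).
CACHE_HEADER = 20      # cache layer header
TXN_HEADER_FIXED = 24  # fixed part of the transaction header
ITL_ENTRY = 24         # one interested transaction list slot
BLOCK_TAIL = 4         # tail check bytes at the end of the block


def data_area_size(itl_count: int) -> int:
    """tsiz: bytes left for the data layer after block-level overhead."""
    return (BLOCK_SIZE - CACHE_HEADER - TXN_HEADER_FIXED
            - itl_count * ITL_ENTRY - BLOCK_TAIL)


def data_header_size(ntab: int, nrow: int) -> int:
    """hsiz: 14-byte data header + 4 bytes per table + 2 bytes per row slot."""
    return 14 + 4 * ntab + 2 * nrow


print(data_area_size(2), hex(data_area_size(2)))          # prints 8096 0x1fa0
print(data_header_size(1, 10), hex(data_header_size(1, 10)))  # prints 38 0x26
```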


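These lines show the data in each column of each row. col 0 is identical in all four rows (c1 02), and col 1 is identical (32) in the first three rows. Only row 3, the fourth row, has a different value in col 1 (33). Why?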

Query the table to confirm

select * from test.t;

        A1 A2
---------- ----------
         1 2
         1 2
         1 2
         1 3

The query result makes it clear: the first three rows are identical, and the fourth row differs from them only in the second column.

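Here c1 02 is Oracle's internal encoding of the NUMBER 1: an exponent byte 0xc1 followed by the base-100 digit 1, stored as 0x02. 32 and 33 are simply the ASCII codes of the characters '2' and '3' stored in the VARCHAR2 column A2.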
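Below is a minimal decoder for the two encodings, written as a Python sketch. It assumes a single-byte, ASCII-compatible database character set for VARCHAR2. It handles only zero and positive NUMBER values: the first byte is the base-100 exponent offset by 0xC1, and each following byte is a base-100 digit plus 1. Negative numbers use a complemented encoding and are left out. The function names are hypothetical:

```python
from decimal import Decimal


def decode_oracle_number(raw: bytes) -> Decimal:
    """Decode a zero or positive Oracle NUMBER from its internal bytes."""
    if raw == b"\x80":          # zero is stored as a single 0x80 byte
        return Decimal(0)
    exp_byte, digits = raw[0], raw[1:]
    if exp_byte < 0x80:
        raise ValueError("negative NUMBER encodings are not handled in this sketch")
    exponent = exp_byte - 0xC1  # base-100 exponent of the first digit
    value = Decimal(0)
    for i, b in enumerate(digits):
        # each byte stores one base-100 digit plus 1
        value += Decimal(b - 1) * Decimal(100) ** (exponent - i)
    return value


def decode_varchar2(raw: bytes, charset: str = "ascii") -> str:
    """Decode VARCHAR2 bytes, assuming an ASCII-compatible character set."""
    return raw.decode(charset)


print(decode_oracle_number(bytes.fromhex("c102")))  # prints 1
print(decode_oracle_number(bytes.fromhex("c203")))  # prints 200
print(decode_varchar2(bytes.fromhex("32")), decode_varchar2(bytes.fromhex("33")))  # prints 2 3
```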

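Now the "tab 0, row N, @offset" lines. Each one identifies a row by table and row slot. Table 0 is the only entry in this block's table directory (ntab=1). The line also gives the offset at which the row data starts, in effect a pointer to the row.

@0x1eb6, @0x1eae, @0x1ea6 and @0x1e96 are 7862, 7854, 7846 and 7830 in decimal. They are the offsets of the four rows, the same values stored in row directory entries pri[0] to pri[3], and Oracle follows them to find each row.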
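The match between the row headers and the row directory can be checked mechanically. The Python sketch below parses the pri[...] lines copied from the dump above. Entries printed with sfll= instead of offs= are unused slots on the block's free slot list (frre=4 points at the first one), so they are skipped. The function name is hypothetical:

```python
import re

# Row directory lines copied from the block dump above.
DUMP = """\
0x12:pri[0] offs=0x1eb6
0x14:pri[1] offs=0x1eae
0x16:pri[2] offs=0x1ea6
0x18:pri[3] offs=0x1e96
0x1a:pri[4] sfll=5
0x1c:pri[5] sfll=6
0x1e:pri[6] sfll=7
0x20:pri[7] sfll=8
0x22:pri[8] sfll=9
0x24:pri[9] sfll=-1
"""


def row_offsets(dump_text: str) -> dict:
    """Map each used row slot to its offset in decimal; free slots are skipped."""
    pattern = re.compile(r"pri\[(\d+)\] offs=0x([0-9a-f]+)")
    return {int(slot): int(offs, 16) for slot, offs in pattern.findall(dump_text)}


print(row_offsets(DUMP))  # prints {0: 7862, 1: 7854, 2: 7846, 3: 7830}
```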
+++++++++++++++++++++++++++++++++++++++++++++++++++
Now for something more advanced.
V$BH is the view of the buffer headers, and its base table is X$BH. This is the most recent description of X$BH I could find:
Column       Type     Description
~~~~~~       ~~~~     ~~~~~~~~~~~
ADDR         RAW(4)   Hex address of the Buffer Header.
INDX         NUMBER   Buffer Header number
BUF#         NUMBER
HLADDR       RAW(4)   Hash Chain Latch Address
                      See V$LATCH_CHILDREN.ADDR

LRU_FLAG     NUMBER   8.1+ LRU flag
         KCBBHLDF 0x01    8.1  LRU Dump Flag used in debug print routine
         KCBBHLMT 0x02    8.1  moved to tail of lru (for extended stats)
         KCBBHLAL 0x04    8.1  on auxiliary list
         KCBBHLHB 0x08    8.1  hot buffer - not in cold portion of lru

FLAG         NUMBER
         KCBBHFBD 0x00001    buffer dirty
         KCBBHFAM 0x00002  7.3  about to modify; try not to start io
         KCBBHFAM 0x00002  8.0  about to modify; try not to start io
         KCBBHNAC 0x00002  8.1  notify dbwr after change
         KCBBHFMS 0x00004    modification started, no new writes
         KCBBHFBL 0x00008    block logged
         KCBBHFTD 0x00010    temporary data - no redo for changes
         KCBBHFBW 0x00020    being written; can't modify
         KCBBHFWW 0x00040    waiting for write to finish
         KCBBHFCK 0x00080  7.3  checkpoint asap
                  0x00080  8.0  not used
         KCBBHFMW 0x00080  8.1  multiple waiters when gc lock acquired
         KCBBHFRR 0x00100    recovery reading, do not reuse, being read
         KCBBHFUL 0x00200    unlink from lock element - make non-current
         KCBBHFDG 0x00400    write block & stop using for lock down grade
         KCBBHFCW 0x00800    write block for cross instance call
         KCBBHFCR 0x01000    reading from disk into KCBBHCR buffer
         KCBBHFGC 0x02000    has been gotten in current mode
         KCBBHFST 0x04000    stale - unused CR buf made from current
                  0x08000  7.3  Not used.
         KCBBHFDP 0x08000  8.0  deferred ping
         KCBBHFDP 0x08000  8.1  deferred ping
         KCBBHFDA 0x10000    Direct Access to buffer contents
         KCBBHFHD 0x20000    Hash chain Dump used in debug print routine
         KCBBHFIR 0x40000    Ignore Redo for instance recovery
         KCBBHFSQ 0x80000    sequential scan only flag
         KCBBHFNW  0x100000  7.3  Set to indicate a buffer that is NEW
                  0x100000  8.0  Not used
         KCBBHFBP  0x100000  8.1  Indicates that buffer was prefetched
         KCBBHFRW  0x200000  7.3  re-write if being written (sort)
                  0x200000  8.0  Not used
         KCBBHFFW  0x200000  8.1  Buffer has been written once
         KCBBHFFB  0x400000    buffer is "logically" flushed
         KCBBHFRS  0x800000    ReSilvered already - do not redirty
         KCBBHFKW 0x1000000  7.3  ckpt writing flag to avoid rescan
                  0x1000000  8.0  Not used
         KCBBHDRC 0x1000000  8.1  buffer is nocache
                  0x2000000  7.3  Not used
         KCBBHFRG 0x2000000  8.0  Redo Generated since block read
         KCBBHFRG 0x2000000  8.1  Redo Generated since block read
         KCBBHFWS 0x10000000 8.0  Skipped write for checkpoint.
         KCBBHFDB 0x20000000 8.1  buffer is directly from a foreign DB
         KCBBHFAW 0x40000000 8.0  Flush after writing
         KCBBHFAW 0x40000000 8.1  Flush after writing

TS#          NUMBER   8.X Tablespace number
DBARFIL      NUMBER   8.X Relative file number of block
DBAFIL       NUMBER   7.3 File number of block
DBABLK       NUMBER   Block number of block
CLASS        NUMBER   See Note 33434.1

STATE        NUMBER
         KCBBHFREE       0    buffer free
         KCBBHEXLCUR     1    buffer current (and if DFS locked X)
         KCBBHSHRCUR     2    buffer current (and if DFS locked S)
         KCBBHCR         3    buffer consistent read
         KCBBHREADING    4    Being read
         KCBBHMRECOVERY  5    media recovery (current & special)
         KCBBHIRECOVERY  6    Instance recovery (somewhat special)

MODE_HELD    NUMBER   Mode buffer held in (MODE pre 7.3)
                      0=KCBMNULL, KCBMSHARE, KCBMEXCL

CHANGES      NUMBER
CSTATE       NUMBER
X_TO_NULL    NUMBER   Count of PINGS out (OPS)
DIRTY_QUEUE  NUMBER   You won't normally see buffers on the LRUW
LE_ADDR      RAW(4)   Lock Element address (OPS)
SET_DS       RAW(4)   Buffer cache set this buffer is under
OBJ          NUMBER   Data object number
TCH          NUMBER   8.1 Touch Count
TIM          NUMBER   8.1 Touch Time
BA           RAW(4)
CR_SCN_BAS   NUMBER   Consistent Read SCN base
CR_SCN_WRP   NUMBER   Consistent Read SCN wrap
CR_XID_USN   NUMBER   CR XID Undo segment no
CR_XID_SLT   NUMBER   CR XID slot
CR_XID_SQN   NUMBER   CR XID Sequence
CR_UBA_FIL   NUMBER   CR UBA file
CR_UBA_BLK   NUMBER   CR UBA Block
CR_UBA_SEQ   NUMBER   CR UBA sequence
CR_UBA_REC   NUMBER   CR UBA record
CR_SFL       NUMBER
LRBA_SEQ     NUMBER   } Lowest RBA needed to recover block in cache
LRBA_BNO     NUMBER   }
LRBA_BOF     NUMBER   }

HRBA_SEQ     NUMBER   } Redo RBA to be flushed BEFORE this block
HRBA_BNO     NUMBER   } can be written out
HRBA_BOF     NUMBER   }

RRBA_SEQ     NUMBER   } Block recovery RBA
RRBA_BNO     NUMBER   }
RRBA_BOF     NUMBER   }
NXT_HASH     NUMBER   Next buffer on this hash chain
PRV_HASH     NUMBER   Previous buffer on this hash chain
NXT_LRU      NUMBER   Next buffer on the LRU
PRV_LRU      NUMBER   Previous buffer on the LRU
US_NXT       RAW(4)
US_PRV       RAW(4)
WA_NXT       RAW(4)
WA_PRV       RAW(4)
ACC          RAW(4)
MOD          RAW(4)
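When querying X$BH directly (as SYS), the FLAG and STATE numbers have to be decoded by hand. In SQL, a single bit can be tested with BITAND(flag, 1) > 0. The Python sketch below decodes both fields using the lists above. It includes only a subset of the FLAG bits, using their 8.1 meanings. It also assumes that STATE maps to the STATUS strings V$BH displays in this release (free, xcur, scur, cr, read, mrec, irec); later releases add more states. The helper names are hypothetical:

```python
# Subset of the X$BH.FLAG bits listed above (8.1 meanings only).
XBH_FLAG_BITS = {
    0x00001: "buffer dirty",
    0x00008: "block logged",
    0x00010: "temporary data - no redo for changes",
    0x00020: "being written; can't modify",
    0x00040: "waiting for write to finish",
    0x02000: "has been gotten in current mode",
    0x04000: "stale - unused CR buf made from current",
    0x80000: "sequential scan only flag",
    0x100000: "buffer was prefetched",
    0x200000: "buffer has been written once",
    0x2000000: "redo generated since block read",
}

# X$BH.STATE -> STATUS string shown by V$BH (assumed decode for this release).
STATE_TO_STATUS = {
    0: "free",  # KCBBHFREE
    1: "xcur",  # KCBBHEXLCUR: current, exclusive
    2: "scur",  # KCBBHSHRCUR: current, shared
    3: "cr",    # KCBBHCR: consistent read copy
    4: "read",  # KCBBHREADING: being read from disk
    5: "mrec",  # KCBBHMRECOVERY: media recovery
    6: "irec",  # KCBBHIRECOVERY: instance recovery
}


def decode_flag(flag: int) -> list:
    """Descriptions of the known FLAG bits set in `flag`, lowest bit first."""
    return [desc for bit, desc in sorted(XBH_FLAG_BITS.items()) if flag & bit]


def status_of(state: int) -> str:
    """V$BH-style STATUS for an X$BH.STATE value."""
    return STATE_TO_STATUS.get(state, f"unknown state {state}")


# Block 28090 earlier had one current buffer and two CR copies.
print([status_of(s) for s in (1, 3, 3)])  # prints ['xcur', 'cr', 'cr']

# A made-up FLAG value, not taken from the experiment.
print(decode_flag(0x2002001))
# prints ['buffer dirty', 'has been gotten in current mode', 'redo generated since block read']
```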

For 9.2.0.1, the BH data structure is:

{
  kgglk             kcbbhha; /*  hash chain buffer is on  */
  ktsn                kcbbhtsn; /*  tablespace number  */
  krdba             kcbbhrdba; /*  relative DBA  */
  ub4                kcbbhflg; /*  flags: all changes require hash latch  */
  b1                kcbbhst; /*  state of the buffer  */
  b1                kcbbhmd; /*  mode owned in (KCBMNULL, KCBMSHR, KCBMEXCL)  */
  word                kcbbhcla; /*  block class  */
  kfil                kcbbhafn; /*  absolute file number  */
  kobjd             kcbbhobj; /*  Object # (disk )for block (if known)  */
  kobjn             kcbbhobjn; /*  dictionary object # (if known)  */
  ptr_t             kcbbhba; /*  buffer base address (set when mapped)  */
  kscn                kcbbhdscn; /*  incremental Transactional DSCN  */
  kgglk             kcbbhus; /*  list of buffers using queue  */
  kgglk             kcbbhwa; /*  list of buffers waiting  */
  b1                kcbbhccnt; /*  number of changes to buffer in single kcbchg()  */
  b1                kcbbhcst; /*  change state for recovery if failure during kcbchg()  */
  kgglk             kcbbhrpl; /*  link for maintaining position on replacement chain  */
  b1                kcbbhfoq; /*  TRUE iff the buffer is on a write list  */
  b1                kcbbhlpf; /*  LRU latch protected flags  */
  ub2                kcbbhtch; /*  touch count  */
  ub4                kcbbhtim; /*  time of last touch count increment  */
  kcrda             kcbbhlrba; /*  lowest rba needed to recover block on disk  */
  kcrda             kcbbhrrba; /*  lowest rba needed to recover block in cache  */
  kcbcr             kcbbhcr; /*  consistent read fields  */
  kssob *             kcbbhrsop; /*  recovery s.o.; for recovery buffers only  */
  kcrfkd             kcbbhhfkd; /*  SCN of the highest change in the buffer  */
  ub2                kcbbhdbc; /*  delayed block cleanout count  */
  ub2                kcbbhssid;
  ub2                kcbbhcqid; /*  which ckpt queue in working set buffer is  */
  kgglk             kcbbhckql; /*  link for checkpoint queue  */
  kgglk             kcbbhfql; /*  link for per-file checkpoint queue  */
  struct kcbwds *    kcbbhds; /*  system set descriptor  */
  struct kcbbh.UNK_lch_kcbbhsh UNK_lch_kcbbhsh; /*  all fields that are needed in shared mode  */
}

posted @ 2009-09-21 13:36 mop
