oracle b-tree、位图索引测试 - cycsa

公告

引言：大家都知道“效率”是数据库中非常重要的一个指标，如何提高效率大家可能都会想起索引，但索引又这么多种，什么场合应该使用什么索引呢？哪种索引可以提高我们的效率，哪种索引可以让我们的效率大大降低（有时还不如全表扫描性能好）下面要讲的“索引”如何成为我们的利器而不是灾难！多说一点，由于不同索引的存储结构不同，所以应用在不同组织结构的数据上，本篇文章重点就是：理解不同的技术都适合在什么地方应用！

B-Tree索引
场合：非常适合数据重复度低的字段例如身份证号码手机号码 QQ号等字段，常用于主键唯一约束，一般在在线交易的项目中用到的多些。
原理：一个键值对应一行（rowid）格式：【索引头|键值|rowid】
优点：当没有索引的时候，oracle只能全表扫描where qq=40354446 这个条件那么这样是灰常灰常耗时的，当数据量很大的时候简直会让人崩溃，那么有个B-tree索引我们就像翻书目录一样，直接定位rowid立刻就找到了我们想要的数据，实质减少了I/O操作就提高速度，它有一个显著特点查询性能与表中数据量无关，例如查2万行的数据用了3 consistent get,当查询1200万行的数据时才用了4 consistent gets。
当我们的字段中使用了主键or唯一约束时，不用想直接可以用B-tree索引
缺点：不适合键值重复率较高的字段上使用，例如第一章 1-500page 第二章 501-1000page

创建两个表用于测试数据重复对b-tree索引的影响

SQL> create  table t1 as select object_id,object_name from all_objects;

Table created.

SQL> create table t2 as select mod(object_id,2) object_id,object_name from all_objects;

Table created.

SQL> select count(*) from t1;

  COUNT(*)
----------
     49562

SQL> select count(*) from t2;

  COUNT(*)
----------
     49563

检查重复度

t1不存在重复

SQL> select count(distinct(object_id)) from t1;

COUNT(DISTINCT(OBJECT_ID))
--------------------------
                     49562

t2高度重复

SQL> select count(distinct(object_id)) from t2;

COUNT(DISTINCT(OBJECT_ID))
--------------------------
                         2

创建索引

SQL> create index t1_idx on t1(object_id);

Index created.

SQL> create index t2_idx on t2(object_id);

Index created.

顺便看看索引大小：

SQL> select segment_name,bytes/1024 from user_segments where segment_type='INDEX';

SEGMENT_NAME         BYTES/1024
-------------------- ----------
T1_IDX                     1024
T2_IDX                      832

通过执行计划分析索引

SQL> set autotrace traceonly
SQL> set linesize 200

清空系统缓存，使测试标准

alter system flush shared_pool;   --清空共享池
alter system flush buffer_cache;  --清空数据库缓冲区，都是为了实验需要

t1的查询执行计划

SQL> select * from t1 where object_id=1;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    30 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     1 |    30 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=1)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
        754  recursive calls
          0  db block gets
        195  consistent gets
        240  physical reads
          0  redo size
        395  bytes sent via SQL*Net to client
        481  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
          0  rows processed

再次清空系统缓存，使测试标准

alter system flush shared_pool;
alter system flush buffer_cache;

t2的查询执行计划

SQL> select * from t2 where object_id=1;

24819 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 24110 |   706K|    53   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   | 24110 |   706K|    53   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=1)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
        901  recursive calls
          0  db block gets
       2074  consistent gets
        267  physical reads
          0  redo size
    1025900  bytes sent via SQL*Net to client
      18686  bytes received via SQL*Net from client
       1656  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
      24819  rows processed

从执行计划看：即使在t2 object_id列存在索引，系统自动选择了全表扫描

清理缓存

alter system flush shared_pool;
alter system flush buffer_cache;

强制t2索引扫描

SQL> select /*+ index(t2 t2_idx) */ * from t2 where object_id=1;

24819 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4244861920

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        | 24110 |   706K|   271   (1)| 00:00:04 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T2     | 24110 |   706K|   271   (1)| 00:00:04 |
|*  2 |   INDEX RANGE SCAN          | T2_IDX | 24110 |       |    52   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=1)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
        340  recursive calls
          0  db block gets
       3675  consistent gets
        302  physical reads
          0  redo size
    1025900  bytes sent via SQL*Net to client
      18686  bytes received via SQL*Net from client
       1656  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
      24819  rows processed

t2的索引扫描cpu开销与t2全表扫描一致都读取了全表
t2强制索引扫描后cpu开销惊人，271%高于全表扫描73%
说明B-Tree索引在重复高的字段中做索引效率非但不提高反而下降

bitmap索引测试

尝试将T2的index换成bitmap索引看看效果如何：

SQL> drop index t2_idx;

Index dropped.

SQL> create bitmap index t2_bm_idx on t2(object_id);

Index created.

SQL> alter system flush shared_pool;
alter system flush buffer_cache;
System altered.

SQL>

System altered.

查看转换为bitmap后的执行计划

SQL> select * from t2 where object_id=1;

24819 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2856179546

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           | 24110 |   706K|    50   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T2        | 24110 |   706K|    50   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|           |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | T2_BM_IDX |       |       |            |          |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("OBJECT_ID"=1)

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
       1143  recursive calls
          0  db block gets
       2103  consistent gets
        263  physical reads
          0  redo size
    1025900  bytes sent via SQL*Net to client
      18686  bytes received via SQL*Net from client
       1656  SQL*Net roundtrips to/from client
         14  sorts (memory)
          0  sorts (disk)
      24819  rows processed

效果好转

t2索引对比

---------------------------------------------------------------------------------------------
                consistent gets        physical reads        rows            cost
   b-tree            3675                302                24110            271
   bitmap            2103                263                24110            50        
----------------------------------------------------------------------------------------------

从对比看，在重复度高的字段上bitmap索引效率高于b-tree

同时bitmap索引占用表空间小

SQL> select segment_name,bytes/1024 from user_segments where segment_type='INDEX';

SEGMENT_NAME         BYTES/1024
-------------------- ----------
T1_IDX                     1024
T2_BM_IDX                    64
对比之前记录
T2_IDX                      832

数据高度重复情况下b-tree bitmap索引对比

SQL> create table t1_bm as select * from all_objects;

Table created.

SQL> insert into t1_bm select * from t1_bm;

49566 rows created.

SQL> /

99132 rows created.

SQL> /

198264 rows created.

SQL> /

396528 rows created.


SQL> create table t2_bm as select * from t1_bm;

Table created.


SQL> create index idx_t1_bm on t1_bm(object_type);

Index created.

SQL> create bitmap index idx_t2_bm on t2_bm(object_type);

Index created.

数据量增大后b-tree跟bitmap索引的占用空间区别明显

SQL>  select segment_name,bytes/1024/1024 M from  user_segments where segment_type='INDEX';

SEGMENT_NAME                  M
-------------------- ----------
IDX_T2_BM                 .4375   --> bitmap
IDX_T1_BM                    18   -->b-tree

高重复情况下b-tree索引执行计划

SQL> set autotrace traceonly
SQL> set linesize 200
SQL> alter system flush shared_pool;
alter system flush buffer_cache;
System altered.
SQL>

System altered.

SQL>
SQL> select * from t1_bm where object_type='TABLE';

25392 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 448742994

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           | 30759 |  3844K|  1399   (1)| 00:00:17 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1_BM     | 30759 |  3844K|  1399   (1)| 00:00:17 |
|*  2 |   INDEX RANGE SCAN          | IDX_T1_BM | 30759 |       |    81   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
       1410  recursive calls
          0  db block gets
       5869  consistent gets
       6537  physical reads
          0  redo size
    1369477  bytes sent via SQL*Net to client
      19104  bytes received via SQL*Net from client
       1694  SQL*Net roundtrips to/from client
         14  sorts (memory)
          0  sorts (disk)
      25392  rows processed

高数据重复度下bitmap索引执行计划

SQL> select * from t2_bm where object_type='TABLE';

25392 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1397281008

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           | 41878 |  5234K|  1328   (1)| 00:00:16 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T2_BM     | 41878 |  5234K|  1328   (1)| 00:00:16 |
|   2 |   BITMAP CONVERSION TO ROWIDS|           |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | IDX_T2_BM |       |       |            |          |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement


Statistics
----------------------------------------------------------
       1012  recursive calls
          0  db block gets
       4378  consistent gets
       6532  physical reads
          0  redo size
    1369490  bytes sent via SQL*Net to client
      19104  bytes received via SQL*Net from client
       1694  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
      25392  rows processed

t1_bm t2_bm 索引对比

---------------------------------------------------------------------------------------------
                consistent gets        physical reads        rows            cost        time
   b-tree            5869                6537            30759            1399        17
   bitmap            4378                6532            41878            1328        16
----------------------------------------------------------------------------------------------

感觉不是很明显，但是bitmap索引还是优于b-tree索引

从测试看，b-tree适合重复度低的字段做索引 bitmap索引适合重复度高的字段索引

by cycsa

posted on 2013-05-16 16:46 cycsa 阅读(641) 评论(0) 收藏举报

刷新页面返回顶部