引言:大家都知道“效率”是数据库中非常重要的一个指标,如何提高效率大家可能都会想起索引,但索引又这么多种,什么场合应该使用什么索引呢?哪种索引可以提高我们的效率,哪种索引可以让我们的效率大大降低(有时还不如全表扫描性能好)下面要讲的“索引”如何成为我们的利器而不是灾难!多说一点,由于不同索引的存储结构不同,所以应用在不同组织结构的数据上,本篇文章重点就是:理解不同的技术都适合在什么地方应用!
B-Tree索引
场合:非常适合数据重复度低的字段 例如 身份证号码 手机号码 QQ号等字段,常用于主键 唯一约束,一般在在线交易的项目中用到的多些。
原理:一个键值对应一行(rowid) 格式: 【索引头|键值|rowid】
优点:当没有索引的时候,oracle只能全表扫描where qq=40354446 这个条件那么这样是灰常灰常耗时的,当数据量很大的时候简直会让人崩溃,那么有个B-tree索引我们就像翻书目录一样,直接定位rowid立刻就找到了我们想要的数据,实质减少了I/O操作就提高速度,它有一个显著特点查询性能与表中数据量无关,例如 查2万行的数据用了3 consistent get,当查询1200万行的数据时才用了4 consistent gets。
当我们的字段中使用了主键or唯一约束时,不用想直接可以用B-tree索引
缺点:不适合键值重复率较高的字段上使用,例如 第一章 1-500page 第二章 501-1000page
创建两个表用于测试数据重复对b-tree索引的影响
SQL> create table t1 as select object_id,object_name from all_objects; Table created. SQL> create table t2 as select mod(object_id,2) object_id,object_name from all_objects; Table created. SQL> select count(*) from t1; COUNT(*) ---------- 49562 SQL> select count(*) from t2; COUNT(*) ---------- 49563
检查重复度
t1不存在重复
SQL> select count(distinct(object_id)) from t1; COUNT(DISTINCT(OBJECT_ID)) -------------------------- 49562
t2高度重复
SQL> select count(distinct(object_id)) from t2; COUNT(DISTINCT(OBJECT_ID)) -------------------------- 2
创建索引
SQL> create index t1_idx on t1(object_id); Index created. SQL> create index t2_idx on t2(object_id); Index created.
顺便看看索引大小: SQL> select segment_name,bytes/1024 from user_segments where segment_type='INDEX'; SEGMENT_NAME BYTES/1024 -------------------- ---------- T1_IDX 1024 T2_IDX 832
通过执行计划分析索引
SQL> set autotrace traceonly SQL> set linesize 200
清空系统缓存,使测试标准
alter system flush shared_pool; --清空共享池 alter system flush buffer_cache; --清空数据库缓冲区,都是为了实验需要
t1的查询执行计划
SQL> select * from t1 where object_id=1; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 546753835 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 30 | 1 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 30 | 1 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T1_IDX | 1 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OBJECT_ID"=1) Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 754 recursive calls 0 db block gets 195 consistent gets 240 physical reads 0 redo size 395 bytes sent via SQL*Net to client 481 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 10 sorts (memory) 0 sorts (disk) 0 rows processed
再次清空系统缓存,使测试标准
alter system flush shared_pool; alter system flush buffer_cache;
t2的查询执行计划
SQL> select * from t2 where object_id=1; 24819 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1513984157 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 24110 | 706K| 53 (2)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T2 | 24110 | 706K| 53 (2)| 00:00:01 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("OBJECT_ID"=1) Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 901 recursive calls 0 db block gets 2074 consistent gets 267 physical reads 0 redo size 1025900 bytes sent via SQL*Net to client 18686 bytes received via SQL*Net from client 1656 SQL*Net roundtrips to/from client 12 sorts (memory) 0 sorts (disk) 24819 rows processed
从执行计划看:即使在t2 object_id列存在索引,系统自动选择了全表扫描
清理缓存
alter system flush shared_pool; alter system flush buffer_cache;
强制t2索引扫描
SQL> select /*+ index(t2 t2_idx) */ * from t2 where object_id=1; 24819 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 4244861920 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 24110 | 706K| 271 (1)| 00:00:04 | | 1 | TABLE ACCESS BY INDEX ROWID| T2 | 24110 | 706K| 271 (1)| 00:00:04 | |* 2 | INDEX RANGE SCAN | T2_IDX | 24110 | | 52 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OBJECT_ID"=1) Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 340 recursive calls 0 db block gets 3675 consistent gets 302 physical reads 0 redo size 1025900 bytes sent via SQL*Net to client 18686 bytes received via SQL*Net from client 1656 SQL*Net roundtrips to/from client 6 sorts (memory) 0 sorts (disk) 24819 rows processed
t2的索引扫描cpu开销与t2全表扫描一致都读取了全表
t2强制索引扫描后cpu开销惊人,271%高于全表扫描73%
说明B-Tree索引在重复高的字段中做索引效率非但不提高反而下降
bitmap索引测试
尝试将T2的index换成bitmap索引看看效果如何:
SQL> drop index t2_idx; Index dropped. SQL> create bitmap index t2_bm_idx on t2(object_id); Index created. SQL> alter system flush shared_pool; alter system flush buffer_cache; System altered. SQL> System altered.
查看转换为bitmap后的执行计划
SQL> select * from t2 where object_id=1; 24819 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2856179546 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 24110 | 706K| 50 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | T2 | 24110 | 706K| 50 (0)| 00:00:01 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 3 | BITMAP INDEX SINGLE VALUE | T2_BM_IDX | | | | | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("OBJECT_ID"=1) Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 1143 recursive calls 0 db block gets 2103 consistent gets 263 physical reads 0 redo size 1025900 bytes sent via SQL*Net to client 18686 bytes received via SQL*Net from client 1656 SQL*Net roundtrips to/from client 14 sorts (memory) 0 sorts (disk) 24819 rows processed
效果好转
t2索引对比
--------------------------------------------------------------------------------------------- consistent gets physical reads rows cost b-tree 3675 302 24110 271 bitmap 2103 263 24110 50 ----------------------------------------------------------------------------------------------
从对比看,在重复度高的字段上bitmap索引效率高于b-tree
同时bitmap索引占用表空间小
SQL> select segment_name,bytes/1024 from user_segments where segment_type='INDEX'; SEGMENT_NAME BYTES/1024 -------------------- ---------- T1_IDX 1024 T2_BM_IDX 64 对比之前记录 T2_IDX 832
数据高度重复情况下b-tree bitmap索引对比
SQL> create table t1_bm as select * from all_objects; Table created. SQL> insert into t1_bm select * from t1_bm; 49566 rows created. SQL> / 99132 rows created. SQL> / 198264 rows created. SQL> / 396528 rows created. SQL> create table t2_bm as select * from t1_bm; Table created. SQL> create index idx_t1_bm on t1_bm(object_type); Index created. SQL> create bitmap index idx_t2_bm on t2_bm(object_type); Index created.
数据量增大后b-tree跟bitmap索引的占用空间区别明显
SQL> select segment_name,bytes/1024/1024 M from user_segments where segment_type='INDEX'; SEGMENT_NAME M -------------------- ---------- IDX_T2_BM .4375 --> bitmap IDX_T1_BM 18 -->b-tree
高重复情况下b-tree索引执行计划
SQL> set autotrace traceonly SQL> set linesize 200 SQL> alter system flush shared_pool; alter system flush buffer_cache; System altered. SQL> System altered. SQL> SQL> select * from t1_bm where object_type='TABLE'; 25392 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 448742994 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 30759 | 3844K| 1399 (1)| 00:00:17 | | 1 | TABLE ACCESS BY INDEX ROWID| T1_BM | 30759 | 3844K| 1399 (1)| 00:00:17 | |* 2 | INDEX RANGE SCAN | IDX_T1_BM | 30759 | | 81 (0)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("OBJECT_TYPE"='TABLE') Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 1410 recursive calls 0 db block gets 5869 consistent gets 6537 physical reads 0 redo size 1369477 bytes sent via SQL*Net to client 19104 bytes received via SQL*Net from client 1694 SQL*Net roundtrips to/from client 14 sorts (memory) 0 sorts (disk) 25392 rows processed
高数据重复度下bitmap索引执行计划
SQL> select * from t2_bm where object_type='TABLE'; 25392 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1397281008 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 41878 | 5234K| 1328 (1)| 00:00:16 | | 1 | TABLE ACCESS BY INDEX ROWID | T2_BM | 41878 | 5234K| 1328 (1)| 00:00:16 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 3 | BITMAP INDEX SINGLE VALUE | IDX_T2_BM | | | | | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("OBJECT_TYPE"='TABLE') Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 1012 recursive calls 0 db block gets 4378 consistent gets 6532 physical reads 0 redo size 1369490 bytes sent via SQL*Net to client 19104 bytes received via SQL*Net from client 1694 SQL*Net roundtrips to/from client 10 sorts (memory) 0 sorts (disk) 25392 rows processed
t1_bm t2_bm 索引对比
--------------------------------------------------------------------------------------------- consistent gets physical reads rows cost time b-tree 5869 6537 30759 1399 17 bitmap 4378 6532 41878 1328 16 ----------------------------------------------------------------------------------------------
感觉不是很明显,但是bitmap索引还是优于b-tree索引
从测试看,b-tree适合重复度低的字段做索引 bitmap索引适合重复度高的字段索引
by cycsa
浙公网安备 33010602011771号