组合索引的前导列,要使用选择率高的列?

Oracle Concept上有一句话,大致的意思是:应该选择选择率高的列作为组合索引的前导列

其实这句话,并不严谨,前导列的选择(索引的设计)应该决定于SQL的逻辑,而不是统一的固定原则,比如下面的案例:

 

Z:\>sqlplus mydb

 

SQL*Plus: Release 10.2.0.4.0 - Production on 星期二 3月 10 15:15:49 2009

 

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

 

输入口令:

 

连接到:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

 

MYDB > select * from testxyz order by a, b;

 

         A          B C
---------- ---------- ----------
         1         10 ABC
         2         15 ABC
         3         15 ABC
         4         21 ABC
         4         25 ABC
         5         20 ABC
         5         22 ABC
         6         27 ABC
         9         10 ABC
        10         15 ABC
        17          3 ABC
        18         21 ABC

 

已选择12行。

 

已用时间:  00: 00: 00.00
MYDB > select * from testxyz order by b, a;

 

         A          B C
---------- ---------- ----------
        17          3 ABC
         1         10 ABC
         9         10 ABC
         2         15 ABC
         3         15 ABC
        10         15 ABC
         5         20 ABC
         4         21 ABC
        18         21 ABC
         5         22 ABC
         4         25 ABC
         6         27 ABC

 

已选择12行。

 

已用时间:  00: 00: 00.00
MYDB > create index testxyz_ab on testxyz (a,b) pctfree 97;

 

索引已创建。

 

已用时间:  00: 00: 00.03
MYDB > create index testxyz_ba on testxyz (b,a) pctfree 97;

 

索引已创建。

 

已用时间:  00: 00: 00.06

MYDB > select object_name,object_id from user_objects
  2  where object_name in ('TESTXYZ_AB','TESTXYZ_BA');

 

OBJECT_NAME                               OBJECT_ID
---------------------------------------- ----------
TESTXYZ_AB                                    13218
TESTXYZ_BA                                    13219

 

已用时间:  00: 00: 00.01

 

A,B两列的选择率不通,我们前面创建了2个不通前导列的两列组合索引;那么对于下面的SQL来说哪个SQL的效率更高呢?

select /*+ index(testxyz testxyz_ba) */ a, b from testxyz
 where a=4 and b between 20 and 25;

 

这要先看看2个索引中,索引key是如何分布在节点上的~

 

1.先看TESTXYZ_AB:


MYDB > alter session set events 'immediate trace name treedump level 13218';

 

会话已更改。

 

已用时间:  00: 00: 00.26

/*

----- begin tree dump
branch: 0x1000c8c 16780428 (0: nrow: 3, level: 1)
   leaf: 0x1000c8d 16780429 (-1: nrow: 4 rrow: 4)
   leaf: 0x1000c8e 16780430 (0: nrow: 4 rrow: 4)
   leaf: 0x1000c8f 16780431 (1: nrow: 4 rrow: 4)
----- end tree dump

*/

 

MYDB > SELECT dbms_utility.data_block_address_file(to_number(ltrim('0x1000c8d',
  2                                                              '0x'),
  3                                                        'xxxxxxxx')) file#,
  4         dbms_utility.data_block_address_block(to_number(ltrim('0x1000c8d',
  5                                                               '0x'),
  6                                                         'xxxxxxxx')) block#
  7    FROM dual
  8  /

 

     FILE#     BLOCK#
---------- ----------
         4       3213

 

已用时间:  00: 00: 00.00
MYDB > SELECT dbms_utility.data_block_address_file(to_number(ltrim('0x1000c8e',
  2                                                              '0x'),
  3                                                        'xxxxxxxx')) file#,
  4         dbms_utility.data_block_address_block(to_number(ltrim('0x1000c8e',
  5                                                               '0x'),
  6                                                         'xxxxxxxx')) block#
  7    FROM dual
  8  /

 

     FILE#     BLOCK#
---------- ----------
         4       3214

 

已用时间:  00: 00: 00.01
MYDB > SELECT dbms_utility.data_block_address_file(to_number(ltrim('0x1000c8f',
  2                                                              '0x'),
  3                                                        'xxxxxxxx')) file#,
  4         dbms_utility.data_block_address_block(to_number(ltrim('0x1000c8f',
  5                                                               '0x'),
  6                                                         'xxxxxxxx')) block#
  7    FROM dual
  8  /

 

     FILE#     BLOCK#
---------- ----------
         4       3215

 

已用时间:  00: 00: 00.00
MYDB > alter system dump datafile 4 block 3213;

 

系统已更改。

 

已用时间:  00: 00: 01.34
MYDB > alter system dump datafile 4 block 3214;

 

系统已更改。

 

已用时间:  00: 00: 00.01
MYDB > alter system dump datafile 4 block 3215;

 

系统已更改。

 

已用时间:  00: 00: 00.01

/*

row#0[8021] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 02 --1
col 1; len 2; (2):  c1 0b --10
col 2; len 6; (6):  01 00 0c 86 00 01
row#1[8006] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 03 --2
col 1; len 2; (2):  c1 10 --15
col 2; len 6; (6):  01 00 0c 86 00 00
row#2[7991] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 04 --3
col 1; len 2; (2):  c1 10 --15
col 2; len 6; (6):  01 00 0c 86 00 02
row#3[7976] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 05 --4
col 1; len 2; (2):  c1 16 --21
col 2; len 6; (6):  01 00 0c 86 00 04
----- end of leaf block dump -----
End dump data blocks tsn: 4 file#: 4 minblk 3213 maxblk 3213

 

row#0[8021] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 05 --4
col 1; len 2; (2):  c1 1a --25
col 2; len 6; (6):  01 00 0c 86 00 03
row#1[8006] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 06 --5
col 1; len 2; (2):  c1 15 --20
col 2; len 6; (6):  01 00 0c 86 00 05
row#2[7991] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 06 --5
col 1; len 2; (2):  c1 17 --22
col 2; len 6; (6):  01 00 0c 86 00 06
row#3[7976] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 07 --6
col 1; len 2; (2):  c1 1c --27
col 2; len 6; (6):  01 00 0c 86 00 07
----- end of leaf block dump -----
End dump data blocks tsn: 4 file#: 4 minblk 3214 maxblk 3214

 

row#0[8021] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 0a --9
col 1; len 2; (2):  c1 0b --10
col 2; len 6; (6):  01 00 0c 86 00 08
row#1[8006] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 0b --10
col 1; len 2; (2):  c1 10 --15
col 2; len 6; (6):  01 00 0c 86 00 09
row#2[7991] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 12 --17
col 1; len 2; (2):  c1 04 --3
col 2; len 6; (6):  01 00 0c 86 00 0a
row#3[7976] flag: ------, lock: 0, len=15
col 0; len 2; (2):  c1 13 --18
col 1; len 2; (2):  c1 16 --21
col 2; len 6; (6):  01 00 0c 86 00 0b
----- end of leaf block dump -----
End dump data blocks tsn: 4 file#: 4 minblk 3215 maxblk 3215

*/

 

从dump的结果上看,TESTXYZ_AB的存储结构和select a,b from testxyz order by a,b相同~

由于pctfree 97,12个索引entry,存储在4个block中,3个leafblock的数据分布,如下图:

 

L1   a       b

 

L2   a       b

 

L3   a       b

1

10

 

4

25

 

9

10

2

15

 

5

20

 

10

15

3

15

 

5

22

 

17

3

4

21

 

6

27

 

18

21

强制使用TESTXYZ_AB索引,以a=4 and b between 14 and 25来查询testxyz.a,testxyz.b,如下:

MYDB > select /*+ index(testxyz testxyz_ab) */ a, b from testxyz
  2  where a=4 and b between 20 and 25;

 

         A          B
---------- ----------
         4         21
         4         25

 

已用时间:  00: 00: 00.00

 

执行计划
----------------------------------------------------------
Plan hash value: 624396574

 

-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |            |     2 |    52 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| TESTXYZ_AB |     2 |    52 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):
---------------------------------------------------

 

   1 - access("A"=4 AND "B">=20 AND "B"<=25)

 

Note
-----
   - dynamic sampling used for this statement

 


统计信息
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets    
          0  physical reads
          0  redo size
        498  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

 

 

2.再来看看TESTXYZ_BA:

同样得到索引的存储结构如下图:

     b       a

 

     b       a

 

     b       a

3

17

 

15

3

 

21

18

10

1

 

15

10

 

22

5

10

9

 

20

5

 

25

4

15

2

 

21

4

 

27

6

 

强制使用TESTXYZ_AB索引,以a=4 and b between 14 and 25来查询testxyz.a,testxyz.b,如下:

MYDB > select /*+ index(testxyz testxyz_ba) */ a, b from testxyz
  2  where a=4 and b between 20 and 25;

 

         A          B
---------- ----------
         4         21
         4         25

 

已用时间:  00: 00: 00.01

 

执行计划
----------------------------------------------------------
Plan hash value: 1233347511

 

-------------------------------------------------------------------------------
| Id  | Operation        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |            |     2 |    52 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| TESTXYZ_BA |     2 |    52 |     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):
---------------------------------------------------

 

   1 - access("B">=20 AND "A"=4 AND "B"<=25)
       filter("A"=4)

 

Note
-----
   - dynamic sampling used for this statement

 


统计信息
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        498  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

 

 

因为b between 20 and 25 后 a的值在存储上不连续,所以无法access,只能filter得到a=4...

 

根据concept上的说法:distinct值比较高的列(selective)要放在组合索引的前面,但是如果是对于我们这种SQL的查询,concpt上的说法明显是错的

 

 

我们可以从下面大数据量的表,类似SQL执行中逻辑读的数量来证实:

 

C:\Documents and Settings\RequieM>sqlplus mydb

SQL*Plus: Release 9.2.0.8.0 - Production on Mon Mar 9 20:39:33 2009

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Enter password:

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.8.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.8.0 - Production

SQL> select count(*) from test_abc where object_id=10;

  COUNT(*)
----------
     24976

Elapsed: 00:00:00.00
SQL> update test_abc set object_id=20 where rownum<16000;

15999 rows updated.

Elapsed: 00:00:00.02
SQL> update test_abc set object_id=20 where rownum<8000;

7999 rows updated.

Elapsed: 00:00:00.00SQL> select object_id,count(*) from test_abc group by object_id;

 OBJECT_ID   COUNT(*)
---------- ----------
        10       8977
        20       8000
        30       7999

SQL> create index ind_test_abc on test_abc(data_object_id,object_id);

Index created.

Elapsed: 00:00:00.00

SQL> select count(*)from test_abc where object_id=20 and data_object_id between 20 and 10000;

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT ptimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'IND_TEST_ABC' (NON-UNIQUE)

 


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         12  consistent gets
          0  physical reads
          0  redo size
        407  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> explain plan for
  2  select count(*)from test_abc where object_id=20 and data_object_id between 20 and 10000;

Explained.

Elapsed: 00:00:00.00
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------


----------------------------------------------------------------------
| Id  | Operation            |  Name         | Rows  | Bytes | Cost  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT     |               |       |       |       |
|   1 |  SORT AGGREGATE      |               |       |       |       |
|*  2 |   INDEX RANGE SCAN   | IND_TEST_ABC  |       |       |       |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TEST_ABC"."DATA_OBJECT_ID">=20 AND
              "TEST_ABC"."OBJECT_ID"=20 AND "TEST_ABC"."DATA_OBJECT_ID"<=10000)
      filter("TEST_ABC"."OBJECT_ID"=20)

Note: rule based optimization

17 rows selected.

Elapsed: 00:00:00.00

 

SQL> create index ind_test_abc on test_abc (object_id,data_object_id);

Index created.

Elapsed: 00:00:00.02
SQL> set autotrace traceonly
SQL> select count(*)from test_abc where object_id=20 and data_object_id between 20 and 10000;

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT ptimizer=CHOOSE
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'IND_TEST_ABC' (NON-UNIQUE)

 


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          4  physical reads
          0  redo size
        407  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> explain plan for
  2  select count(*)from test_abc where object_id=20 and data_object_id between 20 and 10000;

Explained.

Elapsed: 00:00:00.00
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------


----------------------------------------------------------------------
| Id  | Operation            |  Name         | Rows  | Bytes | Cost  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT     |               |       |       |       |
|   1 |  SORT AGGREGATE      |               |       |       |       |
|*  2 |   INDEX RANGE SCAN   | IND_TEST_ABC  |       |       |       |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - access("TEST_ABC"."OBJECT_ID"=20 AND
              "TEST_ABC"."DATA_OBJECT_ID">=20 AND
              "TEST_ABC"."DATA_OBJECT_ID"<=10000)

Note: rule based optimization

17 rows selected.

Elapsed: 00:00:00.00

posted @ 2019-01-31 16:38  zergduan  阅读(870)  评论(0)    收藏  举报