InnoDB索引结构浅析

InnoDB索引浅析

在mysql中,索引作为一种数据结构,它提供对表的 行 进行快速查找的能力 ,通常通过形成表示特定列或一组列 的所有值的 树结构(B+树) 。索引用于快速查找具有特定列值的行。如果没有索引,MySQL 必须从第一行开始,然后读取整个表以查找相关行。表越大,成本就越高。如果表中有相关列的索引,MySQL 可以快速确定在数据文件中寻找的位置,而不必查看所有数据。这比按顺序读取每一行要快得多。
InnoDB表始终具有 表示主键的聚集索引。它们还可以在一个或多个列上定义一个或多个二级索引。

Everything is an index in InnoDB.

在InnoDB中一切都是索引,基于这几点可以得到这个结论:

  1. 所有表都有主键,没有唯一值字段的表会产生一个 GEN_CLUST_INDEX来作为主键保存。
  2. 完整的行数据都存储在聚簇索引结构里,与其中的主键链接(或者说主键就是数据的一部分
  3. 次级索引保存在一个单独的索引结构中,其中只保存主键和次级索引键的数据
    由此可见,索引和数据是不分家的,在分析索引结构也是直接在分析数据存储结构

InnoDB中的数据存储结构

B+ 树中,记录(数据)只能存储在叶节点上,而内部节点只能存储键值。B+树的叶节点以单链表的形式链接在一起,以提高搜索查询的效率。
B+ 树结构
B+Tree

innodb存储页的类B+树结构
InnoDB_index
从b+树图和基于b+树的简略索引图可以看出,InnoDB的索引几乎是一个B+树结构,不过它的每一层都有一个首位没有链接的双向链表(页的链接),而b+树只在叶子节点有一个单链表。
在索引结构中,页对应b+树的节点,通过页中记录的数据来分配子节点。非叶子节点只存储主键值和子节点的页值,叶子节点存储主键对应的其他数据。每个页的内部将记录按主键顺序排序存放。
例如查询数据 D,如果知道主键3,可以通过index_key=3,可以通过b+树走二分法直达页7,再遍历7中的数据,拿出数据。但没有索引的话, 查询no_index_key=D,就需要从页6开始完整的遍历一次叶子节点,取出所有值等于D的数据,性能差距非常大。

B+树的层级和深度

在InnoDB中,按以下表规定有索引的大小和深度限制
在超出层数大小或叶子数大小后,会将页进行分层

高度 非叶子页 叶子页 大小
1 0 1 16.0 KB
2 1 1203 18.8MB
3 1024 1447209 22.1 GB
4 1448413 1740992427 25.9 TB

InnoDB规定每个子页的大小最大为16k,在这个条件下,根据叶子节点数和大小来调整B+树高度。由于主键存储在树中的每一个节点下,过大的主键会导致叶子页树没有达到最大值就不得不提高高度,复杂化树结构。

创建一个student表,表结构及数据:

CREATE TABLE `student` (
  `s_id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  `age` tinyint(3) NOT NULL DEFAULT '0',
  `info` varchar(200) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`s_id`),
  KEY `idx_name_age_info` (`name`,`age`,`info`),
  KEY `idx_info` (`info`)
)
+------+---------+-----+----------+
| s_id | name    | age | info     |
+------+---------+-----+----------+
|    1 | aaa     |  12 | cat      |
|    4 | abc     |  11 | catgort  |
|    5 | abf     |  12 | wolf     |
|    7 | akg     |  22 | earphone |
|    2 | bbb     |  25 | man      |
|    6 | bbc     |  23 | radio    |
|    8 | ccc     |  11 | mooncell |
|    3 | ccc     |  19 | dog      |
|    9 | ccc     |  22 | how      |
+------+---------+-----+----------+

student表索引结构:

id          name                            root        fseg        fseg_id     used        allocated   fill_factor
58          PRIMARY                         3           internal    1           1           1           100.00%
58          PRIMARY                         3           leaf        2           0           0           0.00%

根节点为第三页,由于数据量较少,没有子节点
第三页的结构,作为叶子节点展示了所有的数据

Record 127: (s_id=1) → (name="aaa", age=12, info="cat")
Record 158: (s_id=2) → (name="bbb", age=25, info="man")
Record 189: (s_id=3) → (name="ccc", age=19, info="dog")
Record 417: (s_id=4) → (name="abc", age=11, info="catgort")
Record 249: (s_id=5) → (name="abf", age=12, info="wolf")
Record 281: (s_id=6) → (name="bbc", age=23, info="radio")
Record 345: (s_id=7) → (name="akg", age=22, info="earphone")
Record 381: (s_id=8) → (name="ccc", age=11, info="mooncell")
Record 488: (s_id=9) → (name="ccc", age=22, info="how")

另一个结构简单(甚至没有设主键)的表user,数据量大很多:

CREATE TABLE `user` (
  `name` varchar(10) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  `age` tinyint(3) NOT NULL DEFAULT '0',
  `info` varchar(200) COLLATE utf8mb4_bin NOT NULL DEFAULT ''
)

mysql> SELECT COUNT(1) FROM user;
+----------+
| COUNT(1) |
+----------+
|     2523 |
+----------+
1 row in set (0.00 sec)

它的根节点不是最底层,对应的子节点才是有数据的叶子节点
根节点:

Record 125: () → #5
Record 140: () → #15
Record 155: () → #15
Record 170: () → #15
Record 185: () → #15
Record 200: () → #15

叶子节点:

Record 127: () → (name="\x00\x00\x01", age=-37, info="\x01\x10a")
Record 160: () → (name="\x00\x00\x01", age=-37, info="\x01\x1Eb")
Record 193: () → (name="\x00\x00\x01", age=-37, info="\x01,c")
Record 226: () → (name="\x00\x00\x01", age=-13, info="")
Record 256: () → (name="\x00\x00\x01", age=-13, info="")
...

索引的种类

在B+树结构的索引中,主要分为聚簇索引与二级索引两类

聚簇索引 Cluster_index

每个InnoDB表都有一个称为聚集索引的特殊索引,用于存储行数据。通常,聚集索引与主键同义。

  • PRIMARY KEY在表上 定义时,InnoDB将其用作聚簇索引。应为每个表定义一个主键。如果没有逻辑上唯一且非空的列或列集来使用主键,请添加一个自动增量列。自动增量列的值是唯一的,并在插入新行时自动添加
  • 如果没有PRIMARY KEY为表定义,则使用将所有键列定义为的InnoDB第一个 索引作为聚集索引。 UNIQUENOT NULL
  • 如果表没有索引PRIMARY KEY或没有合适的 UNIQUE索引,InnoDB 则生成一个隐藏的聚集索引 GEN_CLUST_INDEX,该索引以包含行 ID 值的合成列命名。行按分配的行 ID 排序InnoDB。行 ID 是一个 6 字节字段,随着新行的插入而单调增加。因此,按行 ID 排序的行在物理上按插入顺序排列。

上面对聚簇索引的描述摘自Mysql5.7的官方文档,总来来说就是:1.主键即聚簇索引 2.每个表必需由一个唯一且非空的列做聚簇索引,如果没有则自动生成一个
例如上文中的user表,它是一个没有主键且没有唯一值的表,查看表的索引列表,可以看到InnoDB自动生成了一个聚簇索引:

id          name                            root        fseg        fseg_id     used        allocated   fill_factor
59          GEN_CLUST_INDEX                 3           internal    1           1           1           100.00%
59          GEN_CLUST_INDEX                 3           leaf        2           6           6           100.00%

二级索引 Secondary_indexes

除聚集索引之外的索引称为二级索引。在 中InnoDB,二级索引中的每个记录都包含行的主键列以及为二级索引指定的列。 InnoDB使用此主键值在聚集索引中搜索行。如果主键很长,则二级索引将使用更多空间,因此使用较短的主键是有利的。

官方对于二级索引的描述很少,从上面分析表的索引结构可以看出,二级索引的非叶子节点和聚簇索引相似,但叶子节点只保存了二级索引的列的值和主键值。如果需要查询这俩之外的值,则会进行一个回表操作,先查询二级索引,将主键值保存在内存中,再用主键值回聚簇索引表查询其他数据。

上图student表的二级索引:

id          name                            root        fseg        fseg_id     used        allocated   fill_factor
69          idx_info                        5           internal    5           1           1           100.00%
69          idx_info                        5           leaf        6           0           0           0.00%

查看二级索引叶子节点的结构,仅由索引键和主键字段构成:

Record 139: (info="cat") → (s_id=1)
Record 260: (info="catgort") → (s_id=4)
Record 152: (info="dog") → (s_id=3)
Record 165: (info="earphone") → (s_id=7)
Record 277: (info="how") → (s_id=9)
Record 183: (info="man") → (s_id=2)
Record 196: (info="mooncell") → (s_id=8)
Record 214: (info="radio") → (s_id=6)
Record 229: (info="wolf") → (s_id=5)

一些提高效率的使用方式

以上面的student表为例

字符串的场合

精准匹配,正确使用索引

mysql> EXPLAIN SELECT * FROM student WHERE `info` LIKE 'cat';
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
| id | select_type | table   | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_info      | idx_info | 802     | NULL |    1 |   100.00 | Using index condition |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+

右侧模糊匹配,可以使用索引

mysql> EXPLAIN SELECT * FROM student WHERE `info` LIKE 'c%';
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
| id | select_type | table   | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_info      | idx_info | 802     | NULL |    2 |   100.00 | Using index condition |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+

左侧模糊匹配与全模糊匹配,无法走索引

mysql> EXPLAIN SELECT * FROM student WHERE `info` LIKE '%c';
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 |    11.11 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

回到student表中的索引结构,按照索引行的编码结构进行了从左到右的排序

Record 139: (info="cat") → (s_id=1)
Record 260: (info="catgort") → (s_id=4)
Record 152: (info="dog") → (s_id=3)
Record 165: (info="earphone") → (s_id=7)
Record 277: (info="how") → (s_id=9)
Record 183: (info="man") → (s_id=2)
Record 196: (info="mooncell") → (s_id=8)
Record 214: (info="radio") → (s_id=6)
Record 229: (info="wolf") → (s_id=5)

在查询字符串时,通过右侧模糊匹配可以配合索引查询,左侧一旦有模糊项则无法使用索引。这是B+树的存储特性决定的
如果有前段相似后段不同的字符串需要索引,可以使用逆序存储,如cat->tac, catgory->yrogtca  通过这种方式来有效使用索引

联合索引的场合

student表中创建 name_age 的联合索引

mysql> SHOW INDEX FROM student;
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| student |          0 | PRIMARY      |            1 | s_id        | A         |          10 |     NULL | NULL   |      | BTREE      |         |               |
| student |          1 | idx_info     |            1 | info        | A         |          10 |     NULL | NULL   |      | BTREE      |         |               |
| student |          1 | idx_name_age |            1 | name        | A         |           7 |     NULL | NULL   |      | BTREE      |         |               |
| student |          1 | idx_name_age |            2 | age         | A         |           9 |     NULL | NULL   |      | BTREE      |         |               |
+---------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

二级索引树的构成

Record 126: (name="aaa", age=12) → (s_id=1)
Record 140: (name="abc", age=11) → (s_id=4)
Record 154: (name="abf", age=12) → (s_id=5)
Record 168: (name="akg", age=22) → (s_id=7)
Record 182: (name="bbb", age=25) → (s_id=2)
Record 196: (name="bbc", age=23) → (s_id=6)
Record 210: (name="ccc", age=11) → (s_id=8)
Record 224: (name="ccc", age=19) → (s_id=3)
Record 238: (name="ccc", age=22) → (s_id=9)

可以看到,先将name排序,在相同的name中,再将age排序。
精准匹配name和age的结果是不同的,后者无法走索引

mysql> EXPLAIN SELECT * FROM student WHERE `name`='abc';
+----+-------------+---------+------------+------+---------------+--------------+---------+-------+------+----------+-------+
| id | select_type | table   | partitions | type | possible_keys | key          | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+--------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | student | NULL       | ref  | idx_name_age  | idx_name_age | 42      | const |    1 |   100.00 | NULL  |
+----+-------------+---------+------------+------+---------------+--------------+---------+-------+------+----------+-------+

mysql> EXPLAIN SELECT * FROM student WHERE `age` = 12;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 |    10.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

在搭配name后,age也能走上索引了

mysql> EXPLAIN SELECT `name`, `age` FROM student WHERE `name`='ccc' AND `age`=11;
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_name_age  | idx_name_age | 42      | NULL |    1 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+

范围匹配中,name的范围可以走到索引,但其后的所有键都没办法,因为在name的范围内后续的键都是局部有序整体无序的。只有前面的键为精准匹配,后续的范围匹配才能使用索引
第一个字段范围匹配,扫了7行

mysql> EXPLAIN SELECT `name`, `age` FROM student WHERE `name` > 'abc';
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_name_age  | idx_name_age | 42      | NULL |    7 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+

第一个字段+第二个字段一起匹配,依旧是扫7行,通过覆盖率也可以看出,是在第一个字段范围内,一个一个判断是否满足第二个字段的条件,并非索引模式

mysql> EXPLAIN SELECT `name`, `age` FROM student WHERE `name` > 'abc' AND `age` > 12;
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_name_age  | idx_name_age | 42      | NULL |    7 |    33.33 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+

当第一个字段使用精准匹配后,第二个字段再使用范围匹配也能走上索引了

mysql> EXPLAIN SELECT `name`, `age` FROM student WHERE `name` = 'ccc' AND `age` > 12;
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_name_age  | idx_name_age | 43      | NULL |    2 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+--------------------------+

ORDER BY与GROUP BY,结合索引B+树的特性,也可以看出,满足左边键精准的前提下,后续键才能走到索引
虽然设计的表不太好体现,但从结构分析,下面这种写法显然能对age使用索引快速排序

mysql> EXPLAIN SELECT `name`, `age` FROM student WHERE `name` = 'ccc' ORDER BY age ASC;
+----+-------------+---------+------------+------+---------------+--------------+---------+-------+------+----------+--------------------------+
| id | select_type | table   | partitions | type | possible_keys | key          | key_len | ref   | rows | filtered | Extra                    |
+----+-------------+---------+------------+------+---------------+--------------+---------+-------+------+----------+--------------------------+
|  1 | SIMPLE      | student | NULL       | ref  | idx_name_age  | idx_name_age | 42      | const |    3 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+------+---------------+--------------+---------+-------+------+----------+--------------------------+

但这种写法就无法使用索引对age排序,因为name是一个范围

mysql> EXPLAIN SELECT `name`, `age` FROM student WHERE `name` > 'aaa' ORDER BY age ASC;
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+------------------------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra                                    |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+------------------------------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_name_age  | idx_name_age | 42      | NULL |    8 |   100.00 | Using where; Using index; Using filesort |
+----+-------------+---------+------------+-------+---------------+--------------+---------+------+------+----------+------------------------------------------+

回表问题

当查询的数据在索引表中都存在时,只需按索引键查出结果即可,但二级索引只保存的主键和索引键数据,所以需要进行回表操作
只需idx_info就能完成查询

mysql> EXPLAIN SELECT `s_id`, `info` FROM student WHERE `info` LIKE 'c%';
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_info      | idx_info | 802     | NULL |    2 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+--------------------------+

explain分析的结果的是一样的,但当数据量特别大导致大量回表情况时,优化器会采用扫全表而非 二级索引+回表的方式。

mysql> EXPLAIN SELECT `s_id`, `info`, 'age' FROM student WHERE `info` LIKE 'c%';
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | student | NULL       | range | idx_info      | idx_info | 802     | NULL |    2 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+--------------------------+

索引列在比较表达式中单独显现

比较表达式中对索引列进行了比较以外的操作(数学运算,函数等),会导致索引失效

mysql> EXPLAIN SELECT * FROM student WHERE s_id*2 > 4;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   10 |   100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+

操作非索引字段能使索引生效

mysql> EXPLAIN SELECT * FROM student WHERE s_id > 4/2;
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |    7 |   100.00 | Using where |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

总结

索引结构的核心就是B+树,一切逻辑都围绕该结构展开,在使用中应该注意的点

  1. 主键值设计尽可能小,降低索引表的空间占用并提高查询效率
  2. 由于二级索引会有回表的操作,且联合索引的限制很多,所以在设计时要结合业务实际,争取达到不回表的效果
  3. 无论是联合索引还是模糊匹配,都存在最左优先原则,先全值匹配左侧才有可能使右边的使用索引
  4. 避免在比较表达式中对索引列进行比较之外的操作

参考

掘金小册子,从根儿上理解MySQL: https://juejin.cn/book/6844733769996304392?enter_from=course_center&utm_source=course_center
innodb树结构解析与分析工具: https://blog.jcole.us/innodb/

posted @ 2024-12-23 20:58  马猴♂兄贵  阅读(61)  评论(0)    收藏  举报