MySQL案例:全文索引浅析
前言
所谓全文索引,就是一种通过建立倒排索引,快速匹配文档内容的方式。和B+树索引一样,倒排索引也是一种索引结构,一个倒排索引是由文档中所有不重复的分词和其所在文档的映射组成。倒排索引一般有两种不同的结构,一种是inverted file index,另一种是full inverted index。
(1)inverted file index,里面存储的映射关系是{分词,(该分词所在的文档ID)}
Number |
Text |
Documents |
---|---|---|
1 |
how |
(1,3) |
2 |
are |
(1,3) |
3 |
you |
(1,3) |
4 |
fine |
(2,4) |
5 |
thanks |
(2,4) |
(2)full inverted index,里面存储的映射关系是{分词,(该分词所在的文档ID:在文档中的具体位置)}
Number |
Text |
Documents |
---|---|---|
1 |
how |
(1:1),(3:1) |
2 |
are |
(1:2),(3:2) |
3 |
you |
(1:3),(3:3) |
4 |
fine |
(2:1),(4:1) |
5 |
thanks |
(2:2),(4:2) |
实现原理
辅助表
在MySQL InnoDB中,当一个全文索引被创建时,一系列的辅助表会被同时创建,用于存储倒排索引的信息。
mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200),
FULLTEXT idx (opening_line)
) ENGINE=InnoDB;
mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES
WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name | space |
+----------+----------------------------------------------------+-------+
| 333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 | 289 |
| 334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 | 290 |
| 335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 | 291 |
| 336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 | 292 |
| 337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 | 293 |
| 338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 | 294 |
| 330 | test/FTS_0000000000000147_BEING_DELETED | 286 |
| 331 | test/FTS_0000000000000147_BEING_DELETED_CACHE | 287 |
| 332 | test/FTS_0000000000000147_CONFIG | 288 |
| 328 | test/FTS_0000000000000147_DELETED | 284 |
| 329 | test/FTS_0000000000000147_DELETED_CACHE | 285 |
| 327 | test/opening_lines | 283 |
+----------+--------------------------------------