MySQL查询优化器新特性介绍

1.BNL(Block Nested-Loop)
2.BKA(Batch Key Access)
3.MRR(Multi Range Read)
4.ICP(Index Condition push)

序言：

我们使用MySQL，查询性能是一个很重要的指标，对于查询优化器如何使用索引？实际中用单表查询、多表join。对于单表查询好优化，多表join，从最原始的NL算法，到BNL(NL的升级版)，再到BKA(也是NL的升级版)。我们深入理解查询优化器的发展历程及新特性，下面就MySQL查询优化器的新特性着重讲解。

MySQL 5.6 开始引入许多新特性，如ICP、BKA、MRR等。

1.BNL(Block Nested-Loop)

优化对象：可以优化被join的表是ALL/index(全索引扫描)/Range。

BNL: 从MySQL 5.1就引入，到了5.6 由只支持join方式扩展到支持“外连接”、“半连接”，目的是利用join buffer减少内循环扫描的次数。

什么情况下可能用到BNL? 目前执行计划里type是：ALL/index/range 才可能使用 BNL算法。

怎么看查询优化器使用了BNL算法，而不是NL算法：

In EXPLAIN output, use of BNL for a table is signified when the Extra value contains Using join buffer (Block Nested Loop) and the type value is ALL, index, or range.

很明确的看的出来，在执行计划的Extra 列是：Using join buffer (Block Nested Loop)，就说明用了BNL。

下面是BNL的例子：

mysql> desc SELECT a.phone,a.loan_amount,a.duration FROM `xxx_order` as a  straight_join(select order_id from xxx_order_xxx where order_id in('0000003fb42b4e8fa54335b7dcbd63e2','0000004f74b647cdab9a1d9b8163b01e','0000016b22e04a30bc2ae5ea9d49282a') ) as b where a.id=b.order_id\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 34497996
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where; Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: xxx_order_xxx
         type: range
possible_keys: order_id
          key: order_id
      key_len: 96
          ref: NULL
         rows: 3
        Extra: Using where; Using index

mysql> DESC SELECT a.phone,a.loan_amount,a.duration FROM `xxx_order` as a straight_join xxx_order_xxx as b force index(xxx_order_xxx_2fb72c6e) where b.coupon_id in(184506,184508,184509) and a.id=b.order_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 34498271
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: range
possible_keys: xxx_order_xxx_2fb72c6e
          key: xxx_order_xxx_2fb72c6e
      key_len: 5
          ref: NULL
         rows: 3
        Extra: Using index condition; Using where; Using join buffer (Block Nested Loop)

-- 下面是LEFT OUTER JOIN:
mysql> DESC SELECT a.phone,a.loan_amount,a.duration FROM `xxx_order` as a LEFT OUTER JOIN (select order_id from xxx_order_xxx where order_id in('0000003fb42b4e8fa54335b7dcbd63e2','0000004f74b647cdab9a1d9b8163b01e','0000016b22e04a30bc2ae5ea9d49282a') ) as b ON a.id=b.order_id\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 34498410
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3
        Extra: Using where; Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: xxx_order_xxx
         type: range
possible_keys: order_id
          key: order_id
      key_len: 96
          ref: NULL
         rows: 3
        Extra: Using where; Using index

-- 下面是semi-join(自连接):
mysql> DESC SELECT a.phone,a.loan_amount,a.duration,b.duration FROM xxx_order as a,xxx_order as b  where a.loan_amount=b.loan_amount\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 34498593
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 34498593
        Extra: Using where; Using join buffer (Block Nested Loop)

下面NL、BNL算法伪代码来自互连接网：

Nested-Loop(NL) 的伪算法如下:
for each row in t1 matching range {
  for each row in t2 matching reference key {
     for each row in t3 {
      if row satisfies join conditions,
      send to client
    }
  }
 }
 
Block Nested-Loop(BNL) Join算法如下：
for each row in t1 matching range {
   for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
         for each t1, t2 combination in join buffer {
          if row satisfies join conditions,
          send to client
        }
       }
      empty buffer
    }
  }
}
 
if buffer is not empty {
   for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions,
      send to client
     }
  }
}

如何更好使用优化后的BNL算法：

mysql> show variables like 'optimizer_switch';

| optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=

on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,looses

can=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on |

把 block_nested_loop=on 即可。

如何更好的使用到BNL：

join_buffer_size 能一次容纳外层所有的元组(col1,col2,........) ，适当增大join_buffer_size，但是join_buffer_size>＝所有元组size，再增加join_buffer_size不会再提升查询速度。

2.BKA(Batch Key Access)

优化对象：

被join的表有索引(type: ref/eq_ref 可以用到BKA)，与BNL 的区别:被join的表无索引或者走的全索引才使用使用BNL，否则可能使用BKA。

什么情况下可能用到BKA?

目前被join的表执行计划里type是：ref、eq_ref 才可能使用BKA。

Extra: Using join buffer (Batched Key Access) 此时用的BKA，而不是BNL。

打开BKA(因BKA下层要用MRR，所以MRR也要同时打开)：

mysql> set optimizer_switch="mrr=on,mrr_cost_based=off,batched_key_access=on";

下面是使用BKA的sql例子：

mysql> DESC SELECT a.phone,a.loan_amount,a.duration FROM `xxx_order` as a LEFT OUTER JOIN xxx_order_xxx as b ON a.id=b.order_id where a.user_id>b.device_type\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: ALL
possible_keys: order_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 34305976
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: eq_ref
possible_keys: PRIMARY,xxx_order_user_id_482c26488354e9c3_fk_users_account_id
          key: PRIMARY
      key_len: 96
          ref: hydra.b.order_id
         rows: 1
        Extra: Using where

BKA 会调用MRR，BKA批量处理被join表的type是：ref、eq_ref ，这里回表读取批量的数据可能会有很多随机IO，BKA很聪明，这个时候利用MRR，通过索引列排序主键值，由随机回表IO转成顺序IO。

如何更好的使用BKA这一新特性：主要在MRR，所以适当增大参数 read_rnd_buffer_size 的值。

3.MRR(Multi Range Read)

优化对象：

使用到二级索引的范围Range查询、ref/eq_ref 使用BKA的时候也可能用到MRR。

下面就用这个sql来说明MRR：

select non_key_col1,non_key_col2,... from tb where key_col=x;

没有MRR之前二级索引的查询是这样的：(下面2个图截自互联网)

第一步 先根据where条件中的二级索引获取辅助索引与主键的集合，结果集为r。
      select key_col, pk_col from tx where key_col=x order by key_col;  --使用sort buffer
第二步 通过第一步获取的主键来获取对应的值。
      for each pk_col_value in r do:
       select non_key_col1,non_key_col2,... from tb where pk_col=pk_col_value;

有MRR之后二级索引的查询是这样的：

第一步 先根据where条件中的二级索引获取辅助索引与主键的集合，结果集为r1。
      select key_col, pk_col from tx where key_col=x order by key_col; 
第二步 将结果集r1放在buffer里面(直到read_rnd_buffer_size 满)，然后对结果集r1按照pk_col排序，得到结果集是r2     
第三步 利用已经排序过的结果集，访问表中的数据，此时是顺序IO.
      select non_key_col1,non_key_col2,... from tx where pk_col in (r2);

下面是使用MRR 的例子:

mysql> desc select * from xxx_order where created_time>'2017-6-22 10:00:00' and created_time<='2017-6-22 10:00:01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: xxx_order
         type: range
possible_keys: xxx_order_created_time_12bbdaa144debf81_uniq
          key: xxx_order_created_time_12bbdaa144debf81_uniq
      key_len: 8
          ref: NULL
         rows: 3
        Extra: Using index condition
--上面没有使用到MRR，把mrr=on,mrr_cost_based=off后使用到MRR，如下：
mysql> set optimizer_switch="index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=off,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on";
Query OK, 0 rows affected (0.00 sec)

mysql> desc select * from xxx_order where created_time>'2017-6-22 10:00:00' and created_time<='2017-6-22 10:00:01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: xxx_order
         type: range
possible_keys: xxx_order_created_time_12bbdaa144debf81_uniq
          key: xxx_order_created_time_12bbdaa144debf81_uniq
      key_len: 8
          ref: NULL
         rows: 3
        Extra: Using index condition

如何更好的使用MRR这一新特性：适当增大参数 read_rnd_buffer_size 的值。

4.ICP(Index Condition push)

优化的对象： type是ref、eq_ref、range、ref_or_null 且当需要读取full table rows，5.6只支持InnoDB、MyISA表，5.7开始支持分区表。

怎么看sql是否使用了ICP：执行计划Extra： Using index condition，说明使用到了ICP。

ICP的好处：减少引擎层访问基表的次数，减少Server层访问引擎层的次数。目的减少了内部IO的数量。

下面以sql：SELECT * FROM tx WHERE key_col='x' and no_key_col1>'x' and no_key_col2<'x' ... ; 来说明ICP特性。（列：key_col 上有索引）

常识：数据在引擎层，要传递给Server层。

没有ICP,优化器的算法是这样：

a)引擎层通过索引(这里是是key_col)回表方式读取包含所有列的一行。

b)引擎层读取到的行传给Server层，Server层通过where后面所有条件过滤，抛弃不满足条件的行。

c)重复上述a、b，直到所有行都过滤完毕。

使用ICP后,优化器的算法是这样：

a)引擎层读取下一行的索引项(这里是是key_col)，check在where中的使用到索引的列，再通过索引过滤，如果此索引项的值满足索引列的where条件，则使用该索引项回表方式读取整行数据。

b)引擎层读取到的行传给Server层，Server层再通过where后面剩下条件(这里是no_key_col1>'x' and no_key_col2<'x' ...)过滤，抛弃不满足条件的行。

c)重复上述a、b，直到所有行都过滤完毕。

下面是使用了ICP的sql例子：

mysql> DESC SELECT a.phone,a.loan_amount,a.duration FROM `xxx_order` as a where finish_repay_time>='2017-05-01 22:41:40.81816' and finish_repay_time<'2017-05-01 22:41:41.81816'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: range
possible_keys: xxx_order_finish_repay_time_896d8b11358405_uniq
          key: xxx_order_finish_repay_time_896d8b11358405_uniq
      key_len: 9
          ref: NULL
         rows: 1
        Extra: Using index condition

打开ICP：SET optimizer_switch = 'index_condition_pushdown=on';

总结：

1.BKA、MRR的关系如下图：

BKA下层是通过调用MRR，BKA利用索引批量循环，调用MRR目的是批量回表数据由随机IO转成顺序IO。

参考资料：

1.https://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html

2.https://dev.mysql.com/doc/refman/5.6/en/bnl-bka-optimization.html

3.https://dev.mysql.com/doc/refman/5.6/en/mrr-optimization.html

Kevin.hhl

1.BNL(Block Nested-Loop)

2.BKA(Batch Key Access)

3.MRR(Multi Range Read)

4.ICP(Index Condition push)

About