Mysql 系列 | join

join 也是平常开发中经常用到的，本篇了解一下 join 的执行过程。

t1、t2 两张表中有 a、b 字段，a 为索引。

ti 有 100 条数据，t2 有 1000 条数据。

Index Nested-Loop Join（索引嵌套查询）

select * from t1 straight_join t2 on (t1.a=t2.a);

t1 为驱动表，t2 为被驱动表。

mysql> explain select * from t1 straight_join t2 on (t1.a=t2.a) \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: a
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ref
possible_keys: a
          key: a
      key_len: 5
          ref: sunoct.t1.a
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set, 1 warning (0.01 sec)

explain 中可以看到，t2 表中用到了索引 a。

t1 要进行全表扫描，t2 走树搜索。

搜索过程中共扫描了 100 + 100 行数据

语句的执行过程：

从 t1 中读入一行数据 R
从数据行 R 中取出 a，到 t2 表中检索
取出 t2 中满足条件的行，和 R 组成一行，作为结果集的一部分
重复上面的步骤，t1 全表扫描，直到表中的最后一条数据

Simple Nested-Loop Join

select * from t1 straight_join t2 on (t1.a=t2.b);

t2 表中 b 字段没有索引，则每次用 t1.a 去 t2 表中检索时，都需要全表扫描 t2。
过程同上面的步骤，这个搜索过程中共扫描 100 + 100 * 1000 大概十万行。
实际上，mysql 优化器会自动优化，并不会使用这种笨办法，而是选用接下来提到的方案。

Block Nested-Loop Join

select * from t1 straight_join t2 on (t1.a=t2.b);

可以看到 Extra 中有 Using join buffer

mysql> explain select * from t1 straight_join t2 on (t1.a=t2.b) \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: a
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1000
     filtered: 10.00
        Extra: Using where; Using join buffer (hash join)
2 rows in set, 1 warning (0.00 sec)

此时，执行流程如下：
- t1 整张表数据读入内存 join_buffer，
- 扫描表 t2，把每一行取出来和 join_buffer 中的数据对比，
- 满足条件，则作为结果集的一部分返回
扫描行数为 100 + 1000 行
join_buffer 是无序组织，因此内存中判断次数为 100 * 1000 次。
比起 Simple Nested-Loop Join 的扫描十万行，在内存中操作速度快很多，性能也更好。
join_buffer 大小有限，由参数 join_buffer_size 设定，默认是 256k。如果表数据放不下，则会分段放。先放一部分，判断完后清空，再继续放下一部分。
join_buffer_size 越大，一次可以放入的行越多，分成的段数也就越少，对被驱动表的全表扫描次数就越少

能不能使用 join

如果可以使用 IndexNested-Loop Join 算法，可以用上被驱动表上的索引，则扫描行数不太多，占用资源也可以承受。
如果使用 Block Nested-Loop Join 算法，则扫描行数过多，尤其是大表，会占用大量的系统资源，此时尽量不要用。
通过 explain 中，Extra 中的内容判断是否要用 join。

大表驱动 or 小表驱动

如果是 IndexNested-Loop Join 算法，驱动表会全表扫描，所以应该选择小表做驱动表。
如果是 Block Nested-Loop Join 算法，
- join_buffer_size 足够大时，大小表都一样
- join_buffer_size 不够大时（这种情况更常见），应该选择小表做驱动表

如果语句中有 where 条件，

select * from t1 straight_join t2 on (t1.b=t2.b) where t2.id<=50;
select * from t2 straight_join t1 on (t1.b=t2.b) where t2.id<=50;

则 t2 只需把前 50 行放入 join_buffer 中，算是小表

如果查询的字段不同
```
select t1.b,t2.* from t1 straight_join t2 on (t1.b=t2.b) where t2.id<=100;
select t1.b,t2.* from t2 straight_join t1 on (t1.b=t2.b) where t2.id<=100;
```
- t1、t2 表都是查 100 条，但是 t1 只需要字段 b 的值，t2 则需要全部字段。
- 此时只需把 t1 的 b 字段放入 join_buffer 中，此时应该 t1 作为驱动表。

综上，join 时应该尽量用到索引。并选用小表作为驱动表。

posted @ 2022-08-24 11:34 菜乌阅读(128) 评论(0) 收藏举报

刷新页面返回顶部

路漫漫其修远

Mysql 系列 | join

Index Nested-Loop Join（索引嵌套查询）

Simple Nested-Loop Join

Block Nested-Loop Join

能不能使用 join

大表驱动 or 小表驱动

公告

路漫漫 其修远

Mysql 系列 | join

Index Nested-Loop Join（索引嵌套查询）

Simple Nested-Loop Join

Block Nested-Loop Join

能不能使用 join

大表驱动 or 小表驱动

公告

路漫漫其修远