semi-join子查询优化 -- Duplicate Weedout策略

2019-05-20 17:51 abce 阅读(1885) 评论(2) 收藏举报

duplicate weedout是执行semi-join子查询的一种策略。

将semi-join作为一个常规的inner join。然后使用一个临时表，将重复的记录排除。

假设，你有一个查询，你在寻找一个大城市人口占总人口33%以上的国家:

select * 
from Country 
where 
   Country.code IN (select City.Country
                    from City 
                    where 
                      City.Population > 0.33 * Country.Population and 
                      City.Population > 1*1000*1000);

如果我们以常规的inner join方式连接表city和country：

inner join会产生重复的记录。这里Germany有三行记录，因为有三个大城市。现在我们将duplicate weedout加入图示中：

这里是用了一个带有主键的临时表，来避免产生多行记录。(Germany有三条记录)

下面是使用duplicate weedout后的执行计划，其中start temporary和end temporary表明使用了临时表：

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 0.33 * Country.Population and City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: range
possible_keys: Population,Country
          key: Population
      key_len: 4
          ref: NULL
         rows: 238
        Extra: Using index condition; Start temporary
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: Country
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: world.City.Country
         rows: 1
        Extra: Using where; End temporary
2 rows in set (0.00 sec)

这个查询会读取city表中的238行记录，而且，它们中的每一个都将在Country表中进行主键查找，这将提供另外238行记录。总共就是476行，需要在临时表中增加238个查找(因为临时表是in-memory的，相对成本较低)。

相同的SQL，如果不适用duplicate weedout，执行计划为：

mysql> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 0.33 * Country.Population and City.Population > 1*1000*1000)
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: Country
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: City
         type: index_subquery
possible_keys: Population,Country
          key: Country
      key_len: 3
          ref: func
         rows: 18
        Extra: Using where
2 rows in set (0.00 sec)

读取的行数是(239+239*18)=4541,就慢多了。

原文地址：

https://mariadb.com/kb/en/library/duplicateweedout-strategy/

刷新页面返回顶部

abce don't dream it , be it

semi-join子查询优化 -- Duplicate Weedout策略

About