1 结论
select ... from a where a.id in (select b.id from b);
这种SQL,如果要改成表关联,不是select distinct .... from a,b where a.id=b.id; 两者是不等价的(除非b.id 是主键或唯一值)。
等价改写是: select .... from a,(select distinct b.id from b) bb where a.id=bb.id; 注意distinct的位置。
2 测试表与数据
2.1 左表
mydb=# create table tb1(id int primary key,name text); CREATE TABLE mydb=# insert into tb1 values(1,'a1'),(2,'a2'),(3,'a3'),(4,'a4'); INSERT 0 4 mydb=# select * from tb1; id | name ----+------ 1 | a1 2 | a2 3 | a3 4 | a4
2.2 右表
mydb=# create table tb2(id int ,name text, sales int); CREATE TABLE mydb=# insert into tb2 values(1,'a1',100),(1,'a1',200),(3,'a3',100),(2,'a2',21),(2,'a2',44); INSERT 0 5 mydb=# select * from tb2; id | name | sales ----+------+------- 1 | a1 | 100 1 | a1 | 200 3 | a3 | 100 2 | a2 | 21 2 | a2 | 44 (5 rows)
3 测试过程
3.1 原始SQL
mydb=# select * from tb1 a where a.id in (select id from tb2); id | name ----+------ 1 | a1 2 | a2 3 | a3 (3 rows)
3.2 错误写法
mydb=# select a.* from tb1 a join tb2 b on a.id = b.id; id | name ----+------ 1 | a1 1 | a1 2 | a2 2 | a2 3 | a3 (5 rows)
3.3 正确写法
mydb=# select a.* from tb1 a join (select distinct id from tb2) b on a.id = b.id; id | name ----+------ 1 | a1 2 | a2 3 | a3 (3 rows)
4 IN子查询优化汇总
场景 | 错误写法 | 正确写法 | 执行计划提升 |
---|---|---|---|
基础IN | SELECT a.* FROM a JOIN b ON a.id=b.id |
SELECT a.* FROM a JOIN (SELECT DISTINCT id FROM b) b ON a.id=b.id |
避免重复扫描,减少Hash Join内存压力 |
带条件的IN | SELECT a.* FROM a JOIN b ON a.id=b.id AND b.val='X' |
SELECT a.* FROM a JOIN (SELECT DISTINCT id FROM b WHERE val='X') b ON a.id=b.id |
提前过滤数据,减少关联量 |
多列IN | SELECT ... FROM a,b WHERE a.id=b.id AND a.type=b.type |
SELECT ... FROM a JOIN (SELECT DISTINCT id, type FROM b) b USING(id, type) |
复合键去重,避免Cartesian乘积风险 |