1 结论

select ... from a where a.id in (select b.id from b);

这种SQL,如果要改成表关联,不是select distinct .... from a,b where a.id=b.id; 两者是不等价的(除非b.id 是主键或唯一值)。

等价改写是: select .... from a,(select distinct b.id from b) bb where a.id=bb.id;  注意distinct的位置。

2 测试表与数据

2.1 左表

mydb=# create table tb1(id int primary key,name text);
CREATE TABLE
mydb=# insert into tb1 values(1,'a1'),(2,'a2'),(3,'a3'),(4,'a4');
INSERT 0 4
mydb=# select * from tb1;
 id | name 
----+------
  1 | a1
  2 | a2
  3 | a3
  4 | a4

2.2 右表

mydb=# create table tb2(id int ,name text, sales int);
CREATE TABLE
mydb=# insert into tb2 values(1,'a1',100),(1,'a1',200),(3,'a3',100),(2,'a2',21),(2,'a2',44);
INSERT 0 5
mydb=# select * from tb2;
 id | name | sales 
----+------+-------
  1 | a1   |   100
  1 | a1   |   200
  3 | a3   |   100
  2 | a2   |    21
  2 | a2   |    44
(5 rows)

3 测试过程

3.1 原始SQL

mydb=# select * from tb1 a where a.id in (select id from tb2);
 id | name 
----+------
  1 | a1
  2 | a2
  3 | a3
(3 rows)

3.2 错误写法

mydb=# select a.* from tb1 a join tb2 b on a.id = b.id;
 id | name 
----+------
  1 | a1
  1 | a1
  2 | a2
  2 | a2
  3 | a3
(5 rows)

3.3 正确写法

mydb=# select  a.* from tb1 a join (select distinct id from tb2) b on a.id = b.id;
 id | name 
----+------
  1 | a1
  2 | a2
  3 | a3
(3 rows)

4 IN子查询优化汇总

场景错误写法正确写法执行计划提升
基础IN SELECT a.* FROM a JOIN b ON a.id=b.id SELECT a.* FROM a JOIN (SELECT DISTINCT id FROM b) b ON a.id=b.id 避免重复扫描,减少Hash Join内存压力
带条件的IN SELECT a.* FROM a JOIN b ON a.id=b.id AND b.val='X' SELECT a.* FROM a JOIN (SELECT DISTINCT id FROM b WHERE val='X') b ON a.id=b.id 提前过滤数据,减少关联量
多列IN SELECT ... FROM a,b WHERE a.id=b.id AND a.type=b.type SELECT ... FROM a JOIN (SELECT DISTINCT id, type FROM b) b USING(id, type) 复合键去重,避免Cartesian乘积风险

 

 

 posted on 2025-07-18 15:55  xibuhaohao  阅读(6)  评论(0)    收藏  举报