开发随笔——NOT IN vs NOT EXISTS
之前在论坛中见到一个针对in/exists的讨论。原帖懒得找了,这里介绍一下近期的学习小结:
NOT IN和NOT EIXTS在对同意为null的列查询时会有一定的风险。特别是NOT IN,假设子查询包括了最少一个NULL,会出现非预期的结果。以下做一个演示。
IF OBJECT_ID('ShipmentItems', 'U') IS NOT NULL
DROP TABLE dbo.ShipmentItems;
GO
CREATE TABLE dbo.ShipmentItems
(
ShipmentBarcode VARCHAR(30) NOT NULL ,
Description VARCHAR(100) NULL ,
Barcode VARCHAR(30) NOT NULL
);
GO
INSERT INTO dbo.ShipmentItems
( ShipmentBarcode ,
Barcode ,
Description
)
SELECT '123456' ,
'1010203' ,
'Some cool widget'
UNION ALL
SELECT '123654' ,
'1010203' ,
'Some cool widget'
UNION ALL
SELECT '123654' ,
'1010204' ,
'Some cool stuff for some gadget';
GO
-- retrieve all the items from shipment 123654
-- that are not shipped in shipment 123456
SELECT Barcode
FROM dbo.ShipmentItems
WHERE ShipmentBarcode = '123654'
AND Barcode NOT IN ( SELECT Barcode
FROM dbo.ShipmentItems
WHERE ShipmentBarcode = '123456' );
/*
Barcode
------------------------------
1010204
*/
能够看出得到了期待结果。以下看看改动表结构,同意列为null的情况:
ALTER TABLE dbo.ShipmentItems
ALTER COLUMN Barcode VARCHAR(30) NULL;
INSERT INTO dbo.ShipmentItems
( ShipmentBarcode ,
Barcode ,
Description
)
SELECT '123456' ,
NULL ,
'Users manual for some gadget';
GO
SELECT Barcode
FROM dbo.ShipmentItems
WHERE ShipmentBarcode = '123654'
AND Barcode NOT IN ( SELECT Barcode
FROM dbo.ShipmentItems
WHERE ShipmentBarcode = '123456' );
/*
Barcode
------------------------------
*/非常多人会认为这是一个bug,有时候能查出数据。有时候却不能。可是实际上不是bug,当NOT IN子句返回最少一个NULL时,查询会返回空。以下的语句能更好地说明这个想法:
SELECT CASE WHEN 1 NOT IN ( 2, 3 ) THEN 'True'
ELSE 'Unknown or False'
END ,
CASE WHEN 1 NOT IN ( 2, 3, NULL ) THEN 'True'
ELSE 'Unknown or False'
END;
/*
---- ----------------
True Unknown or False
*/实际上,因为IN的本质是OR操作,所以:
SELECT CASE WHEN 1 IN ( 1, 2, NULL ) THEN 'True'
ELSE 'Unknown or False'
END ;中,1 in 1。也就是为TRUE,所以返回true。这个语句的逻辑实际上是:
SELECT CASE WHEN ( 1 = 1 )
OR ( 1 = 2 )
OR ( 1 = NULL ) THEN 'True'
ELSE 'Unknown or False'
END ;
当使用NOT IN 时,如以下的语句:
SELECT CASE WHEN 1 NOT IN ( 1, 2, NULL ) THEN 'True'
ELSE 'Unknown or False'
END ;会转变成:
SELECT CASE WHEN NOT ( ( 1 = 1 )
OR ( 1 = 2 )
OR ( 1 = NULL )
) THEN 'True'
ELSE 'Unknown or False' END ;依据离散数学的概念,能够转换为:
SELECT CASE WHEN ( ( 1 <> 1 )
AND ( 1 <> 2 )
AND ( 1 <> NULL )
) THEN 'True'
ELSE 'Unknown or False'
END ;谓词有短路特性,即在AND条件中。仅仅要有一个条件为false,整个条件都为false,而1<>1是为false,所以后面的也不须要推断了,直接返回else部分。即使是1<>null,依据集合论的特性,NULL和实际数据的对照总是返回unknown,所以也是为false。假设你非要用NOT IN 。请确保子查询永远不会有NULL返回。或者须要额外处理去除NULL,比方:
SELECT Barcode
FROM dbo.ShipmentItems
WHERE ShipmentBarcode = '123654'
AND Barcode NOT IN ( SELECT Barcode
FROM dbo.ShipmentItems
WHERE ShipmentBarcode = '123456'
AND Barcode IS NOT NULL ) ;另一种方法就是改写语句,用NOT EXISTS来等价替换:
SELECT i.Barcode
FROM dbo.ShipmentItems AS i
WHERE i.ShipmentBarcode = '123654'
AND NOT EXISTS ( SELECT *
FROM dbo.ShipmentItems AS i1
WHERE i1.ShipmentBarcode = '123456'
AND i1.Barcode = i.Barcode );
/*
Barcode
------------------------------
1010204
*/另外,基于SARG要求,一般不建议用NOT IN/NOT EXISTS这样的反向扫描,避免影响性能。另一个选择使用IN/EXISTS的要点。就是多列匹配的问题,在T-SQL中,多列同一时候匹配要用EXISTS,而单列匹配能够用EXISTS/IN。可能能够用其它写法来实现IN的多列匹配,可是一般我个人会选择使用EXISTS来匹配多列。
原文出自:CSDN博客:黄钊吉的博客

浙公网安备 33010602011771号