【译】什么情况下应该分解复杂的查询来提升性能

简介

    Microsoft SQL Server在大多数情况下都能够创建非常高效的执行计划。但是在某些特定的查询类型会给查询优化器带来麻烦。本篇文章阐述了其中4种情况,这些情况通常包含了需要SQL Server传递多个数据集、产生没有统计信息的中间结果集,抑或是中间结果集的统计信息无法被正确计算。

    将单独的查询语句分解为多个语句或是多个步骤,来使得SQl Server生成完全不同的执行计划,或是为中间结果集创建统计信息。分解语句,而不是使用提示的好处是使得SQL Server可以根据数据的变化,动态的生成高效的执行计划。

     虽然本篇文章所探讨的查询类型大多是基于ETL或是报表,但同样也可以在其他类型的查询中找到。

     本篇文章主要阐述了下述4中查询模式:

 

Where字句中包含了OR逻辑

    在这种情况下,在OR或Join操作符两端的条件涉及了不同的表。该类语句可以通过使用UNION操作符来替代WHERE或ON子句中的OR操作符。

 

中间解决集存在聚合

    该类查询是与聚合后的数据集进行连接,这往往会导致低效的查询性能。该类问题可以通过将聚合后的查询结果存入临时表来解决。

 

大量复杂的Join

    该类查询模式包含了大量连接,尤其是连接条件是不等连接,由于统计信息随着表连接的增多精度逐渐下降,这会导致低效的查询性能。解决这类情况可以通过分解查询,并将中间解决存入临时表解决。

 

在Where或Join中包含了Case子句

    该类查询在WHERE或JOIN子句中包含了CASE子句,这会导致难以有效的利用统计信息,从而造成查询性能低效。解决该类问题可以使用T-SQL的IF语句来替代CASE子句将语句执行流程提炼出来。

 

    理解上述4种查询类型的概念可以更好的帮助解决其他类似情况导致的低效查询。因此可以重写查询来获得更高效稳定的查询性能。

 

 

 

下列使用OR子句的例子不会引起性能问题.

WHERE a.col1 = @val1 OR a.col1 = @val2 …           

    这种查询模式仅仅是WHERE COL1 IN (@val1,@val2)的另一种写法。使用这种模式不会产生低效的查询计划。但是,值得注意的是WHERE子句中的每一个值都会使得SQL Server从B-Tree的根节点找到叶子节点。

    该查询模式的关键是OR操作符两边的列都会被计算出值。在上述例子中,该列是col1。如果col1上存在索引,则在该索引上执行查找操作就可以满足所有条件。如果索引足够小,甚至可能出现索引扫描。同样的指导原则适用于覆盖索引和其他索引。

 

WHERE a.col1 = @val1 OR a.col2 = @val2 … 

    在这种查询模式中,同一个表的两列中的值需要被估算出来,如果在这两列上都存在索引,其中一个索引的第一列是COL1,另一个索引的第一列是COL2,如果数据量大,则SQL Server使用索引来获取数据(如图1所示)方式为UNION两个索引查找的值,而如果数据量比较小,则SQL Server会使用扫描。

1

图1.

 

WHERE a.col1 = @val1 OR a.col2 IN (SELECT col2 FROM tab2)

    通过分析该模式可以看出,SQL Server重写了IN子句,将IN变为和表2进行了连接,并使用了不同的索引来获取满足条件的行。并且在最后对结果集进行排序来消除重复行(如图2)。这同样也是高效的执行计划,对该类语句应用本文所探讨的技术并不会产生更好的执行计划。

2

图2.

 

会引起问题的OR用法

    在上述例子中,OR操作符两边的条件都是在同一个表之内,因此,执行计划不会出现问题。但如果,OR两端的操作符引用了不同的表时,则会出现问题。

 

WHERE a.col1 = @val1 OR b.col2 = @val2

    上述查询所代表的查询模式是会导致问题的。将该查询分解为两步可以生成成本更小的执行计划。请注意在该查询模式中涉及到了两个表。在OR操作符两端的两个表的条件中涉及到不同的两个表。

注意:为了简便起见,例子中仅仅使用了等号作为条件。然而,即使使用了不等式或诸如BETWEEN的变量作为条件也可以出现相同的情况。

 

    下述是附件1中所描述的会导致问题的示例查询。

DECLARE @minEmp INT
 
DECLARE @maxEmp INT
 
SET @minEmp= 100
 
SET @maxEmp= 200
 
 
 
SELECT e.* FROM HumanResources.Employee e
 
LEFT JOIN Adventureworks.Person.Contact c ON e.EmployeeID=c.ContactID
 
WHERE e.EmployeeID BETWEEN @minEmp and @maxEmp
 
      OR c.EmailAddress IN('sabria0@adventure-works.com','teresa0@adventure-works.com','shaun0@adventure-works.com')
 
 

代码清单1.会导致问题的示例查询

 

    现在可以看到OR两端的条件涉及到了HumanResources.Employee以及Person.Contact表。现在我们来看在 Person.Contacts上存在的索引。

 

表1.AdventureWork

索引名称 索引描述 索引键
AK_Contact_rowguid 非聚集,唯一 rowguid
IX_Contact_EmailAddress 非聚集 EmailAddress
PK_Contact_ContactID 非聚集,唯一,主键 ContactID

 

    在Person.Contact(EmailAddress)上存在的索引可以被用于获取查询条件中满足邮件地址的行。在Employee表上的主键和聚集索引是EmployeeID。按理来说SQL Server应该从两个表上的索引通过低成本的索引查找获得。然而,执行查询所获得的执行计划却显示了对两个索引使用了索引扫描,如图4所示。

4

图4.

 

    此时如果将STATISTIC IO和STATISTIC TIME启用的话,结果如下:

Table 'Contact'. Scan count 1, logical reads 569, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Employee'. Scan count 1, logical reads 6963, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

代码清单2.

 

    结果显示了总共7532个逻辑读以及94ms的CPU时间。该计划或许在小系统且执行不频繁时不会引起注意,然而,如果表中数据增长到一定数量,且查询的复杂度增长时,该类查询将会成为将系统资源损耗殆尽的罪魁祸首。

    为了降低查询成本,可以将该类查询分为更容易让SQL Server解释的多个小查询。您可以重写查询,将Where子句中的条件进行分解,使得单个查询条件只涉及一个表。

    对于如例子所示的示例,可以很容易看出两个表进行UNION操作可以替代上述出问题的查询。如下面代码所示:

DECLARE @minEmp INT

DECLARE @maxEmp INT

SET @minEmp = 100

SET @maxEmp = 200





SELECT e.*FROM HumanResources.Employee e

LEFT JOIN Adventureworks.Person.Contact c ON e.EmployeeID = c.ContactID

WHERE EmployeeID BETWEEN @minEmp and @maxEmp

UNION

SELECT e.*FROM HumanResources.Employee e

LEFT JOIN Adventureworks.Person.Contact c ON e.EmployeeID = c.ContactID

WHERE

c.EmailAddress in('sabria0@adventure-works.com','teresa0@adventure-works.com','shaun0@adventure-works.com')

代码清单3.

 

    虽然功能上代码清单3和代码清单1一致,但SQL Server对于这两种查询语句的处理方式则完全不同。在该查询计划中,使用UNION来代替OR。使用UNION之后,SQL Server可以通过索引扫描查找,而不再是索引扫描,因此降低了成本,如图5所示。

5

图5.

 

    结果如代码清单4:

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Employee'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Contact'. Scan count 3, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

代码清单4.

 

    因此可以很容易看出,逻辑读下降到26,CPU时间低于1MS。

 

查询反模式2:由聚合产生的中间结果集

    诸如GROUP BY或DISTINCT之类的关键字会产生与原表行数不同的中间结果集。如果这些中间结果集还与其他表或数据集进行连接,则中间结果集上不会存在统计信息。因此SQL Server只能通过原数据集的统计信息估算基数,但是这种方式的精度随着数据集的增多和查询复杂度的增加变得越来越不尽人意。任何不准确的基数预估都可能导致低效的查询计划。

    如果不准确的基数预估是源于过期的统计信息,则更新表或索引的统计信息就能够产生更高效的查询计划。但是,如果查询语句需要生成中间结果集以供后续步骤使用的话,这种中间结果集上不会存在统计信息,虽然可以从原数据集中获取统计信息,但这种方式很难生成高效的执行计划。本文的附件C和附件D的例子就是这类查询无法使用统计信息而导致的低效执行计划。在生产系统中,该类查询往往出现在ETL或报表作业中。值得注意的是如需做出该类DEMO则需要足够量的数据。代码清单5包含了两个可以作为DEMO来演示该情况和解决方案的查询。

    下述步骤可以帮您准备好演示。

1.执行下述代码

/*

Do this all in tempdb to get the auto-cleanup after restart if all is forgotten

*/



USE tempdb

GO

SET NOCOUNT ON

SET STATISTICS IO OFF

SET STATISTICS TIME OFF

GO



/*

Cleanup so multiple runs can be done if necessary

*/



IF OBJECT_ID('dbo.Customers') IS NOT NULL

DROP TABLE dbo.Customers

IF OBJECT_ID('dbo.InternetOrders') IS NOT NULL

DROP TABLE dbo.InternetOrders

IF OBJECT_ID('dbo.StoreOrders') IS NOT NULL

DROP TABLE dbo.StoreOrders

IF OBJECT_ID('dbo.InternetQuotes') IS NOT NULL

DROP TABLE dbo.InternetQuotes

IF OBJECT_ID('dbo.StoreQuotes') IS NOT NULL

DROP TABLE dbo.StoreQuotes

IF OBJECT_ID('dbo.SurveyResults') IS NOT NULL

DROP TABLE dbo.SurveyResults

IF OBJECT_ID('dbo.SurveyDetails') IS NOT NULL

DROP TABLE dbo.SurveyDetails

IF OBJECT_ID('dbo.TransactionType3') IS NOT NULL

DROP TABLE dbo.TransactionType3

IF OBJECT_ID('dbo.TransactionType4') IS NOT NULL

DROP TABLE dbo.TransactionType4

IF OBJECT_ID('dbo.TransactionType5') IS NOT NULL

DROP TABLE dbo.TransactionType5

IF OBJECT_ID('dbo.TransactionType6') IS NOT NULL

DROP TABLE dbo.TransactionType6



/*

create tables for customers, internet orders, and store orders

*/



CREATE TABLE dbo.Customers

(

customerID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, customerName VARCHAR(30)NOT NULL

, otherStuff NCHAR(100) NULL

)

GO



CREATE TABLE dbo.InternetOrders

(

customerID INT NOT NULL

, orderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX InternetOrders_customerID on InternetOrders(customerID) INCLUDE(orderTotal)

CREATE INDEX InternetOrders_OrderDate ON dbo.InternetOrders(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE storeOrders

(

customerID INT NOT NULL

, storeOrderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX storeOrders_customerID ON storeOrders(customerID) INCLUDE(orderTotal)

CREATE INDEX StoreOrders_OrderDate ON dbo.StoreOrders(orderDate) INCLUDE(CustomerID, orderTotal)

GO



CREATE TABLE dbo.InternetQuotes

(

customerID INT NOT NULL

, quoteID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, quoteTotal MONEY NOT NULL

, quoteDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX InternetQuotes_customerID on InternetQuotes(customerID) INCLUDE(quoteTotal)

CREATE INDEX Internetquotes_OrderDate ON dbo.InternetQuotes(quoteDate) INCLUDE(CustomerID, quoteTotal)



GO



CREATE TABLE dbo.StoreQuotes

(

customerID INT NOT NULL

, storeQuoteID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, quoteTotal MONEY NOT NULL

, quoteDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX StoreQuotes_customerID on StoreQuotes(customerID) INCLUDE(quoteTotal)

CREATE INDEX StoreQuotes_OrderDate ON dbo.StoreQuotes(quoteDate) INCLUDE(CustomerID, quoteTotal)



GO



CREATE TABLE dbo.TransactionType3

(

customerID INT NOT NULL

, orderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX TransactionType3_customerID on dbo.TransactionType3(customerID) INCLUDE(orderTotal)

CREATE INDEX TransactionType3_OrderDate ON dbo.TransactionType3(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE TransactionType4

(

customerID INT NOT NULL

, storeOrderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX TransactionType4_customerID ON dbo.TransactionType4(customerID) INCLUDE(orderTotal)

CREATE INDEX TransactionType4_OrderDate ON dbo.TransactionType4(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE dbo.TransactionType5

(

customerID INT NOT NULL

, orderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX TransactionType5_customerID on dbo.TransactionType5(customerID) INCLUDE(orderTotal)

CREATE INDEX TransactionType5_OrderDate ON dbo.TransactionType5(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE TransactionType6

(

customerID INT NOT NULL

, storeOrderID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, orderTotal MONEY NOT NULL

, orderDate DATETIME NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX TransactionType6_customerID ON dbo.TransactionType6(customerID) INCLUDE(orderTotal)

CREATE INDEX TransactionType6_OrderDate ON dbo.TransactionType6(orderDate) INCLUDE(CustomerID, orderTotal)



GO



CREATE TABLE dbo.SurveyResults

(

contactID INT NOT NULL PRIMARY KEY IDENTITY(1, 1)

, customerID INT NULL

, partnerID INT NULL

, aggResults TINYINT NOT NULL

, otherDetails NCHAR(100) NULL

)

CREATE INDEX SurveyReults_CustomerID ON dbo.SurveyResults(CustomerID)

GO

CREATE TABLE dbo.SurveyDetails

(

surveyID INT NOT NULL

, questionNbr TINYINT NOT NULL

, customerID INT NOT NULL

, rating TINYINT NOT NULL

, surveyDate DATETIME NOT NULL

, verbatim NCHAR(500) NULL

)

GO



/*

Populate these tables to form the first part of the query

*/



CREATE TABLE #firstNamePart

(

namePart NVARCHAR(14)

)

GO

CREATE TABLE #secondNamePart

(

namePart NVARCHAR(14)

)



INSERT INTO #firstNamePart VALUES (N'Some')

INSERT INTO #firstNamePart VALUES (N'Another')

INSERT INTO #firstNamePart VALUES (N'Different')

INSERT INTO #firstNamePart VALUES (N'Contoso')

INSERT INTO #firstNamePart VALUES (N'Similar')

INSERT INTO #firstNamePart VALUES (N'Dissimilar')

INSERT INTO #firstNamePart VALUES (N'My')

INSERT INTO #firstNamePart VALUES (N'Your')

INSERT INTO #firstNamePart VALUES (N'Their')

INSERT INTO #firstNamePart VALUES (N'Somebody''s')

INSERT INTO #firstNamePart VALUES (N'This')

INSERT INTO #firstNamePart VALUES (N'That')

INSERT INTO #firstNamePart VALUES (N'Varied')



INSERT INTO #secondNamePart VALUES (N'Inc.')

INSERT INTO #secondNamePart VALUES (N'LLC')

INSERT INTO #secondNamePart VALUES (N'Hobby')

INSERT INTO #secondNamePart VALUES (N'Unlimited')

INSERT INTO #secondNamePart VALUES (N'Limited')

INSERT INTO #secondNamePart VALUES (N'Musings')

INSERT INTO #secondNamePart VALUES (N'Manufacturing')

INSERT INTO #secondNamePart VALUES (N'Exploration')

INSERT INTO #secondNamePart VALUES (N'Enterprise')

INSERT INTO #secondNamePart VALUES (N'Services')

INSERT INTO #secondNamePart VALUES (N'Attempts')

INSERT INTO #secondNamePart VALUES (N'Dreams')

INSERT INTO #secondNamePart VALUES (N'Ideas')



-- populate customer



INSERT INTO dbo.Customers(customerName, otherStuff)

SELECT a.namePart +N' '+ b.namePart,N'otherStuff'

FROM #firstNamePart a CROSS JOIN #secondNamePart b



INSERT INTO dbo.Customers(customerName, otherStuff)

SELECT a.namePart +N' '+ b.namePart,N'otherStuff'

FROM #firstNamePart a CROSS JOIN #secondNamePart b



GO

DROP TABLE #firstNamePart

DROP TABLE #secondNamePart

GO



-- populate the internetOrders and storeOrders tables:



DECLARE @customerID INT -- as we go through

DECLARE @orderTotal MONEY

DECLARE @orderDate DATETIME

DECLARE @numRecords SMALLINT

DECLARE @ct SMALLINT



DECLARE crs CURSOR FOR SELECT customerID from dbo.Customers

OPEN crs

FETCH NEXT FROM crs INTO @customerID



WHILE @@FETCH_STATUS= 0

BEGIN

-- internet orders

SET @numRecords =RAND()* 10000

SET @ct = 0

WHILE @ct < @numRecords

BEGIN

SET @orderTotal =RAND()* 10000

SET @orderDate =DATEADD(dd,RAND()* 1500,'2008-01-01 00:00:00.000')

INSERT INTO dbo.InternetOrders(customerID, orderTotal, orderDate, otherDetails)

VALUES (@customerID, @orderTotal, @orderDate,'Other Details')

SET @ct = @ct + 1

END

-- set up store orders

SET @numRecords =RAND()* 1000

SET @ct = 0

WHILE @ct < @numRecords

BEGIN

SET @orderTotal =RAND()* 10000

SET @orderDate =DATEADD(dd,RAND()* 1500,'2008-01-01 00:00:00.000')

INSERT INTO dbo.StoreOrders(customerID, orderTotal, orderDate, otherDetails)

VALUES (@customerID, @orderTotal, @orderDate,'Other Details')

SET @ct = @ct + 1

END

INSERT INTO dbo.SurveyResults(customerID, aggResults, otherDetails)

VALUES (@customerID, @customerID % 5,N'Other Details')

FETCH NEXT FROM crs INTO @customerID

END



CLOSE CRS

DEALLOCATE CRS



/*

Populate the quote tables with sample data by duplicating the sales data

Also populate TransactionType3 and TransactionType4

*/



INSERT INTO dbo.InternetQuotes(customerID, quoteDate, quoteTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.InternetOrders



INSERT INTO dbo.StoreQuotes(customerID, quoteDate, quoteTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.storeOrders



INSERT INTO dbo.TransactionType3(customerID, orderDate, orderTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.InternetOrders



INSERT INTO dbo.TransactionType4(customerID, orderDate, orderTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.storeOrders



INSERT INTO dbo.TransactionType5(customerID, orderDate, orderTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.InternetOrders



INSERT INTO dbo.TransactionType6(customerID, orderDate, orderTotal, otherDetails)

SELECT customerID, orderDate, orderTotal, otherDetails

FROM dbo.storeOrders

GO



/*

Populate SurveyDetails with sample data for 50 questions



*/



DECLARE @questionNbr TINYINT

DECLARE @surveyID INT



SET @questionNbr = 1

WHILE @questionNbr < 51

BEGIN

INSERT INTO dbo.SurveyDetails(surveyID, questionNbr, customerID, rating, surveyDate, verbatim)

SELECT 1, @questionNbr, customerID, customerID % 5,'2008-01-01',N'Feedback from the customer'

FROM dbo.Customers

INSERT INTO dbo.SurveyDetails(surveyID, questionNbr, customerID, rating, surveyDate, verbatim)

SELECT 2, @questionNbr, customerID, customerID % 5,'2008-01-01',N'Feedback from the customer'

FROM dbo.Customers



SET @questionNbr = @questionNbr + 1

END

GO



/*

Update all statistics to be sure they are all in the best possible shape

*/



UPDATESTATISTICS dbo.Customers WITHFULLSCAN

UPDATESTATISTICS dbo.InternetOrders WITHFULLSCAN

UPDATESTATISTICS dbo.storeOrders WITHFULLSCAN

UPDATESTATISTICS dbo.InternetQuotes WITHFULLSCAN

UPDATESTATISTICS dbo.StoreQuotes WITHFULLSCAN

UPDATESTATISTICS dbo.TransactionType3 WITHFULLSCAN

UPDATESTATISTICS dbo.TransactionType4 WITHFULLSCAN

UPDATESTATISTICS dbo.TransactionType5 WITHFULLSCAN

UPDATESTATISTICS dbo.TransactionType6 WITHFULLSCAN

UPDATESTATISTICS dbo.SurveyResults WITHFULLSCAN

代码清单5.

 

2.分别在两个窗口执行代码清单6和代码清单7

set statistics io on 
set statistics time on

SELECT T1.customerName, R.ContactID, R.AggResults

, D.surveyId, D.questionNbr, D.rating, D.verbatim

FROM

(

SELECT Tab1.customerID, Tab1.customerName, Tab1.StoreOrderTotal, Tab1.InternetOrderTotal

, Tab1.TotalOrders, Tab2.InternetQuoteTotal, Tab2.StoreQuoteTotal, Tab2.TotalQuote

FROM

(

SELECT A.customerID, a.customerName, a.orderTotal as InternetOrderTotal, b.orderTotal as StoreOrderTotal,

TotalOrders = a.orderTotal + b.orderTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.InternetOrders i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.StoreOrders s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.orderTotal + b.orderTotal > 100000.00

) Tab1 JOIN

(

SELECT A.customerID, a.customerName, a.quoteTotal as InternetQuoteTotal, b.quoteTotal as StoreQuoteTotal,

TotalQuote = a.quoteTotal + b.quoteTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.quoteTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.InternetQuotes i ON c.customerID = i.customerID

WHERE i.quoteDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.quoteTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.quoteTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.StoreQuotes s ON c.customerID = s.customerID

WHERE s.quoteDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.quoteTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.quoteTotal + b.quoteTotal > 100000.00

) Tab2 ON Tab1.customerID = Tab2.customerID

) T1 JOIN

(

SELECT Tab1.customerID, Tab1.customerName, Tab1.StoreOrderTotal, Tab1.InternetOrderTotal

, Tab1.TotalOrders, Tab2.InternetQuoteTotal, Tab2.StoreQuoteTotal, Tab2.TotalQuote

FROM

(

SELECT A.customerID, a.customerName, a.orderTotal as InternetOrderTotal, b.orderTotal as StoreOrderTotal,

TotalOrders = a.orderTotal + b.orderTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.TransactionType3 i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.TransactionType4 s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.orderTotal + b.orderTotal > 100000.00

) Tab1 JOIN

(

SELECT A.customerID, a.customerName, a.quoteTotal as InternetQuoteTotal, b.quoteTotal as StoreQuoteTotal,

TotalQuote = a.quoteTotal + b.quoteTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.TransactionType5 i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.TransactionType6 s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.quoteTotal + b.quoteTotal > 100000.00

) Tab2 ON Tab1.customerID = Tab2.customerID

) T2 ON T1.customerID = T2.customerID

LEFT OUTER JOIN dbo.SurveyResults R on T1.customerID = R.customerID

LEFT OUTER JOIN dbo.SurveyDetails D on T1.customerID = D.customerID

WHERE T1.TotalOrders > 10000.00 AND T2.TotalQuote > 100000.00

代码清单6.

SELECT T1.customerID, T1.customerName 

INTO #temp

FROM

(

SELECT Tab1.customerID, Tab1.customerName, Tab1.StoreOrderTotal, Tab1.InternetOrderTotal

, Tab1.TotalOrders, Tab2.InternetQuoteTotal, Tab2.StoreQuoteTotal, Tab2.TotalQuote

FROM

(

SELECT A.customerID, a.customerName, a.orderTotal as InternetOrderTotal, b.orderTotal as StoreOrderTotal,

TotalOrders = a.orderTotal + b.orderTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.InternetOrders i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.StoreOrders s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.orderTotal + b.orderTotal > 100000.00

) Tab1 JOIN

(

SELECT A.customerID, a.customerName, a.quoteTotal as InternetQuoteTotal, b.quoteTotal as StoreQuoteTotal,

TotalQuote = a.quoteTotal + b.quoteTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.quoteTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.InternetQuotes i ON c.customerID = i.customerID

WHERE i.quoteDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.quoteTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.quoteTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.StoreQuotes s ON c.customerID = s.customerID

WHERE s.quoteDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.quoteTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.quoteTotal + b.quoteTotal > 100000.00

) Tab2 ON Tab1.customerID = Tab2.customerID

) T1 JOIN

(

SELECT Tab1.customerID, Tab1.customerName, Tab1.StoreOrderTotal, Tab1.InternetOrderTotal

, Tab1.TotalOrders, Tab2.InternetQuoteTotal, Tab2.StoreQuoteTotal, Tab2.TotalQuote

FROM

(

SELECT A.customerID, a.customerName, a.orderTotal as InternetOrderTotal, b.orderTotal as StoreOrderTotal,

TotalOrders = a.orderTotal + b.orderTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.TransactionType3 i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as orderTotal

FROM dbo.Customers c JOIN dbo.TransactionType4 s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.orderTotal + b.orderTotal > 100000.00

) Tab1 JOIN

(

SELECT A.customerID, a.customerName, a.quoteTotal as InternetQuoteTotal, b.quoteTotal as StoreQuoteTotal,

TotalQuote = a.quoteTotal + b.quoteTotal

FROM

(

SELECT c.customerID, c.customerName,

SUM(i.orderTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.TransactionType5 i ON c.customerID = i.customerID

WHERE i.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(i.orderTotal)> 10000.00

) A

JOIN

(

SELECT c.customerID, c.customerName,

SUM(s.orderTotal)as quoteTotal

FROM dbo.Customers c JOIN dbo.TransactionType6 s ON c.customerID = s.customerID

WHERE s.orderDate BETWEEN'2010-01-01'and'2010-12-31 23:59:59.999'

GROUP BY c.customerID, c.customerName

HAVING SUM(s.orderTotal)> 10000.00

) B on A.customerID = B.customerID

WHERE a.quoteTotal + b.quoteTotal > 100000.00

) Tab2 ON Tab1.customerID = Tab2.customerID

) T2 ON T1.customerID = T2.customerID

WHERE T1.TotalOrders > 10000.00 AND T2.TotalQuote > 100000.00











SELECT T1.customerName, R.ContactID, R.AggResults, D.surveyId, d.QuestionNbr, D.Rating, D.verbatim FROM #temp T1

LEFT OUTER JOIN dbo.SurveyResults R on T1.customerID = R.customerID

LEFT OUTER JOIN dbo.SurveyDetails D on T1.customerID = D.customerID





drop table #temp

代码清单7.

 

3.启用IO和TIME计数

4.包含实际的执行计划

 

    在完成代码清单5后,运行代码清单6两次。第一次执行会包含解析,执行时间以及物理读的时间,因此主要观察第二次执行时间,在第二次执行结束后,查看实际的执行计划。对其中估计的行数和实际的行数进行对比。

    值得注意的是,在数据流的开始,实际的行数和估计的行数相差很小。随着每一个数据集集和聚合的中间结果集做连接后,估计的行数变得越来越不准确。

    随着由右向左的数据流向观察,直至最左边,你会发现估计的行数急剧下降,虽然实际的行数却没有变。最终,估计的行数变为1,而实际的行数是300,如图6所示。

6

图6.

 

    当SQL Server低估了操作符中需要处理的数据集的行数时,会导致查询分析器做出次优的决定,比如说低效的连接顺序。最简单的例子是,错误的低估行数会导致中间结果集作为loop join的外层表,见图7。

7

图7.

    在循环嵌套连接中,对应外层表的每一行,都需要在内存表执行相应的操作。在本例中,SQL Server选择循环嵌套连接是由于外层结果集的估计行数是1。这意味着仅仅需要在内层结果集执行一次操作。低估行数导致性能噩梦是由于当内部操作是索引或表扫描时出现,如图8所示。此时估计的行数是1,而实际的行数则为334。

    8

图8.

 

    因为上述估计的不准确,SQL Server认为做扫描而不是使用索引查找或其他连接策略是最优的选择。如果统计信息准确,则对SurveyDetails表的扫描只占了18%的执行成本,但实际上该表扫描执行了334次而不是估计的1次。因此该成本要远高于预计的值。在执行计划编译时所有的统计信息都是最新,但sql server同时也需要中间结果集来做出最有的选择。

 

解决该类问题

    为了解决上述问题,可以使用临时表存储中间结果集来使得SQL Server计算统计信息后重编译。

注意:这里必须使用临时表,不能使用表变量。因为在临时表中插入数据后就为数据创建统计信息。但表变量中没有这些统计信息。但如果表变量是在option(recompile)中使用的,那么表变量的统计信息会在编译时获得。

     代码清单7和代码清单6得到的结果是一样的,但方式不同。首先是select…into,这种方式将聚合后的数据存入临时表。一旦结果集保存到临时表后,SQL Server就可以创建和更新统计信息并且重编译执行计划来使得代码清单6中低效执行计划的出现。然后接下来的查询用于获得最终结果,该查询会将临时表中的数据和其他两个表进行连接,如图9所示。

9

图9.

    可以看出,SQL Server分为多个步骤来使得查询更加高效,最主要的事将物理连接类型变为hash join,从而避免了多次扫描。

注意:随着查询复杂度的增加,有很多种不同的方式可以分解查询来提高效率。比如说,聚合后的数据可以插入临时表,你可以查看执行计划来找到实际行数和估计行数严重不符的部分,最好的选择是估计只有一行,但实际有多行的部分。

 

性能提升是否明显?

    由于测试数据的随机性,下面测试结果的数字或许会略微不同,但性能的提升一定会很明显。表2展示了IO和CPU时间的差异。

表2.对比两个查询的性能

测试标准 第一个查询 第二个查询 性能提升百分比
逻辑读 1,622,398 11,685 99.90%
CPU时间 4,914 1,139 76.82%
消耗时间 5,278 2,803 46.89%

 

    数据越多,查询越复杂,则性能提升越明显。上述数据已经足够证明分解查询带来的性能提升。

    虽然上述例子中的聚合数据集是来自衍生表,但同样也可以来自CTE和逻辑视图。通过同样的步骤,使用临时表存储中间结果集就可以利用统计信息来生成高效的执行计划。

 

其他办法

    代码清单6中的查询提示了索引缺失,创建索引可以消除表扫描从而提升性能,至少在少量数据集中是这样,这类查询通常来说是用于ETL或报表,执行的次数往往很少,因此创建和维护索引的成本需要考虑,因此需要考虑trade off。

    另一种办法是使用hash join提示,但这并不总是最好的选择。使用临时表存储中间结果集允许查询优化器根据数据的大小选择最好的执行计划。这意味着可以获得更稳定的性能收益,而使用连接提示则性能变得不稳定。

    还有一种办法是为中间结果集创建索引视图,此时统计信息在索引视图上进行维护。但是,索引视图在数据修改时的开销也不小,这也是锁争抢的来源之一,因此需要将多种可能的解决方案进行评估。

 

其他需要分解的复杂查询

    需要分解的复杂查询不仅仅包含之前讨论的两种,虽然查询优化器一直在提升,但仍然可能某些特定查询需要分解来获得性能提升。

包含大量连接的查询

    随着每次连接,初始数据集中的数据不断被过滤,因此在估计基数时引入了更多因此,每次单表数据中出现的小异常都会随着连接的增多被放大。

    因此大量连接的查询应该被分解为多个查询,将中间结果存入临时表,然后使用临时表和数据集进行连接。

在WHERE和Join子句中出现的CASE子句

    这类查询的一个例子包括“WHERE col1 = CASE @val1 WHEN 1 THEN ‘val1’ WHEN 2 THEN ‘val2’ WHEN 3 THEN col1”这样的子句使得获取估计行数变得困难,甚至不可能。任何时候估计的行数不准确都会导致低效的查询计划。

    使用T-SQL的条件语句IF…ELSE来将这类查询分解为多个查询可以获得更准确的估计行数。

 

小结

    通常来说,很少需要只使用一个复杂的查询语句获取所有结果。在诸如需要连接聚合结果的情况下,在单独一个查询中完成所有的步骤会导致不准确的预计基数,基数估计的偏差会导致低效的查询,在单独一个查询语句中,SQL Server无法再查询执行过程中暂停,在中间结果集上计算统计信息,并随之调整执行计划。

    将查询分解并将中间结果集存入临时表使得SQL Server可以做到单独查询中做不到的事,即为中间结果集创建统计信息并根据这些信息重新生成执行计划。使用UNION和IF逻辑可使得SQL Server生成更高效的执行计划来得到同样的结果。

    在SQL Server查询语句的设计中,高性能不仅取决于获取到的结果,还取决于如何获取结果。虽然查询优化器在大多数情况下能够很好的完成工作,但也有些情况下无法计算出统计信息,因此需要改变获取数据的方式从而获得稳定的高性能。本篇文章讨论了将大且复杂的语句分解为多个部分所需的技术。

 

 

---------------------------------------------------

原文链接:http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

posted @ 2013-12-18 12:48  CareySon  阅读(6431)  评论(14编辑  收藏  举报