【SQLpassion Performance Tuning Training Plan】- Week 13: Statistics
Welcome to the 4th month of the SQLpassion Performance Tuning Training Plan. This month is all about Statistics in SQL Server, and how they help the Query Optimizer to produce a good-enough Execution Plan. Statistics are mainly used by the Query Optimizer to estimate(vt. 估计,估量;判断,评价) how many rows would be returned from a query. It's just an estimate, nothing more.
Statistics Overview
SQL Server uses a so-called Histogram within the Statistics object, which describes the data distribution for a given column within a maximum of 200 Steps. One of the biggest limitations that we have with Statistics in SQL Server is that limit of the 200 steps (which can be overcome with Filtered Statistics that were introduced back with SQL Server 2008).
The other limitation is the Auto Update mechanism(n. 机制;原理,途径;进程;机械装置;技巧) of Statistics: with a table larger than 500 rows, a Statistics object is only updated if 20% + 500 column values of the underlying(adj. 潜在的;根本的;在下面的;优先的) table have changed. This means that your Statistics are getting updated more less as soon as your table grows.
Imagine you have a table with 100.000 records. In this case, SQL Server updates the Statistics object if you have done 20.500 (20% + 500) data changes in the underlying column. If you have a table with 1.000.000 rows, you need 200.500 (20% + 500) data changes. The algorithm(n. [计][数] 算法,运算法则) that is used here is exponential(adj. 指数的) and not linear(adj. 线的,线型的;). There is also a trace flag 2371 in SQL Server that influences(vt. 影响,感化) this behavior. See the following link for more information about it.
|
This behavior can be a huge problem if you are dealing with execution plans that contain a Bookmark Lookup. As you already know, a Bookmark Lookup operator is only chosen by the Query Optimizer if the query is estimated to return a very selective(adj. 选择性的) result - based on the current Statistics. If your Statistics are out-of-date, and your execution plan is still valid, SQL Server will just blindly(adv. 盲目地;轻率地;) reuse the cached plan, and your page reads will just explode(vi. 爆炸,爆发;激增). Let's have a look at a more concrete example of that specific problem. |
Stale(adj. 陈腐的;不新鲜的) Statistics
The following script creates a simple table with 1500 records with an even data distribution in the column Column2. In addition we define a Non-Clustered Index on that column.
CREATE TABLE Table1 ( Column1 INT IDENTITY, Column2 INT ) GO -- Insert 1500 records into Table1 SELECT TOP 1500 IDENTITY(INT, 1, 1) AS n INTO #Nums FROM master.dbo.syscolumns sc1 INSERT INTO Table1 (Column2) SELECT n FROM #nums DROP TABLE #nums GO CREATE NONCLUSTERED INDEX idx_Table1_Colum2 ON Table1(Column2) GO
When you now perform a simple SELECT * query against that table by restricting(n. 整形,限制;扼流) on Column2, you will get an execution plan with a Bookmark Lookup operator.

As you will also see from the Index Seek (Non Clustered) operator, SQL Server estimates 1 row (property Estimated Number of Rows), and in reality SQL Server processes 1 row (property Actual Number of Rows). This means we are dealing here with accurate(adj. 精确的) Statistics, and the query itself just produced 3 logical reads.
In our case we are dealing now with a table of 1500 rows, so SQL Server will automatically update the Statistics object of the underlying Non-Clustered Index when 20% + 500 rows have changed. When you do the math, you can see that we need 800 data changes (1500 x 20% + 500).
What we are now doing in the next step is the following: we are working against SQL Server a little bit, and we are only inserting 799 new records into the table. But the value of the 2nd column is now always 2. This means we are now completely changing the data distribution of the table in that column. The Statistics object thinks that 1 record will be returned, but in reality we get back 800 rows (1 existing rows + 799 newly inserted rows):
SELECT TOP 799 IDENTITY(INT, 1, 1) AS n INTO #Nums FROM master.dbo.syscolumns sc1 INSERT INTO Table1 (Column2) SELECT 2 FROM #nums DROP TABLE #nums GO
When you now run the identical SELECT statement again, SQL Server reuses the cached Execution Plan with the Bookmark Lookup. This means now that the Bookmark Lookup in the Execution Plan is executed 1500 times - once for every record! This will also cost a lot of logical reads - SQL Server now reports 806 page reads! As you can see from the following picture, the Estimated Number of Rows is now far from the Actual Number of Rows.

Summary
Today I have given you a brief((adj. 简短的,简洁的;短暂的,草率的)) overview in the SQLpassion Performance Tuning Training Plan about Statistics in SQL Server. As you have seen, stale Statistics can introduce serious performance problems with cached, reused execution plans.
I hope that you have now a much better understanding of Statistics in SQL Server, and the side-effects that they will introduce to your Execution Plans, when they are out-of-date. Next week I will talk in more detail about Statistics, and how they look internally in SQL Server. Stay tuned!
浙公网安备 33010602011771号