【Weekly SQLpassion Newsletter】Actual Number of Rows are not always accurate

Today’s blog posting is a quite interesting one, because I want to show you a concrete example where in an Actual Execution Plan the Actual Number of Rows are WRONG! Yes, you have read correctly: I’m talking here about the Actual Number of Rows, and not about the Estimated Number of Rows, which are always somehow(adv. 以某种方式，用某种方法; 不知怎么地, 莫名其妙地; ) wrong, because they are only estimated during the Cardinality(n. 基数；集的势) Estimation.

Counting rows in the wrong way

A few weeks ago I had worked with a customer, and they had a really interesting phenomenon: SQL Server has returned a lot of rows, but when they looked into the Execution Plan, the Actual Number of Rows were a lot higher. You don’t trust me? Look at the following Actual Execution Plan within SQL Server Management Studio:

SQL Server returned here within SQL Server Management Studio 110561 rows, but the Actual Number of Rows within the Actual Execution Plan are much higher: 118968!

Wow, this is a quite interesting behavior. And I don’t think that this behavior is just by-design. The problem lies in a Filtered Non-Clustered Columnstore Index that was introduced with SQL Server 2016. You can create a Filtered Non-Clustered Columnstore Index in an OLTP(On-Line Transaction Processing) workload to support so-called Realtime Data Analytics Scenarios. Instead of creating a dedicated Data Warehouse database you perform your Analytics workload directly on your OLTP tables, which can be also changed concurrently.

This means now that a Non-Clustered Columnstore Index is also changeable since SQL Server 2016. And in that behavior lies the problem: when SQL Server calculates and returns you the Actual Number of Rows, the deleted rows from the Non-Clustered Columnstore Index are also considered. And therefore the Actual Number of Rows in the Execution Plan are just higher as the real row count that is returned from the query.

The following T-SQL code shows a simple scenario with which you can reproduce this scenario.

-- Create a table copy
SELECT * INTO Sales.SalesOrderDetail2 FROM Sales.SalesOrderDetail
GO

-- Create a Non-Clustered ColumnStore Index for the "cold" data partition
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_ColdData ON Sales.SalesOrderDetail2(ProductID)
WHERE ModifiedDate < '20140601'
GO

-- Let's delete some rows
DELETE FROM Sales.SalesorderDetail2
WHERE ModifiedDate >= '20140501' AND ModifiedDate < '20140601'
GO

-- These rows are not logically deleted
SELECT * FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('Sales.SalesOrderDetail2')
GO

-- Uses the Non-Clustered ColumnStore Index to access the data
SELECT ProductID FROM Sales.SalesOrderDetail2
WHERE ModifiedDate < '20140601'
GO

-- Clean up
DROP TABLE Sales.SalesOrderDetail2
GO

Summary

Never, ever trust anyone – especially not a piece of software! I find this behavior quite funny, because until I encountered this specific scenario I always thought that the Actual Number of Rows are somehow calculated on-the-fly during the Query Execution. But it seems (at least in combination with a Non-Clustered Columnstore Index) that this is not really always the case.

Thanks for your time,

-Klaus

posted @ 2018-08-03 10:44 FH1004322 阅读(131) 评论(0) 收藏举报

刷新页面返回顶部