Bug: sys.dm_db_session_space_usage Reporting Extremely High Tempdb Allocations - princessd8251

A little more than a year ago while working at Idera, I was consulted on reports from a few clients using the Idera monitoring tool SQL diagnostic manager (SQLdm). SQLdm was reporting very high numbers for internal object allocations (internal_objects_alloc_page_count) and very low numbers for internal object deallocations (internal_objects_dealloc_page_count) to the point that SQLdm showed that individual sessions were reported using more space than was available in the database.

I worked with Vicky Harp (blog|@vickyharp), SQLdm dev lead and an excellent developer, on this issue. She asked me to validate whether they were calculating and interpreting the data correctly or if it was a bug in SQL Server.

In the beginning, all I had to go on were some numbers and anecdotal information being relayed from the product users. I devised a theory based on the minimal information I had on-hand. The theory did not pan out when trying to reproduce the issue on an internal server.

When I was able to reproduce the issue in a controlled environment, I noticed that sessions that worked with large objects (tables and indexes with greater than 128 extents) that get dropped, truncated, or rebuilt, as appropriate to the object. I also noticed that the Resource Monitor thread was showing no allocations and extremely high deallocations. When considering Resource Monitor deallocations and user allocations and deallocations as a whole, the numbers came pretty close to settling out. It appeared from my observations that deallocations were occurring, but they were being attributed to a system session.

What’s Happening

So the answer was deferred drops. A deferred drop is when a two-phase drop is used for dropping large objects or a large number of pages. This optimization was introduced in SQL Server 2005 to speed up large deallocations.

Phase 1 of a deferred drop — the logical phase — simply marks the pages as deallocated. This greatly minimizes the length of time required to complete the commit of the transaction. This also greatly speeds up a rollback as it simply needs to reallocate the pages it deallocated. No movement of data occurs during this phase.

Phase 2 of a deferred drop — the physical phase — physically deallocates the pages. Since the pages are not allocated to any object, it does not require taking extensive locks to perform the physical drop. The physical phase is executed in batches by a background process called Resource Monitor. This means they are deallocated by a system process instead of the session that allocated them.

This clearly explains why the deallocations were way out of line for user sessions. The user sessions were not performing the deallocations for large objects.

Learn More

I came up with the theory, and provided my findings to Vicky. Vicky got the pleasure of working with CSS at Microsoft to prove out that it actually was a bug. It wouldn’t be enough to simply tell the SQLdm users that it was a bug in SQL Server. We needed to have hard evidence. Too many people blame Microsoft if they have an unusual experience with SQL Server. Vicky spent many hours over many months working with CSS to prove out that it is a bug. Vicky created an excellent repro for the issue. go to post on the topic to get a look at and try the repro for yourself. Vicky also offers some additional technical details to which I no longer have access.

We finally got confirmation from CSS and the SQL Server product team that it is indeed a bug in how deallocations are being reported. A fix for this is not available yet, but at least you now know to consider deallocations by system threads when the numbers don’t look right.

SQL Server bug: Incorrect values in sys.dm_db_session_space_usage for deferred drops

For the past few months I have been chasing down an issue which has been plaguing my attempts at tempdb monitoring, and in just the last week I’ve received news that it is indeed a confirmed bug in SQL Server. I’m sharing what I’ve found here for the benefit of the SQL community.

One of the features I worked on for SQL diagnostic manager was our tempdb monitoring feature, whereby users can track and alert on a number of tempdb-specific metrics. Among the items collected are the tempdb page allocation and deallocation values from sys.dm_db_session_space_usage and sys.dm_db_task_space_usage – specifically, the following columns:

user_objects_alloc_page_count
user_objects_dealloc_page_count
internal_objects_alloc_page_count
internal_objects_dealloc_page_count

The details of how to use these values in troubleshooting are covered in this Troubleshooting Insufficient Disk Space in tempdb article by Microsoft. In brief, these values are intended to track the number of pages allocated or deallocated by a task or session, which can allow the practitioner to identify sessions which are using excessive amounts of space in tempdb or which are churning through a large number of pages over their lifetime.

We were periodically seeing cases where the apparent space used by a session – as calculated by subtracting the deallocation count from the allocation count for a session – was well in excess of the total size of tempdb. I worked closely with Robert Davis (b|t) to identify whether this was a bug in the product or a misunderstanding of the contents of the DMV, but he confirmed our logic and believed it was probably a bug. In deeper testing I was able to reproduce the issue very consistently using simple batches. I’d like to walk you through the process.

In this example, a temp table is being created and then rows are being added. I’ve noted the line where a “go 10″ is used – this causes the script to fill the temp table with 10 rows before dropping it. The script waits 10 seconds, then checks the allocation and deallocation counts from sys.dm_db_session_space_usage and sys.dm_db_task_space_usage.

set nocount on
 
create table #BigTempTable(a nchar(4000) default 'a')
go
 
insert into #BigTempTable default values
 
-- Note this value
go 10
 
drop table #BigTempTable
 
go
 
waitfor delay '00:00:10'
 
select
      'Session Space',
      ssu.session_id,
      sessionUserPagesAlloc = sum(ssu.user_objects_alloc_page_count),
      sessionUserPagesDealloc = sum(ssu.user_objects_dealloc_page_count),
      taskUserPagesAlloc = sum(tsu.user_objects_alloc_page_count),
      taskUserPagesDealloc = sum(tsu.user_objects_dealloc_page_count),
      sessionInternalPagesAlloc = sum(ssu.internal_objects_alloc_page_count),
      sessionInternalPagesDealloc = sum(ssu.internal_objects_dealloc_page_count),
      taskInternalPagesAlloc = sum(tsu.internal_objects_alloc_page_count),
      taskInternalPagesDealloc = sum(tsu.internal_objects_dealloc_page_count)
from
      tempdb.sys.dm_db_session_space_usage ssu
      left join tempdb.sys.dm_db_task_space_usage tsu
      on ssu.session_id = tsu.session_id
where
    ssu.session_id = @@spid
group by
      ssu.session_id

When run as above, with go 10, the DMV data returns exactly what is expected (I have switched rows and columns below for easy readability).

session_id	52
sessionUserPagesAlloc	16
sessionUserPagesDealloc	16
taskUserPagesAlloc	0
taskUserPagesDealloc	0
sessionInternalPagesAlloc	0
sessionInternalPagesDealloc	0
taskInternalPagesAlloc	0
taskInternalPagesDealloc	0

Each of the 10 loops adds 1 row to the temp table, and at that row size each row is 1 page. The first 8 pages come from mixed extents, and pages 9 and 10 come from a second extent, for a total of 16 pages allocated to the table. Upon dropping the table, all of that space is deallocated, so we see both an allocation and a deallocation of 16 pages.

If I reset the connection so I have a fresh spid and change the “go 10″ to “go 5000,” however, the returned data begins looking a bit suspect.

session_id	68
sessionUserPagesAlloc	5000
sessionUserPagesDealloc	0
taskUserPagesAlloc	0
taskUserPagesDealloc	0
sessionInternalPagesAlloc	0
sessionInternalPagesDealloc	0
taskInternalPagesAlloc	0
taskInternalPagesDealloc	0

At 5000 rows, this data divides up nicely into 625 extents, so exactly 5000 pages are allocated. However, no pages are marked as deallocated even though the object was dropped.

If I remove the “where ssu.session_id = @@spid” clause and look at the space utilization of all sessions, I find that the deallocation value is apparently being credited to a system spid – in this case, spid 26:

So this is the cause of the incorrect values for session space used. These two contrived examples show a small and a large temp table, and show that at a certain point the deallocated pages are being credited to a system spid. In practice, however, workloads have a mix of small and large temporary objects, so the deallocation value is often non-zero but still incorrect.

When we first observed this problem Robert mentioned that he thought it was probably an issue to do with deferred drops in tempdb. In the last week I’ve heard back from Microsoft, who have confirmed this. Specifically, the problem is that deferred drops of over 128 extents are not incrementing the user_objects_dealloc_page_count in sys.dm_db_session_space_usage. Hop on over to this post on Robert’s blog, where he describes this issue and the concept of deferred drops in great detail.