【indexing-4】Unique and non-unique non-clustered indexes on a non-unique clustered index

In the last weblog post I have talked about the differences in unique and non-unique non-clustered indexes on a unique clustered index. In this weblog post I want to talk about the differences of non-clustered indexes defined on a non-unique clustered index. As you already know from this posting, SQL Server handles non-unique clustered indexes internally different as unique clustered indexes. If you define a non-unique clustered index, SQL Server adds the so called uniquifier(唯一标志) to your index records, which leads to a 4 byte overhead per each index row in the navigation structure of your clustered index.

The following listing creates again our customers table, defines this time a non-unique clustered index on it, and finally two non-clustered indexes, where one of them is unique and the other is non-unique.

-- Create a table with 393 length + 7 bytes overhead = 400 bytes 
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24 
CREATE TABLE CustomersUnique 
( 
    CustomerID INT NOT NULL, 
    CustomerName CHAR(100) NOT NULL, 
    CustomerAddress CHAR(100) NOT NULL, 
    Comments CHAR(189) NOT NULL 
) 
GO 
 
-- Create a non unique clustered index on the previous created table 
CREATE CLUSTERED INDEX idx_Customers ON CustomersUnique(CustomerID) 
GO

-- Insert 80.000 records 
DECLARE @i INT = 1 
WHILE (@i <= 20000) 
BEGIN 
    DECLARE    @j INT = 1     
    INSERT INTO CustomersUnique VALUES 
    ( 
        @i, 
        'CustomerName' + CAST(@i AS CHAR) + CAST(@j AS CHAR), 
        'CustomerAddress' + CAST(@i AS CHAR), 
        'Comments' + CAST(@i AS CHAR) 
    ) 
     
    SET @j += 1; 
     
    INSERT INTO CustomersUnique VALUES 
    ( 
        @i, 
        'CustomerName' + CAST(@i AS CHAR) + CAST(@j AS CHAR), 
        'CustomerAddress' + CAST(@i AS CHAR), 
        'Comments' + CAST(@i AS CHAR) 
    ) 
     
    SET @j += 1; 
     
    INSERT INTO CustomersUnique VALUES 
    ( 
        @i, 
        'CustomerName' + CAST(@i AS CHAR) + CAST(@j AS CHAR), 
        'CustomerAddress' + CAST(@i AS CHAR), 
        'Comments' + CAST(@i AS CHAR) 
    ) 
     
    SET @j += 1; 
     
    INSERT INTO CustomersUnique VALUES 
    ( 
        @i, 
        'CustomerName' + CAST(@i AS CHAR) + CAST(@j AS CHAR), 
        'CustomerAddress' + CAST(@i AS CHAR), 
        'Comments' + CAST(@i AS CHAR) 
    ) 
     
    SET @i += 1 
END 
GO

-- Create a unique non clustered index on the clustered table 
CREATE UNIQUE NONCLUSTERED INDEX idx_UniqueNCI_CustomerID 
ON CustomersUnique(CustomerName) 
GO 
 
-- Create a non-unique non clustered index on the clustered table 
CREATE NONCLUSTERED INDEX idx_NonUniqueNCI_CustomerID 
ON CustomersUnique(CustomerName) 
GO

When you now do a short look into the sys.dm_db_index_physical_stats DMV, you can see that the unique non-clustered index takes 107 bytes per index row in the navigation levels, where the non unique non-clustered index takes 117 bytes on the average (minimum 111 bytes, maximum 117 bytes). Let’s analyze the differences and dump out the index root page of the unique non-clustered index through the DBCC PAGE command:

DBCC PAGE(NonUniqueClusteredIndexStructure_NonClusteredIndex, 1, 4529, 3) 
GO

As you can see from the following picture SQL Server only stores the unique non-clustered key in the index root level (and also in the intermediate( adj. 中间的，中级的;) levels), because the non-clustered key is already unique by itself:

When you examine the byte by byte representation of the unique non-clustered index record, you can see that SQL Server uses here the following bytes:

1 Byte: Status Bits
n Bytes: Unique Non-Clustered Index Key – in this case 100 bytes
4 Bytes: PageID
2 Bytes: FileID

When you dump out the leaf-level of the unique non-clustered index, you can see that SQL Server points through the non-unique clustered key and the uniquifier to the correct record in the clustered table:

The conclusion here is that the unique non-clustered index on a non-unique clustered index makes only the 4 byte overhead in the leaf level of the clustered index, because here SQL Server directly points to the correct record. There is no additional overhead involved in the non-leaf levels of the unique non-clustered index.

Let’s now dump out the index root page of the non-unique non-clustered index defined on our non-unique clustered index:

This is now a really interesting output! The key of the index record must be by design unique. How can SQL Server make a non-unique non-clustered index key unique? Easy – SQL Server just adds the clustered index key (4 bytes). But the clustered index key is also not unique by default, therefore SQL Server also adds the uniquifier (4 bytes), so you have an resulting overhead of 8 bytes per index row, when the uniquifier is not equal to 0. When the uniquifier is equal to 0, you get an overhead of 4 bytes, because in this case the uniquifier is not stored physically in the index record, and a 0 is assumed( adj. 假定的; 假装的; ) by SQL Server automatically. When you again analyze the byte by byte representation you can see the following bytes:

1 Byte: Status Bits
n Bytes: Unique Non-Clustered Index Key – in this case 100 bytes
n Bytes: Non-Unique Clustered Index Key – in this case 4 bytes
4 Bytes: PageID
2 Bytes: FileID
4 Bytes: Some bytes used by the uniquifier
4 Bytes: The uniquifier value itself, when it is not equal to 0

The minimum length of the index record is therefore 111 bytes and the maximum length is 117 bytes already found out earlier through the sys.dm_db_index_physical_stats DMV. When you finally dump out the leaf-level of the non-unique non-clustered index, you get the following result:

As you can see the leaf-level is the same as the leaf-level in a unique non-clustered index defined on a non unique clustered index. The leaf level just points through the clustered key (CustomerID) and the uniquifier to the correct record in the clustered table. This example has shown you that there is huge overhead (8 bytes per index row) when you define a non-unique non-clustered index on a non-unique clustered index, because SQL Server must make the non-clustered index records internally unique and needs therefore several bytes of storage overhead. You can download the T-SQL script for this posting here.