SQL2000索引页面结构

SQL2000索引页面结构

sql2000技术内幕

Index pages are structured much like data pages. As with all other types of pages in SQL Server, index pages have a fixed size of 8 KB, or 8192 bytes. Index pages also have a 96-byte header, but just like in data pages, there is an offset array at the end of the page with two bytes for each row to indicate the offset of that row on the page. However, if you use DBCC PAGE with style 1 to print out the individual rows on an index page, the slot array is not shown. If you use style 2, it will only print out all the bytes on a page with no attempt to separate the bytes into individual rows. You can then look at the bottom of the page and notice that each set of two bytes does refer to a byte offset on the page. Each index has a row in the sysindexes table, with an indid value of either 1, for a clustered index, or a number between 2 and 250, indicating a nonclustered index. An indid value of 255 indicates LOB data (text, ntext or image information). The root column value contains a file number and page number where the root of the index can be found. You can then use DBCC PAGE to examine index pages, just as you do for data pages.

The header information for index pages is almost identical to the header information for data pages. The only difference is the value for type, which is 1 for data and 2 for index. The header of an index page also has nonzero values for level and indexId. For data pages, these values are both always 0.

There are basically three different kinds of index pages: leaf level for nonclustered indexes, node (nonleaf) level for clustered indexes, and node level for nonclustered indexes. There isn't really a separate structure for leaf level pages of a clustered index because those are the data pages, which we've already seen in detail. There is, however, one special case for leaf-level clustered index pages.

Clustered Index Rows with a Uniqueifier

If your clustered index was not created with the UNIQUE property, SQL Server adds a 4-byte field when necessary to make each key unique. Since clustered index keys are used as bookmarks to identify the base rows being referenced by nonclustered indexes, there needs to be a unique way to refer to each row in a clustered index. SQL Server adds the uniquifier only when necessary—that is, when duplicate keys are added to the table. As an example, I'll use the same table I initially used in Chapter 6 to illustrate the row structure of a table with all fixed-length columns. I'll create an identical table with a different name, then I'll add a clustered (nonunique) index to the table:

USE爌ubs
GO
CREATE燭ABLE燙lustered_Dupes�
牋(Col1燾har(5)牋燦OT燦ULL,
牋燙ol2爄nt牋牋燦OT燦ULL,
牋燙ol3燾har(3)牋燦ULL,
牋燙ol4燾har(6)牋燦OT燦ULL,
牋燙ol5爁loat牋牋NOT燦ULL)
GO
CREATE燙LUSTERED營NDEX燙l_dupes_col1燨N燙lustered_Dupes(col1)

The rows in syscolumns look identical to the ones for the table in Chapter 6, without the clustered index. However, if you look at the sysindexes row for this table, you'll notice something different.

SELECT爁irst,爄ndid,爇eycnt,爊ame燜ROM爏ysindexes�
WHERE爄d�=爋bject_id�('Clustered_Dupes')
RESULT:
first牋牋�     indid� keycnt爊ame牋牋牋
--------------�------�------�---------------
0x000000000000�1牋�   2牋�   Cl_dupes_col1

The column called keycnt, which indicates the number of keys an index has, is 2. If I had created this index using the UNIQUE qualifier, the keycnt value would be 1. If I had looked at the sysindexes row before adding a clustered index, when the table was still a heap, the row for the table would have had a keycnt value of 0. I'll now add the same initial row that I added to the table in Chapter 6, and then I'll look at sysindexes to find the first page of the table:

INSERT燙lustered_Dupes燰ALUES�('ABCDE',�123,爊ull,�'CCCC',�4567.8)
GO
SELECT爁irst,爎oot,爄d,indid燜ROM爏ysindexes
WHERE爄d�=爋bject_id('Clustered_Dupes')

RESULT:
first牋牋     爎oot牋牋牋     id牋牋�     indid�
--------------�--------------�-----------�------�
0xB50500000100�0xB30500000100�1426104121� 1

The first column tells me that the first page of the table is 1461 (0x05B5), so I can use DBCC PAGE to examine that page. Remember to turn on trace flag 3604 prior to executing this undocumented DBCC command.

DBCC燭RACEON�(3604)
GO
DBCC燩AGE�(5,1,1461,�1)

The only row on the page looks exactly as it was shown in Figure 6-7, but I'll reproduce it again here, in Figure 8-3. When you read the row output from DBCC PAGE, remember that each two displayed characters represents a byte and that the bytes are displayed from right to left within each group of 4 bytes in the output. For character fields, you can just treat each byte as an ASCII code and convert that to the associated character. Numeric fields are stored with the low-order byte first, so within each numeric field, we must swap bytes to determine what value is being stored. Right now, there are no duplicates of the clustered key, so no extra information has to be provided. However, if I add two additional rows, with duplicate values in col1, the row structure changes:

INSERT燙lustered_Dupes燰ALUES�('ABCDE',�456,爊ull,�'DDDD',�4567.8)
INSERT燙lustered_Dupes燰ALUES�('ABCDE',�64,爊ull,�'EEEE',�4567.8)

Figure 8-4 shows the three rows that are now on the page.

Figure 8-3. A data row containing all fixed-length columns, and a unique value in the clustered key column.

The first difference in the second two rows is that the bits in the first byte (TagA) are different. Bit 5 is on, giving TagA a value of 0x30, which means the variable block is present in the row. Without this bit on, TagA would have a value of 0x10. The "extra" variable-length portions of the second two rows are shaded in the figure. You can see that 8 extra bytes are added when we have a duplicate row. In this case, the first 4 extra bytes are added because the uniqueifier is considered a variable-length column. Since there were no variable-length columns before, SQL Server adds 2 bytes to indicate the number of variable-length columns present. These bytes are at offsets 33-34 in these rows with the duplicate keys and have the value of 1. The next 2 bytes, (offsets 35-36) indicate the position where the first variable length column ends. In both these rows, the value is 0x29, which converts to 41 decimal. The last 4 bytes (offsets 37-40) are the actual uniqueifier. In the second row, which has the first duplicate, the uniquifier is 1. The third row has a uniquifier of 2.

Figure 8-4. Three data rows containing duplicate values in the clustered key column.

Index Row Formats

The row structure of an index entry is very similar to the structure of a data row. An index row does not use the TagB or Fsize row header values. In place of the Fsize field, which indicates where the fixed-length portion of a row ends, the page header pminlen value is used to decode an index row. The pminlen value indicates the offset at which the fixed-length data portion of the row ends. If there are no variable-length or nullable columns in the index row, that is the end of the row. Only if the index row has nullable columns are the field called Ncol and the null bitmap both present. The Ncol field contains a value indicating how many columns are in the index row; this value is needed to determine how many bits are in the null bitmap. Data rows have an Ncol field and null bitmap whether or not any columns allow NULL, but index rows have only a null bitmap and an Ncol field if any NULLs are allowed in an index column. Table 8-1 shows the general format of an index row.

Table 8-1. Information stored in an index row.

InformationMnemonicSize
Status Bits A TagA

Some of the relevant bits are:

  • Bits 1 through 3: Taken as a 3-bit value, 0 indicates a primary record, 3 indicates an index record, and 5 indicates a ghost index record. (I'll discuss ghost records in Chapter 9.)
  • Bit 4: Indicates that a NULL bitmap exists.
  • Bit 5: Indicates that variable-length columns exist in the row.
1 byte
Fixed-length data Fdata pminlen - 1
Number of columns Ncol 2 bytes
NULL bitmap (1 bit for each column in the table; a 1 indicates that the corresponding column is NULL) Nullbits Ceiling (Ncol / 8)
Number of variable-length columns; only present if >0 VarCount 2 bytes
Variable column offset array; only present if VarCount > 0 VarOffset 2 * VarCount
Variable-length data, if any VarData  

The data contents of an index row depend on the level of the index. All rows except those in the leaf level contain a 6-byte down-page pointer in addition to the key values and bookmarks. The down-page pointer is the last column in the fixed-data portion of the row. In nonclustered indexes, the nonclustered keys and bookmarks are treated as normal columns. They can reside in either the fixed or variable portion of the row, depending on how each of the key columns was defined. A bookmark that is a RID, however, is always part of the fixed-length data.

If the bookmark is a clustered key value and the clustered and nonclustered indexes share columns, the actual data is stored only once. For example, if your clustered index key is lastname and you have a nonclustered index on (firstname, lastname), the index rows will not store the value of lastname twice. I'll show you an example of this shortly.

Clustered Index Node Rows

The node levels of a clustered index contain pointers to pages at the next level down in the index, along with the first key value on each page pointed to. Page pointers are 6 bytes: 2 bytes for the file number and 4 bytes for the page number in the file. The following code creates and populates a table that I'll use to show you the structure of clustered index node rows.

CREATE燭ABLE燾lustered_nodupes�(
牋id爄nt燦OT燦ULL�,
牋str1燾har�(5)燦OT燦ULL�,
牋str2燾har�(600)燦ULL�
)�
GO
CREATE燙LUSTERED營NDEX爄dxCL燨N燾lustered_nodupes(str1)�
GO
SET燦OCOUNT燨N
GO
DECLARE燖i爄nt
SET燖i�=�1240
WHILE燖i�<�13000燘EGIN
牋INSERT營NTO燾lustered_nodupes�
牋燬ELECT燖i,燾ast(@i燗S燾har),燾ast(@i燗S燾har)
牋SET燖i�=燖i�+�1
燛ND�
GO
SELECT爁irst,爎oot,爄d,爄ndid燜ROM爏ysindexes
WHERE爄d�=爋bject_id('clustered_nodupes')

RESULT:
first牋牋     爎oot牋     牋牋id牋牋     爄ndid�
--------------�--------------�-----------�------�
0xB80500000100�0xB60500000100�1458104235� 1

DBCC燭RACEON�(3604)
GO
DBCC燩AGE�(5,1,1462,�1)

The root of this index is at 0x05B6, which is 1462 decimal. Using DBCC PAGE to look at the root, we can see two index entries, as shown in Figure 8-5. When you examine index pages, you need to be aware that the first index key entry on each page is frequently either meaningless or empty. The down-page pointer is valid, but the data for the index key might not be a valid value. When SQL Server traverses the index, it starts looking for a value by comparing the search key with the second key value on the page. If the value being sought is less than the second entry on the page, SQL Server follows the page pointer indicated in the first index entry. In this example, the down-page pointer is at byte offsets 6-9, with a hex value of 0x0839. In decimal, this is 2105.

We can then use DBCC PAGE again to look at page 2105, part of which is shown in Figure 8-6. The row structure is identical to the rows on page 1462, the root page.

Figure 8-5. The root page of a clustered index.

Figure 8-6. An intermediate index-level page for a clustered index.

The first row has a meaningless key value, but the page-down pointer should be the first page in clustered index order. The hex value for the page-down pointer is 0x07dd, which is 2013 decimal. That page is the leaf level page, and has a structure of a normal data page. If we use DBCC PAGE to look at that page, we'll see that the first row has the value of '10000', which is the minimum value of the str1 column in the table. We can verify this with the following query:

SELECT爉in(str1)
FROM燾lustered_nodupes

Remember that the clustered index column is a char(5) column. Although the first number I entered was 1240, when converted to a char(5), that's '1240 '. When sorted alphabetically, '10000' comes well before '1240 '.

Nonclustered Index Leaf Rows

Index node rows in a clustered index contain only the first key value of the page they are pointing to in the next level and page-down pointers to guide SQL Server in traversing the index. Nonclustered index rows can contain much more information. The rows in the leaf level of a nonclustered index contain every key value and a bookmark. I'll show you three different examples of nonclustered index leaf rows. First, we'll look at the leaf level of a nonclustered index built on a heap. I'll use the same code I used to build the clustered index with no duplicates in the previous example, but the index I build on the str1 column will be nonclustered. I'll also put only a few rows in the table so that the root page will be the entire index.

CREATE燭ABLE爊c_heap_nodupes�(
牋id爄nt燦OT燦ULL�,
牋str1燾har�(5)燦OT燦ULL�,
牋str2燾har�(600)燦ULL�
)�
GO
CREATE燯NIQUE營NDEX爄dxNC_heap燨N爊c_heap_nodupes�(str1)�
GO
SET燦OCOUNT燨N
GO
DECLARE燖i爄nt
SET燖i�=�1240
WHILE燖i�<�1300燘EGIN
牋INSERT營NTO爊c_heap_nodupes�
牋燬ELECT燖i,燾ast(@i燗S燾har),燾ast(@i燗S燾har)
牋SET燖i�=燖i�+�1
燛ND�
GO
SELECT爁irst,爎oot,爄d,爄ndid燜ROM爏ysindexes
WHERE爄d�=爋bject_id('nc_heap_nodupes')
燗ND爄ndid�>�1

RESULT:
first牋牋爎oot牋牋牋id牋牋爄ndid�
--------------�--------------�-----------�------�
0xC10500000100�0xC10500000100�1506104406�2

DBCC燭RACEON�(3604)
GO
DBCC燩AGE�(5,1,1473,�1)

Since this index is built on a heap, the bookmark is an 8-byte RID. As you can see in Figure 8-7, this RID is fixed-length and is located in the index row immediately after the index key value of '1240 '. The first 4 bytes are the page address, the next 2 bytes are the file ID, and the last 2 bytes are the slot number.

Figure 8-7. An index row on a leaf-level page from a nonclustered index on a heap.

I'll now build a nonclustered index on a similar table, but before building the nonclustered index, I'll build a clustered index on a varchar column so we can see what an index looks like when a bookmark is a clustered index key.. Also, in order not to have the same values in the same sequence in both the str1 and str2 columns, I'll introduce a bit of randomness into the generation of the values for str2. If you run this script, the randomness might end up generating some duplicate values for the unique clustered index column str2. Since each row is inserted in its own INSERT statement, a violation of uniqueness will cause only that one row to be rejected. If you get error messages about PRIMARY KEY violation, just ignore them. You'll still end up with enough rows.

CREATE燭ABLE爊c_nodupes�(
牋id爄nt燦OT燦ULL�,
牋str1燾har�(5)燦OT燦ULL�,
牋str2爒archar�(10)燦ULL�
)�
GO
CREATE燯NIQUE燙LUSTERED營NDEX爄dxcl_str2爋n爊c_nodupes�(str2)
CREATE燯NIQUE營NDEX爄dxNC燨N爊c_nodupes�(str1)�
GO
SET燦OCOUNT燨N
GO
DECLARE燖i爄nt
SET燖i�=�1240
WHILE燖i�<�1300燘EGIN
牋INSERT營NTO爊c_nodupes�
牋燬ELECT燖i,燾ast(@i燗S燾har),�
牋牋牋燾ast(cast(@i�*爎and()燗S爄nt)燼s燾har)
牋SET燖i�=燖i�+�1
燛ND�
GO
SELECT爁irst,爎oot,爄d,爄ndid燜ROM爏ysindexes
WHERE爄d�=爋bject_id('nc_nodupes')
燗ND爄ndid�>�1

RESULT:
first牋牋爎oot牋牋牋id牋牋爄ndid�
--------------�--------------�-----------�------�
0xD00500000100�0xD00500000100�1586104691�2

DBCC燭RACEON�(3604)
GO
DBCC燩AGE�(5,1,1488,�1)

Figure 8-8. An index row on a leaf-level page from a nonclustered index built on a clustered table.

The index row contains the nonclustered key value of '1240 ', and the bookmark is the clustered key '822'. Because the clustered key is variable length, it comes at the end of the row after the Ncol value and null bitmap. A variable-length key column in an index uses only the number of bytes needed to store the value inserted. In the script to create this table, an int is converted to a char to get the value for the field. Since no length is supplied for the char, SQL Server by default allows 30 characters. However, since the table was defined to allow a maximum of 10 characters in that column, that is all that was stored in the row. If the script to create the table had actually converted to char(3), the clustered key bookmark in the index row would use only 3 bytes.

The third example will show a composite nonclustered index on str1 and str2 and an overlapping clustered index on str2. I'll just show you the code to create the table and indexes; the code to populate the table and to find the root of the index is identical to the code in the previous example.

CREATE燭ABLE爊c_overlap�(
牋id爄nt燦OT燦ULL�,
牋str1燾har�(5)燦OT燦ULL�,
牋str2燾har�(10)燦ULL�
)�
GO
CREATE燯NIQUE燙LUSTERED營NDEX爄dxCL_overlap燨N爊c_overlap(str2)
CREATE燯NIQUE營NDEX爄dxNC_overlap燨N爊c_overlap�(str1,爏tr2)�
GO

Figure 8-9 shows a leaf-level index row from the nonclustered composite index on the table nc_overlap.

Figure 8-9. An index row on a leaf-level page from a composite nonclustered index with an overlapping clustered key.

Note that the value in column str2 ('375' in this row) is part of the index key for the nonclustered index as well as the bookmark because it is the clustered index key. Although it serves two purposes in the row, its value is not duplicated. The value '375' occurs only once, taking the maximum allowed space of 10 bytes. Because this is a leaf-level row, there is no page-down pointer, and because the index is built on a clustered table, there is no RID, only the clustered key to be used as a bookmark.

Nonclustered Index Node Rows

You now know that the leaf level of a nonclustered index must have a bookmark because from the leaf level you want to be able to find the actual row of data. The nonleaf levels of a nonclustered index only need to help us traverse down to pages at the lower levels. If the nonclustered index is unique, the node rows need to have only the nonclustered key and the page-down pointer. If the index is not unique, the row contains key values, a down-page pointer, and a bookmark.

Let's say I need to create a table that will have enough rows for more than a single nonclustered index level. I can use a script similar to the ones shown previously and change the upper limit of the WHILE loop to 13000. I'll do this twice: I'll create one table with a nonunique nonclustered index on str1 and a clustered index on str2 and another table with a unique nonclustered index on str1 and a clustered index on str2. For the purposes of creating the index rows, SQL Server doesn't care whether the keys in the nonunique index actually contain duplicates. If the index is not defined to be unique, even if all the values are unique, the nonleaf index rows will contain bookmarks. You can see this is in the index row in Figure 8-10. Figure 8-11 shows an index row for a nonclustered index row for a unique index.

Figure 8-10. A nonleaf index row for a nonunique, nonclustered index.

Figure 8-11. A nonleaf index row for a unique, nonclustered index.

As you've seen, once you've found the root page of an index in the sysindexes table, you can decode down-page pointers in every row to find all the pages that belong to the index. The header information of an index page also has level, prevPage, and nextPage values, so you can ascertain the order of pages at a particular level of the index. As much as I enjoy decoding strings of bytes and converting hex values to decimal, I don't mind an occasional shortcut. You can use an undocumented DBCC command that lists all the pages that belong to a particular index. Keep in mind that Microsoft does not support this command and that it is not guaranteed to maintain the same behavior from release to release. The command DBCC IND has three parameters:

DBCC營ND�({'dbname'爘燿bid爙,爗�'objname'爘爋bjid爙,�
{爄ndid爘�0爘�-1爘�-2爙)

The first parameter is the database name or the database ID. The second parameter is an object name or object ID within the database. The third parameter is a specific index ID or one of the values 0, -1, or -2. The values for this parameter have the following meanings:

0 Displays the page numbers for all IAMs and data pages.
-1 Displays the page numbers for all IAMs, data pages, and index pages.
-2 Displays the page numbers for all IAMs.
Index ID Displays the page numbers for all IAMs and index pages for this index. If the index ID is 1 (meaning the clustered index), the data pages are also displayed.

If we run this command on the table I created for Figure 8-10, which is called nc_nonunique_big, with a third parameter of 2 (for the nonclustered index), we get the following results:

DBCC營ND(5,爊c_nonunique_big,�2)
RESULTS:
牋牋牋牋牋牋牋牋�  牋牋                  NextPageFID燦ext燩rev燩rev
牋牋牋牋牋牋牋牋牋�         牋           Page牋牋    Page Page燩age
PageFID燩agePID營AMFID營AMPID營ndexLevel燜ID牋牋�    PID� FID� PID
-------�-------�------�------�----------�-----------�----�----�----�
1牋   牋1594�  燦ULL� 燦ULL  �0     牋牋�0牋牋牋     0牋  0  牋0
1   牋牋1593  牋1牋   �1594牋 1牋牋�     0     牋牋牋0牋  0牋  0
1牋�   �1596牋  1   牋�1594 牋0     牋牋�1牋牋�     �4608�0牋  0
1   牋牋1597  牋1牋�   1594� �0牋牋�     1     牋牋牋1599�1�  �4611
1牋�   �1599牋  1   牋�1594牋 0     牋牋�1牋牋�     �1602�1  牋1597
1�   牋�1602  牋1牋�   1594 牋0牋牋�     1     牋牋牋1604�1牋  1599
1牋牋   1604牋  1�   牋1594� �0     牋牋�1牋牋�     �1606�1  牋1602
1   牋牋1606�  �1牋�   1594牋 0牋牋�     1     牋牋牋1607�1牋  1604
1牋�   �1607牋  1   牋�1594 牋0     牋牋�1牋牋�     �4584�1  牋1606
1�   牋�4584  牋1牋�   1594� �0牋牋�     1�     牋牋�4585�1牋  1607
1牋牋   4585牋  1   牋�1594牋 0     牋牋�1牋牋牋     4586�1  牋4584
1�   牋�4586  牋1牋�   1594 牋0牋牋�     1�     牋牋�4587�1牋  4585
1牋牋   4587牋  1   牋�1594� �0     牋牋�1牋牋牋     4588�1  牋4586
1   牋牋4588  牋1牋�   1594牋 0牋牋�     1     牋牋牋4589�1牋  4587
1牋�   �4589牋  1   牋�1594 牋0     牋牋�1牋牋�     �4590�1  牋4588
1�   牋�4590  牋1牋�   1594� �0牋牋�     0�     牋牋�0牋  1牋  4589
1牋牋   4591牋  1   牋�1594牋 0     牋牋�1     牋牋牋4611�1  牋4610
1   牋牋4608  牋1牋�   1594 牋0牋牋�     1牋牋�     �4609�1�  �1596
1牋�   �4609牋  1   牋�1594� �0     牋牋�1�     牋牋�4610�1牋  4608
1�   牋�4610  牋1牋�   1594牋 0牋牋�     1牋牋牋     4591�1  牋4609
1牋牋   4611牋  1   牋�1594� �0     牋牋�1牋牋     牋1597�1�  �4591

The columns in the result set are described in Table 8-2. Note that all page references have the file and page component conveniently split between two columns, so you don't have to do any conversion.

Table 8-2. Column descriptions for DBCC IND output.

ColumnMeaning
PageFID Index file ID
PagePID Index page ID
IAMFID File ID of the IAM managing this page
IAMPID Page ID of the IAM managing this page
ObjectID (Not shown in output) Object ID
IndexID (Not shown in output) Index ID
PageType (Not shown in output) Page type: 1 = Data page, 2 = Index page, 10 = IAM page
IndexLevel Level of index; 0 is leaf
NextPageFID File ID for next page at this level
NextPagePID Page ID for next page at this level
PrevPageFID File ID for previous page at this level
PrevPagePID Page ID for previous page at this level

 

posted @ 2014-08-19 11:30  桦仔  阅读(205)  评论(0)    收藏  举报