【SQLpassion Performance Tuning Training Plan】 - Week 6: Clustered Indexes
Last week I introduced you to heap tables. As we have said, a table in SQL Server can be a Heap Table or a Clustered Table - a table with a Clustered Index defined on it. And today we will have a more detailed look at Clustered Indexes, and how to choose the right Clustered Key.
Every time you create a Primary Key constraint in SQL Server, the constraint is (by default) enforced( n. 强迫的;) through a Unique Clustered Index. This means that you need to have in that column (or these columns, if you have defined a composite Primary Key) unique values. And as a side effect your table data is physically sorted by that/these column(s). Let's have again a look at the advantages and disadvantages of Clustered Indexes in SQL Server.
Advantages
One of the biggest advantages of Clustered Tables is that the data is physically sorted by the Clustered Key in your storage subsystem. You can compare a Clustered Table to a traditional phone book: the phone book is clustered/sorted by last name, which means that the last name Aschenbrenner(..人名) comes before Bauer(..人名), and Bauer comes before Meyer(..人名). Clustered Tables are therefore completely different from Heap Tables, where you have no physical sorting order.
It's a really huge benefit that you get from Clustered Tables. Imagine you are searching for a specific record in the table, and the column in the WHERE clause on which you are restricting( v. 约束; (以法规) 限制( restrict的现在分词 ); 束缚;) your data is the Clustered Key. In that case SQL Server will choose in the execution plan anClustered Index Seek operator. Seek operations are very, very efficient( adj. 有效率的; (直接) 生效的;), because SQL Server uses a B-tree structure to find the relevant( adj. 有关的,中肯的; 相关联的;) data. The complexity( n. 复杂性,错综复杂的状态; ) of a seek operation is always O(log N). If you want to learn more about the internally used B-tree structure, you can also watch my SQL Server Quickie on that topic. Back in 2010 I also wrote multiple blog postings about that topic.
It's the same when you are searching in a phone book for the name Aschenbrenner. You know that this name can be only at the beginning of the phone book, because the phone book is sorted by this data. So you can avoid a scan of the complete phone book, and SQL Server can avoid a complete scan of the Clustered Index in the leaf level.
As long as you have no index fragmentation( n. 分裂,破碎;) in your Clustered Index, you will also get sequential( adj. 序贯; 时序; 按次序的,相继的) I/O when you are accessing the Clustered Index with a scan operation. Index Fragmentation would mean that the logical and physical sorting orders of your pages in the leaf level are not the same. You can fix index fragmentation through Index Rebuild andIndex Reorganize operations. We will talk more about the difference between both methods in week 24, when we cover database maintenance( n. 维护; 维修; 维持,保持; 保养,保管;).
Whether you have index fragmentation or not depends on your chosen Clustered Key column. As long as you are using an ever increasing value (like INT IDENTITY, or an OrderDate column), records are inserted at the end of the Clustered Index. This means that you are not introducing fragmentation in your index, because SQL Server is only appending data at the end of your Clustered Index. But in some rare( adj. 罕见的,特殊的; 优秀的,不寻常的; ) cases this approach will not scale( n.规模; 比例(尺);测量;衡量; ) indefinitely. Therefore we also have to talk now about the disadvantages that you can have with Clustered Indexes, and a wrong chosen Clustered Key.
Disadvantages
|
Data inserted only at the end of the Clustered Index can introduce the so-called Last Page Insert Latch( n. 门闩; 弹簧锁;) Contention( n. 争夺; 论点; 竞争,争论;), because you have a single hotspot at the end of your Clustered Index, where multiple queries compete against each other, when traversing through the B-tree structure. The following picture shows this phenomenon. |
To overcome this problem you can choose a random Clustered Key for your Clustered Index, because then you are distributing the inserted records across multiple different places in the Clustered Index. But a random Clustered Key introduces so-called hard Page Splits, because SQL Server has to allocate new data pages somewhere within the leaf level of the Clustered Index. Hard page splits will also have a negative impact( n. 影响; 碰撞,冲击,撞击; 冲击力;) on the performance of your transaction log, because logging a hard page split is a lot more work than logging a "normal" INSERT (a so-called soft Page Split) at the end of your Clustered Index.
As a side-effect you are also introducing index fragmentation with a random Clustered Key, because your logical and physical sorting order are not the same anymore. Random I/O will kill the performance of your scan operations on traditional rotational( adj. 转动的,轮流的;) storage because the disk head must move forward and backward on the platters(n. 唱片;) of your drive, when reading the various data pages.
Summary
Clustered Indexes scale very well, because internally they use a B-tree structure. SQL Server can make effective use of this structure when performing index seek operations on your table. But choosing the right and correct Clustered Key is a time-consuming job, because you have to think about all the advantages and disadvantages of every scenario (ever increasing value vs. random value).
For that reason the SQLpassion Online Academy provides you a 1-hour long training video, where I show you both approaches and about what facets( n. 方面; ) you have to be aware of, when choosing your Clustered Key. As I have mentioned( vt. 提到,说起;提及) in the introduction email of the SQLpassion Performance Tuning Training Plan, you are also eligible( adj. 合格的;合适的; ) for a discount for the SQLpassion Online Academy. So please use the discount code SQLTP3For2 during your checkout to get 3 training videos for the price of 2.
Next week I will talk more about Non-Clustered Indexes in SQL Server. You will learn what a Non-Clustered Index is, and what dependency( n. 属国,属地; 从属,从属物;) a Non-Clustered Index has on the Clustered Key defined on the Clustered Index. So enjoy your next 7 days, and then we will meet each other again.

浙公网安备 33010602011771号