【SQLpassion Performance Tuning Training Plan】 - Week 5: Heap Tables

Welcome to the second month of the SQLpassion Performance Tuning Training Plan. This month will be one of the most interesting and challenging ones - it's the month where we talk exclusively(adv. 唯一地; 专门地,特定地; 专有地; 排外地;) about Indexing, Indexing, and Indexing in SQL Server. But it's worthwhile(adj. 有价值的; 值得做的; 合算; 值得花时间的;) - trust me.

Today I cover so-called Heap Tables, and in the following 3 weeks we are talking about Clustered Indexes, Non-Clustered Indexes, and Indexing Strategies(n. 战略; 对策; 策略( strategy的名词复数 ); 战略学;)for your SQL Server database. Let's talk about heap tables. Heap tables are tables without a Clustered Index. A table in SQL Server can have a Clustered Index, then it's called a Clustered Table, and without a Clustered Index, it's called a heap table.

In a heap table, the data is not sorted in any way, it's just a pile of unordered, unstructured records. When you access a heap table through a SELECT statement, SQL Server will use aTable Scan operator in the Execution Plan, when you have no suitable(adj. 合适的,适当的,适宜的,恰当的;) Non-Clustered Index defined. There is no Table Seek operator available. That's very important.

As you will see next week, when we talk about Clustered Indexes, you access a Clustered Index through a Clustered Index Scan and through a Clustered Index Seek operator. On a heap table you have only a Table Scan operator. And Table Scan means that you have to scan the whole table. And it will not scale(n. 规模; 比例(尺); ) with the amount of data you have. The more data you have, the longer the operation takes.

A Table Scan is always a O(n) operation (more about the Big O notation) - it will not scale as your table gets larger and larger. Now let's have a more detailed look at the advantages and disadvantages of heap tables in SQL Server.

Advantages

Heap tables are very, very, very fast - for inserting data. As we have said earlier it's just a pile of data - nothing more. When you crack(vt. 破裂,打开; (使…) 开裂; ) out every page from a traditional phone book and place the individual(adj. 个人的; 个别的; 独特的;) pages on the desk in front of you, you have created a heap table. Inserting new phone book records into that heap table is very fast: you allocate(vt. 分配,分派; 把…拨给;) a new page (of 8kb), write the new records onto that page, and finally put the page with the other ones in front of you. Finished. No sorting order has to be guaranteed( adj. 有保证的; 有人担保的;). 

It's the same with SQL Server: a new page is allocated, the records are stored onto that page, and the page is assigned( n. [计][修](已)赋值[分配] ; [计] 指定的,赋值的;) to the heap table. Finished. This is a very fast approach, because SQL Server doesn't has to ensure any sorting order. It's up to SQL Server where to put the new records.

For that reason it can be sometimes a good idea to leave some tables as heap tables in a database schema: tables on which you have a huge, parallel( adj. 平行的; 相同的,类似的; [电] 并联的; [计] 并行的;) INSERT activity. Just think about your logging/auditing tables. But I'm never ever recommending using heap tables everywhere. There are only a few specific( adj. 具体的; 明确的; 特种的; [免疫学] 特效的;) use cases where they make sense. But not everywhere!

Disadvantages

Besides the advantage that heap tables are very fast for inserting data, there are also a huge number of disadvantages, that you also have to incorporate(vt. 包含; 组成公司; 使混合; 使具体化;) into your thoughts when you decide to create a table as a heap table. 

As a first disadvantage, a heap table leads to random I/O in the storage subsystem when you access the table data. Imagine you are executing a simple SELECT statement against your heap table. If the data isn't yet cached in the Buffer Pool, SQL Server has to issue physical reads to your storage subsystem. These read will be random I/O, because the pages of the heap table are stored somewhere in your data files, but not next to each other.

If you are using traditional rotational(adj. 转动的,轮流的;) storage (which is currently still the most common case), you will have performance problems on your storage level, because random I/O is slow, very slow. SSD drives are a big game changer(规则改变者; 游戏改变者; 博弈改变者;) here, because it's doesn't really matter anymore, if you are performing random or sequential(adj. 序贯; 时序; 按次序的,相继的,构成连续镜头的;) I/O on SSDs. Both operations are almost the same regarding their speed (random I/O is still "a little" bit slower than sequential I/O). 

Another big problem that you will definitely have with your heap tables are so-called Forwarding Records. Records that are stored on a heap table can move in some circumstances(n. 境况; 境遇;) from one page to another one. If this occurs( vi. 发生; 出现; 闪现;), SQL Server stores on the original( adj. 原始的; 最初的; 独创的; 新颖的;) page a forwarding record that points to the new location where the record is stored.

When you are accessing your data, SQL Server still accesses the original page, and fetches( vt. 接来(某人); 使发出;  vi. 取来; 抵达,到达;) the record through the forwarding record, which is an additional page read that you need. And this will also slow down your read performance tremendously( adv. 非常; 极其; 极大地; 极端地;). If you are more interested in forwarding records, and how you can avoid them, I suggest that you watch my 10th SQL Server Quickie on that specific topic.

Summary

 

Heap tables have their uses - in some cases. I always recommend creating a Clustered Table (define a Clustered Index on it), but think about some specific use cases, and whether a heap table would serve your business needs in a better way (regarding( prep. 关于; 就…而论; 至于;) the performance characteristics(n. 性质; 特性,特征,特色)). If you want to have a more detailed explanation of when heap tables can be suitable(adj. 合适的,适当的,适宜的,恰当的;) for your workload, I also highly recommend that you read Thomas Kejser's blog posting Clustered Indexes vs. Heaps about it. Thomas gives you a very controversial( adj. 有争议的,引起争议的,被争论的; 好争论的;), but still valid insight( n. 顿悟; 领悟; 洞察力,洞悉; 直觉,眼光;) on when heap tables can make sense for you. 

Next week I will talk more about Clustered Indexes in SQL Server. You will learn how to choose your right Clustered Key - when they are good - and when they are bad. Stay tuned, and see you very soon.

posted @ 2017-06-14 15:16  FH1004322  阅读(110)  评论(0)    收藏  举报