Exploring NewSequentialID() in SQL Server 2005(转自SqlJunkies)

Posted on 2007-08-15 17:04 江南白衣阅读(848) 评论(0) 编辑收藏举报

By Rob Garrison
Published: 6/14/2006
Reader Level: Intermediate
Rated: 4.00 by 1 member(s).

Introduction

The benefits and drawbacks of GUIDs as primary keys are generally known. Despite the drawbacks, some people use them to fulfill specific business requirements (the IDs are unique across multiple servers). For a concise explanation of the drawbacks, see the second paragraph of this blog entry by Marcus Mac Innes.

Thankfully, SQL Server 2005 introduces a new way to create GUIDs that generates them in sequence. This gives you the benefit of sequential IDs (like using an IDENTITY) with the uniqueness benefits of GUIDs.

There are some very important things to understand about this feature. From SQL Server 2005 Books Online:

"Creates a GUID that is greater than any GUID previously generated by this function on a specified computer."
"Important: If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID, and therefore access data associated with that GUID."
"The GUIDs generated by NEWSEQUENTIALID() are unique only within a particular computer if the computer does not have a network card."
"You can use NEWSEQUENTIALID() to generate GUIDs to reduce page contention at the leaf level of indexes."

Read that first sentence carefully, because some have interpreted it incorrectly. It means that if you don't have a network card, you could create GUIDs that match GUIDs created on another machine.

Let's see how this works by looking at GUIDs created one after the other on two different machines. This code will get us started:

CREATE TABLE t1 (EntryID uniqueidentifier DEFAULT NewID())
CREATE TABLE t2 (EntryID uniqueidentifier DEFAULT NewSequentialID())

INSERT INTO t1 DEFAULT VALUES
INSERT INTO t1 DEFAULT VALUES
INSERT INTO t1 DEFAULT VALUES

INSERT INTO t2 DEFAULT VALUES
INSERT INTO t2 DEFAULT VALUES
INSERT INTO t2 DEFAULT VALUES

SELECT EntryID, CONVERT(binary(16), EntryID) AS ConvertBinary FROM t1
SELECT EntryID, CONVERT(binary(16), EntryID) AS ConvertBinary FROM t2

System 1

EntryID ConvertBinary
------------------------------------ ----------------------------------
853C63ED-D7CC-4E86-965E-F425A53E1BA2 0xED633C85CCD7864E965EF425A53E1BA2
032F7CCC-70DF-41F5-AD24-4296922D6E3C 0xCC7C2F03DF70F541AD244296922D6E3C
16208550-D184-4E57-BBFD-35DE50D95C31 0x5085201684D1574EBBFD35DE50D95C31

EntryID ConvertBinary
------------------------------------ ----------------------------------
24AA2604-39F6-DA11-8F73-00096B6FFD84 0x0426AA24F63911DA8F7300096B6FFD84
25AA2604-39F6-DA11-8F73-00096B6FFD84 0x0426AA25F63911DA8F7300096B6FFD84
26AA2604-39F6-DA11-8F73-00096B6FFD84 0x0426AA26F63911DA8F7300096B6FFD84

System 2

EntryID ConvertBinary
------------------------------------ ----------------------------------
5E11A586-534C-4CB8-8073-BD60B9D9F57F 0x86A5115E4C53B84C8073BD60B9D9F57F
DA29DBAF-B273-4648-914C-B069A0072060 0xAFDB29DA73B24846914CB069A0072060
6C0AE722-F93A-40F3-88F0-F58FD470E9C1 0x22E70A6C3AF9F34088F0F58FD470E9C1

EntryID ConvertBinary
------------------------------------ ----------------------------------
6009C875-3BF6-DA11-8104-000E35C144DE 0x75C80960F63B11DA8104000E35C144DE
6109C875-3BF6-DA11-8104-000E35C144DE 0x75C80961F63B11DA8104000E35C144DE
6209C875-3BF6-DA11-8104-000E35C144DE 0x75C80962F63B11DA8104000E35C144DE

Based on the documentation, there would never be a collision between the GUIDs created on different machines.

Shared Counter

As expected, multiple tables using NewSequentialID() will create sequential IDs. This is clear from a simple example.

CREATE TABLE dbo.t1 (col1 uniqueidentifier DEFAULT NewSequentialID())
CREATE TABLE dbo.t2 (col1 uniqueidentifier DEFAULT NewSequentialID())
CREATE TABLE dbo.t3 (col1 uniqueidentifier DEFAULT NewSequentialID())

-- Three inserts into t1
INSERT INTO dbo.t1 DEFAULT VALUES
INSERT INTO dbo.t1 DEFAULT VALUES
INSERT INTO dbo.t1 DEFAULT VALUES

-- Alternating inserts into t2 and t3
INSERT INTO dbo.t2 DEFAULT VALUES
INSERT INTO dbo.t3 DEFAULT VALUES
INSERT INTO dbo.t2 DEFAULT VALUES
INSERT INTO dbo.t3 DEFAULT VALUES
INSERT INTO dbo.t2 DEFAULT VALUES
INSERT INTO dbo.t3 DEFAULT VALUES

SELECT CONVERT(binary(16), Col1) AS 't1Col1' FROM dbo.t1
SELECT CONVERT(binary(16), Col1) AS 't2Col1' FROM dbo.t2
SELECT CONVERT(binary(16), Col1) AS 't3Col1' FROM dbo.t3

t1Col1
----------------------------------
0xBC0AB27AF25311DA8F7300096B6FFD84 [Sequence: 1]
0xBC0AB27DF25311DA8F7300096B6FFD84 [Sequence: 2]
0xBC0AB27FF25311DA8F7300096B6FFD84 [Sequence: 3]

t2Col1
----------------------------------
0xBC0AB281F25311DA8F7300096B6FFD84 [Sequence: 4]
0xBC0AB287F25311DA8F7300096B6FFD84 [Sequence: 6]
0xBC0AB28BF25311DA8F7300096B6FFD84 [Sequence: 8]

t3Col1
----------------------------------
0xBC0AB284F25311DA8F7300096B6FFD84 [Sequence: 5]
0xBC0AB289F25311DA8F7300096B6FFD84 [Sequence: 7]
0xBC0AB28DF25311DA8F7300096B6FFD84 [Sequence: 9]

Space

I built a test database to compare NewID() GUIDs, NewSequentialID() GUIDs, and integer IDENTITY() as clustered primary (surrogate) keys. The basic setup is a credit card database. There are three tables: Customer, Card, and Purchase. Each table has an ID column, and each child table has a foreign key to the parent's primary key.

The script creates 100,000 customers, two cards per customer, and 100 purchases per card. (Note that the attached script has the 100,000 number commented out. Generating the full data set took almost 43 hours.)

Table	Type	Rows	Data Size (KB)	Index Size (KB)
Customer1	Sequential GUID	100,000	5,264	2,760
Customer2	GUID	100,000	7,736	2,760
Customer3	bigint	100,000	4,472	1,944

Card1	Sequential GUID	20,000	9,704	12,496
Card2	GUID	20,000	14,288	17,088
Card3	bigint	20,000	6,536	7,632

Purchase1	Sequential GUID	20,000,000	1,126,768	707,736
Purchase2	GUID	20,000,000	1,635,232	1,067,176
Purchase3	bigint	20,000,000	812,184	381,960

Using a sequential GUID here saves significant space versus a standard NewID() GUID. The bigint is still much better than either of the GUID alternatives.

INSERT ... OUTPUT ... INTO

When inserting into an identity column, SCOPE_IDENTITY() will return the identity column value from the just-added record.

For SQL Server 2005, when inserting into a column with NewID() or NewSequentialID() as the primary (surrogate) key, you could retrieve the ID based on the natural key. But the exciting development in 2005 is the OUTPUT clause. Try this simple test:

CREATE TABLE t1 (EntryID uniqueidentifier DEFAULT NewID())
CREATE TABLE t2 (EntryID uniqueidentifier DEFAULT NewSequentialID())

INSERT INTO t1 OUTPUT INSERTED.* DEFAULT VALUES
INSERT INTO t2 OUTPUT INSERTED.* DEFAULT VALUES

EntryID
------------------------------------
FE06B4B9-0838-4AC5-AF45-827D570A9F75

EntryID
------------------------------------
53644D63-38F6-DA11-8F73-00096B6FFD84

The OUTPUT clause gives you back the GUID that was just created. This is a clear advantage over doing a SELECT to look up the GUID based on data in the record.

Performance

I always want to understand how changes in design and implementation affect performance. (All tests were run on a single-processor desktop with a 2.66 GHz P4, 2 GB RAM, and a single-spindle 80 GB IDE drive running Windows 2003 Server and SQL Server 2005 SP1.)

In this simple test, the execution plan shows that the insert into Customer1 (NewSequentialID) takes 62% of the overall batch time (INSERT: 56%, SELECT: 6%), while the insert into Customer3 (bigint IDENTITY) takes only 38%. If you leave out the post-insert SELECT statements, the numbers are 60%/40%, so the OUTPUT statement is causing a hit by itself.

DECLARE @outputTblCustomer1 TABLE (CustomerID uniqueidentifier)

-- Customer1
INSERT INTO dbo.Customer1 (CustomerNumber, LastName)
OUTPUT INSERTED.CustomerID INTO @outputTblCustomer1
VALUES (-1, N'LastName')

SELECT CustomerID FROM @outputTblCustomer1

-- Customer3
INSERT INTO dbo.Customer3 (CustomerNumber, LastName)
VALUES (-1, N'LastName')

SELECT SCOPE_IDENTITY() AS CustomerID

Recommendations

If you have avoided using GUIDs because of page splits and other issues, reconsider this option as you move to SQL Server 2005.

Because of the performance disadvantage of the ID retrieval, the larger amount of space required (16 bytes versus 8 bytes), and the inherent difficulty in using GUIDs, I believe bigint IDENTITY() still wins over NewSequentialID(). If you have some compelling reason to use GUIDs, then I suggest using NewSequentialID() rather than NewID() ... unless you cannot accept the predictability of NewSequentialID(), then use NewID().

It always comes down to making a decision based on your unique business requirements.

会员力量，点亮园子希望

刷新页面返回顶部

江南白衣

公告

Exploring NewSequentialID() in SQL Server 2005(转自SqlJunkies)