《Microsoft Sql server 2008 Internal》读书笔记--第七章Special Storage(6)

《Microsoft Sql server 2008 Interna》读书笔记订阅地址：

http://www.cnblogs.com/downmoon/category/230397.html/rss

《Microsoft Sql server 2008 Interna》索引目录：

《Microsoft Sql server 2008 Internal》读书笔记--目录索引

这一节我们继续了解稀疏列的物理存储。

■物理存储

在某种角度上，你可以稀疏列作为存储，同时使用列集显示。如此，没有值的特殊列，它不会被列出，也不会占用空间；但有值的稀疏列比正常的null列得花费更多的空间。

用non-sparse列定义的表：

ID	sc1	sc2	sc3	sc4	sc5	sc6	sc7	sc8	sc9
1	1								9
2		2		4		6
3							7
4	1				5
5				4				8
6			3						9
7					5		7
8		2						8
9			3			6

用sparse列定义的表：

ID	sparse columns
1	(sc1,sc9)(1,9)
2	(sc2,sc4)(2,4)
3	(sc6,sc7)(6,7)
4	(sc1,sc5)(1,5)
5	(sc4,sc8)(4,8)
6	(sc3,sc9)(3,9)
7	(sc5,sc7)(5,7)
8	(sc3,sc8)(2,8)
9	(sc3,sc7)(3,7)

SQL Server用一个叫做稀疏向量(sparse vector)的行来跟踪Sparse列的存储。
稀疏向量的里的字节：

Name	Number of Bytes	Meaning
Complex column Header	2	05
Sparse column Count	2	numbers of sparse columns
Column ID Set	2* the number of sparse columns	two bytes for the column ID of each colun in the table with a value stored in the sparse vector
ColumnOffset Table	2* the number of sparse columns	two bytes for the offset of the ending position of each sparse column
Sparse Data	Depends on actural values	Data

我们看一个例子：

USE testdb;
GO
IF EXISTS (SELECT * FROM sys.tables
   WHERE name = 'sparse_bits')
  DROP TABLE sparse_bits;
GO
CREATE TABLE sparse_bits
(
c1 int IDENTITY,
c2 varchar(4),
c3 char(4) SPARSE,
c4 varchar(4) SPARSE
);
GO
INSERT INTO sparse_bits SELECT 'aaaa', 'bbbb', 'cccc';
INSERT INTO sparse_bits SELECT 'dddd', null, 'eeee';
INSERT INTO sparse_bits SELECT 'ffff', null, 'gg';
GO

现在我们使用DBCC IND命令查找存储三行的数据页的页数量，用DBCC Page查看页的字节。

DBCC IND(testdb, sparse_bits, -1);
GO

邀月工作室

稀疏向量的字节偏移：

邀月工作室

字节交换后的值：

邀月工作室

关于sparse vector，请注意以下几点：

1、sparse vector不会出现关于null值列的信息

2、对于稀疏向量，存储固定长度和可变长度的字符串没有区别。但是一个稀疏varchar列（不符合8060字节）能被存储在行溢出数据，而一个稀疏char列不可以。

3、因为只有两个字节存储稀疏列的数量，这就是数量的限制。

4、两字节的复杂列的头部显示可能是另外一个复杂列的类型，可能是向后指针(back-Poiner)（对应前文的转向指针forward-pointer)

■元数据

非常小的特殊元数据被用来支持稀疏列。目录视图sys.columns包含两个列，用来跟踪表的Sparse column,每个列仅有两个可能的值0或1：is_Sparse和is_column_set，属性函数有两个值与稀疏列有关：isSarse和isColumnSet

使用Sparse的列：

SELECT OBJECT_NAME(object_id) as 'Table', name as 'Column', is_sparse, is_column_set
FROM sys.columns
WHERE OBJECT_NAME(object_id) like '%sparse%';

使用列集的表：

SELECT OBJECT_NAME(object_id) as 'Table', name as 'Column'
FROM sys.columns
WHERE COLUMNPROPERTY(object_id, name, 'IsColumnSet') = 1;

■稀疏列存储方面的节省

Sparse column 主要是为了解决大部分为null值的表而设置的。但同时那些不是null的列却将要花费更多的空间存储开销。我们用一个例子(插入10万数据)说明：

代码

USE testdb;
GO
SET NOCOUNT ON;
GO
IF EXISTS (SELECT * FROM sys.tables
   WHERE name = 'sparse_nonulls_size')
  DROP TABLE sparse_nonulls_size;
GO
CREATE TABLE sparse_nonulls_size
(col1 int IDENTITY,
col2 datetime SPARSE,
col3 char(10) SPARSE
);
GO
IF EXISTS (SELECT * FROM sys.tables
   WHERE name = 'nonsparse_nonulls_size')
  DROP TABLE nonsparse_nonulls_size;
GO
CREATE TABLE nonsparse_nonulls_size
(col1 int IDENTITY,
col2 datetime,
col3 char(10)
);
GO
IF EXISTS (SELECT * FROM sys.tables
   WHERE name = 'sparse_nulls_size')
  DROP TABLE sparse_nulls_size;
GO
CREATE TABLE sparse_nulls_size
(col1 int IDENTITY,
col2 datetime SPARSE,
col3 char(10) SPARSE
);
GO
IF EXISTS (SELECT * FROM sys.tables
   WHERE name = 'nonsparse_nulls_size')
  DROP TABLE nonsparse_nulls_size;
GO
CREATE TABLE nonsparse_nulls_size
(col1 int IDENTITY,
col2 datetime,
col3 char(10)
);
GO
DECLARE @num int
SET @num = 1
WHILE @num < 100000
BEGIN
  INSERT INTO sparse_nonulls_size
SELECT GETDATE(), 'my message';
  INSERT INTO nonsparse_nonulls_size
SELECT GETDATE(), 'my message';
  INSERT INTO sparse_nulls_size
SELECT NULL, NULL;
  INSERT INTO nonsparse_nulls_size
SELECT NULL, NULL;
  SET @num = @num + 1;
END;
GO

现在我们看一下每个表的页面数。下面的元数据查询（使用sys.allocation_units视图）：

邀月工作室

注意：页面数最小的是有（null值）稀疏列的表。与没有稀疏列的表使用相同页面数的是列有null值或没有但是数据被定义成固定长度的表。这个空间比那个有null的稀疏列大两倍还多。最可怕的是定义了稀疏列，但这个列有not null值。

Fixed-Length Data Types

Data type	Nonsparse bytes	Sparse bytes	NULL percentage
bit	0.125	4.125	98%
tinyint	1	5	86%
smallint	2	6	76%
int	4	8	64%
bigint	8	12	52%
real	4	8	64%
float	8	12	52%
smallmoney	4	8	64%
money	8	12	52%
smalldatetime	4	8	64%
datetime	8	12	52%
uniqueidentifier	16	20	43%
date	3	7	69%

Precision-Dependent–Length Data Types

Data type	Nonsparse bytes	Sparse bytes	NULL percentage
datetime2(0)	6	10	57%
datetime2(7)	8	12	52%
time(0)	3	7	69%
time(7)	5	9	60%
datetimetoffset(0)	8	12	52%
datetimetoffset (7)	10	14	49%
decimal/numeric(1,s)	5	9	60%
decimal/numeric(38,s)	17	21	42%
vardecimal(p,s)	Use the decimal type as a conservative estimate.

Data-Dependent–Length Data Types

Data type	Nonsparse bytes	Sparse bytes	NULL percentage
sql_variant	Varies with the underlying data type
varchar or char	2*	4*	60%
nvarchar or nchar	2*	4*+	60%
varbinary or binary	2*	4*	60%
xml	2*	4*	60%
hierarchyid	2*	4*	60%

*The length is equal to the average of the data that is contained in the type, plus 2 or 4 bytes.

通常推荐的做法是：当你断定使用稀疏列能节省20%-40%的空间时使用稀疏列。至此，稀疏列告一段落，下一篇请看数据压缩。

posted @ 2010-05-09 13:02 邀月阅读(1899) 评论(6) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

《Microsoft Sql server 2008 Internal》读书笔记--第七章Special Storage(6)

公告