第7章 透视、逆透视及分组集
7.1 透视
7.1.1 使用标准SQL进行透视转换
SELECT empid, SUM(CASE WHEN custid = 'A' THEN qty END) AS A, SUM(CASE WHEN custid = 'B' THEN qty END) AS B, SUM(CASE WHEN custid = 'C' THEN qty END) AS C, SUM(CASE WHEN custid = 'D' THEN qty END) AS D FROM dbo.Orders GROUP BY empid;
1、分组阶段:处理相关的分组或行元素。用GROUP BY子句实现,本例的分组元素为 GROUP BY empid。
2、扩展阶段:处理相关的扩展或列元素。为每个目标列指定case表达式来实现,本例中的扩展元素为custid。
3、聚合阶段:处理相关的聚合元素和聚合函数。为每个case表达式的结果应用相关的聚合函数,本例为SUM。当然,可根据不同的需求使用MAX、MIN和COUNT等。
据自己理解,将过程分析如下:

--------------------------------------- --按empid分组,此时只有empid 1列 --------------------------------------- SELECT empid FROM dbo.Orders GROUP BY empid; ---------------------------------------- --添加4个新的列,比如新列的值都取空 --从只有empid 1列,扩展成了5列 ---------------------------------------- SELECT empid, '' AS A, '' AS B, '' AS C, '' AS D FROM dbo.Orders GROUP BY empid; ---------------------------------------- --下一步考虑的是在行列交叉处填什么值 --用case将符合条件行的qty全部筛出来 --用sum将这些筛出来的值求和 ---------------------------------------- --先填充一列的值看看 SELECT empid, SUM(CASE WHEN custid = 'A' THEN qty END) AS A FROM dbo.Orders GROUP BY empid; --再填充所有列 ...
7.1.2 使用 T-SQL PIVOT 运算符进行透视转换
SQL server2005引入了一个T-SQL 特有的运算符,和其他表运算符(如,JOIN)类似,PIVOT也是在查询FROM子句的上下文执行操作。
同样涉及三个阶段,但使用的是SQL Sever 自己的语法。
SELECT empid, A, B, C, D FROM (SELECT empid, custid, qty FROM dbo.Orders) AS D PIVOT(SUM(qty) FOR custid IN(A, B, C, D)) AS P;
7.2 逆透视
7.2.1 使用标准SQL进行逆透视转换
SELECT * FROM (SELECT empid, custid, CASE custid WHEN 'A' THEN A WHEN 'B' THEN B WHEN 'C' THEN C WHEN 'D' THEN D END AS qty FROM dbo.EmpCustOrders CROSS JOIN (VALUES('A'),('B'),('C'),('D')) AS Custs(custid)) AS D WHERE qty IS NOT NULL;
3个逻辑处理阶段:生成副本、提取元素、删除不相关的交叉:

--------------------------------------------------- --Step 1: 生成副本 --(1)使用cros join进行交叉连接 --(2)按values子句的格式来创建一个虚拟表 --------------------------------------------------- SELECT * FROM dbo.EmpCustOrders CROSS JOIN (VALUES('A'),('B'),('C'),('D')) AS Custs(custid); --或 SELECT * FROM dbo.EmpCustOrders CROSS JOIN (SELECT 'A' AS custid UNION ALL SELECT 'B' UNION ALL SELECT 'C' UNION ALL SELECT 'D') AS Custs; ---------------------------------------------------- --Step 2: 提取元素,生成一个数据列 ---------------------------------------------------- SELECT empid, custid, CASE custid WHEN 'A' THEN A WHEN 'B' THEN B WHEN 'C' THEN C WHEN 'D' THEN D END AS qty FROM dbo.EmpCustOrders CROSS JOIN (VALUES('A'),('B'),('C'),('D')) AS Custs(custid); ------------------------------------------------------ -- Unpivot Step 3: 过滤掉不需要的值 ------------------------------------------------------ SELECT * FROM (SELECT empid, custid, CASE custid WHEN 'A' THEN A WHEN 'B' THEN B WHEN 'C' THEN C WHEN 'D' THEN D END AS qty FROM dbo.EmpCustOrders CROSS JOIN (VALUES('A'),('B'),('C'),('D')) AS Custs(custid)) AS D WHERE qty IS NOT NULL;
7.2.2 使用 T-SQL 的 UNPIVOT 运算符进行逆透视转换
SELECT empid, custid, qty FROM dbo.EmpCustOrders UNPIVOT(qty FOR custid IN(A, B, C, D)) AS U;
7.3 分组集
分组集就是分组(GROUP BY子句)使用的一组属性(或列名)。
如果将基于多个分组集的查询放在同一张表里呢?
SELECT empid, custid, SUM(qty) AS sumqty FROM dbo.Orders GROUP BY empid, custid UNION ALL SELECT empid, NULL, SUM(qty) AS sumqty FROM dbo.Orders GROUP BY empid UNION ALL SELECT NULL, custid, SUM(qty) AS sumqty FROM dbo.Orders GROUP BY custid UNION ALL SELECT NULL, NULL, SUM(qty) AS sumqty FROM dbo.Orders;
实现了,但代码长,执行效率低。应该用下面的方式(sqlserver2008新加入的,且是标准SQL):
7.3.1 GROUPING SETS 从属子句
SELECT empid, custid, SUM(qty) AS sumqty FROM dbo.Orders GROUP BY GROUPING SETS ( (empid, custid), (empid), (custid), () );
7.3.2 CUBE从属子句
CUBE(a, b, c)与GROUPING SETS((a, b, c), (a,b), (a,c), (b, c), (a), (b), (c), ())等效。
在集合论中,由给定集合的所有子集构成的集合称为幂集(power set)。对于由给定元素构成的一个分组集(如,(a, b, c)),可以把CUBE从属子句看作是用于生成这个分组集的幂集。
SELECT empid, custid, SUM(qty) AS sumqty FROM dbo.Orders GROUP BY CUBE(empid, custid);
7.3.3 ROLLUP从属子句
ROLLUP认为输入成员之间存在一定的层次关系,从而生成让这种层次关系有意义的所有分组集。
ROLLUP(a, b, c)与GROUPING SETS((a, b, c), (a,b), (a), ())等效。
SELECT YEAR(orderdate) AS orderyear, MONTH(orderdate) AS ordermonth, DAY(orderdate) AS orderday, SUM(qty) AS sumqty FROM dbo.Orders GROUP BY ROLLUP(YEAR(orderdate), MONTH(orderdate), DAY(orderdate));
7.3.4 GROUPING 和 GROUPING_ID 函数
从总的结果中,判断具体的一条记录属于哪个分组集。