SQL开窗函数
之前只使用过
Row_number()函数,最近发现除了这个,还另有几个开窗函数,顺道也了解了下。
RANK
顾名思义,就是对行进行排名。不过这个排名可能会不连续,如果存在并列的情况,下一个不同值会跳过相应的排名数字。
示例如下:
USE AdventureWorks2022;
GO
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity
,RANK() OVER
(PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank
FROM Production.ProductInventory AS i
INNER JOIN Production.Product AS p
ON i.ProductID = p.ProductID
WHERE i.LocationID BETWEEN 3 AND 4
ORDER BY i.LocationID;
GO
结果如下:
ProductID Name LocationID Quantity Rank
----------- ---------------------- ------------ -------- ----
494 Paint - Silver 3 49 1
495 Paint - Blue 3 49 1
493 Paint - Red 3 41 3
496 Paint - Yellow 3 30 4
492 Paint - Black 3 17 5
495 Paint - Blue 4 35 1
496 Paint - Yellow 4 25 2
493 Paint - Red 4 24 3
492 Paint - Black 4 14 4
494 Paint - Silver 4 12 5
(10 row(s) affected)
DENSE_RANK
DENSE_RANK和RANK很像,都是用于排名的,唯一区别就是这个排名是连续的,而上一个是不连续的。
示例:
USE AdventureWorks2022;
GO
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity
,DENSE_RANK() OVER
(PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank
FROM Production.ProductInventory AS i
INNER JOIN Production.Product AS p
ON i.ProductID = p.ProductID
WHERE i.LocationID BETWEEN 3 AND 4
ORDER BY i.LocationID;
GO
结果:
ProductID Name LocationID Quantity Rank
----------- ---------------------------------- ---------- -------- -----
494 Paint - Silver 3 49 1
495 Paint - Blue 3 49 1
493 Paint - Red 3 41 2
496 Paint - Yellow 3 30 3
492 Paint - Black 3 17 4
495 Paint - Blue 4 35 1
496 Paint - Yellow 4 25 2
493 Paint - Red 4 24 3
492 Paint - Black 4 14 4
494 Paint - Silver 4 12 5
(10 row(s) affected)
NTILE
这个函数需要提供一个数字类型的参数,用于将有序数据集划分为指定数量的近似相等的组,并为每一行分配一个组号。需要注意,当总函数不能被提供的数字整除时,前面组会比后面组多1行。另外余行会优先分配给前面的组。
示例:
USE AdventureWorks2022;
GO
SELECT p.FirstName, p.LastName
,NTILE(4) OVER(ORDER BY SalesYTD DESC) AS Quartile
,CONVERT(NVARCHAR(20),s.SalesYTD,1) AS SalesYTD
, a.PostalCode
FROM Sales.SalesPerson AS s
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0;
GO
结果:
FirstName LastName Quartile SalesYTD PostalCode
------------- --------------------- --------- -------------- ----------
Linda Mitchell 1 4,251,368.55 98027
Jae Pak 1 4,116,871.23 98055
Michael Blythe 1 3,763,178.18 98027
Jillian Carson 1 3,189,418.37 98027
Ranjit Varkey Chudukatil 2 3,121,616.32 98055
José Saraiva 2 2,604,540.72 98055
Shu Ito 2 2,458,535.62 98055
Tsvi Reiter 2 2,315,185.61 98027
Rachel Valdez 3 1,827,066.71 98055
Tete Mensa-Annan 3 1,576,562.20 98055
David Campbell 3 1,573,012.94 98055
Garrett Vargas 4 1,453,719.47 98027
Lynn Tsoflias 4 1,421,810.92 98055
Pamela Ansman-Wolfe 4 1,352,577.13 98027
(14 row(s) affected)
此示例中,14行不能被4整除(14 ÷ 4 = 3 余 2),所以基础行数为3,前余数个行分配基础行数+1行,剩余组分配基础行数行。之所以这样做,是因为函数的设计目标是尽量均匀分配。
四个排名函数效果一览
USE AdventureWorks2022;
GO
SELECT p.FirstName, p.LastName
,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS "Row Number"
,RANK() OVER (ORDER BY a.PostalCode) AS Rank
,DENSE_RANK() OVER (ORDER BY a.PostalCode) AS "Dense Rank"
,NTILE(4) OVER (ORDER BY a.PostalCode) AS Quartile
,s.SalesYTD
,a.PostalCode
FROM Sales.SalesPerson AS s
INNER JOIN Person.Person AS p
ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address AS a
ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL AND SalesYTD <> 0;
| FirstName | LastName | Row Number | Rank | Dense Rank | Quartile | SalesYTD | PostalCode |
|---|---|---|---|---|---|---|---|
| Michael | Blythe | 1 | 1 | 1 | 1 | 4,557,045.05 | 98027 |
| Linda | Mitchell | 2 | 1 | 1 | 1 | 5,200,475.23 | 98027 |
| Jillian | Carson | 3 | 1 | 1 | 1 | 3,857,163.63 | 98027 |
| Garrett | Vargas | 4 | 1 | 1 | 1 | 1,764,938.99 | 98027 |
| Tsvi | Reiter | 5 | 1 | 1 | 2 | 2,811,012.72 | 98027 |
| Shu | Ito | 6 | 6 | 2 | 2 | 3,018,725.49 | 98055 |
| José | Saraiva | 7 | 6 | 2 | 2 | 3,189,356.25 | 98055 |
| David | Campbell | 8 | 6 | 2 | 3 | 3,587,378.43 | 98055 |
| Tete | Mensa-Annan | 9 | 6 | 2 | 3 | 1,931,620.18 | 98055 |
| Lynn | Tsoflias | 10 | 6 | 2 | 3 | 1,758,385.93 | 98055 |
| Rachel | Valdez | 11 | 6 | 2 | 4 | 2,241,204.04 | 98055 |
| Jae | Pak | 12 | 6 | 2 | 4 | 5,015,682.38 | 98055 |
| Ranjit | Varkey Chudukatil | 13 | 6 | 2 | 4 | 3,827,950.24 | 98055 |
LAG
LAG函数用于获取上一行的值,这在比较数据是否连续时非常有用。
假设我希望分析一张表中的数据,其ModifyDate是否是连续的,则可以使用此函数结合DATEDIFF函数,来实现
WITH GroupedDates AS (
SELECT MODIFYDATE, COUNT(1) AS CountRecords
FROM MyTable
GROUP BY MODIFYDATE
),
DateDifferences AS (
SELECT
MODIFYDATE,
--使用开窗函数,得出前一行数据的日期
LAG(MODIFYDATE) OVER (ORDER BY MODIFYDATE) AS PrevDate,
COUNTRecords
FROM GroupedDates
)
SELECT
MODIFYDATE,
CountRecords,
CASE
--使用DATEDIFF计算天数差是否为1
WHEN DATEDIFF(DAY, PrevDate, MODIFYDATE) = 1 THEN '连续'
ELSE '不连续'
END AS Continuity
FROM DateDifferences
ORDER BY MODIFYDATE DESC;
| 修改日期 | 记录数 | 连续性 |
|------------|-------|-------|
| 2022-04-06 | 261 | 连续 |
| 2022-04-05 | 237 | 连续 |
| 2022-04-04 | 279 | 连续 |
| 2022-04-03 | 262 | 连续 |
| 2022-04-02 | 334 | 连续 |
| 2022-04-01 | 280 | 不连续 |
| 2021-08-06 | 278 | 连续 |
| 2021-08-05 | 276 | 连续 |
| 2021-08-04 | 326 | 连续 |
| 2021-08-03 | 288 | 连续 |
| 2021-08-02 | 323 | 连续 |
| 2021-08-01 | 297 | 不连续 |

浙公网安备 33010602011771号