SQL开窗函数

之前只使用过Row_number()函数,最近发现除了这个,还另有几个开窗函数,顺道也了解了下。

RANK

顾名思义,就是对行进行排名。不过这个排名可能会不连续,如果存在并列的情况,下一个不同值会跳过相应的排名数字。
示例如下:

USE AdventureWorks2022;  
GO  
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity  
    ,RANK() OVER   
    (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank  
FROM Production.ProductInventory AS i   
INNER JOIN Production.Product AS p   
    ON i.ProductID = p.ProductID  
WHERE i.LocationID BETWEEN 3 AND 4  
ORDER BY i.LocationID;  
GO

结果如下:

ProductID   Name                   LocationID   Quantity Rank  
----------- ---------------------- ------------ -------- ----  
494         Paint - Silver         3            49       1  
495         Paint - Blue           3            49       1  
493         Paint - Red            3            41       3  
496         Paint - Yellow         3            30       4  
492         Paint - Black          3            17       5  
495         Paint - Blue           4            35       1  
496         Paint - Yellow         4            25       2  
493         Paint - Red            4            24       3  
492         Paint - Black          4            14       4  
494         Paint - Silver         4            12       5  
 (10 row(s) affected)

DENSE_RANK

DENSE_RANKRANK很像,都是用于排名的,唯一区别就是这个排名是连续的,而上一个是不连续的。
示例:

USE AdventureWorks2022;  
GO  
SELECT i.ProductID, p.Name, i.LocationID, i.Quantity  
    ,DENSE_RANK() OVER   
    (PARTITION BY i.LocationID ORDER BY i.Quantity DESC) AS Rank  
FROM Production.ProductInventory AS i   
INNER JOIN Production.Product AS p   
    ON i.ProductID = p.ProductID  
WHERE i.LocationID BETWEEN 3 AND 4  
ORDER BY i.LocationID;  
GO

结果:

ProductID   Name                               LocationID Quantity Rank  
----------- ---------------------------------- ---------- -------- -----  
494         Paint - Silver                     3          49       1  
495         Paint - Blue                       3          49       1  
493         Paint - Red                        3          41       2  
496         Paint - Yellow                     3          30       3  
492         Paint - Black                      3          17       4  
495         Paint - Blue                       4          35       1  
496         Paint - Yellow                     4          25       2  
493         Paint - Red                        4          24       3  
492         Paint - Black                      4          14       4  
494         Paint - Silver                     4          12       5  
  
(10 row(s) affected)

NTILE

这个函数需要提供一个数字类型的参数,用于将有序数据集划分为指定数量的近似相等的组,并为每一行分配一个组号。需要注意,当总函数不能被提供的数字整除时,前面组会比后面组多1行。另外余行会优先分配给前面的组。
示例:

USE AdventureWorks2022;   
GO  
SELECT p.FirstName, p.LastName  
    ,NTILE(4) OVER(ORDER BY SalesYTD DESC) AS Quartile  
    ,CONVERT(NVARCHAR(20),s.SalesYTD,1) AS SalesYTD  
    , a.PostalCode  
FROM Sales.SalesPerson AS s   
INNER JOIN Person.Person AS p   
    ON s.BusinessEntityID = p.BusinessEntityID  
INNER JOIN Person.Address AS a   
    ON a.AddressID = p.BusinessEntityID  
WHERE TerritoryID IS NOT NULL   
    AND SalesYTD <> 0;  
GO

结果:

FirstName      LastName              Quartile  SalesYTD       PostalCode  
-------------  --------------------- --------- -------------- ----------  
Linda          Mitchell              1         4,251,368.55   98027  
Jae            Pak                   1         4,116,871.23   98055  
Michael        Blythe                1         3,763,178.18   98027  
Jillian        Carson                1         3,189,418.37   98027  
Ranjit         Varkey Chudukatil     2         3,121,616.32   98055  
José           Saraiva               2         2,604,540.72   98055  
Shu            Ito                   2         2,458,535.62   98055  
Tsvi           Reiter                2         2,315,185.61   98027  
Rachel         Valdez                3         1,827,066.71   98055  
Tete           Mensa-Annan           3         1,576,562.20   98055  
David          Campbell              3         1,573,012.94   98055  
Garrett        Vargas                4         1,453,719.47   98027  
Lynn           Tsoflias              4         1,421,810.92   98055  
Pamela         Ansman-Wolfe          4         1,352,577.13   98027  

(14 row(s) affected)

此示例中,14行不能被4整除(14 ÷ 4 = 3 余 2),所以基础行数为3,前余数个行分配基础行数+1行,剩余组分配基础行数行。之所以这样做,是因为函数的设计目标是尽量均匀分配

四个排名函数效果一览

USE AdventureWorks2022;  
GO  
SELECT p.FirstName, p.LastName  
    ,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS "Row Number"  
    ,RANK() OVER (ORDER BY a.PostalCode) AS Rank  
    ,DENSE_RANK() OVER (ORDER BY a.PostalCode) AS "Dense Rank"  
    ,NTILE(4) OVER (ORDER BY a.PostalCode) AS Quartile  
    ,s.SalesYTD  
    ,a.PostalCode  
FROM Sales.SalesPerson AS s   
    INNER JOIN Person.Person AS p   
        ON s.BusinessEntityID = p.BusinessEntityID  
    INNER JOIN Person.Address AS a   
        ON a.AddressID = p.BusinessEntityID  
WHERE TerritoryID IS NOT NULL AND SalesYTD <> 0;
FirstName LastName Row Number Rank Dense Rank Quartile SalesYTD PostalCode
Michael Blythe 1 1 1 1 4,557,045.05 98027
Linda Mitchell 2 1 1 1 5,200,475.23 98027
Jillian Carson 3 1 1 1 3,857,163.63 98027
Garrett Vargas 4 1 1 1 1,764,938.99 98027
Tsvi Reiter 5 1 1 2 2,811,012.72 98027
Shu Ito 6 6 2 2 3,018,725.49 98055
José Saraiva 7 6 2 2 3,189,356.25 98055
David Campbell 8 6 2 3 3,587,378.43 98055
Tete Mensa-Annan 9 6 2 3 1,931,620.18 98055
Lynn Tsoflias 10 6 2 3 1,758,385.93 98055
Rachel Valdez 11 6 2 4 2,241,204.04 98055
Jae Pak 12 6 2 4 5,015,682.38 98055
Ranjit Varkey Chudukatil 13 6 2 4 3,827,950.24 98055

LAG

LAG函数用于获取上一行的值,这在比较数据是否连续时非常有用。
假设我希望分析一张表中的数据,其ModifyDate是否是连续的,则可以使用此函数结合DATEDIFF函数,来实现

WITH GroupedDates AS (
    SELECT MODIFYDATE, COUNT(1) AS CountRecords
    FROM MyTable
    GROUP BY MODIFYDATE
),
DateDifferences AS (
    SELECT
        MODIFYDATE,
		--使用开窗函数,得出前一行数据的日期
        LAG(MODIFYDATE) OVER (ORDER BY MODIFYDATE) AS PrevDate,
        COUNTRecords
    FROM GroupedDates
)
SELECT
    MODIFYDATE,
    CountRecords,
    CASE
	--使用DATEDIFF计算天数差是否为1
        WHEN DATEDIFF(DAY, PrevDate, MODIFYDATE) = 1 THEN '连续'
        ELSE '不连续'
    END AS Continuity
FROM DateDifferences
ORDER BY MODIFYDATE DESC;
| 修改日期    | 记录数 | 连续性  |
|------------|-------|-------|
| 2022-04-06 | 261   | 连续   |
| 2022-04-05 | 237   | 连续   |
| 2022-04-04 | 279   | 连续   |
| 2022-04-03 | 262   | 连续   |
| 2022-04-02 | 334   | 连续   |
| 2022-04-01 | 280   | 不连续 |
| 2021-08-06 | 278   | 连续   |
| 2021-08-05 | 276   | 连续   |
| 2021-08-04 | 326   | 连续   |
| 2021-08-03 | 288   | 连续   |
| 2021-08-02 | 323   | 连续   |
| 2021-08-01 | 297   | 不连续 |
posted @ 2025-05-26 16:04  南山有榛  阅读(17)  评论(0)    收藏  举报