【PostgreSQL 17】11 窗口函数

和聚合函数的区别

窗口函数不是将一组数据汇总为单个结果,而是针对每一行数据,基于和它相关的一组数据计算出一个结果。
image

直接这么写会报错,缺少GROUP BY

SELECT
	employee_id,
	first_name,
	last_name,
	salary,
	AVG(salary)
FROM employees
;

image

添加OVER()

空括号表示将所有数据作为整体进行分析,所以得到的数值和聚合函数一样。

SELECT
	employee_id,
	first_name,
	last_name,
	salary,
	AVG(salary) OVER()
FROM employees
;

image

窗口函数的定义

window_function (expression, ...) OVER (
    PARTITION BY ...
    ORDER BY ...
    frame_clause
)

分区选项 PARTITION BY

用于定义分区,作用类似于 GROUP BY

image
如下代码含义:基于不同部门,分别计算平均值。

SELECT
	employee_id,
	first_name,
	last_name,
	salary,
	department_id,
	AVG(salary) OVER(PARTITION BY department_id)
FROM
	employees
;

排序选项 ORDER BY

指定分区内的排序方式
如下代码含义:在部门内部按salary降序排名

SELECT
	employee_id,
	first_name,
	last_name,
	salary,
	department_id,
	RANK() OVER(PARTITION BY department_id ORDER BY salary DESC)
FROM
	employees
;

image

窗口大小

image

求累计和:从对应分区的第一行一直累计到当前行

SELECT
	description, amount, is_income,
	SUM(amount) OVER(PARTITION BY is_income ORDER BY expense_time
	ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM expenses
;

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 是默认值,不加也可以有同样的效果。
image

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING 处理前面1行~后面1行
例如处理到这边第4行时,则计算3~5行
image

窗口选项 frame_clause

在当前分区内指定一个计算窗口。
指定了之后,分析函数不再基于分区计算,而是基于窗口内的数据进行计算。

例题:1321. 餐馆营业额变化增长
计算以 7 天(某日期 + 该日期前的 6 天)为一个时间段的顾客消费平均值。

-- 外层查询:获取最终结果,包含日期、7天总营业额和平均营业额
SELECT
    DISTINCT visited_on,  -- 去重后的访问日期
    amount,               -- 子查询计算的7天累计营业额
    -- 计算7天平均营业额并保留2位小数
    ROUND(amount::NUMERIC / 7, 2) AS average_amount
FROM (
    -- 子查询:计算每个日期往前7天(含当天)的累计营业额
    SELECT
        visited_on,
        -- 使用窗口函数计算滚动总和
        SUM(amount) OVER (
            ORDER BY visited_on  -- 按日期排序
            -- 定义窗口范围:当前行及之前6天(共7天)
            RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
        ) AS amount
    FROM Customer  -- 从顾客消费表获取数据
) t  -- 子查询别名
-- 过滤条件:只保留有完整7天数据的日期
-- 即当前日期减去6天后的日期必须存在于表中
WHERE visited_on - INTERVAL '6 days' IN (SELECT visited_on FROM Customer)
ORDER BY visited_on;  -- 按日期升序排列结果

聚合窗口函数

SELECT
	*,
	AVG(amount) OVER(),
	SUM(amount) OVER(),
	COUNT(*) OVER(),
	MIN(amount) OVER()
FROM
	sales_monthly
;

image

SELECT
	*,
	AVG(amount) OVER(PARTITION BY product),
	SUM(amount) OVER(PARTITION BY product),
	COUNT(*) OVER(PARTITION BY product),
	MIN(amount) OVER(PARTITION BY product)
FROM
	sales_monthly
;

image

SELECT
	*,
	AVG(amount) OVER(PARTITION BY product ORDER BY ym),
	SUM(amount) OVER(PARTITION BY product ORDER BY ym),
	COUNT(*) OVER(PARTITION BY product ORDER BY ym),
	MIN(amount) OVER(PARTITION BY product ORDER BY ym)
FROM
	sales_monthly
;

加了ORDER会有默认值 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
image

移动平均值

SELECT
	*,
	AVG(amount) OVER(
		PARTITION BY product 
		ORDER BY ym 
		ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
	)
FROM
	sales_monthly
;

例:厂房温度检测器获取每秒钟的温度。计算前5分钟内的平均温度。

AVG(temperature) OVER(ORDER BY ts RANGE BETWEEN interval '5 minute' PRECEDING AND CURRENT ROW)

排名窗口函数

用于对数据进行分组排名。

ROW_NUMBER 为分区中的每行数据分配一个序列号,从1开始分配
RANK 计算分区内数据名次。名次相同的话后续排名会产生跳跃
DENSE_RANK 计算分区内数据名次。名次相同的话后续排名不会产生跳跃
PERCENT_RANK 以百分比形式显示名称,名次相同的话后续排名会产生跳跃

SELECT
	description, amount, 
	ROW_NUMBER() OVER(PARTITION BY is_income ORDER BY amount DESC),
	RANK() OVER(PARTITION BY is_income ORDER BY amount DESC),
	DENSE_RANK() OVER(PARTITION BY is_income ORDER BY amount DESC),
	PERCENT_RANK() OVER(PARTITION BY is_income ORDER BY amount DESC)
FROM expenses
;

简洁写法

SELECT
	description, amount, 
	ROW_NUMBER() OVER w,
	RANK() OVER w,
	DENSE_RANK() OVER w,
	PERCENT_RANK() OVER w
FROM expenses
WINDOW w AS (PARTITION BY is_income ORDER BY amount DESC)
;

image

CUME_DIST: 计算每行数据在其分区内的累积分布,即该行数据及其之前的数据比率;取值范围大于0小于等于1。
NTILE:将分区内数据分为N等分,为每行数据计算其所在的位置。

SELECT
	description, amount, 
	CUME_DIST() OVER w,
	NTILE(5) OVER w
FROM expenses
WINDOW w AS (PARTITION BY is_income ORDER BY amount DESC)
;

image

取值窗口函数

FIRST_VALUE 返回窗口内第一行数据
LAST_VALUE 返回窗口内最后一行数据
NTH_VALUE 返回窗口内第 N 行数据

SELECT
	description, amount, expense_time,
	FIRST_VALUE(amount) OVER(PARTITION BY is_income ORDER BY amount DESC),
	LAST_VALUE(amount) OVER w,
	NTH_VALUE(amount, 2) OVER w
FROM expenses
WINDOW w AS (PARTITION BY is_income ORDER BY amount DESC
			ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
;

image

LAG 返回分区中,当前行之前的第 N 行数据
LEAD 返回分区中,当前行之后第 N 行数据

这俩不支持动态的窗口大小(frame_clause),而是以当前分区作为分析的窗口。

SELECT
	description, amount, expense_time,
	LAG(amount, 1) OVER w,
	LEAD(amount, 1) OVER w
FROM expenses
WINDOW w AS (PARTITION BY is_income ORDER BY amount)
;

image

参考资料

[1] 不剪发的Tony老师【PostgreSQL开发指南】第33节

posted @ 2025-09-22 15:45  苦涩如影相随固  阅读(17)  评论(0)    收藏  举报