postgres —— 窗口函数入门
注:测试数据在 postgres —— 分组集与部分聚集 中
聚集将多行转变成较少、聚集的行。而窗口则不同,它把当前行与分组中的所有行对比,并且返回的行数没有变化。
组合当前行与 production 的均值
SELECT country, year, production,comsumption, avg(production) over() from t_oil;
1.分组
AVG(production) 计算的是每年(year
)所有国家的平均石油产量(production
)
用途:对比各国产量与年度平均水平的差异
SELECT country, year, production,comsumption, avg(production) over(partition by year) from t_oil;
AVG(production) 计算的是当前行所属时间段的(2000年前或2000年后)的平均石油产量(production
)
用途:对比各国产量与2000年内(后)的平均水平的差异
SELECT country, year, production,comsumption, avg(production) over(partition by year > 2000) from t_oil;
2.排序
MIN(production)
计算的是 从分组开始到当前行的逐年累计最小值(滚动最小值)。
例如:伊朗1978-1980年的产量为 [100, 90, 110]
,则结果中的min(production)
列会依次为 100, 90, 90
。
特点:每行的min(production)
是该国家 截至当前年份的历史最低产量
SELECT country, year, production, min(production) over (partition by country order by year) from t_oil where year between 1978 and 1983 and country in ('Iran', 'Oman')
注:可以看到,随着行数增加,最小值一直在改变。比如,到第5行为止时,最小值为 1321。
3.滑动窗口
① 移动窗口应与 order by 一起使用。此窗口的范围是当前行,当前行的前一行,当前行的下一行
SELECT country, year, production, min(production) over (PARTITION by country ORDER BY year ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) from t_oil WHERE year BETWEEN 1978 AND 1983 AND country IN ('Iran', 'Oman');
② 窗口的范围与上面一样。为上一行,当前行与下一行。注:array_agg 的作用是将范围内的该字段值转变成一个 postgresql 数组
SELECT *, array_agg(id) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM generate_series(1, 5) as id;
③ 此窗口的范围是:当前行与当前行之前的行
SELECT *, array_agg(id) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) FROM generate_series(1, 5) as id;
④ 此窗口的范围是:当前行的前两行,当前行以及当前行之后的行
SELECT *, array_agg(id) OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND UNBOUNDED FOLLOWING) FROM generate_series(1, 5) as id;
4.提取窗口子句
组合当前行,按年份排序后 到当前行为止 最小与最大的产量
SELECT country, year, production, min(production) OVER (w), max(production) OVER (w) FROM t_oil WHERE country = 'Canada' AND year BETWEEN 1980 AND 1985 WINDOW w AS (ORDER BY year);
5.使用内建窗口函数
四大排名函数
-- rank SELECT year, production, rank() over (order by production) from t_oil where country = 'Other Middle East' order by rank limit 7; -- dense_rank SELECT year, production, dense_rank() over (order by production) from t_oil where country = 'Other Middle East' order by dense_rank limit 7; -- ntile SELECT year, production, ntile(4) OVER (order by production) from t_oil where country = 'Iraq' and year between 2000 and 2006; -- row_number select country, production, row_number() over (order by production) from t_oil limit 3;
233