over() (分析函数)


聚合函数(如 sum()、max() 等)可以计算基于组的某种聚合值,但是聚合函数对于某个组只能返回一行记录。若想对于某组返回多行记录,则需要使用分析函数

演示数据

create table rownumber(
       id varchar(10) not null,
       name varchar(10) null,
       age varchar(10) null,
       salary int null,
       team varchar(10) null
);

insert into rownumber(id,name,age,salary,team) values(1,'a',10,8000, 'a');
insert into rownumber(id,name,age,salary,team) values(2,'a2',11,7500, 'a');
insert into rownumber(id,name,age,salary,team) values(3,'b',12,7500, 'b');
insert into rownumber(id,name,age,salary,team) values(4,'b2',13,4500, 'b');
insert into rownumber(id,name,age,salary,team) values(5,'c',14,8000, 'c');
insert into rownumber(id,name,age,salary,team) values(6,'c2',15,20000, 'c');
insert into rownumber(id,name,age,salary,team) values(7,'d',16,30000, 'd');
insert into rownumber(id,name,age,salary,team) values(8,'d2',17,8000, 'd');

select * from rownumber;

![[Pasted image 20241202143656.png]]

rank()/dense_rank()

rank()/dense_rank over(partition by ... order by ...)

说明:

  • over() 在什么条件之上;
  • partition by 按哪个字段划分组(如果要分组必须,有此关键字 partition);
  • order by 按哪个字段排序;
    注意:
  • 使用 rank()/dense_rank() 时,必须要带 order by 否则非法
  • rank()/dense_rank() 分级的区别:

rank(): 跳跃排序,如果有两个第一级时,接下来就是第三级。
dense_rank():连续排序,如果有两个第一级时,接下来仍然是第二级。

示例

根据 salary 排序,跳过同级

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	rank() over(order by salary desc) as "rank" 
from rownumber

![[Pasted image 20241202142224.png]]

根据 salary 排序,不跳过同级

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	dense_rank() over(order by salary desc) as "rank" 
from rownumber

![[Pasted image 20241202142235.png]]

min()/max()

min()/max() over(partition by ... order by ...)

说明:

  • over() 在什么条件之上;
  • partition by 按哪个字段划分组(如果要分组必须,有此关键字 partition);
  • order by 按哪个字段排序;

示例

取每个 team 中 salary 最大的

select * from 
	(select id, name, age, salary, team, max(salary) over(partition by team order by salary desc) as "rank" from rownumber) res 
where res.rank = res.salary;

![[Pasted image 20241202142210.png]]

取每个 team 中 salary 最小的

select * from 
	(
		select 
			id, 
			name, 
			age, 
			salary, 
			team, 
			min(salary) over(partition by team order by salary asc) as "rank" 
		from rownumber
	) res 
where res.rank = res.salary;

![[Pasted image 20241202142036.png]]

lead()/lag()

lead()/lag() over(partition by ... order by ...) 

前面/后面n行记录说明:

  • lead(列名,n,m): 当前记录后面第 n 行记录的列名的值,没有则默认值为 m;如果不带参数 n,m,则查找当前记录后面第一行的记录列名的值,没有则默认值为 null。
  • lag(列名,n,m): 当前记录前面第n行记录的列名的值,没有则默认值为m;如果不带参数 n,m,则查找当前记录前面第一行的记录列名的值,没有则默认值为 null

示例

查询 salary 与比自己高一位、低一位的 salary 的差额

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	lead(salary, 1, 0) over(partition by team order by salary asc) as lead_sal, --记录后面第n行记录
	lag(salary, 1, 0) over(partition by team order by salary asc) as lag_sal --记录前面第N行记录
from rownumber;

![[Pasted image 20241202142718.png]]

first_value()/last_value()

取首尾记录

first_value()/last_value() over(partition by ... order by ...) 

示例

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	first_value(salary) over(partition by team) as first_sal,
	last_value(salary) over(partition by team) as last_sal
from rownumber;

![[Pasted image 20241202143155.png]]

row_number()

row_number() over(partition by ... order by ...) 

排序(应用:分页)

示例

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	row_number() over(partition by team order by salary) as row_num
from rownumber;

![[Pasted image 20241202143434.png]]

sum/avg/count()

sum/avg/count() over(partition by ..)

示例

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	sum(salary) over(partition by team) as sum_sal,			--统计某组中的总计值
	avg(salary) over(partition by team) as avg_sal,			--统计某组中的平均值
	count(salary) over(partition by team) as count_sal	--按某列分组,并统计该组中记录数量
from rownumber;

![[Pasted image 20241202143638.png]]

rows/range

rows/range  between … preceding and … following

上下范围内求值说明:

  • unbounded:不受控制的,无限的
  • preceding:在…之前
  • following:在…之后

示例

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	max(salary) over(partition by team order by salary rows 
	--unbounded preceding and unbounded following针对当前所有记录的前一条、后一条记录,也就是表中的所有记录 
	--unbounded:不受控制的,无限的 
	--preceding:在...之前 
	--following:在...之后
	between unbounded preceding and unbounded following) as max_sal
from rownumber;

![[Pasted image 20241202144318.png]]

nulls first/last

将空值字段记录放到最前或最后显示
说明:通过 rank()、dense_rank()、row_number() 对记录进行全排列、分组排列取值,但有时候,会遇到空值的情况,空值会影响得到的结果的正确性
nulls first/last 可以帮助我们在处理含有空值的排序排列中,将空值字段记录放到最前最后显示,帮助我们得到期望的结果

示例

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	rank() over(partition by team order by salary desc nulls last) as "rank" 
from rownumber

ntile(n)

有时会有这样的需求:如果数据排序后分为三部分,业务人员只关心其中的一部分,如何将这中间的三分之一数据拿出来呢?
这时比较好的选择,就是使用 ntile 函数

示例

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	max(salary) over(partition by team order by salary rows between unbounded preceding and unbounded following) as max_sal
	ntile(3) over(order by salary desc nulls last) all_cmp, --若只取前三分之一,all_cmp=1即可,若只取中间三分之一,all_cmp=2即可
  ntile(3) over(partition by team order by salary desc nulls last) all_team  --每个team的分成三部分
from rownumber;

keep(dense_rank first/last)

keep字面意思就是保持,也就是说保存满足 keep()括号内条件的记录,这里可以想象到,会有多条记录的情况,即存在多个 last 或 first 的情况

  • dense_rank 是排序策略
  • first/last 是筛选策略

示例

select 
	id, 
	name, 
	age, 
	salary, 
	team, 
	min(salary) keep(dense_rank first order by salary) min_sal
from rownumber;
posted @ 2025-01-21 08:55  Thousand_Mesh  阅读(29)  评论(0)    收藏  举报