SQL 笔记

SQL——窗口函数专题

概述

作用：在不聚合数据行的前提下，对每行动态计算分组排名、累计值、相邻行差值等。

典型场景：

各部门工资排名前 N 名的员工列表
各部门每人工资占部门总工资的百分比

基本语法：

<窗口函数> over (
    partition by <分组列>
    order by <排序列>
)

执行顺序（逻辑上）：

FROM → JOIN → WHERE → GROUP BY → HAVING
    → 【窗口函数】
        → ORDER BY → LIMIT → SELECT DISTINCT

窗口函数执行时 group by 的聚合已完成，因此不会再产生数据聚合，每行数据仍独立保留。

一、专用窗口函数

1.1 序号函数

函数	并列处理	编号示例（分数 90,90,80,70）
`row_number()`	不考虑并列，顺序编号	1, 2, 3, 4
`rank()`	并列后跳号	1, 1, 3, 4
`dense_rank()`	并列后不跳号	1, 1, 2, 3

示例：找到每类试卷得分前三名（依次按最大分数、最小分数、uid 降序）

select tag, uid, ranking
from (
    select tag, uid,
        rank() over (
            partition by tag
            order by max(score) DESC, min(score) DESC, uid DESC
        ) as ranking
    from exam_record
    group by tag, uid
) as rank_info
where ranking <= 3;

1.2 分布函数

1.2.1 percent_rank()

公式：(rank - 1) / (rows - 1)
其中：rank 为 RANK() 产生的序号，rows 为当前窗口的总行数
结果范围：[0, 1]

1.2.2 cume_dist()

公式：分组内 ≥ 当前 rank 值的行数 / 分组内总行数
结果范围：(0, 1]
典型场景：班级中成绩不低于当前同学的学生比例

cume_dist() 比 percent_rank() 使用场景更广。

示例：统计 SQL 试卷上未完成率较高的前 50% 用户

select uid
from (
    select uid,
        percent_rank() over (
            order by incomp_rate DESC, uid
        ) as incomp_rate_rank
    from (
        select uid,
               1 - count(submit_time) / count(1) as incomp_rate  -- 每人未完成率
        from exam_record
        where exam_id in (
            select exam_id from examination_info where tag = 'SQL'
        )
        group by uid
    ) as t_exam_incom_rate
) as t_exam_incom_rate_rank
where incomp_rate_rank <= 0.5;

1.3 前后函数

典型场景：计算每个用户相邻两次浏览/作答的时间差。

函数	说明
`lead(列, n)`	取当前行后 n 行的值（向后看）
`lag(列, n)`	取当前行前 n 行的值（向前看）

示例：计算每位用户连续两次作答的最大时间窗

select uid,
    count(start_time)                               as exam_cnt,     -- 总作答试卷数
    datediff(max(start_time), min(start_time)) + 1  as diff_days,    -- 最早到最晚作答相差天数
    max(datediff(next_start_time, start_time)) + 1  as days_window   -- 相邻两次作答的最大时间窗
from (
    select uid, exam_id, start_time,
        lead(start_time) over (
            partition by uid order by start_time
        ) as next_start_time   -- 将下一次作答时间拼到当前行
    from exam_record
    where year(start_time) = 2021
) as t_exam_record_lead
group by uid;

1.4 头尾函数

函数	说明
`first_value(列)`	返回分区中第一行的指定列值
`last_value(列)`	返回分区中最后一行的指定列值

1.5 其他函数

1.5.1 nth_value(列, n)

返回窗口中第 n 行的指定列值，列 可以是表达式或列名。

示例：查询每套试卷作答用时第二快和第二慢的时间

select distinct exam_id, duration, release_time,
    nth_value(time_took, 2) over (
        partition by exam_id order by time_took DESC
    ) as max2_time_took,
    nth_value(time_took, 2) over (
        partition by exam_id order by time_took ASC
    ) as min2_time_took
from t_exam_record_timetook;

1.5.2 ntile(n)

将分区内有序数据均匀分成 n 个桶，返回桶号（1 到 n）。

记录数不能被 n 整除时，多余的行依次分配给第 1、2、3... 桶。
典型场景：将大量数据平均分配给 N 个并行进程分别处理。

二、聚合窗口函数

在窗口内对每行动态应用聚合函数，不折叠行数，可计算累计值、滚动均值等。

典型场景：截至当前课程，每个学生的累计总分 / 平均分 / 最高分 / 最低分 / 课程数。

示例：用 window 别名复用相同窗口定义，避免重复书写

select s.sname, c.cname, sc.score,
    sum(sc.score)   over w as sum_val,
    avg(sc.score)   over w as avg_val,
    max(sc.score)   over w as max_val,
    min(sc.score)   over w as min_val,
    count(sc.score) over w as count_val
from student s
inner join score  sc on s.sid = sc.sid
inner join course c  on sc.cid = c.cid
window w as (partition by s.sname order by sc.score DESC);

⚠️ 聚合窗口函数括号内不可为空，必须填写要聚合的列名。

三、函数速查表

类别	函数	核心用途
序号	`row_number()`	不并列连续编号
序号	`rank()`	并列跳号
序号	`dense_rank()`	并列不跳号
分布	`percent_rank()`	百分比排名，范围 [0,1]
分布	`cume_dist()`	累积分布，范围 (0,1]
前后	`lead(列, n)`	取后 n 行的值
前后	`lag(列, n)`	取前 n 行的值
头尾	`first_value(列)`	分区第一行值
头尾	`last_value(列)`	分区最后一行值
其他	`nth_value(列, n)`	分区第 n 行值
其他	`ntile(n)`	均匀分桶，返回桶号
聚合	`sum/avg/max/min/count`	动态累计聚合，不折叠行

SQL 学习笔记（一）

1. 截取、拼接、转换及长度判断

1.1 截取字符串

1.1.1 从左开始截取

left(文本字段, 截取长度)

1.1.2 从右开始截取

right(文本字段, 截取长度)

1.1.3 截取字符串

substring(文本字段, 起始位置, 截取字符数)
-- 或
mid(文本字段, 起始位置, 截取字符数)

注：若起始位置为负数，则从字符串末尾倒数计位。

1.1.4 按关键字截取

substring_index(文本字段, 关键字, 关键字出现的次数)

示例：截取第 2 个句号之前的内容

select substring_index('blog.jb51.net', '.', 2) as abstract
from my_content_t;
-- 结果：blog.jb51

1.2 拼接

1.2.1 普通拼接

concat(字段1, 字段2, ...)

1.2.2 带分隔符拼接

concat_ws(分隔符, 字段1, 字段2, ...)

第一个参数为分隔符，放在后续字段之间；

分隔符为 NULL 则结果为 NULL；

会忽略 NULL 值，但不会忽略空字符串。

1.2.3 分组拼接

group_concat([distinct] 字段 [order by ASC/DESC 排序字段] [SEPARATOR '分隔符'])

将 group by 分组后的值连接成一个字符串，默认以逗号分隔。

示例：统计有效用户的未完成/完成试卷数及作答 tag 集合

select
    uid,
    sum(incomplete)                                                    as incomplete_cnt,
    sum(complete)                                                      as complete_cnt,
    group_concat(distinct concat_ws(':', day_start, tag) SEPARATOR ';') as detail
from record
group by uid
having complete_cnt >= 1
   and incomplete_cnt between 2 and 4
order by incomplete_cnt DESC;

注：许多数据库会用空格填充列宽，可用 trim() 去除首尾空格。

1.3 文本处理

1.3.1 大小写转换

upper(字段)  -- 或 ucase(字段)   转大写
lower(字段)  -- 或 lcase(字段)   转小写

1.3.2 字段类型转换

cast(字段 as 数据类型)

数据类型	说明
`BINARY`	二进制，同带 `BINARY` 前缀的效果
`CHAR`	字符型，可带参数
`date`	日期
`TIME`	时间
`DATETIME`	日期时间型
`DECIMAL`	浮点数
`SIGNED`	整数
`UNSIGNED`	无符号整数

注：不能使用用户自定义的数据类型。

1.4 长度判断

1.4.1 按字符计算

char_length(字段)

汉字、数字、字母均算 1 个字符。

1.4.2 按字节计算

length(字段)

编码	汉字	字母/数字
UTF-8	3 字节	1 字节
GBK	2 字节	1 字节

2. 联接与合并

核心区别：联接是对列操作（增加匹配维度）；合并是对行操作（增加数据行）。

2.1 联接

2.1.1 where 连接（隐式）

select *
from a1, a2
where a1.factor = a2.factor;

2.1.2 join 连接（显式）

类型	语法	说明
内联接	`inner join`	两表均有匹配才返回
左联接	`left join`	返回左表全部行，右表无匹配则 NULL
右联接	`right join`	返回右表全部行，左表无匹配则 NULL
全联接	`full join`	任一表有匹配即返回

2.2 合并

union      -- 合并两表行数据，自动去重
union all  -- 合并两表行数据，保留重复行

3. 模糊匹配

3.1 like / not like

-- 字符串值需加引号
where City like '[!ALN]%'  -- 不以 A、L、N 开头的城市

通配符	含义
`%`	0 个或多个字符
`_`	仅替代 1 个字符
`[charlist]`	字符列中的任意单一字符
`[^charlist]` 或 `[!charlist]`	不在字符列中的任意单一字符

⚠️ 通配符位于开头时匹配极慢，避免滥用。

3.2 正则匹配 regexp / rlike

where 字段 REGEXP '正则表达式'
-- RLIKE 与 REGEXP 为同义词

符号	含义
`^`	匹配字符串开头
`	符号
--------	----------------------------
`^`	匹配字符串开头

  | 匹配字符串结尾               |

| . | 匹配除 \n 外的任意单个字符 |
| [...] | 匹配括号内任意字符 |
| [^...] | 匹配不在括号内的字符 |
| p1\|p2 | 匹配 p1 或 p2 |
| * | 前面元素出现 0 次或多次 |
| + | 前面元素出现 1 次或多次 |
| {n} | 精确匹配 n 次 |
| {n,m} | 匹配 n 到 m 次 |

3.3 匹配多个特定值

3.3.1 in 子查询

select *
from Orders
where order_num in (
    select order_num
    from OrderItems
    where prod_id = 'BR01'
);

3.3.2 exists / not exists

select *
from student
where exists (
    select sno
    from student_course_relation
    where cno = 1
      and student_course_relation.sno = student.sno
);

exists 只关心子查询是否返回行（返回布尔值），与具体结果集内容无关。

4. 排序与汇总

4.1 排序

order by 字段 ASC/DESC

注：存在拼接数据时，需先拼接再排序。

4.2 with rollup 汇总

group by 字段 with rollup

在 group by 结果末尾追加汇总行，用 null 标记；
使用前应将 group by 字段中的 null 替换为不可能出现的常量，避免混淆；
with rollup 后不可直接使用 order by，如需排序需在外层再套一层查询。

5. 时间相关

5.1 提取特定时间部分

5.1.1 日期函数

date(时间字段)    -- 返回日期部分
month(时间字段)   -- 返回月份
year(时间字段)    -- 返回年份

示例：筛选 2021 年 8 月

where year(date) = 2021 and month(date) = 8

5.1.2 extract

extract(参数 from 时间字段)

常用参数	说明
`second`	秒
`minute`	分
`hour`	时
`day`	日
`week`	周
`month`	月
`quarter`	季度
`year`	年
`YEAR_MONTH`	年和月
`DAY_HOUR`	日和小时
`HOUR_MINUTE`	时和分
`MINUTE_SECOND`	分和秒

5.2 时间间隔计算

5.2.1 增加时间间隔

date_add(时间字段, INTERVAL 数值 单位)

5.2.2 减去时间间隔

date_sub(时间字段, INTERVAL 数值 单位)

常用单位同 date_add：second、minute、hour、day、week、month、quarter、year 等。

5.2.3 计算时间差

datediff(结束时间, 开始时间)                      -- 返回天数差（只计算日期部分）
timestampdiff(单位, 开始时间, 结束时间)            -- 按指定单位返回差值

单位	说明
`second`	秒
`minute`	分钟
`hour`	小时
`day`	天
`month`	月
`year`	年

5.3 时间格式转化

date_format(时间字段, 格式)

格式符	含义
`%Y`	年（4 位）
`%m`	月（01-12）
`%d`	日（01-31）
`%H`	小时（00-23）
`%i`	分钟（00-59）
`%s`	秒（00-59）

示例：筛选 2021 年 8 月

where date_format(date, '%Y-%m') = '2021-08'

5.4 返回当月第几天

dayofmonth(时间字段)

5.5 返回当前时间

函数	说明
`now()`	当前日期和时间
`curdate()`	当前日期
`curtime()`	当前时间

6. 多分支函数 case

6.1 简单 case（等值匹配）

case 测试表达式
    when 值1 then 结果1
    when 值2 then 结果2
    ...
    [else 默认结果]
end

示例：系号转系名

select 班号, 班名,
    case 系号
        when 1 then '软件工程系'
        when 2 then '计算机系'
        when 3 then '物联网系'
    end as 系号,
    班主任号
from 班级表;

6.2 搜索 case（范围/条件匹配）

case
    when 布尔表达式1 then 结果1
    when 布尔表达式2 then 结果2
    ...
    [else 默认结果]
end

示例：按年龄段统计用户数

select
    case
        when age < 25 or age is null then '25岁以下'
        when age >= 25               then '25岁及以上'
    end as age_cut,
    count(*) as number
from user_profile
group by age_cut;

7. 空值处理

7.1 ifnull

ifnull(表达式1, 表达式2)

若表达式1 不为 NULL，返回表达式1；否则返回表达式2。
根据上下文返回字符串或数字。

7.2 if

if(条件, 结果1, 结果2)

条件为真返回结果1，条件为假返回结果2。
适合基于 true/false 而非 NULL 判断的场景。

7.3 coalesce

coalesce(字段1, 字段2, ...)

依次检查各参数，返回第一个非 NULL 值；
所有参数均为 NULL 时返回 NULL。

8. 子查询与公用表表达式

8.1 with as（公用表表达式 CTE）

with 表名 as (
    select ...
)
select ...
from 表名;

定义可复用的 SQL 片段，提升可读性，减少嵌套；
可定义多个 CTE，用逗号分隔。

示例：

with t_tag_count as (
    select tag, count(uid) as answer_cnt
    from exam_record
    left join examination_info using(exam_id)
    group by tag
)
select *
from t_tag_count
where answer_cnt > 10;

9. 指定返回行数

9.1 limit（MySQL）

limit [偏移量,] 行数

偏移量从 0 开始（默认 0）；
limit n 等价于 limit 0, n；
行数为 -1 表示到最后一行。

示例：每页 10 条，取第 6 页数据

select *
from articles
where category_id = 123
order by id
limit 50, 10;   -- 跳过前 50 条，取第 51-60 条

9.2 TOP（SQL Server）

select TOP n 列名 from 表名
select TOP n PERCENT 列名 from 表名

注：TOP 并非所有数据库支持，MySQL 使用 limit。

10. 其他

10.1 and 与 or 优先级

默认 and 优先级高于 or；
使用括号 () 显式控制优先级，避免歧义。

-- 错误理解：可能优先执行了 and
where a = 1 or b = 2 and c = 3

-- 明确意图：用括号
where (a = 1 or b = 2) and c = 3

SQL 学习笔记（二）

一、增删改操作

1. 插入

1.1 全字段插入

insert into 表名 values (值1, 值2, ...);

1.2 部分字段插入

insert into 表名 (字段1, 字段2, ...) values (值1, 值2, ...);

1.3 一次性插入多条记录

insert into 表名 (字段1, 字段2, ...)
values (值1-1, 值1-2, ...),
       (值2-1, 值2-2, ...);

1.4 从其他表查询后插入

方式一：insert into ... select

insert into 表名1 (字段1, 字段2, ...)
select 字段1, 字段2, ...
from 表名2
where 条件;

方式二：select into（复制到新表）

select 字段1, 字段2, ...
into 新表名 [in 外部数据库]
from 旧表名;

示例：将 2021 年之前已完成的试卷作答记录导入备份表

insert into exam_record_before_2021
    (uid, exam_id, start_time, submit_time, score)
select uid, exam_id, start_time, submit_time, score
from exam_record
where year(submit_time) < 2021;

1.5 更新性插入（replace into）

replace into 表名 (字段1, 字段2, ...) values (值1, 值2, ...);

原理：检测到主键或唯一索引键冲突时，先删除原记录再重新插入。

示例：不管 ID 是否存在，都强制插入一条试卷记录

replace into examination_info
    (exam_id, tag, difficulty, duration, release_time)
values (9003, 'SQL', 'hard', '90', '2021-01-01 00:00:00');

2. 更新

2.1 设置为新值

update 表名
set 字段1 = 值1, 字段2 = 值2, ...
where 条件;

示例：将 2021 年 9 月 1 日前开始、未完成的记录设为被动完成

update exam_record
set submit_time = '2099-01-01 00:00:00', score = 0
where start_time < '2021-09-01 00:00:00'
  and submit_time is null;

2.2 替换已有值

update 表名
set 字段 = replace(字段, '查找内容', '替换内容')
where 条件;

示例：将 tag 为 PYTHON 全部改为 Python

update examination_info
set tag = replace(tag, 'PYTHON', 'Python')
where tag = 'PYTHON';

3. 删除

3.1 条件删除

delete from 表名
where 条件
order by 排序字段
limit n;

示例：删除未完成或作答时长不足 5 分钟的记录中最早的 3 条

delete from exam_record
where timestampdiff(minute, start_time, submit_time) < 5
   or submit_time is null
order by start_time
limit 3;

3.2 清空全表

truncate 表名;

与 delete 的区别：truncate 不写日志、不可回滚、速度更快；delete 可加 where 条件且可回滚。

二、表与索引操作

1. 表的创建、修改与删除

1.1 创建表

方式一：直接创建

create table [if not exists] 表名 (
    列名1  数据类型  [约束],
    列名2  数据类型  [约束],
    ...
) [character set 字符集] [COLLATE 校对规则];

常用约束：

约束	说明
`primary key`	主键，唯一且非空
`foreign key`	外键，引用其他表的主键
`auto_increment`	自增 ID
`not null`	该列不允许为空
`unique`	该列值唯一，不允许重复
`default 值`	指定默认值
`check (条件)`	限制列值范围
`COMMENT '注释'`	列注释

方式二：复制表结构

create table 新表名 like 旧表名;

方式三：查询结果建表

create table 新表名 as
select *
from 旧表名
where 条件;

1.2 修改表

alter table 表名 修改选项;

修改选项	说明
`ADD COLUMN 列名类型`	增加列
`CHANGE COLUMN 旧列名新列名新类型`	修改列名或类型
`MODIFY COLUMN 列名新类型`	仅修改列类型
`alter COLUMN 列名 set default 值`	修改列默认值
`alter COLUMN 列名 drop default`	删除列默认值
`drop COLUMN 列名`	删除列
`RENAME TO 新表名`	修改表名
`character set 字符集`	修改字符集
`COLLATE 校对规则`	修改校对规则

1.3 删除表

drop table [if exists] 表名1 [, 表名2, ...];

1.4 约束说明

约束	说明
`not null`	强制列不接受 NULL，插入/更新时必须有值
`unique`	列值唯一，每表可有多个
`primary key`	唯一标识每条记录，自带 UNIQUE + NOT NULL，每表只能有一个
`foreign key`	指向另一表的 PRIMARY KEY，防止非法数据插入
`check`	限制列值范围（单列或多列）
`default`	插入时若未指定值则使用默认值

2. 索引的创建与删除

2.1 创建索引

方式一：create index

create [unique | FULLTEXT] index 索引名
on 表名 (列1 [(长度)] [ASC|DESC], 列2, ...);

方式二：alter table

alter table 表名 ADD [unique | FULLTEXT] [index] 索引名 (列名);

不指定类型时默认为普通索引；

可对多列建立组合索引；

FULLTEXT 为全文索引，适用于文本搜索。

2.2 删除索引

方式一：drop index

drop index 索引名 on 表名;

方式二：alter table

alter table 表名 drop index 索引名;

三、视图

视图是基于 SQL 查询结果的虚拟表，不存储数据，查询时由数据库引擎实时重建。

优点：

简化复杂 SQL（如多表连接）
只暴露部分数据，保障数据安全
统一数据格式与展示方式

1. 创建视图

create view 视图名 as
select 列名
from 表名
where 条件;

2. 更新视图

create or replace view 视图名 as
select 列名
from 表名
where 条件;

3. 删除视图

drop view 视图名;

四、聚合分组查询

1. 聚合函数

函数	说明
`sum(列)`	求和
`count(列)`	计数
`avg(列)`	平均值
`max(列)`	最大值
`min(列)`	最小值

⚠️ 除 count() 外，所有聚合函数忽略 NULL 值。

2. 分组

只有前面使用了聚合函数的时候，才需要group by, 并且需要group by所有除了聚合函数以外的列名（column_name）

2.1 group by

SELECT 列名1, 列名2,聚合函数(列名1)
FROM 表名
WHERE 列名1 运算符 value
GROUP BY 列名1, 列名2;
HAVING 聚合函数(列名1) 运算符 value

2.2 with rollup 汇总

group by 字段 with rollup;

with rollup 是在 group by 的基础上，额外追加一行汇总行。

举例：

原始数据：

device	pass_count
app	5
pc	10
pc	5

select device, sum(pass_count)
from table
group by device with rollup;

结果：

device	sum(pass_count)
app	5
pc	30
NULL	35

NULL那行就是 rollup 追加的汇总行，含义：NULL 代表"所有分组的合计"，即 5 + 30 = 35。

注意两点：

rollup 后面不能直接接 order by，需要套一层再排序：

select * from (
    select device, sum(pass_count) as total
    from table
    group by device with rollup
) table1
order by total;

汇总行的 NULL 容易和真正的 NULL 值混淆，所以用之前要先把 group by 字段里的 NULL 值替换掉（用 ifnull 或 coalesce）：

group by ifnull(device, '未知') with rollup;

在分组结果末尾追加汇总行，用 null 标记汇总行；
使用前需将 group by 字段中的 null 替换为不可能出现的常量；
with rollup 后不可直接接 order by，需外层再套查询。

举例：

device	pass_count
app	5
pc	10
NULL	8

如果原始数据里 device 有 NULL 值

不替换：

device	sum
app	5
pc	10
NULL	8
NULL	23

→ 两行 NULL，分不清哪个是真实数据，哪个是汇总

替换后：

select ifnull(device, '未知数'), sum(pass_count)
from table
group by ifnull(device, '未知数') with rollup;

device	sum
app	5
pc	10
未知数	8
NULL	23

→ 未知数 = 真实的 NULL 数据，NULL = rollup 汇总行，一目了然

五、窗口函数

作用：在不聚合数据行的前提下，对每行动态计算分组排名、累计值、相邻行差值等。
典型场景：各部门工资排名前 N 名 / 每人工资占部门总工资的比例。

基本语法：

<窗口函数> over (
    partition by <分组列>
    order by <排序列>
)

执行顺序（逻辑上）：

FROM → JOIN → WHERE → GROUP BY → HAVING
    → 窗口函数
        → ORDER BY → LIMIT → SELECT DISTINCT

窗口函数执行时 group by 已完成，因此不会再产生数据聚合。

1. 专用窗口函数

1.1 序号函数

函数	并列处理方式
`row_number()`	不考虑并列，连续编号 1,2,3,4
`rank()`	并列后跳号，如 1,1,3,4
`dense_rank()`	并列后不跳号，如 1,1,2,3

示例：找到每类试卷得分前三名

select tag, uid, ranking
from (
    select tag, uid,
        rank() over (
            partition by tag
            order by max(score) DESC, min(score) DESC, uid DESC
        ) as ranking
    from exam_record
    group by tag, uid
) as rank_info
where ranking <= 3;

1.2 分布函数

函数	公式/说明
`percent_rank()`	`(rank - 1) / (rows - 1)`，结果在 [0,1] 之间
`cume_dist()`	分组内 ≥ 当前 rank 值的行数 / 分组总行数；使用场景更广

典型场景：班级中比当前同学成绩高的学生比例。

示例：统计 SQL 试卷未完成率较高的前 50% 用户

select uid
from (
    select uid,
        percent_rank() over (order by incomp_rate DESC, uid) as incomp_rate_rank
    from (
        select uid,
               1 - count(submit_time) / count(1) as incomp_rate
        from exam_record
        where exam_id in (
            select exam_id from examination_info where tag = 'SQL'
        )
        group by uid
    ) as t_exam_incom_rate
) as t_exam_incom_rate_rank
where incomp_rate_rank <= 0.5;

1.3 前后函数

函数	说明
`lead(列, n)`	取当前行后 n 行的值
`lag(列, n)`	取当前行前 n 行的值

典型场景：计算用户相邻两次浏览/作答的时间差。

示例：计算每位用户连续两次作答的最大时间窗

select uid,
    count(start_time)                              as exam_cnt,
    datediff(max(start_time), min(start_time)) + 1 as diff_days,
    max(datediff(next_start_time, start_time)) + 1 as days_window
from (
    select uid, exam_id, start_time,
        lead(start_time) over (
            partition by uid order by start_time
        ) as next_start_time
    from exam_record
    where year(start_time) = 2021
) as t_exam_record_lead
group by uid;

1.4 头尾函数

函数	说明
`first_value(列)`	返回分区中第一行的指定列值
`last_value(列)`	返回分区中最后一行的指定列值

1.5 其他函数

nth_value(列, n)：返回窗口中第 n 行的指定列值。

示例：查询作答第二快和第二慢的用时

select distinct exam_id, duration, release_time,
    nth_value(time_took, 2) over (
        partition by exam_id order by time_took DESC
    ) as max2_time_took,
    nth_value(time_took, 2) over (
        partition by exam_id order by time_took ASC
    ) as min2_time_took
from t_exam_record_timetook;

ntile(n)：将分区内数据均匀分成 n 个桶，返回桶号。

数据量不能被 n 整除时，多余的行依次分配给第 1、2、3... 桶。
典型场景：将大量数据平均分配给 N 个并行进程分别处理。

2. 聚合窗口函数

在窗口内对每行动态应用 sum / avg / max / min / count，不折叠行数。

典型场景：截至当前课程，每个学生的累计总分 / 平均分 / 最高分。

示例：使用 window 别名复用窗口定义

select s.sname, c.cname, sc.score,
    sum(sc.score)   over w as sum_val,
    avg(sc.score)   over w as avg_val,
    max(sc.score)   over w as max_val,
    min(sc.score)   over w as min_val,
    count(sc.score) over w as count_val
from student s
inner join score  sc on s.sid = sc.sid
inner join course c  on sc.cid = c.cid
window w as (partition by s.sname order by sc.score DESC);

⚠️ 聚合窗口函数括号内不可为空，需填写要聚合的列名。
window 别名 as (...) 可将相同的窗口定义复用，避免重复书写。

posted @ 2026-03-22 18:10 Stéphane 阅读(1) 评论(0) 收藏举报

刷新页面返回顶部