SQL 经典面试题
第1题 连续问题
题目描述
如下数据为蚂蚁森林中用户领取的减少碳排放量。找出连续 3 天及以上减少碳排放量在 100 以上的用户。
输入:
-- 创建表
CREATE TABLE carbon_reduction (
id INT,
dt DATE,
lowcarbon INT
);
-- 插入测试数据
INSERT INTO carbon_reduction (id, dt, lowcarbon) VALUES
(1001, '2021-12-12', 123),
(1002, '2021-12-12', 45),
(1001, '2021-12-13', 43),
(1001, '2021-12-13', 45),
(1001, '2021-12-13', 23),
(1002, '2021-12-14', 45),
(1001, '2021-12-14', 230),
(1002, '2021-12-15', 45),
(1001, '2021-12-15', 23),
(1001, '2021-12-16', 150),
(1001, '2021-12-17', 110),
(1001, '2021-12-18', 95),
(1001, '2021-12-19', 120),
(1002, '2021-12-16', 130),
(1002, '2021-12-17', 105),
(1002, '2021-12-18', 115),
(1002, '2021-12-20', 125),
(1003, '2021-12-14', 101),
(1003, '2021-12-15', 102),
(1003, '2021-12-16', 103),
(1003, '2021-12-17', 104),
(1003, '2021-12-18', 99),
(1004, '2021-12-13', 110),
(1004, '2021-12-14', 120),
(1004, '2021-12-15', 130);
输出:
| id |
|---|
| 1002 |
| 1003 |
| 1004 |
结果代码
select
distinct id
from (
select
*,
sum(flag)over(partition by id order by dt) as group_id
from (
select
*,
if(datediff(dt, lag(dt,1)over(partition by id order by dt)) = 1, 0, 1) as flag
from carbon_reduction cr
where lowcarbon > 100
) t1
) t2
group by id, group_id
having count(*) >= 3
第 2 题 分组问题
题目描述
如下为电商公司用户访问时间数据。某个用户连续的访问记录如果时间间隔小于 60 秒,则分为同一个组。
输入:
| id | ts(秒) |
|---|---|
| 1001 | 17523641234 |
| 1001 | 17523641256 |
| 1002 | 17523641278 |
| 1001 | 17523641334 |
| 1002 | 17523641434 |
| 1001 | 17523641534 |
| 1001 | 17523641544 |
| 1002 | 17523641634 |
| 1001 | 17523641638 |
| 1001 | 17523641654 |
输出:
| id | ts(秒) | group |
|---|---|---|
| 1001 | 17523641234 | 1 |
| 1001 | 17523641256 | 1 |
| 1001 | 17523641334 | 2 |
| 1001 | 17523641534 | 3 |
| 1001 | 17523641544 | 3 |
| 1001 | 17523641638 | 4 |
| 1001 | 17523641654 | 4 |
| 1002 | 17523641278 | 1 |
| 1002 | 17523641434 | 2 |
| 1002 | 17523641634 | 3 |
建表语句与插入数据语句:
CREATE TABLE user_visits (
id INT,
ts BIGINT
);
INSERT INTO user_visits (id, ts) VALUES
(1001, 17523641234),
(1001, 17523641256),
(1002, 17523641278),
(1001, 17523641334),
(1002, 17523641434),
(1001, 17523641534),
(1001, 17523641544),
(1002, 17523641634),
(1001, 17523641638),
(1001, 17523641654);
结果代码
select
*,
sum(flag) over(partition by id order by ts) as `group`
from (
select
*,
if(ts-lag(ts,1,null)over(partition by id order by ts) < 60, 0, 1) as flag
from user_visits uv
) t1
第 3 题 间隔连续问题
题目描述
某游戏公司记录的用户每日登录数据。计算每个用户最大的连续登录天数,可以间隔一天。解释:如果一个用户在 1,3,5,6 号登录游戏,则视为连续 6 天登录。
输入:
-- 创建用户登录记录表
CREATE TABLE user_login (
id INT,
dt DATE
);
-- 测试案例1:用户1001 - 有间隔但符合连续登录规则
INSERT INTO user_login (id, dt) VALUES
(1001, '2021-12-01'),
(1001, '2021-12-03'), -- 间隔1天
(1001, '2021-12-05'), -- 间隔1天
(1001, '2021-12-06'), -- 连续
(1001, '2021-12-08'), -- 间隔1天
(1001, '2021-12-10'), -- 间隔1天
(1001, '2021-12-12'); -- 间隔1天
-- 测试案例2:用户1002 - 完美连续登录
INSERT INTO user_login (id, dt) VALUES
(1002, '2021-12-01'),
(1002, '2021-12-02'),
(1002, '2021-12-03'),
(1002, '2021-12-04'),
(1002, '2021-12-05');
-- 测试案例3:用户1003 - 有间断的情况
INSERT INTO user_login (id, dt) VALUES
(1003, '2021-12-01'),
(1003, '2021-12-02'),
(1003, '2021-12-05'), -- 中断2天
(1003, '2021-12-07'), -- 间隔1天
(1003, '2021-12-08'), -- 连续
(1003, '2021-12-10'); -- 间隔1天
-- 测试案例4:用户1004 - 单次登录
INSERT INTO user_login (id, dt) VALUES
(1004, '2021-12-01');
-- 测试案例5:用户1005 - 复杂间隔模式
INSERT INTO user_login (id, dt) VALUES
(1005, '2021-12-01'),
(1005, '2021-12-02'),
(1005, '2021-12-04'), -- 间隔1天
(1005, '2021-12-06'), -- 间隔1天
(1005, '2021-12-07'), -- 连续
(1005, '2021-12-08'), -- 连续
(1005, '2021-12-10'), -- 间隔1天
(1005, '2021-12-11'), -- 连续
(1005, '2021-12-13'); -- 间隔1天
-- 测试案例6:用户1006 - 跨月登录
INSERT INTO user_login (id, dt) VALUES
(1006, '2021-11-28'),
(1006, '2021-11-30'), -- 间隔1天
(1006, '2021-12-01'), -- 连续(跨月)
(1006, '2021-12-03'); -- 间隔1天
-- 测试案例7:用户1007 - 有较长间断
INSERT INTO user_login (id, dt) VALUES
(1007, '2021-12-01'),
(1007, '2021-12-03'), -- 间隔1天
(1007, '2021-12-04'), -- 连续
(1007, '2021-12-10'); -- 中断5天,新的开始
输出:
id | max_consecutive_days
----|---------------------
1001| 12天 (12-01到12-12,允许间隔1天)
1002| 5天 (完美连续)
1003| 6天 (12-05到12-10,允许间隔1天)
1004| 1天 (单次登录)
1005| 13天 (12-01到12-13,允许间隔1天)
1006| 6天 (11-28到12-03,允许间隔1天,跨月)
1007| 4天 (12-01到12-04,允许间隔1天)
结果代码
select
id,
max(consecutive_days) as max_consecutive_days
from (
select
id,
group_id,
datediff(max(dt),min(dt))+1 as consecutive_days
from (
select
*,
sum(flag)over(partition by id order by dt) as group_id
from (
select
*,
if(datediff(dt, lag(dt,1)over(partition by id order by dt))<=2, 0, 1) as flag
from user_login
) t1
) t2
group by id, group_id
) t3
group by id
第 4 题 打折日期交叉问题
题目描述
如下为平台商品促销数据:字段为品牌,打折开始日期,打折结束日期
| brand | stt | edt |
|---|---|---|
| oppo | 2021-06-05 | 2021-06-09 |
| oppo | 2021-06-11 | 2021-06-21 |
| vivo | 2021-06-05 | 2021-06-15 |
| vivo | 2021-06-09 | 2021-06-21 |
| redmi | 2021-06-05 | 2021-06-21 |
| redmi | 2021-06-09 | 2021-06-15 |
| redmi | 2021-06-17 | 2021-06-26 |
| huawei | 2021-06-05 | 2021-06-26 |
| huawei | 2021-06-09 | 2021-06-15 |
| huawei | 2021-06-17 | 2021-06-21 |
计算每个品牌总的打折销售天数,注意其中的交叉日期,比如 vivo 品牌,第一次活动时间为 2021-06-05 到 2021-06-15,第二次活动时间为 2021-06-09 到 2021-06-21 其中 9 号到 15号为重复天数,只统计一次,即 vivo 总打折天数为 2021-06-05 到 2021-06-21 共计 17 天
编写建表和插入语句,编写测试输入和输出
输入:
-- 创建商品促销表
CREATE TABLE product_promotion (
brand VARCHAR(20),
stt DATE,
edt DATE
);
-- 插入题目中给定的数据
INSERT INTO product_promotion (brand, stt, edt) VALUES
('oppo', '2021-06-05', '2021-06-09'),
('oppo', '2021-06-11', '2021-06-21'),
('vivo', '2021-06-05', '2021-06-15'),
('vivo', '2021-06-09', '2021-06-21'),
('redmi', '2021-06-05', '2021-06-21'),
('redmi', '2021-06-09', '2021-06-15'),
('redmi', '2021-06-17', '2021-06-26'),
('huawei', '2021-06-05', '2021-06-26'),
('huawei', '2021-06-09', '2021-06-15'),
('huawei', '2021-06-17', '2021-06-21');
-- 补充测试案例1:完全重叠
INSERT INTO product_promotion (brand, stt, edt) VALUES
('apple', '2021-06-01', '2021-06-10'),
('apple', '2021-06-05', '2021-06-08');
-- 补充测试案例2:不连续的活动
INSERT INTO product_promotion (brand, stt, edt) VALUES
('samsung', '2021-06-01', '2021-06-05'),
('samsung', '2021-06-10', '2021-06-15'),
('samsung', '2021-06-20', '2021-06-25');
-- 补充测试案例3:嵌套关系
INSERT INTO product_promotion (brand, stt, edt) VALUES
('xiaomi', '2021-06-01', '2021-06-30'),
('xiaomi', '2021-06-10', '2021-06-20'),
('xiaomi', '2021-06-15', '2021-06-25');
-- 补充测试案例4:首尾相接
INSERT INTO product_promotion (brand, stt, edt) VALUES
('oneplus', '2021-06-01', '2021-06-10'),
('oneplus', '2021-06-10', '2021-06-20');
-- 补充测试案例5:单次活动
INSERT INTO product_promotion (brand, stt, edt) VALUES
('realme', '2021-06-01', '2021-06-10');
输出:
brand | total_days | 说明
--------|------------|-----------------------------
oppo | 16天 | (6-5到6-9)5天 + (6-11到6-21)11天 = 16天,无重叠
vivo | 17天 | 6-5到6-21,去除重叠部分
redmi | 22天 | 6-5到6-26,去除6-9到6-15的重复,6-17到6-26
huawei | 22天 | 6-5到6-26,去除6-9到6-15、6-17到6-21的重复
apple | 10天 | 6-1到6-10,去除6-5到6-8的完全重叠
samsung | 17天 | (6-1到6-5)5天 + (6-10到6-15)6天 + (6-20到6-25)6天 = 17天
xiaomi | 30天 | 6-1到6-30,去除中间重叠部分
oneplus | 20天 | 6-1到6-20,首尾相接(6-10重复只算一次)
realme | 10天 | 6-1到6-10
结果代码
本题可参考 算法题:合并区间 的思路: 算法题:合并区间 - 拾月凄辰 - 博客园
WITH ranked_promotion AS (
SELECT
brand,
stt,
edt,
-- 标记是否为新区间开始
CASE
WHEN stt > MAX(edt) OVER (PARTITION BY brand ORDER BY stt, edt
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
THEN 1
ELSE 0
END AS is_new_interval
FROM product_promotion
),
interval_groups AS (
SELECT
brand,
stt,
edt,
-- 为连续的区间分配相同的组号
SUM(is_new_interval) OVER (PARTITION BY brand ORDER BY stt, edt) AS interval_group
FROM ranked_promotion
),
merged_intervals AS (
SELECT
brand,
interval_group,
MIN(stt) AS merged_stt,
MAX(edt) AS merged_edt
FROM interval_groups
GROUP BY brand, interval_group
)
SELECT
brand,
SUM(DATEDIFF(merged_edt, merged_stt) + 1) AS total_days
FROM merged_intervals
GROUP BY brand
ORDER BY brand;
第 5 题 同时在线问题
题目描述
SQL163 每篇文章同一时刻最大在看人数
用户行为日志表tb_user_log
| id | uid | artical_id | in_time | out_time | sign_in |
|---|---|---|---|---|---|
| 1 | 101 | 9001 | 2021-11-01 10:00:00 | 2021-11-01 10:00:11 | 0 |
| 2 | 102 | 9001 | 2021-11-01 10:00:09 | 2021-11-01 10:00:38 | 0 |
| 3 | 103 | 9001 | 2021-11-01 10:00:28 | 2021-11-01 10:00:58 | 0 |
| 4 | 104 | 9002 | 2021-11-01 11:00:45 | 2021-11-01 11:01:11 | 0 |
| 5 | 105 | 9001 | 2021-11-01 10:00:51 | 2021-11-01 10:00:59 | 0 |
| 6 | 106 | 9002 | 2021-11-01 11:00:55 | 2021-11-01 11:01:24 | 0 |
| 7 | 107 | 9001 | 2021-11-01 10:00:01 | 2021-11-01 10:01:50 | 0 |
(uid-用户ID, artical_id-文章ID, in_time-进入时间, out_time-离开时间, sign_in-是否签到)
场景逻辑说明:artical_id-文章ID代表用户浏览的文章的ID,artical_id-文章ID为0表示用户在非文章内容页(比如App内的列表页、活动页等)。
问题:统计每篇文章同一时刻最大在看人数,如果同一时刻有进入也有离开时,先记录用户数增加再记录减少,结果按最大人数降序。
输出示例:
示例数据的输出结果如下
| artical_id | max_uv |
|---|---|
| 9001 | 3 |
| 9002 | 2 |
解释:10点0分10秒时,有3个用户正在浏览文章9001;11点01分0秒时,有2个用户正在浏览文章9002。
示例一:
输入:
DROP TABLE IF EXISTS tb_user_log;
CREATE TABLE tb_user_log (
id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid INT NOT NULL COMMENT '用户ID',
artical_id INT NOT NULL COMMENT '视频ID',
in_time datetime COMMENT '进入时间',
out_time datetime COMMENT '离开时间',
sign_in TINYINT DEFAULT 0 COMMENT '是否签到'
) CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_user_log(uid, artical_id, in_time, out_time, sign_in) VALUES
(101, 9001, '2021-11-01 10:00:00', '2021-11-01 10:00:11', 0),
(102, 9001, '2021-11-01 10:00:09', '2021-11-01 10:00:38', 0),
(103, 9001, '2021-11-01 10:00:28', '2021-11-01 10:00:58', 0),
(104, 9002, '2021-11-01 11:00:45', '2021-11-01 11:01:11', 0),
(105, 9001, '2021-11-01 10:00:51', '2021-11-01 10:00:59', 0),
(106, 9002, '2021-11-01 11:00:55', '2021-11-01 11:01:24', 0),
(107, 9001, '2021-11-01 10:00:01', '2021-11-01 10:01:50', 0);
输出:
9001|3
9002|2
思路
解法来源:利用SUM窗口函数找到同一时刻内的最大计数_牛客博客
本题需求不难理解,难点在于如何计算瞬时的最大计数(在看人数)
首先,我们自然会想到常见的编码+联立。在此对原表in_time和out_time进行编码,in为观看人数+1, out为观看人数-1,进行两次SELECT联立,并按artical_id升序,时间戳升序:
代码:
SELECT
artical_id, in_time dt, 1 diff
FROM tb_user_log
WHERE artical_id != 0
UNION ALL
SELECT
artical_id, out_time dt, -1 diff
FROM tb_user_log
WHERE artical_id != 0
ORDER BY 1,2
结果:
9001|2021-11-01 10:00:00|1
9001|2021-11-01 10:00:01|1
9001|2021-11-01 10:00:09|1
9001|2021-11-01 10:00:11|-1
9001|2021-11-01 10:00:28|1
9001|2021-11-01 10:00:38|-1
9001|2021-11-01 10:00:51|1
9001|2021-11-01 10:00:58|-1
9001|2021-11-01 10:00:59|-1
9001|2021-11-01 10:01:50|-1
9002|2021-11-01 11:00:45|1
9002|2021-11-01 11:00:55|1
9002|2021-11-01 11:01:11|-1
9002|2021-11-01 11:01:24|-1
意义:
某篇文章artical_id,在给定的时间戳dt的,瞬时观看人数变化diff
到这一步,本题的解法基本就明朗了:
我们考虑使用SUM窗口函数,按文章id维度,统计按时间戳升序的观看人数变化情况:
代码:
SELECT
artical_id,
dt,
SUM(diff) OVER(PARTITION BY artical_id ORDER BY dt) instant_viewer_cnt
FROM (
SELECT
artical_id, in_time dt, 1 diff
FROM tb_user_log
WHERE artical_id != 0
UNION ALL
SELECT
artical_id, out_time dt, -1 diff
FROM tb_user_log
WHERE artical_id != 0) t1
结果:
9001|2021-11-01 10:00:00|1
9001|2021-11-01 10:00:01|2
9001|2021-11-01 10:00:09|3
9001|2021-11-01 10:00:11|2
9001|2021-11-01 10:00:28|3
9001|2021-11-01 10:00:38|2
9001|2021-11-01 10:00:51|3
9001|2021-11-01 10:00:58|2
9001|2021-11-01 10:00:59|1
9001|2021-11-01 10:01:50|0
9002|2021-11-01 11:00:45|1
9002|2021-11-01 11:00:55|2
9002|2021-11-01 11:01:11|1
9002|2021-11-01 11:01:24|0
意义:
某篇文章artical_id,在给定的时间戳dt的,瞬时累计观看人数instant_viewer_cnt
然后到了本题的坑点。
题目要求在瞬时统计时遵循【先进后出】:如果同一时刻有进入也有离开时,先记录用户数增加,再记录减少。
因此在ORDER BY层面,在遵循dt升序的同时,还要遵循先+1,再-1的原则,即diff DESC:
SUM(diff) OVER(PARTITION BY artical_id ORDER BY dt, diff DESC)
——————————————————————————————————
步骤拆解:
- 对原表编码并联立;
- 按artical_id维度,dt升序 ,diff降序,对diff进行SUM开窗统计,得到每个artical_id的瞬时观看人数instant_viewer_cnt;
- 最外层SELECT按artical_id聚合,通过MAX(instant_viewer_cnt)取出瞬时观看最大值max_uv,并排序。
——————————————————————————————————
完整代码如下:
SELECT
artical_id,
MAX(instant_viewer_cnt) max_uv
FROM (
SELECT
artical_id,
SUM(diff) OVER(PARTITION BY artical_id ORDER BY dt, diff DESC) instant_viewer_cnt
FROM (
SELECT
artical_id, in_time dt, 1 diff
FROM tb_user_log
WHERE artical_id != 0
UNION ALL
SELECT
artical_id, out_time dt, -1 diff
FROM tb_user_log
WHERE artical_id != 0) t1
) t2
GROUP BY 1
ORDER BY 2 DESC
结果代码
SELECT
artical_id,
MAX(sum_diff) max_uv
FROM (
select
artical_id,
-- 如果同一时刻有进入也有离开时,先记录用户数增加再记录减少
sum(diff) over (partition by artical_id order by dt, diff desc) as sum_diff
from (
select
artical_id,
in_time as dt,
1 as diff
from tb_user_log
where artical_id != 0
union all
select
artical_id,
out_time as dt,
-1 as diff
from tb_user_log
where artical_id != 0
) t1
) t2
group by artical_id
order by max_uv desc

浙公网安备 33010602011771号