Loading

SQL 经典面试题

第1题 连续问题

题目描述

如下数据为蚂蚁森林中用户领取的减少碳排放量。找出连续 3 天及以上减少碳排放量在 100 以上的用户。

输入:

-- 创建表
CREATE TABLE carbon_reduction (
    id INT,
    dt DATE,
    lowcarbon INT
);

-- 插入测试数据
INSERT INTO carbon_reduction (id, dt, lowcarbon) VALUES
(1001, '2021-12-12', 123),
(1002, '2021-12-12', 45),
(1001, '2021-12-13', 43),
(1001, '2021-12-13', 45),
(1001, '2021-12-13', 23),
(1002, '2021-12-14', 45),
(1001, '2021-12-14', 230),
(1002, '2021-12-15', 45),
(1001, '2021-12-15', 23),
(1001, '2021-12-16', 150),
(1001, '2021-12-17', 110),
(1001, '2021-12-18', 95),
(1001, '2021-12-19', 120),
(1002, '2021-12-16', 130),
(1002, '2021-12-17', 105),
(1002, '2021-12-18', 115),
(1002, '2021-12-20', 125),
(1003, '2021-12-14', 101),
(1003, '2021-12-15', 102),
(1003, '2021-12-16', 103),
(1003, '2021-12-17', 104),
(1003, '2021-12-18', 99),
(1004, '2021-12-13', 110),
(1004, '2021-12-14', 120),
(1004, '2021-12-15', 130);

输出:

id
1002
1003
1004

结果代码

select 
	distinct id
from (
	select 
		*,
		sum(flag)over(partition by id order by dt) as group_id
	from (
		select 
			*,
			if(datediff(dt, lag(dt,1)over(partition by id order by dt)) = 1, 0, 1) as flag
		from carbon_reduction cr 
		where lowcarbon > 100
	) t1
) t2
group by id, group_id
having count(*) >= 3

2 题 分组问题

题目描述

如下为电商公司用户访问时间数据。某个用户连续的访问记录如果时间间隔小于 60 秒,则分为同一个组。

输入:

id ts(秒)
1001 17523641234
1001 17523641256
1002 17523641278
1001 17523641334
1002 17523641434
1001 17523641534
1001 17523641544
1002 17523641634
1001 17523641638
1001 17523641654

输出:

id ts(秒) group
1001 17523641234 1
1001 17523641256 1
1001 17523641334 2
1001 17523641534 3
1001 17523641544 3
1001 17523641638 4
1001 17523641654 4
1002 17523641278 1
1002 17523641434 2
1002 17523641634 3

建表语句与插入数据语句:

CREATE TABLE user_visits (
    id INT,
    ts BIGINT
);

INSERT INTO user_visits (id, ts) VALUES
(1001, 17523641234),
(1001, 17523641256),
(1002, 17523641278),
(1001, 17523641334),
(1002, 17523641434),
(1001, 17523641534),
(1001, 17523641544),
(1002, 17523641634),
(1001, 17523641638),
(1001, 17523641654);

结果代码

select 
	*,
	sum(flag) over(partition by id order by ts) as `group`
from (
	select
		*,
		if(ts-lag(ts,1,null)over(partition by id order by ts) < 60, 0, 1) as flag
	from user_visits uv 
) t1

3 题 间隔连续问题

题目描述

某游戏公司记录的用户每日登录数据。计算每个用户最大的连续登录天数,可以间隔一天。解释:如果一个用户在 1,3,5,6 号登录游戏,则视为连续 6 天登录。

输入:

-- 创建用户登录记录表
CREATE TABLE user_login (
    id INT,
    dt DATE
);

-- 测试案例1:用户1001 - 有间隔但符合连续登录规则
INSERT INTO user_login (id, dt) VALUES
(1001, '2021-12-01'),
(1001, '2021-12-03'),  -- 间隔1天
(1001, '2021-12-05'),  -- 间隔1天
(1001, '2021-12-06'),  -- 连续
(1001, '2021-12-08'),  -- 间隔1天
(1001, '2021-12-10'),  -- 间隔1天
(1001, '2021-12-12');  -- 间隔1天

-- 测试案例2:用户1002 - 完美连续登录
INSERT INTO user_login (id, dt) VALUES
(1002, '2021-12-01'),
(1002, '2021-12-02'),
(1002, '2021-12-03'),
(1002, '2021-12-04'),
(1002, '2021-12-05');

-- 测试案例3:用户1003 - 有间断的情况
INSERT INTO user_login (id, dt) VALUES
(1003, '2021-12-01'),
(1003, '2021-12-02'),
(1003, '2021-12-05'),  -- 中断2天
(1003, '2021-12-07'),  -- 间隔1天
(1003, '2021-12-08'),  -- 连续
(1003, '2021-12-10');  -- 间隔1天

-- 测试案例4:用户1004 - 单次登录
INSERT INTO user_login (id, dt) VALUES
(1004, '2021-12-01');

-- 测试案例5:用户1005 - 复杂间隔模式
INSERT INTO user_login (id, dt) VALUES
(1005, '2021-12-01'),
(1005, '2021-12-02'),
(1005, '2021-12-04'),  -- 间隔1天
(1005, '2021-12-06'),  -- 间隔1天
(1005, '2021-12-07'),  -- 连续
(1005, '2021-12-08'),  -- 连续
(1005, '2021-12-10'),  -- 间隔1天
(1005, '2021-12-11'),  -- 连续
(1005, '2021-12-13');  -- 间隔1天

-- 测试案例6:用户1006 - 跨月登录
INSERT INTO user_login (id, dt) VALUES
(1006, '2021-11-28'),
(1006, '2021-11-30'),  -- 间隔1天
(1006, '2021-12-01'),  -- 连续(跨月)
(1006, '2021-12-03');  -- 间隔1天

-- 测试案例7:用户1007 - 有较长间断
INSERT INTO user_login (id, dt) VALUES
(1007, '2021-12-01'),
(1007, '2021-12-03'),  -- 间隔1天
(1007, '2021-12-04'),  -- 连续
(1007, '2021-12-10');  -- 中断5天,新的开始

输出:

id  | max_consecutive_days
----|---------------------
1001| 12天 (12-01到12-12,允许间隔1天)
1002| 5天  (完美连续)
1003| 6天  (12-05到12-10,允许间隔1天)
1004| 1天  (单次登录)
1005| 13天 (12-01到12-13,允许间隔1天)
1006| 6天  (11-28到12-03,允许间隔1天,跨月)
1007| 4天  (12-01到12-04,允许间隔1天)

结果代码

select 
	id,
	max(consecutive_days) as max_consecutive_days
from (
	select 
		id,
		group_id,
		datediff(max(dt),min(dt))+1 as consecutive_days
	from (
		select 
			*,
			sum(flag)over(partition by id order by dt) as group_id
		from (
			select 
				*,
				if(datediff(dt, lag(dt,1)over(partition by id order by dt))<=2, 0, 1) as flag
			from user_login
		) t1
	) t2
	group by id, group_id
) t3
group by id

第 4 题 打折日期交叉问题

题目描述

如下为平台商品促销数据:字段为品牌,打折开始日期,打折结束日期

brand stt edt
oppo 2021-06-05 2021-06-09
oppo 2021-06-11 2021-06-21
vivo 2021-06-05 2021-06-15
vivo 2021-06-09 2021-06-21
redmi 2021-06-05 2021-06-21
redmi 2021-06-09 2021-06-15
redmi 2021-06-17 2021-06-26
huawei 2021-06-05 2021-06-26
huawei 2021-06-09 2021-06-15
huawei 2021-06-17 2021-06-21

计算每个品牌总的打折销售天数,注意其中的交叉日期,比如 vivo 品牌,第一次活动时间为 2021-06-05 到 2021-06-15,第二次活动时间为 2021-06-09 到 2021-06-21 其中 9 号到 15号为重复天数,只统计一次,即 vivo 总打折天数为 2021-06-05 到 2021-06-21 共计 17 天

编写建表和插入语句,编写测试输入和输出

输入:

-- 创建商品促销表
CREATE TABLE product_promotion (
    brand VARCHAR(20),
    stt DATE,
    edt DATE
);


-- 插入题目中给定的数据
INSERT INTO product_promotion (brand, stt, edt) VALUES
('oppo', '2021-06-05', '2021-06-09'),
('oppo', '2021-06-11', '2021-06-21'),
('vivo', '2021-06-05', '2021-06-15'),
('vivo', '2021-06-09', '2021-06-21'),
('redmi', '2021-06-05', '2021-06-21'),
('redmi', '2021-06-09', '2021-06-15'),
('redmi', '2021-06-17', '2021-06-26'),
('huawei', '2021-06-05', '2021-06-26'),
('huawei', '2021-06-09', '2021-06-15'),
('huawei', '2021-06-17', '2021-06-21');

-- 补充测试案例1:完全重叠
INSERT INTO product_promotion (brand, stt, edt) VALUES
('apple', '2021-06-01', '2021-06-10'),
('apple', '2021-06-05', '2021-06-08');

-- 补充测试案例2:不连续的活动
INSERT INTO product_promotion (brand, stt, edt) VALUES
('samsung', '2021-06-01', '2021-06-05'),
('samsung', '2021-06-10', '2021-06-15'),
('samsung', '2021-06-20', '2021-06-25');

-- 补充测试案例3:嵌套关系
INSERT INTO product_promotion (brand, stt, edt) VALUES
('xiaomi', '2021-06-01', '2021-06-30'),
('xiaomi', '2021-06-10', '2021-06-20'),
('xiaomi', '2021-06-15', '2021-06-25');

-- 补充测试案例4:首尾相接
INSERT INTO product_promotion (brand, stt, edt) VALUES
('oneplus', '2021-06-01', '2021-06-10'),
('oneplus', '2021-06-10', '2021-06-20');

-- 补充测试案例5:单次活动
INSERT INTO product_promotion (brand, stt, edt) VALUES
('realme', '2021-06-01', '2021-06-10');

输出:

brand   | total_days | 说明
--------|------------|-----------------------------
oppo    | 16天       | (6-5到6-9)5天 + (6-11到6-21)11天 = 16天,无重叠
vivo    | 17天       | 6-5到6-21,去除重叠部分
redmi   | 22天       | 6-5到6-26,去除6-9到6-15的重复,6-17到6-26
huawei  | 22天       | 6-5到6-26,去除6-9到6-15、6-17到6-21的重复
apple    | 10天       | 6-1到6-10,去除6-5到6-8的完全重叠
samsung  | 17天       | (6-1到6-5)5天 + (6-10到6-15)6天 + (6-20到6-25)6天 = 17天
xiaomi   | 30天       | 6-1到6-30,去除中间重叠部分
oneplus  | 20天       | 6-1到6-20,首尾相接(6-10重复只算一次)
realme   | 10天       | 6-1到6-10

结果代码

本题可参考 算法题:合并区间 的思路: 算法题:合并区间 - 拾月凄辰 - 博客园

WITH ranked_promotion AS (
    SELECT 
        brand,
        stt,
        edt,
        -- 标记是否为新区间开始
        CASE 
            WHEN stt > MAX(edt) OVER (PARTITION BY brand ORDER BY stt, edt 
                                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
            THEN 1
            ELSE 0
        END AS is_new_interval
    FROM product_promotion
),
interval_groups AS (
    SELECT 
        brand,
        stt,
        edt,
        -- 为连续的区间分配相同的组号
        SUM(is_new_interval) OVER (PARTITION BY brand ORDER BY stt, edt) AS interval_group
    FROM ranked_promotion
),
merged_intervals AS (
    SELECT 
        brand,
        interval_group,
        MIN(stt) AS merged_stt,
        MAX(edt) AS merged_edt
    FROM interval_groups
    GROUP BY brand, interval_group
)
SELECT 
    brand,
    SUM(DATEDIFF(merged_edt, merged_stt) + 1) AS total_days
FROM merged_intervals
GROUP BY brand
ORDER BY brand;

第 5 题 同时在线问题

题目描述

SQL163 每篇文章同一时刻最大在看人数

来源:每篇文章同一时刻最大在看人数_牛客题霸_牛客网

用户行为日志表tb_user_log

id uid artical_id in_time out_time sign_in
1 101 9001 2021-11-01 10:00:00 2021-11-01 10:00:11 0
2 102 9001 2021-11-01 10:00:09 2021-11-01 10:00:38 0
3 103 9001 2021-11-01 10:00:28 2021-11-01 10:00:58 0
4 104 9002 2021-11-01 11:00:45 2021-11-01 11:01:11 0
5 105 9001 2021-11-01 10:00:51 2021-11-01 10:00:59 0
6 106 9002 2021-11-01 11:00:55 2021-11-01 11:01:24 0
7 107 9001 2021-11-01 10:00:01 2021-11-01 10:01:50 0

(uid-用户ID, artical_id-文章ID, in_time-进入时间, out_time-离开时间, sign_in-是否签到)

场景逻辑说明artical_id-文章ID代表用户浏览的文章的ID,artical_id-文章ID0表示用户在非文章内容页(比如App内的列表页、活动页等)。

问题:统计每篇文章同一时刻最大在看人数,如果同一时刻有进入也有离开时,先记录用户数增加再记录减少,结果按最大人数降序。

输出示例

示例数据的输出结果如下

artical_id max_uv
9001 3
9002 2

解释:10点0分10秒时,有3个用户正在浏览文章9001;11点01分0秒时,有2个用户正在浏览文章9002。

示例一:

输入:
DROP TABLE IF EXISTS tb_user_log;
CREATE TABLE tb_user_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    artical_id INT NOT NULL COMMENT '视频ID',
    in_time datetime COMMENT '进入时间',
    out_time datetime COMMENT '离开时间',
    sign_in TINYINT DEFAULT 0 COMMENT '是否签到'
) CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_log(uid, artical_id, in_time, out_time, sign_in) VALUES
  (101, 9001, '2021-11-01 10:00:00', '2021-11-01 10:00:11', 0),
  (102, 9001, '2021-11-01 10:00:09', '2021-11-01 10:00:38', 0),
  (103, 9001, '2021-11-01 10:00:28', '2021-11-01 10:00:58', 0),
  (104, 9002, '2021-11-01 11:00:45', '2021-11-01 11:01:11', 0),
  (105, 9001, '2021-11-01 10:00:51', '2021-11-01 10:00:59', 0),
  (106, 9002, '2021-11-01 11:00:55', '2021-11-01 11:01:24', 0),
  (107, 9001, '2021-11-01 10:00:01', '2021-11-01 10:01:50', 0);


输出:
9001|3
9002|2

思路

解法来源:利用SUM窗口函数找到同一时刻内的最大计数_牛客博客

本题需求不难理解,难点在于如何计算瞬时的最大计数(在看人数)

首先,我们自然会想到常见的编码+联立。在此对原表in_time和out_time进行编码,in为观看人数+1, out为观看人数-1,进行两次SELECT联立,并按artical_id升序,时间戳升序:

代码:

    SELECT 
      artical_id, in_time dt, 1 diff
    FROM tb_user_log
    WHERE artical_id != 0
    UNION ALL
    SELECT 
      artical_id, out_time dt, -1 diff
    FROM tb_user_log
    WHERE artical_id != 0
    ORDER BY 1,2
    
结果:

9001|2021-11-01 10:00:00|1
9001|2021-11-01 10:00:01|1
9001|2021-11-01 10:00:09|1
9001|2021-11-01 10:00:11|-1
9001|2021-11-01 10:00:28|1
9001|2021-11-01 10:00:38|-1
9001|2021-11-01 10:00:51|1
9001|2021-11-01 10:00:58|-1
9001|2021-11-01 10:00:59|-1
9001|2021-11-01 10:01:50|-1
9002|2021-11-01 11:00:45|1
9002|2021-11-01 11:00:55|1
9002|2021-11-01 11:01:11|-1
9002|2021-11-01 11:01:24|-1

意义:

某篇文章artical_id,在给定的时间戳dt的,瞬时观看人数变化diff

到这一步,本题的解法基本就明朗了:

我们考虑使用SUM窗口函数,按文章id维度,统计按时间戳升序的观看人数变化情况:

代码:

  SELECT
    artical_id,
    dt,
    SUM(diff) OVER(PARTITION BY artical_id ORDER BY dt) instant_viewer_cnt
  FROM (
    SELECT 
      artical_id, in_time dt, 1 diff
    FROM tb_user_log
    WHERE artical_id != 0
    UNION ALL
    SELECT 
      artical_id, out_time dt, -1 diff
    FROM tb_user_log
    WHERE artical_id != 0) t1 
 
结果:
 
9001|2021-11-01 10:00:00|1
9001|2021-11-01 10:00:01|2
9001|2021-11-01 10:00:09|3
9001|2021-11-01 10:00:11|2
9001|2021-11-01 10:00:28|3
9001|2021-11-01 10:00:38|2
9001|2021-11-01 10:00:51|3
9001|2021-11-01 10:00:58|2
9001|2021-11-01 10:00:59|1
9001|2021-11-01 10:01:50|0
9002|2021-11-01 11:00:45|1
9002|2021-11-01 11:00:55|2
9002|2021-11-01 11:01:11|1
9002|2021-11-01 11:01:24|0

意义:

某篇文章artical_id,在给定的时间戳dt的,瞬时累计观看人数instant_viewer_cnt

然后到了本题的坑点。

题目要求在瞬时统计时遵循【先进后出】:如果同一时刻有进入也有离开时,先记录用户数增加,再记录减少。

因此在ORDER BY层面,在遵循dt升序的同时,还要遵循先+1,再-1的原则,即diff DESC:

SUM(diff) OVER(PARTITION BY artical_id ORDER BY dt, diff DESC)

——————————————————————————————————

步骤拆解:

  1. 对原表编码并联立;
  2. 按artical_id维度,dt升序 ,diff降序,对diff进行SUM开窗统计,得到每个artical_id的瞬时观看人数instant_viewer_cnt;
  3. 最外层SELECT按artical_id聚合,通过MAX(instant_viewer_cnt)取出瞬时观看最大值max_uv,并排序。

——————————————————————————————————

完整代码如下:

SELECT
  artical_id,
  MAX(instant_viewer_cnt) max_uv
FROM (
  SELECT
    artical_id,
    SUM(diff) OVER(PARTITION BY artical_id ORDER BY dt, diff DESC) instant_viewer_cnt
  FROM (
    SELECT 
      artical_id, in_time dt, 1 diff
    FROM tb_user_log
    WHERE artical_id != 0
    UNION ALL
    SELECT 
      artical_id, out_time dt, -1 diff
    FROM tb_user_log
    WHERE artical_id != 0) t1 
) t2
GROUP BY 1
ORDER BY 2 DESC

结果代码

SELECT
  artical_id,
  MAX(sum_diff) max_uv
FROM (
    select
        artical_id,
        -- 如果同一时刻有进入也有离开时,先记录用户数增加再记录减少
        sum(diff) over (partition by artical_id order by dt, diff desc) as sum_diff
    from (
        select 
            artical_id, 
            in_time as dt, 
            1 as diff
        from tb_user_log
        where artical_id != 0
        union all
        select 
            artical_id, 
            out_time as dt, 
            -1 as diff
        from tb_user_log
        where artical_id != 0
    ) t1
) t2
group by artical_id
order by max_uv desc
posted @ 2025-12-25 21:10  拾月凄辰  阅读(6)  评论(0)    收藏  举报