截至累计求解
题目一:
用户-视频互动表tb_user_video_log
|
id |
uid |
video_id |
start_time |
end_time |
if_follow |
if_like |
if_retweet |
comment_id |
|
1 |
101 |
2001 |
2021-09-01 10:00:00 |
2021-09-01 10:00:20 |
0 |
1 |
1 |
NULL |
|
2 |
105 |
2002 |
2021-09-10 11:00:00 |
2021-09-10 11:00:30 |
1 |
0 |
1 |
NULL |
|
3 |
101 |
2001 |
2021-10-01 10:00:00 |
2021-10-01 10:00:20 |
1 |
1 |
1 |
NULL |
|
4 |
102 |
2001 |
2021-10-01 10:00:00 |
2021-10-01 10:00:15 |
0 |
0 |
1 |
NULL |
|
5 |
103 |
2001 |
2021-10-01 11:00:50 |
2021-10-01 11:01:15 |
1 |
1 |
0 |
1732526 |
|
6 |
106 |
2002 |
2021-10-01 10:59:05 |
2021-10-01 11:00:05 |
2 |
0 |
0 |
NULL |
(uid-用户ID, video_id-视频ID, start_time-开始观看时间, end_time-结束观看时间, if_follow-是否关注, if_like-是否点赞, if_retweet-是否转发, comment_id-评论ID)
短视频信息表tb_video_info
|
id |
video_id |
author |
tag |
duration |
release_time |
|
1 |
2001 |
901 |
影视 |
30 |
2021-01-01 07:00:00 |
|
2 |
2002 |
901 |
美食 |
60 |
2021-01-01 07:00:00 |
|
3 |
2003 |
902 |
旅游 |
90 |
2020-01-01 07:00:00 |
|
4 |
2004 |
902 |
美女 |
90 |
2020-01-01 08:00:00 |
(video_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release_time-发布时间)
问题:计算2021年里每个创作者每月的涨粉率及截止当月的总粉丝量
注:
- 涨粉率=(加粉量 - 掉粉量) / 播放量。结果按创作者ID、总粉丝量升序排序。
- if_follow-是否关注为1表示用户观看视频中关注了视频创作者,为0表示此次互动前后关注状态未发生变化,为2表示本次观看过程中取消了关注。
输出示例:
示例数据的输出结果如下
|
author |
month |
fans_growth_rate |
total_fans |
|
901 |
2021-09 |
0.500 |
1 |
|
901 |
2021-10 |
0.250 |
2 |
解释:
示例数据中表tb_user_video_log里只有视频2001和2002的播放记录,都来自创作者901,播放时间在2021年9月和10月;其中9月里加粉量为1,掉粉量为0,播放量为2,因此涨粉率为0.500(保留3位小数);其中10月里加粉量为2,掉份量为1,播放量为4,因此涨粉率为0.250,截止当前总粉丝数为2。
思路:
首先要计算粉丝量,播放量,
粉丝量:1对应的减去-1对应的粉丝和
播放量:count(*)/count(1)
分组:作者,月份
select author,month,round(fans_growth_number/counts,3) as fans_growth_rate,
sum(fans_growth_number) over (partition by author order by month) as total_fans
from(select author,DATE_FORMAT(start_time,'%Y-%m') as month,
sum(case when if_follow = 1 then 1
when if_follow = 2 then -1
else 0 end) as fans_growth_number,
count(*) as counts
from tb_user_video_log as t1
inner join tb_video_info as t2
on t1.video_id = t2.video_id
where year(start_time)=2021
group by author,month) t
order by author,total_fans
其他思路:
第一个需要解决的地方是每个月状态2相当于掉粉,状态1相当于涨粉,这里不能用if去做,因为有多个状态,所以用case when去处理。
第二个需要解决的就是,我们需要计算每个月截止当前的粉丝量,这里自然地想到用窗口函数去处理即可,这里窗口函数需要去partition by author如果这样处理的话,如果有多个作者就会混乱。
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT author,DATE_FORMAT(DATE(end_time),'%Y-%m') AS month, round(SUM(case when if_follow=1 then 1 when if_follow=2 then -1 else 0 end)/ COUNT(author), 3), sum(SUM(case when if_follow=1 then 1 when if_follow=2 then -1 else 0 end)) over(partition by author order by DATE_FORMAT(DATE(end_time),'%Y-%m')) total_fans FROM tb_user_video_log JOIN tb_video_info USING(video_id) WHERE YEAR(end_time)=2021 GROUP BY author,month ORDER BY author,total_fans |
题目二:
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):
|
id |
uid |
exam_id |
start_time |
submit_time |
score |
|
1 |
1001 |
9001 |
2020-01-01 09:01:01 |
2020-01-01 09:21:59 |
90 |
|
2 |
1002 |
9001 |
2020-01-20 10:01:01 |
2020-01-20 10:10:01 |
89 |
|
3 |
1002 |
9001 |
2020-02-01 12:11:01 |
2020-02-01 12:31:01 |
83 |
|
4 |
1003 |
9001 |
2020-03-01 19:01:01 |
2020-03-01 19:30:01 |
75 |
|
5 |
1004 |
9001 |
2020-03-01 12:01:01 |
2020-03-01 12:11:01 |
60 |
|
6 |
1003 |
9001 |
2020-03-01 12:01:01 |
2020-03-01 12:41:01 |
90 |
|
7 |
1002 |
9001 |
2020-05-02 19:01:01 |
2020-05-02 19:32:00 |
90 |
|
8 |
1001 |
9002 |
2020-01-02 19:01:01 |
2020-01-02 19:59:01 |
69 |
|
9 |
1004 |
9002 |
2020-02-02 12:01:01 |
2020-02-02 12:20:01 |
99 |
|
10 |
1003 |
9002 |
2020-02-02 12:01:01 |
2020-02-02 12:31:01 |
68 |
|
11 |
1001 |
9002 |
2020-02-02 12:01:01 |
2020-02-02 12:43:01 |
81 |
|
12 |
1001 |
9002 |
2020-03-02 12:11:01 |
(NULL) |
(NULL) |
请输出每份试卷每月作答数和截止当月的作答总数。
由示例数据结果输出如下:
|
exam_id |
start_month |
month_cnt |
cum_exam_cnt |
|
9001 |
202001 |
2 |
2 |
|
9001 |
202002 |
1 |
3 |
|
9001 |
202003 |
3 |
6 |
|
9001 |
202005 |
1 |
7 |
|
9002 |
202001 |
1 |
1 |
|
9002 |
202002 |
3 |
4 |
|
9002 |
202003 |
1 |
5 |
解释:试卷9001在202001、202002、202003、202005共4个月有被作答记录,每个月被作答数分别为2、1、3、1,截止当月累积作答总数为2、3、6、7。
思路:
select exam_id,
start_month,
month_cnt,
sum(month_cnt) over(partition by exam_id order by start_month) as cum_exam_cnt
from (
select
exam_id,
date_format(start_time,'%Y%m') as start_month,
count(*) as month_cnt
from exam_record
group by exam_id,date_format(start_time,'%Y%m')
)t
步骤:
- 根据试卷和日期进行分组,并且进行计数
- 对计数后的进行求和,按月累计就是order by 月,每份试卷就是partition by 试卷
题目三:
描述
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):
|
id |
uid |
exam_id |
start_time |
submit_time |
score |
|
1 |
1001 |
9001 |
2020-01-01 09:01:01 |
2020-01-01 09:21:59 |
90 |
|
2 |
1002 |
9001 |
2020-01-20 10:01:01 |
2020-01-20 10:10:01 |
89 |
|
3 |
1002 |
9001 |
2020-02-01 12:11:01 |
2020-02-01 12:31:01 |
83 |
|
4 |
1003 |
9001 |
2020-03-01 19:01:01 |
2020-03-01 19:30:01 |
75 |
|
5 |
1004 |
9001 |
2020-03-01 12:01:01 |
2020-03-01 12:11:01 |
60 |
|
6 |
1003 |
9001 |
2020-03-01 12:01:01 |
2020-03-01 12:41:01 |
90 |
|
7 |
1002 |
9001 |
2020-05-02 19:01:01 |
2020-05-02 19:32:00 |
90 |
|
8 |
1001 |
9002 |
2020-01-02 19:01:01 |
2020-01-02 19:59:01 |
69 |
|
9 |
1004 |
9002 |
2020-02-02 12:01:01 |
2020-02-02 12:20:01 |
99 |
|
10 |
1003 |
9002 |
2020-02-02 12:01:01 |
2020-02-02 12:31:01 |
68 |
|
11 |
1001 |
9002 |
2020-01-02 19:01:01 |
2020-02-02 12:43:01 |
81 |
|
12 |
1001 |
9002 |
2020-03-02 12:11:01 |
(NULL) |
(NULL) |
请输出自从有用户作答记录以来,每月的试卷作答记录中月活用户数、新增用户数、截止当月的单月最大新增用户数、截止当月的累积用户数。结果按月份升序输出。
由示例数据结果输出如下:
|
start_month |
mau |
month_add_uv |
max_month_add_uv |
cum_sum_uv |
|
202001 |
2 |
2 |
2 |
2 |
|
202002 |
4 |
2 |
2 |
4 |
|
202003 |
3 |
0 |
2 |
4 |
|
202005 |
1 |
0 |
2 |
4 |
|
month |
1001 |
1002 |
1003 |
1004 |
|
202001 |
1 |
1 |
|
|
|
202002 |
1 |
1 |
1 |
1 |
|
202003 |
1 |
|
1 |
1 |
|
202005 |
|
1 |
|
|
由上述矩阵可以看出,2020年1月有2个用户活跃(mau=2),当月新增用户数为2;
2020年2月有4个用户活跃,当月新增用户数为2,最大单月新增用户数为2,当前累积用户数为4。
思路1:
第一步选出活跃用户:
select date_format(start_time,'%Y%m') as month,count(distinct uid) as 月活用户数
from exam_record
group by month


第二步选出新增用户:
select date_format(start_time,'%Y%m') dt,count(distinct uid) as 新增 from exam_record
where (date_format(start_time,'%Y%m'),uid) in
(
select min(date_format(start_time,'%Y%m')) as dt,uid
from exam_record
group by uid
)
group by dt


第三步:使用月份左连接
select * from
(
select date_format(start_time,'%Y%m') as month,count(distinct uid) as 月活用户数
from exam_record
group by month
) a
left join
(select date_format(start_time,'%Y%m') dt,count(distinct uid) as 新增 from exam_record
where (date_format(start_time,'%Y%m'),uid) in
(
select min(date_format(start_time,'%Y%m')) as dt,uid
from exam_record
group by uid
)
group by dt
) b
on a.month = b.dt

第四步:计算最大和累计
select a.month,a.月活用户数,ifnull(b.新增,0),
max(ifnull(b.新增,0)) over(order by a.month),
sum(ifnull(b.新增,0)) over (order by a.month)
from
(
select date_format(start_time,'%Y%m') as month,count(distinct uid) as 月活用户数
from exam_record
group by month
) a
left join
(select date_format(start_time,'%Y%m') dt,count(distinct uid) as 新增 from exam_record
where (date_format(start_time,'%Y%m'),uid) in
(
select min(date_format(start_time,'%Y%m')) as dt,uid
from exam_record
group by uid
)
group by dt
) b
on a.month = b.dt

思路二:
1.首先用窗口函数构造辅助列new_day(当月登录是否为最早登陆月):

2.计算每个月的new-day就是新增数,对新增计算max,sum分别为最大和累计
SELECT start_month ,#每个月
COUNT(DISTINCT uid) mau, #月活用户数
count(new_day) month_add_uv, #新增用户
MAX(count(new_day))OVER(ORDER BY start_month) max_month_add_uv, #截止当月的单月最大新增用户数
SUM(count(new_day))OVER(ORDER BY start_month) cum_sum_uv
FROM (
SELECT *,DATE_FORMAT(start_time,'%Y%m') start_month,
IF(start_time=MIN(start_time)OVER(PARTITION BY uid),1,null) new_day
FROM exam_record
)t1
GROUP BY start_month;

思路三:
1.对每位用户的月份进行排序,排名为1的月份就是新增月份
select
uid, date_format(start_time, '%Y%m') as start_month
, row_number() over(partition by uid order by start_time) as rn
from exam_record
注意:这里使用row_number,dense_rank,rank均可

2.对每个月的1计数得到的就是每个月的新增
select start_month,
count(distinct uid) as mau
, count(if(rn=1, uid, null)) as month_add_uv
from (
select
uid, date_format(start_time, '%Y%m') as start_month
, row_number() over(partition by uid order by start_time) as rn
from exam_record
) t
group by start_month

3.然后加入max,sum
select
start_month
, count(distinct uid) as mau
, count(if(rn=1, uid, null)) as month_add_uv
, max(count(if(rn=1, uid, null))) over(order by start_month) as max_month_add_uv
, sum(count(if(rn=1, uid, null))) over(order by start_month) as cum_sum_uv
from (
select
uid, date_format(start_time, '%Y%m') as start_month
, row_number() over(partition by uid order by start_time) as rn
from exam_record
) t
group by start_month

浙公网安备 33010602011771号