1 select
2 distinct month, user_id
3 from (
4 select
5 user_id,
6 date_format(login_time, "yyyy-MM") month,
7 row_number over(partition by user_id, new_login_time order by new_login_time asc) as rn_1
8 from (
9 select
10 user_id,
11 login_time,
12 date_sub(login_time, rn) new_login_time
13 from (
14 select
15 user_id,
16 login_time,
17 row_number over(partition by user_id order by login_time asc) as rn
18 from (
19 select
20 distinct user_id,
21 date_format(login_time, "yyyy-MM-dd") login_time
22 from t
23 ) t1
24 ) t2
25 ) t3
26 ) t4
27 where rn_1 >= 5
实现思路:
1.由于一天之内同一个用户会登录多次,我们将login_time格式化成时分秒(yyyy-MM-dd)的格式,并对整个数据进行去重。
2.使用窗口函数对步骤1输出的数据进行组内排序:
//rn为行号标记
row_number over(partition by user_id order by login_time asc) as rn
3.将login_time列减去rn列。
4.对步骤3输出的数据进行组内排序。
//rn_1为行号标记
row_number over(partition by user_id, new_login_time order by new_login_time asc) as rn_