连续登录问题解决

连续登陆问题

在电商、物流和银行可能经常会遇到这样的需求:统计用户连续交易的总额、连续登陆天数、连续登陆开始和结束时间、间隔天数等

数据:

注意:每个用户每天可能会有多条记录

id  datestr   amount
1,2019-02-08,6214.23 
1,2019-02-08,6247.32 
1,2019-02-09,85.63 
1,2019-02-09,967.36 
1,2019-02-10,85.69 
1,2019-02-12,769.85 
1,2019-02-13,943.86 
1,2019-02-14,538.42
1,2019-02-15,369.76
1,2019-02-16,369.76
1,2019-02-18,795.15
1,2019-02-19,715.65
1,2019-02-21,537.71
2,2019-02-08,6214.23 
2,2019-02-08,6247.32 
2,2019-02-09,85.63 
2,2019-02-09,967.36 
2,2019-02-10,85.69 
2,2019-02-12,769.85 
2,2019-02-13,943.86 
2,2019-02-14,943.18
2,2019-02-15,369.76
2,2019-02-18,795.15
2,2019-02-19,715.65
2,2019-02-21,537.71
3,2019-02-08,6214.23 
3,2019-02-08,6247.32 
3,2019-02-09,85.63 
3,2019-02-09,967.36 
3,2019-02-10,85.69 
3,2019-02-12,769.85 
3,2019-02-13,943.86 
3,2019-02-14,276.81
3,2019-02-15,369.76
3,2019-02-16,369.76
3,2019-02-18,795.15
3,2019-02-19,715.65
3,2019-02-21,537.71
建表语句
create table deal_tb(
    id string
    ,datestr string
    ,amount string
)row format delimited fields terminated by ',';
计算逻辑
  • 先按用户和日期分组求和,使每个用户每天只有一条数据

select id,datestr,sum(amount) as sum_amount from deal_tb group by id,datestr;

  • 根据用户ID分组按日期排序,将日期和分组序号相减得到连续登陆的开始日期,如果开始日期相同说明连续登陆

select tt1.id,tt1.datestr,tt1.sum_amount,date_sub(tt1.datestr,tt1.rn) as grp from (select t1.id as id,t1.datestr as datestr,t1.sum_amount as sum_amount,row_number() over(partition by t1.id order by t1.datestr) as rn from (select id,datestr,sum(amount) as sum_amount from deal_tb group by id,datestr) t1) tt1;

 

 

 

 

 

  • datediff(string end_date,string start_date); 等于0说明连续登录

  • 统计用户连续交易的总额、连续登陆天数、连续登陆开始和结束时间、间隔天数

select ttt1.id,ttt1.grp,round(sum(ttt1.sum_amount),2) as user_sum_amount,count(1) as user_days,min(ttt1.datestr) as user_start_date,max(ttt1.datestr) as user_end_date,datediff(ttt1.grp,lag(ttt1.grp,1) over(partition by ttt1.id order by ttt1.grp)) as interval_days from (select tt1.id as id,tt1.datestr as datestr,tt1.sum_amount as sum_amount,date_sub(tt1.datestr,tt1.rn) as grp from (select t1.id as id,t1.datestr as datestr,t1.sum_amount as sum_amount,row_number() over(partition by t1.id order by t1.datestr) as rn from (select id,datestr,sum(amount) as sum_amount from deal_tb group by id,datestr) t1) tt1) ttt1 group by ttt1.id,ttt1.grp;
SELECT ttt1.id, ttt1.grp
    , round(sum(ttt1.sum_amount), 2) AS user_sum_amount
    , count(1) AS user_days, min(ttt1.datestr) AS user_start_date
    , max(ttt1.datestr) AS user_end_date
    , datediff(ttt1.grp, lag(ttt1.grp, 1) OVER (PARTITION BY ttt1.id ORDER BY ttt1.grp)) AS interval_days
FROM (
    SELECT tt1.id AS id, tt1.datestr AS datestr, tt1.sum_amount AS sum_amount
        , date_sub(tt1.datestr, tt1.rn) AS grp
    FROM (
        SELECT t1.id AS id, t1.datestr AS datestr, t1.sum_amount AS sum_amount, row_number() OVER (PARTITION BY t1.id ORDER BY t1.datestr) AS rn
        FROM (
            SELECT id, datestr, sum(amount) AS sum_amount
            FROM deal_tb
            GROUP BY id, datestr
        ) t1
    ) tt1
) ttt1
GROUP BY ttt1.id, ttt1.grp;

 

  • 结果

1   2019-02-07  13600.23    3   2019-02-08  2019-02-10 NULL
1   2019-02-08  2991.650    5   2019-02-12  2019-02-16  1
1   2019-02-09  1510.8      2   2019-02-18  2019-02-19  1
1   2019-02-10  537.71      1   2019-02-21  2019-02-21  1
2   2019-02-07  13600.23    3   2019-02-08  2019-02-10 NULL
2   2019-02-08  3026.649    4   2019-02-12  2019-02-15  1
2   2019-02-10  1510.8      2   2019-02-18  2019-02-19  2
2   2019-02-11  537.71      1   2019-02-21  2019-02-21  1
3   2019-02-07  13600.23    3   2019-02-08  2019-02-10 NULL
3   2019-02-08  2730.04     5   2019-02-12  2019-02-16  1
3   2019-02-09  1510.8      2   2019-02-18  2019-02-19  1
3   2019-02-10  537.71      1   2019-02-21  2019-02-21  1
 
posted @ 2022-06-23 20:44  呆滞的猪蹄  阅读(113)  评论(0)    收藏  举报