蚂蚁森林题目解答

题目:

以下表记录了用户每天的蚂蚁森林低碳生活领取的记录流水。
table_name:user_low_carbon

user_id data_dt low_carbon
用户 日期 减少碳排放(g)

蚂蚁森林植物换购表,用于记录申领环保植物所需要减少的碳排放量
table_name: plant_carbon

plant_id plant_name low_carbon
植物编号 植物名 换购植物所需要的碳

数据放在文末

题目如下

1.蚂蚁森林植物申领统计
问题:假设2017年1月1日开始记录低碳数据(user_low_carbon),假设2017年10月1日之前满足申领条件的用户都申领了一颗p004-胡杨,
剩余的能量全部用来领取“p002-沙柳” 。
统计在10月1日累计申领“p002-沙柳” 排名前10的用户信息;以及他比后一名多领了几颗沙柳。
得到的统计结果如下表样式:
user_id plant_count less_count(比后一名多领了几颗沙柳)
u_101 1000 100
u_088 900 400
u_103 500 …

SQL语句

SELECT
user_id,
plant_count,
(plant_count -LEAD(plant_count,1,0) over(ORDER BY plant_count DESC))  less_count
FROM (
    SELECT --筛选出11个为后面窗口减少计算量
    user_id,
    plant_count
    FROM ( 
        SELECT --获取能种多沙柳
        user_id,
        FLOOR((SUM(low_carbon) - (SELECT low_carbon FROM plant_carbon WHERE plant_name = '胡杨'))
        / (SELECT low_carbon FROM plant_carbon WHERE plant_name = '沙柳')) AS plant_count
        FROM (
            SELECT--对时间进行格式化并且筛选
            user_id,
            date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt ,
            low_carbon
            FROM user_low_carbon
            WHERE MONTH(date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd'))<10
        )T1 
        GROUP  BY user_id
    )T2
    WHERE  T2.plant_count>0--对前面的可能出现的当前不够用进行排除
    ORDER BY plant_count DESC
    LIMIT 11
)T3
LIMIT 10;

 

2、蚂蚁森林低碳用户排名分析
问题:查询user_low_carbon表中每日流水记录,条件为:
用户在2017年,连续三天(或以上)的天数里,
每天减少碳排放(low_carbon)都超过100g的用户低碳流水。
需要查询返回满足以上条件的user_low_carbon表中的记录流水。
例如用户u_002符合条件的记录如下,因为2017/1/2~2017/1/5连续四天的碳排放量之和都大于等于100g:
seq(key) user_id data_dt low_carbon
xxxxx10 u_002 2017/1/2 150
xxxxx11 u_002 2017/1/2 70
xxxxx12 u_002 2017/1/3 30
xxxxx13 u_002 2017/1/3 80
xxxxx14 u_002 2017/1/4 150
xxxxx14 u_002 2017/1/5 101

 

with user_low_carbon_tmp as(
SELECT --获取每天大于100的日期
user_id,
date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt,
SUM(low_carbon) SUM_low_carbon
FROM user_low_carbon
WHERE  YEAR (date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd')) =2017
GROUP BY user_id,data_dt
HAVING SUM_low_carbon=100
),
T4 AS (
SELECT  T1.user_id,T1.data_dt DU1,--获取连续三天大于100的
T2.data_dt DU2,
T3.data_dt DU3
FROM user_low_carbon_tmp T1 
LEFT JOIN user_low_carbon_tmp T2 ON  (datediff(T1.data_dt,T2.data_dt) =-1) AND T1.user_id = T2.user_id
LEFT JOIN user_low_carbon_tmp T3 ON  (datediff(T1.data_dt,T2.data_dt) =-2) AND T1.user_id = T3.user_id
order by T1.user_id
),
T5 as (--将天数合并再一个字段中,并且排除掉不符合条件的天
SELECT user_id,CONCAT(CAST(du1 as string),',',CAST(du2 as string),',',CAST(du3 as string)) AS DAYS
FROM T4
WHERE DU2 IS NOT NULL
AND DU3 IS NOT NULL
),
T6 AS (
 SELECT  user_id,DS--将行转列
 FROM T5
 lateral VIEW
 explode(split(DAYS,",")) tmp  AS DS
 GROUP BY user_id,DS
)
SELECT T7.*--展示符合条件的天数
FROM user_low_carbon T7 RIGHT JOIN  T6 ON  T7.user_id = T6.user_id AND date_format(regexp_replace(T7.data_dt,'/','-'),'yyyy-MM-dd') = T6.DS
ORDER BY T7.USER_ID,DATA_DT;

对于第二题来说,比较容易想到用窗口函数解决,但是窗口函数的效率低下,这里提供了一种不适用窗口函数的方案,可以通过自连接的方法实现对相邻天数的操作,对于这里的第二题,还有改进的余地,但是在操作的过程中遇到了一些问题,最后一步的join,需要对同一行的不同栏位进行比对,但是采用了几种join方式,都没有成功希望大家看到了能一起解决

 更新:已经解决了,主要是对日期的处理出现了不同

with user_low_carbon_tmp as(
SELECT --获取每天大于100的日期
user_id,
date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt,
SUM(low_carbon) SUM_low_carbon
FROM user_low_carbon
WHERE  YEAR (date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd')) =2017
GROUP BY user_id,data_dt
HAVING SUM_low_carbon>100
),
T4 AS (
SELECT  T1.user_id,T1.data_dt DU1,
T2.data_dt DU2,
T3.data_dt DU3
FROM user_low_carbon_tmp T1 
LEFT JOIN user_low_carbon_tmp T2 ON  (datediff(T1.data_dt,T2.data_dt) =-1) AND T1.user_id = T2.user_id
LEFT JOIN user_low_carbon_tmp T3 ON  (datediff(T1.data_dt,T2.data_dt) =-2) AND T1.user_id = T3.user_id
),
T5 AS (
SELECT T6.user_id,T6.data_dt 
FROM T4
LEFT JOIN  user_low_carbon T6  ON T4.user_id = T6.user_id 
AND (date_format(regexp_replace(T6.data_dt,'/','-'),'yyyy-MM-dd') = T4.DU1 --T4的日期已经变化了,需要按照变化后的日期格式进行比对
    OR date_format(regexp_replace(T6.data_dt,'/','-'),'yyyy-MM-dd') = T4.DU2 
    OR date_format(regexp_replace(T6.data_dt,'/','-'),'yyyy-MM-dd') = T4.DU3)
WHERE T4.DU2 IS NOT NULL
AND T4.DU3 IS NOT NULL
GROUP BY T6.user_id,T6.data_dt
)
SELECT  T6.user_id,T6.data_dt, T6.low_carbon
FROM T5
LEFT JOIN  user_low_carbon T6  ON T5.user_id = T6.user_id 
AND T5.data_dt = T6.data_dt
ORDER BY T6.user_id ,T6.data_dt;

 

错误的版本:

with user_low_carbon_tmp as(
SELECT --获取每天大于100的日期
user_id,
date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt,
SUM(low_carbon) SUM_low_carbon
FROM user_low_carbon
WHERE  YEAR (date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd')) =2017
GROUP BY user_id,data_dt
HAVING SUM_low_carbon>100
),
T4 AS (
SELECT  T1.user_id,T1.data_dt DU1,
T2.data_dt DU2,
T3.data_dt DU3
FROM user_low_carbon_tmp T1 
LEFT JOIN user_low_carbon_tmp T2 ON  (DAY (T1.data_dt) - DAY (T2.data_dt) =-1) AND T1.user_id = T2.user_id
LEFT JOIN user_low_carbon_tmp T3 ON  (DAY (T1.data_dt) - DAY (T3.data_dt) =-2) AND T1.user_id = T3.user_id
order by T1.user_id
),
T5 AS (
SELECT user_id,
DU1,
DU2,
DU3
FROM T4
WHERE DU2 IS NOT NULL
AND DU3 IS NOT NULL
)
SELECT T6.*--在这里的join试了好几种方案,都没能成功,要不然是空集,要不然是null,希望能有大佬对其修正
FROM user_low_carbon T6  
RIGHT JOIN  T5 ON T6.user_id = T5.user_id 
AND (regexp_replace(T6.data_dt,'/','-') = T5.DU1 OR regexp_replace(T6.data_dt,'/','-') = T5.DU2 OR regexp_replace(T6.data_dt,'/','-') = T5.DU3)
ORDER BY T6.user_id ,T6.data_dt ;

 

数据如下:

user_low_carbon

u_001    2017/1/1    10
u_001    2017/1/2    150
u_001    2017/1/2    110
u_001    2017/1/2    10
u_001    2017/1/4    50
u_001    2017/1/4    10
u_001    2017/1/6    45
u_001    2017/1/6    90
u_002    2017/1/1    10
u_002    2017/1/2    150
u_002    2017/1/2    70
u_002    2017/1/3    30
u_002    2017/1/3    80
u_002    2017/1/4    150
u_002    2017/1/5    101
u_002    2017/1/6    68
u_003    2017/1/1    20
u_003    2017/1/2    10
u_003    2017/1/2    150
u_003    2017/1/3    160
u_003    2017/1/4    20
u_003    2017/1/5    120
u_003    2017/1/6    20
u_003    2017/1/7    10
u_003    2017/1/7    110
u_004    2017/1/1    110
u_004    2017/1/2    20
u_004    2017/1/2    50
u_004    2017/1/3    120
u_004    2017/1/4    30
u_004    2017/1/5    60
u_004    2017/1/6    120
u_004    2017/1/7    10
u_004    2017/1/7    120
u_005    2017/1/1    80
u_005    2017/1/2    50
u_005    2017/1/2    80
u_005    2017/1/3    180
u_005    2017/1/4    180
u_005    2017/1/4    10
u_005    2017/1/5    80
u_005    2017/1/6    280
u_005    2017/1/7    80
u_005    2017/1/7    80
u_006    2017/1/1    40
u_006    2017/1/2    40
u_006    2017/1/2    140
u_006    2017/1/3    210
u_006    2017/1/3    10
u_006    2017/1/4    40
u_006    2017/1/5    40
u_006    2017/1/6    20
u_006    2017/1/7    50
u_006    2017/1/7    240
u_007    2017/1/1    130
u_007    2017/1/2    30
u_007    2017/1/2    330
u_007    2017/1/3    30
u_007    2017/1/4    530
u_007    2017/1/5    30
u_007    2017/1/6    230
u_007    2017/1/7    130
u_007    2017/1/7    30
u_008    2017/1/1    160
u_008    2017/1/2    60
u_008    2017/1/2    60
u_008    2017/1/3    60
u_008    2017/1/4    260
u_008    2017/1/5    360
u_008    2017/1/6    160
u_008    2017/1/7    60
u_008    2017/1/7    60
u_009    2017/1/1    70
u_009    2017/1/2    70
u_009    2017/1/2    70
u_009    2017/1/3    170
u_009    2017/1/4    270
u_009    2017/1/5    70
u_009    2017/1/6    70
u_009    2017/1/7    70
u_009    2017/1/7    70
u_010    2017/1/1    90
u_010    2017/1/2    90
u_010    2017/1/2    90
u_010    2017/1/3    90
u_010    2017/1/4    90
u_010    2017/1/4    80
u_010    2017/1/5    90
u_010    2017/1/5    90
u_010    2017/1/6    190
u_010    2017/1/7    90
u_010    2017/1/7    90
u_011    2017/1/1    110
u_011    2017/1/2    100
u_011    2017/1/2    100
u_011    2017/1/3    120
u_011    2017/1/4    100
u_011    2017/1/5    100
u_011    2017/1/6    100
u_011    2017/1/7    130
u_011    2017/1/7    100
u_012    2017/1/1    10
u_012    2017/1/2    120
u_012    2017/1/2    10
u_012    2017/1/3    10
u_012    2017/1/4    50
u_012    2017/1/5    10
u_012    2017/1/6    20
u_012    2017/1/7    10
u_012    2017/1/7    10
u_013    2017/1/1    50
u_013    2017/1/2    150
u_013    2017/1/2    50
u_013    2017/1/3    150
u_013    2017/1/4    550
u_013    2017/1/5    350
u_013    2017/1/6    50
u_013    2017/1/7    20
u_013    2017/1/7    60
u_014    2017/1/1    220
u_014    2017/1/2    120
u_014    2017/1/2    20
u_014    2017/1/3    20
u_014    2017/1/4    20
u_014    2017/1/5    250
u_014    2017/1/6    120
u_014    2017/1/7    270
u_014    2017/1/7    20
u_015    2017/1/1    10
u_015    2017/1/2    20
u_015    2017/1/2    10
u_015    2017/1/3    10
u_015    2017/1/4    20
u_015    2017/1/5    70
u_015    2017/1/6    10
u_015    2017/1/7    80
u_015    2017/1/7    60

plant_carbon

p001    梭梭树    17
p002    沙柳    19
p003    樟子树    146
p004    胡杨    215

 

 

posted @ 2020-08-04 21:29  Velome  阅读(865)  评论(0)    收藏  举报