蚂蚁森林题目解答
题目:
以下表记录了用户每天的蚂蚁森林低碳生活领取的记录流水。
table_name:user_low_carbon
user_id data_dt low_carbon
用户 日期 减少碳排放(g)
蚂蚁森林植物换购表,用于记录申领环保植物所需要减少的碳排放量
table_name: plant_carbon
plant_id plant_name low_carbon
植物编号 植物名 换购植物所需要的碳
数据放在文末
题目如下
1.蚂蚁森林植物申领统计
问题:假设2017年1月1日开始记录低碳数据(user_low_carbon),假设2017年10月1日之前满足申领条件的用户都申领了一颗p004-胡杨,
剩余的能量全部用来领取“p002-沙柳” 。
统计在10月1日累计申领“p002-沙柳” 排名前10的用户信息;以及他比后一名多领了几颗沙柳。
得到的统计结果如下表样式:
user_id plant_count less_count(比后一名多领了几颗沙柳)
u_101 1000 100
u_088 900 400
u_103 500 …
SQL语句
SELECT user_id, plant_count, (plant_count -LEAD(plant_count,1,0) over(ORDER BY plant_count DESC)) less_count FROM ( SELECT --筛选出11个为后面窗口减少计算量 user_id, plant_count FROM ( SELECT --获取能种多沙柳 user_id, FLOOR((SUM(low_carbon) - (SELECT low_carbon FROM plant_carbon WHERE plant_name = '胡杨')) / (SELECT low_carbon FROM plant_carbon WHERE plant_name = '沙柳')) AS plant_count FROM ( SELECT--对时间进行格式化并且筛选 user_id, date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt , low_carbon FROM user_low_carbon WHERE MONTH(date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd'))<10 )T1 GROUP BY user_id )T2 WHERE T2.plant_count>0--对前面的可能出现的当前不够用进行排除 ORDER BY plant_count DESC LIMIT 11 )T3 LIMIT 10;
2、蚂蚁森林低碳用户排名分析
问题:查询user_low_carbon表中每日流水记录,条件为:
用户在2017年,连续三天(或以上)的天数里,
每天减少碳排放(low_carbon)都超过100g的用户低碳流水。
需要查询返回满足以上条件的user_low_carbon表中的记录流水。
例如用户u_002符合条件的记录如下,因为2017/1/2~2017/1/5连续四天的碳排放量之和都大于等于100g:
seq(key) user_id data_dt low_carbon
xxxxx10 u_002 2017/1/2 150
xxxxx11 u_002 2017/1/2 70
xxxxx12 u_002 2017/1/3 30
xxxxx13 u_002 2017/1/3 80
xxxxx14 u_002 2017/1/4 150
xxxxx14 u_002 2017/1/5 101
with user_low_carbon_tmp as( SELECT --获取每天大于100的日期 user_id, date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt, SUM(low_carbon) SUM_low_carbon FROM user_low_carbon WHERE YEAR (date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd')) =2017 GROUP BY user_id,data_dt HAVING SUM_low_carbon=100 ), T4 AS ( SELECT T1.user_id,T1.data_dt DU1,--获取连续三天大于100的 T2.data_dt DU2, T3.data_dt DU3 FROM user_low_carbon_tmp T1 LEFT JOIN user_low_carbon_tmp T2 ON (datediff(T1.data_dt,T2.data_dt) =-1) AND T1.user_id = T2.user_id LEFT JOIN user_low_carbon_tmp T3 ON (datediff(T1.data_dt,T2.data_dt) =-2) AND T1.user_id = T3.user_id order by T1.user_id ), T5 as (--将天数合并再一个字段中,并且排除掉不符合条件的天 SELECT user_id,CONCAT(CAST(du1 as string),',',CAST(du2 as string),',',CAST(du3 as string)) AS DAYS FROM T4 WHERE DU2 IS NOT NULL AND DU3 IS NOT NULL ), T6 AS ( SELECT user_id,DS--将行转列 FROM T5 lateral VIEW explode(split(DAYS,",")) tmp AS DS GROUP BY user_id,DS ) SELECT T7.*--展示符合条件的天数 FROM user_low_carbon T7 RIGHT JOIN T6 ON T7.user_id = T6.user_id AND date_format(regexp_replace(T7.data_dt,'/','-'),'yyyy-MM-dd') = T6.DS ORDER BY T7.USER_ID,DATA_DT;
对于第二题来说,比较容易想到用窗口函数解决,但是窗口函数的效率低下,这里提供了一种不适用窗口函数的方案,可以通过自连接的方法实现对相邻天数的操作,对于这里的第二题,还有改进的余地,但是在操作的过程中遇到了一些问题,最后一步的join,需要对同一行的不同栏位进行比对,但是采用了几种join方式,都没有成功希望大家看到了能一起解决
更新:已经解决了,主要是对日期的处理出现了不同
with user_low_carbon_tmp as( SELECT --获取每天大于100的日期 user_id, date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt, SUM(low_carbon) SUM_low_carbon FROM user_low_carbon WHERE YEAR (date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd')) =2017 GROUP BY user_id,data_dt HAVING SUM_low_carbon>100 ), T4 AS ( SELECT T1.user_id,T1.data_dt DU1, T2.data_dt DU2, T3.data_dt DU3 FROM user_low_carbon_tmp T1 LEFT JOIN user_low_carbon_tmp T2 ON (datediff(T1.data_dt,T2.data_dt) =-1) AND T1.user_id = T2.user_id LEFT JOIN user_low_carbon_tmp T3 ON (datediff(T1.data_dt,T2.data_dt) =-2) AND T1.user_id = T3.user_id ), T5 AS ( SELECT T6.user_id,T6.data_dt FROM T4 LEFT JOIN user_low_carbon T6 ON T4.user_id = T6.user_id AND (date_format(regexp_replace(T6.data_dt,'/','-'),'yyyy-MM-dd') = T4.DU1 --T4的日期已经变化了,需要按照变化后的日期格式进行比对 OR date_format(regexp_replace(T6.data_dt,'/','-'),'yyyy-MM-dd') = T4.DU2 OR date_format(regexp_replace(T6.data_dt,'/','-'),'yyyy-MM-dd') = T4.DU3) WHERE T4.DU2 IS NOT NULL AND T4.DU3 IS NOT NULL GROUP BY T6.user_id,T6.data_dt ) SELECT T6.user_id,T6.data_dt, T6.low_carbon FROM T5 LEFT JOIN user_low_carbon T6 ON T5.user_id = T6.user_id AND T5.data_dt = T6.data_dt ORDER BY T6.user_id ,T6.data_dt;
错误的版本:
with user_low_carbon_tmp as( SELECT --获取每天大于100的日期 user_id, date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt, SUM(low_carbon) SUM_low_carbon FROM user_low_carbon WHERE YEAR (date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd')) =2017 GROUP BY user_id,data_dt HAVING SUM_low_carbon>100 ), T4 AS ( SELECT T1.user_id,T1.data_dt DU1, T2.data_dt DU2, T3.data_dt DU3 FROM user_low_carbon_tmp T1 LEFT JOIN user_low_carbon_tmp T2 ON (DAY (T1.data_dt) - DAY (T2.data_dt) =-1) AND T1.user_id = T2.user_id LEFT JOIN user_low_carbon_tmp T3 ON (DAY (T1.data_dt) - DAY (T3.data_dt) =-2) AND T1.user_id = T3.user_id order by T1.user_id ), T5 AS ( SELECT user_id, DU1, DU2, DU3 FROM T4 WHERE DU2 IS NOT NULL AND DU3 IS NOT NULL ) SELECT T6.*--在这里的join试了好几种方案,都没能成功,要不然是空集,要不然是null,希望能有大佬对其修正 FROM user_low_carbon T6 RIGHT JOIN T5 ON T6.user_id = T5.user_id AND (regexp_replace(T6.data_dt,'/','-') = T5.DU1 OR regexp_replace(T6.data_dt,'/','-') = T5.DU2 OR regexp_replace(T6.data_dt,'/','-') = T5.DU3) ORDER BY T6.user_id ,T6.data_dt ;
数据如下:
user_low_carbon
u_001 2017/1/1 10
u_001 2017/1/2 150
u_001 2017/1/2 110
u_001 2017/1/2 10
u_001 2017/1/4 50
u_001 2017/1/4 10
u_001 2017/1/6 45
u_001 2017/1/6 90
u_002 2017/1/1 10
u_002 2017/1/2 150
u_002 2017/1/2 70
u_002 2017/1/3 30
u_002 2017/1/3 80
u_002 2017/1/4 150
u_002 2017/1/5 101
u_002 2017/1/6 68
u_003 2017/1/1 20
u_003 2017/1/2 10
u_003 2017/1/2 150
u_003 2017/1/3 160
u_003 2017/1/4 20
u_003 2017/1/5 120
u_003 2017/1/6 20
u_003 2017/1/7 10
u_003 2017/1/7 110
u_004 2017/1/1 110
u_004 2017/1/2 20
u_004 2017/1/2 50
u_004 2017/1/3 120
u_004 2017/1/4 30
u_004 2017/1/5 60
u_004 2017/1/6 120
u_004 2017/1/7 10
u_004 2017/1/7 120
u_005 2017/1/1 80
u_005 2017/1/2 50
u_005 2017/1/2 80
u_005 2017/1/3 180
u_005 2017/1/4 180
u_005 2017/1/4 10
u_005 2017/1/5 80
u_005 2017/1/6 280
u_005 2017/1/7 80
u_005 2017/1/7 80
u_006 2017/1/1 40
u_006 2017/1/2 40
u_006 2017/1/2 140
u_006 2017/1/3 210
u_006 2017/1/3 10
u_006 2017/1/4 40
u_006 2017/1/5 40
u_006 2017/1/6 20
u_006 2017/1/7 50
u_006 2017/1/7 240
u_007 2017/1/1 130
u_007 2017/1/2 30
u_007 2017/1/2 330
u_007 2017/1/3 30
u_007 2017/1/4 530
u_007 2017/1/5 30
u_007 2017/1/6 230
u_007 2017/1/7 130
u_007 2017/1/7 30
u_008 2017/1/1 160
u_008 2017/1/2 60
u_008 2017/1/2 60
u_008 2017/1/3 60
u_008 2017/1/4 260
u_008 2017/1/5 360
u_008 2017/1/6 160
u_008 2017/1/7 60
u_008 2017/1/7 60
u_009 2017/1/1 70
u_009 2017/1/2 70
u_009 2017/1/2 70
u_009 2017/1/3 170
u_009 2017/1/4 270
u_009 2017/1/5 70
u_009 2017/1/6 70
u_009 2017/1/7 70
u_009 2017/1/7 70
u_010 2017/1/1 90
u_010 2017/1/2 90
u_010 2017/1/2 90
u_010 2017/1/3 90
u_010 2017/1/4 90
u_010 2017/1/4 80
u_010 2017/1/5 90
u_010 2017/1/5 90
u_010 2017/1/6 190
u_010 2017/1/7 90
u_010 2017/1/7 90
u_011 2017/1/1 110
u_011 2017/1/2 100
u_011 2017/1/2 100
u_011 2017/1/3 120
u_011 2017/1/4 100
u_011 2017/1/5 100
u_011 2017/1/6 100
u_011 2017/1/7 130
u_011 2017/1/7 100
u_012 2017/1/1 10
u_012 2017/1/2 120
u_012 2017/1/2 10
u_012 2017/1/3 10
u_012 2017/1/4 50
u_012 2017/1/5 10
u_012 2017/1/6 20
u_012 2017/1/7 10
u_012 2017/1/7 10
u_013 2017/1/1 50
u_013 2017/1/2 150
u_013 2017/1/2 50
u_013 2017/1/3 150
u_013 2017/1/4 550
u_013 2017/1/5 350
u_013 2017/1/6 50
u_013 2017/1/7 20
u_013 2017/1/7 60
u_014 2017/1/1 220
u_014 2017/1/2 120
u_014 2017/1/2 20
u_014 2017/1/3 20
u_014 2017/1/4 20
u_014 2017/1/5 250
u_014 2017/1/6 120
u_014 2017/1/7 270
u_014 2017/1/7 20
u_015 2017/1/1 10
u_015 2017/1/2 20
u_015 2017/1/2 10
u_015 2017/1/3 10
u_015 2017/1/4 20
u_015 2017/1/5 70
u_015 2017/1/6 10
u_015 2017/1/7 80
u_015 2017/1/7 60
plant_carbon
p001 梭梭树 17
p002 沙柳 19
p003 樟子树 146
p004 胡杨 215

浙公网安备 33010602011771号