hsql-蚂蚁森林

一.原始数据

u_001,2020/1/1,10
u_001,2020/1/2,150
u_001,2020/1/2,110
u_001,2020/1/2,10
u_001,2020/1/4,50
u_001,2020/1/4,10
u_001,2020/1/6,45
u_001,2020/1/6,90
u_002,2020/1/1,10
u_002,2020/1/2,150
u_002,2020/1/2,70
u_002,2020/1/2,70
u_002,2020/1/3,30
u_002,2020/1/3,80
u_002,2020/1/4,150
u_002,2020/1/5,101
u_002,2020/1/6,68
user_low_carbon.txt
p001,梭梭树,17
p002,沙柳,19
p003,樟子树,146
p004,胡杨,215
plant_carbon.txt

二.数据导入hive 

1. 创建表user_low_carbon
create table user_low_carbon(user_id String,data_dt String,low_carbon int) row format delimited fields terminated by ","; 

2. 导入数据
local data local inpath
"/opt/module/hive/data/user_low_carbon.txt" into table user_low_carbon;
1.创建plant_carbon表
create table plant_carbon(plant_id String,plant_name string,plant_carbon int) row format delimited fields terminated by ",";
2. 导数据 load data local inpath
"/opt/module/hive/data/plant_carbon.txt" into table plant_carbon;

三.问题

1. 

假设2020年1月1日开始记录低碳数据(user_low_carbon),假设2020年6月1日之前满足申领条件的用户都申领了一棵 p004-胡杨【用掉了一棵胡杨的低碳量】,剩余的能量全部用来领取 p002-沙柳。

【用户总低碳量减去一棵胡杨所需的低碳量,剩余用来领取沙柳,再除以沙柳的低碳量】

需求:统计在6月1日累计申领 p002-沙柳排名前10的用户信息;以及当前用户比排名靠后的用户多领沙柳的棵数。

得到的统计结果如下表样式:

user_id  plant_count less_count(比后一名多领了几棵沙柳)

思路:

1. 年份过滤(where),先求每个用户(分组 group by)的总低碳量(sum聚合),并降序排名(order by desc)前11名(limit 11) 得到表 t1

2.求胡杨需要多少低碳量  -- t2

3.求沙柳需要多少低碳量 -- t3

4.联立 : from t1,t2,t3 ,选出需要的字段

user_id ---用户量

total  ---总量

num2 --- 胡杨量

num3  --- 沙柳量

floor((total-num2)/num3) plant_count

5. 开窗 上移lead() ,lag()下移

lead(plant_count,1,0)  over(order by plant_count desc)  countn

6.相减取排序取前10

代码如下:

select
      user_id,
       plant_count,
       (plant_count-countn) less_count
from
(select
       user_id,
       plant_count,
       lead(plant_count, 1, 0) over (order by plant_count desc) countn
from (select
             user_id,
             floor((total - num2) / num3) plant_count
      from (
               select
                      t1.user_id,
                      total,
                      t2.plant_carbon num2,
                      t3.plant_carbon num3
               from (
                        select
                               user_id,
                               sum(low_carbon) as total
                        from user_low_carbon
--                         where (int(year(regexp_replace(data_dt, "/", "-"))) <= 2020)
--                           and (int(year(regexp_replace(data_dt, "/", "-"))) > 2019)
--                           and int(month(regexp_replace(data_dt, "/", "-"))) < 6
                        where datediff(regexp_replace(data_dt, "/", "-"),regexp_replace('2020/6/1', "/", "-"))<0
                        group by user_id
                        order by total
                            desc -- 降序
                        limit 11) t1,
                    (select
                            plant_carbon
                     from plant_carbon
                     where plant_id = "p004") t2,
                    (select
                            plant_carbon
                     from plant_carbon
                     where plant_id = "p002") t3
           ) t4

     ) t5
limit 10)t6
order by plant_count
desc
;

 

posted @ 2021-08-31 00:53  冰底熊  阅读(3)  评论(0)    收藏  举报