hive 剔除极值后取均值问题

需求描述

有一张记录用户购物信息表,第一个字段 shop_id 订单ID,第二个字段 commodity_id 物品ID,第三个字段 sale 物品金额,

110    1    10
订单ID110 物品ID1 金额10
现在要求出每个订单中剔除物品金额最大最小后的平均值。

原始数据

shop_id
commodity_id
sale
110
1
10
110
2
20
110
3
30
110
4
50
110
5
60
110
6
20
110
7
80
111
1
90
111
2
80
111
3
50
111
4
70
111
5
20
111
6
10

解题思路

根据需求描述可以得知 需要按订单分组,并且每个订单中的最大值和最小值要剔除然后再取平均值

1、先找出订单中的极值
select *,
       dense_rank() over (partition by shop_id order by sale) rn,
       dense_rank() over (partition by shop_id order by sale desc) rn_
from temp_shop_info;
shop_id
commodity_id
sale
rn
rn_
110
7
80
6
1
110
5
60
5
2
110
4
50
4
3
110
3
30
3
4
110
2
20
2
5
110
6
20
2
5
110
1
10
1
6
111
1
90
6
1
111
2
80
5
2
111
4
70
4
3
111
3
50
3
4
111
5
20
2
5
111
6
10
1
6

2、剔除极值后的结果如下

select *
from (select *,
             dense_rank() over (partition by shop_id order by sale)      rn,
             dense_rank() over (partition by shop_id order by sale desc) rn_
      from temp_shop_info) u
where rn > 1
  and rn_ > 1;
shop_id
commodity_id
sale
rn
rn_
110
5
60
5
2
110
4
50
4
3
110
3
30
3
4
110
2
20
2
5
110
6
20
2
5
111
2
80
5
2
111
4
70
4
3
111
3
50
3
4
111
5
20
2
5

3、数据范围确定后,根据要求取出均值

select shop_id, cast(avg(sale) as decimal(10, 2)) avg_sale
from (select *,
             dense_rank() over (partition by shop_id order by sale)      rn,
             dense_rank() over (partition by shop_id order by sale desc) rn_
      from temp_shop_info) u
where rn > 1
  and rn_ > 1
group by shop_id;
shop_id
avg_sale
111
55
110
36

posted @ 2022-03-08 16:26  晓枫的春天  阅读(226)  评论(0编辑  收藏  举报