Hive Optimization

Hive optimization: MR stage optimization – adjusting the number of tasks

Hive optimization: MR stage optimization – Reduce stage

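Set mapreduce.job.reduces to fix the number of reducers directly
Parameters that determine num_reduce_tasks when it is not set directly
• hive.exec.reducers.max, default: 999 (1009 since Hive 0.14.0)
• hive.exec.reducers.bytes.per.reducer, default: 1 GB (256 MB since Hive 0.14.0)
Estimation formula
• numRTasks = min(maxReducers, input.size / perReducer)
• maxReducers = ${hive.exec.reducers.max}
• perReducer = ${hive.exec.reducers.bytes.per.reducer}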
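
As a rough illustration of the formula above, here is a minimal session sketch. The 10 GB input size and the reducer count of 20 are assumed values chosen for the arithmetic, not figures from the original notes.

```sql
-- Assumed input size for this example: 10,000,000,000 bytes (10 GB).
SET hive.exec.reducers.bytes.per.reducer=1000000000;  -- 1 GB of input per reducer
SET hive.exec.reducers.max=999;                       -- upper bound on the estimate
-- Estimated reducers: min(999, 10,000,000,000 / 1,000,000,000) = 10

-- Alternatively, skip the estimate and fix the reducer count for this session.
-- The value 20 is illustrative only.
SET mapreduce.job.reduces=20;
```

Lowering hive.exec.reducers.bytes.per.reducer raises the estimated reducer count for the same input, which is usually a gentler knob than hard-coding mapreduce.job.reduces.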

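Hive optimization: overall optimization – compression

Raw logs: BZ2 compression

MR intermediate output: LZO compression

Intermediate tables: SEQUENCEFILE or ORCFile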
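
The compression choices above could be configured roughly as follows. This is a sketch: it assumes the hadoop-lzo library is installed on the cluster (the codec class comes from that library, not from Hadoop itself), and the table name tmp_iplog_daily and its columns are hypothetical.

```sql
-- Compress the data passed between the stages of a multi-stage query.
SET hive.exec.compress.intermediate=true;

-- Compress map output with LZO (assumes hadoop-lzo is available on every node).
SET mapreduce.map.output.compress=true;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;

-- Hypothetical intermediate table stored as ORC.
CREATE TABLE tmp_iplog_daily (
  id BIGINT,
  log_date STRING
)
STORED AS ORC;
```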

Hive optimization: SQL job optimization – parallel SQL execution

hive.exec.parallel=true (default: false)

hive.exec.parallel.thread.number=8 (default: 8)
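
A minimal sketch of when these two settings help: parallel execution only matters for queries whose stages do not depend on each other. The query below reuses the acorn_3g.iplog table from these notes, and the column aliases are made up for illustration.

```sql
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=8;

-- The two branches of the UNION ALL are independent of each other,
-- so Hive may run their stages at the same time.
SELECT period, cnt
FROM (
  SELECT 'dec' AS period, count(1) AS cnt
  FROM acorn_3g.iplog
  WHERE log_date LIKE '2013-12%'
  UNION ALL
  SELECT 'nov' AS period, count(1) AS cnt
  FROM acorn_3g.iplog
  WHERE log_date LIKE '2013-11%'
) t;
```

Running stages in parallel uses more cluster resources at once, so the gain depends on how much spare capacity the cluster has.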

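Hive optimization: overall optimization – table partitioning

Choose partitions based on query dimensions and business requirements, e.g. partition by date or by type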
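
A date-and-type partitioning scheme might look like the sketch below. The table name iplog_partitioned, its columns, and the partition values are hypothetical.

```sql
-- Hypothetical table partitioned by date and by type.
CREATE TABLE iplog_partitioned (
  id BIGINT,
  ip STRING
)
PARTITIONED BY (log_date STRING, log_type STRING)
STORED AS ORC;

-- Filtering on the partition columns lets Hive read only the matching
-- partition directories instead of scanning the whole table.
SELECT count(1)
FROM iplog_partitioned
WHERE log_date = '2013-12-01'
  AND log_type = 'web';
```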

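Hive optimization: data skew – count distinct

SELECT count(DISTINCT id) FROM acorn_3g.iplog WHERE log_date LIKE '2013-12%';

Elapsed: 1600 s

SELECT count(1) FROM (SELECT DISTINCT id FROM acorn_3g.iplog WHERE log_date LIKE '2013-12%' AND id > 0) tmp;

Elapsed: 260 s

The rewrite deduplicates ids in a subquery that can be spread over many reducers, instead of sending every id to the single reducer that a global count(DISTINCT ...) uses. Note that it also filters on id > 0, so the two results match only if the data contains no zero or negative ids.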

posted @ 2017-06-15 16:28  Super_Orco