Interpreting a HIVE SQL Execution Plan

Background

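When SQL statements run on HIVE, it is often necessary to look at their execution plans to understand how they work internally. The author uses a query from an existing environment as a worked example: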

Example

hive> explain
    > select count(1) from (
    > select s_age
    > from student_tb_txt
    > group by s_age
    > ) b;
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: student_tb_txt
            Statistics: Num rows: 4798993 Data size: 3359295744 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: s_age (type: bigint)
              outputColumnNames: s_age
              Statistics: Num rows: 4798993 Data size: 3359295744 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                keys: s_age (type: bigint)
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 4798993 Data size: 3359295744 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: bigint)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: bigint)
                  Statistics: Num rows: 4798993 Data size: 3359295744 Basic stats: COMPLETE Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          keys: KEY._col0 (type: bigint)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 2399496 Data size: 1679647521 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            Statistics: Num rows: 2399496 Data size: 1679647521 Basic stats: COMPLETE Column stats: NONE
            Group By Operator
              aggregations: count()
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              sort order: 
              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.364 seconds, Fetched: 76 row(s)

Case Analysis
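
The data flow in the plan above can be sketched with a small simulation. This is only an illustrative model, not Hive code: the function names, the sample splits, and the choice of two reducers are all assumptions made for the example. It follows the operators in the plan. In Stage-1 the map side deduplicates s_age per split (Group By, mode: hash), the shuffle partitions and sorts by _col0, and each reducer merges the keys (mode: mergepartial) and emits a partial count(). In Stage-2 the partial counts are merged into the final result.

```python
from collections import defaultdict

def stage1_map(rows):
    # Map-side Group By Operator (mode: hash): one entry per s_age in this split.
    return set(rows)

def shuffle(map_outputs, num_reducers):
    # Reduce Output Operator: partition by _col0, then sort each partition (sort order: +).
    # Using key % num_reducers as the partitioner is an assumption for illustration.
    partitions = defaultdict(set)
    for keys in map_outputs:
        for key in keys:
            partitions[key % num_reducers].add(key)
    return [sorted(partitions[i]) for i in range(num_reducers)]

def stage1_reduce(sorted_keys):
    # Group By (mode: mergepartial) merges duplicate keys coming from different mappers;
    # the following Group By (mode: hash) with count() emits one partial count per reducer.
    return len(set(sorted_keys))

def stage2_reduce(partial_counts):
    # Stage-2 Group By (mode: mergepartial): count(VALUE._col0) merges partial counts by summing.
    return sum(partial_counts)

# Hypothetical input: three splits of the s_age column.
splits = [[18, 19, 18, 20], [19, 21, 21], [20, 22]]

map_outputs = [stage1_map(split) for split in splits]
partitions = shuffle(map_outputs, num_reducers=2)
partials = [stage1_reduce(p) for p in partitions]
total = stage2_reduce(partials)

print(partials)  # [3, 2]
print(total)     # 5
```

With this sample input there are five distinct ages, so the simulated result matches what `select count(1)` over the grouped subquery would return. Stage-0 in the plan is only the Fetch Operator that returns the final row to the client, so it is not modelled here.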

 

Conclusion

 

posted @ 2020-12-14 22:07  lenomail