# Hive 窗口分析函数

### 1.窗口函数

#### 4.LAST_VALUE(col,false) 用于统计窗口内截止到当前行，最后一个出现的值

SELECT vid    ,LAST_VALUE(a,true) OVER(partition by vid order by time ) a    ,LAST_VALUE(b,true) OVER(partition by vid order by time ) b    ,LAST_VALUE(c,true) OVER(partition by vid order by time ) c    ,LAST_VALUE(d,true) OVER(partition by vid order by time ) d

...

#### 5.over子句中的partition by和order by

over(partition by col1[,col2...] )

over(order by by col1[,col2...] )

over(partition by col1[,col2...]  order by by col1[,col2...] )

#### 6.over子句中的window子句(clause)

OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support the following formats:

(ROWS | RANGE) BETWEEN ... PRECEDING AND ... FOLLOWING
(ROWS | RANGE) BETWEEN ... PRECEDING AND ... PRECEDING
(ROWS | RANGE) BETWEEN ... FOLLOWING AND ... FOLLOWING

When ORDER BY is specified with missing WINDOW clause, the WINDOW specification defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

When both ORDER BY and WINDOW clauses are missing, the WINDOW specification defaults to ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

range是逻辑窗口，是指定当前行对应值的范围取值

rows是物理窗口，即根据order by 子句排序后，取的前N行及后N行的数据计算（与当前行的值无关，只与排序后的行号相关)

(UNBOUNDED | [num]) PRECEDING：从分区第一行头开始，则为 unbounded。 N为：相对当前行向前的偏移量
(UNBOUNDED | [num]) FOLLOWING：到该分区结束，则为 unbounded。N为：相对当前行向后的偏移量
CURRENT ROW：当前行，偏移量为0

### 2.分析函数

• COUNT
• SUM
• MIN
• MAX
• AVG

#### 4.ROW_NUMBER

5.CUME_DIST(累积分布)

小于等于当前值的行数 与 分组内总行数的占比

RANK

DENSE_RANK

ROW_NUMBER

2,2,5 1,2,3

CUME_DIST

PERCENT_RANK

NTILE(n)

### 4.附录

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

posted @ 2019-09-28 09:34  混沌战神阿瑞斯  阅读(374)  评论(0编辑  收藏