HIVE--udf函数案例总结
hive除了包含很多内置函数外,也提供了自定义函数功能。一般有两种方法实现:自定义内置函数 UDF(Java)和Transform关键字(Python)
案例一:使用 transform+python 的方式去转换 unixtime 为 weekday
注意: 使用TRANSFORM需要将所有的候选变量全部做处理,hive中不能接受select a, transform(b)的形式; 只能是select transform(a, b) using '***.py' as (new_a, new_b);
| movie | rate | timeinfo | userid |
| 1193 | 5 | 978300760 | 1 |
| 1195 | 8 | 978301900 | 2 |
1.1建表,load数据
create table movie_v1 (movie int , rate int , timeinfo int , userid int ) row format delimited fields terminated by '|';
load data local inpath '/home/qianwb_yzx/datas/moviee.txt' into table qianwb_yzx.movie_v1;
1.2编辑python脚本文件
1.3添加py文件
add file /home/qianwb_yzx/datas/move.py;
1.4使用udf建表
create table qianwb_yzx.movie_v2 as
select transform(*)
using "/opt/anaconda3/bin/Python move.py"
as (movie,rate,weekday,userid)
from qianwb_yzx.movie_v1;
案例二:返回number的最大概率
2.1get_career表
create table get_career (number string , prob_0 string , prob_1 string , prob_2 string ) row format delimited fields terminated by '\t';
load data local inpath '/home/qianwb_yzx/datas/get_career.txt' into table qianwb_yzx.get_career;
select * from qianwb_yzx.get_career limit5;
| number | prob_0 | prob_1 | prob_2 |
| 1 |
0.000822
|
0.099589
|
0.000848
|
| 2 |
0.109882
|
0.055241
|
0.016469
|
2.2添加py文件
add file /home/qianwb_yzx/datas/get_career.py;
2.3使用udf建表(185789条数据)
create table aa as
select transform(*)
using "/opt/anaconda3/bin/Python get_career.py"
as (number,career)
from qianwb_yzx.get_career;
浙公网安备 33010602011771号