HIVE--udf函数案例总结

hive除了包含很多内置函数外,也提供了自定义函数功能。一般有两种方法实现:自定义内置函数 UDF(Java)和Transform关键字(Python)

 

案例一:使用 transform+python 的方式去转换 unixtime 为 weekday

注意: 使用TRANSFORM需要将所有的候选变量全部做处理,hive中不能接受select a, transform(b)的形式; 只能是select transform(a, b) using '***.py' as (new_a, new_b);

movie rate timeinfo userid
1193 5 978300760 1
1195 8 978301900 2

1.1建表,load数据

create table movie_v1 (movie int , rate int , timeinfo int , userid int ) row format delimited fields terminated by '|';

load data local inpath '/home/qianwb_yzx/datas/moviee.txt' into table qianwb_yzx.movie_v1;

 

1.2编辑python脚本文件

# -*- coding: utf-8 -*-
#!/usr/bin/env python3
import  sys
import  datetime
for line in sys.stdin:
  #获取每一行的内容并去掉头尾的空格
  line = line.strip()
  #根据换行符截取对应的字段
  movie,rate,timeinfo,userid = line.split('\t')
  #将timeinfo转成时间格式
  weekday = datetime.datetime.fromtimestamp(float(timeinfo)).isoweekday()
  #重新拼接新的内容加入换行符
  print ('\t'.join([str(movie),str(rate),str(weekday),str(userid)]))
 

1.3添加py文件

add file /home/qianwb_yzx/datas/move.py;

 

1.4使用udf建表

create table qianwb_yzx.movie_v2 as 

select transform(*)

using "/opt/anaconda3/bin/Python move.py"

as (movie,rate,weekday,userid)

from qianwb_yzx.movie_v1;

 

案例二:返回number的最大概率

2.1get_career表

create table get_career (number string , prob_0 string , prob_1 string , prob_2 string ) row format delimited fields terminated by '\t';

load data local inpath '/home/qianwb_yzx/datas/get_career.txt' into table qianwb_yzx.get_career;

select * from qianwb_yzx.get_career limit5;

number prob_0 prob_1 prob_2
1
0.000822
0.099589
0.000848
2
0.109882
0.055241
0.016469

2.2添加py文件

# -*- coding: utf-8 -*-
#!/usr/bin/env python3
import os
import sys
import numpy as np
 
for line in sys.stdin:
ls = line.strip().split('\t')
number = ls[0].strip()
score_list = [float(x) for x in ls[1:]]
if max(score_list)>0.01:
career=np.argmax(score_list)
print("\t".join([str(number),str(career)]))

 

add file /home/qianwb_yzx/datas/get_career.py;

2.3使用udf建表(185789条数据)

create table aa as

select transform(*) 

using "/opt/anaconda3/bin/Python get_career.py"

as (number,career)

from qianwb_yzx.get_career;

 

posted on 2023-03-21 11:48  比较是快乐的小偷·  阅读(196)  评论(0)    收藏  举报