Calling a Python script from Hive
The Python script is as follows:
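#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys

# Users whose rows should be passed through to Hive.
d_user = {
    "user1": "true",
    "user2": "true"
}

# Hive streams each input row to stdin as one tab-separated line.
for line in sys.stdin:
    line = line.strip()
    userid = line.split('\t')[0]
    if d_user.get(userid, "false") == "true":
        # Emit two tab-separated columns: userid and a count of 1.
        print("\t".join([userid, "1"]))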
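Before registering the script with Hive, it can help to check the filtering logic locally. The sketch below is not part of the original script: it wraps the same logic in a hypothetical function `transform` and feeds it an in-memory stream in place of `sys.stdin`. The name `ALLOWED_USERS` and the sample rows are assumptions made for illustration.

```python
import io

# Hypothetical allow-list mirroring d_user in the script above.
ALLOWED_USERS = {"user1", "user2"}


def transform(stream, allowed=ALLOWED_USERS):
    """Yield 'userid<TAB>1' for each row whose first column is in `allowed`."""
    for line in stream:
        userid = line.strip().split("\t")[0]
        if userid in allowed:
            yield "\t".join([userid, "1"])


if __name__ == "__main__":
    # Made-up sample rows standing in for what Hive would stream to stdin.
    sample = io.StringIO("user1\tx\nuser3\ty\nuser2\tz\n")
    for row in transform(sample):
        print(row)
```

Rows for users outside the allow-list (here `user3`) are dropped, which is the behavior the Hive query relies on.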
Add the file in Hive with the following command:
$ hive
hive> add file /home/user/test.py;
The HQL query is as follows:
select userid, sum(1)
from (
    select TRANSFORM (user_pin)
    USING '/home/user/test.py'
    AS userid, cnt
    from hive_table
    where dt = "2021-03-01"
) a
group by userid;
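To make clear what the outer query computes, the following sketch reproduces the `group by userid` / `sum(1)` step in plain Python over rows shaped like the script's output. It is only an illustration of the aggregation, not how Hive executes it; the function name `count_per_user` and the sample rows are assumptions.

```python
from collections import Counter


def count_per_user(transformed_rows):
    """Count rows per userid, mirroring `sum(1) ... group by userid`."""
    counts = Counter()
    for row in transformed_rows:
        userid, _cnt = row.split("\t")
        counts[userid] += 1  # sum(1) adds one per row and ignores the cnt column
    return dict(counts)


if __name__ == "__main__":
    # Made-up rows in the 'userid<TAB>cnt' shape emitted by the script.
    rows = ["user1\t1", "user2\t1", "user1\t1"]
    print(count_per_user(rows))
```

Because the query uses `sum(1)` rather than `sum(cnt)`, the value in the second column does not affect the result.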