Calling a Python script from Hive

The Python script is as follows:

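#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys

# Users to keep; any userid not listed here is dropped.
d_user = {
    "user1": "true",
    "user2": "true"
}

# Hive streams each input row to stdin as one tab-separated line.
for line in sys.stdin:
    line = line.strip()
    userid = line.split('\t')[0]
    if d_user.get(userid, "false") == "true":
        # Emit two tab-separated columns: userid and a count of 1.
        # The parenthesized form runs under both Python 2 and Python 3.
        print("\t".join([userid, "1"]))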
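
Before the script is shipped to Hive, the filtering logic can be checked locally. The following is a minimal sketch, not part of the original script: the function name `transform`, the set `ALLOWED_USERS`, and the sample rows are hypothetical and chosen only for illustration. It wraps the same per-line logic in a generator and feeds it an in-memory stand-in for the rows Hive would write to stdin.

```python
import io

# Hypothetical whitelist mirroring the keys of d_user in the script.
ALLOWED_USERS = {"user1", "user2"}


def transform(lines, allowed=ALLOWED_USERS):
    """Yield 'userid<TAB>1' for each row whose first column is an allowed user."""
    for line in lines:
        userid = line.strip().split("\t")[0]
        if userid in allowed:
            yield "\t".join([userid, "1"])


# Simulate the tab-separated rows that Hive would stream to the script.
fake_stdin = io.StringIO(u"user1\nuser3\nuser2\nuser1\n")
output = list(transform(fake_stdin))
for row in output:
    print(row)
```

Only the rows for user1 and user2 should be emitted, one output line per matching input row; user3 is dropped.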

Register the script with the Hive session using the add file command:

$ hive
hive> add file /home/user/test.py;

The HQL query is as follows:

-- test.py was registered with add file, so USING refers to it by base name
select userid, sum(1)
from (
    select
        TRANSFORM (user_pin)
        USING 'python test.py'
        AS userid, cnt
    from hive_table
    where dt = "2021-03-01"
) a
group by userid;
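
To make the result of the query concrete, here is a small pure-Python sketch of the same computation: filter the `user_pin` values through the whitelist (the TRANSFORM step), then count rows per user (the `group by` with `sum(1)`). The function name `simulate_query` and the sample data are assumptions for illustration only; this does not talk to Hive.

```python
from collections import Counter


def simulate_query(user_pins, allowed):
    """Mimic the query: TRANSFORM filters rows, then group by userid with sum(1)."""
    # TRANSFORM step: the script emits (userid, 1) only for allowed users.
    transformed = [(pin, 1) for pin in user_pins if pin in allowed]
    # Outer query: one result row per userid, counting its transformed rows.
    counts = Counter()
    for userid, _cnt in transformed:
        counts[userid] += 1
    return dict(counts)


# Hypothetical contents of hive_table.user_pin for one dt partition.
sample_pins = ["user1", "user3", "user1", "user2"]
result = simulate_query(sample_pins, {"user1", "user2"})
print(result)
```

Note that the outer query counts rows with `sum(1)` and does not read the `cnt` column; since the script always emits 1 for `cnt`, summing that column instead would give the same totals.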

 

posted on 2021-04-19 17:33 by cfox