Running Python Scripts with Hadoop Streaming

If the job fails with the following error:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2

inserting #!/usr/bin/env python as the first line of the Python script fixes it.
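A likely explanation (an assumption, not stated in the original post) is that without a shebang line the task node does not run the script with the Python interpreter, so the subprocess exits with a nonzero code. The sketch below is a simple check to run before submitting a job. The helper name `has_python_shebang` is hypothetical and is not part of Hadoop.

```python
import os
import tempfile


def has_python_shebang(path):
    """Return True if the file's first line is a shebang that mentions python."""
    with open(path, "rb") as f:
        first_line = f.readline()
    return first_line.startswith(b"#!") and b"python" in first_line


if __name__ == "__main__":
    # Demonstrate on a throwaway script that lacks the shebang line.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write("import sys\n")
    print(has_python_shebang(tmp.name))
    os.remove(tmp.name)
```

Running the same check on mapper.py and reducer.py before calling hadoop catches the missing line locally, instead of after the job has been scheduled.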

mapper.py:

#!/usr/bin/env python
import sys

# Count each word read from stdin, then emit one "word<TAB>count" line per word.
dic = {}
for line in sys.stdin:
    line = line.strip().split()
    for key in line:
        if key in dic:
            dic[key] += 1
        else:
            dic[key] = 1
for key, value in dic.items():
    print("%s\t%d" % (key, value))

reducer.py:

#!/usr/bin/env python
import sys

# Sum the counts emitted by the mappers for each word.
wordcount = {}
for line in sys.stdin:
    line = line.strip()
    word, count = line.split("\t", 1)
    count = int(count)
    wordcount[word] = wordcount.get(word, 0) + count
for word, count in wordcount.items():
    print("%s\t%d" % (word, count))
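
The two scripts can be sanity-checked locally before the job is submitted. Hadoop Streaming passes input lines to the mapper, sorts the mapper output by key, and passes the result to the reducer. The sketch below mimics that flow in a single process on small lists of strings. The function names `run_mapper`, `run_reducer`, and `simulate_streaming` are hypothetical, and this only illustrates the data flow; it is not how Hadoop itself runs the scripts.

```python
def run_mapper(lines):
    """Count words in one input split, like mapper.py; return tab-separated lines."""
    counts = {}
    for line in lines:
        for word in line.strip().split():
            counts[word] = counts.get(word, 0) + 1
    return ["%s\t%d" % (word, n) for word, n in counts.items()]


def run_reducer(lines):
    """Sum the counts per word, like reducer.py; return a dict of totals."""
    totals = {}
    for line in lines:
        word, count = line.strip().split("\t", 1)
        totals[word] = totals.get(word, 0) + int(count)
    return totals


def simulate_streaming(splits):
    """Run one mapper per split, sort the combined output, then reduce it."""
    mapped = []
    for split in splits:
        mapped.extend(run_mapper(split))
    mapped.sort()  # stands in for Hadoop's shuffle-and-sort step
    return run_reducer(mapped)


if __name__ == "__main__":
    # Two splits, as if the input were processed by two mapper tasks.
    splits = [["a b a", "c"], ["b a"]]
    for word, total in sorted(simulate_streaming(splits).items()):
        print("%s\t%d" % (word, total))
```

Because each mapper emits partial counts for its own split, the reducer has to add them up rather than count lines, which is why reducer.py converts the second field to an integer and accumulates it.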

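Hadoop command (here * stands for your own input and output paths, and the two scripts are saved as /home/map.py and /home/red.py):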

hadoop jar /hadoop/hadoop-streaming-1.1.2.jar \
    -input * -output * \
    -file /home/map.py \
    -mapper map.py \
    -file /home/red.py \
    -reducer red.py
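
The command above follows a fixed pattern: each script is shipped to the cluster with -file and then named, by its base file name, with -mapper or -reducer. A minimal sketch that assembles the same argument list in Python is shown below. The function name `build_streaming_command` and the example input and output paths are hypothetical, and the list is only built, not executed.

```python
import os


def build_streaming_command(jar, input_path, output_path, mapper, reducer):
    """Return the hadoop streaming command as an argument list."""
    return [
        "hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        # Ship each script with -file, then refer to it by its base name.
        "-file", mapper,
        "-mapper", os.path.basename(mapper),
        "-file", reducer,
        "-reducer", os.path.basename(reducer),
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/hadoop/hadoop-streaming-1.1.2.jar",
        "/example/input",   # hypothetical HDFS input path
        "/example/output",  # hypothetical HDFS output path
        "/home/map.py",
        "/home/red.py",
    )
    print(" ".join(cmd))
```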

Note: hadoop-streaming-1.1.2.jar is not in the Hadoop root directory; look for it under /hadoop/contrib/streaming.

posted on 2013-12-12 18:06  zach_Emrys