1. Configure the Hadoop environment variables. This article uses Hadoop version 2.5.2 as an example.
Download and extract hadoop-2.5.2, then configure the environment variables as shown below (if the changes do not take effect, restart the machine).

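As a rough guide, a typical setup looks like the following (assuming Hadoop was extracted to D:\hadoop-2.5.2; adjust the path to wherever you actually unpacked it):

HADOOP_HOME = D:\hadoop-2.5.2
PATH = %PATH%;%HADOOP_HOME%\bin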
Place the winutils.exe file in Hadoop's bin directory.
Hadoop 2.x releases do not ship winutils.exe; without this file the following error is reported:

2. Install the Eclipse plugin
Early Hadoop 1.x releases shipped this plugin, but Hadoop 2 does not, so you have to download it from GitHub yourself. Here we use hadoop-eclipse-plugin-2.5.2.jar.
Copy hadoop-eclipse-plugin-2.5.2.jar into the dropins directory under the extracted Eclipse installation, then restart Eclipse:

3. Configure the Hadoop plugin
Set Hadoop installation directory to the root directory of your Hadoop installation.

Open the Hadoop connection view via Window--Show View--Other--MapReduce Tools, as shown in the figure below:

Configure the connection to Hadoop:
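For reference, a typical configuration for the cluster used later in this article would be roughly (the Map/Reduce Master port here is only an assumption and must match your cluster's configuration; the DFS Master values come from the HDFS URI used in the WordCount code below):

Location name:     hadoop (any label you like)
Map/Reduce Master: Host 192.168.107.167, Port 9001 (assumed)
DFS Master:        Host 192.168.107.167, Port 9000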


4. Check the connection to the server
If the files and directories on the Hadoop server are displayed, the connection has been established.
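If nothing shows up, you can cross-check from a shell on the server with the standard HDFS command line:

hadoop fs -ls /

If this lists the same directories that Eclipse should display, the problem is on the Eclipse/plugin side rather than on the cluster.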

5. Create a new MapReduce project

The Hadoop-related jar files are automatically added to the project.
6. Run the WordCount program
The WordCount program shipped with the Hadoop 2.5.2 source tree is located at:
hadoop-2.5.2-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples\WordCount.java
(the main method has been slightly modified)
package mapreduce;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: splits each input line into tokens and emits a (word, 1) pair per token.
    public static class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word and emits (word, total).
    public static class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job(Configuration) is deprecated in Hadoop 2.x (Job.getInstance(conf) is the
        // modern form), but it still works in 2.5.2.
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Read input from and write output to HDFS directly.
        FileInputFormat.addInputPath(job, new Path("hdfs://192.168.107.167:9000/input/test"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.107.167:9000/output/test"));

        job.waitForCompletion(true);
    }
}
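As a quick sanity check of what the job produces: TextOutputFormat writes one TAB-separated word/count pair per line. For a hypothetical input file containing "hello world hello", the part-r-00000 file in the output directory would contain:

hello   2
world   1

Note that MapReduce refuses to start if the output directory (/output/test above) already exists, so delete it before re-running the job.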
Running the above code on Hadoop from Eclipse reports the following error:

Copy the source file hadoop-2.5.2-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java
into the project as org.apache.hadoop.io.nativeio.NativeIO, go to line 570, and change it to simply return true, as shown in the figure below:

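For reference, the method being patched is the Windows access check in NativeIO; after the edit it looks roughly like this (a sketch of the common workaround; the exact line number and surrounding code may differ between releases):

    // Inside the copied org.apache.hadoop.io.nativeio.NativeIO, class Windows:
    public static boolean access(String path, AccessRight desiredAccess)
            throws IOException {
        // Original: return access0(path, desiredAccess.accessRight());
        // Workaround: skip the native Windows permission check entirely.
        return true;
    }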
After this modification, the program runs and produces the following result:


Postscript
Directories could not be edited in Eclipse, and running the MapReduce program reported the following error:

The cause is that permission checking is enabled on the file system; it can be disabled by adding the following configuration to hdfs-site.xml:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
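Note: in Hadoop 2.x this key has been renamed to dfs.permissions.enabled; the old dfs.permissions name still works through Hadoop's deprecated-property mapping. The NameNode must be restarted for the change to take effect. Disabling permission checks is fine for a development sandbox, but should not be done on a shared cluster.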