Step 1: Put the Hadoop plugin into Eclipse's plugins folder
Restart Eclipse, then open Window -> Preferences -> Hadoop Map/Reduce and set the path of the extracted Hadoop distribution.
A DFS Locations entry for the distributed file system then appears in the project view.
Step 2: Create a connection to the Master node

Right-click inside the red box shown in the screenshot to create a new connection to the Master node.

Fill in the location name, IP, and port as shown, and you can then browse that node's data under DFS.
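If the DFS view cannot connect, it can help to sanity-check the NameNode address outside Eclipse. Below is a minimal sketch using the HDFS Java API; the URI hdfs://master:9000 is an assumption, substitute the host and port you just entered in the dialog:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPing {
    public static void main(String[] args) throws Exception {
        // hdfs://master:9000 is an assumed address; use the host/port from your Map/Reduce location
        FileSystem fs = FileSystem.get(URI.create("hdfs://master:9000"), new Configuration());
        // List the HDFS root directory to confirm the NameNode is reachable
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}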
Step 3: A Map/Reduce option appears when you create a new project
The project's file structure is as follows:

Step 4: Write the Mapper class
Right-click the package name, then New -> Mapper.
package org.znufe.cnwc;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CNWordMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object ikey, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace and emit (token, 1) for each token
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
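If you want to check the mapper without touching the cluster, one option (not part of this tutorial's setup) is Apache MRUnit; this assumes the mrunit jar and its JUnit dependency are on the classpath. A minimal sketch:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;

public class CNWordMapperTest {
    public static void main(String[] args) throws Exception {
        // Feed one line in and assert the expected (word, 1) pairs come out, in order
        MapDriver.<Object, Text, Text, IntWritable>newMapDriver(new CNWordMapper())
                .withInput(new LongWritable(0), new Text("Hello Hello Hadoop"))
                .withOutput(new Text("Hello"), new IntWritable(1))
                .withOutput(new Text("Hello"), new IntWritable(1))
                .withOutput(new Text("Hadoop"), new IntWritable(1))
                .runTest();
    }
}

runTest() throws if the actual output differs from the expected pairs.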
Step 5: Write the Reducer class
Right-click the package name, then New -> Reducer.
package org.znufe.cnwc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CNWordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts emitted for this word
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
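The reducer can be exercised the same way with MRUnit's ReduceDriver (same classpath assumption as above):

import java.util.Arrays;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;

public class CNWordReducerTest {
    public static void main(String[] args) throws Exception {
        // Two 1s for "Hello" should reduce to a single count of 2
        ReduceDriver.<Text, IntWritable, Text, IntWritable>newReduceDriver(new CNWordReducer())
                .withInput(new Text("Hello"), Arrays.asList(new IntWritable(1), new IntWritable(1)))
                .withOutput(new Text("Hello"), new IntWritable(2))
                .runTest();
    }
}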
Step 6: Write the Driver class
Right-click the package name, then New -> MapReduce Driver.
package org.znufe.cnwc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class CNWordMain {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        // Both an input path and an output path must be supplied
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "CN Word Count");
        job.setJarByClass(CNWordMain.class);
        // Wire up the mapper and reducer written above
        job.setMapperClass(CNWordMapper.class);
        job.setReducerClass(CNWordReducer.class);
        // Output types of the job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output are DIRECTORIES, not files
        FileInputFormat.setInputPaths(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
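One optional tweak, not part of the generated driver above: because this reduce function just sums, the same class can safely serve as a combiner, which pre-aggregates map output locally and cuts shuffle traffic. If you want it, add one line next to the other job.set* calls:

// Optional: safe here because integer addition is associative and commutative
job.setCombinerClass(CNWordReducer.class);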
Step 7: Package the project as a jar and upload it
Right-click the project name -> Export, untick the .classpath and .project entries, and fill in the output path and file name.
Upload the project's jar to the Hadoop home directory on the Master node (mine is /home/hadoop/hadoop-2.5.2).
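One way to get the jar from your workstation to the Master node is scp; the user name hadoop and host name master below are assumptions, adjust them to your cluster:

scp wordcount.jar hadoop@master:/home/hadoop/hadoop-2.5.2/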
Step 8: Create test documents on the Master node and run a word-count test
1. Create two files in the hadoop-2.5.2 folder
test.txt
Hello World!
Hello Hadoop!
test1.txt
Hello What Ghost!
4S is Super Stupid Suspension System.
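One caveat before uploading: /testtemp must already exist as a directory in HDFS, otherwise the first -copyFromLocal below creates a plain file named /testtemp and the second one fails. Create it with:

bin/hdfs dfs -mkdir /testtemp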
Then upload the two files to the file system with these commands:
bin/hdfs dfs -copyFromLocal /home/hadoop/hadoop-2.5.2/test.txt /testtemp
bin/hdfs dfs -copyFromLocal /home/hadoop/hadoop-2.5.2/test1.txt /testtemp
You can check whether the upload succeeded with bin/hdfs dfs -ls /testtemp, or simply by refreshing the file list in Eclipse.
2. Run the program (my packaged jar is wordcount.jar)
bin/hadoop jar wordcount.jar org.znufe.cnwc.CNWordMain /testtemp/ /outputwordcount_01
(Note: the second path, the output directory, must not already exist.)
If the job succeeds, the console prints the MapReduce progress followed by the job counters.


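To inspect the result from the command line, print the reducer's output file; with the single reducer used here, the file is named part-r-00000 by default:

bin/hdfs dfs -cat /outputwordcount_01/part-r-00000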
3. The word-count results can then also be viewed in Eclipse.

Some basic Hadoop commands:
hadoop fs -copyFromLocal <localsrc> <dst>   (copy a local file into HDFS)
hadoop fs -copyToLocal <src> <localdst>     (copy an HDFS file to the local filesystem)
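For example, to pull the job's result file back to a local path (the destination file name here is just an illustration):

hadoop fs -copyToLocal /outputwordcount_01/part-r-00000 /home/hadoop/wordcount_result.txt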