Building a Hadoop Project with Maven

1. Install Maven

Installing Maven for Eclipse 3.6.1 on Linux

2. Dependencies from the Official Repository

  We can look up the POM snippet for each dependency we need directly on the official site and add it to the project.

  Official site: http://mvnrepository.com/

3. Hadoop Dependencies

  Which Hadoop jars do we need?

  For a simple project, the following are probably enough:

hadoop-common
hadoop-hdfs
hadoop-mapreduce-client-core
hadoop-mapreduce-client-jobclient
hadoop-mapreduce-client-common

4. Configuration

  Open the project's pom.xml. Look up each of the packages above on the official site and pick a matching version; here I use 2.5.2.

  Modify pom.xml as follows:

<dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.5.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.5.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.5.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>2.5.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>2.5.2</version>
        </dependency>
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.7</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
    </dependencies>
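
  To double-check what Maven actually resolved (the jdk.tools entry above is a common workaround for Eclipse builds that cannot find tools.jar), you can print the project's dependency tree; this is a standard Maven command, run from the project root:

mvn dependency:tree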

5. Build Complete

  Click save, and you will see Maven start pulling in the required dependencies for us.

  Wait for the build to finish.

6. Create the WordCountEx Class

  Create a new WordCountEx class under src/main/java:

package firstExample;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountEx {
	static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
		private final static IntWritable one = new IntWritable(1);

		private Text word = new Text();

		@Override
		protected void map(Object key, Text value, Context context)
				throws IOException, InterruptedException {

			// Split the line into whitespace-separated tokens
			StringTokenizer itr = new StringTokenizer(value.toString());
			while (itr.hasMoreTokens()) {
				// Skip words shorter than 5 characters
				String tmp = itr.nextToken();
				if (tmp.length() < 5) {
					continue;
				}
				word.set(tmp);
				context.write(word, one);
			}
		}
	}

	static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
		private IntWritable result = new IntWritable();
		private Text keyEx = new Text();

		@Override
		protected void reduce(Text key, Iterable<IntWritable> values, Context context)
				throws IOException, InterruptedException {

			int sum = 0;
			for (IntWritable val : values) {
				// Deliberately double each count to show the sum can be customized
				sum += val.get() * 2;
			}

			result.set(sum);
			// Customize the output key with a prefix
			keyEx.set("output: " + key.toString());
			context.write(keyEx, result);
		}
	}
	
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		// Job configuration
		Configuration conf = new Configuration();

		// Create the job and name it
		Job job = Job.getInstance(conf, "mywordcount");

		job.setJarByClass(WordCountEx.class);
		job.setMapperClass(MyMapper.class);
		job.setReducerClass(MyReduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// Input and output paths from the command line
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		// Wait for completion and exit with a proper status code
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
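
  To see what the map-side length filter does in isolation, here is a tiny standalone sketch of the same StringTokenizer loop (plain Java, no Hadoop required; the sample sentence is made up for illustration):

import java.util.StringTokenizer;

public class FilterDemo {
	public static void main(String[] args) {
		// Same logic as MyMapper: keep only tokens with 5 or more characters
		StringTokenizer itr = new StringTokenizer("a quick brown foxes jumped over lazy hounds");
		while (itr.hasMoreTokens()) {
			String tmp = itr.nextToken();
			if (tmp.length() < 5) {
				continue; // skips "a", "over", "lazy"
			}
			System.out.println(tmp); // prints quick, brown, foxes, jumped, hounds
		}
	}
}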


7. Export the Jar

  Right-click the project and choose Export to package it as a jar; the steps below assume it is exported as first.jar.
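
  Alternatively, since this is already a Maven project, the jar can be built from the command line instead of the Eclipse wizard; a minimal sketch (the jar is written to target/, named after your artifactId and version):

mvn clean package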

8. Run

  Place the exported jar at C:\Users\hadoop\Desktop\, then upload it to /home/hadoop/workspace/ on the Linux machine.

     Upload the input file words_01.txt: hadoop fs -put /home/hadoop/workspace/words_01.txt /user/hadoop
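
  To verify the upload before running the job, list the target HDFS directory:

hadoop fs -ls /user/hadoop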

  Run the job; it completes without trouble:

 hadoop jar /home/hadoop/workspace/first.jar firstExample.WordCountEx /user/hadoop/words_01.txt /user/hadoop/out

The result:
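  Assuming a single reducer and the default output format, the counts end up in a part file under the output directory (part-r-00000 is the conventional name; hadoop fs -ls /user/hadoop/out will show the exact files):

hadoop fs -cat /user/hadoop/out/part-r-00000

  Each line should show the prefixed key from the reducer ("output: <word>") followed by a tab and the doubled count.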

Sample Download

 GitHub: https://github.com/sinodzh/HadoopExample/tree/master/2015/first
