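I. Preface
One of our projects needs to manage Hadoop MapReduce jobs remotely from a Java web application in a B/S (browser/server) architecture, so the jobs have to be submitted remotely. The traditional jar-package approach and the plugin approach cannot meet this requirement, so the submission is implemented in Java. This post records the problems encountered while developing with Eclipse and how they were solved.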
II. Runtime environment
Windows 7 64-bit
MyEclipse Blue 2014
CDH, fully distributed (Hadoop 2.5.x)
III. Test case: WordCount.java
package hadoop.job.submit.tool.test;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final Text word = new Text();
        private final IntWritable one = new IntWritable(1);

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every space-separated token in the line.
            String[] tokens = value.toString().split(" ");
            for (String str : tokens) {
                if (str.isEmpty()) {
                    continue; // consecutive spaces produce empty tokens; skip them
                }
                word.set(str);
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // The sum must be local: as an instance field it would carry over
            // from one key to the next and inflate every later count.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
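To make the expected result easier to check before involving the cluster, the data flow of WordCount can be sketched in plain Java. This is only an illustration of the map, group and sum steps, not how Hadoop executes them; the class name `WordCountSketch` and its methods are hypothetical, and the sketch splits on any whitespace rather than on single spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the WordCount data flow; no Hadoop classes are involved.
class WordCountSketch {

    // "Map" step: split one input line on whitespace and keep the non-empty tokens.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // "Shuffle" and "reduce" steps: group equal words and add 1 per occurrence.
    // Each word has its own running total, so no count leaks into another key.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : tokenize(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

Running the same small input through the real job and comparing the output against this sketch is a quick way to notice a reducer whose sum is not reset between keys.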
IV. Driver class
package hadoop.job.submit.tool.service;

import hadoop.job.submit.tool.test.WordCount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SubmitService extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        String in = "your input path on hdfs"; // e.g. "hdfs://192.168.0.100:8020/mapreduce/in"
        String out = "your output path on hdfs";

        // Reuse the Configuration that ToolRunner prepared for this Tool.
        Configuration conf = getConf();

        /*
         * To submit an MR job remotely, add the Hadoop jars to the project.
         * Most importantly, set the configuration the job needs at run time:
         *   - the file system, "fs.defaultFS"
         *   - the JobTracker IP and port
         */
        // Permission checks are enforced by the NameNode, so this client-side
        // value may have no effect (see the HDFS permission fix in section V).
        conf.set("dfs.permissions.enabled", "false");
        conf.set("mapred.job.tracker", "JobTracker and port");
        conf.set("fs.defaultFS", "hdfs path");
        conf.set("hadoop.job.ugi", "hdfs");

        Job job = Job.getInstance(conf);
        job.setJarByClass(SubmitService.class);

        job.setMapperClass(WordCount.Map.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setNumReduceTasks(1);
        job.setReducerClass(WordCount.Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));

        // Block until the job finishes, printing progress to the console.
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new SubmitService(), args);
        System.exit(ret);
    }
}
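The two placeholder values that matter most in the driver, the file system URI and the JobTracker address, have a fixed shape that a small helper can build. The following is a minimal sketch in plain Java under stated assumptions: the class `RemoteJobSettings` is hypothetical, the NameNode and JobTracker are assumed to run on the same host, and port 8021 for the JobTracker is an assumption (the post only shows 8020 for HDFS in its example path).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the client-side settings as plain strings.
class RemoteJobSettings {

    // Returns the two addresses the driver sets, keyed by the property names it uses.
    static Map<String, String> build(String host, int nameNodePort, int jobTrackerPort) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("fs.defaultFS", "hdfs://" + host + ":" + nameNodePort);
        settings.put("mapred.job.tracker", host + ":" + jobTrackerPort);
        return settings;
    }

    public static void main(String[] args) {
        // 192.168.0.100:8020 is taken from the example path in the driver;
        // 8021 is an assumed JobTracker port, adjust it to your cluster.
        build("192.168.0.100", 8020, 8021)
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In the driver, each entry would then be applied with `conf.set(key, value)` before `Job.getInstance(conf)` is called, which keeps the cluster addresses in one place instead of scattered string literals.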
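V. Error analysis and solutions
1. Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V
Solution
This is caused by a hadoop.dll version mismatch: Hadoop releases before 2.4 and releases after it need different builds of the file. Choose the hadoop.dll that matches your Hadoop version and replace the copies in Hadoop/bin and C:\windows\system32 with it.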
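Because an outdated copy in one directory can shadow the correct one in another, it can help to list every hadoop.dll the JVM might pick up. The sketch below assumes the library is resolved through `java.library.path` (which, on Windows, normally includes the system directories and the PATH entries); the class `NativeLibraryLocator` and its method are hypothetical names for illustration.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds copies of a native library on a search path.
class NativeLibraryLocator {

    // Returns, in search order, every existing file named fileName found in the
    // directories of a path-separator-delimited search path.
    static List<Path> locate(String searchPath, String fileName) {
        List<Path> hits = new ArrayList<>();
        for (String entry : searchPath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(entry).resolve(fileName);
                if (Files.isRegularFile(candidate)) {
                    hits.add(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip malformed entries (for example, quoted PATH segments).
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String searchPath = System.getProperty("java.library.path", "");
        List<Path> hits = locate(searchPath, "hadoop.dll");
        if (hits.isEmpty()) {
            System.out.println("No hadoop.dll found on java.library.path");
        }
        for (Path hit : hits) {
            System.out.println("Found: " + hit);
        }
    }
}
```

If more than one copy is listed, the earlier entries are the likelier candidates for the one actually loaded, so those are the ones to replace first.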
2. HDFS file read/write permissions
Solution
hdfs dfs -chmod -R 777 outputDir
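The mode 777 in this command grants read, write and execute to the owner, the group and everyone else. As a small illustration of how such an octal mode maps to permission bits, here is a plain-Java sketch; the class `PermissionBits` is a hypothetical helper and is not part of Hadoop.

```java
// Hypothetical helper: decodes a three-digit octal mode into symbolic form.
class PermissionBits {

    // Converts a mode such as "777" into "rwxrwxrwx" (owner, group, others).
    static String toSymbolic(String octalMode) {
        if (!octalMode.matches("[0-7]{3}")) {
            throw new IllegalArgumentException("Expected three octal digits: " + octalMode);
        }
        StringBuilder symbolic = new StringBuilder();
        for (char digit : octalMode.toCharArray()) {
            int bits = digit - '0';
            symbolic.append((bits & 4) != 0 ? 'r' : '-'); // read
            symbolic.append((bits & 2) != 0 ? 'w' : '-'); // write
            symbolic.append((bits & 1) != 0 ? 'x' : '-'); // execute
        }
        return symbolic.toString();
    }

    public static void main(String[] args) {
        System.out.println("777 -> " + toSymbolic("777"));
        System.out.println("755 -> " + toSymbolic("755"));
    }
}
```

Whether a narrower mode than 777 would be enough depends on which user the remote job runs as, which this post does not specify.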
