Technorati Tags: Hadoop, MapReduce

I. Preface

A project required managing Hadoop MapReduce jobs remotely from a Java web application under a B/S architecture, which means the MapReduce jobs have to be submitted remotely. The traditional approaches of packaging a jar or using the Eclipse plugin could not meet this requirement, so the submission was implemented in Java. This article records the problems I ran into while developing this in Eclipse, and how they were solved.

II. Runtime Environment

Windows 7 64-bit

MyEclipse-blue-2014

CDH, fully distributed cluster (Hadoop 2.5.x)

III. Test Case: WordCount.java

package hadoop.job.submit.tool.test;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    public static class Map extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        private Text word = new Text();
        private final IntWritable one = new IntWritable(1);

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] tokens = value.toString().split(" ");
            for (String str : tokens) {
                word.set(str);
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            // The sum must be a local variable reset for each key; as an
            // instance field it would keep accumulating across keys and
            // produce wrong counts.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}

IV. Driver Class

package hadoop.job.submit.tool.service;

import hadoop.job.submit.tool.test.WordCount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SubmitService extends Configured implements Tool
{

    public int run(String[] args0) throws Exception
    {

        String in = "your input path on hdfs";//e.g "hdfs://192.168.0.100:8020/mapreduce/in"
        String out = "your output path on hdfs";

        Configuration conf = new Configuration();
        /*
         * To submit an MR job remotely, the relevant Hadoop jars must be on the
         * classpath, and, most importantly, the job's runtime configuration has
         * to be set explicitly:
         *   - the file system: "fs.defaultFS"
         *   - the JobTracker IP and port
         */
        conf.set("dfs.permissions.enabled", "false");
        conf.set("mapred.job.tracker", "JobTracker IP and port");
        conf.set("fs.defaultFS", "hdfs path"); // e.g. "hdfs://192.168.0.100:8020"
        conf.set("hadoop.job.ugi", "hdfs");

        Job job = Job.getInstance(conf);
        job.setJarByClass(SubmitService.class);
        job.setMapperClass(WordCount.Map.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(1);

        job.setReducerClass(WordCount.Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));

        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception
    {

        int ret = ToolRunner.run(new SubmitService(), args);
        System.exit(ret);
    }
}
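Since the goal is to trigger the submission from a Java web application rather than from the command line, the driver above can also be invoked programmatically. Below is a minimal sketch, assuming a standard Servlet container; the JobSubmitServlet class and its package are hypothetical names, and any controller method that can call ToolRunner.run would work the same way.

package hadoop.job.submit.tool.web;

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.util.ToolRunner;

import hadoop.job.submit.tool.service.SubmitService;

public class JobSubmitServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            // Reuse the Tool-based driver; an empty argument array is passed
            // because all settings are hard-coded in SubmitService.run().
            int ret = ToolRunner.run(new SubmitService(), new String[0]);
            resp.getWriter().println(ret == 0 ? "job succeeded" : "job failed");
        } catch (Exception e) {
            throw new ServletException(e);
        }
    }
}

Note that waitForCompletion(true) blocks until the job finishes, so in a real web application the submission would normally be handed off to a background thread or job queue rather than done inside the request.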

V. Error Analysis and Solutions

1. Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V

 


Solution

This is caused by a hadoop.dll version mismatch: the DLL needed before Hadoop 2.4 differs from the one needed afterwards. Pick the version matching your Hadoop release and use it to replace the copies in Hadoop/bin and in C:\windows\system32.

Hadoop 2.6 plugin package for 64-bit Windows (hadoop.dll, winutils.exe)
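Besides replacing the DLL, the Hadoop client on Windows also needs to locate winutils.exe at runtime. A minimal sketch, assuming the plugin package above was unpacked to D:\hadoop (a hypothetical path) with winutils.exe in its bin subdirectory, is to point hadoop.home.dir there before any Hadoop class is used:

// Assumption: D:\hadoop\bin contains winutils.exe from the plugin package above.
// Setting this system property (or the HADOOP_HOME environment variable) lets
// org.apache.hadoop.util.Shell find winutils.exe on the Windows client.
System.setProperty("hadoop.home.dir", "D:\\hadoop");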

2. HDFS file read/write permissions

Solution

Grant write access to the job's output directory on HDFS (outputDir stands for your actual path):

hdfs dfs -chmod -R 777 outputDir
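If opening up permissions on the cluster is not an option, an alternative (assuming simple authentication, i.e. no Kerberos) is to have the client identify itself as a user that already has write access, such as the hdfs user used in the driver above, before any Job or FileSystem object is created:

// Assumption: the cluster uses simple authentication (no Kerberos).
// UserGroupInformation reads HADOOP_USER_NAME from the environment or from
// this system property, so the job is submitted as that user.
System.setProperty("HADOOP_USER_NAME", "hdfs");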