MapReduce Programming

MapReduce

Execution flow: input => split => map => shuffle => reduce => output

 data file => split 1 => map 1 \
                                 => shuffle (group and sort key-value pairs by key) => reduce (process the grouped pairs) => result
 data file => split 2 => map 2 /

Example: how word count works

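A minimal walkthrough, assuming the input file contains the two lines "hello world" and "hello hadoop":

 input:   hello world / hello hadoop
 map:     (hello,1) (world,1) / (hello,1) (hadoop,1)
 shuffle: (hadoop,[1]) (hello,[1,1]) (world,[1])    <- grouped and sorted by key
 reduce:  (hadoop,1) (hello,2) (world,1)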

1. By default, the number of splits equals the number of data blocks (a way to adjust this is sketched after this list).

2. Each split is processed by one Map task.

3. Both Map and Reduce read and write their data as key-value pairs.

4. The Shuffle stage groups and sorts the key-value data by key.
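If the default "one split per block" behavior needs tuning, FileInputFormat exposes static helpers for bounding the split size. A minimal sketch for the driver below, with illustrative sizes rather than recommendations:

 // In the driver, after creating the Job and before submitting it:
 // cap splits at 64 MB and avoid splits smaller than 32 MB (illustrative values)
 FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
 FileInputFormat.setMinInputSplitSize(job, 32L * 1024 * 1024);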

WordCount

 package demo;

 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 public class WordCount {

     public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
         // Instantiate Configuration to pick up the cluster configuration
         Configuration conf = new Configuration();
         // Instantiate the Job to be submitted to the cluster
         Job job = Job.getInstance(conf);
         job.setJarByClass(WordCount.class);
         job.setMapperClass(MyMapper.class);
         job.setReducerClass(MyReducer.class);

         // Set the output key/value types for Map and Reduce
         job.setMapOutputKeyClass(Text.class);
         job.setMapOutputValueClass(IntWritable.class);
         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(IntWritable.class);

         // Input path
         FileInputFormat.addInputPath(job, new Path(args[0]));
         // Output path (must not already exist)
         FileOutputFormat.setOutputPath(job, new Path(args[1]));
         // Exit with 0 on success, 1 on failure
         System.exit(job.waitForCompletion(true) ? 0 : 1);
     }
 }
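Because summing counts is associative and commutative, the reducer here can optionally double as a combiner, pre-aggregating map output locally before the shuffle to cut network traffic. One optional line in main(), using the standard Job API:

 // Optional: pre-aggregate (word, 1) pairs on the map side before the shuffle
 job.setCombinerClass(MyReducer.class);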

MyMapper

 package demo;

 import java.io.IOException;

 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Mapper;

 public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
     // Reused output objects, to avoid allocating new ones on every call
     Text word = new Text();
     IntWritable one = new IntWritable(1);

     @Override
     protected void map(LongWritable key, Text value, Context context)
             throws IOException, InterruptedException {
         // The input key is the byte offset of the line; the value is the line text
         String[] vals = value.toString().split(" ");
         // Emit (word, 1) for every word on the line
         for (String val : vals) {
             word.set(val);
             context.write(word, one);
         }
     }
 }
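One caveat: split(" ") only separates on single spaces, so tabs or runs of spaces yield empty tokens that get counted as words. A slightly more robust alternative (not what the code above does) splits on any run of whitespace:

 // Alternative: split on any run of whitespace, dropping leading/trailing blanks
 String[] vals = value.toString().trim().split("\\s+");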

MyReducer

 package demo;

 import java.io.IOException;

 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Reducer;

 public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
     IntWritable counts = new IntWritable();

     @Override
     protected void reduce(Text word, Iterable<IntWritable> ones, Context context)
             throws IOException, InterruptedException {
         // Sum all the 1s emitted for this word by the mappers
         int sum = 0;
         for (IntWritable one : ones) {
             sum += one.get();
         }
         counts.set(sum);
         // Emit (word, total count)
         context.write(word, counts);
     }
 }

Running a job

 hadoop jar /usr/local/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /usr/root/a.txt /usr/root/wordcount01

Anatomy of the command: hadoop jar <local jar file> <program name (wordcount)> <input file to process> <output directory for the results>
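The command above runs the WordCount bundled with the Hadoop examples jar. To run the demo.WordCount class built in this post instead, package it into a jar and pass the class name explicitly; assuming the jar was exported as /root/wordcount.jar (the jar path and output directory are illustrative):

 hadoop jar /root/wordcount.jar demo.WordCount /usr/root/a.txt /usr/root/wordcount02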
