MapReduce Programming

MapReduce

Execution flow: input => split => map => shuffle => reduce => output

 data file => split 1 => map 1 \
                                 => shuffle (group and sort key-value pairs by key) => reduce (process the grouped pairs) => result
 data file => split 2 => map 2 /

Example: how word count works

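A minimal walkthrough, assuming the input file contains the two lines "hello world" and "hello hadoop":

 input:   hello world / hello hadoop
 map:     (hello,1) (world,1) / (hello,1) (hadoop,1)
 shuffle: (hadoop,[1]) (hello,[1,1]) (world,[1])    <- grouped and sorted by key
 reduce:  (hadoop,1) (hello,2) (world,1)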

1. By default, the number of splits equals the number of data blocks (a way to adjust this is sketched after this list).

2. Each split is processed by one Map task.

3. Both Map and Reduce read and write their data as key-value pairs.

4. The Shuffle stage groups and sorts the key-value data by key.
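If the default "one split per block" behavior needs tuning, FileInputFormat exposes static helpers for bounding the split size. A minimal sketch for the driver below, with illustrative sizes rather than recommendations:

 // In the driver, after creating the Job and before submitting it:
 // cap splits at 64 MB and avoid splits smaller than 32 MB (illustrative values)
 FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
 FileInputFormat.setMinInputSplitSize(job, 32L * 1024 * 1024);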

WordCount

 package demo;

 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 public class WordCount {

     public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
         // Instantiate Configuration to pick up the cluster configuration
         Configuration conf = new Configuration();
         // Instantiate the Job to be submitted to the cluster
         Job job = Job.getInstance(conf);
         job.setJarByClass(WordCount.class);
         job.setMapperClass(MyMapper.class);
         job.setReducerClass(MyReducer.class);

         // Set the output key/value types for Map and Reduce
         job.setMapOutputKeyClass(Text.class);
         job.setMapOutputValueClass(IntWritable.class);
         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(IntWritable.class);

         // Input path
         FileInputFormat.addInputPath(job, new Path(args[0]));
         // Output path (must not already exist)
         FileOutputFormat.setOutputPath(job, new Path(args[1]));
         // Exit with 0 on success, 1 on failure
         System.exit(job.waitForCompletion(true) ? 0 : 1);
     }
 }
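Because summing counts is associative and commutative, the reducer here can optionally double as a combiner, pre-aggregating map output locally before the shuffle to cut network traffic. One optional line in main(), using the standard Job API:

 // Optional: pre-aggregate (word, 1) pairs on the map side before the shuffle
 job.setCombinerClass(MyReducer.class);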

MyMapper

 package demo;

 import java.io.IOException;

 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Mapper;

 public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
     // Reused output objects, to avoid allocating new ones on every call
     Text word = new Text();
     IntWritable one = new IntWritable(1);

     @Override
     protected void map(LongWritable key, Text value, Context context)
             throws IOException, InterruptedException {
         // The input key is the byte offset of the line; the value is the line text
         String[] vals = value.toString().split(" ");
         // Emit (word, 1) for every word on the line
         for (String val : vals) {
             word.set(val);
             context.write(word, one);
         }
     }
 }
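One caveat: split(" ") only separates on single spaces, so tabs or runs of spaces yield empty tokens that get counted as words. A slightly more robust alternative (not what the code above does) splits on any run of whitespace:

 // Alternative: split on any run of whitespace, dropping leading/trailing blanks
 String[] vals = value.toString().trim().split("\\s+");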

MyReducer

 package demo;

 import java.io.IOException;

 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Reducer;

 public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
     IntWritable counts = new IntWritable();

     @Override
     protected void reduce(Text word, Iterable<IntWritable> ones, Context context)
             throws IOException, InterruptedException {
         // Sum all the 1s emitted for this word by the mappers
         int sum = 0;
         for (IntWritable one : ones) {
             sum += one.get();
         }
         counts.set(sum);
         // Emit (word, total count)
         context.write(word, counts);
     }
 }

Running a job

 hadoop jar /usr/local/hadoop-2.6.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /usr/root/a.txt /usr/root/wordcount01

Anatomy of the command: hadoop jar <local jar file> <program name (wordcount)> <input file to process> <output directory for the results>
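The command above runs the WordCount bundled with the Hadoop examples jar. To run the demo.WordCount class built in this post instead, package it into a jar and pass the class name explicitly; assuming the jar was exported as /root/wordcount.jar (the jar path and output directory are illustrative):

 hadoop jar /root/wordcount.jar demo.WordCount /usr/root/a.txt /usr/root/wordcount02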
