9.11

MapReduce is Hadoop's core programming model for processing large-scale data. This post implements a simple word-count job in MapReduce code.

How MapReduce works: Mapper and Reducer
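
In this model a Mapper turns each input record into key/value pairs, the framework groups those pairs by key, and a Reducer folds each group into one result. As a rough illustration only, the sketch below imitates those three steps for word counting in plain Java with in-memory collections. It does not use Hadoop at all, and the class `WordCountSketch` and its method names are hypothetical, chosen just for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local sketch: mimics the MapReduce phases in memory, no Hadoop involved.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (a TreeMap keeps keys sorted).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map, shuffle, and reduce in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

On a real cluster the grouping step is done by the framework between the map and reduce tasks, so a job only has to supply the map and reduce logic.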

Hadoop project structure

MapReduce program code

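import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word and writes (word, total).
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}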

posted @ 2024-12-26 13:21 kxzzow
