2025/2/4
Scala integrates with Hadoop by letting you write MapReduce programs, enabling efficient processing of large data sets. This post shows how to write a simple MapReduce program in Scala that counts word occurrences.
The MapReduce program: write a Mapper and a Reducer.
Running the MapReduce job: package the Scala program and submit it to Hadoop.
Example code:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}

object WordCount {

  // Mapper: splits each input line into whitespace-separated tokens
  // and emits a (word, 1) pair for every token.
  class TokenizerMapper extends Mapper[Object, Text, Text, IntWritable] {
    private val one  = new IntWritable(1)
    private val word = new Text()

    override def map(key: Object, value: Text,
                     context: Mapper[Object, Text, Text, IntWritable]#Context): Unit = {
      // filter(_.nonEmpty) drops the empty token produced by leading whitespace
      value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
        word.set(w.toLowerCase)
        context.write(word, one)
      }
    }
  }

  // Reducer: sums all the counts received for a given word.
  class IntSumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
    private val result = new IntWritable()

    override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                        context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
      var sum = 0
      values.forEach(value => sum += value.get())
      result.set(sum)
      context.write(key, result)
    }
  }

  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word count")
    job.setJarByClass(WordCount.getClass)
    job.setMapperClass(classOf[TokenizerMapper])
    job.setCombinerClass(classOf[IntSumReducer])
    job.setReducerClass(classOf[IntSumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    job.setInputFormatClass(classOf[TextInputFormat])
    job.setOutputFormatClass(classOf[TextOutputFormat[Text, IntWritable]])
    // Input and output paths are configured through the FileInputFormat /
    // FileOutputFormat helpers; Job itself has no path-setter methods.
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
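A note on the structure: IntSumReducer is registered both as the combiner and as the reducer. This is safe here because integer addition is associative and commutative, so partial sums can be computed on the map side before the shuffle, cutting network traffic. Also note that input and output paths go through the FileInputFormat and FileOutputFormat helper classes; the Job API itself has no path setters.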
Steps to run:
Save the code above as WordCount.scala in your sbt project (conventionally under src/main/scala/).
Package the project with sbt:
sbt package
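For sbt package to work, the project needs a build definition that pulls in the Hadoop client API. A minimal build.sbt sketch is shown below; the Scala and hadoop-client versions are assumptions, and the latter should match the version running on your cluster:

name := "wordcount"
version := "0.1"
scalaVersion := "2.13.12"
// Provided scope: the cluster supplies the Hadoop jars at runtime,
// so they are not bundled into the application jar.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.3.6" % Provided

With this name, version, and scalaVersion, sbt package produces target/scala-2.13/wordcount_2.13-0.1.jar, which is the path used in the next step.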
Submit the generated JAR file to the Hadoop cluster:
hadoop jar target/scala-2.13/wordcount_2.13-0.1.jar WordCount input output
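This assumes the input directory already exists in HDFS. If it does not, upload some local text files first (sample.txt below is a hypothetical file name); note also that the job fails at startup if the output directory already exists, so remove it between runs:

hdfs dfs -mkdir -p input
hdfs dfs -put sample.txt input/
hdfs dfs -rm -r -f output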
View the output:
hdfs dfs -cat output/part-r-00000
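TextOutputFormat writes one key-value pair per line, with the key and value separated by a tab, so each output line is a word followed by its count. For a hypothetical input, the result would look something like:

hadoop	2
hello	3
scala	1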
Scala's integration with Hadoop lets us bring the full expressiveness of the language to MapReduce programs and process large-scale data sets.