2025/2/5

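Scala runs on the JVM and can call Hadoop's Java API directly, so it can integrate with Hadoop by implementing MapReduce programs for efficient data processing. This post shows how to write a simple MapReduce program in Scala that counts how many times each word occurs.
MapReduce program: write the Mapper and the Reducer.
Running the MapReduce job: package the Scala program and submit it to Hadoop.
Example code:
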
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
import org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}

object WordCount {
  // Mapper: emits (word, 1) for every whitespace-separated token in a line.
  class TokenizerMapper extends Mapper[Object, Text, Text, IntWritable] {
    private val one = new IntWritable(1)
    private val word = new Text()

    override def map(key: Object, value: Text,
                     context: Mapper[Object, Text, Text, IntWritable]#Context): Unit = {
      // Skip the empty token that split() yields for blank lines or leading whitespace.
      value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
        word.set(w.toLowerCase)
        context.write(word, one)
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts emitted for each word.
  class IntSumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
    private val result = new IntWritable()

    override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                        context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
      var sum = 0
      values.forEach(value => sum += value.get())
      result.set(sum)
      context.write(key, result)
    }
  }

  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word count")
    job.setJarByClass(WordCount.getClass)
    job.setMapperClass(classOf[TokenizerMapper])
    job.setCombinerClass(classOf[IntSumReducer])
    job.setReducerClass(classOf[IntSumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    job.setInputFormatClass(classOf[TextInputFormat])
    // TextOutputFormat is generic, so classOf needs its type arguments in Scala.
    job.setOutputFormatClass(classOf[TextOutputFormat[Text, IntWritable]])
    // Paths are set through the format classes; Job itself has no path setters.
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
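
To see what the Mapper and Reducer compute without a cluster, the same data flow can be sketched with plain Scala collections. This is only an illustration: the names LocalWordCount, mapLine, shuffle, reduce and wordCount are made up for this sketch and are not part of Hadoop, and the grouping step stands in for the shuffle that Hadoop performs between the two phases.

```scala
// Local, Hadoop-free sketch of the word-count data flow (all names are illustrative).
object LocalWordCount {
  // Map phase: split a line on whitespace and emit (word, 1) for each non-empty token.
  def mapLine(line: String): Seq[(String, Int)] =
    line.split("\\s+").toSeq.filter(_.nonEmpty).map(w => (w.toLowerCase, 1))

  // Shuffle phase: collect the emitted counts under their word.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, kvs) => word -> kvs.map(_._2) }

  // Reduce phase: sum the counts collected for one word.
  def reduce(word: String, counts: Seq[Int]): (String, Int) = (word, counts.sum)

  // Whole pipeline: map every line, shuffle, then reduce every group.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    shuffle(lines.flatMap(mapLine)).map { case (w, cs) => reduce(w, cs) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("Hello Hadoop", "hello  Scala"))
    // Print one "word<TAB>count" line per word, sorted by word.
    counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

For the two sample lines, the result maps hello to 2 and hadoop and scala to 1 each. Because addition is associative and commutative, summing partial counts first and then summing those partial sums gives the same totals, which is why the same class can serve as both combiner and reducer in a word count.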

 

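Steps to run:
Save the code above as WordCount.scala.
Package the project with SBT:

sbt package
Submit the generated JAR to the Hadoop cluster (input is an HDFS directory that already holds the text files to count, and output must not exist yet, otherwise the job fails at startup):

hadoop jar target/scala-2.13/wordcount_2.13-0.1.jar WordCount input output
View the output:

hdfs dfs -cat output/part-r-00000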
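
The packaging step relies on a build definition that the steps do not show. A minimal build.sbt might look like the sketch below. The project name and version are inferred from the JAR file name wordcount_2.13-0.1.jar; the exact Scala patch release and the Hadoop version are assumptions and should be matched to your own environment.

```scala
// build.sbt (sketch): name and version are inferred from the JAR file name in the steps.
name := "wordcount"
version := "0.1"
scalaVersion := "2.13.12" // assumption: any 2.13.x release matches target/scala-2.13

// "provided": the cluster supplies the Hadoop classes at run time.
// The version number is an assumption; use the one your cluster runs.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "3.3.6" % "provided"
```

Note that sbt package puts only the project's own classes into the JAR. The Scala standard library is not bundled, so it may have to be made available to the job separately, for example by building a fat JAR with the sbt-assembly plugin.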

Integrating Scala with Hadoop lets us use Scala's expressive language features to write MapReduce programs that process large-scale datasets.

posted @ 2025-02-05 21:09  伐木工熊大