Spark Starter Example: Word Count / wordcount


Scala version

import org.apache.spark.{SparkConf, SparkContext}

object WordCountScala {
  def main(args: Array[String]): Unit = {
    // Run locally on a single thread; in production the master would
    // normally be supplied via spark-submit rather than hard-coded
    val conf: SparkConf = new SparkConf().setAppName("WordCountScala").setMaster("local[1]")
    val sc: SparkContext = new SparkContext(conf)
    val data = Array("hello world", "simple app is good", "good world")
    val result: Array[(String, Int)] = sc.parallelize(data) // distribute the local collection as an RDD
      .flatMap(_.split(" "))  // split each line into words
      .map((_, 1))            // pair each word with a count of 1
      .reduceByKey(_ + _)     // sum the counts per word
      .collect()              // bring the results back to the driver
    result.foreach(println)
    sc.stop()                 // release the SparkContext's resources
  }
}

Java version

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;


public class WordCountJava {
    public static void main(String[] args) {
        // Run locally on a single thread; in production the master would
        // normally be supplied via spark-submit rather than hard-coded
        SparkConf conf = new SparkConf().setAppName("WordCountJava").setMaster("local[1]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        List<String> data = Arrays.asList("hello world", "simple app is good", "good world");
        List<Tuple2<String, Integer>> result = jsc.parallelize(data)   // distribute the local list as an RDD
                .flatMap(s -> Arrays.asList(s.split(" ")).iterator())  // split each line into words
                .mapToPair(v -> new Tuple2<>(v, 1))                    // pair each word with a count of 1
                .reduceByKey(Integer::sum)                             // sum the counts per word
                .collect();                                            // bring the results back to the driver
        result.forEach(System.out::println);
        jsc.stop();                                                    // release the context's resources
    }
}

Output (the ordering of the pairs is not guaranteed):

(is,1)
(app,1)
(simple,1)
(hello,1)
(good,2)
(world,2)

As the example shows, Scala is noticeably more concise than Java when working with Spark. After all, Spark itself is written in Scala and follows a purer functional-programming style, so it is recommended to prefer Scala when learning and using Spark.
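For instance, in an interactive spark-shell session (where the shell already provides a ready-made SparkContext bound to sc), the same computation fits in a single expression. This is a quick sketch assuming a local shell, not a standalone application:

sc.parallelize(Seq("hello world", "simple app is good", "good world"))
  .flatMap(_.split(" "))  // split each line into words
  .map((_, 1))            // pair each word with a count of 1
  .reduceByKey(_ + _)     // sum the counts per word
  .collect()
  .foreach(println)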



posted @ 2021-05-30 15:25  Convict