RDD的执行

Action算子用来触发RDD的计算,得到相关计算的结果

Action触发Job,一个Spark程序包含多少Action算子,就有多少个Job

1.count,返回数据量

2.mean,返回平均数

3.stdev,返回平方差

4.max,返回最大值

5.min,返回最小值

6.stats,返回以上5个值

scala> val rdd1 = sc.makeRDD(1 to 10)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:24

scala> rdd1.count
res0: Long = 10                                                                 

scala> rdd1.mean
res1: Double = 5.5

scala> rdd1.stdev
res2: Double = 2.8722813232690143

scala> rdd1.max
res3: Int = 10

scala> rdd1.min
res4: Int = 1

scala> rdd1.stats
res5: org.apache.spark.util.StatCounter = (count: 10, mean: 5.500000, stdev: 2.872281, max: 10.000000, min: 1.000000)

7.reduce(func),聚合操作

8.fold(func),和reduce相似,带初始值

9.aggregate,和fold相似,可分别定义分区内聚合函数和分区间聚合函数

scala> rdd1.reduce(_+_)
res6: Int = 55

scala> rdd1.fold(0)(_+_)
res7: Int = 55

scala> rdd1.fold(2)(_+_)
res8: Int = 61

scala> rdd1.fold(4)(_+_)
res9: Int = 67

scala> rdd1.aggregate(0)(_+_,_+_)
res11: Int = 55

scala> rdd1.aggregate(2)(_+_,_+_)
res12: Int = 61

scala> rdd1.aggregate(4)(_+_,_+_)
res13: Int = 67

10.first(),返回第一个元素

11.take(n),返回前n个元素

12.top(n),按照默认(降序)或指定排序规则,返回前n个元素

13.takeSample(withReplacement, num, [seed]),返回抽样的数据

scala> val lst = (1 to 10).map(x => random.nextInt(20))
lst: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 8, 10, 10, 16, 12, 4, 4, 12, 15)

scala> val rdd1 = sc.makeRDD(lst)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[10] at makeRDD at <console>:26

scala> rdd1.collect
res26: Array[Int] = Array(2, 8, 10, 10, 16, 12, 4, 4, 12, 15)

scala> rdd1.first
res27: Int = 2

scala> rdd1.take(3)
res28: Array[Int] = Array(2, 8, 10)

scala> rdd1.top(3)
res29: Array[Int] = Array(16, 15, 12)

scala> rdd1.takeSample(true,3,66)
res34: Array[Int] = Array(15, 10, 15)

scala> rdd1.takeSample(true,3,66)
res35: Array[Int] = Array(15, 10, 15)

scala> rdd1.takeSample(false,3,66)
res36: Array[Int] = Array(8, 4, 10)

14.foreach(func),和map类似,区别是foreach是Action

15.foreachPartition(func),和mapPartitions类似,区别是foreachPartitions是Action

16.saveAsTextFile(path)/saveAsSequenceFile(path)/saveAsObjectFile(path),保存文件到指定路径,有多少个分区就保存为多少个文件

scala> rdd1.saveAsTextFile("/data/textFile")
                                                                                
scala> rdd1.getNumPartitions
res45: Int = 2

 

posted @ 2022-03-16 11:04  NeilCheung514  阅读(80)  评论(0)    收藏  举报