spark-shell 交互式编程

数据集:

Tom,DataBase,80

Tom,Algorithm,50

Tom,DataStructure,60

Jim,DataBase,90

Jim,Algorithm,60

Jim,DataStructure,80

……

请根据给定的实验数据,在 spark-shell 中通过编程来计算以下内容:

(1) 该系总共有多少学生:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val par = lines.map(row=>row.split(",")(0))      
val distinct_par = par.distinct()  //去重操作 
distinct_par.count  //取得总数

答案为265人。

(2) 该系共开设来多少门课程:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val par = lines.map(row=>row.split(",")(1))  
val distinct_par = par.distinct()  
distinct_par.count

答案为8门。

(3) Tom 同学的总成绩平均分是多少:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val pare = lines.filter(row=>row.split(",")(0)=="Tom") 
pare.foreach(println) 
//Tom,DataBase,26 
//Tom,Algorithm,12 
//Tom,OperatingSystem,16 
//Tom,Python,40 
//Tom,Software,60 
pare.map(row=>(row.split(",")(0),row.split(",")(2).toInt)).mapValues(x=>(x,1)).reduceByKey((x,y ) => (x._1+y._1,x._2 + y._2)).mapValues(x => (x._1 / x._2)).collect()
//res9: Array[(String, Int)] = Array((Tom,30)) 

Tom的平均分为30分。

(4) 求每名同学的选修的课程门数:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val pare = lines.map(row=>(row.split(",")(0),row.split(",")(1)))
pare.mapValues(x => (x,1)).reduceByKey((x,y) => (" ",x._2 + y._2)).mapValues(x => x._2).foreach(println)

答案为265行。

(5) 该系 DataBase 课程共有多少人选修:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val pare = lines.filter(row=>row.split(",")(1)=="DataBase") 
pare.count 
res1: Long = 126

答案为126人。

(6) 各门课程的平均分是多少:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val pare = lines.map(row=>(row.split(",")(1),row.split(",")(2).toInt)) 
pare.mapValues(x=>(x,1)).reduceByKey((x,y) => (x._1+y._1,x._2 + y._2)).mapValues(x => (x._1 / x._2)).collect()
res0: Array[(String, Int)] = Array((Python,57), (OperatingSystem,54), (CLanguage,50), (Software,50), (Algorithm,48), (DataStructure,47), (DataBase,50), (ComputerNetwork,51))

答案为: (CLanguage,50) (Python,57) (Software,50) (OperatingSystem,54) (Algorithm,48) (DataStructure,47) (DataBase,50) (ComputerNetwork,51)

(7)使用累加器计算共有多少人选了 DataBase 这门课:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val pare = lines.filter(row=>row.split(",")(1)=="DataBase").map(row=>(row.split(",")(1),1)) 
val accum = sc.longAccumulator("My Accumulator") 
pare.values.foreach(x => accum.add(x)) 
accum.value 
res19: Long = 126 

答案为126 人。

 

posted @ 2020-02-08 13:08  袁小丑  阅读(1714)  评论(0编辑  收藏  举报