David_Zhu

导航

 

1.前期数据准备(同之前的章节)

 

val collegesRdd= sc.textFile("/user/hdfs/CollegeNavigator.csv")
val header= collegesRdd.first

val headerlessRdd= collegesRdd.filter( line=>{ line!= header } )

 

2.获得map

val typeMapCount= headerlessRdd.map(line=>{
val strtype=line.split("\",\"")(3)
val strCount=line.split("\",\"")(7)
val stuCount=if (strCount.length()>0) strCount.toLong
else 0
(strtype,stuCount)
})
typeMapCount.take(10).foreach(println)

 

3使用reducebykey 方法

 

val typeReduce=typeMapCount.reduceByKey((sum,current)=>{
sum+current
})

 

 

4.数据排序

 由于只有sortByKey这个方法,所以想按照后面的数据来排序,比较麻烦,必须把key value做两次置换,如下:

 

val typeReduce=typeMapCount.reduceByKey((sum,current)=>{
sum+current
}).map(line=>(line._2,line._1)).sortByKey().map(line=>(line._2,line._1))

typeReduce.take(10).foreach(println)

posted on 2018-11-23 11:59  David_Zhu  阅读(383)  评论(0)    收藏  举报