Computing statistics over a CSV file with Spark
Start spark-shell with machine 3 as the master:
bin/spark-shell --master spark://linux-centos7-03:7077
Read the CSV file in Scala (use `val` rather than `var`, since the DataFrame reference is never reassigned):
val df = spark.read.format("csv").option("sep", ",").option("inferSchema", "true").option("header", "true").load("hdfs://linux-centos7-03:8020/10061789243186.csv")
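With `inferSchema`, Spark makes an extra pass over the file to guess column types. A sketch of an alternative, supplying an explicit schema up front; the column names other than `isMobile` are illustrative assumptions about this CSV, not taken from the original post:

```scala
import org.apache.spark.sql.types._

// Hypothetical schema: only isMobile is known from the query below;
// the other columns are placeholders for this file's real layout.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("isMobile", IntegerType, nullable = true),
  StructField("content", StringType, nullable = true)
))

val df2 = spark.read
  .format("csv")
  .option("sep", ",")
  .option("header", "true")
  .schema(schema)          // skip type inference entirely
  .load("hdfs://linux-centos7-03:8020/10061789243186.csv")
```

This avoids the inference pass and guarantees stable column types across runs.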
Display the loaded data:
df.show
Create a temporary view:
df.createTempView("comments")
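One caveat worth noting: `createTempView` throws an `AnalysisException` if a view with the same name already exists in the session (easy to hit when re-running lines in spark-shell). A sketch of the replace-if-exists variant:

```scala
// Safe to re-run: replaces the existing "comments" view instead of failing.
df.createOrReplaceTempView("comments")
```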
Run the query and display the result:
spark.sql("select isMobile, count(*) as sum from comments group by isMobile").show
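The same aggregation can be written with the DataFrame API instead of SQL; a minimal sketch (the rename just matches the `sum` alias used in the SQL version):

```scala
// groupBy + count yields columns (isMobile, count); rename to match the SQL alias.
df.groupBy("isMobile")
  .count()
  .withColumnRenamed("count", "sum")
  .show()
```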
If a job fails with an error like:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
restarting the cluster usually fixes it: run stop-all.sh to shut everything down, then start it again.
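Concretely, the restart looks like the following, assuming a standalone Spark cluster and that the scripts are run from the Spark installation directory on the master node:

```shell
# Stop the master and all registered workers, then bring them back up.
sbin/stop-all.sh
sbin/start-all.sh
```

After the restart, check the cluster UI (port 8080 on the master by default) to confirm the workers have re-registered before resubmitting the job.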