Notes from a session with the Spark UI

Out-of-memory problem

By default spark-shell allocates only 1 GB of driver memory (spark.driver.memory defaults to 1g), which is not enough for this job. We can allocate a larger memory budget to the task with the following command:

spark-shell --driver-memory 12g
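Equivalently, the same setting can be passed through the generic `--conf` flag; the 12g value here is just the figure chosen for this job, not a recommendation:

```shell
# Raise driver memory via the dedicated flag
spark-shell --driver-memory 12g

# ...or via the underlying configuration property
spark-shell --conf spark.driver.memory=12g
```

Note that driver memory must be set at launch time; changing `spark.driver.memory` from inside an already-running shell has no effect.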

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, lit, max}

// Lottery application records and winner records, both stored as Parquet
val hdfs_path_apply: String = "/mnt/g/BaiduNetdiskDownload/2011-2019小汽车摇号数据/apply"
val applyNumbersDF: DataFrame = spark.read.parquet(hdfs_path_apply)
val hdfs_path_lucky: String = "/mnt/g/BaiduNetdiskDownload/2011-2019小汽车摇号数据/lucky"
val luckyDogsDF: DataFrame = spark.read.parquet(hdfs_path_lucky)
// Keep only winners from batch 201601 onward
val filteredLuckyDogs: DataFrame = luckyDogsDF.filter(col("batchNum") >= "201601").select("carNum")
// Join applications with winners on carNum
val jointDF: DataFrame = applyNumbersDF.join(filteredLuckyDogs, Seq("carNum"), "inner")
// multiplier = number of application entries per (batchNum, carNum)
val multipliers: DataFrame = jointDF.groupBy(col("batchNum"), col("carNum")).agg(count(lit(1)).alias("multiplier"))
// Each carNum keeps its highest multiplier across batches
val uniqueMultipliers: DataFrame = multipliers.groupBy("carNum").agg(max("multiplier").alias("multiplier"))
// Distribution: how many winners fall under each multiplier value
val result: DataFrame = uniqueMultipliers.groupBy("multiplier").agg(count(lit(1)).alias("cnt")).orderBy("multiplier")
result.collect  // triggers execution and pulls the results to the driver
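To make the two-level aggregation above easier to follow, here is the same logic sketched on plain Scala collections (illustrative toy data, no Spark involved): count entries per (carNum, batchNum), keep each carNum's maximum count, then tally carNums per multiplier value.

```scala
// Toy stand-ins for the joined (carNum, batchNum) rows
val joined = Seq(
  ("A", "201601"), ("A", "201601"), ("A", "201602"),
  ("B", "201601")
)

// multiplier per (carNum, batchNum): number of application entries in that batch
val multipliers = joined.groupBy(identity).toSeq.map {
  case ((car, batch), rows) => (car, rows.size)
}

// each carNum keeps its maximum multiplier across batches
val uniqueMultipliers = multipliers.groupBy(_._1).map {
  case (car, xs) => (car, xs.map(_._2).max)
}

// distribution: how many carNums per multiplier value
val result = uniqueMultipliers.toSeq.groupBy(_._2).map {
  case (m, xs) => (m, xs.size)
}.toSeq.sortBy(_._1)
// result == Seq((1, 1), (2, 1)): "A" peaks at multiplier 2, "B" at 1
```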

Results

Open http://192.168.128.5:4040 in a browser. The page is divided into the Jobs, Stages, Storage, Environment, Executors, and SQL tabs.

Jobs

Stages

Executors

posted @ 2022-05-22 22:13  yihailin