Post category - spark

Summary: get the number of elements in an array column with size(): .withColumn("size", size(col("columnname"))) \ Read more
posted @ 2023-06-19 17:23 ivyJ Views(63) Comments(0) Recommended(0)
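A minimal Scala sketch of the same idea, runnable in spark-shell; the sample data and the column name "columnname" are invented for illustration:

    import org.apache.spark.sql.functions.{col, size}
    import spark.implicits._   // `spark` is the SparkSession provided by spark-shell

    // Hypothetical data: one array column whose length we want
    val df = Seq(Seq(1, 2, 3), Seq(4)).toDF("columnname")

    // size() returns the number of elements in an array (or map) column
    df.withColumn("size", size(col("columnname"))).show(false)
    // rows: ([1, 2, 3], 3) and ([4], 1)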
Summary: take the part of a column value on either side of a "/" with substring_index: .withColumn("columnname1", substring_index(col("columnname"), "/", -1)) keeps the part to the right of the last "/" (negative count); the part to the left of the first "/": .withColumn("columnname2", substring_index(col("columnname" Read more
posted @ 2023-06-19 17:18 ivyJ Views(120) Comments(0) Recommended(0)
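A short Scala sketch of substring_index's count semantics, runnable in spark-shell (sample data and column names are placeholders):

    import org.apache.spark.sql.functions.{col, substring_index}
    import spark.implicits._

    val df = Seq("a/b/c", "x/y").toDF("columnname")

    // A negative count keeps what is to the right of the last delimiter,
    // a positive count keeps what is to the left of the n-th delimiter.
    df.withColumn("columnname1", substring_index(col("columnname"), "/", -1)) // "c", "y"
      .withColumn("columnname2", substring_index(col("columnname"), "/", 1))  // "a", "x"
      .show(false)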
Summary: Reference: https://www.coder.work/article/3220624 — a new approach in Spark 2.4: Spark 2.4 introduced the SQL function slice, which can be used to extract a range of elements from an array column. Read more
posted @ 2023-06-19 17:14 ivyJ Views(81) Comments(0) Recommended(0)
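A minimal Scala sketch of slice (available since Spark 2.4), runnable in spark-shell; the data and column names are invented:

    import org.apache.spark.sql.functions.{col, slice}
    import spark.implicits._

    val df = Seq(Seq(1, 2, 3, 4, 5)).toDF("nums")

    // slice(arr, start, length): start is 1-based; a negative start counts from the end
    df.withColumn("middle", slice(col("nums"), 2, 3))    // [2, 3, 4]
      .withColumn("lastTwo", slice(col("nums"), -2, 2))  // [4, 5]
      .show(false)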
Summary: parse a date string and clean up a stringified array column: result = resdf.withColumn("Date", to_date(col("Date"), "yyyy-MM-dd")).\ withColumn("arrayDouble", regexp_replace(col("arrayDouble"), "\\]", "")).\ wit Read more
posted @ 2023-01-29 15:14 ivyJ Views(356) Comments(0) Recommended(0)
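The snippet above is PySpark and is cut off; below is a Scala sketch of the visible steps, runnable in spark-shell. The sample row is invented, and the last step (stripping the "[" as well) is my guess at a natural follow-up rather than the author's code:

    import org.apache.spark.sql.functions.{col, regexp_replace, to_date}
    import spark.implicits._

    val resdf = Seq(("2023-01-29", "[0.1,0.2]")).toDF("Date", "arrayDouble")

    val result = resdf
      .withColumn("Date", to_date(col("Date"), "yyyy-MM-dd"))                    // string -> date
      .withColumn("arrayDouble", regexp_replace(col("arrayDouble"), "\\]", ""))  // drop "]"
      .withColumn("arrayDouble", regexp_replace(col("arrayDouble"), "\\[", ""))  // assumed follow-up: drop "["
    result.show(false)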
Summary: Data processing code samples: Scala: https://github.com/awslabs/deequ python: https://github.com/awslabs/python-deequ Read more
posted @ 2023-01-28 18:18 ivyJ Views(52) Comments(0) Recommended(0)
Summary: compare the contents of frame2 and frame1 column by column: val dfColumns = frame2.columns dfColumns.foreach(item => { println(" "+item) val empty = frame2.select(item).except(frame1.sel Read more
posted @ 2022-12-14 17:04 ivyJ Views(377) Comments(0) Recommended(0)
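The loop above is cut off; here is a self-contained Scala completion of the same pattern, runnable in spark-shell (frame1/frame2 and their data are invented):

    import spark.implicits._

    val frame1 = Seq((1, "a"), (2, "b")).toDF("id", "name")
    val frame2 = Seq((1, "a"), (2, "c")).toDF("id", "name")

    // For each column, print the values present in frame2 but missing from frame1
    frame2.columns.foreach { item =>
      println(" " + item)
      val diff = frame2.select(item).except(frame1.select(item))
      diff.show(false)   // with this data, only the "name" column differs (value "c")
    }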
Summary: convert the null values inside an array<double> to 0; usage of transform: .withColumn("aa",transform("arrayColumnName", fill_zero)) .withColumn("bb", transform(col("arrayColumnName"), Read more
posted @ 2022-12-14 16:58 ivyJ Views(78) Comments(0) Recommended(0)
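A Scala sketch of the same idea, runnable in spark-shell. The post's fill_zero helper is not shown, so the coalesce lambda below is my own substitute; note that the Column => Column overload of transform used here requires Spark 3.0+ (on 2.4 the equivalent is the SQL expression form via expr):

    import org.apache.spark.sql.functions.{coalesce, col, lit, transform}
    import spark.implicits._

    val df = Seq(Tuple1(Seq(Option(0.1), None, Option(0.3)))).toDF("arrayColumnName")

    // Replace each null element of the array<double> with 0.0
    df.withColumn("aa", transform(col("arrayColumnName"), x => coalesce(x, lit(0.0))))
      .show(false)   // [0.1, 0.0, 0.3]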
Summary: import org.apache.spark.sql.functions.{col, regexp_replace, to_date, udf} Convert a stringified array "[0.1,0.2]" to array<double>: frame = frame.withColumn("ArrayDoubleValue" Read more
posted @ 2022-12-14 16:54 ivyJ Views(108) Comments(0) Recommended(0)
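The post's snippet is truncated (its imports suggest a udf-based approach); a common alternative, sketched below, uses only built-in functions and is runnable in spark-shell with an invented sample row:

    import org.apache.spark.sql.functions.{col, regexp_replace, split}
    import spark.implicits._

    val frame = Seq("[0.1,0.2]").toDF("stringValue")

    // Strip the brackets, split on ",", then cast array<string> to array<double>
    frame.withColumn(
        "ArrayDoubleValue",
        split(regexp_replace(col("stringValue"), "[\\[\\]]", ""), ",").cast("array<double>")
      )
      .printSchema()   // ArrayDoubleValue: array<double>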
Summary: print the logs of a failed Spark job: SparkLauncher launcher = sparkJobUtil.buildSparkLauncher(feedConfig, appName, params);SparkAppHandle handler = launcher.startApplication Read more
posted @ 2022-06-14 18:22 ivyJ Views(148) Comments(0) Recommended(0)
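The buildSparkLauncher helper above is the author's own and isn't shown, so the sketch below configures a SparkLauncher directly (Scala, via Spark's Java launcher API); the jar path, main class and master are placeholders. It routes driver output to a logger and watches the handle's state so a failure surfaces in the launching process:

    import java.util.concurrent.CountDownLatch
    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    object LaunchJob {
      def main(args: Array[String]): Unit = {
        val finished = new CountDownLatch(1)

        val launcher = new SparkLauncher()
          .setAppResource("/path/to/your-job.jar")    // placeholder jar
          .setMainClass("com.example.YourSparkJob")   // placeholder main class
          .setMaster("yarn")                          // placeholder master
          .redirectToLog("spark-job")                 // driver stdout/stderr -> java.util.logging "spark-job"

        launcher.startApplication(new SparkAppHandle.Listener {
          override def stateChanged(h: SparkAppHandle): Unit = {
            println(s"state=${h.getState}, appId=${h.getAppId}")
            if (h.getState.isFinal) {
              if (h.getState != SparkAppHandle.State.FINISHED)
                println(s"Spark job failed with final state ${h.getState}")
              finished.countDown()
            }
          }
          override def infoChanged(h: SparkAppHandle): Unit = ()
        })

        finished.await()   // block until the job reaches a final state
      }
    }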
Summary: Apache Spark Tutorial with Examples - Spark by {Examples} (sparkbyexamples.com) https://sparkbyexamples.com/spark/spark-dataframe-where-filter/ https: Read more
posted @ 2022-04-22 18:02 ivyJ Views(4691) Comments(0) Recommended(0)