DataFrame: replacing null values inside a column of type array<double> or array<string> with non-null values

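----Replace null values in array<double> with 0, using transform (PySpark 3.1+):
import numpy as np
# note: round here is pyspark.sql.functions.round, not the Python builtin
from pyspark.sql.functions import col, isnan, isnull, round, transform, when

def fill_zero(x):
    # x is a Column standing for one array element; null, NaN and +/-inf become 0
    return when(isnull(x) | isnan(x) | (x == np.inf) | (x == -np.inf), 0).otherwise(x)

df = (df
    .withColumn("aa", transform("arrayColumnName", fill_zero))
    # round each array element to 4 decimal places
    .withColumn("bb", transform(col("arrayColumnName"), lambda x: round(x, 4))))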

The transform function:
Returns an array of elements after applying a transformation to each element in the input array.
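
To make the element-wise behaviour concrete, here is a minimal plain-Python model of what transform combined with fill_zero is expected to produce for a single row. It is only a reference sketch for checking expected values and does not call Spark; the name fill_zero_reference is made up for this illustration.

```python
import math

def fill_zero_reference(values):
    """Plain-Python model of transform(arr, fill_zero) for one row (hypothetical helper)."""
    if values is None:
        # a null array stays null: transform only visits existing elements
        return None
    return [
        0.0 if v is None or math.isnan(v) or math.isinf(v) else v
        for v in values
    ]

print(fill_zero_reference([1.5, None, float("nan"), float("-inf"), 2.0]))
# prints [1.5, 0.0, 0.0, 0.0, 2.0]
```

The output array always has the same length as the input array: elements are replaced, never dropped.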


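----Replace null values in the array with 0 (Scala UDF; note that this version works on string elements and returns "0"):
import org.apache.spark.sql.functions.{col, udf}

// Spark passes array columns to Scala UDFs as Seq, so Seq is used here instead of List
def replaceArrayNullToZero: (Seq[String] => Seq[String]) = {
  case null => Seq.empty              // a null array becomes an empty array
  case s => s.map {                   // also covers the empty array, which the old `::` pattern missed
    case null | "" | "null" => "0"    // null, empty string and the literal "null" become "0"
    case item => item
  }
}
val replaceArrayNullToZeroUDF = udf(replaceArrayNullToZero)
df.withColumn("Value", replaceArrayNullToZeroUDF(col("Value")))
// The NOVALUE variant defined next is wrapped the same way:
// val replaceArrayNullToNOVALUEUDF = udf(replaceArrayNullToNOVALUE)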
----Replace null values in array<string> with the string NOVALUE:
// Spark passes array columns to Scala UDFs as Seq, so Seq is used here instead of List
def replaceArrayNullToNOVALUE: (Seq[String] => Seq[String]) = {
  case null => Seq.empty                    // a null array becomes an empty array
  case s => s.map {
    case null | "" | "null" => "NOVALUE"    // null, empty string and the literal "null" become "NOVALUE"
    case item => item
  }
}
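
As a possible alternative to the UDF, the same per-element replacement can probably be written with the Spark SQL higher-order function transform (available in SQL since Spark 2.4), which avoids the UDF round trip. The sketch below only builds the SQL expression string; replace_null_strings_expr is a hypothetical helper, not a Spark API, and the column name Value is taken from the snippet above. Unlike the Scala UDF, a null array stays null here instead of becoming an empty array.

```python
def replace_null_strings_expr(column, placeholder):
    """Build a Spark SQL expression string (hypothetical helper, not part of Spark)."""
    # keep the sketch simple: refuse placeholders that would need SQL escaping
    if "'" in placeholder or "\\" in placeholder:
        raise ValueError("placeholder must not contain quotes or backslashes")
    return (
        f"transform({column}, x -> "
        f"CASE WHEN x IS NULL OR x IN ('', 'null') THEN '{placeholder}' ELSE x END)"
    )

sql = replace_null_strings_expr("Value", "NOVALUE")
print(sql)

# Assumed usage with an existing DataFrame `df`:
# from pyspark.sql.functions import expr
# df = df.withColumn("Value", expr(sql))
```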
posted @ 2022-12-14 16:58  ivyJ