Handling special data types in Spark

DataFrame Array type handling

Basic handling: building an array column with split

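import org.apache.spark.sql.functions
import org.apache.spark.sql.functions.col
// Assumes a SparkSession named spark (predefined in spark-shell and Spark notebook kernels);
// its implicits provide toDF and the $"colName" syntax used below
import spark.implicits._

// Each row stores a singer's songs as one comma-separated string
val rawArrayDF = Seq(("beatles", "help,hey jude,some time"),
                     ("romeo", "eres mia,hahah,check")
                    ).toDF("name", "songs")

// split() turns the string into an array<string> column; "\\," is a regex matching a literal comma
val simpleArrayDF = rawArrayDF.withColumn("hit_songs", functions.split(col("songs"), "\\,"))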
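The second argument of split is a Java regular expression, so the "\\," above is simply an escaped comma. The sketch below, using a hypothetical input row with stray spaces and an assumed local SparkSession, shows that a pattern such as "\\s*,\\s*" also trims whitespace around each separator, and uses size to count the resulting elements.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, size, split}

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("split-sketch").getOrCreate()
import spark.implicits._

// Hypothetical input with spaces around some of the commas
val raw = Seq(("beatles", "help , hey jude,some time")).toDF("name", "songs")

// "\\s*,\\s*" matches a comma plus any surrounding whitespace
val parsed = raw
  .withColumn("hit_songs", split(col("songs"), "\\s*,\\s*"))
  .withColumn("n_songs", size(col("hit_songs")))

parsed.printSchema() // hit_songs is array<string>

// Pull the array back to the driver as a Scala Seq
val songs: Seq[String] = parsed.select("hit_songs").as[Seq[String]].head()
```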

Expanding array contents

With explode, each array element becomes its own row:

simpleArrayDF.select($"name",functions.explode($"hit_songs").as("hit_songs")).show()
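One caveat: explode drops rows whose array is empty or null. This sketch, on a hypothetical two-row DataFrame and an assumed local SparkSession, contrasts it with explode_outer (Spark 2.2+), which keeps such rows with a null element, and posexplode, which also returns each element's 0-based position in columns named pos and col.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, explode_outer, posexplode}

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("explode-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: the second row has an empty array
val df = Seq(
  ("beatles", Seq("help", "hey jude")),
  ("nobody", Seq.empty[String])
).toDF("name", "hit_songs")

// explode: one row per element; the "nobody" row disappears
val exploded = df.select(col("name"), explode(col("hit_songs")).as("song"))

// explode_outer: same, but "nobody" survives with song = null
val explodedOuter = df.select(col("name"), explode_outer(col("hit_songs")).as("song"))

// posexplode: adds the element's 0-based index; output columns are "pos" and "col"
val withPos = df.select(col("name"), posexplode(col("hit_songs")))

explodedOuter.show()
withPos.show()
```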

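Checking whether an array contains an element

array_contains adds a boolean column that is true when hit_songs holds an element equal to "help":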

simpleArrayDF.withColumn(
    "is_contained",
    functions.array_contains($"hit_songs","help")
).show()
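array_contains compares whole elements, not substrings, and can also serve directly as a filter condition. A minimal sketch, rebuilding the same sample data on an assumed local SparkSession; element_at (Spark 2.4+) is added to show 1-based access to a single element.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_contains, col, element_at, split}

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("contains-sketch").getOrCreate()
import spark.implicits._

val df = Seq(("beatles", "help,hey jude,some time"), ("romeo", "eres mia,hahah,check"))
  .toDF("name", "songs")
  .withColumn("hit_songs", split(col("songs"), ","))

// Keep only rows whose array has an element exactly equal to "help"
val withHelp = df.filter(array_contains(col("hit_songs"), "help"))

// "hey" is only part of "hey jude", so no row matches
val partial = df.filter(array_contains(col("hit_songs"), "hey"))

// element_at indexes arrays from 1, not 0
val firstSong = df.select(col("name"), element_at(col("hit_songs"), 1).as("first_song"))

withHelp.show()
firstSong.show()
```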

DataFrame Map type handling

val simpleMapDataFrame = Seq(
        ("sublime", Map("good_song" -> "santeria", "bad_song" -> "doesn't exist")),
        ("prince_royce", Map("good_song" -> "darte un beso", "bad_song" -> "back it up"))
        ).toDF("name", "songs")

// col("songs")("good_song") looks up the value stored under key "good_song" in each row's map
simpleMapDataFrame.select($"name", col("songs")("good_song").as("fun")).show()
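Besides lookup by key, a map column can be expanded or inspected as a whole. This sketch reuses the sample data on an assumed local SparkSession: getItem is the named-method form of col("songs")("good_song"), explode on a map emits one row per entry in columns key and value, and map_keys (Spark 2.3+) returns a row's keys as an array.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, map_keys}

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("map-sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  ("sublime", Map("good_song" -> "santeria", "bad_song" -> "doesn't exist")),
  ("prince_royce", Map("good_song" -> "darte un beso", "bad_song" -> "back it up"))
).toDF("name", "songs")

// Equivalent to col("songs")("good_song")
val good = df.select(col("name"), col("songs").getItem("good_song").as("fun"))

// One row per (key, value) entry: 2 rows x 2 entries = 4 rows
val entries = df.select(col("name"), explode(col("songs")))

// Keys of each map as array<string>; sort them if a stable order matters
val keys = df.select(map_keys(col("songs")).as("keys"))

entries.show()
```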

Code: https://github.com/HCMY/SparkMLInAction/blob/main/jupyter/FeatureProcessPractice/SpecialFeatureProcess.ipynb

posted @ 2021-07-08 23:07  real-zhouyc