Dates and Timestamps in Spark 3.0
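Spark 3.0 uses the proleptic Gregorian calendar. Earlier versions used a hybrid of the Julian and Gregorian calendars: dates before the 1582 cutover were interpreted in the Julian calendar, and dates after it in the Gregorian calendar. The java.sql.Date API works this way. Java 8 replaces it with java.time.LocalDate, which uses the same proleptic Gregorian calendar that Spark 3.0 now uses. The Date type does not take time zones into account.
Spark 3.0 also adopts the Java 8 time API for timestamps, which makes them more accurate.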
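Constructing dates and timestamps
1. Constructing dates
- make_date(): in Spark 3.0 this function takes three parameters, YEAR, MONTH, and DAY. The arguments are implicitly cast to int. The function checks against the proleptic Gregorian calendar whether the resulting date is valid and returns NULL if it is not.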
    >>> spark.createDataFrame([(2020, 6, 26), (1000, 2, 29), (-44, 1, 1)],
    ...     ['Y', 'M', 'D']).createTempView('YMD')
    >>> df = sql('select make_date(Y, M, D) as date from YMD')
    >>> df.printSchema()
    root
     |-- date: date (nullable = true)
    >>> df.show()
    +-----------+
    |       date|
    +-----------+
    | 2020-06-26|
    |       null|
    |-0044-01-01|
    +-----------+
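The null for (1000, 2, 29) above follows from the leap-year rule: year 1000 is a leap year under the Julian rule but not under the Gregorian rule, which the proleptic calendar applies to all years. The sketch below illustrates this in plain Python rather than Spark. The helper names are hypothetical, and Python's datetime.date, while also proleptic Gregorian, only supports years 1 to 9999, so it cannot represent the year -44 row that Spark accepts.

```python
from datetime import date


def is_leap_julian(year: int) -> bool:
    # Julian rule: every fourth year is a leap year.
    return year % 4 == 0


def is_leap_gregorian(year: int) -> bool:
    # Gregorian rule, applied to every year in the proleptic calendar:
    # divisible by 4, except century years not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def make_date_or_none(year: int, month: int, day: int):
    # Hypothetical helper imitating the NULL-on-invalid behaviour.
    # Unlike Spark, datetime.date rejects years outside 1..9999.
    try:
        return date(year, month, day)
    except ValueError:
        return None


print(is_leap_julian(1000), is_leap_gregorian(1000))  # True False
print(make_date_or_none(2020, 6, 26))                 # 2020-06-26
print(make_date_or_none(1000, 2, 29))                 # None
```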
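2. Constructing timestamps
- make_timestamp(): in Spark 3.0 this function takes six parameters: YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. SECOND is a decimal, so the seconds can carry a fractional part down to microseconds for higher precision.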
    >>> df = spark.createDataFrame([(2020, 6, 28, 10, 31, 30.123456),
    ...     (1582, 10, 10, 0, 1, 2.0001), (2019, 2, 29, 9, 29, 1.0)],
    ...     ['YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'SECOND'])
    >>> df.show()
    +----+-----+---+----+------+---------+
    |YEAR|MONTH|DAY|HOUR|MINUTE|   SECOND|
    +----+-----+---+----+------+---------+
    |2020|    6| 28|  10|    31|30.123456|
    |1582|   10| 10|   0|     1|   2.0001|
    |2019|    2| 29|   9|    29|      1.0|
    +----+-----+---+----+------+---------+
    >>> ts = df.selectExpr("make_timestamp(YEAR, MONTH, DAY, HOUR, MINUTE, SECOND) as MAKE_TIMESTAMP")
    >>> ts.printSchema()
    root
     |-- MAKE_TIMESTAMP: timestamp (nullable = true)
    >>> ts.show(truncate=False)
    +--------------------------+
    |MAKE_TIMESTAMP            |
    +--------------------------+
    |2020-06-28 10:31:30.123456|
    |1582-10-10 00:01:02.0001  |
    |null                      |
    +--------------------------+
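To make the role of the decimal SECOND argument concrete, here is a minimal plain-Python sketch of how a fractional second can be split into whole seconds and microseconds. It is an illustration only, not Spark's implementation: make_timestamp_or_none is a hypothetical helper, and returning None for invalid fields merely imitates the null shown above.

```python
from datetime import datetime
from decimal import Decimal


def make_timestamp_or_none(year, month, day, hour, minute, second):
    # Parse the seconds as a Decimal to avoid binary floating-point rounding.
    sec = Decimal(str(second))
    whole = int(sec)
    # Keep six fractional digits (microseconds); extra digits are truncated.
    micros = int((sec - whole) * 1_000_000)
    try:
        return datetime(year, month, day, hour, minute, whole, micros)
    except ValueError:
        # Invalid field combination, e.g. 29 February in a non-leap year.
        return None


print(make_timestamp_or_none(2020, 6, 28, 10, 31, "30.123456"))
print(make_timestamp_or_none(2019, 2, 29, 9, 29, "1.0"))
```

Passing the seconds as a string or Decimal keeps all six fractional digits exact, which is the reason a decimal type suits this argument better than a binary float.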
To specify the time zone of the input fields, pass one more argument, as shown below. The results are rendered in the session time zone, which is Europe/Moscow in this example. Note that the pattern uses SS, which stands for fraction-of-second, so the last field prints 00; use ss to print the seconds.

    >>> df = spark.createDataFrame([(2020, 6, 28, 10, 31, 30, 'UTC'),
    ...     (1582, 10, 10, 0, 1, 2, 'America/Los_Angeles'),
    ...     (2019, 2, 28, 9, 29, 1, 'Europe/Moscow')],
    ...     ['YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'TZ'])
    >>> df = df.selectExpr('make_timestamp(YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, TZ) as MAKE_TIMESTAMP')
    >>> df = df.selectExpr("date_format(MAKE_TIMESTAMP, 'yyyy-MM-dd HH:mm:SS VV') AS TIMESTAMP_STRING")
    >>> df.show(truncate=False)
    +---------------------------------+
    |TIMESTAMP_STRING                 |
    +---------------------------------+
    |2020-06-28 13:31:00 Europe/Moscow|
    |1582-10-10 10:24:00 Europe/Moscow|
    |2019-02-28 09:29:00 Europe/Moscow|
    +---------------------------------+
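The idea behind the extra time zone argument can be sketched in plain Python: the fields are read as wall-clock time in the given zone, which fixes an instant, and that instant is then displayed in the session zone. This is a rough analogue under stated assumptions, not Spark code: render_in_session_zone is a hypothetical helper, and a fixed UTC+3 offset stands in for Europe/Moscow, which only holds for recent dates. Unlike the transcript above, it prints the actual seconds.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+3 offset approximates Europe/Moscow for recent dates.
MOSCOW = timezone(timedelta(hours=3), "MSK")


def render_in_session_zone(fields, source_zone, session_zone):
    # Interpret the wall-clock fields in source_zone to obtain an instant,
    # then express that same instant in session_zone.
    instant = datetime(*fields, tzinfo=source_zone)
    return instant.astimezone(session_zone).strftime("%Y-%m-%d %H:%M:%S")


print(render_in_session_zone((2020, 6, 28, 10, 31, 30), timezone.utc, MOSCOW))
# 2020-06-28 13:31:30
print(render_in_session_zone((2019, 2, 28, 9, 29, 1), MOSCOW, MOSCOW))
# 2019-02-28 09:29:01
```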
