Dates and Timestamps in Spark 3.0

 

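Spark 3.0 uses the Proleptic Gregorian calendar, whereas earlier versions used a hybrid of the Julian and Gregorian calendars (dates before 1582 follow the Julian calendar and dates from 1582 onward follow the Gregorian calendar). The java.sql.Date API uses this hybrid calendar; Java 8 replaces it with java.time.LocalDate, which uses the same Proleptic Gregorian calendar that Spark 3.0 now uses. The Date type does not take time zones into account.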
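
To make the calendar difference concrete, here is a minimal sketch in plain Python (not Spark code) that compares the leap-year rules of the two calendars. The function names are illustrative only and are not part of any Spark or Java API.

```python
def is_leap_julian(year: int) -> bool:
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0


def is_leap_proleptic_gregorian(year: int) -> bool:
    """Gregorian rule, applied to all years, including those before 1582."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Year 1000 is one where the two calendars disagree about 29 February.
for year in (1000, 1600, 2019, 2020):
    print(year, is_leap_julian(year), is_leap_proleptic_gregorian(year))
```

Under the hybrid calendar, 1000-02-29 is a valid (Julian) date; under the Proleptic Gregorian calendar it does not exist.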

Spark 3.0 adopts the Java 8 time API for timestamps, which makes them more precise.


 

Constructing dates and timestamps

1. Constructing dates

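  •   make_date(): in Spark 3.0 this function takes three parameters, YEAR, MONTH, and DAY, all of which are implicitly cast to int. It checks whether the resulting date is valid in the Proleptic Gregorian calendar and returns NULL if it is not. In the example below, 1000-02-29 yields NULL because 1000 is not a leap year in that calendar.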
  • >>> spark.createDataFrame([(2020, 6, 26), (1000, 2, 29), (-44, 1, 1)],
    ... ['Y', 'M', 'D']).createTempView('YMD')
    >>> df = sql('select make_date(Y, M, D) as date from YMD')
    >>> df.printSchema()
    root
     |-- date: date (nullable = true)
    >>> df.show()
    +-----------+
    |       date|
    +-----------+
    | 2020-06-26|
    |       null|
    |-0044-01-01|
    +-----------+    
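
The null-on-invalid behaviour can be mimicked outside Spark with Python's datetime.date, which also follows the Proleptic Gregorian calendar. This is only a sketch: make_date_or_none is a hypothetical helper, not Spark's implementation, and Python's date supports only years 1 through 9999, so the -44 row above cannot be reproduced this way.

```python
from datetime import date
from typing import Optional


def make_date_or_none(year: int, month: int, day: int) -> Optional[date]:
    """Hypothetical helper: return a date, or None when the fields are invalid."""
    try:
        # Cast to int first, loosely mirroring the implicit cast described above.
        return date(int(year), int(month), int(day))
    except ValueError:
        return None


print(make_date_or_none(2020, 6, 26))  # a valid date
print(make_date_or_none(1000, 2, 29))  # None: 1000 is not a leap year here
```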

2. Constructing timestamps

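  •   make_timestamp(): in Spark 3.0 this function takes six parameters, YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. SECOND is a decimal, so that the fractional part of the second can be passed with microsecond precision.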
  • >>> df = spark.createDataFrame([(2020, 6, 28, 10, 31, 30.123456),
    ... (1582, 10, 10, 0, 1, 2.0001), (2019, 2, 29, 9, 29, 1.0)],
    ... ['YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'SECOND'])
    >>> df.show()
    +----+-----+---+----+------+---------+
    |YEAR|MONTH|DAY|HOUR|MINUTE|   SECOND|
    +----+-----+---+----+------+---------+
    |2020|    6| 28|  10|    31|30.123456|
    |1582|   10| 10|   0|     1|   2.0001|
    |2019|    2| 29|   9|    29|      1.0|
    +----+-----+---+----+------+---------+
    
    >>> ts = df.selectExpr("make_timestamp(YEAR, MONTH, DAY, HOUR, MINUTE, SECOND) as MAKE_TIMESTAMP")
    >>> ts.printSchema()
    root
     |-- MAKE_TIMESTAMP: timestamp (nullable = true) 
    >>> ts.show(truncate=False)
    +--------------------------+
    |MAKE_TIMESTAMP            |
    +--------------------------+
    |2020-06-28 10:31:30.123456|
    |1582-10-10 00:01:02.0001  |
    |null                      |
    +--------------------------+
    # To specify a time zone, pass it as one more parameter, as follows
    >>> df = spark.createDataFrame([(2020, 6, 28, 10, 31, 30, 'UTC'),
    ...     (1582, 10, 10, 0, 1, 2, 'America/Los_Angeles'),
    ...     (2019, 2, 28, 9, 29, 1, 'Europe/Moscow')],
    ...     ['YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'TZ'])
    >>> df = df.selectExpr('make_timestamp(YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, TZ) as MAKE_TIMESTAMP')
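    >>> # Note: 'SS' is the fraction-of-second pattern ('ss' is seconds), so the last field prints as 00.
    >>> # The strings are rendered in the session time zone, which is Europe/Moscow in this run.
    >>> df = df.selectExpr("date_format(MAKE_TIMESTAMP, 'yyyy-MM-dd HH:mm:SS VV') AS TIMESTAMP_STRING")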
    >>> df.show(truncate=False)
    +---------------------------------+
    |TIMESTAMP_STRING                 |
    +---------------------------------+
    |2020-06-28 13:31:00 Europe/Moscow|
    |1582-10-10 10:24:00 Europe/Moscow|
    |2019-02-28 09:29:00 Europe/Moscow|
    +---------------------------------+
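
As a rough illustration of the two ideas above (fractional seconds carried at microsecond precision, and a timestamp built in one zone being rendered in another), here is a plain-Python sketch. It is not Spark code: make_timestamp_or_none is a hypothetical helper, and the fixed +03:00 offset is an assumed stand-in for Europe/Moscow in 2020.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Optional


def make_timestamp_or_none(year: int, month: int, day: int, hour: int,
                           minute: int, second: Decimal,
                           tz: timezone = timezone.utc) -> Optional[datetime]:
    """Hypothetical helper: build an aware datetime, or None if a field is invalid."""
    whole = int(second)                         # whole seconds
    micros = int((second - whole) * 1_000_000)  # fractional part as microseconds
    try:
        return datetime(year, month, day, hour, minute, whole, micros, tzinfo=tz)
    except ValueError:
        return None


ts = make_timestamp_or_none(2020, 6, 28, 10, 31, Decimal("30.123456"))
print(ts)  # 2020-06-28 10:31:30.123456+00:00

moscow = timezone(timedelta(hours=3))  # assumption: fixed offset, not a real zone ID
print(ts.astimezone(moscow))  # 2020-06-28 13:31:30.123456+03:00

print(make_timestamp_or_none(2019, 2, 29, 9, 29, Decimal("1.0")))  # None
```

Decimal is used for the seconds so that the fractional part converts to microseconds exactly, without binary floating-point rounding.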

     

posted @ 2020-10-07 21:00  dretrtg