[VL] Result mismatch in date format week year #7069

@ccat3z

Description

Backend

VL (Velox)

Bug description

In Spark, 'Y' means the week-based year (as in SimpleDateFormat), but Velox parses 'Y' as the plain calendar year.

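// The dates in the comments are local time at UTC+8 (each epoch value is 16:00 UTC of the previous day).
Seq(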
  ( 883584000), // 1998-01-01 00:00:00
  (1608912000), // 2020-12-26 00:00:00
  (1608998400), // 2020-12-27 00:00:00
  (1640361600), // 2021-12-25 00:00:00
  (1640448000), // 2021-12-26 00:00:00
  (1640966400), // 2022-01-01 00:00:00
  (1672416000), // 2022-12-31 00:00:00
  (1672502400), // 2023-01-01 00:00:00
  (1703865600), // 2023-12-30 00:00:00
  (1703952000), // 2023-12-31 00:00:00
).toDF("date").createOrReplaceTempView("tmp") // register the view queried below

spark.sql(s"""
  select
    from_unixtime(date, 'Y') as week_year,
    date
  from tmp
""").show()

/*
vanilla:                  gluten:
+---------+----------+    +---------+----------+
|week_year|      date|    |week_year|      date|
+---------+----------+    +---------+----------+
|     1998| 883584000|    |     1998| 883584000|
|     2020|1608912000|    |     2020|1608912000|
|     2021|1608998400|    |     2020|1608998400|
|     2021|1640361600|    |     2021|1640361600|
|     2022|1640448000|    |     2021|1640448000|
|     2022|1640966400|    |     2022|1640966400|
|     2022|1672416000|    |     2022|1672416000|
|     2023|1672502400|    |     2023|1672502400|
|     2023|1703865600|    |     2023|1703865600|
|     2024|1703952000|    |     2023|1703952000|
+---------+----------+    +---------+----------+
*/
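
For comparison, the vanilla column can be approximated outside Spark with the JDK's SimpleDateFormat alone. This is only a sketch: `WeekYearDemo` and `format` are hypothetical names, and `Locale.US` plus a fixed UTC+8 zone are assumptions chosen so the dates line up with the comments in the repro above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class WeekYearDemo {
    // Formats epoch seconds with a SimpleDateFormat pattern.
    // Locale.US and GMT+08:00 are assumptions, not taken from the issue.
    static String format(long epochSeconds, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT+08:00"));
        return fmt.format(new Date(epochSeconds * 1000L));
    }

    public static void main(String[] args) {
        // Three of the rows where vanilla and gluten disagree.
        long[] samples = {1608998400L, 1640448000L, 1703952000L};
        for (long s : samples) {
            // 'Y' is the week year, 'y' is the calendar year.
            System.out.println(s + " Y=" + format(s, "Y") + " y=" + format(s, "y"));
        }
    }
}
```

Under these assumptions, 'Y' should give the week year seen in the vanilla column, while 'y' gives the calendar year that the gluten column shows.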

I'm trying to fix this mismatch, and there are two issues that need to be resolved:

  1. JodaDateTimeFormatter interprets Y as the 'year of era', which differs from SimpleDateFormat in Java.

  2. Velox uses the ISO 8601 standard to calculate week dates (e.g. Fix Spark WeekFunction on long years facebookincubator/velox#10713), but SimpleDateFormat uses GregorianCalendar.

    The main difference is that SimpleDateFormat sets firstDayOfWeek and minimalDaysInFirstWeek based on the locale; by default these are Sunday and 1 day. ISO 8601 defines them as Monday and 4 days.
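
The effect of the two week definitions can be sketched with `java.time.temporal.WeekFields`, which takes the first day of week and the minimal days in the first week as explicit parameters. `WeekRuleDemo` and `weekYear` are hypothetical names used for illustration only; this is not how Velox or Spark implement it.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.WeekFields;

public class WeekRuleDemo {
    // Week-based year under a given (firstDayOfWeek, minimalDaysInFirstWeek) rule.
    static int weekYear(LocalDate date, DayOfWeek firstDay, int minimalDays) {
        return date.get(WeekFields.of(firstDay, minimalDays).weekBasedYear());
    }

    public static void main(String[] args) {
        LocalDate[] dates = {
            LocalDate.of(2020, 12, 27),
            LocalDate.of(2022, 1, 1),
            LocalDate.of(2023, 12, 31),
        };
        for (LocalDate d : dates) {
            int sundayRule = weekYear(d, DayOfWeek.SUNDAY, 1); // Sunday and 1 day
            int isoRule = weekYear(d, DayOfWeek.MONDAY, 4);    // ISO 8601: Monday and 4 days
            System.out.println(d + " sunday/1=" + sundayRule + " iso=" + isoRule);
        }
    }
}
```

Note that 2022-01-01 falls in week year 2022 under the Sunday/1-day rule but in 2021 under ISO 8601, so switching 'Y' to an ISO week year alone would still not match the vanilla output.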

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

Labels

bug, triage
