[VL] Result mismatch in date format week year #7069
Open
Description
Backend
VL (Velox)
Bug description
In Spark (SimpleDateFormat), 'Y' means week-based year, but Velox parses 'Y' as the calendar year.
// Epoch seconds; the date comments show local time in UTC+8.
Seq(
  ( 883584000), // 1998-01-01 00:00:00
  (1608912000), // 2020-12-26 00:00:00
  (1608998400), // 2020-12-27 00:00:00
  (1640361600), // 2021-12-25 00:00:00
  (1640448000), // 2021-12-26 00:00:00
  (1640966400), // 2022-01-01 00:00:00
  (1672416000), // 2022-12-31 00:00:00
  (1672502400), // 2023-01-01 00:00:00
  (1703865600), // 2023-12-30 00:00:00
  (1703952000), // 2023-12-31 00:00:00
).toDF("date").createOrReplaceTempView("tmp") // register the view queried below
spark.sql(s"""
  select
    from_unixtime(date, 'Y') as week_year,
    date
  from tmp
""").show()
/*
vanilla: gluten:
+---------+----------+ +---------+----------+
|week_year| date| |week_year| date|
+---------+----------+ +---------+----------+
| 1998| 883584000| | 1998| 883584000|
| 2020|1608912000| | 2020|1608912000|
| 2021|1608998400| | 2020|1608998400|
| 2021|1640361600| | 2021|1640361600|
| 2022|1640448000| | 2021|1640448000|
| 2022|1640966400| | 2022|1640966400|
| 2022|1672416000| | 2022|1672416000|
| 2023|1672502400| | 2023|1672502400|
| 2023|1703865600| | 2023|1703865600|
| 2024|1703952000| | 2023|1703952000|
+---------+----------+ +---------+----------+
*/
I'm trying to fix this mismatch, and there are two issues that need to be resolved:
- JodaDateTimeFormatter interprets 'Y' as "year of era", which differs from SimpleDateFormat in Java.
- Velox uses the ISO standard to calculate week dates (e.g. "Fix Spark WeekFunction on long years", facebookincubator/velox#10713), but SimpleDateFormat uses GregorianCalendar.
The main difference is that SimpleDateFormat defines firstDayOfWeek and minimalDaysInFirstWeek based on the locale; by default they are Sunday and 1 day. ISO 8601 defines them as Monday and 4 days.
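To make the difference between the two week rules concrete, here is a minimal sketch that computes the week-based year under both. It uses java.time's WeekFields as a stand-in for the GregorianCalendar settings described above, and it assumes the Sunday / 1-day defaults stated in this issue. The names WeekYearDemo and weekYear are hypothetical and exist only for illustration; this is not code from Spark, Gluten, or Velox.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.WeekFields

object WeekYearDemo {
  // Assumed SimpleDateFormat/GregorianCalendar defaults from the issue text:
  // weeks start on Sunday and the first week needs only 1 day in the new year.
  val sundayStart: WeekFields = WeekFields.of(DayOfWeek.SUNDAY, 1)

  // ISO 8601: weeks start on Monday and the first week needs at least 4 days.
  val iso: WeekFields = WeekFields.ISO

  // Week-based year of `date` under the given week rule.
  def weekYear(date: LocalDate, rule: WeekFields): Int =
    date.get(rule.weekBasedYear())

  def main(args: Array[String]): Unit = {
    val dates = Seq("2020-12-26", "2020-12-27", "2021-12-26", "2023-12-31")
      .map(s => LocalDate.parse(s))
    dates.foreach { d =>
      println(
        s"$d year=${d.getYear} " +
          s"sundayStart=${weekYear(d, sundayStart)} iso=${weekYear(d, iso)}")
    }
  }
}
```

Under the Sunday / 1-day rule, 2020-12-27, 2021-12-26 and 2023-12-31 (all Sundays) already belong to the following week year, which matches the vanilla column above. Under the ISO rule they stay in the current year. Handling 'Y' as a week-based year is therefore not enough on its own; the week definition has to follow the same rule as well.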
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response