[Go][Parquet] Arrow DATE64 type is coerced into Parquet TIMESTAMP[ms] logical type instead of DATE (32-bit) #39456

@joellubi

Describe the bug, including details regarding any error messages, version, and platform.

Per the spec, the Parquet DATE logical type must annotate an int32 representing days since the UNIX epoch. The Arrow DATE64 type (ms since the UNIX epoch) does not have a direct analog in Parquet, so it must be coerced into a compatible representation when writing Arrow data to Parquet.

The prevailing convention is to coerce DATE64 to int32 days since the UNIX epoch (Parquet DATE logical type) [e.g. C++, Rust]. The behavior for handling an int64 value not on a date boundary (i.e. not divisible by 86400000, the number of milliseconds in a day) is not defined. Some implementations validate this condition, while others truncate to the date the physical value falls within.
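To make the two date-boundary strategies concrete, here is a minimal Go sketch of converting a DATE64 millisecond value to a day count. The function names `date64ToDaysStrict` and `date64ToDaysTruncate` are hypothetical and are not part of the Arrow Go API; flooring pre-epoch values is one reading of "the date the physical value falls within", not confirmed behavior of any particular implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// msPerDay is the number of milliseconds in one day.
const msPerDay int64 = 86400000

var errNotOnDateBoundary = errors.New("DATE64 value is not a multiple of 86400000 ms")

// date64ToDaysStrict (hypothetical helper) rejects values that do not fall
// exactly on a date boundary. Overflow of the int32 result is not checked here.
func date64ToDaysStrict(ms int64) (int32, error) {
	if ms%msPerDay != 0 {
		return 0, errNotOnDateBoundary
	}
	return int32(ms / msPerDay), nil
}

// date64ToDaysTruncate (hypothetical helper) maps a value to the day it
// falls within. Go's integer division truncates toward zero, so pre-epoch
// values with a negative remainder are adjusted down by one day.
func date64ToDaysTruncate(ms int64) int32 {
	days := ms / msPerDay
	if ms%msPerDay < 0 {
		days--
	}
	return int32(days)
}

func main() {
	if _, err := date64ToDaysStrict(msPerDay + 1); err != nil {
		fmt.Println("strict:", err)
	}
	fmt.Println("truncate:", date64ToDaysTruncate(msPerDay+1))
}
```

The strict variant surfaces bad input to the caller, while the truncating variant silently discards the time-of-day component; which one a writer should use is exactly the undefined behavior noted above.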

The current Go implementation diverges from the approach followed by these languages, coercing instead to a UTC-normalized TIMESTAMP[ms]. This may lead to surprising behavior in cross-language use cases and alters the original semantics of the type (at least for non-Arrow consumers that don't handle store_schema). Aligning Go with the convention currently followed by the other implementations would likely increase overall compatibility in the ecosystem.

See also: https://lists.apache.org/thread/q036r1q3cw5ysn3zkpvljx3s9ho18419

Component(s)

Go, Parquet
