-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13033: [C++] Kernel to localize naive timestamps to a timezone (preserving clock-time) #10610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given all the discussion on the JIRA issue, can you a bit more clearly state what exactly you implemented and what part it covers?
For example, the current kernel returns int64 and not timestamp with timezone?
It also seems that the implementation is converting system time (UTC epoch) to local (naive) time, and not the other way around as the docstring says.
(also, given all the confusion also on the JIRA, we should probably try to come up with a more explicit/descriptive name ...)
3b14063 to
7fa50a6
Compare
|
Thanks for the review @jorisvandenbossche :).
I think there is no consensus on this yet. |
|
@rok can you explain the rationale of adding a |
In case we start with "naive" a timestamp we don't know the source and target timezone so both need to be specified. Or am I missing something here? |
|
What's the difference between both timezones? In my mind this is the same time zone? If you want to convert your localized timezone-aware timestamp to another timezone after localizing, you can use another kernel to do that? (a kernel we don't have yet, but one we should also add, although this is a trivial, metadata-only change) |
I was thinking only of the metadata change not "materialization" of a target timezone. Having another kernel to only change metadata seems redundant. |
I don't think that's redundant, as a user will regularly want to change the timezone of a timestamp column that is already localized (already has a timezone). I am of course coming from the point of view of pandas, where those two operations are defined as two distinct methods: tz_localize to convert from "timestamp without timezone" to "timestamp with timezone" (i.e. from naive clock time to tz-aware time) and tz_convert to convert between "timestamp with timezone" with different timezones (a metadata-only change). |
|
(if we keep both source and destination timezone in the "localize" kernel, I think at least the destination timezone, if not specified, should default to be the same as the source timezone, and not "UTC", so that a user doesn't have to specify the timezone they want twice, like |
|
I opened https://issues.apache.org/jira/browse/ARROW-13247 for the discussion whether we want a separate "change timezone" kernel |
I suppose these are two different operations indeed. I've changed this to only do "localization" now. I like
|
|
Since the Arrow spec is going to refer to "LocalDateTime", "Instant" and "ZonedDateTime", how about reusing terms from the spec? Something along the lines of |
It's nice that it's idiomatic if a bit verbose :) Let's add it to the list. If we don't get to consensus here we can always call a vote on the ML.
|
|
I like all except |
|
@rok I was checking out this PR, and in the process wrote python bindings for the option class to be able to test it in Python, so thought to directly push that here as well. (and at the same time also fixed (I think) the initialization of the timezone, which was causing the C++ tests to fail here as well) And from testing, you will still need to handle the errors raised by |
Thanks @jorisvandenbossche! The new option initialization was indeed problematic. |
7c1ebb5 to
c5a62a9
Compare
|
I'm going to suggest |
|
What would be a good way to reach consensus here? A doodle poll to ML to see what people find most intuitive? |
|
Well, I don't think anything is intuitive here, because the semantics of timestamps in Arrow are slightly weird and surprising :-) You may still ask for opinions on the ML, but I would personally go with |
You mean timestamps and timezones in general are weird (because the underlying problem is counter intuitive)? If not we should really make this as simple as the underlying complexity is. |
551cb2c to
76ad88f
Compare
|
@pitrou I've changed this to Would still be good to hear @jorisvandenbossche @adamhooper @westonpace |
|
@rok Thanks for asking! I prefer |
cpp/src/arrow/compute/api_scalar.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be here. Is it deliberate or just a leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Probably a bad rebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand what this is for. Presumably this should be unreachable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was just to satisfy the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the status but kept the return 0 to keep he compiler happy. Please check.
008bf23 to
87ddd31
Compare
|
CI fail seems unrelated. |
|
Ok, I've pushed some minor changes and will merge once CI is green. Thanks a lot for this @rok ! |
|
Let me rebase that. |
|
@rok I'm on it already. |
Work. Removing destination_timezone. fix initialization of timezone + add python bindings Renaming localize to tz_localize. Adding ambiguous and nonexistent handling. Removing R wrapper. Reintroducing R wrapper. Issue with date.h building on rtools4.0. Review feedback. tz_localize -> assume_timezone Review feedback. Changes to nonexistent handling. More changes to nonexistent handling. * Documentation changes * Simplify kernel generation
|
Thanks @jorisvandenbossche @pitrou @adamhooper @westonpace ! |
…(preserving clock-time) This is to resolve [ARROW-13033](https://issues.apache.org/jira/browse/ARROW-13033). Closes apache#10610 from rok/ARROW-13033 Authored-by: Rok <rok@mihevc.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
This is to resolve ARROW-13033.