-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type #10176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Missing features at moment:
Different input / output types should also be handled. |
Unfortunately yes .. But since that's a lot more complicated, and I don't think we already have tz-related functionality, for now those kernels could maybe just raise an error if the timestamp type has a |
2c4968d to
2765cef
Compare
|
It seems tests on windows are not able to download IANA timezone database or access it post-download (https://howardhinnant.github.io/date/tz.html#Installation). I'll solve it but meanwhile I I'd like to ask for review. Open questions I have at the moment:
|
|
I've adopted some code from iso_week.h. Would it make more sense to add iso_week.h as a vendored library instead? Should we add a note to our license? (iso_week.h is MIT licensed). |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @rok. There's a bunch of things that look unexpected to me, see below.
|
General comment: the PR is quite big, and the timezone-related logic further added complexity. So from a workflow perspective (to ensure reviewing / getting this merged is manageable), could it make sense to keep the timezone-related logic for a separate PR? (didn't check the code to see whether that's actually feasible) |
Keeping tz-aware logic would be fairly easy to move to a separate PR. I'll move it out. |
f8c6665 to
23f51e7
Compare
7bcc910 to
69c5760
Compare
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! Did a final pass through the docs.
Also, it would maybe be good to add a test for a timestamp that doesn't fit into the nanosecond range?
| {"values"}}; | ||
|
|
||
| const FunctionDoc millisecond_doc{ | ||
| "Extract millisecond values", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add here the clarification about it being the number of milliseconds since the last full second, as you did for the C++ functions? (and same for microsecond/nanosecond below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
| {"values"}}; | ||
|
|
||
| const FunctionDoc subsecond_doc{ | ||
| "Extract subsecond values", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also best to add the clarification of what subsecond is from the Subsecond docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
| } | ||
| return {static_cast<int64_t>(static_cast<int32_t>(y)), | ||
| static_cast<int64_t>(trunc<weeks>(t - start).count() + 1), | ||
| static_cast<int64_t>(weekday(ymd).iso_encoding())}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the DayOfWeek implementation you subtract 1 (to have 0-6), while here not. I assume that's the ISO definition? If so, we should clearly document that difference in the ISOCalendar documentation (since now it seems to imply that the "day_of_week" field is the same as the "day_of_week" kernel. Maybe also prepend it with "iso_" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to docs and prepended.
4dc50cb to
ee53452
Compare
|
Thanks for the comments and suggestions @jorisvandenbossche! :) |
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
|
Thank you for this PR @rok ! I believe this is ready for merging. I'll just wait for CI. |
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
|
Just noticed I missed this:
Do you mind if I move it to ARROW-12980? |
|
Actually @jorisvandenbossche what do you mean by "doesn't fit into the nanosecond range"? |
|
Cool, this is a nice start for datetime kernels!
I meant a date that falls outside the date range of 1677-09-21 - 2262-04-11 (the range that ns resolution can cover). |
|
Thanks @jorisvandenbossche @pitrou for the help and feedback! I'm very glad this is merged! :) |
This is to resolve ARROW-11759.