-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13548: [C++] Implement temporal difference kernels #10960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I think Python doesn't have bindings for the interval types - that should be fixed in a separate issue I would think. |
|
Would it be straightforward to add a week variant of this that takes |
It should be doable. I'll update the PR. |
5bcb371 to
f4ce70e
Compare
|
@rok can you take a look here? |
|
I'll *rebase this in a little while |
55dcad6 to
342e6d8
Compare
rok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lidavidm !
Couple of comments :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a better suggestion, but boundaries sounds very implementation-like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
const FunctionDoc years_between_doc{
"Compute the number of years between two timestamps",
("Returns the number of year boundaries crossed from the first timestamp to the\n"
"second. That is, the difference is calculated as if the timestamps were\n"
"truncated to the year.\n"
"Null values emit null."),
{"start", "end"}};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better. How about using 'complete years passed/elapsed/..' instead of 'year boundaries crossed'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the reason why I'm using 'boundaries' is because the difference between 1998-12-01 and 1999-01-1 is '1 year' here even though a complete year hasn't passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I thought the entire year needs to pass. Right, boundaries are better yeah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could find the first week starting weekday after the starting date and then calculate the difference // 7?
4d17d71 to
6807df5
Compare
|
Changes:
|
|
I'll take another look tomorrow. But this already looks pretty good. |
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes you use _interval_ and sometimes you don't. I realize it's due to the type returned but it still bothers my eyes :). Probably best keep it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wasn't sure how to disambiguate these. Note that month_interval_between and months_between are basically redundant, except MonthIntervalType is int32_t, not int64_t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, is it useful to have a int64_t return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not particularly, except for consistency; that said for Python, we don't have the interval types bound yet, so there'll be a period where this kernel is unavailable if we remove this version. I think overall it's not worth keeping it though so I'll remove it.
66b79bc to
7dbd1ee
Compare
|
LGTM @lidavidm! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a recommended format for months? Should we use "M" instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the existing formatting of the array printers, but M probably makes more sense/I can align these with strftime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I suppose that doesn't make any sense. M seems preferable to m though so I'll do that.
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, is it useful to have a int64_t return?
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the "notes" column is empty, then can omit it.
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"has a defined timezone" is less weird than "is zoned".
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say "because the local year is the same even though the UTC years would be different".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is MakeArrayFromScalar necessary or is the kernel able to broadcast a scalar input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, we may instead define a Validate() method on FunctionOptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but "Extract" refers to component extraction, which this isn't doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about: Return the number of year boundaries crossed from `start` to `end`.
20742d5 to
5a5421d
Compare
Co-authored-by: Rok Mihevc <rok@mihevc.org>
5a5421d to
7124872
Compare
|
I've rebased this. |
rok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @lidavidm !
Do we want to support this for time/date/32/64 at some point?
| static int64_t GetQuarters(const year_month_day& ymd) { | ||
| return static_cast<int64_t>(static_cast<int32_t>(ymd.year())) * 4 + | ||
| (static_cast<uint32_t>(ymd.month()) - 1) / 3; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use this function for the quarter extractor and delete this logic there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean this:
arrow/cpp/src/arrow/compute/kernels/scalar_temporal.cc
Lines 405 to 410 in 7124872
| template <typename T, typename Arg0> | |
| T Call(KernelContext*, Arg0 arg, Status*) const { | |
| const auto ymd = | |
| year_month_day(floor<days>(localizer_.template ConvertTimePoint<Duration>(arg))); | |
| return static_cast<T>((static_cast<const uint32_t>(ymd.month()) - 1) / 3 + 1); | |
| } |
Thanks for the review. Yeah, let me add support for the other temporal types - I suppose all that's merged by now. |
|
I've added the support for date/time types (years, months, etc. don't accept time types). |
rok
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
| &default_day_of_week_options, DayOfWeekState::Init); | ||
| DCHECK_OK(registry->AddFunction(std::move(weeks_between))); | ||
|
|
||
| auto day_time_between = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: day_time_interval_between would be more consistent. Same for month_day_nano_interval_between. I don't feel strongly about it, just noticing :).
|
Thanks, fixed those issues. |
|
I think this is ready to go. |
Implements kernels to get the number of {years, months, days and millis, days, hours, minutes, seconds, millis, micros, nanos} between two timestamps. Everything returns either Int64 or DayTimeInterval.
Closes apache#10960 from lidavidm/arrow-13548
Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Implements kernels to get the number of {years, months, days and millis, days, hours, minutes, seconds, millis, micros, nanos} between two timestamps. Everything returns either Int64 or DayTimeInterval.