-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-14097: [C++] subtract(time, duration) -> time kernel #12139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a8888c1 to
3de6888
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't match the PR title, and time makes more sense than duration as a result right?
Also, in that case, we need to validate that the result is not <0 or >=24h (see #12014).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah switching to subtract(time32/64, duration) -> time32/64. I suppose validation should a be part of the subtract kernel then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the validation. Still needs tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it doesn't make sense to cast time to duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Removed.
bcd65bd to
d8db736
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, but is there any advantage to using the type matcher + computed output type vs the literal types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we could subtract a negative duration, we should also check if the output is larger than the maximum acceptable value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively - maybe this arithmetic should wrap around? Does that make any sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. I wonder how expected this would be. But you're proposing something like this:
if (result < 0) {
result = result % multiple;
}
Correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Though, I'm not sure if that really makes any sense to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels semantically correct but I'm going to check what Pandas and others do to see what conventions are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I poked around stdlib datetime but it doesn't seem to implement this. However, I still think we need to check whether the result is in range on both ends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this reference: https://bugs.python.org/issue1487389#msg54803
I think raising an error is OK. If we need wrap-around, we can add that as a separate kernel perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still raise an error if the result is >24:00 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this! Pandas appears delegate time to Python and rather work with timedelta which is like our duration.
I made this raise on time < 0 and time >= 24:00. Lets revisit if/when desired.
|
Sorry, I'll take a look at this today or Monday |
No worries, this is on me anyway :) |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I just think we need to add a check for something like '23:59:59 - (-1s)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this reference: https://bugs.python.org/issue1487389#msg54803
I think raising an error is OK. If we need wrap-around, we can add that as a separate kernel perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should still raise an error if the result is >24:00 though.
|
Looks like the linter is unhappy here. |
|
Benchmark runs are scheduled for baseline = 49d2a41 and contender = 11e9f26. 11e9f26 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
ARROW-14097