-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-14942: [R] Bindings for lubridate's dpicoseconds, dnanoseconds, desconds, dmilliseconds, dmicroseconds #12855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@dragosmg mind reviewing this one? |
|
I've been thinking a bit about this. Do you think it's worth having a helper function (to avoid all the repetition), something like make_duration <- function(x, unit) {
x <- build_expr("cast", x, options = cast_options(to_type = int64()))
x$cast(duration(unit))
} |
|
Sure, makes sense 👍 Will do. |
dragosmg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. Many thanks. @thisisnic @jonkeane would you mind taking a look and merging the PR.
jonkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, I have one substantive comment about test additions, one small suggestion, and a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also test what happens when we pass floats here too?
> lubridate::dseconds(1.5)
[1] "1.5s"
Seems to work, so we should ensure we can do that (or error helpfully if we can't for some reason)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, thanks for this!
Will search for discussions Dragos already had about casting float -> duration, then test and see =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As duration type in Arrow is int64 and we can't pass floats here I will go with erroring helpfully. Will add it in the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARROW-16253 might be relevant here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test for the error when the argument multiplied with the value of the multiplication factor of the duration helper function is float (went with easier solution - didn't go forward with forcing evaluation to check for type of an argument or try catching C++ error).
|
@jonkeane I tried to address all the comments and I think the PR is ready for another review. |
|
The errors do not look related ... |
jonkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic, thank you so much for the work on this.
I have one small question about possibly adding a comment — let me know if you want to add that and I'll wait to merge
| duration_helpers_map_factory <- function(value, unit) { | ||
| force(value) | ||
| force(unit) | ||
| function(x = 1) make_duration(x * value, unit) | ||
| } | ||
|
|
||
| for (name in names(.helpers_function_map)) { | ||
| register_binding( | ||
| name, | ||
| duration_helpers_map_factory( | ||
| .helpers_function_map[[name]][[1]], | ||
| .helpers_function_map[[name]][[2]] | ||
| ) | ||
| ) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is actually even shorter than I though it would be!
| # double -> duration not supported in Arrow. | ||
| # Error is generated in the C++ code | ||
| expect_error( | ||
| test_df %>% | ||
| arrow_table() %>% | ||
| mutate(r_obj_dminutes = dminutes(1.12345)) %>% | ||
| collect() | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment about why we are expect_error() but not actually asserting it (since this is all C++). 💯
| ) %>% | ||
| collect(), | ||
| example_d, | ||
| ignore_attr = TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see this in the PR (though might have missed something), what attr are we ignoring? Maybe we should add a comment about what we're using that for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add a comment, you can wait with merging. But I have to remember, if I am honest =) Will do it tomorrow morning and add the comment then.
Thank you for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using ignore_attr = TRUE due to the diff in attributes package, units and class: (difftime vs Duration). I added a comment about it in the beginning of both tests.
|
Errors do not seem to be related to this PR. |
|
Benchmark runs are scheduled for baseline = 0ce8ce8 and contender = c4b646e. c4b646e is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
['Python', 'R'] benchmarks have high level of regressions. |
This PR adds bindings for lubridate's
dseconds,dmilliseconds,dmicrosecondsanddnanoseconds.As
picosecondsare not supported by duration in Arrow and duration is of integer type, the call topicoseconds()raises a warning.