-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-14810 [R] Implement bindings for lubridate's date_decimal() and decimal_date()
#12707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jonkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! A few comments for you
|
|
jonkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions, but this is looking good!
r/R/dplyr-funcs-datetime.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these be Expression$scalar(31622400L)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to do a bit of benchmarking (even micro bench with {bench}) on these implementation details (honestly, even the old implementation of doing strptime more...) I would be curious if any of them are better or worse for performance (my gut says this is quicker than strptime a bunch, but I might be totally wrong about that!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of scale are those numbers on? Could you post or link to the code you used to run that, I would be interested to see how you got there!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
summary(test1, relative = TRUE)
#> A tibble: 2 × 13
#> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
#> <bch:expr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
#> 1 new_implementation 1 1 1.40 1 1.03 94 6 1.32s <tibble> <Rprofmem> <bench_tm> <tibble>
#> 2 old_implementation 1.46 1.46 1 1.27 1 92 8 1.81s <tibble> <Rprofmem> <bench_tm> <tibble>
test1
#> A tibble: 2 × 13
#> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
#> <bch:expr> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
#> 1 new_implementation 12.7ms 13.3ms 71.1 76.1KB 4.54 94 6 1.32s <tibble> <Rprofmem> <bench_tm> <tibble>
#> 2 old_implementation 18.5ms 19.4ms 50.8 96.9KB 4.42 92 8 1.81s <tibble> <Rprofmem> <bench_tm> <tibble>There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any reason to believe this would be different at larger scales, but for benchmarks like these we probably should use larger sets of data than a handful of rows. It depends on what we're looking at, but 10k-100k rows is more reasonable for something like this
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two tiny things! It looks like all the comments from Jon were addressed...it looks great!
|
Benchmark runs are scheduled for baseline = 633687c and contender = 80bba5c. 80bba5c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
| # if time1 or time2 are timestamps they cannot be expressed in "s" /seconds | ||
| # otherwise they cannot be added subtracted with durations | ||
| # TODO delete the casting to "us" once | ||
| # https://issues.apache.org/jira/browse/ARROW-16060 is solved | ||
| if (inherits(time1, "Expression") && | ||
| time1$type_id() %in% Type[c("TIMESTAMP")] && time1$type()$unit() != 2L) { | ||
| time1 <- build_expr("cast", time1, options = cast_options(to_type = timestamp("us"))) | ||
| } | ||
|
|
||
| if (inherits(time2, "Expression") && | ||
| time2$type_id() %in% Type[c("TIMESTAMP")] && time2$type()$unit() != 2L) { | ||
| time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp("us"))) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update this section since it looks like ARROW-16060 has already been solved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, feel free to open a Jira + PR to do so if we don't already have one

This would allow the following operations:
Created on 2022-03-28 by the reprex package (v2.0.1)