-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-14817 [R] Implement bindings for lubridate::tz()
#12357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
|
suppressMessages(library(lubridate))
suppressMessages(library(dplyr))
tibble(
a = "2022-02-07"
) %>%
mutate(timezone = tz(a))
#> # A tibble: 1 × 2
#> a timezone
#> <chr> <chr>
#> 1 2022-02-07 UTCCreated on 2022-02-07 by the reprex package (v2.0.1) Not sure we want to support all of them. Albeit, it defaults to |
jonkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. This is looking good (and nice and simple, which is fantastic!) I have a few questions for you so far
|
To add to the conversation above. I think the right way to go is to extract timezone from the timestamp metadata and not look at each individual value. However, that could result in a situation where a value is NA, but the column is a suppressMessages(library(arrow))
suppressMessages(library(dplyr))
df <- tibble(
x = as.POSIXct(c("2022-02-07", NA), tz = "Pacific/Marquesas")
)
df %>%
record_batch() %>%
mutate(timezone_x = tz(x)) %>%
collect()
#> A tibble: 2 × 2
#> x timezone_x
#> <dttm> <chr>
#> 1 2022-02-07 00:00:00 Pacific/Marquesas
#> 2 NA Pacific/MarquesasCreated on 2022-02-10 by the reprex package (v2.0.1) |
…snapshot()` for the error chunk
|
I'm going to answer these two points below together, since they are similar. Both have tradeoffs, but what I do in circumstances like this is to look through our other bindings and see if I can find examples that do similar things.
I agree that we should use the column metadata here (which is what it looks like you're doing), but then wrap it in an if_else detection if any elements are arrow/r/R/dplyr-funcs-conditional.R Lines 51 to 61 in af33dd1
is.finite() and is.infinite() Lines 239 to 249 in af33dd1
One example of having an error in a binding is arrow/r/R/dplyr-funcs-conditional.R Lines 86 to 88 in af33dd1
|
…NA`, lubridate returns `"UTC"`
|
@jonkeane Thanks for your comments. On surfacing errors: for example, I used |
|
Worth noting that the current implementation diverges from suppressMessages(library(lubridate))
tz(NA)
#> [1] "UTC"
tz(2)
#> Warning: tz(): Don't know how to compute timezone for object of class numeric;
#> returning "UTC". This warning will become an error in the next major version of
#> lubridate.
#> [1] "UTC"
tz(2.2)
#> Warning: tz(): Don't know how to compute timezone for object of class numeric;
#> returning "UTC". This warning will become an error in the next major version of
#> lubridate.
#> [1] "UTC"
tz("string")
#> [1] "UTC"Created on 2022-02-10 by the reprex package (v2.0.1) |
|
suppressMessages(library(arrow))
suppressMessages(library(lubridate))
suppressMessages(library(dplyr))
df <- tibble(
x = as.POSIXct(c("2022-02-07", NA), tz = "Pacific/Marquesas")
)
df %>%
record_batch() %>%
mutate(timezone_x = tz(x)) %>%
collect()
#> # A tibble: 2 × 2
#> x timezone_x
#> <dttm> <chr>
#> 1 2022-02-07 00:00:00 UTC
#> 2 NA NA
df %>%
mutate(timezone_x = tz(x))
#> # A tibble: 2 × 2
#> x timezone_x
#> <dttm> <chr>
#> 1 2022-02-07 00:00:00 Pacific/Marquesas
#> 2 NA Pacific/Marquesas
Created on 2022-02-10 by the reprex package (v2.0.1) |
I think this is totally fine. For at least the numerics, lubridate will soon be doing what we do. For the others (strings, and bare NAs) it doesn't look like they are going to deprecate them, but I don't think we need to also follow that. We should note that (at least in a comment in the code, or in our documentation for the bindings if there's a place to put it).
Yeah, we have a choice here. I'm actually a bit surprised that Given that, we probably should take out the |
lubridate::tz()
jonkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting there! A few more comments and suggestions.
| collect(), | ||
| df | ||
| ), | ||
| error = TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you're looking for here (and below too) is:
| error = TRUE | |
| warning = TRUE |
That will ensure that the warning we're getting is the "not supported" kind, and then compare the results after that.
arrow/r/tests/testthat/helper-expectation.R
Lines 81 to 85 in 9678566
| #' @param warning The expected warning from the RecordBatch and Table comparison | |
| #' paths, passed to `expect_warning()`. Special values: | |
| #' * `NA` (the default) for ensuring no warning message | |
| #' * `TRUE` is a special case to mean to check for the | |
| #' "not supported in Arrow; pulling data into R" message. |
error = TRUE is being passed via ... to expect_equal() which also has ... which ends up being silently ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OH, sorry I was off one level of the nesting. I might be misunderstanding the output that you're seeing there, but I interpret that as the snapshot is recording that there's no output (i.e. the expectation has passed). Which is what we expect with warning = TRUE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use warning = TRUE and compare_dplyr_binding() (and no longer use expect_snapshot()), but we will no longer be testing the error messages (which we worked to improve only because they were visible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that if we want to check the error messages, we should do that directly with expect_error() or expect_snapshot() wrapped around something like:
call_binding("tz", "this is a string!")
Or we might need to turn that R string into an Arrow Scalar or Array...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was testing was the warning in compare_dplyr_warning() being promoted to error level. It is the message for this error we improved which otherwise would not have been visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think something like call_binding("tz", ...) works because whatever we pass via the ... needs to be an Expression. Which I think is the reason why arrow::tz() only works in a dplyr context, i.e. it doesn't work on standalone Arrays. That was why I was testing errors with dplyr pipelines.
…snapshot()` for the error chunk
…NA`, lubridate returns `"UTC"`
792f4ae to
3bff651
Compare
|
This is very very very close! I rebased off of the default branch since #12447 merged to get that fix. I also updated the tests to mostly be the more simple R object in the I did notice when testing locally (and I assume this'll pop up in the CI too), I think there might be a slight problem with #12447. It looks like I think that Line 81 in f9f2c08
should actually be It might be nice to add a test that asserts not just that arrow/r/tests/testthat/test-type.R Line 238 in f9f2c08
could also be tested with something like It's a bit muddying this PR, but I'm happy for you to make those small adjustments here rather than a whole new Jira + commit chain + another wait to rebase after that's fixed. Other than that, this PR looks ready to go! |
thisisnic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the update to type.Expression, looks like this is all done now.
jonkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
|
Benchmark runs are scheduled for baseline = 5216c2b and contender = 0eaafe8. 0eaafe8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |

This PR should make the following snippets of code equivalent:
and