-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13550: [R] Support .groups argument to dplyr::summarize() #11184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
r/R/dplyr-summarize.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the length of .groups is > 1, it's caught here, which raises an error and drops to the R level, which also raises an error.
However, the error message from dplyr isn't quite fit for our purposes - it mentions that rowwise is a valid value for .groups, which is true in dplyr but not in arrow. Is it worth catching this case ourselves and providing our own error message, which doesn't mention rowwise?
[Edited after reading the tests below] Or is this something worth living with, as the alternative would require significant work to implement for little actual benefit?
mtcars %>%
Table$create() %>%
group_by(mpg) %>%
summarise(.groups = c("drop", "keep")) %>%
collect()
# Warning: Error : .groups is not a string (a length one character vector).; pulling data into R
# Error: `.groups` can't be <chr>
# ℹ Possible values are NULL (default), "drop_last", "drop", "keep", and "rowwise"
# Run `rlang::last_error()` to see where the error occurred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to suggestion, and I'll think some more about whether this is a problem we have anywhere else, but my thinking for leaving it as it is: (1) when querying on a dataset, it will just error and tell you to pull into R (where rowwise would work); (2) it's an experimental feature in dplyr so IDK how much we should worry about the smoothest experience when you go off the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that makes a lot of sense and good point about it being an experimental feature - sounds like something to leave for now but revisit if it comes up again elsewhere.
thisisnic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
27aa78a to
67c4f89
Compare
Closes apache#11184 from nealrichardson/summarize-groups Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
No description provided.