Skip to content

Conversation

@nealrichardson
Copy link
Member

No description provided.

@github-actions
Copy link

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the length of .groups is > 1, it's caught here, which raises an error and drops to the R level, which also raises an error.

However, the error message from dplyr isn't quite fit for our purposes - it mentions that rowwise is a valid value for .groups, which is true in dplyr but not in arrow. Is it worth catching this case ourselves and providing our own error message, which doesn't mention rowwise?

[Edited after reading the tests below] Or is this something worth living with, as the alternative would require significant work to implement for little actual benefit?

mtcars %>% 
   Table$create() %>%
   group_by(mpg) %>% 
   summarise(.groups = c("drop", "keep")) %>%
   collect()
# Warning: Error : .groups is not a string (a length one character vector).; pulling data into R
# Error: `.groups` can't be <chr>
# ℹ Possible values are NULL (default), "drop_last", "drop", "keep", and "rowwise"
# Run `rlang::last_error()` to see where the error occurred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestion, and I'll think some more about whether this is a problem we have anywhere else, but my thinking for leaving it as it is: (1) when querying on a dataset, it will just error and tell you to pull into R (where rowwise would work); (2) it's an experimental feature in dplyr so IDK how much we should worry about the smoothest experience when you go off the default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes a lot of sense and good point about it being an experimental feature - sounds like something to leave for now but revisit if it comes up again elsewhere.

@thisisnic thisisnic self-requested a review September 20, 2021 12:49
Copy link
Member

@thisisnic thisisnic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
Closes apache#11184 from nealrichardson/summarize-groups

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants