-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13995: [R] Bindings for join node #11155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
df4c33f to
e947054
Compare
8cf93e6 to
ec85e9f
Compare
ec85e9f to
174450c
Compare
jonkeane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, a few comments / suggestions. Once #11150 merges I'll rebase + push
r/R/dplyr-join.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a comment on this code, but it never occurred to me that it's a bit funny that by isn't tidy-evaled and must be character strings (in dplyr!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense for the by = c(col_x = "col_y") interface, but a bit(?) odd for by = "col" case
r/tests/testthat/test-dplyr-join.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do this when I rebase, but should we have tests for the error handling as well (e.g. columns named but not in x or y, etc.)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, there are lots of missing tests, and these datasets aren't ideal for testing joins. We may want to just use the datasets that dplyr uses in its tests or something.
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
2ea4435 to
c922dff
Compare
This is based on apache#11150. Among the issues observed: * Dictionary columns aren't allowed even in the left data, though you can first `Project` to remove them * Duplicate column names aren't allowed at all, even though there is a provision for deduping with a prefix Closes apache#11155 from nealrichardson/join-dplyr Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com> Co-authored-by: Jonathan Keane <jkeane@gmail.com> Signed-off-by: Jonathan Keane <jkeane@gmail.com>
This is based on #11150.
Among the issues observed:
Projectto remove them