Skip to content

Conversation

@nealrichardson
Copy link
Member

@nealrichardson nealrichardson commented Sep 14, 2021

This is based on #11150.

Among the issues observed:

  • Dictionary columns aren't allowed even in the left data, though you can first Project to remove them
  • Duplicate column names aren't allowed at all, even though there is a provision for deduping with a prefix

@apache apache deleted a comment from github-actions bot Sep 14, 2021
@nealrichardson nealrichardson changed the title [WIP] ARROW-13995: [R] Bindings for join node ARROW-13995: [R] Bindings for join node Sep 20, 2021
@github-actions
Copy link

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few comments / suggestions. Once #11150 merges I'll rebase + push

r/R/dplyr-join.R Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a comment on this code, but it never occurred to me that it's a bit funny that by isn't tidy-evaled and must be character strings (in dplyr!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense for the by = c(col_x = "col_y") interface, but a bit(?) odd for by = "col" case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this when I rebase, but should we have tests for the error handling as well (e.g. columns named but not in x or y, etc.)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, there are lots of missing tests, and these datasets aren't ideal for testing joins. We may want to just use the datasets that dplyr uses in its tests or something.

@jonkeane jonkeane closed this in c273ea7 Sep 30, 2021
@nealrichardson nealrichardson deleted the join-dplyr branch October 1, 2021 16:19
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
This is based on apache#11150.

Among the issues observed:

* Dictionary columns aren't allowed even in the left data, though you can first `Project` to remove them
* Duplicate column names aren't allowed at all, even though there is a provision for deduping with a prefix

Closes apache#11155 from nealrichardson/join-dplyr

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants