-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-11515: [R] Bindings for strsplit #10190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3118928 to
6e6addc
Compare
b7953fa to
5615486
Compare
|
This PR now also implements |
Co-authored-by: Ian Cook <ianmcook@gmail.com>
|
I'm going to do a bit more testing and polishing and then merge this after the CI is green if I don't hear any objections |
Update: I found a few issues when reviewing which I'm going to try to fix now; I will describe in comments soon |
|
I think this is ready to merge now! The R 3.5 CI failure was happening in other recent PRs so I assume it's unrelated to any changes here. The commits I added yesterday were mostly just minor things. @thisisnic if you have questions or uncertainty about any of the commits I added, please let me know today |
Will do! |
|
|
||
| if (func_name == "split_pattern") { | ||
| using Options = arrow::compute::SplitPatternOptions; | ||
| int64_t max_splits = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than setting defaults manually, can you do auto out = std::make_shared<Options>(Options::Defaults()); and then just set them if they're provided? (We do this above for other types.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC this didn't work when I tried it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that there is no Defaults() method in arrow::compute::SplitPatternOptions. Apparently some of the options classes just don't have one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They probably should be added in the C++ library, can you do that and/or make another JIRA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thisisnic could you make a Jira for this please? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this, @nealrichardson and @ianmcook , it may make sense for arrow::compute::SplitPatternOptions not to have defaults, given that the pattern is one of the options - it might not make sense to pre-specify a pattern to split the string on? Neither of the R functions strsplit nor stringr::str_split have defaults for this argument.
Could one perhaps make a case for pattern moving from being an option to instead being a second input, similarly to how arrow::compute::filter and arrow::compute::take operate? At that point, having defaults for SplitPatternOptions feels more natural.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That all makes good sense to me but I'm not up to speed on what design conventions we follow in the C++ compute functions. Maybe it'd be best to start a Zulip thread to ask folks about this (Joris, Antonie, Maarten Breddels) to get a better idea of the best change to suggest (if any).
|
I opened #10271 to resolve the outstanding comments here. @nealrichardson PTAL |
This resolves a few outstanding comments raised in #10190 Closes #10271 from ianmcook/ARROW-12692 Authored-by: Ian Cook <ianmcook@gmail.com> Signed-off-by: Ian Cook <ianmcook@gmail.com>
This PR adds bindings for both
strsplit()andstringr::str_split()for dplyr