Skip to content

Conversation

@thisisnic
Copy link
Member

No description provided.

@github-actions
Copy link

@ianmcook
Copy link
Member

Do you think there's any way to define a str_locate function to match the behavior of stringr::str_locate using both the find_substring and utf8_length kernels?

I think the answer is no because this would require returning a struct column, and that's not possible? Is this limitation related to the issue in ARROW-13149 that @jonkeane was discussing recently?

@jonkeane
Copy link
Member

On the struct column front, returning a struct column is totally fine right now (making a struct from R is a bit more complicated though).

However, right now more likely than not that struct will bet automatically turned in to a tibble (or at least that will be attempted) when pulling it into R. ARROW-13149 aims to make it so that structs will return as named lists instead of always being converted into tibbles automatically.

All of this is to say, I think it would be totally fine for this to a return a struct right now (so long as it doesn't return something super bad when attempting to be turned into a tibble) and so long as we are ok with the R side of what the results come out as (hopefully) changing soon

@ianmcook
Copy link
Member

Cool, thanks @jonkeane.

IIUC, to achieve this we would need to use the project kernel, which takes ProjectOptions and isn't implemented in the R bindings yet. This seems be beyond the scope of what's reasonable to achieve in this PR. Should we create a separate Jira for that?

@ianmcook
Copy link
Member

I created ARROW-13165 to follow up on the project kernel, with a comment in the Jira about using it to implement a binding for str_locate().

@thisisnic
Copy link
Member Author

@jonkeane @ianmcook I agree with what you both say here, that all makes sense. So, is the plan to merge this as-is, and then look at creating bindings for str_locate once the other tickets are implemented?

@ianmcook
Copy link
Member

So, is the plan to merge this as-is, and then look at creating bindings for str_locate once the other tickets are implemented?

Yes, if that sounds 👍 to you

@thisisnic
Copy link
Member Author

Sounds like a plan!

@ianmcook
Copy link
Member

Meanwhile it looks like ARROW-13157 has a PR, so if we wait a few days for that to get merged, then we can remove the skipped test in this PR before we merge it.

@thisisnic
Copy link
Member Author

Meanwhile it looks like ARROW-13157 has a PR, so if we wait a few days for that to get merged, then we can remove the skipped test in this PR before we merge it.

Let's just do that then!

@ianmcook ianmcook force-pushed the ARROW-12868_str_locate branch from a0ea147 to 9f4b3d7 Compare June 28, 2021 18:33
@ianmcook
Copy link
Member

The PR for ARROW-13157 is merged, so I rebased and removed the skipped test. I'll merge when the CI is green.

@ianmcook ianmcook changed the title ARROW-12868: [R] find_substring ARROW-12868: [R] find_substring and find_substring_regex Jun 28, 2021
@ianmcook ianmcook changed the title ARROW-12868: [R] find_substring and find_substring_regex ARROW-12868: [R] Bindings for find_substring and find_substring_regex Jun 28, 2021
@ianmcook
Copy link
Member

I incorporated the find_substring_regex kernel added in ARROW-13157

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants