-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-12946: [C++] String swap case kernel #10855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-12946: [C++] String swap case kernel #10855
Conversation
|
LGTM. Need to add implementation for UTF8. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! Just a couple comments.
|
@lidavidm please enable CI |
|
For CI, it appears that Ubuntu 20.04 provides utf8proc 2.5, but the isupper/islower functions are not provided until 2.6: https://juliastrings.github.io/utf8proc/releases/ Meanwhile, RTools35 is using utf8proc 2.4; it builds utf8proc from source due to some other issue, but I believe it's using the system headers anyways looking at the compiler command since the system headers come before the self-built utf8proc ones. I think bumping the minimum utf8proc version and fiddling with the build flags so that the utf8proc headers take precedence will be needed. |
|
An alternative solution is to use helper functions already available, refer to https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1391-L1408 Related notes in Arrow w.r.t. to lower/upper casing of UTF8 can be found in: |
Yeah the logs say which doesn't make sense. |
|
As @edponce suggested, I replaced some |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you @Christian8491
|
Some tests invoke the incorrect kernel (ASCII test uses |
This PR adds
swapcasecompute kernel for string. It is similar toPython str.swapcase()