ARROW-12714: [C++] String title case kernel #10869
@edponce Do you know when this will be ready for review? Or do you need help on this?
@pitrou I am working on completing this PR today and would greatly appreciate your review.
The capitalize and title kernels are the first vector string kernels that perform code point transforms. The code point transforms (case changes) can grow in bytes and thus required the use of cc @pitrou
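To illustrate why a case change can grow in bytes, here is a minimal sketch that compares the UTF-8 encoded length of a code point before and after a mapping. The names `Utf8EncodedLength` and `ByteGrowth` are hypothetical helpers introduced for this example; they are not taken from the PR or from Arrow's code.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes UTF-8 needs to encode one code point.
// Assumption: cp is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
constexpr std::size_t Utf8EncodedLength(std::uint32_t cp) {
  return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Hypothetical helper: change in encoded size when code point `from`
// is replaced by code point `to` (positive means the output grows).
constexpr std::ptrdiff_t ByteGrowth(std::uint32_t from, std::uint32_t to) {
  return static_cast<std::ptrdiff_t>(Utf8EncodedLength(to)) -
         static_cast<std::ptrdiff_t>(Utf8EncodedLength(from));
}
```

For example, `ByteGrowth(0x0251, 0x2C6D)` is 1: U+0251 (LATIN SMALL LETTER ALPHA) takes two bytes in UTF-8, while its uppercase U+2C6D takes three. A transform that maps code points one at a time therefore cannot assume the output fits in the same number of bytes as the input.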
pitrou
left a comment
Thanks for doing this!
@ianmcook Could you revise the R binding for the titlecase kernel?
pitrou
left a comment
LGTM, just a couple more questions / comments
pitrou
left a comment
Just one question, you may or may not want to act on it.
However, can you fix the lint failure? archery lint --clang-format should do it.
Thank you very much @edponce!
This PR adds scalar string compute functions for titlecasing a string, namely "ascii_title" and "utf8_title". Simple titlecasing is performed: only a cased character that follows an uncased character is uppercased.

Additional changes included with this PR are:
* restructure StringTransformCodepointXXX classes to support vector string kernels using codepoint transforms
* update capitalize kernels

Closes apache#10869 from edponce/ARROW-12714-String-title-case-kernel

Authored-by: Eduardo Ponce <edponce00@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
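The simple titlecasing rule described in this commit message can be sketched for the ASCII case as follows. This is an illustration only, not the kernel's actual implementation: `AsciiTitleSketch` and its helpers are hypothetical names, and lowercasing the cased characters that follow another cased character is an assumption that the text above does not state explicitly.

```cpp
#include <string>

// Hypothetical helpers for this sketch (not Arrow's internal functions).
inline bool IsAsciiUpper(unsigned char c) { return c >= 'A' && c <= 'Z'; }
inline bool IsAsciiLower(unsigned char c) { return c >= 'a' && c <= 'z'; }

// Uppercase every cased character that follows an uncased character
// (or starts the string). Assumption: other cased characters are lowercased.
// Uncased bytes, including non-ASCII bytes, are copied unchanged.
std::string AsciiTitleSketch(const std::string& input) {
  std::string out;
  out.reserve(input.size());  // ASCII case changes never alter the byte length
  bool prev_cased = false;
  for (unsigned char c : input) {
    const bool cased = IsAsciiUpper(c) || IsAsciiLower(c);
    if (cased && !prev_cased && IsAsciiLower(c)) {
      out.push_back(static_cast<char>(c - 'a' + 'A'));  // start of a word
    } else if (cased && prev_cased && IsAsciiUpper(c)) {
      out.push_back(static_cast<char>(c - 'A' + 'a'));  // inside a word
    } else {
      out.push_back(static_cast<char>(c));  // already correct or uncased
    }
    prev_cased = cased;
  }
  return out;
}
```

With this sketch, `AsciiTitleSketch("hello wORLD")` yields `"Hello World"` and `AsciiTitleSketch("3d-model")` yields `"3D-Model"`, since digits and punctuation count as uncased. A UTF-8 variant would need to decode code points and allow the output to differ in byte length from the input.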