Skip to content

Conversation

@Christian8491
Copy link
Contributor

This PR adds swapcase compute kernel for string. It is similar to Python str.swapcase()

@github-actions
Copy link

github-actions bot commented Aug 2, 2021

@Christian8491 Christian8491 marked this pull request as ready for review August 2, 2021 21:19
@edponce
Copy link
Contributor

edponce commented Aug 2, 2021

LGTM. Need to add implementation for UTF8.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! Just a couple comments.

@Christian8491
Copy link
Contributor Author

@lidavidm please enable CI

@lidavidm
Copy link
Member

lidavidm commented Aug 3, 2021

For CI, it appears that Ubuntu 20.04 provides utf8proc 2.5, but the isupper/islower functions are not provided until 2.6: https://juliastrings.github.io/utf8proc/releases/

Meanwhile, RTools35 is using utf8proc 2.4; it builds utf8proc from source due to some other issue, but I believe it's using the system headers anyways looking at the compiler command since the system headers come before the self-built utf8proc ones.

I think bumping the minimum utf8proc version and fiddling with the build flags so that the utf8proc headers take precedence will be needed.

@edponce
Copy link
Contributor

edponce commented Aug 3, 2021

An alternative solution is to use helper functions already available, refer to https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1391-L1408

Related notes in Arrow w.r.t. to lower/upper casing of UTF8 can be found in:

@nealrichardson
Copy link
Member

Meanwhile, RTools35 is using utf8proc 2.4; it builds utf8proc from source due to some other issue, but I believe it's using the system headers anyways looking at the compiler command since the system headers come before the self-built utf8proc ones.

Yeah the logs say

-- Could NOT find utf8proc (missing: utf8proc_LIB) (found suitable version "2.4.0", minimum required is "2.2.0")

which doesn't make sense.

@Christian8491
Copy link
Contributor Author

As @edponce suggested, I replaced some utf8proc functions with helper functions. After all the CI passes. This PR is ready for review.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you @Christian8491

@pitrou pitrou closed this in e4ba2f2 Aug 4, 2021
@edponce
Copy link
Contributor

edponce commented Aug 5, 2021

Some tests invoke the incorrect kernel (ASCII test uses utf8_capitalize and/or viceversa). These "typos" are fixed in #10869

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants