The changes in #140388 use SWAR (SIMD within a register) techniques to process 8 bytes at once. If this is performance-critical, let's evaluate writing an implementation using the Panama Vector API, so that we can leverage wider registers and instructions.
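To make the SWAR baseline concrete, here is a minimal sketch of counting code points 8 bytes at a time. It is not the code from #140388. It assumes the input is well-formed UTF-8, so the code point count equals the number of bytes that are not continuation bytes (`0b10xxxxxx`). The class name `SwarCodePointCount` is hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical SWAR sketch; assumes well-formed UTF-8 input.
public final class SwarCodePointCount {
    // Reads 8 bytes from a byte[] as one long (unaligned plain reads are allowed).
    private static final VarHandle LONG_LE =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Each byte's high bit; used to isolate one flag bit per byte.
    private static final long HIGH_BITS = 0x8080808080808080L;

    public static int codePointCount(byte[] bytes, int offset, int length) {
        int count = 0;
        int i = offset;
        int end = offset + length;
        // Process 8 bytes per iteration.
        for (; i + 8 <= end; i += 8) {
            long word = (long) LONG_LE.get(bytes, i);
            // (word << 1) moves bit 6 of each byte into bit 7 of the same byte.
            // A continuation byte has bit 7 set and bit 6 clear.
            long continuation = word & ~(word << 1) & HIGH_BITS;
            count += 8 - Long.bitCount(continuation);
        }
        // Scalar tail for the remaining < 8 bytes.
        for (; i < end; i++) {
            if ((bytes[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```

Byte order does not affect the result, because the count only depends on how many flag bits are set.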
A useful place to start is to look at how `ESVectorUtil::indexOf` is implemented. I would expect `codePointCount` to be implemented in a similar way.
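A Vector API version could look roughly like the sketch below. It does not mirror the structure of `ESVectorUtil::indexOf`; it only shows the core idea. It again assumes well-formed UTF-8, uses the preferred species width for the platform, and falls back to a scalar loop for the tail. The class name `VectorCodePointCount` is hypothetical. Since the API lives in an incubator module, compiling and running it requires `--add-modules jdk.incubator.vector`.

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Hypothetical Panama Vector API sketch; assumes well-formed UTF-8 input.
// Requires --add-modules jdk.incubator.vector at compile and run time.
public final class VectorCodePointCount {
    // Widest byte vector the platform supports efficiently (e.g. 32 lanes on AVX2).
    private static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;

    public static int codePointCount(byte[] bytes, int offset, int length) {
        int count = 0;
        int i = offset;
        int end = offset + length;
        // Largest multiple of the lane count that fits in `length`.
        int upper = offset + SPECIES.loopBound(length);
        for (; i < upper; i += SPECIES.length()) {
            ByteVector v = ByteVector.fromArray(SPECIES, bytes, i);
            // Keep the top two bits; continuation bytes become (byte) 0x80.
            // Count lanes that are NOT continuation bytes.
            count += v.and((byte) 0xC0)
                      .compare(VectorOperators.NE, (byte) 0x80)
                      .trueCount();
        }
        // Scalar tail for the remaining bytes.
        for (; i < end; i++) {
            if ((bytes[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```

Benchmarking this against the SWAR version on realistic inputs (mostly ASCII vs. mixed scripts, short vs. long strings) would show whether the wider registers pay off, since short inputs may spend most of their time in the scalar tail.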