@camel-cdr
Contributor

This shifts the mask register directly, instead of shifting the vector elements and recomputing the mask.
This lets us operate at a reduced LMUL: instead of the original four LMUL=8 instructions, it now takes three LMUL=1/2 instructions, which should be a lot faster: https://godbolt.org/z/5Wc11z7f6

Sadly, there are no dedicated mask shift instructions in RVV yet, so we have to emulate them with an element slide and element-wise bit shifts.
This works, but right shifts require vl=vlmax; otherwise the vslide1down may shift in undefined bits.
So I adjusted the code to use the mask shift in the inner loop and handle the tail with the old behavior.
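
To illustrate the emulation described above, here is a minimal scalar C++ sketch. It is not the RVV intrinsics code from this PR: it models the mask register as an array of 64-bit elements, and the names `MaskWords`, `get_bit`, `mask_shift_up1` and `mask_shift_down1` are hypothetical. Each output element combines its own shifted bits with one bit borrowed from the neighboring element, which is the role the element slide plays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a mask register viewed as N 64-bit elements.
// Bit b of element i stands for mask bit 64*i + b. (Assumed layout.)
template <std::size_t N>
using MaskWords = std::array<std::uint64_t, N>;

// Read a single mask bit; helper for inspecting results.
template <std::size_t N>
bool get_bit(const MaskWords<N>& m, std::size_t pos) {
  assert(pos < 64 * N);
  return ((m[pos / 64] >> (pos % 64)) & 1u) != 0;
}

// Shift the mask up by one: new bit k = old bit k-1, new bit 0 = carry_in.
template <std::size_t N>
MaskWords<N> mask_shift_up1(const MaskWords<N>& m, bool carry_in) {
  MaskWords<N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    // Models a slide-up by one element: element i sees element i-1,
    // and element 0 sees a scalar carrying carry_in in its top bit.
    const std::uint64_t prev =
        (i == 0) ? (static_cast<std::uint64_t>(carry_in) << 63) : m[i - 1];
    out[i] = (m[i] << 1) | (prev >> 63);
  }
  return out;
}

// Shift the mask down by one: new bit k = old bit k+1, top bit = carry_in.
template <std::size_t N>
MaskWords<N> mask_shift_down1(const MaskWords<N>& m, bool carry_in) {
  MaskWords<N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    // Models a slide-down by one element: element i sees element i+1,
    // and the last element sees a scalar carrying carry_in in bit 0.
    const std::uint64_t next =
        (i + 1 == N) ? static_cast<std::uint64_t>(carry_in) : m[i + 1];
    out[i] = (m[i] >> 1) | (next << 63);
  }
  return out;
}
```

In the downward shift every element borrows a bit from its higher neighbor, so in this model all higher elements must hold well-defined bits. That matches the reason given above for requiring vl=vlmax on right shifts.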

I also moved the implementation::* functions to the respective rvv_*.cpp files to keep the coding style consistent.

@lemire
Member

lemire commented Oct 11, 2025

Will be in the next release

@lemire lemire merged commit ad0dd9a into simdutf:master Oct 11, 2025
51 checks passed