utf16fix_block_rvv: improve mask shift #842

camel-cdr · 2025-10-09T22:13:16Z

This shifts the mask register directly, instead of shifting the vector elements and recomputing the mask.
This lets us operate at a reduced LMUL: instead of the original four LMUL=8 instructions, there are now three LMUL=1/2 instructions, which should be a lot faster: https://godbolt.org/z/5Wc11z7f6

Sadly there are no dedicated mask shift instructions in RVV yet, so we have to emulate it with element slide and element-wise bit-shifts.
While this works, right shifts require vl=vlmax; otherwise the vslide1down may shift in undefined bits.
So I adjusted the code to use the mask shift in the inner loop and handle the tail with the old behavior.
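
To illustrate the emulation described above, here is a minimal scalar sketch in portable C++. It is not the code from this PR. It models the mask as an array of bytes and shows how a one-bit mask shift can be built from a byte slide plus per-byte shifts and an OR. The names `MaskBytes`, `mask_shift_right1`, and `mask_shift_left1` are hypothetical, and the `carry_in` parameter is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a mask register viewed as bytes: mask bit i lives in
// bit (i % 8) of byte (i / 8). Hypothetical type, for illustration only.
using MaskBytes = std::vector<uint8_t>;

// Shift the whole mask right by one bit, so mask bit i+1 moves to bit i.
// Models a byte-wise slide-down by one element, then per-byte shifts and an OR.
// carry_in is the bit shifted into the top of the last byte (an assumption).
MaskBytes mask_shift_right1(const MaskBytes& m, bool carry_in) {
    MaskBytes out(m.size());
    for (std::size_t i = 0; i < m.size(); ++i) {
        // "slide1down": take the next byte, or the carry for the last element
        uint8_t next = (i + 1 < m.size())
                           ? m[i + 1]
                           : static_cast<uint8_t>(carry_in ? 0x01 : 0x00);
        // Low bit of the next byte becomes the high bit of this byte
        out[i] = static_cast<uint8_t>((m[i] >> 1) | (next << 7));
    }
    return out;
}

// Shift the whole mask left by one bit, so mask bit i moves to bit i+1.
// Models a byte-wise slide-up by one element, then per-byte shifts and an OR.
MaskBytes mask_shift_left1(const MaskBytes& m, bool carry_in) {
    MaskBytes out(m.size());
    for (std::size_t i = 0; i < m.size(); ++i) {
        // "slide1up": take the previous byte, or the carry for the first element
        uint8_t prev = (i > 0) ? m[i - 1]
                               : static_cast<uint8_t>(carry_in ? 0x80 : 0x00);
        // High bit of the previous byte becomes the low bit of this byte
        out[i] = static_cast<uint8_t>((m[i] << 1) | (prev >> 7));
    }
    return out;
}
```

In this model every byte holds defined bits. The right shift pulls bits from the following byte, which is why the slid-in data has to be well defined for the result to be correct.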

I also moved the implementation::* functions to the respective rvv_*.cpp files to keep the coding style consistent.

lemire · 2025-10-11T23:05:58Z

Will be in the next release

utf16fix_block_rvv: improve mask shift

5f319f2

lemire merged commit ad0dd9a into simdutf:master Oct 11, 2025
51 checks passed
