@pauldreik
Collaborator

@pauldreik pauldreik commented Dec 26, 2025

A bug was introduced in version 7.7.0: https://github.com/simdutf/simdutf/pull/863/files#diff-d1273109039f1fb5e518645f4a1560e70c431b5dab1f8ace04ed6852f1dc3f6aR2161-R2169

The bug only matters when both of these conditions are met:

  1. utf8_length_from_utf16be_with_replacement() would give a different answer than utf8_length_from_utf16be() (this requires invalid input)
  2. the host system is big endian

The API guarantee for utf8_length_from_utf16be() is (quoting from the documentation):

This function does not validate the input. It is acceptable to pass invalid UTF-16 strings but in such cases the result is implementation defined.

So the buggy result is, strictly speaking, not wrong: it only occurs on invalid input, where the result is implementation defined.

The test added in this PR fails on s390x (big endian) without the fix and succeeds with the fix applied.
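
For illustration, here is a minimal scalar sketch of condition 1: how a length computed without validation can differ from one computed with replacement once the input contains an unpaired surrogate. This is not the simdutf implementation. The function names are hypothetical, and the "no validation" convention used here (every surrogate code unit counts as 2 bytes) is an assumption, since the real result for invalid input is implementation defined. The code units are assembled from raw bytes, so the sketch gives the same answer on little-endian and big-endian hosts.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a simdutf function): read one UTF-16BE code unit
// from raw bytes. Building the value from bytes avoids any host-dependent
// byte swap.
constexpr std::uint16_t load_utf16be(const unsigned char* p) {
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

constexpr bool is_high_surrogate(std::uint16_t w) {
  return w >= 0xD800 && w <= 0xDBFF;
}

constexpr bool is_low_surrogate(std::uint16_t w) {
  return w >= 0xDC00 && w <= 0xDFFF;
}

// Sketch of a length computation without validation. Assumed convention:
// each surrogate code unit counts as 2 bytes, so a valid pair sums to 4.
constexpr std::size_t sketch_utf8_length_no_validation(
    const unsigned char* bytes, std::size_t code_units) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < code_units; ++i) {
    const std::uint16_t w = load_utf16be(bytes + 2 * i);
    if (w <= 0x7F) {
      n += 1;
    } else if (w <= 0x7FF) {
      n += 2;
    } else if (is_high_surrogate(w) || is_low_surrogate(w)) {
      n += 2;  // counted in isolation, paired or not
    } else {
      n += 3;
    }
  }
  return n;
}

// Sketch of a length computation with replacement: an unpaired surrogate is
// counted as U+FFFD, which takes 3 bytes in UTF-8.
constexpr std::size_t sketch_utf8_length_with_replacement(
    const unsigned char* bytes, std::size_t code_units) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < code_units; ++i) {
    const std::uint16_t w = load_utf16be(bytes + 2 * i);
    if (w <= 0x7F) {
      n += 1;
    } else if (w <= 0x7FF) {
      n += 2;
    } else if (is_high_surrogate(w) && i + 1 < code_units &&
               is_low_surrogate(load_utf16be(bytes + 2 * (i + 1)))) {
      n += 4;  // valid surrogate pair
      ++i;     // skip the low surrogate
    } else {
      n += 3;  // BMP character, or lone surrogate replaced by U+FFFD
    }
  }
  return n;
}
```

With valid input such as `00 41 D8 3D DE 00` ("A" followed by U+1F600) both sketches agree on 5 bytes. With `00 41 D8 3D` ("A" followed by a lone high surrogate) they disagree, 3 versus 4, which is the kind of input needed to observe the bug.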

@pauldreik pauldreik force-pushed the fix_utf8_length_from_utf16 branch from 0db0afc to 1638f45 Compare December 26, 2025 12:16
this is strictly not a violation of the api, since it only happens
on invalid input where the result is implementation defined.
@pauldreik pauldreik force-pushed the fix_utf8_length_from_utf16 branch from 1638f45 to 7a19b1d Compare December 26, 2025 13:22
@pauldreik pauldreik marked this pull request as ready for review December 26, 2025 13:22
@pauldreik pauldreik requested a review from lemire December 26, 2025 13:23
@lemire lemire merged commit f7f8521 into master Dec 28, 2025
75 checks passed
@lemire
Member

lemire commented Dec 28, 2025

Thanks for catching this.

@pauldreik pauldreik deleted the fix_utf8_length_from_utf16 branch December 28, 2025 15:50