Contributor

@leezaj commented Dec 16, 2025

Adds char8_t as a byte-like type, which in turn enables C++20 std::u8string and std::u8string_view (aliases for std::basic_string<char8_t> and std::basic_string_view<char8_t>) objects to be passed in as spans. Right now, simdutf fails to compile if users pass these objects.

cppreference states (emphasis mine):

char8_t — type for UTF-8 character representation, required to be large enough to represent any UTF-8 code unit (8 bits). It has the same size, signedness, and alignment as unsigned char (and therefore, the same size and alignment as char and signed char), but is a distinct type. (since C++20)

This is enabled only for C++20 and later, so builds targeting older standards are unaffected.

@lemire requested a review from pauldreik December 16, 2025 15:42
@pauldreik
Collaborator

Thanks for the fix! I think it looks good. I am, however, not sure whether we should guard it with the __cpp_lib_char8_t feature macro.
Compiler and library support seems to be pretty solid: https://cppstat.dev/?search=char8_t

I lean towards merging it as is (provided the CI tests go through).
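For reference, a guard along those lines could look roughly like the sketch below. This is an assumption about how such a guard might be written, not what the PR does: `SKETCH_HAS_CHAR8_T` and `sketch_guard::is_byte_like` are hypothetical names. The sketch checks both the language macro `__cpp_char8_t` and the library macro `__cpp_lib_char8_t`, since a compiler can provide the type without the library providing std::u8string.

```cpp
#include <string>
#include <type_traits>

// <version> (C++20) is the canonical home of the library feature-test
// macros; <string> also defines __cpp_lib_char8_t where it is supported.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

// Hypothetical switch: 1 only when both the char8_t type and its
// library support (std::u8string and friends) are available.
#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
#  define SKETCH_HAS_CHAR8_T 1
#else
#  define SKETCH_HAS_CHAR8_T 0
#endif

namespace sketch_guard {

// Hypothetical byte-like trait written so that it also compiles before C++20.
template <typename T> struct is_byte_like : std::false_type {};
template <> struct is_byte_like<char> : std::true_type {};
template <> struct is_byte_like<signed char> : std::true_type {};
template <> struct is_byte_like<unsigned char> : std::true_type {};

#if SKETCH_HAS_CHAR8_T
// Only mentioned when the guard passes, so older toolchains never see
// the char8_t keyword.
template <> struct is_byte_like<char8_t> : std::true_type {};
#endif

}  // namespace sketch_guard
```

The trade-off is extra preprocessor branching in exchange for protection against toolchains that report C++20 but ship incomplete char8_t support.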

@pauldreik
Collaborator

All the tests pass. Let's merge it, and if any problems come up, we can fix them later.

@pauldreik merged commit 34a8c31 into simdutf:master Dec 16, 2025
56 checks passed