-
Notifications
You must be signed in to change notification settings - Fork 116
Convert latin1 to utf8 safe #556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adds a "safe" version of convert_latin1_to_utf8 with a maximum output length. Refs: nodejs/node#54526
…imdutf into convert_latin1_to_utf8_s
|
As soon as the test go green, we will merge and release soon after. |
| * @param utf8_len the maximum output length | ||
| * @return the number of written char; 0 if conversion is not possible | ||
| */ | ||
| simdutf_warn_unused size_t convert_latin1_to_utf8_safe(const char * input, size_t length, char* utf8_output, size_t utf8_len) noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may I suggest keeping the name but changing the signature to take a std::span as output? that means the type system conveys the meaning. std span is C++20 which may be a no go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std span is C++20 which may be a no go.
We support C++11. We could add std::span support, but it would have to be optional (so, on top of the existing API). Note that this applies to our whole API. I will add an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done : #557
|
I added a fuzzer for this, it found nothing. I will make a PR later. In the meanwhile, it can be reviewed here: pauldreik@7bf34b6 |
|
@pauldreik Fantastic!!! |
|
I am merging this, but we will wait for the fuzzer before releasing. |
|
@ronag Note that the PR is merged with your credit:
|
|
there seems to be potential for improvement or optimization based on the coverage from the fuzzer (or possibly, the fuzzer is not able to generate all possible interesting input). ping @lemire |
|
Not sure what that means? Does it mean that it never gets there? It's a very edge casy condition that needs a very specific state to occur. |
yes. the light blue (and red) numbers are the hit counts from the fuzz corpus.
Can you provide me with an example of input data that gets there? |
|
@pauldreik Hmmmm.... What aboout the single byte |


This is an alternative PR to @ronag's PR (who should be fully credited for the work) at #554
It is basically @ronag's code with
_sis replaced by_safe.