-
Notifications
You must be signed in to change notification settings - Fork 115
converting binary data to base64 with lines #840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new function binary_to_base64_with_lines that outputs base64 encoding with line separators. The implementation provides optimized paths for x64 and ARM architectures with both fast and slow execution paths depending on line length.
- Adds
binary_to_base64_with_linesfunction with configurable line length (default 76 characters) - Implements optimized SIMD versions for various architectures (x64, ARM, etc.)
- Updates all existing base64 function calls to use the new namespaced
simdutf::versions
Reviewed Changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/random_fuzzer.cpp | Updated function calls to use simdutf namespace |
| tests/null_safety_tests.cpp | Updated function call to use simdutf namespace |
| tests/base64_tests.cpp | Added comprehensive tests for line-based encoding and updated namespace usage |
| tests/atomic_base64_tests.cpp | Updated function calls to use simdutf namespace |
| src/westmere/sse_base64.cpp | Added SIMD implementation with line support for SSE |
| src/westmere/implementation.cpp | Added wrapper for new line-based function |
| src/scalar/base64.h | Added templated implementation supporting line breaks |
| src/implementation.cpp | Moved base64 length functions to global namespace and added line support |
| include/simdutf/implementation.h | Added new API declarations and global constants |
| benchmarks/base64/benchmark_base64.cpp | Added benchmarks for line-based encoding |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
77bdf70 to
d878e0f
Compare
pauldreik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the liberty to push three minor fixes, hope you agree.
|
@pauldreik Always happy. I optimized the code further. It is now pretty faster!!! Generating lines is not free, but it is now pretty decent. |
|
I imagine using the default line length is a much more common usecase than other line lengths. Is there any significant runtime performance to be gained by knowing it at compile time? |
|
I started working on extending the fuzzer but too tired to finish it today. I might have found a problem. |
|
@pauldreik We can wait for the fuzzer prior to a release. |
|
@pauldreik It is green, but let us wait for the fuzzer. |
|
I pushed a very basic extension of the existing fuzzer, just checking the return value and it does not match. for the case I found (almost instantly) with line length 5, I surprisingly got 9 returned from binary_to_base64_with_lines(), 11 from base64_length_from_binary_with_lines() and 10 from base64_length_from_binary(). |
|
@pauldreik What you caught was that base64url was untested (I had written reasonable code but missed cases). |
no need to change the option size, there was one spare byte left
|
I have run the fuzzer for a short while on arm64 and amd64 - seems to work fine. I think this is good to merge now! |
Add a new function that can output base64 outputs with line separators. By default we use lines of length 76. So far, I have added x64 and ARM support. It comes with new tests and benchmarks.
The implementation is not too difficult. We have a fast path for the case where the line length is long enough (say 64 characters) otherwise we fall back on a slow approach (since we assume that short lines are uncommon).