
@lemire (Member) commented Sep 27, 2025

Adds a new function that outputs base64 with line separators. By default, lines are 76 characters long. So far, I have added x64 and ARM support. The PR comes with new tests and benchmarks.

The implementation is not too difficult. There is a fast path for the case where the line length is long enough (say, at least 64 characters); otherwise, we fall back on a slower approach, since we assume that short lines are uncommon.
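To make the expected output concrete, here is a minimal scalar sketch of base64 encoding with line breaks, roughly the kind of simple per-character approach a slow path could take. It is not the simdutf implementation: the function name `encode_base64_with_lines_sketch` is hypothetical, and it assumes that `'\n'` is the separator, that no newline follows the last line, and that line lengths below 4 are treated as 4 (as the doc comment below states).

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical scalar sketch of base64 encoding with line breaks (not the
// simdutf implementation). Assumptions: '\n' separates lines, no newline is
// written after the last line, and line lengths below 4 are treated as 4.
std::string encode_base64_with_lines_sketch(std::string_view input,
                                            std::size_t line_length = 76,
                                            bool url = false) {
  static const char standard[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  static const char url_safe[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
  const char *alphabet = url ? url_safe : standard;
  if (line_length < 4) {
    line_length = 4;
  }

  std::string out;
  std::size_t column = 0; // characters already written on the current line
  auto emit = [&](char c) {
    if (column == line_length) { // line is full: start a new one before c
      out.push_back('\n');
      column = 0;
    }
    out.push_back(c);
    ++column;
  };
  auto byte = [&](std::size_t k) {
    return static_cast<unsigned>(static_cast<unsigned char>(input[k]));
  };

  std::size_t i = 0;
  for (; i + 3 <= input.size(); i += 3) { // each 3-byte group -> 4 characters
    unsigned v = byte(i) << 16 | byte(i + 1) << 8 | byte(i + 2);
    emit(alphabet[(v >> 18) & 63]);
    emit(alphabet[(v >> 12) & 63]);
    emit(alphabet[(v >> 6) & 63]);
    emit(alphabet[v & 63]);
  }
  std::size_t rest = input.size() - i; // 0, 1 or 2 trailing bytes
  if (rest > 0) {
    unsigned v = byte(i) << 16 | (rest == 2 ? byte(i + 1) << 8 : 0u);
    emit(alphabet[(v >> 18) & 63]);
    emit(alphabet[(v >> 12) & 63]);
    if (rest == 2) {
      emit(alphabet[(v >> 6) & 63]);
    }
    if (!url) { // the default alphabet pads the output to a multiple of 4
      emit('=');
      if (rest == 1) {
        emit('=');
      }
    }
  }
  return out;
}
```

For example, encoding `"Hello, World!"` with a line length of 8 gives `SGVsbG8s`, `IFdvcmxk` and `IQ==` on three lines. The per-character column check is what keeps this version simple but slow; a fast path presumably encodes the long runs between separators in bulk.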

/**
 * Convert a binary input to a base64 output with line breaks.
 *
 * The default option (simdutf::base64_default) uses the characters `+` and `/`
 * as part of its alphabet. Further, it adds padding (`=`) at the end of the
 * output to ensure that the number of base64 characters (not counting line
 * separators) is a multiple of four.
 *
 * The URL option (simdutf::base64_url) uses the characters `-` and `_` as part
 * of its alphabet. No padding is added at the end of the output.
 *
 * This function always succeeds.
 *
 * The default line length is simdutf::default_line_length (76).
 *
 * @param input         the binary to process
 * @param length        the length of the input in bytes
 * @param output        the pointer to a buffer that can hold the conversion
 * result, including line separators (should be at least as long as the value
 * returned by base64_length_from_binary_with_lines for the same length,
 * options and line_length)
 * @param line_length   the length of lines; values below 4 are treated as 4
 * @param options       the base64 options to use, can be base64_default or
 * base64_url, is base64_default by default.
 * @return number of written bytes, will be equal to the value returned by
 * base64_length_from_binary_with_lines for the same length, options and
 * line_length
 */
size_t binary_to_base64_with_lines(const char *input, size_t length, char *output,
                        size_t line_length = simdutf::default_line_length,
                        base64_options options = base64_default) noexcept;
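Because the separators add bytes, the buffer must be larger than the plain encoded length. The sketch below shows the arithmetic under stated assumptions; the name `base64_length_with_lines_sketch` is hypothetical (it is not the simdutf function), and it assumes one `'\n'` between consecutive lines and none after the last line.

```cpp
#include <cstddef>

// Hypothetical sketch of the output-size arithmetic (not simdutf's code).
// Assumes one '\n' between consecutive lines and none after the last line.
constexpr std::size_t base64_length_with_lines_sketch(std::size_t length,
                                                      std::size_t line_length = 76,
                                                      bool url = false) {
  // Encoded size without separators: padded to a multiple of 4 by default,
  // unpadded (2 or 3 characters for a trailing partial group) for base64url.
  std::size_t chars = url ? (length / 3) * 4 + (length % 3 == 0 ? 0 : length % 3 + 1)
                          : (length + 2) / 3 * 4;
  if (line_length < 4) {
    line_length = 4; // mirrors the documented minimum line length
  }
  if (chars == 0) {
    return 0;
  }
  // ceil(chars / line_length) lines need one fewer separator than lines.
  return chars + (chars - 1) / line_length;
}
```

With the default line length, 57 input bytes encode to exactly one full 76-character line (no separator), while 58 bytes produce 80 characters and therefore one separator, 81 bytes in total.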

@lemire changed the title from "converting binary data to base64 with lines" to "[WIP] converting binary data to base64 with lines" on Oct 13, 2025
@lemire requested review from Copilot and pauldreik on October 13, 2025 21:45
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds a new function binary_to_base64_with_lines that outputs base64 encoding with line separators. The implementation provides optimized paths for x64 and ARM architectures with both fast and slow execution paths depending on line length.

  • Adds binary_to_base64_with_lines function with configurable line length (default 76 characters)
  • Implements optimized SIMD versions for various architectures (x64, ARM, etc.)
  • Updates all existing base64 function calls to use the new namespaced simdutf:: versions

Reviewed Changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 5 comments.

Summary per file:
  • tests/random_fuzzer.cpp: Updated function calls to use the simdutf namespace
  • tests/null_safety_tests.cpp: Updated function call to use the simdutf namespace
  • tests/base64_tests.cpp: Added comprehensive tests for line-based encoding and updated namespace usage
  • tests/atomic_base64_tests.cpp: Updated function calls to use the simdutf namespace
  • src/westmere/sse_base64.cpp: Added SIMD implementation with line support for SSE
  • src/westmere/implementation.cpp: Added wrapper for the new line-based function
  • src/scalar/base64.h: Added templated implementation supporting line breaks
  • src/implementation.cpp: Moved base64 length functions to the global namespace and added line support
  • include/simdutf/implementation.h: Added new API declarations and global constants
  • benchmarks/base64/benchmark_base64.cpp: Added benchmarks for line-based encoding

@pauldreik (Collaborator) left a comment

I took the liberty of pushing three minor fixes; I hope you agree.

@lemire (Member, Author) commented Oct 15, 2025

@pauldreik Always happy.

I optimized the code further, and it is now much faster! Generating lines is not free, but the cost is now quite reasonable.

@pauldreik (Collaborator)

I imagine the default line length is a much more common use case than other line lengths. Is there a significant runtime performance gain to be had from knowing it at compile time?
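To illustrate what "knowing it at compile time" could mean, here is a hypothetical sketch (not simdutf code; `wrap_lines_fixed`, `wrap_lines_runtime` and `wrap_lines` are invented names) that splits already-encoded text into lines, once with the line length as a template parameter and once as a runtime argument, plus a dispatcher that picks the specialized path for 76. A constant lets the compiler fold the stride and loop arithmetic; whether that is measurable next to the SIMD encoding work is an open question that would need benchmarking.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical illustration (not simdutf's code): split already-encoded
// base64 text into '\n'-separated lines.

// Line length fixed at compile time: the compiler sees a constant stride.
template <std::size_t LineLength>
std::string wrap_lines_fixed(std::string_view encoded) {
  static_assert(LineLength >= 4, "line length must be at least 4");
  std::string out;
  out.reserve(encoded.size() + encoded.size() / LineLength);
  for (std::size_t pos = 0; pos < encoded.size(); pos += LineLength) {
    if (pos != 0) {
      out.push_back('\n'); // separator between lines, none after the last
    }
    out.append(encoded.substr(pos, LineLength));
  }
  return out;
}

// Same logic with the line length known only at run time.
inline std::string wrap_lines_runtime(std::string_view encoded,
                                      std::size_t line_length) {
  if (line_length < 4) {
    line_length = 4;
  }
  std::string out;
  out.reserve(encoded.size() + encoded.size() / line_length);
  for (std::size_t pos = 0; pos < encoded.size(); pos += line_length) {
    if (pos != 0) {
      out.push_back('\n');
    }
    out.append(encoded.substr(pos, line_length));
  }
  return out;
}

// Dispatcher: use the specialized instantiation for the common default (76).
inline std::string wrap_lines(std::string_view encoded, std::size_t line_length) {
  if (line_length == 76) {
    return wrap_lines_fixed<76>(encoded);
  }
  return wrap_lines_runtime(encoded, line_length);
}
```

A dispatcher of this shape keeps the runtime-parameter signature, so callers passing the default would get the specialized path without an API change.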

@pauldreik (Collaborator)

I started working on extending the fuzzer but am too tired to finish it today. I might have found a problem.

@lemire (Member, Author) commented Oct 15, 2025

@pauldreik We can wait for the fuzzer prior to a release.

@lemire (Member, Author) commented Oct 16, 2025

@pauldreik CI is green, but let us wait for the fuzzer.

@pauldreik (Collaborator)

I pushed a very basic extension of the existing fuzzer that just checks the return value, and it does not match. For the case I found (almost instantly), with line length 5, binary_to_base64_with_lines() surprisingly returned 9, base64_length_from_binary_with_lines() returned 11, and base64_length_from_binary() returned 10.
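The reported numbers can be checked by hand. The default alphabet always pads to a multiple of four, so a length of 10 from base64_length_from_binary() implies base64url; without padding, 7 bytes is the only input size that encodes to 10 characters (two full 3-byte groups give 8 characters, and the remaining byte gives 2). Assuming a single newline between lines and none after the last line, the expected total for a line length of 5 is then:

$$
\ell = 4\left\lfloor \tfrac{7}{3} \right\rfloor + \big((7 \bmod 3) + 1\big) = 8 + 2 = 10,
\qquad
\ell + \left\lfloor \tfrac{\ell - 1}{5} \right\rfloor = 10 + 1 = 11
$$

On that reading, 11 is the consistent value, and a return value of 9 is shorter than even the unwrapped encoding.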

@lemire (Member, Author) commented Oct 16, 2025

@pauldreik What you caught was that base64url was untested (I had written reasonable code but missed cases).

@pauldreik (Collaborator)

I have run the fuzzer for a short while on arm64 and amd64 - seems to work fine. I think this is good to merge now!

@lemire merged commit cb049de into master on Oct 16, 2025 (69 checks passed)