converting binary data to base64 with lines #840

lemire · 2025-09-27T05:05:00Z

Add a new function that produces base64 output with line separators. By default, lines are 76 characters long. So far, I have added x64 and ARM support. The PR comes with new tests and benchmarks.

The implementation is not too difficult. We have a fast path for the case where the line length is long enough (say, 64 characters); otherwise, we fall back on a slow approach, since we assume that short lines are uncommon.

/**
 * Convert a binary input to a base64 output with line breaks.
 *
 * The default option (simdutf::base64_default) uses the characters `+` and `/`
 * as part of its alphabet. Further, it adds padding (`=`) at the end of the
 * output to ensure that the output length is a multiple of four.
 *
 * The URL option (simdutf::base64_url) uses the characters `-` and `_` as part
 * of its alphabet. No padding is added at the end of the output.
 *
 * This function always succeeds.
 *
 * The default line length is default_line_length (76)
 *
 * @param input         the binary to process
 * @param length        the length of the input in bytes
 * @param output        the pointer to a buffer that can hold the conversion
 * result (should be at least base64_length_from_binary(length) bytes long)
 * @param line_length   the length of lines, must be at least 4 (otherwise it is interpreted as 4)
 * @param options       the base64 options to use, can be base64_default or
 * base64_url, is base64_default by default.
 * @return number of written bytes, will be equal to
 * base64_length_from_binary(length, options)
 */
size_t binary_to_base64_with_lines(const char *input, size_t length, char *output,
                        size_t line_length = simdutf::default_line_length,
                        base64_options options = base64_default) noexcept;
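
To make the intended output format concrete, here is a minimal scalar sketch of base64 encoding with line breaks. It is not the simdutf implementation: `reference_base64_with_lines` is a hypothetical helper, and it assumes that lines are separated by a single `\n` with no separator after the last line. The clamping of short line lengths to 4 follows the documentation above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical reference helper, NOT part of simdutf.
// Assumptions: lines are separated by one '\n', nothing follows the last
// line, and line lengths below 4 are treated as 4.
std::string reference_base64_with_lines(const std::string &input,
                                        std::size_t line_length = 76,
                                        bool url = false) {
  static const char std_alphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  static const char url_alphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
  const char *alphabet = url ? url_alphabet : std_alphabet;
  if (line_length < 4) {
    line_length = 4;
  }

  // Step 1: encode without any line breaks.
  std::string flat;
  auto byte = [&](std::size_t k) {
    return static_cast<std::uint32_t>(static_cast<std::uint8_t>(input[k]));
  };
  std::size_t i = 0;
  for (; i + 3 <= input.size(); i += 3) {
    const std::uint32_t v = (byte(i) << 16) | (byte(i + 1) << 8) | byte(i + 2);
    flat += alphabet[(v >> 18) & 63];
    flat += alphabet[(v >> 12) & 63];
    flat += alphabet[(v >> 6) & 63];
    flat += alphabet[v & 63];
  }
  const std::size_t remaining = input.size() - i;
  if (remaining == 1) {
    const std::uint32_t v = byte(i) << 16;
    flat += alphabet[(v >> 18) & 63];
    flat += alphabet[(v >> 12) & 63];
    if (!url) {
      flat += "=="; // the URL variant omits padding
    }
  } else if (remaining == 2) {
    const std::uint32_t v = (byte(i) << 16) | (byte(i + 1) << 8);
    flat += alphabet[(v >> 18) & 63];
    flat += alphabet[(v >> 12) & 63];
    flat += alphabet[(v >> 6) & 63];
    if (!url) {
      flat += '=';
    }
  }

  // Step 2: copy line_length characters at a time, with '\n' between lines.
  std::string out;
  for (std::size_t pos = 0; pos < flat.size(); pos += line_length) {
    if (pos != 0) {
      out += '\n';
    }
    out.append(flat, pos, line_length); // clipped at the end of flat
  }
  // Under the stated assumptions, each full line except the last adds one byte.
  assert(out.size() ==
         flat.size() + (flat.empty() ? 0 : (flat.size() - 1) / line_length));
  return out;
}
```

A slow two-pass helper like this is mainly useful as an oracle in tests: its output can be compared byte for byte with what an optimized kernel writes.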

Copilot

Pull Request Overview

This PR adds a new function binary_to_base64_with_lines that outputs base64 encoding with line separators. The implementation provides optimized paths for x64 and ARM architectures with both fast and slow execution paths depending on line length.

Adds binary_to_base64_with_lines function with configurable line length (default 76 characters)
Implements optimized SIMD versions for various architectures (x64, ARM, etc.)
Updates all existing base64 function calls to use the new namespaced simdutf:: versions

Reviewed Changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 5 comments.

File	Description
tests/random_fuzzer.cpp	Updated function calls to use simdutf namespace
tests/null_safety_tests.cpp	Updated function call to use simdutf namespace
tests/base64_tests.cpp	Added comprehensive tests for line-based encoding and updated namespace usage
tests/atomic_base64_tests.cpp	Updated function calls to use simdutf namespace
src/westmere/sse_base64.cpp	Added SIMD implementation with line support for SSE
src/westmere/implementation.cpp	Added wrapper for new line-based function
src/scalar/base64.h	Added templated implementation supporting line breaks
src/implementation.cpp	Moved base64 length functions to global namespace and added line support
include/simdutf/implementation.h	Added new API declarations and global constants
benchmarks/base64/benchmark_base64.cpp	Added benchmarks for line-based encoding

pauldreik

I took the liberty to push three minor fixes, hope you agree.

lemire · 2025-10-15T03:59:58Z

@pauldreik Always happy.

I optimized the code further. It is now quite a bit faster! Generating lines is not free, but the performance is now pretty decent.

pauldreik · 2025-10-15T07:16:19Z

I imagine using the default line length is a much more common use case than other line lengths. Is there any significant runtime performance to be gained by knowing it at compile time?
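
One way such a specialization could look is sketched below. The names `split_lines_fixed` and `split_lines` are hypothetical and do not come from this PR. The sketch only covers the line-splitting step over already-encoded characters, and again assumes a single `\n` between lines with none after the last one. Whether a compile-time length gives a measurable gain would have to be settled by the benchmarks; the sketch does not answer that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical: line length known at compile time, so each memcpy has a
// constant size that the compiler is free to expand inline.
template <std::size_t LineLength>
std::size_t split_lines_fixed(const char *flat, std::size_t length, char *out) {
  static_assert(LineLength >= 4, "line length must be at least 4");
  char *const start = out;
  while (length > LineLength) {
    std::memcpy(out, flat, LineLength);
    out += LineLength;
    *out++ = '\n';
    flat += LineLength;
    length -= LineLength;
  }
  std::memcpy(out, flat, length); // last (possibly partial) line, no '\n'
  out += length;
  return static_cast<std::size_t>(out - start);
}

// Hypothetical runtime entry point: dispatch the common default (76) to the
// specialized version and keep a generic loop for every other length.
std::size_t split_lines(const char *flat, std::size_t length, char *out,
                        std::size_t line_length) {
  assert(flat != nullptr && out != nullptr);
  if (line_length == 76) {
    return split_lines_fixed<76>(flat, length, out);
  }
  if (line_length < 4) {
    line_length = 4;
  }
  char *const start = out;
  while (length > line_length) {
    std::memcpy(out, flat, line_length);
    out += line_length;
    *out++ = '\n';
    flat += line_length;
    length -= line_length;
  }
  std::memcpy(out, flat, length);
  out += length;
  return static_cast<std::size_t>(out - start);
}
```

The public signature stays unchanged with this approach: callers still pass `line_length` at run time, and only the default value takes the specialized branch.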

pauldreik · 2025-10-15T18:51:10Z

I started working on extending the fuzzer but am too tired to finish it today. I might have found a problem.

lemire · 2025-10-15T20:04:27Z

@pauldreik We can wait for the fuzzer prior to a release.

lemire · 2025-10-16T02:55:05Z

@pauldreik It is green, but let us wait for the fuzzer.

pauldreik · 2025-10-16T06:21:46Z

I pushed a very basic extension of the existing fuzzer that just checks the return value, and it does not match. For the case I found (almost instantly), with line length 5, I surprisingly got 9 returned from binary_to_base64_with_lines(), 11 from base64_length_from_binary_with_lines(), and 10 from base64_length_from_binary().
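
The 10 and 11 are consistent with each other if one assumes unpadded (base64url) output for a 7-byte input and one `\n` between lines with none after the last line; only the 9 does not fit. The helpers below are a hypothetical restatement of that arithmetic, not the simdutf functions, and the 7-byte input is inferred from the reported length of 10 rather than stated in the report.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers, NOT the simdutf API.

// Length of base64 output without line breaks.
constexpr std::size_t flat_base64_length(std::size_t n, bool padded) {
  return padded ? (n + 2) / 3 * 4 : (n * 4 + 2) / 3;
}

// Line lengths below 4 are treated as 4, as in the function documentation.
constexpr std::size_t clamp_line_length(std::size_t line_length) {
  return line_length < 4 ? 4 : line_length;
}

// Assumption: one '\n' after every full line except the last one.
constexpr std::size_t length_with_separators(std::size_t n, bool padded,
                                             std::size_t line_length) {
  return flat_base64_length(n, padded) == 0
             ? 0
             : flat_base64_length(n, padded) +
                   (flat_base64_length(n, padded) - 1) /
                       clamp_line_length(line_length);
}

// 7 bytes, unpadded: 10 characters, plus 1 separator at line length 5.
static_assert(flat_base64_length(7, false) == 10, "unpadded length");
static_assert(length_with_separators(7, false, 5) == 11, "with separators");

// The check a fuzzer can make: the number of bytes written must equal the
// predicted length.
inline void check_written(std::size_t written, std::size_t n, bool padded,
                          std::size_t line_length) {
  assert(written == length_with_separators(n, padded, line_length));
}
```

With these definitions, a return value of 9 for this case fails the check immediately, which matches how quickly the fuzzer found it.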

lemire · 2025-10-16T16:00:23Z

@pauldreik What you caught was that base64url was untested (I had written reasonable code but missed cases).

no need to change the option size, there was one spare byte left

pauldreik · 2025-10-16T18:50:33Z

I have run the fuzzer for a short while on arm64 and amd64 - seems to work fine. I think this is good to merge now!

lemire and others added 6 commits September 27, 2025 01:04

converting binary data to base64 with lines

45ce96f

lint and optimisation

71fa4d3

Lint

1826998

sse

18ebf34

avx2

1b8b7e7

complete

9cf0e21

lemire changed the title ~~converting binary data to base64 with lines [WIP]~~ converting binary data to base64 with lines Oct 13, 2025

This was referenced Oct 13, 2025

implement conversion of binary data to base64 with lines for RISC-V processors #843

Open

implement conversion of binary data to base64 with lines for Loongson processors #844

Open

lemire requested review from Copilot and pauldreik October 13, 2025 21:45

Merge branch 'master' into base64_with_lines

2b48b7f

Copilot AI reviewed Oct 13, 2025

src/westmere/sse_base64.cpp (resolved)

benchmarks/base64/node_base64.h (outdated, resolved)

tests/base64_tests.cpp (resolved)

lemire and others added 5 commits October 13, 2025 17:58

Update benchmarks/base64/node_base64.h

cfb94e8

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

minor fixes

5ca145e

readd implementation::base64_length_from_binary implementation

d1bd971

add unit test for the base64 span api

cb26b73

fix mistake in base64 span api

d878e0f

pauldreik force-pushed the base64_with_lines branch from 77bdf70 to d878e0f Compare October 14, 2025 19:36

pauldreik reviewed Oct 14, 2025

include/simdutf/implementation.h (outdated, resolved)

src/implementation.cpp (resolved)

Daniel Lemire and others added 3 commits October 14, 2025 17:35

fixing bad change in RISC-V kernel

00b9c2d

various optimizations

d217935

minor correction

b2e4731

pauldreik reviewed Oct 15, 2025

include/simdutf/implementation.h (outdated, resolved)

include/simdutf/implementation.h (outdated, resolved)

lemire added 2 commits October 15, 2025 16:02

minor fixes

bb084fb

cleaning

1590e34

lint

6e9afc0

extend base64 fuzzer to exercise line splitting

797a801

github-advanced-security bot found potential problems Oct 16, 2025

fuzz/base64.cpp (fixed)

fix issue with base64url

f5c58e6

pauldreik added 4 commits October 16, 2025 20:16

finalize the fuzzer for line splitted base64

edabd59

fix slight mistake in function documentation

6d5091e

make line length variable in base64 fuzzer

cf9d407

no need to change the option size, there was one spare byte left

fix mistake in readme

66c0441

pauldreik approved these changes Oct 16, 2025

lemire merged commit cb049de into master Oct 16, 2025
69 checks passed
