
introducing a thread-local parser #2412

Merged: lemire merged 14 commits into master from use_thread_local_parser, Aug 13, 2025
Conversation

@lemire (Member) commented Aug 11, 2025

This modifies @the-moisrex's code slightly so that we can default on a thread-local parser. This should drastically improve performance in some cases.

@the-moisrex (Member)

Is it an attempt to re-use the parser in these situations?

for (auto val : from(json_string_1)) { ... }
for (auto val : from(json_string_2)) { ... }

This would be cool, but I wish we didn't need to allocate it on the heap in the first place, right?

Maybe a thread_local std::optional that gets initialized on demand?

@FranciscoThiesen (Member)

I like this! One thing I miss is having some benchmark/perf numbers so we can assess how much this impacts performance. (Given some analysis I've done locally, I definitely expect this will speed things up, but I am still curious by how much.)

@FranciscoThiesen (Member) left a review comment:

Great work (that will likely lead to significant perf improvement)

@the-moisrex (Member) left a review comment:

Much of the parser itself is already on the heap; this will try to remove one additional layer of indirection.


return parser_instance;
}

Suggested change:

simdjson_inline simdjson_warn_unused std::optional<ondemand::parser> & parser::get_parser_instance() {
  thread_local std::optional<ondemand::parser> parser_instance;

  // If the instance does not exist yet, create it
  if (!parser_instance.has_value()) {
      parser_instance = std::make_optional<ondemand::parser>();
  }

  return parser_instance;
}

#endif // SIMDJSON_CONDITIONAL_INCLUDE

#include <memory>
#include <thread>

Suggested change:

#include <optional>

@lemire (Author) replied:

That's C++17. We compile as C++11.

private:
friend bool release_parser();
friend ondemand::parser& get_parser();
static simdjson_inline simdjson_warn_unused std::unique_ptr<ondemand::parser> & get_parser_instance();

Suggested change:

  static simdjson_inline simdjson_warn_unused std::optional<ondemand::parser> & get_parser_instance();

@lemire (Author) commented Aug 11, 2025

Because this would be cool, but wish we wouldn't need to allocate it on the heap in the first place though, right?

The parser instance itself is small. But it will then allocate quite a bit of memory and that's what we worry about.

@the-moisrex (Member)

@lemire: The parser instance itself is small. But it will then allocate quite a bit of memory and that's what we worry about.

Yes, that's why I suggest std::optional and not just a plain thread_local instance. std::optional does not construct the object until explicitly told to, but it does reserve the storage.

It's just one less allocation, that's all.

@lemire (Author) commented Aug 12, 2025

@the-moisrex

std::optional is C++17.

We can do the same thing portably with a union and a bool.

@lemire (Author) commented Aug 12, 2025

@the-moisrex I'll make a comment with your name, pointing out that it could be implemented differently. It is an internal matter, so we can change it later.

@lemire (Author) commented Aug 12, 2025

@the-moisrex @FranciscoThiesen

We now have a rather complete benchmark. Unfortunately, the ranges approach comes with a significant performance penalty. I think it is probably fine to push it, but with a warning that it will slow things down.

$ ./build/benchmark/from/from_benchmark 
Trial 1:
Running benchmarks on large array...
simdjson classic                         :     0.89 ns/char   764.29 ns/value     1.12 GB/s     3.09 GHz     2.76 cycles/char     9.72 ins./char  8319.46 ins./value     3.53 i/c
simdjson::from<Car>() (no parser)        :     1.05 ns/char   898.21 ns/value     0.95 GB/s     3.09 GHz     3.24 cycles/char    10.82 ins./char  9261.46 ins./value     3.34 i/c
simdjson::from<Car>() (with parser)      :     1.01 ns/char   863.71 ns/value     0.99 GB/s     3.09 GHz     3.11 cycles/char    10.75 ins./char  9201.46 ins./value     3.45 i/c
Running stream benchmarks...
simdjson classic                         :     0.78 ns     1.28 GB/s     3.08 GHz     2.41 cycles/char     9.27 ins./char     3.85 i/c
simdjson with thread local               :     0.77 ns     1.29 GB/s     3.08 GHz     2.38 cycles/char     9.25 ins./char     3.89 i/c
Running benchmarks on tiny input...
simdjson classic                         :     0.98 ns     1.02 GB/s     3.09 GHz     3.03 cycles/char    10.77 ins./char     3.55 i/c
simdjson::from<Car>() (no parser)        :     1.12 ns     0.90 GB/s     3.09 GHz     3.44 cycles/char    11.53 ins./char     3.35 i/c
simdjson::from<Car>() (with parser)      :     1.07 ns     0.93 GB/s     3.09 GHz     3.30 cycles/char    11.48 ins./char     3.48 i/c

The thread-local parser is pretty much free.

@lemire (Author) commented Aug 12, 2025

Here are the numbers on an Apple system:

Running benchmarks on large array...
simdjson classic                         :     0.53 ns/char   453.10 ns/value     1.89 GB/s     4.01 GHz     2.12 cycles/char    10.21 ins./char  8742.65 ins./value     4.81 i/c
simdjson::from<Car>() (no parser)        :     0.61 ns/char   523.42 ns/value     1.64 GB/s     4.03 GHz     2.46 cycles/char    12.62 ins./char 10802.28 ins./value     5.13 i/c
simdjson::from<Car>() (with parser)      :     0.61 ns/char   519.79 ns/value     1.65 GB/s     4.03 GHz     2.45 cycles/char    12.62 ins./char 10802.50 ins./value     5.15 i/c

The last two use the new ranges API. The performance difference is quite obvious: it is about a 2000-instruction penalty to parse a single 'Car' instance. I would say that is quite bad...

But as long as we warn users that it comes with a significant penalty, I think it is OK.

@lemire (Author) commented Aug 12, 2025

@the-moisrex @FranciscoThiesen

I have removed the C++20 ranges examples from our documentation and I have marked it as experimental.

For the time being, until we can determine why it costs thousands of instructions of overhead per value, I would prefer that our users not touch it.

@the-moisrex (Member) commented Aug 13, 2025

Great work @lemire.

I ran the benchmarks a bit on a noisy system; there are a few differences I can see between the classic way and the ranges way.

  1. More error handling in auto_parser than what is being benchmarked in classic.
  2. More boilerplate ranges setup, which I tried to put behind consteval, but it's an API limitation.
  3. More error handling in ranges:

If we remove the no_errors from the default as<Car>(), we'd do better:

  /// Make it an adaptor
  template <std::ranges::range Range>
  auto operator()(Range &&rng) const noexcept {
    return std::forward<Range>(rng) | no_errors | std::views::transform(*this);
  }

This code showcases two of the three problems. For the second point, it shows that we have to construct this filter and transform first. For the third point, it shows that we exclude errors by default, which is like an additional if statement in the loop.

For the first point, there's an extra if statement for error handling in operator T, but at least on my system it does not seem to affect performance much.

We could maybe fix the API by doing something like this:

for (Car car : (from | no_errors | as<Car>)(json_string)) { ... }

// or maybe this:

for (Car car : from | no_errors | as<Car>(json_string))) { ... }

// or maybe this (but I'm not sure if it's possible cleanly):

for (Car car : no_errors | as<Car>(json_string)) { ... }

// Maybe even this:
for (Car car : from | no_errors | to<Car> | json_string) { ... }

// Or this:
for (Car car : (from | no_errors | to<Car>) << json_string) { ... }

// Or this:
for (Car car : (from | no_errors | to<Car>) + json_string) { ... }

// This looks ugly, and I'm not sure about this:
for (Car car : from | no_errors | to<Car> <<= json_string) { ... }
for (Car car : from | no_errors | to<Car> = json_string) { ... }


// But we can do these two without needing the parentheses, though they don't look great:
// Operator precedence doesn't leave us many choices.
for (Car car : from | no_errors | to<Car> && json_string) { ... }
for (Car car : from | no_errors | to<Car> || json_string) { ... }

The point is to mark the operator()(Range) overloads as consteval or constexpr, so that constructing the ranges would not affect performance.

But my system is not a good system for benchmarks, so it might turn out differently.

Also, why prefer as<Car>() with extra parentheses over to<Car>?

Also, I got a segfault when I ran the benchmarks in debug mode, but I couldn't reproduce it. There might be a silent race condition in the benchmarks or something.

@lemire (Author) commented Aug 13, 2025

Also, why prefer as<Car>() with extra parentheses over to<Car>?

Because otherwise, you will get

 error: expected primary-expression before ‘)’ token
   69 |                 for(Car car : simdjson::from(json) | simdjson::as<Car>) {

@lemire (Author) commented Aug 13, 2025

I ran the benchmarks a bit on a noisy system,

If you have a Linux or macOS system and you run the benchmark in privileged mode (i.e., with 'sudo'), then you should see the performance counters (i.e., the number of instructions). Even if your machine is noisy, the instruction count should be telling.

If you share with me a public key, I can give you access to a quiet Linux server.

so the construction of the ranges wouldn't affect the performance.

Here, importantly, we have a per-element overhead that is very large (about 2000 instructions).

@lemire (Author) commented Aug 13, 2025

The ranges work is not ready. I don't think we should publish it. We have insufficient testing and benchmarks.

It should not be slower on a per-element basis. It is fine if setting up the loop takes a bit of extra work, but for long loops, the cost should be amortized.

@the-moisrex (Member)

Also, why preferring as() with extra parenthesis over to?

Because otherwise, you will get

 error: expected primary-expression before ‘)’ token
   69 |                 for(Car car : simdjson::from(json) | simdjson::as<Car>) {

to<Car> not as<Car>.

for(Car car : simdjson::from(json) | simdjson::to<Car>) 

We now have three versions; this will get confusing for users pretty quickly:

template <typename T>
static constexpr to_adaptor<T> to{};

static constexpr to_adaptor<> from{};

template <typename T = void>
using as = to_adaptor<T>;

The ranges work is not ready. I don't think we should publish it. We have insufficient testing and benchmarks.

It should not be slower on a per-element basis. It is fine if setting up the loop takes a few bit of extra work, but for long loops, the cost should be amortized.

I agree. Ranges do have a cost, compilers still have a hard time optimizing them, and utilities like std::views::filter are flat-out broken.

Let me try a few ideas and see if we can match the instruction counts.

@lemire (Author) commented Aug 13, 2025

@the-moisrex

If I haven't done so, I am removing 'as'.

Note that the standard provides the following:

    auto vec = std::views::iota(1, 5)
             | std::views::transform([](int v){ return v * 2; })
             | std::ranges::to<std::vector>();

Specifically std::ranges::to.

@lemire (Author) commented Aug 13, 2025

@the-moisrex Verified, 'as' has been removed:

[Screenshot, 2025-08-13 at 17:04]

We still have the to_adaptor and from but we are not documenting to_adaptor.

@the-moisrex (Member)

If the name conflict can be confusing for users, then I suggest renaming to to as but keeping the nature of to. We can also separate the operator()s that from needs and the ones that as needs into two different structs.


I have this so far:

                for(Car car : auto_parser<>{} | simdjson::to<Car> | json) {
                    dummy = dummy + car.year;
                }

But modifying auto_parser after it's wrapped in transform_view, filter, and whatnot is not that easy.

Calling transform_view.base().base().base() seems way too hacky.

Let me think of something else.

@lemire (Author) commented Aug 13, 2025

If the conflict in name can be confusing for users, then I suggest renaming to to as but keeping the nature of to.

This PR will not include those. We just keep simdjson::from, which seems fine.

@lemire (Author) commented Aug 13, 2025

@the-moisrex To be clearer, we have adaptor_to, I think, but that's an internal name that we can change.

@lemire (Author) commented Aug 13, 2025

@the-moisrex @FranciscoThiesen I am merging this PR.

I think that the thread-local parser is a good step. We also remove ranges for now, but make it easier to add them later.

@lemire lemire merged commit faf921b into master Aug 13, 2025
143 checks passed
@lemire lemire deleted the use_thread_local_parser branch August 13, 2025 22:08
@lemire (Author) commented Aug 13, 2025

Merged.
