Skip to content

Improve DOM implementation#2434

Merged
lemire merged 1 commit intosimdjson:masterfrom
evbse:master
Sep 21, 2025
Merged

Improve DOM implementation#2434
lemire merged 1 commit intosimdjson:masterfrom
evbse:master

Conversation

@evbse
Copy link
Contributor

@evbse evbse commented Sep 5, 2025

By removing stage1 and implementing yyjson's DOM parser. Running benchmark\bench_parse_call --benchmark_filter="parse_twitter|parse_gsoc" on an AMD Ryzen 3600 gets the following results:

GB/s MSVC before MSVC after Clang-CL before Clang-CL after
parse_twitter - haswell 1.49004 2.80757 2.93942 3.09504
parse_gsoc - haswell 2.06389 5.30087 4.1485G 6.45398
parse_twitter - westmere 1.82864 2.42042 2.14038 2.93942
parse_gsoc - westmere 2.65043 3.97565 2.83975 5.18563
parse_twitter - fallback 0.70212 1.05272 0.74675 1.13167
parse_gsoc - fallback 0.82877 1.22233 0.79636 1.24949

@lemire
Copy link
Member

lemire commented Sep 5, 2025

We get a buffer overflow in the following test.

  bool padded_with_open_bracket() {
    std::cout << __func__ << std::endl;
    simdjson::dom::parser parser;
    // This is an invalid document padded with open braces.
    ASSERT_ERROR( parser.parse("[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[", 2, false), simdjson::TAPE_ERROR);
    // This is a valid document padded with open braces.
    ASSERT_SUCCESS( parser.parse("[][[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[", 2, false) );
    return true;
  }
==7122==ERROR: AddressSanitizer: global-buffer-overflow on address 0x5598a9c97112 at pc 0x5598a9b378a1 bp 0x7ffd030bbb70 sp 0x7ffd030bbb60

READ of size 2 at 0x5598a9c97112 thread T0

    #0 0x5598a9b378a0 in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29

    #1 0x5598a9b378a0 in byte_match_2 /home/runner/work/simdjson/simdjson/src/generic/stage2/json_iterator.h:19

    #2 0x5598a9b378a0 in parse<simdjson::haswell::(anonymous namespace)::stage2::tape_builder> /home/runner/work/simdjson/simdjson/src/generic/stage2/json_iterator.h:293

    #3 0x5598a9b378a0 in parse /home/runner/work/simdjson/simdjson/src/generic/stage2/tape_builder.h:123

    #4 0x5598a9b378a0 in simdjson::haswell::dom_parser_implementation::parse(unsigned char const*, unsigned long, simdjson::dom::document&) /home/runner/work/simdjson/simdjson/src/haswell.cpp:167

    #5 0x5598a9a3a039 in simdjson::dom::parser::parse_into_document(simdjson::dom::document&, unsigned char const*, unsigned long, bool) & (/home/runner/work/simdjson/simdjson/builddebug/tests/dom/document_tests+0x1ba039) (BuildId: d69bfb97319954a37e7d506d3534550f844a1818)

    #6 0x5598a9a3a3da in simdjson::dom::parser::parse(unsigned char const*, unsigned long, bool) & (/home/runner/work/simdjson/simdjson/builddebug/tests/dom/document_tests+0x1ba3da) (BuildId: d69bfb97319954a37e7d506d3534550f844a1818)

    #7 0x5598a9a2b016 in simdjson::dom::parser::parse(char const*, unsigned long, bool) & /home/runner/work/simdjson/simdjson/include/simdjson/dom/parser-inl.h:160

    #8 0x5598a9a2b016 in document_tests::padded_with_open_bracket() /home/runner/work/simdjson/simdjson/tests/dom/document_tests.cpp:78

    #9 0x5598a9a31019 in document_tests::run() /home/runner/work/simdjson/simdjson/tests/dom/document_tests.cpp:190

    #10 0x5598a9a31d3e in main /home/runner/work/simdjson/simdjson/tests/dom/document_tests.cpp:238

    #11 0x7ffbd2e2a1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 282c2c16e7b6600b0b22ea0c99010d2795752b5f)

    #12 0x7ffbd2e2a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 282c2c16e7b6600b0b22ea0c99010d2795752b5f)

    #13 0x5598a9a248e4 in _start (/home/runner/work/simdjson/simdjson/builddebug/tests/dom/document_tests+0x1a48e4) (BuildId: d69bfb97319954a37e7d506d3534550f844a1818)



0x5598a9c97113 is located 0 bytes after global variable '*.LC63' defined in '/home/runner/work/simdjson/simdjson/tests/dom/document_tests.cpp' (0x5598a9c970c0) of size 83

  '*.LC63' is ascii string '[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[['

0x5598a9c97112 is located 46 bytes before global variable '*.LC64' defined in '/home/runner/work/simdjson/simdjson/tests/dom/document_tests.cpp' (0x5598a9c97140) of size 83

  '*.LC64' is ascii string '[][[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[['

SUMMARY: AddressSanitizer: global-buffer-overflow /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29 in memcpy

@lemire
Copy link
Member

lemire commented Sep 5, 2025

It might be useful to pass -D SIMDJSON_SANITIZE=ON to the CMake configuration.

log_end_value("document");
SIMDJSON_TRY( visitor.visit_document_end(*this) );

if (!simdjson::get_active_implementation()->validate_utf8((const char *)buf, dom_parser.len)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!simdjson::get_active_implementation()->validate_utf8((const char *)buf, dom_parser.len)) {
if (!simdjson::get_active_implementation()->validate_utf8(reinterpret_cast<const char *>(buf), dom_parser.len)) {

@lemire
Copy link
Member

lemire commented Sep 5, 2025

Ok. So the difficulty here is thus...

Suppose the user gives you the following string [[ as a JSON document but suppose it does not end with a NULL character. Maybe the next index points at the end of the memory page, or at garbage (like a long stream of [ characters). The code in this PR does not check for the end of the stream, so it just does buf[idx] without checking that idx is in range. Thus will it will read beyond the string [[.

If, beyond the string, you have, say, ]], then the parser will (wrongly) conclude that we have a valid JSON document. That's not great.

In the test case where we get a sanitizer error, the input is [[ followed by many [ and then, finally, a null character. The sanitizer complains because after reading a [, the code checks for the presence of two whitespace characters. When it arrives at the end, it will thus read just beyond the null character.

The whole code ends up working, somewhat accidentally, but it is not correct in the sense that if the input document is invalid JSON, we might end up with a crash (buffer overrun, page fault, etc.).

A reasonable option might be to replace idx++; throughout in json_iterator::parse by a function that does...

  idx++;
  if(idx >= dom_parser.len) goto some_kind_of_error;

and similarly for the idx += 2 calls.

Or there might be a smarter approach.

@lemire lemire marked this pull request as draft September 5, 2025 16:02
@lemire lemire marked this pull request as ready for review September 5, 2025 16:03
Copy link
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is promising but it is vulnerable to buffer overruns. It seems that index increments might need to be followed by checks to see whether we have reached the end of the input buffer.

@evbse
Copy link
Contributor Author

evbse commented Sep 5, 2025

The problem I don't get is that if I do

bool padded_with_open_bracket() {
    std::cout << __func__ << std::endl;
    simdjson::dom::parser parser;
    auto a = parser.parse("[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[", 2, false).error();
    std::cout << a << std::endl;
    // This is a valid document padded with open braces.
    auto b = parser.parse("[][[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[", 2, false).error();
    std::cout << b << std::endl;
    return true;
}

then it works just fine. It's only when parser.parse is run without calling a function on it that it doesn't work. This is what I was trying to understand.

@evbse
Copy link
Contributor Author

evbse commented Sep 5, 2025

I was also planning on removing the padding requirement (this pull request is a prerequisite), since it shouldn't be necessary for DOM parsing, but I didn't want to cram all these changes in one pull request (to make reviewing it more manageable), and I was trying to avoid changing test functions.

@lemire
Copy link
Member

lemire commented Sep 5, 2025

it works just fine

I don't think it works fine. It appears to work (by luck).

The code is not safe, it is subject to buffer overruns when the source JSON document is invalid.

I see that you are working under Windows. Use WSL and build with cmake -B build -D SIMDJSON_SANITIZE=ON. Then do cmake --build build --target document_tests. And then run ./build/tests/dom/document_tests.

Alternatively, you can run Visual Studio with the AddressSanitizer although we do not support it directly from CMake yet.

@lemire
Copy link
Member

lemire commented Sep 5, 2025

You can also add debug-time checks:

#if SIMDJSON_DEVELOPMENT_CHECKS
check stuff here
#endif

The SIMDJSON_DEVELOPMENT_CHECKS macro is true only in Debug mode (unless manually set).

@lemire
Copy link
Member

lemire commented Sep 5, 2025

Note that we need UTF-8 validation. This can either be done at the component level or as part of stage 1.

If you bypass stage 1, you still need to do UTF-8 validation somehow.

Stage 1 calls generic_validate_utf8.

@lemire
Copy link
Member

lemire commented Sep 5, 2025

The following should pass...

  bool shall_not_parse() {
    std::cout << "Running " << __func__ << std::endl;
    auto test = "{\"joe\":\"\xf0\x8f\xbf\xbf\"}"_padded;
    simdjson::dom::parser parser;
    simdjson::dom::element doc;
    auto error = parser.parse(test).get(doc);
    if(error) {
      return true; // expected
    }
    return false;
  }

(It is in our main branch, you can sync to get it.)

@lemire
Copy link
Member

lemire commented Sep 5, 2025

I am currently working on making sanitizers work under Visual Studio : #2435

@lemire
Copy link
Member

lemire commented Sep 5, 2025

@evbse Please sync with the main branch.

@lemire
Copy link
Member

lemire commented Sep 6, 2025

I will review.

Copy link
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR as it stands does not provide superior performance in my tests.

ARM A4 processor (LLVM 16).

Main branch (prior to this PR):


------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       142095 ns       141950 ns           10 Gigabytes=4.44927G/s docs=7.04539k/s
parse_twitter/repeats:10_median     141474 ns       141406 ns           10 Gigabytes=4.46598G/s docs=7.07184k/s
parse_twitter/repeats:10_stddev       1474 ns         1435 ns           10 Gigabytes=44.6555M/s docs=70.7117/s
parse_twitter/repeats:10_cv           1.04 %          1.01 %            10 Gigabytes=1.00% docs=1.00%
parse_twitter/repeats:10_max        144731 ns       144444 ns           10 Gigabytes=4.49277G/s docs=7.11428k/s
parse_gsoc/repeats:10_mean          589927 ns       589460 ns           10 Gigabytes=5.64797G/s docs=1.69719k/s
parse_gsoc/repeats:10_median        589327 ns       589195 ns           10 Gigabytes=5.6481G/s docs=1.69723k/s
parse_gsoc/repeats:10_stddev         13021 ns        13014 ns           10 Gigabytes=121.578M/s docs=36.5338/s
parse_gsoc/repeats:10_cv              2.21 %          2.21 %            10 Gigabytes=2.15% docs=2.15%
parse_gsoc/repeats:10_max           620651 ns       620322 ns           10 Gigabytes=5.78052G/s docs=1.73702k/s

This PR:

------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       184758 ns       184481 ns           10 Gigabytes=3.42335G/s docs=5.42085k/s
parse_twitter/repeats:10_median     184696 ns       184256 ns           10 Gigabytes=3.42739G/s docs=5.42725k/s
parse_twitter/repeats:10_stddev       1365 ns         1289 ns           10 Gigabytes=23.8696M/s docs=37.7974/s
parse_twitter/repeats:10_cv           0.74 %          0.70 %            10 Gigabytes=0.70% docs=0.70%
parse_twitter/repeats:10_max        186708 ns       186380 ns           10 Gigabytes=3.45453G/s docs=5.47023k/s
parse_gsoc/repeats:10_mean          572507 ns       571832 ns           10 Gigabytes=5.8204G/s docs=1.74901k/s
parse_gsoc/repeats:10_median        570025 ns       569972 ns           10 Gigabytes=5.83859G/s docs=1.75447k/s
parse_gsoc/repeats:10_stddev          7291 ns         7130 ns           10 Gigabytes=72.0199M/s docs=21.6417/s
parse_gsoc/repeats:10_cv              1.27 %          1.25 %            10 Gigabytes=1.24% docs=1.24%
parse_gsoc/repeats:10_max           585456 ns       585024 ns           10 Gigabytes=5.90396G/s docs=1.77412k/s

Intel Ice Lake (GCC 12)

Main branch (prior to this PR):

------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       195839 ns       195492 ns           10 Gigabytes=3.29123G/s docs=5.21165k/s
parse_twitter/repeats:10_median     184950 ns       184677 ns           10 Gigabytes=3.41957G/s docs=5.41487k/s
parse_twitter/repeats:10_stddev      33715 ns        33522 ns           10 Gigabytes=394.403M/s docs=624.535/s
parse_twitter/repeats:10_cv          17.22 %         17.15 %            10 Gigabytes=11.98% docs=11.98%
parse_twitter/repeats:10_max        291684 ns       290793 ns           10 Gigabytes=3.44101G/s docs=5.44882k/s
parse_gsoc/repeats:10_mean          744698 ns       743526 ns           10 Gigabytes=4.47575G/s docs=1.34494k/s
parse_gsoc/repeats:10_median        744893 ns       743715 ns           10 Gigabytes=4.47461G/s docs=1.3446k/s
parse_gsoc/repeats:10_stddev           929 ns          911 ns           10 Gigabytes=5.48538M/s docs=1.64833/s
parse_gsoc/repeats:10_cv              0.12 %          0.12 %            10 Gigabytes=0.12% docs=0.12%
parse_gsoc/repeats:10_max           745917 ns       744752 ns           10 Gigabytes=4.48414G/s docs=1.34747k/s

This PR:

------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       307024 ns       306558 ns           10 Gigabytes=2.07087G/s docs=3.27921k/s
parse_twitter/repeats:10_median     298793 ns       298342 ns           10 Gigabytes=2.11675G/s docs=3.35185k/s
parse_twitter/repeats:10_stddev      25902 ns        25754 ns           10 Gigabytes=143.497M/s docs=227.227/s
parse_twitter/repeats:10_cv           8.44 %          8.40 %            10 Gigabytes=6.93% docs=6.93%
parse_twitter/repeats:10_max        380734 ns       379845 ns           10 Gigabytes=2.11983G/s docs=3.35674k/s
parse_gsoc/repeats:10_mean          768600 ns       767416 ns           10 Gigabytes=4.33647G/s docs=1.30309k/s
parse_gsoc/repeats:10_median        768226 ns       767121 ns           10 Gigabytes=4.33808G/s docs=1.30358k/s
parse_gsoc/repeats:10_stddev          3074 ns         2918 ns           10 Gigabytes=16.5108M/s docs=4.96143/s
parse_gsoc/repeats:10_cv              0.40 %          0.38 %            10 Gigabytes=0.38% docs=0.38%
parse_gsoc/repeats:10_max           773020 ns       771516 ns           10 Gigabytes=4.36886G/s docs=1.31283k/s

@evbse
Copy link
Contributor Author

evbse commented Sep 21, 2025

It turns out, this was enough to get the performance mentioned. Is it possible for you to benchmark with MSVC, since is has the biggest performance win?

@lemire
Copy link
Member

lemire commented Sep 21, 2025

With your latest changes, and running the benchmarks once from the main branch and once from the PR, I seem to get a performance gain of something of the order of 0.5%, maybe more in some cases. It is a bit difficult to be certain that the gain is real.

Still, your changes are minor (in the sense that you affect very few lines), they don't make the code base worse (ie., the result is not less maintainable), so I would be tempted to merge. The risk that the resulting performance is worse is low, and there is a decent chance that your PR improves very slightly the performance.

It turns out, this was enough to get the performance mentioned. Is it possible for you to benchmark with MSVC, since is has the biggest performance win?

Out of frustration, I gave up years ago on benchmarking for Visual Studio. The C++ compiler team there seems to have given up keeping up with the competition and the latest standards (I suspect that they are underfunded, understaffed). The great thing is that you can easily configure Visual Studio to use clang as the compiler, which boosts the performance tremendously. So I only use the regular Visual Studio compiler to test/debug.

Presumably, you have your own numbers that you care about. Would you share them? Does this PR make the code run significantly faster under Visual Studio? If so, that's certainly an argument that would make taking this PR compelling. (I gave up personally on Visual Studio for performance, but I am quite happy if we can be faster with it.)

Intel Ice Lake processor (GCC)

This PR:

------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       192652 ns       192347 ns           10 Gigabytes=3.32879G/s docs=5.27111k/s
parse_twitter/repeats:10_median     183519 ns       183255 ns           10 Gigabytes=3.4461G/s docs=5.45687k/s
parse_twitter/repeats:10_stddev      27775 ns        27670 ns           10 Gigabytes=352.171M/s docs=557.66/s
parse_twitter/repeats:10_cv          14.42 %         14.39 %            10 Gigabytes=10.58% docs=10.58%
parse_twitter/repeats:10_max        271537 ns       270933 ns           10 Gigabytes=3.46745G/s docs=5.49068k/s
parse_gsoc/repeats:10_mean          744417 ns       743316 ns           10 Gigabytes=4.47703G/s docs=1.34533k/s
parse_gsoc/repeats:10_median        744558 ns       743355 ns           10 Gigabytes=4.47677G/s docs=1.34525k/s
parse_gsoc/repeats:10_stddev          1807 ns         1797 ns           10 Gigabytes=10.8111M/s docs=3.2487/s
parse_gsoc/repeats:10_cv              0.24 %          0.24 %            10 Gigabytes=0.24% docs=0.24%
parse_gsoc/repeats:10_max           747584 ns       746436 ns           10 Gigabytes=4.49092G/s docs=1.3495k/s

Main branch

------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       193218 ns       192860 ns           10 Gigabytes=3.31943G/s docs=5.25629k/s
parse_twitter/repeats:10_median     184401 ns       184096 ns           10 Gigabytes=3.43036G/s docs=5.43195k/s
parse_twitter/repeats:10_stddev      27808 ns        27604 ns           10 Gigabytes=349.032M/s docs=552.69/s
parse_twitter/repeats:10_cv          14.39 %         14.31 %            10 Gigabytes=10.51% docs=10.51%
parse_twitter/repeats:10_max        272328 ns       271389 ns           10 Gigabytes=3.45844G/s docs=5.47642k/s
parse_gsoc/repeats:10_mean          750424 ns       749184 ns           10 Gigabytes=4.44196G/s docs=1.33479k/s
parse_gsoc/repeats:10_median        750200 ns       748955 ns           10 Gigabytes=4.4433G/s docs=1.3352k/s
parse_gsoc/repeats:10_stddev          1481 ns         1464 ns           10 Gigabytes=8.67465M/s docs=2.6067/s
parse_gsoc/repeats:10_cv              0.20 %          0.20 %            10 Gigabytes=0.20% docs=0.20%
parse_gsoc/repeats:10_max           752419 ns       751194 ns           10 Gigabytes=4.4529G/s docs=1.33808k/s

Apple M processor (LLVM)

This PR:

------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       149582 ns       149416 ns           10 Gigabytes=4.22666G/s docs=6.6929k/s
parse_twitter/repeats:10_median     149638 ns       149471 ns           10 Gigabytes=4.22501G/s docs=6.69027k/s
parse_twitter/repeats:10_stddev        752 ns          749 ns           10 Gigabytes=21.1023M/s docs=33.4153/s
parse_twitter/repeats:10_cv           0.50 %          0.50 %            10 Gigabytes=0.50% docs=0.50%
parse_twitter/repeats:10_max        151142 ns       151011 ns           10 Gigabytes=4.2521G/s docs=6.73317k/s
parse_gsoc/repeats:10_mean          605425 ns       604915 ns           10 Gigabytes=5.50144G/s docs=1.65316k/s
parse_gsoc/repeats:10_median        605028 ns       604475 ns           10 Gigabytes=5.50534G/s docs=1.65433k/s
parse_gsoc/repeats:10_stddev          2855 ns         2927 ns           10 Gigabytes=26.5774M/s docs=7.9864/s
parse_gsoc/repeats:10_cv              0.47 %          0.48 %            10 Gigabytes=0.48% docs=0.48%
parse_gsoc/repeats:10_max           609712 ns       609403 ns           10 Gigabytes=5.53101G/s docs=1.66205k/s

Main branch

------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       151034 ns       150884 ns           10 Gigabytes=4.18551G/s docs=6.62773k/s
parse_twitter/repeats:10_median     151124 ns       151043 ns           10 Gigabytes=4.18103G/s docs=6.62064k/s
parse_twitter/repeats:10_stddev        692 ns          678 ns           10 Gigabytes=18.8397M/s docs=29.8326/s
parse_twitter/repeats:10_cv           0.46 %          0.45 %            10 Gigabytes=0.45% docs=0.45%
parse_twitter/repeats:10_max        152110 ns       151901 ns           10 Gigabytes=4.21305G/s docs=6.67134k/s
parse_gsoc/repeats:10_mean          616221 ns       615517 ns           10 Gigabytes=5.40715G/s docs=1.62483k/s
parse_gsoc/repeats:10_median        614076 ns       613652 ns           10 Gigabytes=5.42302G/s docs=1.6296k/s
parse_gsoc/repeats:10_stddev          7466 ns         6793 ns           10 Gigabytes=58.923M/s docs=17.7061/s
parse_gsoc/repeats:10_cv              1.21 %          1.10 %            10 Gigabytes=1.09% docs=1.09%
parse_gsoc/repeats:10_max           634266 ns       631082 ns           10 Gigabytes=5.46081G/s docs=1.64095k/s

@evbse
Copy link
Contributor Author

evbse commented Sep 21, 2025

Sure, here it is:

MSVC 19.44.35213.0 - Haswell

This PR:

------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       274371 ns       274332 ns           10 Gigabytes=2.30226G/s docs=3.64562k/s
parse_twitter/repeats:10_median     274635 ns       276215 ns           10 Gigabytes=2.28631G/s docs=3.62036k/s
parse_twitter/repeats:10_stddev       1047 ns         3032 ns           10 Gigabytes=25.6836M/s docs=40.6698/s
parse_twitter/repeats:10_cv           0.38 %          1.11 %            10 Gigabytes=1.12% docs=1.12%
parse_twitter/repeats:10_max        276378 ns       276215 ns           10 Gigabytes=2.33948G/s docs=3.70456k/s
parse_gsoc/repeats:10_mean          943460 ns       943357 ns           10 Gigabytes=3.5278G/s docs=1.06009k/s
parse_gsoc/repeats:10_median        943629 ns       941265 ns           10 Gigabytes=3.53549G/s docs=1.0624k/s
parse_gsoc/repeats:10_stddev          1126 ns         6615 ns           10 Gigabytes=24.3048M/s docs=7.30349/s
parse_gsoc/repeats:10_cv              0.12 %          0.70 %            10 Gigabytes=0.69% docs=0.69%
parse_gsoc/repeats:10_max           944978 ns       962182 ns           10 Gigabytes=3.53549G/s docs=1.0624k/s

Main branch

------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean       422974 ns       422883 ns           10 Gigabytes=1.49372G/s docs=2.3653k/s
parse_twitter/repeats:10_median     420586 ns       423825 ns           10 Gigabytes=1.49004G/s docs=2.35947k/s
parse_twitter/repeats:10_stddev       4214 ns         6949 ns           10 Gigabytes=24.5027M/s docs=38.7998/s
parse_twitter/repeats:10_cv           1.00 %          1.64 %            10 Gigabytes=1.64% docs=1.64%
parse_twitter/repeats:10_max        432138 ns       433243 ns           10 Gigabytes=1.5239G/s docs=2.41309k/s
parse_gsoc/repeats:10_mean         1637057 ns      1639230 ns           10 Gigabytes=2.03012G/s docs=610.043/s
parse_gsoc/repeats:10_median       1636839 ns      1639230 ns           10 Gigabytes=2.03012G/s docs=610.043/s
parse_gsoc/repeats:10_stddev          1042 ns        0.000 ns           10 Gigabytes=0/s docs=0/s
parse_gsoc/repeats:10_cv              0.06 %          0.00 %            10 Gigabytes=0.00% docs=0.00%
parse_gsoc/repeats:10_max          1638516 ns      1639230 ns           10 Gigabytes=2.03012G/s docs=610.043/s

@lemire
Copy link
Member

lemire commented Sep 21, 2025

Ok. Merging. This will be part of the next release.

Thanks for the valuable contribution. Do keep it coming.

@lemire lemire merged commit 6e618b0 into simdjson:master Sep 21, 2025
83 checks passed
@evbse
Copy link
Contributor Author

evbse commented Sep 21, 2025

Awesome, thank you.

@toughengineer
Copy link
Contributor

It seems like making it a member function nudges the compiler to make a different inline decision. Has anyone checked that? I'm just curious.
Or, if otherwise, what is the source of the speedup?

@lemire
Copy link
Member

lemire commented Sep 22, 2025

@toughengineer In both cases, in release mode, the functions should be marked __force_inline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants