Conversation
|
We get a buffer overflow in the following test. bool padded_with_open_bracket() {
std::cout << __func__ << std::endl;
simdjson::dom::parser parser;
// This is an invalid document padded with open braces.
ASSERT_ERROR( parser.parse("[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[", 2, false), simdjson::TAPE_ERROR);
// This is a valid document padded with open braces.
ASSERT_SUCCESS( parser.parse("[][[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[", 2, false) );
return true;
} |
|
It might be useful to pass |
src/generic/stage2/json_iterator.h
Outdated
| log_end_value("document"); | ||
| SIMDJSON_TRY( visitor.visit_document_end(*this) ); | ||
|
|
||
| if (!simdjson::get_active_implementation()->validate_utf8((const char *)buf, dom_parser.len)) { |
There was a problem hiding this comment.
| if (!simdjson::get_active_implementation()->validate_utf8((const char *)buf, dom_parser.len)) { | |
| if (!simdjson::get_active_implementation()->validate_utf8(reinterpret_cast<const char *>(buf), dom_parser.len)) { |
|
Ok. So the difficulty here is thus... Suppose the user gives you the following string If, beyond the string, you have, say, In the test case where we get a sanitizer error, the input is The whole code ends up working, somewhat accidentally, but it is not correct in the sense that if the input document is invalid JSON, we might end up with a crash (buffer overrun, page fault, etc.). A reasonable option might be to replace idx++;
if(idx >= dom_parser.len) goto some_kind_of_error;and similarly for the Or there might be a smarter approach. |
lemire
left a comment
There was a problem hiding this comment.
The PR is promising but it is vulnerable to buffer overruns. It seems that index increments might need to be followed by checks to see whether we have reached the end of the input buffer.
|
The problem I don't get is that if I do then it works just fine. It's only when parser.parse is run without calling a function on it that it doesn't work. This is what I was trying to understand. |
|
I was also planning on removing the padding requirement (this pull request is a prerequisite), since it shouldn't be necessary for DOM parsing, but I didn't want to cram all these changes in one pull request (to make reviewing it more manageable), and I was trying to avoid changing test functions. |
I don't think it works fine. It appears to work (by luck). The code is not safe, it is subject to buffer overruns when the source JSON document is invalid. I see that you are working under Windows. Use WSL and build with Alternatively, you can run Visual Studio with the AddressSanitizer although we do not support it directly from CMake yet. |
|
You can also add debug-time checks: #if SIMDJSON_DEVELOPMENT_CHECKS
check stuff here
#endifThe SIMDJSON_DEVELOPMENT_CHECKS macro is true only in Debug mode (unless manually set). |
|
Note that we need UTF-8 validation. This can either be done at the component level or as part of stage 1. If you bypass stage 1, you still need to do UTF-8 validation somehow. Stage 1 calls |
|
The following should pass... bool shall_not_parse() {
std::cout << "Running " << __func__ << std::endl;
auto test = "{\"joe\":\"\xf0\x8f\xbf\xbf\"}"_padded;
simdjson::dom::parser parser;
simdjson::dom::element doc;
auto error = parser.parse(test).get(doc);
if(error) {
return true; // expected
}
return false;
}
(It is in our main branch, you can sync to get it.) |
|
I am currently working on making sanitizers work under Visual Studio : #2435 |
|
@evbse Please sync with the main branch. |
|
I will review. |
lemire
left a comment
There was a problem hiding this comment.
This PR as it stands does not provide superior performance in my tests.
ARM A4 processor (LLVM 16).
Main branch (prior to this PR):
------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean 142095 ns 141950 ns 10 Gigabytes=4.44927G/s docs=7.04539k/s
parse_twitter/repeats:10_median 141474 ns 141406 ns 10 Gigabytes=4.46598G/s docs=7.07184k/s
parse_twitter/repeats:10_stddev 1474 ns 1435 ns 10 Gigabytes=44.6555M/s docs=70.7117/s
parse_twitter/repeats:10_cv 1.04 % 1.01 % 10 Gigabytes=1.00% docs=1.00%
parse_twitter/repeats:10_max 144731 ns 144444 ns 10 Gigabytes=4.49277G/s docs=7.11428k/s
parse_gsoc/repeats:10_mean 589927 ns 589460 ns 10 Gigabytes=5.64797G/s docs=1.69719k/s
parse_gsoc/repeats:10_median 589327 ns 589195 ns 10 Gigabytes=5.6481G/s docs=1.69723k/s
parse_gsoc/repeats:10_stddev 13021 ns 13014 ns 10 Gigabytes=121.578M/s docs=36.5338/s
parse_gsoc/repeats:10_cv 2.21 % 2.21 % 10 Gigabytes=2.15% docs=2.15%
parse_gsoc/repeats:10_max 620651 ns 620322 ns 10 Gigabytes=5.78052G/s docs=1.73702k/s
This PR:
------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean 184758 ns 184481 ns 10 Gigabytes=3.42335G/s docs=5.42085k/s
parse_twitter/repeats:10_median 184696 ns 184256 ns 10 Gigabytes=3.42739G/s docs=5.42725k/s
parse_twitter/repeats:10_stddev 1365 ns 1289 ns 10 Gigabytes=23.8696M/s docs=37.7974/s
parse_twitter/repeats:10_cv 0.74 % 0.70 % 10 Gigabytes=0.70% docs=0.70%
parse_twitter/repeats:10_max 186708 ns 186380 ns 10 Gigabytes=3.45453G/s docs=5.47023k/s
parse_gsoc/repeats:10_mean 572507 ns 571832 ns 10 Gigabytes=5.8204G/s docs=1.74901k/s
parse_gsoc/repeats:10_median 570025 ns 569972 ns 10 Gigabytes=5.83859G/s docs=1.75447k/s
parse_gsoc/repeats:10_stddev 7291 ns 7130 ns 10 Gigabytes=72.0199M/s docs=21.6417/s
parse_gsoc/repeats:10_cv 1.27 % 1.25 % 10 Gigabytes=1.24% docs=1.24%
parse_gsoc/repeats:10_max 585456 ns 585024 ns 10 Gigabytes=5.90396G/s docs=1.77412k/s
Intel Ice Lake (GCC 12)
Main branch (prior to this PR):
------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean 195839 ns 195492 ns 10 Gigabytes=3.29123G/s docs=5.21165k/s
parse_twitter/repeats:10_median 184950 ns 184677 ns 10 Gigabytes=3.41957G/s docs=5.41487k/s
parse_twitter/repeats:10_stddev 33715 ns 33522 ns 10 Gigabytes=394.403M/s docs=624.535/s
parse_twitter/repeats:10_cv 17.22 % 17.15 % 10 Gigabytes=11.98% docs=11.98%
parse_twitter/repeats:10_max 291684 ns 290793 ns 10 Gigabytes=3.44101G/s docs=5.44882k/s
parse_gsoc/repeats:10_mean 744698 ns 743526 ns 10 Gigabytes=4.47575G/s docs=1.34494k/s
parse_gsoc/repeats:10_median 744893 ns 743715 ns 10 Gigabytes=4.47461G/s docs=1.3446k/s
parse_gsoc/repeats:10_stddev 929 ns 911 ns 10 Gigabytes=5.48538M/s docs=1.64833/s
parse_gsoc/repeats:10_cv 0.12 % 0.12 % 10 Gigabytes=0.12% docs=0.12%
parse_gsoc/repeats:10_max 745917 ns 744752 ns 10 Gigabytes=4.48414G/s docs=1.34747k/s
This PR:
------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
parse_twitter/repeats:10_mean 307024 ns 306558 ns 10 Gigabytes=2.07087G/s docs=3.27921k/s
parse_twitter/repeats:10_median 298793 ns 298342 ns 10 Gigabytes=2.11675G/s docs=3.35185k/s
parse_twitter/repeats:10_stddev 25902 ns 25754 ns 10 Gigabytes=143.497M/s docs=227.227/s
parse_twitter/repeats:10_cv 8.44 % 8.40 % 10 Gigabytes=6.93% docs=6.93%
parse_twitter/repeats:10_max 380734 ns 379845 ns 10 Gigabytes=2.11983G/s docs=3.35674k/s
parse_gsoc/repeats:10_mean 768600 ns 767416 ns 10 Gigabytes=4.33647G/s docs=1.30309k/s
parse_gsoc/repeats:10_median 768226 ns 767121 ns 10 Gigabytes=4.33808G/s docs=1.30358k/s
parse_gsoc/repeats:10_stddev 3074 ns 2918 ns 10 Gigabytes=16.5108M/s docs=4.96143/s
parse_gsoc/repeats:10_cv 0.40 % 0.38 % 10 Gigabytes=0.38% docs=0.38%
parse_gsoc/repeats:10_max 773020 ns 771516 ns 10 Gigabytes=4.36886G/s docs=1.31283k/s
|
It turns out, this was enough to get the performance mentioned. Is it possible for you to benchmark with MSVC, since is has the biggest performance win? |
|
With your latest changes, and running the benchmarks once from the main branch and once from the PR, I seem to get a performance gain of something of the order of 0.5%, maybe more in some cases. It is a bit difficult to be certain that the gain is real. Still, your changes are minor (in the sense that you affect very few lines), they don't make the code base worse (ie., the result is not less maintainable), so I would be tempted to merge. The risk that the resulting performance is worse is low, and there is a decent chance that your PR improves very slightly the performance.
Out of frustration, I gave up years ago on benchmarking for Visual Studio. The C++ compiler team there seems to have given up keeping up with the competition and the latest standards (I suspect that they are underfunded, understaffed). The great thing is that you can easily configure Visual Studio to use clang as the compiler, which boosts the performance tremendously. So I only use the regular Visual Studio compiler to test/debug. Presumably, you have your own numbers that you care about. Would you share them? Does this PR make the code run significantly faster under Visual Studio? If so, that's certainly an argument that would make taking this PR compelling. (I gave up personally on Visual Studio for performance, but I am quite happy if we can be faster with it.) Intel Ice Lake processor (GCC)This PR: Main branch Apple M processor (LLVM)This PR: Main branch |
|
Sure, here it is: MSVC 19.44.35213.0 - Haswell This PR: Main branch |
|
Ok. Merging. This will be part of the next release. Thanks for the valuable contribution. Do keep it coming. |
|
Awesome, thank you. |
|
It seems like making it a member function nudges the compiler to make a different inline decision. Has anyone checked that? I'm just curious. |
|
@toughengineer In both cases, in release mode, the functions should be marked |
By removing stage1 and implementing yyjson's DOM parser. Running
benchmark\bench_parse_call --benchmark_filter="parse_twitter|parse_gsoc"on an AMD Ryzen 3600 gets the following results: