Fix convert ci failures by FranciscoThiesen · Pull Request #2406 · simdjson/simdjson

FranciscoThiesen · 2025-08-02T15:22:27Z

Easy deserializing: Car car = simdjson::from(json_car);
Easy serializing: std::string str = simdjson::to_json(Car{})
Work with ranges: for (Car car : simdjson::from(json_cars) | simdjson::as<Car>()) ) { ... }

- Fix deprecated reflect_value warning by using reflect_constant - Fix std::const_iterator C++23 requirement by using auto_iterator - Fix C++23 std::ranges::range_adaptor_closure availability check - Add convert.h to main simdjson.h includes These changes ensure compatibility across different C++ standards and compiler versions, fixing the Ubuntu CI failures. Co-Authored-By: Claude <noreply@anthropic.com>

Add defined() check before comparing the value to avoid preprocessor errors in compilers where this macro doesn't exist (like g++-13 with certain configurations).

Only enable range_adaptor_closure features when compiling with C++23 or later, as this feature is not available in C++20 implementations.

Regenerate singleheader/simdjson.cpp and singleheader/simdjson.h to include all the fixes for CI compatibility issues.

The ranges features were causing compatibility issues across different compilers and platforms. Disabling them for now until C++23 support is more widespread. This should fix the remaining Ubuntu and Windows CI failures.

The test_no_errors() and to_clean_array() tests depend on the C++23 ranges features that we disabled. This commit conditionally compiles these tests out when ranges support is disabled.

Instead of disabling the feature, provide C++20-compatible implementation of the pipe operators for ranges support. The range_adaptor_closure is C++23-only, so we implement our own pipe operators for C++20. This preserves the core functionality of the PR while ensuring compatibility across different compiler versions.

- Remove constexpr from functions that call non-constexpr methods - The no_errors and to<T> adaptors were marked constexpr but call simdjson_result methods that are not constexpr in C++20 - This was causing compilation failures in CI for C++20 builds - Tests now compile and pass with both C++17 and C++20 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

The issue was that the auto_parser constructor was using implicit conversion from simdjson_result<document> to document, which could cause issues with certain implementations (particularly fallback). Changed to use value_unsafe() to explicitly extract the document after the parser is fully initialized. This ensures the document is in a valid state for subsequent operations. This fixes the ondemand_convert_tests failure in CI with clang++-16. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

The issue might be related to how brace initialization vs parentheses initialization handles implicit conversion from simdjson_result<document> to document. This could be compiler-specific behavior. Using parentheses initialization to ensure the conversion operator is called properly. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

The issue was that we were calling m_parser.iterate() after moving the parser, which could leave it in an invalid state. In C++20, this might behave differently than C++17. Fixed by reordering the member initializer list to call parser.iterate() BEFORE moving the parser into m_parser. This ensures the document is created while the parser is still valid. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

The compiler was warning about member initialization order mismatch. C++ initializes members in the order they are declared in the class, not the order they appear in the initializer list. Fixed by reordering member declarations to match the initialization order needed: m_doc must be initialized before m_parser since we need to call parser.iterate() before moving the parser. This fixes the -Werror=reorder compilation error in CI. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

The issue was that we were trying to initialize ondemand::document directly from simdjson_result<ondemand::document> in the member initializer list. This caused a segfault in C++20 builds. The fix explicitly handles the simdjson_result in the constructor body, checking for errors and using value_unsafe() to extract the document. This avoids potential issues with implicit conversions and ensures proper error handling. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

This commit addresses multiple issues: 1. Fixed -Werror=effc++ warnings by using #pragma to disable the warning for constructors that cannot initialize all members in the member initialization list due to error handling requirements. 2. Added proper error tracking (m_error member) to handle cases where document initialization fails, preventing segfaults when using invalid documents. 3. Fixed lifetime issues in tests where temporary auto_parser objects were being used, causing dangling references. Tests now properly store the parser object before using it. 4. Simplified range adaptor tests that were expecting features not yet implemented in simdjson's ondemand API. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

Resolved conflicts by regenerating the amalgamated single-header files (simdjson.h, simdjson.cpp, and singleheader.zip) using the amalgamate.py script after merging latest changes from master. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

FranciscoThiesen · 2025-08-03T23:22:03Z

This PR builds on top of the changes proposed by @the-moisrex , mostly by fixing the CI pipeline issues.

FranciscoThiesen · 2025-08-03T23:22:32Z

@the-moisrex your feedback/review would be appreciated here!

the-moisrex · 2025-08-04T01:26:11Z

@FranciscoThiesen thanks for working on this; I got a bit side-tracked with my other project (which itself is a side-track anyway), hope I can get back to it.

What do you think for APIs for objects? Also, do you think we need a custom query language?

I mean, this is cool for simple things, but if what we want is nested deep in somewhere, we should be able to find it easier.

  for (Car const car : from(json_cars) | to<Car>) { ... }

Maybe something like this:

  // maybe a query language? (maybe we could even put the parsing of the query at compile time?)
  for (Car const car : from(json_cars) | query("users.cars") | to<Car>) { ... }

or if we have something like this:

  // to get all cars of all users:
  // {users: [{cars: [...] }, {...}]}
  for (Car const car : from(json_cars)["users"] | extract("cars") | join | to<Car>) { ... }

Let me know crazy use cases that can come to mind, so we can come up with better APIs.

FranciscoThiesen · 2025-08-04T12:54:04Z

@FranciscoThiesen thanks for working on this; I got a bit side-tracked with my other project (which itself is a side-track anyway), hope I can get back to it.

What do you think for APIs for objects? Also, do you think we need a custom query language?

I mean, this is cool for simple things, but if what we want is nested deep in somewhere, we should be able to find it easier.
  for (Car const car : from(json_cars) | to<Car>) { ... }
Maybe something like this:
  // maybe a query language? (maybe we could even put the parsing of the query at compile time?)
  for (Car const car : from(json_cars) | query("users.cars") | to<Car>) { ... }
or if we have something like this:
  // to get all cars of all users:
  // {users: [{cars: [...] }, {...}]}
  for (Car const car : from(json_cars)["users"] | extract("cars") | join | to<Car>) { ... }
Let me know crazy use cases that can come to mind, so we can come up with better APIs.

@lemire what do you think? You are definitely more qualified to answer about crazy use-cases and maybe a custom query language! (I definitely think there is enough information in this PR already, but it doesn't hurt to think about a few possible enhancements for later)

…for SIMDJSON_CLANG_VISUAL_STUDIO Added (void)position; to suppress unused parameter warning when compiling with SIMDJSON_CLANG_VISUAL_STUDIO defined, where the position parameter isn't used in the SIMDJSON_ASSUME statements. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

…r cases The test was failing in CI with g++-13 because it only handled the exception case. However, the array() method is marked noexcept and returns a simdjson_result that may contain an error code instead of throwing an exception. This fix checks for both cases: 1. If array_result.error() is not SUCCESS, verify it's INCORRECT_TYPE 2. If no error is returned initially, the exception may be thrown when iterating over the result This ensures the test passes regardless of whether the error is reported via error code or exception. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>

include/simdjson/convert.h

tests/ondemand/ondemand_convert_tests.cpp

the-moisrex · 2025-08-04T01:09:41Z

include/simdjson/convert.h

+  simdjson_inline auto_iterator begin() noexcept {
+    if (m_error != SUCCESS) {
+      // Create an iterator with the error
+      iter_storage.m_iter = iterator::type(m_error);
+      iter_storage.m_value = value_type{};
+      return auto_iterator{iter_storage};
+    }
+    if (iter_storage.m_iter.error() != SUCCESS &&
+        !iter_storage.m_iter.at_end()) {
+      // Try to get the document as an array
+      auto array_result = m_doc.get_array();
+      if (array_result.error() == SUCCESS) {
+        iter_storage = {.m_iter = iterator::type{array_result.value_unsafe().begin()},
+                        .m_value = iterator::value_type{
+                            iter_storage.m_iter.at_end() ||
+                                    iter_storage.m_iter.error() != SUCCESS
+                                ? value_type{}
+                                : *iter_storage.m_iter}};
+      } else {
+        // If it's not an array, create an error iterator
+        iter_storage.m_iter = iterator::type(array_result.error());
+        iter_storage.m_value = value_type{};
+      }
+    }
+    return auto_iterator{iter_storage};
+  }


Okay, .begin() is getting too bloated, specially since ranges algorithms will call it multiple times.

I had to cache it, and I didn't like doing that, but that's the limitation for now. We gotta think of something!

I agree this is not ideal, afaik there are 3 other approaches we can take here:

Potential Solutions:

Eager Initialization (Breaking change)
Move the array initialization to the constructor:

explicit auto_parser(padded_string_view str) { auto doc_result = m_parser.iterate(str); if (doc_result.error() == SUCCESS) { m_doc = std::move(doc_result.value_unsafe()); // Try to initialize as array immediately auto array_result = m_doc.get_array(); if (array_result.error() == SUCCESS) { m_begin_iter = array_result.value_unsafe().begin(); m_is_array = true; } } }

Separate Array Parser
Create a dedicated auto_array_parser for range iteration:

template<typename ParserType> struct auto_array_parser { // Focused only on array iteration // No lazy initialization needed };

Remove Range Support (Simplest)
Make auto_parser non-iterable and require explicit .array() call:

// Instead of: for (auto val : parser) // Use: for (auto val : parser.array())

I am not sure if I have a favorite at this point.

Moving it to the constructor would be a good idea if there was a way to know peak ahead and see the type of the current value, so we wouldn't be just calling .iterate just because we feel like it, but because we have seen [ and that only means we're seeing an array; the same goes for objects.

Separating the array parser is essentially the same as just calling .array() I guess? Or we'd be doing what we're doing in the .begin in the auto_array_parser. Could work; but I hoped we wouldn't have to do this since it affects the API.

We can peek. Let us do that.

Ok. I look at this and I am not sure I understand the issue. We can easily determine the type at the level of the constructor:

m_error = m_parser.iterate(str).get(m_doc); if(m_error == SUCCESS) { m_error = m_doc.type().get(type); } if(type == ondemand::json_type::array) { initialize_iterator(); }

But what is the point?

Okay, .begin() is getting too bloated, specially since ranges algorithms will call it multiple times.

I am not sure I understand. This is thin code.

@lemire Codes like from(...) | filter(...) | filter(...) | transform(...) will call .begin() multiple times.

If we cache .begin()'s result, this type of code will be invalid:

auto iter = from(...); for (auto val : iter) { ... } for (auto val : iter) { ... }

If we don't cache it, ranges codes like | filter | filter will have to call re-parse to the beginning multiple times.

It's a performance problem.

In the code above, I cached the iterator outside auto_iterator this way to make sure the .begin calls wouldn't do too much work, but it does make the from(...) calls non-reusable.

Essentially, it's not a very good idea for .begin(), .end(), iterator's constructor, and iterator's operator== to do any work.

That's why I used Sentinels for the .end(), because that would've made operator== and .end() (which will be called multiple times too), do a bit of work as well.

What would...

auto iter = from(...); for (auto val : iter) { ... } for (auto val : iter) { ... }

do?

Go through the JSON document multiple times, reparsing it twice ?

Nope, it does it once.
They share the same iterator!
This would call .begin twice, but the same iterator would be returned, at the same place during parsing.

include/simdjson/convert.h

tests/ondemand/ondemand_convert_tests.cpp

the-moisrex · 2025-08-06T00:07:31Z

Apparently I forgot to submit the reviews!

lemire · 2025-08-07T01:36:14Z

For the record, I am on this PR.

lemire · 2025-08-08T12:40:58Z

@FranciscoThiesen I recommend merging this as soon as you are happy. I think we could release it as part of 4.0.0 unless someone has an objection.

@the-moisrex has some performance concerns but as long as these concerns are not tied to the interface, it should be fine to release. (We can update the implementation later.)

the-moisrex and others added 23 commits July 19, 2025 03:58

Basic Auto Parser

3166253

Moving ondemand::document into auto_parser

11f273c

Auto Iterator

0a82fb1

Removing unneeded code

8d53840

Make auto_parser and auto_iterator comply with ranges.

9d9f242

From/To adaptors

228501f

Moving iterator's storage to auto_parser

5eec29a

test for no_errors

d43fb6f

Fix preprocessor check for __cpp_lib_ranges_zip

e94825c

Add defined() check before comparing the value to avoid preprocessor errors in compilers where this macro doesn't exist (like g++-13 with certain configurations).

Simplify ranges feature detection for C++23

114f149

Only enable range_adaptor_closure features when compiling with C++23 or later, as this feature is not available in C++20 implementations.

Update amalgamated files with convert.h fixes

4794b5d

Regenerate singleheader/simdjson.cpp and singleheader/simdjson.h to include all the fixes for CI compatibility issues.

Disable C++23 ranges features to fix CI compatibility

1b59b38

The ranges features were causing compatibility issues across different compilers and platforms. Disabling them for now until C++23 support is more widespread. This should fix the remaining Ubuntu and Windows CI failures.

Disable ranges-dependent tests to fix compilation

76ed73f

The test_no_errors() and to_clean_array() tests depend on the C++23 ranges features that we disabled. This commit conditionally compiles these tests out when ranges support is disabled.

FranciscoThiesen requested a review from lemire August 3, 2025 23:20

FranciscoThiesen and others added 2 commits August 5, 2025 16:29

Adding a few tests for the simdjson::to adapter.

484a092

the-moisrex reviewed Aug 6, 2025

View reviewed changes

Addressing @the-moisrex review.

e878dc6

lemire added 4 commits August 7, 2025 11:39

removing macros

26abf1d

removing another pragma and some code simplification.

74fb3ec

more code simplification.

64009f7

minor fix

fcae795

lemire mentioned this pull request Aug 7, 2025

Auto Convert (work in progress) #2398

Closed

10 tasks

lemire added 2 commits August 7, 2025 18:11

added a new test.

6c4c934

minor fixes

4c0f86d

lemire approved these changes Aug 8, 2025

View reviewed changes

updating commit

686a186

FranciscoThiesen merged commit 3e00a43 into master Aug 8, 2025
143 checks passed

FranciscoThiesen deleted the fix-convert-ci-failures branch August 8, 2025 17:42

BrewTestBot mentioned this pull request Sep 12, 2025

simdjson 4.0.1 Homebrew/homebrew-core#238711

Closed

Conversation

FranciscoThiesen commented Aug 2, 2025 • edited by lemire Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FranciscoThiesen commented Aug 3, 2025

Uh oh!

FranciscoThiesen commented Aug 3, 2025

Uh oh!

the-moisrex commented Aug 4, 2025

Uh oh!

FranciscoThiesen commented Aug 4, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lemire Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

the-moisrex commented Aug 6, 2025

Uh oh!

lemire commented Aug 7, 2025

Uh oh!

lemire commented Aug 8, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

FranciscoThiesen commented Aug 2, 2025 •

edited by lemire

Loading

lemire Aug 8, 2025 •

edited

Loading