Add FracturedJson formatting support for DOM serialization#2580
|
Ah Ah. This would be step 1. Later we would want it to work with the new builder component. |
|
I recently moved the builder API in a separate directory (for clarity). |
Implements FracturedJson formatting as requested in issue #2576. FracturedJson produces human-readable yet compact JSON output by intelligently choosing between different layout strategies based on content complexity, length, and structure similarity.

Key features:
- Four layout modes: inline, compact multiline, table, and expanded
- Structure analysis pass to compute metrics before formatting
- Table formatting for arrays of similar objects with column alignment
- Configurable options for line length, indentation, padding, etc.

New files:
- fractured_json.h: Public API with fractured_json_options struct
- fractured_json-inl.h: Implementation (~1000 lines)
- json_structure_analyzer.h: Structure analysis for layout decisions
- fractured_formatter.h: Formatter class using CRTP pattern

Usage:

    dom::parser parser;
    element doc = parser.parse(json_string);
    std::cout << fractured_json(doc) << std::endl;

    // Or with custom options:
    fractured_json_options opts;
    opts.indent_spaces = 2;
    std::cout << fractured_json(doc, opts) << std::endl;

    // Or format any JSON string (useful with reflection API):
    auto formatted = fractured_json_string(minified_json);

Resolves #2576
Adds 27 test cases covering all aspects of the FracturedJson formatter:

Core functionality tests (13):
- Roundtrip parsing verification
- Inline formatting for simple arrays and objects
- Expanded formatting for complex nested structures
- Compact multiline arrays with configurable items per line
- Table formatting for uniform arrays of objects
- Empty container handling
- All scalar types (string, int, uint, double, bool, null)
- String escaping (quotes, backslashes, control characters)
- Custom indentation options
- Deep nesting (10+ levels)
- Mixed type arrays

Edge case tests (11):
- Unicode strings (Chinese, emoji, Arabic, Russian, accented chars)
- Boundary numbers (INT64_MIN/MAX, UINT64_MAX, DBL_MIN/MAX)
- Nested arrays (arrays of arrays)
- Empty string values
- Keys with special characters (spaces, quotes, colons, etc.)
- Non-uniform arrays (should not trigger table mode)
- Very long strings (500+ chars)
- Large arrays (100 elements)
- Reflection API workflow simulation
- Control characters (tab, newline, CR, null)
- Single element containers

Option tests (3):
- Disable compact multiline mode
- Disable table format mode
- Disable all padding options
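The "uniform" versus "non-uniform" distinction in these tests can be illustrated with a small key-overlap check. This is only a sketch under stated assumptions: the thread does not show how compute_object_similarity() scores two objects, so Jaccard similarity over key sets is swapped in here, and `key_set`, `key_similarity`, and `is_uniform` are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the keys of one JSON object.
using key_set = std::set<std::string>;

// Jaccard similarity |A ∩ B| / |A ∪ B| (an assumed metric, not
// necessarily what the PR's compute_object_similarity() uses).
inline double key_similarity(const key_set &a, const key_set &b) {
  if (a.empty() && b.empty()) { return 1.0; }
  std::size_t shared = 0;
  for (const auto &key : a) { shared += b.count(key); }
  const std::size_t total = a.size() + b.size() - shared;
  return static_cast<double>(shared) / static_cast<double>(total);
}

// An array qualifies for table layout only if every object is
// similar enough to the first one.
inline bool is_uniform(const std::vector<key_set> &objects, double threshold) {
  if (objects.size() < 2) { return false; }
  for (std::size_t i = 1; i < objects.size(); ++i) {
    if (key_similarity(objects[0], objects[i]) < threshold) { return false; }
  }
  return true;
}
```

Comparing every object against the first keeps the check linear in the array length, at the cost of being sensitive to an unusual first element.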
Extends FracturedJson to work seamlessly with the builder API, enabling
formatted output directly from C++ structs using static reflection.
New functions:
- to_fractured_json_string(obj, opts) - serialize struct to formatted JSON
- to_fractured_json(obj, output, opts) - same with output parameter
- extract_fractured_json<fields...>(obj, opts) - format only specific fields
These functions combine the builder's reflection-based serialization with
FracturedJson formatting in a single convenient call:
struct User { int id; std::string name; bool active; };
User user{1, "Alice", true};
// Minified output (existing):
auto minified = to_json_string(user);
// {"id":1,"name":"Alice","active":true}
// Formatted output (new):
auto formatted = to_fractured_json_string(user);
// { "id": 1, "name": "Alice", "active": true }
// Partial extraction with formatting:
auto partial = extract_fractured_json<"id", "name">(user);
// { "id": 1, "name": "Alice" }
New files:
- generic/builder/fractured_json_builder.h - builder integration
- tests/builder/static_reflection_fractured_json_tests.cpp - 7 tests
946e0d0 to
a147fce
Compare
|
@lemire I've added the builder/reflection API integration as requested.

The implementation combines the builder's reflection-based serialization with FracturedJson formatting:

    struct User { int id; std::string name; bool active; };
    User user{1, "Alice", true};
    // One-liner for formatted output from any struct
    auto formatted = to_fractured_json_string(user);
    // { "id": 1, "name": "Alice", "active": true }

I've also added 7 tests for the builder integration. The PR has been rebased on the latest master, which includes the builder directory reorganization (#2578). |
|
Wow. |
- Fix undefined behavior when negating INT64_MIN in estimate_number_length() and measure_value_length() by returning 20 (the exact length of the string representation) directly
- Actually use table_similarity_threshold in check_array_uniformity() by calling compute_object_similarity() to compare objects against the first object in the array
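For readers unfamiliar with the INT64_MIN pitfall: `-v` overflows when `v` is the minimum 64-bit value, because its magnitude is not representable as a positive int64_t. A minimal sketch of the special-casing described above follows; `signed_length` and `count_digits` are illustrative names, and the real signatures of estimate_number_length() and measure_value_length() are not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Number of decimal digits needed to print an unsigned value.
inline std::size_t count_digits(std::uint64_t v) {
  std::size_t digits = 1;
  while (v >= 10) {
    v /= 10;
    ++digits;
  }
  return digits;
}

// Printed length of a signed value, including the minus sign.
inline std::size_t signed_length(std::int64_t v) {
  // Negating INT64_MIN is undefined behavior, so return the known
  // length of "-9223372036854775808" (19 digits plus the sign).
  if (v == (std::numeric_limits<std::int64_t>::min)()) { return 20; }
  if (v < 0) { return 1 + count_digits(static_cast<std::uint64_t>(-v)); }
  return count_digits(static_cast<std::uint64_t>(v));
}
```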
4c96aaf to
e5fb747
Compare
Initialize all member variables in member initialization lists to
satisfy GCC's -Werror=effc++ flag:
- element_metrics::common_keys - add {} default initializer
- structure_analyzer - add default constructor with member init list
- fractured_formatter - add column_widths_{} to constructor
- fractured_string_builder - add analyzer_{} to constructor
The class has a pointer member (current_opts_) which triggers -Werror=effc++ requiring explicit copy/move operations. Delete copy operations (class shouldn't be copied due to cache) and default move operations.
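A sketch of the pattern this commit describes, for reference. The commit message does not name the class, so `formatter_stub` and `options_stub` are hypothetical; only the current_opts_ member name comes from the message.

```cpp
#include <type_traits>

// Hypothetical stand-in for the options struct.
struct options_stub {
  int indent_spaces = 4;
};

// Hypothetical class with a non-owning pointer member. With
// -Weffc++, GCC expects the copy operations to be declared
// explicitly for such a class.
class formatter_stub {
public:
  formatter_stub() : current_opts_(nullptr) {}

  // Copying is disallowed; moving keeps the compiler-generated behavior.
  formatter_stub(const formatter_stub &) = delete;
  formatter_stub &operator=(const formatter_stub &) = delete;
  formatter_stub(formatter_stub &&) = default;
  formatter_stub &operator=(formatter_stub &&) = default;

  void set_options(const options_stub *opts) { current_opts_ = opts; }
  bool has_options() const { return current_opts_ != nullptr; }

private:
  const options_stub *current_opts_; // non-owning
};

static_assert(!std::is_copy_constructible<formatter_stub>::value,
              "copying should be disabled");
static_assert(std::is_move_constructible<formatter_stub>::value,
              "moving should remain available");
```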
Windows.h defines max/min macros that interfere with std::max/std::min. Wrapping in parentheses as (std::max)(...) prevents macro expansion.
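The parenthesization trick works because a function-like macro is only expanded when its name is directly followed by an opening parenthesis. The sketch below defines a `max` macro by hand to simulate the Windows.h behavior (an assumption made so the example builds anywhere); `widest` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>

// Simulates the function-like macro that Windows.h defines when
// NOMINMAX is not set. Defined by hand here purely for illustration.
#define max(a, b) (((a) > (b)) ? (a) : (b))

// In (std::max)(a, b) the token after `max` is `)`, not `(`, so the
// preprocessor leaves it alone and the real std::max is called.
inline std::size_t widest(std::size_t a, std::size_t b) {
  return (std::max)(a, b);
}

#undef max // keep the simulated macro from leaking into later code
```

Defining NOMINMAX before including Windows.h is the other common remedy, but a header-only library cannot rely on its users doing so.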
GCC 15 on MINGW64 gives a false positive warning in parser_moving_parser() when the std::vector<std::string> goes out of scope. Suppress this specific warning with a pragma for GCC builds.
Code review
Found 1 issue:
The caching mechanism in
This breaks the two-phase design where phase 1 analyzes structure and phase 2 uses those metrics for layout decisions.
simdjson/include/simdjson/dom/fractured_json-inl.h, lines 82 to 94 in ce5b45a
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
The cache was using element addresses as keys, but dom::element objects are lightweight wrappers that get copied during iteration, causing different addresses between analysis and formatting phases. This resulted in cache misses and fallback to empty metrics.

Solution: Store child metrics in the element_metrics struct and pass them through recursive calls, eliminating the need for address-based caching entirely.

Changes:
- Add children vector to element_metrics for hierarchical metrics
- Remove metrics_cache_ and related get_metrics/has_metrics methods
- Update all format functions to accept and pass child metrics
- Add public analyze_array/analyze_object overloads for standalone use
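The shape of this fix can be sketched with a toy tree type. Everything here is an assumption made for illustration: `node` stands in for dom::element, `metrics` only mirrors the idea of element_metrics with a children vector, and the 20-character inline limit is arbitrary. The point is that the formatter walks the metrics tree in lockstep with the value tree, so no address-keyed lookup is needed.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy value type: either a scalar token or an array of nodes.
struct node {
  bool is_array = false;
  std::string scalar;
  std::vector<node> children;

  static node leaf(std::string s) {
    node n;
    n.scalar = std::move(s);
    return n;
  }
  static node array(std::vector<node> c) {
    node n;
    n.is_array = true;
    n.children = std::move(c);
    return n;
  }
};

// Metrics carry their children's metrics, mirroring the value tree.
struct metrics {
  std::size_t inline_length = 0;
  bool can_inline = true;
  std::vector<metrics> children;
};

// Phase 1: compute metrics bottom-up.
inline metrics analyze_node(const node &n, std::size_t max_inline = 20) {
  metrics m;
  if (!n.is_array) {
    m.inline_length = n.scalar.size();
    return m;
  }
  m.inline_length = 2; // "[" and "]"
  for (std::size_t i = 0; i < n.children.size(); ++i) {
    metrics child = analyze_node(n.children[i], max_inline);
    m.inline_length += child.inline_length + (i > 0 ? 2 : 0); // ", "
    m.can_inline = m.can_inline && child.can_inline;
    m.children.push_back(std::move(child));
  }
  m.can_inline = m.can_inline && m.inline_length <= max_inline;
  return m;
}

// Phase 2: format using the metrics passed down the recursion.
inline void format_node(const node &n, const metrics &m, std::size_t depth,
                        std::string &out) {
  if (!n.is_array) {
    out += n.scalar;
    return;
  }
  const std::size_t count = n.children.size();
  if (m.can_inline) {
    out += "[";
    for (std::size_t i = 0; i < count; ++i) {
      if (i > 0) { out += ", "; }
      format_node(n.children[i], m.children[i], depth, out);
    }
    out += "]";
    return;
  }
  out += "[\n";
  for (std::size_t i = 0; i < count; ++i) {
    out.append((depth + 1) * 2, ' ');
    format_node(n.children[i], m.children[i], depth + 1, out);
    out += (i + 1 < count) ? ",\n" : "\n";
  }
  out.append(depth * 2, ' ');
  out += "]";
}
```

Because `m.children[i]` is indexed the same way as `n.children[i]`, copies of the lightweight wrapper made during iteration no longer matter.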
Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
Add entries for node_modules, package-lock.json, Rust target directories, local ablation artifacts, and generated documentation files.
Extract common scalar type handling (STRING, INT64, UINT64, DOUBLE, BOOL, NULL_VALUE) into a dedicated analyze_scalar method. Each scalar type shares the same initialization pattern for complexity, child_count, can_inline, and recommended_layout. Also simplify boolean formatting in format_scalar to use ternary operator.
Reformat cramped is_amalgamator condition to multi-line for readability. Fix duplicate error message text in _included_filename_root and use correct variable name (relative_root instead of root).
Extract repeated newline counting loop into a reusable static helper function, used by inline_array_test, inline_object_test, and expanded_test.
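Such a helper might look like the following. The name `count_newlines` is a guess, since the commit message does not give the helper's actual name or signature.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical test helper: number of '\n' characters in formatted
// output, useful for asserting that a layout stayed on one line or
// was expanded across several.
static std::size_t count_newlines(const std::string &text) {
  return static_cast<std::size_t>(std::count(text.begin(), text.end(), '\n'));
}
```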
This reverts commit 4760ea7.
|
@FranciscoThiesen Fantastic. I am currently travelling and this is a major PR so I want to wait to be back before reviewing it. Won't be long. On my todo. |
various minor changes to fractured JSON support
|
Merged. |
Summary
Implements FracturedJson formatting as requested in issue #2576. FracturedJson produces human-readable yet compact JSON output by intelligently choosing between different layout strategies based on content complexity, length, and structure similarity.
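To make the layout choice concrete, here is a rough sketch of how a formatter could pick among the four modes. It is not the decision logic from this PR: the type names, fields, thresholds, and the order of the checks are all assumptions chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical names; `inline` is a keyword, hence inline_mode.
enum class layout_mode { inline_mode, compact_multiline, table, expanded };

// Hypothetical summary produced by a structure analysis pass.
struct container_summary {
  std::size_t inline_length; // characters needed on a single line
  std::size_t nesting_depth; // 0 when all children are scalars
  bool all_children_scalar;
  bool uniform_objects;      // array of objects with similar keys
};

// Hypothetical limits; the defaults are arbitrary.
struct layout_limits {
  std::size_t max_line_length = 80;
  std::size_t max_inline_depth = 1;
};

inline layout_mode choose_layout(const container_summary &c,
                                 const layout_limits &limits = layout_limits()) {
  // Short, shallow containers fit on one line.
  if (c.inline_length <= limits.max_line_length &&
      c.nesting_depth <= limits.max_inline_depth) {
    return layout_mode::inline_mode;
  }
  // Arrays of similar objects read best as aligned rows.
  if (c.uniform_objects) { return layout_mode::table; }
  // Long runs of scalars are packed several per line.
  if (c.all_children_scalar) { return layout_mode::compact_multiline; }
  // Everything else gets one child per line.
  return layout_mode::expanded;
}
```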
Key Features
Example Output
Inline mode (simple containers):
    { "id": 1, "name": "Alice", "active": true }
Table mode (uniform arrays of objects):
    [
      { "id": 1, "name": "Alice", "score": 95 },
      { "id": 2, "name": "Bob"  , "score": 87 },
      { "id": 3, "name": "Carol", "score": 92 }
    ]
Compact multiline (arrays of simple elements):
    [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ]
API
DOM API
Builder/Reflection API (new!)
New Files
- dom/fractured_json.h (fractured_json_options struct)
- dom/fractured_json-inl.h
- internal/json_structure_analyzer.h
- internal/fractured_formatter.h
- generic/builder/fractured_json_builder.h
- tests/fractured_json_tests.cpp
- tests/builder/static_reflection_fractured_json_tests.cpp
Test plan
Resolves #2576