perf(kv-ir): Use string-blob-based encoded text AST implementation in kv-ir deserialization to avoid small string allocation overhead (resolves #1541).#1607
Conversation
WalkthroughMoves EncodedTextAst from clp::ir to clp::ffi backed by StringBlob, adds parse/encode and to_string APIs, introduces Result/std_checked error returns for encode/decode/deserialization, adds an ffi overload of encode_and_add_to_dictionary returning Result<size_t>, updates call sites/tests/CMake, and removes one search error enum member. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant FFI_AST as ffi::EncodedTextAst
participant StringBlob
participant Deserializer
participant DictWriter
Caller->>FFI_AST: parse_and_encode_from(text)
activate FFI_AST
note right of FFI_AST `#f0f9ff`: encode_message -> encoded_vars, dict bounds
FFI_AST->>StringBlob: build contiguous buffer (vars + logtype)
FFI_AST-->>Caller: EncodedTextAst (ok) / error (Result)
deactivate FFI_AST
Caller->>Deserializer: deserialize_encoded_text_ast(reader, tag)
activate Deserializer
Deserializer->>Deserializer: read encoded_vars & dict entries
Deserializer->>StringBlob: append slices (logtype + vars)
Deserializer-->>Caller: std_checked<EncodedTextAst, IRErrorCode> (ok | error)
deactivate Deserializer
Caller->>FFI_AST: to_string()
activate FFI_AST
FFI_AST->>StringBlob: assemble string from slices
FFI_AST-->>Caller: string (ok) or error (Result)
deactivate FFI_AST
Caller->>DictWriter: encode_and_add_to_dictionary(ffi::EncodedTextAst)
activate DictWriter
DictWriter->>FFI_AST: decode slices -> variables
DictWriter->>DictWriter: populate encoded_vars, var_ids, update dict entry
DictWriter-->>Caller: Result<size_t> (raw byte estimate | error)
deactivate DictWriter
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Areas needing focused review:
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (25)
components/core/src/clp/EncodedVariableInterpreter.hpp(4 hunks)components/core/src/clp/clg/CMakeLists.txt(1 hunks)components/core/src/clp/clo/CMakeLists.txt(1 hunks)components/core/src/clp/clp/CMakeLists.txt(2 hunks)components/core/src/clp/ffi/EncodedTextAst.hpp(4 hunks)components/core/src/clp/ffi/KeyValuePairLogEvent.cpp(3 hunks)components/core/src/clp/ffi/StringBlob.hpp(1 hunks)components/core/src/clp/ffi/Value.hpp(3 hunks)components/core/src/clp/ffi/ir_stream/decoding_methods.cpp(4 hunks)components/core/src/clp/ffi/ir_stream/decoding_methods.hpp(2 hunks)components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp(3 hunks)components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp(0 hunks)components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp(0 hunks)components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp(5 hunks)components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp(1 hunks)components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp(3 hunks)components/core/src/clp/ffi/ir_stream/search/test/utils.hpp(0 hunks)components/core/src/clp/ffi/ir_stream/search/utils.cpp(3 hunks)components/core/src/clp_s/CMakeLists.txt(2 hunks)components/core/src/clp_s/ColumnWriter.cpp(2 hunks)components/core/src/clp_s/JsonParser.cpp(2 hunks)components/core/src/clp_s/ParsedMessage.hpp(2 hunks)components/core/src/clp_s/indexer/CMakeLists.txt(3 hunks)components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp(2 hunks)components/core/tests/test-ffi_KeyValuePairLogEvent.cpp(4 hunks)
💤 Files with no reviewable changes (3)
- components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
- components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
- components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/src/clp_s/search/EvaluateRangeIndexFilters.cppcomponents/core/src/clp_s/ParsedMessage.hppcomponents/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/EncodedTextAst.hppcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/EncodedVariableInterpreter.hppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/ffi/StringBlob.hppcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/search/utils.cppcomponents/core/src/clp/ffi/Value.hppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.hppcomponents/core/src/clp_s/ColumnWriter.cppcomponents/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cppcomponents/core/src/clp_s/JsonParser.cpp
🧠 Learnings (45)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.hppcomponents/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/src/clp_s/search/EvaluateRangeIndexFilters.cppcomponents/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/EncodedVariableInterpreter.hppcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/src/clp_s/search/EvaluateRangeIndexFilters.cppcomponents/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cppcomponents/core/src/clp/ffi/ir_stream/search/utils.cppcomponents/core/src/clp_s/ColumnWriter.cppcomponents/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
📚 Learning: 2025-04-26T02:21:22.021Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/src/clp_s/search/EvaluateRangeIndexFilters.cppcomponents/core/src/clp/EncodedVariableInterpreter.hppcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cppcomponents/core/src/clp/ffi/ir_stream/search/utils.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.hppcomponents/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/src/clp_s/ParsedMessage.hppcomponents/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/EncodedVariableInterpreter.hppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/search/utils.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.hppcomponents/core/src/clp_s/ColumnWriter.cppcomponents/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2025-09-15T18:53:15.844Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1226
File: taskfiles/deps/main.yaml:522-527
Timestamp: 2025-09-15T18:53:15.844Z
Learning: When ystdlib is built with -Dystdlib_BUILD_TESTING=OFF in taskfiles/deps/main.yaml, it does not require Catch2 settings to be injected via CMAKE_GEN_ARGS since testing is disabled and Catch2 is not needed during the build process.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/src/clp_s/indexer/CMakeLists.txtcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
📚 Learning: 2024-11-26T19:16:19.933Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
📚 Learning: 2024-10-07T20:10:08.254Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cppcomponents/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp_s/indexer/CMakeLists.txtcomponents/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.
Applied to files:
components/core/src/clp_s/search/EvaluateRangeIndexFilters.cppcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_utils.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Applied to files:
components/core/src/clp_s/search/EvaluateRangeIndexFilters.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Applied to files:
components/core/src/clp_s/search/EvaluateRangeIndexFilters.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Applied to files:
components/core/src/clp_s/search/EvaluateRangeIndexFilters.cppcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/EncodedVariableInterpreter.hppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.hppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Applied to files:
components/core/src/clp_s/search/EvaluateRangeIndexFilters.cppcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/EncodedVariableInterpreter.hppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2025-08-25T07:27:19.449Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1262
File: docs/src/dev-guide/components-core/index.md:50-50
Timestamp: 2025-08-25T07:27:19.449Z
Learning: In the CLP project, nlohmann_json version 3.11.3 is installed through their dependency management process, but they don't enforce version constraints in CMake find_package calls - they control versions through installation rather than CMake version pinning.
Applied to files:
components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.
Applied to files:
components/core/src/clp_s/ParsedMessage.hppcomponents/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/clp/CMakeLists.txtcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/EncodedVariableInterpreter.hppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp_s/indexer/CMakeLists.txtcomponents/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-14T03:43:40.364Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:228-236
Timestamp: 2024-10-14T03:43:40.364Z
Learning: In `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, helper functions like `assert_kv_pair_log_event_creation_failure` return booleans to the caller, and the caller asserts using `REQUIRE` to provide clearer failure information.
Applied to files:
components/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cpp
📚 Learning: 2024-10-14T03:42:53.145Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:85-103
Timestamp: 2024-10-14T03:42:53.145Z
Learning: The function `assert_kv_pair_log_event_creation_failure` is correctly placed within the anonymous namespace in `test-ffi_KeyValuePairLogEvent.cpp`.
Applied to files:
components/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cpp
📚 Learning: 2024-09-27T23:07:22.429Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:131-139
Timestamp: 2024-09-27T23:07:22.429Z
Learning: In the IR encoding method tests, `int8_t` is used for byte buffers instead of `uint8_t`.
Applied to files:
components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/EncodedTextAst.hppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/ffi/Value.hpp
📚 Learning: 2024-12-09T15:25:14.043Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
Applied to files:
components/core/tests/test-ffi_KeyValuePairLogEvent.cppcomponents/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-13T09:29:56.553Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.
Applied to files:
components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Applied to files:
components/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/clg/CMakeLists.txtcomponents/core/src/clp/clp/CMakeLists.txtcomponents/core/src/clp_s/indexer/CMakeLists.txtcomponents/core/src/clp/clo/CMakeLists.txt
📚 Learning: 2025-07-01T14:49:07.333Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.
Applied to files:
components/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/clp/CMakeLists.txtcomponents/core/src/clp_s/indexer/CMakeLists.txtcomponents/core/src/clp/ffi/Value.hpp
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
components/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/clg/CMakeLists.txtcomponents/core/src/clp/clp/CMakeLists.txt
📚 Learning: 2024-10-14T03:39:35.585Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
Applied to files:
components/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-11-12T18:47:03.828Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Applied to files:
components/core/src/clp_s/CMakeLists.txtcomponents/core/src/clp/ffi/KeyValuePairLogEvent.cpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/core/src/clp/clg/CMakeLists.txtcomponents/core/src/clp/clp/CMakeLists.txtcomponents/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp/clp/CMakeLists.txtcomponents/core/src/clp_s/indexer/CMakeLists.txtcomponents/core/src/clp/ffi/ir_stream/search/utils.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.hppcomponents/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-10T15:19:52.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:109-111
Timestamp: 2024-10-10T15:19:52.408Z
Learning: In `KeyValuePairLogEvent.cpp`, for the class `JsonSerializationIterator`, it's acceptable to use raw pointers for member variables like `m_schema_tree_node`, and there's no need to replace them with references or smart pointers in this use case.
Applied to files:
components/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-14T03:38:57.921Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.hpp:62-68
Timestamp: 2024-10-14T03:38:57.921Z
Learning: In `KeyValuePairLogEvent.hpp`, the methods `get_auto_generated_schema_tree()` and `get_user_generated_schema_tree()` do not need to check for null shared pointers because the factory function ensures they are not null.
Applied to files:
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:629-733
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the `JsonParser::parse_kv_log_event` method in `components/core/src/clp_s/JsonParser.cpp`, prefer passing exceptions to higher levels for more graceful handling, rather than handling them at that level.
Applied to files:
components/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Applied to files:
components/core/src/clp/ffi/KeyValuePairLogEvent.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.
Applied to files:
components/core/src/clp/EncodedVariableInterpreter.hppcomponents/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cppcomponents/core/src/clp/ffi/ir_stream/search/utils.cpp
📚 Learning: 2025-05-11T22:22:49.286Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2025-05-07T16:56:53.901Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/utils.cpp
📚 Learning: 2025-08-13T15:25:40.805Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:49-127
Timestamp: 2025-08-13T15:25:40.805Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, the FormattedFloatColumnWriter::add_value method intentionally allows std::stod to throw exceptions (std::invalid_argument or std::out_of_range) without catching them, as this is designed to terminate decompression when encountering invalid float strings for data integrity purposes.
Applied to files:
components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:265-275
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `generate_IR`, the `run_serializer` function is already a template, and the branching between `int32_t` and `int64_t` occurs outside the loop, making further template-based refactoring unnecessary.
Applied to files:
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 544
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:230-261
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `deserialize_int_val`, the integer type needs to be determined at runtime, so refactoring into a templated helper function does not simplify the logic.
Applied to files:
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
🧬 Code graph analysis (10)
components/core/tests/test-ffi_KeyValuePairLogEvent.cpp (1)
components/core/src/clp/ffi/Value.hpp (5)
Value(133-133)Value(142-144)Value(142-142)Value(151-151)Value(151-151)
components/core/src/clp/ffi/EncodedTextAst.hpp (3)
components/core/src/clp/ffi/StringBlob.hpp (5)
nodiscard(23-23)nodiscard(30-37)nodiscard(46-60)other(20-20)other(20-20)components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)
nodiscard(528-568)components/core/src/clp/EncodedVariableInterpreter.hpp (8)
logtype(66-68)logtype(66-66)logtype(74-76)logtype(74-74)logtype(82-84)logtype(82-82)logtype(90-92)logtype(90-90)
components/core/src/clp/EncodedVariableInterpreter.hpp (3)
components/core/src/clp/ffi/EncodedTextAst.hpp (5)
nodiscard(80-87)nodiscard(94-110)nodiscard(124-126)nodiscard(157-172)encoded_vars(81-82)components/core/src/clp/ffi/encoding_methods.hpp (4)
decode_integer_var(158-158)encode_four_byte_integer_as_eight_byte(147-149)decode_float_var(130-130)encode_four_byte_float_as_eight_byte(75-77)components/core/src/clp/EncodedVariableInterpreter.cpp (2)
encode_var_dict_id(199-201)encode_var_dict_id(199-199)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)
components/core/src/clp/ffi/StringBlob.hpp (1)
reader(46-47)components/core/src/clp/ffi/ir_stream/utils.hpp (3)
deserialize_int(52-52)deserialize_int(156-173)deserialize_int(156-156)components/core/src/glt/ffi/ir_stream/decoding_methods.cpp (6)
deserialize_int(35-35)deserialize_int(137-155)deserialize_int(137-137)is_variable_tag(25-25)is_variable_tag(109-134)is_variable_tag(109-109)components/core/src/clp/ffi/EncodedTextAst.hpp (1)
encoded_vars(81-82)
components/core/src/clp/ffi/StringBlob.hpp (1)
components/core/src/clp/ffi/EncodedTextAst.hpp (2)
other(121-121)other(121-121)
components/core/src/clp/ffi/ir_stream/search/utils.cpp (1)
components/core/src/clp_s/search/QueryRunner.hpp (10)
op(196-200)op(209-210)op(219-223)op(232-233)op(242-246)op(255-259)op(268-272)op(282-287)op(347-351)op(386-390)
components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (2)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)
nodiscard(528-568)components/core/src/clp/ffi/StringBlob.hpp (1)
reader(46-47)
components/core/src/clp_s/ColumnWriter.cpp (2)
components/core/src/clp/EncodedVariableInterpreter.hpp (9)
encode_and_add_to_dictionary(136-142)encode_and_add_to_dictionary(164-171)encode_and_add_to_dictionary(196-202)encode_and_add_to_dictionary(317-336)encode_and_add_to_dictionary(317-323)encode_and_add_to_dictionary(343-410)encode_and_add_to_dictionary(343-350)encode_and_add_to_dictionary(417-478)encode_and_add_to_dictionary(417-423)components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (1)
DecodingException(37-44)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (2)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (7)
deserialize_encoded_text_ast(487-525)deserialize_encoded_text_ast(487-493)deserialize_encoded_text_ast(528-529)deserialize_encoded_text_ast(736-742)deserialize_encoded_text_ast(744-750)deserialize_encoded_text_ast(752-755)deserialize_encoded_text_ast(757-760)components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (2)
deserialize_encoded_text_ast(109-115)deserialize_encoded_text_ast(130-131)
components/core/src/clp_s/JsonParser.cpp (2)
components/core/src/clp_s/ParsedMessage.hpp (6)
node_id(48-50)node_id(48-48)node_id(58-60)node_id(58-58)node_id(68-70)node_id(68-68)components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (1)
DecodingException(37-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build (macos-15)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (40)
components/core/src/clp/clo/CMakeLists.txt (2)
22-24: Additions align with PR objectives and appear consistent with build patterns.The four new files correctly introduce the ffi-namespace implementations (
EncodedTextAst,EncodedTextAstError, andStringBlob) needed for the deserialization optimization. The header-only pattern forStringBlob.hppandEncodedTextAst.hppis consistent with generic/template-heavy components in the codebase. The retention of legacyir/EncodedTextAst.*files (lines 43–44) aligns with the PR's rationale to defer full migration.Also applies to: 32-32
22-24: All newly added FFI files exist and are correctly referenced.The verification confirms that all four files added to CLO_SOURCES are present at the specified relative paths:
../ffi/EncodedTextAst.hpp✓../ffi/EncodedTextAstError.cpp✓../ffi/EncodedTextAstError.hpp✓../ffi/StringBlob.hpp✓The CMakeLists.txt additions are consistent with the PR's migration of
EncodedTextAstto theclp::ffinamespace.components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp (2)
11-11: LGTM: Include updated to reflect new API location.The include change from
ir/EncodedTextAst.hpptoffi/EncodedTextAst.hppcorrectly reflects the migration to the string-blob-based implementation described in the PR objectives.
217-220: Unchecked return value from encode_message() silently allows encoding failures.The
parse_and_encode_fromimplementation atcomponents/core/src/clp/ffi/EncodedTextAst.hpp:98callsencode_message()without checking its return value. Sinceencode_message()returnsboolto indicate success or failure, silently ignoring this status means malformed or invalid data could propagate unchecked. The test file shows this return value is checked elsewhere withREQUIRE, establishing the intended pattern.Additionally, the return value should be compared using
false == encode_message(...)per the coding guidelines.⛔ Skipped due to learnings
Learnt from: LinZhihao-723 Repo: y-scope/clp PR: 570 File: components/core/tests/test-ir_encoding_methods.cpp:376-399 Timestamp: 2024-11-01T03:26:26.386Z Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.Learnt from: LinZhihao-723 Repo: y-scope/clp PR: 557 File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0 Timestamp: 2024-10-18T02:31:18.595Z Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.Learnt from: AVMatthews Repo: y-scope/clp PR: 595 File: components/core/tests/test-clp_s-end_to_end.cpp:109-110 Timestamp: 2024-11-29T22:50:17.206Z Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.Learnt from: LinZhihao-723 Repo: y-scope/clp PR: 558 File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:228-236 Timestamp: 2024-10-14T03:43:40.364Z Learning: In `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, helper functions like `assert_kv_pair_log_event_creation_failure` return booleans to the caller, and the caller asserts using `REQUIRE` to provide clearer failure information.Learnt from: haiqi96 Repo: y-scope/clp PR: 523 File: components/core/src/clp/clp/FileCompressor.cpp:189-220 Timestamp: 2024-10-11T16:16:02.866Z Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.Learnt from: AVMatthews Repo: y-scope/clp PR: 543 File: components/core/src/clp_s/JsonParser.cpp:769-779 Timestamp: 2024-10-07T21:16:41.660Z Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.Learnt from: AVMatthews Repo: y-scope/clp PR: 595 File: components/core/tests/test-clp_s-end_to_end.cpp:141-142 Timestamp: 2024-11-29T22:51:00.355Z Learning: In 'components/core/tests/test-clp_s-end_to_end.cpp', using `WIFEXITED` and `WEXITSTATUS` with the return value of `std::system` is acceptable for checking the exit status of system commands, as per the C++ documentation.Learnt from: LinZhihao-723 Repo: y-scope/clp PR: 593 File: components/core/tests/test-NetworkReader.cpp:216-219 Timestamp: 2024-11-15T03:15:45.919Z Learning: In the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-NetworkReader.cpp`, additional error handling for JSON parsing failures is not necessary, as the current error message is considered sufficient.Learnt from: LinZhihao-723 Repo: y-scope/clp PR: 557 File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286 Timestamp: 2024-10-13T09:27:43.408Z Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.Learnt from: AVMatthews Repo: y-scope/clp PR: 543 File: components/core/src/clp_s/JsonParser.cpp:735-794 Timestamp: 2024-10-07T21:35:04.362Z Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.components/core/src/clp/ffi/EncodedTextAst.hpp (4)
6-6: LGTM!The
<cstdint>include is necessary forint32_tusage in the newparse_and_encode_frommethod.
89-110: LGTM!The implementation correctly builds a
StringBlobfrom dictionary variable bounds and constructs anEncodedTextAst. The direct return (not wrapped inResult) is appropriate sinceffi::encode_messageis a parsing function that populates the output parameters and the private constructor is infallible when theStringBlobalready contains the logtype.
121-121: LGTM!The defaulted equality operator is appropriate for value-based comparison and testing. All member types support equality comparison.
187-188: LGTM!The type aliases improve code readability and follow consistent naming conventions throughout the codebase.
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)
9-13: LGTM!The new includes are necessary to support the new
deserialize_encoded_text_astoverload returningboost::outcome_v2::std_checked.
57-98: LGTM!The helper correctly deserializes logtypes into a
StringBlob, reusing the existing tag-handling pattern. Error propagation is appropriate.
100-141: LGTM!The helper correctly deserializes dictionary variables into a
StringBlob. The implementation mirrors the existingdeserialize_dict_varpattern with appropriate error handling.
752-760: LGTM!The explicit template specializations are necessary and follow the existing pattern in the file.
components/core/src/clp/ffi/Value.hpp (2)
15-15: LGTM!The include path change and unqualified type names correctly reflect the migration of
EncodedTextAstfrom theirnamespace to theffinamespace. The reordering in the tuple doesn't affect functionality since access is type-based.Also applies to: 33-34
88-101: LGTM!The
ImmutableViewTypeConverterspecializations correctly use the unqualifiedEncodedTextAsttypes and const references, which is appropriate for the immutable view pattern.components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp (1)
10-10: LGTM!The Catch2 message header is a standard testing framework include, consistent with the existing Catch2 usage in the file.
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (2)
12-12: LGTM!The includes are necessary for the updated test code that uses
EncodedTextAst::parse_and_encode_fromand Catch2 message macros.Also applies to: 26-26
286-289: LGTM!The migration to
EncodedTextAst::parse_and_encode_fromis consistent with the new FFI-based API and provides a cleaner interface than the previous helper function approach.components/core/src/clp/ffi/StringBlob.hpp (1)
20-21: LGTM!The defaulted equality operator is appropriate for value-based comparison. All member types support equality comparison, enabling proper testing and comparison of
StringBlobinstances.components/core/src/clp_s/indexer/CMakeLists.txt (2)
24-26: LGTM!The new FFI sources are correctly added to support the migration from IR-based to FFI-based
EncodedTextAst. The file paths are consistent with the project structure.Also applies to: 39-39
143-143: LGTM!The
ystdlib::error_handlinglibrary is necessary for the newEncodedTextAstAPI which usesResulttypes from ystdlib.components/core/src/clp/clg/CMakeLists.txt (1)
18-20: LGTM!The new FFI sources are correctly added to the CLG build. The file paths are appropriate for the CLG directory structure, and the existing link to
ystdlib::error_handling(line 163) satisfies the dependency requirements.Also applies to: 24-24
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (4)
11-11: LGTM: Include additions support the API migration.The new includes correctly reference the ffi-based EncodedTextAst header and Catch2 message support needed for the updated encoding path.
Also applies to: 24-24
249-263: LGTM: Consistent migration to the new parse_and_encode_from API.The changes correctly replace the helper function with the new static factory method for both four-byte and eight-byte encoded text AST variants.
283-298: LGTM: Unmatchable values updated consistently.The migration pattern is applied correctly to the unmatchable values section, maintaining consistency with the matchable values changes.
916-919: LGTM: Array evaluation test updated correctly.The unstructured array test case properly uses the new parse_and_encode_from factory method, maintaining test coverage with the updated API.
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp (2)
19-19: LGTM: Include path updated to ffi namespace.The local include path correctly reflects the migration of EncodedTextAst to the ffi namespace.
187-189: LGTM: Proper error handling with Result type.The reimplementation correctly uses the new to_string() Result-based API with appropriate error propagation via std::nullopt. The documentation accurately describes the updated behavior.
Also applies to: 423-433
components/core/tests/test-ffi_KeyValuePairLogEvent.cpp (1)
15-15: LGTM: Test file fully migrated to ffi namespace API.All test cases have been consistently updated to use the ffi namespace types and the new
parse_and_encode_fromfactory method. The test coverage remains intact while using the updated API surface.Also applies to: 22-24, 31-31, 176-176, 180-180, 235-235, 240-240, 244-245, 249-250, 426-427, 430-431
components/core/src/clp_s/ColumnWriter.cpp (2)
9-16: LGTM: Includes added for error handling and ffi types.The new includes properly support the error-aware encoding path and formatted exception messages.
106-140: LGTM: Robust error handling with formatted exceptions.The implementation correctly captures Result types from
encode_and_add_to_dictionaryand throws DecodingException with detailed error information on failure. The error messages include both category name and message text, providing good diagnostics.Note: There is some code duplication between the eight-byte and four-byte branches, but this is acceptable given the clear error handling logic.
components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (1)
8-8: LGTM: Result-based deserialization API added.The new
deserialize_encoded_text_astoverload provides a result-wrapped return type, enabling cleaner error handling patterns. The function is well-documented and properly constrained with their::EncodedVariableTypeReqconcept.Also applies to: 13-13, 117-131
components/core/src/clp/clp/CMakeLists.txt (1)
22-24: LGTM: Build configuration updated for new ffi components.The CMakeLists additions correctly include the new EncodedTextAst and StringBlob files from the ffi directory, supporting the public API surface expansion.
Also applies to: 33-33
components/core/src/clp_s/ParsedMessage.hpp (1)
11-11: LGTM: Namespace migration to ffi.The include path and type aliases are correctly updated to use the ffi namespace for EncodedTextAst types, maintaining consistency with the broader migration effort.
Also applies to: 23-24
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (2)
147-147: LGTM: Template constraints use modern concept syntax.The function templates correctly use
ir::EncodedVariableTypeReqconcept-based constraints, providing clear type requirements.Also applies to: 453-453
459-471: LGTM: Deserialization logic simplified with Result-based API.The refactored implementation uses a single
deserialize_encoded_text_astcall that returns a Result, eliminating intermediate variables and simplifying the control flow. Error handling is correct with explicithas_error()checking before moving the value.components/core/src/clp_s/CMakeLists.txt (1)
32-34: CLP_S_CLP_SOURCES updates look consistent with the new ffi EncodedTextAst APIListing
EncodedTextAst.hpp,EncodedTextAstErrorandStringBlob.hpphere matches their usage inJsonParser.cppandEncodedVariableInterpreter.hpp, and should keep the clp_s targets building without extra link changes.Also applies to: 63-63
components/core/src/clp/ffi/ir_stream/search/utils.cpp (1)
116-131: MakeResult<bool>returns explicit in filter helpers
evaluate_clp_string_filter_opandevaluate_filter_against_literal_type_value_pairboth returnystdlib::error_handling::Result<bool>but still return bareboolvalues in several branches (for example Lines 258–260 and the non-error cases inevaluate_filter_against_literal_type_value_pair). This assumesResult<bool>has an implicit constructor frombool; if it does not, this will fail to compile, and even if it does compile the intent is a bit opaque.It would be safer and clearer to wrap these in explicit success results (or equivalent helper) so all return sites are uniform with the new error‑propagating API, while keeping the TRYX-based error forwarding from
EncodedTextAst::to_stringunchanged.Could you confirm that
ystdlib::error_handling::Result<bool>is explicitly constructible frombool? If not, the affected return sites should be updated accordingly. If you want, I can sketch a small refactor that makes allResult<bool>returns explicit.Also applies to: 251-269, 327-373
components/core/src/clp/EncodedVariableInterpreter.hpp (1)
10-13: Confirmadd_constantsemantics with string_view‑based decodingThe new ffi‑based
encode_and_add_to_dictionaryoverload closely mirrors the existing ir implementation, but with one behavioural shift in the constant handler:
- Previously, the handler received
(std::string const& value, size_t begin_pos, size_t length)fromgeneric_decode_messageand calledlogtype_dict_entry.add_constant(value, begin_pos, length).- Now,
EncodedTextAst::decodeprovidesstd::string_view constant, and the handler callslogtype_dict_entry.add_constant(constant, 0, constant.length()).If
LogTypeDictionaryEntryType::add_constantsimply appendsvalue.substr(begin_pos, length)(or equivalent), this remains correct and reconstructs the logtype while preservingraw_num_bytessemantics. If, however,add_constantassumesbegin_pos/lengthare offsets into a shared, longer backing string (as in the message‑parsing path), passing0here could subtly change behaviour.It would be good to double‑check
add_constant’s contract to ensure it treats the first argument as the full slice source for each call, not as a single global buffer. If you’d like, I can draft an alternative handler that accumulates the full logtype in one string and uses that consistently when callingadd_constant.Also applies to: 173-203, 412-478
components/core/src/clp_s/JsonParser.cpp (2)
18-24: kv‑ir deserialization error handling and new includes are coherentThe new includes for
fmt::format,clp::ErrorCodeandffi::EncodedTextAstalign with the later use ofDecodingExceptionand the ffi EncodedTextAst types. Iningest_kvir, usingDeserializer<IrUnitHandler>::createwith error logging on failure, and treating per‑recorddeserialize_next_ir_uniterrors as a truncated stream (WARN + break), is a reasonable trade‑off for robustness when ingesting potentially incomplete kv‑ir inputs.Just ensure this “treat any deserialization error as truncation” behaviour matches expectations for upstream producers; if some error codes should instead be considered fatal, they may need to be distinguished here.
If you want, I can suggest a small pattern for branching on specific error codes while still treating common truncation cases as non‑fatal.
Also applies to: 842-852, 933-995
1019-1051: EncodedTextAst handling for ClpString and UnstructuredArray nodes looks correct, with one assumption to validateFor
NodeType::ClpStringandNodeType::UnstructuredArray, the new logic:
- Distinguishes between
clp::ffi::EightByteEncodedTextAstandclp::ffi::FourByteEncodedTextAstusingValue::is<T>().- Stores the EncodedTextAst directly in
ParsedMessagewhen the node is not a timestamp, preserving the encoded representation.- For timestamp‑matching
ClpStringnodes, decodes viato_string()and on failure throwsDecodingExceptionwithclp::ErrorCode_Failureand a detailedfmt::formatmessage, which is then caught and logged iningest_kvir.This is consistent with
get_archive_node_type’s mapping ofSchemaTree::Node::Type::StrtoNodeType::VarString(backed bystd::string) vsNodeType::ClpString(backed by EncodedTextAst), sopair.second.value()should indeed hold one of the two EncodedTextAst instantiations whenever you hit those cases.Please confirm that
ParsedMessagehasadd_valueoverloads for bothclp::ffi::EightByteEncodedTextAstandclp::ffi::FourByteEncodedTextAst, and that no otherValuealternative can occur forNodeType::ClpStringorNodeType::UnstructuredArray. If desired, I can propose a small static_assert or helper to make that invariant explicit.Also applies to: 1171-1283
gibber9809
left a comment
There was a problem hiding this comment.
Looks good! Just a few minor comments.
| */ | ||
| template <ir::EncodedVariableTypeReq encoded_variable_t> | ||
| [[nodiscard]] auto deserialize_encoded_text_ast(ReaderInterface& reader, encoded_tag_t encoded_tag) | ||
| -> boost::outcome_v2::std_checked<EncodedTextAst<encoded_variable_t>, IRErrorCode>; |
There was a problem hiding this comment.
Why not ystdlib::error_handling::Result? I'm probably missing some context.
There was a problem hiding this comment.
This has sth to do with the no_value_policy, which defaults to sth that checks whether the error type can be converted into std::errc. In our case it's not, so we can't use std_result.
Ref: https://www.boost.org/doc/libs/latest/libs/outcome/doc/html/reference/types/basic_result.html
| * - Forwards `ffi::EncodedTextAst::decode`'s return values on failure. | ||
| */ | ||
| template < | ||
| ir::EncodedVariableTypeReq encoded_variable_t, |
There was a problem hiding this comment.
Can we also use this requirement for the other encode_and_add_to_dictionary method?
There was a problem hiding this comment.
Sure. Do u want me to include in this PR?
Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)
567-567: Remove unnecessary std::move.The
std::moveis unnecessary due to automatic return value optimization (RVO). The value can be returned directly.Apply this diff:
- return std::move(encoded_text_ast_result.value()); + return encoded_text_ast_result.value();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp(4 hunks)components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
🧠 Learnings (19)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-14T03:45:21.323Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:331-335
Timestamp: 2024-10-14T03:45:21.323Z
Learning: When reviewing code, ensure that suggestions about unnecessary use of `std::move` with `std::shared_ptr` are only made if `std::move` is actually used in the code.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-09-27T23:07:22.429Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:131-139
Timestamp: 2024-09-27T23:07:22.429Z
Learning: In the IR encoding method tests, `int8_t` is used for byte buffers instead of `uint8_t`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-14T03:39:35.585Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-12-09T15:25:14.043Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2025-05-11T22:22:49.286Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
🧬 Code graph analysis (1)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)
components/core/src/clp/ffi/StringBlob.hpp (1)
reader(46-47)components/core/src/clp/ffi/ir_stream/utils.hpp (3)
deserialize_int(52-52)deserialize_int(156-173)deserialize_int(156-156)components/core/src/glt/ffi/ir_stream/decoding_methods.cpp (6)
deserialize_int(35-35)deserialize_int(137-155)deserialize_int(137-137)is_variable_tag(25-25)is_variable_tag(109-134)is_variable_tag(109-109)components/core/src/clp/ffi/EncodedTextAst.hpp (1)
encoded_vars(81-82)
🔇 Additional comments (8)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (4)
11-11: LGTM: Includes added for new API.The new includes support the migration to the
parse_and_encode_fromAPI and Catch2 message handling.Also applies to: 24-24
254-262: Excellent: Implemented previous feedback to use explicit types.The change from template-based
get_encoded_text_ast<ir::four_byte_encoded_variable_t>()to explicitFourByteEncodedTextAst::parse_and_encode_from()improves readability as suggested in the previous review.
289-296: LGTM: Consistent API migration.The changes here mirror those in
get_matchable_values, maintaining consistency across the test helper functions.
917-919: LGTM: Test case updated with new API.The array evaluation test correctly uses
FourByteEncodedTextAst::parse_and_encode_from(), completing the migration across all test cases.components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)
9-13: LGTM! Necessary includes for the new StringBlob-based implementation.The new includes support the outcome-based error handling and StringBlob-backed EncodedTextAst.
100-141: LGTM! Well-structured helper for dictionary variable deserialization.The implementation correctly handles different tag widths, propagates errors appropriately, and follows the coding guidelines.
527-568: LGTM! Excellent implementation of StringBlob-based deserialization.The new overload successfully eliminates per-string allocations by accumulating data into a StringBlob before constructing the EncodedTextAst. The error handling is thorough, using outcome-based returns to propagate errors clearly. This implementation directly addresses the performance objectives outlined in issue #1541.
752-760: LGTM! Proper template specialization declarations.The explicit template instantiations for both four-byte and eight-byte encoded variables are necessary and correctly declared.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (1)
11-26: File organisation: Consider grouping new ffi files more cohesively.The new files are added in a somewhat scattered manner within the ffi section (EncodedTextAst/Error files at 14–16, StringBlob at 26). While the current organisation is functional, grouping all new ffi-related files consecutively could improve maintainability and reduce future merge conflicts. However, this is a minor preference and only worth addressing if you anticipate frequent changes in this area.
components/core/src/clp/EncodedVariableInterpreter.hpp (1)
148-171: Template constraint and internal naming for IR EncodedTextAst overloadSwitching the template parameter to
ir::EncodedVariableTypeReq encoded_variable_tand updating all internal uses (including the handlers andstd::is_same_vchecks) keeps the function well-constrained and consistent with the rest of the encoded-variable APIs. Behaviour is unchanged and looks correct.One minor readability point: using the name
encoded_variable_tfor the template parameter inside the class shadows the global aliasclp::encoded_variable_t. It is valid but slightly increases cognitive load when reading this file; consider a distinct template name (e.g.,EncodedVarT) in future clean-ups if this becomes confusing.Also applies to: 338-410
♻️ Duplicate comments (1)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)
527-568: Outcome‑baseddeserialize_encoded_text_astis sound; trim redundant move and confirm error mapping.The new overload correctly mirrors the existing variable/logtype deserialization logic while building a
StringBloband wrapping the result inboost::outcome_v2::std_checked<EncodedTextAst<encoded_variable_t>, IRErrorCode>. A couple of small follow‑ups:
On failure of
EncodedTextAst::create, you map all errors toIRErrorCode_Corrupted_IR(Lines 560–565). That’s reasonable ifcreate’s failures are exclusively due to invalid IR (e.g., inconsistent offsets). Ifcreatecan also fail for non‑corruption reasons (like internal limits), consider documenting that assumption or, if feasible, mapping those cases differently.The final
return std::move(encoded_text_ast_result.value());(Line 567) doesn’t needstd::move: returning the local value expression directly lets NRVO/RVO do their job and was already flagged in an earlier review. You can safely simplify to:return std::move(encoded_text_ast_result.value());
- return encoded_text_ast_result.value();
```web Check whether `ystdlib::error_handling::Result` (used by `EncodedTextAst::create`) documents that its error states for creation failures all correspond to invalid/corrupted encoded text AST, and not to other categories of failure (e.g., resource limits).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
components/core/src/clp/EncodedVariableInterpreter.hpp(7 hunks)components/core/src/clp/ffi/ir_stream/decoding_methods.cpp(4 hunks)components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp(5 hunks)components/core/src/clp/make_dictionaries_readable/CMakeLists.txt(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/EncodedVariableInterpreter.hpp
🧠 Learnings (22)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.
Applied to files:
components/core/src/clp/make_dictionaries_readable/CMakeLists.txtcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2025-09-15T18:53:15.844Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1226
File: taskfiles/deps/main.yaml:522-527
Timestamp: 2025-09-15T18:53:15.844Z
Learning: When ystdlib is built with -Dystdlib_BUILD_TESTING=OFF in taskfiles/deps/main.yaml, it does not require Catch2 settings to be injected via CMAKE_GEN_ARGS since testing is disabled and Catch2 is not needed during the build process.
Applied to files:
components/core/src/clp/make_dictionaries_readable/CMakeLists.txt
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2024-10-14T03:39:35.585Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-12-09T15:25:14.043Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2025-05-11T22:22:49.286Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cppcomponents/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-14T03:45:21.323Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:331-335
Timestamp: 2024-10-14T03:45:21.323Z
Learning: When reviewing code, ensure that suggestions about unnecessary use of `std::move` with `std::shared_ptr` are only made if `std::move` is actually used in the code.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Applied to files:
components/core/src/clp/ffi/ir_stream/decoding_methods.cppcomponents/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2025-04-26T02:21:22.021Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
Applied to files:
components/core/src/clp/EncodedVariableInterpreter.hpp
🧬 Code graph analysis (2)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)
components/core/src/clp/ffi/StringBlob.hpp (1)
reader(46-47)components/core/src/clp/ffi/ir_stream/utils.hpp (3)
deserialize_int(52-52)deserialize_int(156-173)deserialize_int(156-156)components/core/src/glt/ffi/ir_stream/decoding_methods.cpp (6)
deserialize_int(35-35)deserialize_int(137-155)deserialize_int(137-137)is_variable_tag(25-25)is_variable_tag(109-134)is_variable_tag(109-109)components/core/src/clp/ffi/EncodedTextAst.hpp (1)
encoded_vars(81-82)
components/core/src/clp/EncodedVariableInterpreter.hpp (4)
components/core/src/clp/ffi/EncodedTextAst.hpp (1)
encoded_vars(81-82)components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)
nodiscard(528-568)components/core/src/clp/ffi/encoding_methods.hpp (4)
decode_integer_var(158-158)encode_four_byte_integer_as_eight_byte(147-149)decode_float_var(130-130)encode_four_byte_float_as_eight_byte(75-77)components/core/src/clp/EncodedVariableInterpreter.cpp (2)
encode_var_dict_id(199-201)encode_var_dict_id(199-199)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: lint-check (macos-15)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (10)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (4)
11-11: LGTM! Includes are appropriate for the new API.The new includes support the migration to
FourByteEncodedTextAstandEightByteEncodedTextAst.Also applies to: 24-24
253-254: LGTM! Migration to explicit type names is consistent with past review feedback.The use of
FourByteEncodedTextAst::parse_and_encode_fromandEightByteEncodedTextAst::parse_and_encode_fromaligns with the preference for explicit type names over template syntax.
280-285: LGTM! Consistent API migration.The changes maintain consistency with the migration pattern established in
get_matchable_values.
905-905: LGTM! Completes the migration to the new API.The array evaluation test correctly uses
FourByteEncodedTextAst::parse_and_encode_from, maintaining consistency throughout the file.components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (3)
1-95: Build configuration changes align with PR objectives.The additions to MAKE_DICTIONARIES_READABLE_SOURCES and the new dependency are consistent with the PR's goal to migrate EncodedTextAst to ffi::EncodedTextAst backed by StringBlob. The changes follow existing patterns in the CMakeLists.txt for including header-only files and linking optional dependencies.
14-16: Verification complete — all files and dependencies are correctly configured.Confirmed that all four new files are present at their referenced paths:
../ffi/EncodedTextAst.hpp✓../ffi/EncodedTextAstError.cpp✓../ffi/EncodedTextAstError.hpp✓../ffi/StringBlob.hpp✓The
ystdlib::error_handlingtarget is properly linked at line 86, and theystdlibpackage is correctly declared asREQUIREDin the parent CMakeLists.txt. The CMakeLists.txt changes are complete and accurate.
86-86: No issues found—ystdlib::error_handlingtarget is properly available.The target is correctly imported via
find_package(ystdlib REQUIRED)in components/core/CMakeLists.txt, and verification confirms that theerror_handlingtarget exists in ystdlib-cpp at commit 9ed78cd (the version specified in taskfiles/deps/main.yaml). The target is used consistently across 14 build targets in the codebase alongside other ystdlib targets likeystdlib::containers, confirming it is properly exported and available.components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (2)
9-13: Includes for Outcome/EncodedTextAst/StringBlob look correct.The added headers line up with the new
std_checkedreturn type andEncodedTextAst/StringBlobusage; no issues from an interface or dependency standpoint.
752-760: Explicit template instantiations for new overload look good.The explicit instantiations for both four‑ and eight‑byte encoded variable types match the new
std_checked<EncodedTextAst<...>, IRErrorCode>overload and ensure both encodings are available at link time; this is consistent with the existing pattern for the IRErrorCode‑returning overload.components/core/src/clp/EncodedVariableInterpreter.hpp (1)
10-15: Includes updated appropriately for Result and ffi EncodedTextAstThe added includes for
ystdlib::error_handling::Resultandffi::EncodedTextAstalign with the new overload and error-handling macro usage; no issues from a dependency or layering perspective.
gibber9809
left a comment
There was a problem hiding this comment.
LGTM. For PR title how about:
perf(kv-ir): Use string-blob-based encoded text AST implementation in kv-ir deserialization to avoid small string allocation overhead (resolves #1541).
Using resolves instead of fixes, because #1541 isn't really a bug.
|
@coderabbitai Create an issue to track that we need to deprecate |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@LinZhihao-723: I'll create an issue to track the deprecation of 🏁 Script executed: Length of output: 42 @LinZhihao-723: I've created issue #1622 to track the deprecation of 🐰📝 🧠 Learnings used |
… kv-ir deserialization to avoid small string allocation overhead (resolves y-scope#1541). (y-scope#1607) Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
Description
This PR fixes #1541 by using string-blob-based encoded text AST (aka.
ffi::EncodedTextAst) in kv-ir deserialization. The PR has the following core changes:ffi::EncodedTextAstfrom an IR stream.ir::EncodedTextAstbyffi::EncodedTextAstin kv-ir and clp-s implementation.ffi::EncodedTextAstinto an archive with decoding and re-parsing.parse_and_encode_frommethod toffi::EncodedTextAstto avoid duplicating general helpers across different uses (clp-s and test cases).Some parts of code introduced in this PR duplicate existing code base which deals with
ir::EncodedTextAst. The reasons to not deleteir::EncodedTextAstand fully replace it withffi::EncodedTextAstare:Prefer to defer the legacy code deprecation to a future PR to:
Performance Benchmark
This PR is tested with a sample IR stream from a private encoded-text-AST-heavy dataset (raw: 8.8GB; zstd-compressed kv-ir: 633MB). All tests are measured in real time.
Search performance
Compression: vs. CLP v0.6.0 (with encoded text AST direct ingestion)
Compression: vs. CLP v0.5.0 (without encoded test AST direct ingestion)
Compression: vs. JSONL
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Refactor
New Features
Tests
Breaking Changes