Skip to content

perf(kv-ir): Use string-blob-based encoded text AST implementation in kv-ir deserialization to avoid small string allocation overhead (resolves #1541).#1607

Merged
LinZhihao-723 merged 13 commits into
y-scope:mainfrom
LinZhihao-723:deserialization-opt
Nov 19, 2025

Conversation

@LinZhihao-723

@LinZhihao-723 LinZhihao-723 commented Nov 16, 2025

Copy link
Copy Markdown
Member

Description

This PR fixes #1541 by using string-blob-based encoded text AST (aka. ffi::EncodedTextAst) in kv-ir deserialization. The PR has the following core changes:

  • Add deserialization methods to deserialize and construct ffi::EncodedTextAst from an IR stream.
  • Replace the use of ir::EncodedTextAst by ffi::EncodedTextAst in kv-ir and clp-s implementation.
  • Add encoding method to directly ingest ffi::EncodedTextAst into an archive with decoding and re-parsing.
  • Add parse_and_encode_from method to ffi::EncodedTextAst to avoid duplicating general helpers across different uses (clp-s and test cases).

Some parts of code introduced in this PR duplicate existing code base which deals with ir::EncodedTextAst. The reasons to not delete ir::EncodedTextAst and fully replace it with ffi::EncodedTextAst are:

  • Some of the legacy decoding methods (dealing with the old version of encoded text AST) are directly used in our ffi libraries.
  • These pieces of code are also used in clp-text for ingesting IRv1 to clp-text archive.

Prefer to defer the legacy code deprecation to a future PR to:

  • Keep the scope of this PR smaller (as it's already a bit large)
  • Keep clp-text untouched before we have a clearer plan for adding IRv1 deserialization support in kv-ir.

Performance Benchmark

This PR is tested with a sample IR stream from a private encoded-text-AST-heavy dataset (raw: 8.8GB; zstd-compressed kv-ir: 633MB). All tests are measured in real time.

  • Search performance

    • Before: 28.26s
    • After: 17.89s
    • Speedup: 1.58x
  • Compression: vs. CLP v0.6.0 (with encoded text AST direct ingestion)

    • Before: 71.43s
    • After: 49.79s
    • Speedup: 1.43x
  • Compression: vs. CLP v0.5.0 (without encoded test AST direct ingestion)

    • Before: 88.38s
    • After: 49.79s
    • Speedup: 1.78x
  • Compression: vs. JSONL

    • kv-ir: 49.79s
    • JSONL: 48.8s

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows pass.
  • Check all touched sections pass clang-tidy.
  • vs. CLP v0.6.0:
    • Compress the input and decompress the archive to ensure the outputs match.
    • Directly compare the dictionaries and tables inside the archive to ensure they match.

Summary by CodeRabbit

  • Refactor

    • Migrated encoded-text handling onto a public FFI surface and updated consumers to use it.
    • Switched decoding paths to explicit result/outcome-based error handling so failures surface deterministically.
  • New Features

    • Added a parse-and-encode factory and value equality for encoded-text objects.
    • Introduced a compact string-blob type for dictionary data.
  • Tests

    • Updated tests and utilities to use the new encoding API.
  • Breaking Changes

    • Removed a decoding error enum member (may affect error-code consumers).

@LinZhihao-723 LinZhihao-723 requested review from a team, gibber9809 and wraymo as code owners November 16, 2025 02:53
@coderabbitai

coderabbitai Bot commented Nov 16, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Moves EncodedTextAst from clp::ir to clp::ffi backed by StringBlob, adds parse/encode and to_string APIs, introduces Result/std_checked error returns for encode/decode/deserialization, adds an ffi overload of encode_and_add_to_dictionary returning Result<size_t>, updates call sites/tests/CMake, and removes one search error enum member.

Changes

Cohort / File(s) Summary
FFI EncodedTextAst & helpers
components/core/src/clp/ffi/EncodedTextAst.hpp, components/core/src/clp/ffi/EncodedTextAstError.cpp, components/core/src/clp/ffi/EncodedTextAstError.hpp, components/core/src/clp/ffi/StringBlob.hpp
New FFI EncodedTextAst with parse_and_encode_from, to_string, defaulted operator==, aliases EightByteEncodedTextAst/FourByteEncodedTextAst, StringBlob implementation and error types.
EncodedVariableInterpreter API
components/core/src/clp/EncodedVariableInterpreter.hpp
Added new overload encode_and_add_to_dictionary(ffi::EncodedTextAst<...>) -> ystdlib::error_handling::Result<size_t>; unified template parameter name to encoded_variable_t and updated IR overload signatures/comments.
IR stream deserialization (FFI) & helpers
components/core/src/clp/ffi/ir_stream/decoding_methods.hpp, components/core/src/clp/ffi/ir_stream/decoding_methods.cpp, components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
New helpers to deserialize logtype/dict vars into StringBlob; added template deserialize_encoded_text_ast returning boost::outcome_v2::std_checked<EncodedTextAst<...>, IRErrorCode> and refactored callers to use result-based flow.
Search error enum & messages
components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp, components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
Removed EncodedTextAstDecodingFailure enum member and its message case (enum value indices changed).
Search utils & filter changes
components/core/src/clp/ffi/ir_stream/search/utils.cpp, components/core/src/clp/ffi/ir_stream/search/test/*, components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
evaluate_clp_string_filter_op now returns ystdlib::error_handling::Result<bool>; tests updated to use parse_and_encode_from; removed internal get_encoded_text_ast helper.
Value / KeyValuePair changes
components/core/src/clp/ffi/Value.hpp, components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
Reordered/qualified encoded-text types in primitive tuple; decode_as_encoded_text_ast rewritten to call to_string() and return std::nullopt on failure.
CLP/CLP_S/CLG/CLO/CMake changes
components/core/src/clp/*/CMakeLists.txt, components/core/src/clp_s/CMakeLists.txt, components/core/src/clp_s/indexer/CMakeLists.txt, components/core/src/clp/make_dictionaries_readable/CMakeLists.txt
Added ffi headers/sources (EncodedTextAst.hpp, EncodedTextAstError.*, StringBlob.hpp) to targets; removed old IR EncodedTextAst sources where applicable; added ystdlib::error_handling link deps.
ClpS consumers: ColumnWriter, JsonParser, ParsedMessage, EvaluateRangeIndexFilters
components/core/src/clp_s/ColumnWriter.cpp, components/core/src/clp_s/JsonParser.cpp, components/core/src/clp_s/ParsedMessage.hpp, components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
Replaced uses of clp::ir::*EncodedTextAst with clp::ffi::*EncodedTextAst; switched decode_and_unparse to to_string() with explicit error handling (DecodingException/Result).
Tests & test utilities
components/core/tests/*, components/core/src/clp/ffi/ir_stream/search/test/*, components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp, components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
Tests updated to construct encoded-text values via EncodedTextAst::parse_and_encode_from(...); added Catch2 message includes; removed get_encoded_text_ast helper and IR includes.
Decoding API public declaration
components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
Added public templated declaration for deserialize_encoded_text_ast returning std_checked<EncodedTextAst<...>, IRErrorCode>.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant FFI_AST as ffi::EncodedTextAst
    participant StringBlob
    participant Deserializer
    participant DictWriter

    Caller->>FFI_AST: parse_and_encode_from(text)
    activate FFI_AST
    note right of FFI_AST `#f0f9ff`: encode_message -> encoded_vars, dict bounds
    FFI_AST->>StringBlob: build contiguous buffer (vars + logtype)
    FFI_AST-->>Caller: EncodedTextAst (ok) / error (Result)
    deactivate FFI_AST

    Caller->>Deserializer: deserialize_encoded_text_ast(reader, tag)
    activate Deserializer
    Deserializer->>Deserializer: read encoded_vars & dict entries
    Deserializer->>StringBlob: append slices (logtype + vars)
    Deserializer-->>Caller: std_checked<EncodedTextAst, IRErrorCode> (ok | error)
    deactivate Deserializer

    Caller->>FFI_AST: to_string()
    activate FFI_AST
    FFI_AST->>StringBlob: assemble string from slices
    FFI_AST-->>Caller: string (ok) or error (Result)
    deactivate FFI_AST

    Caller->>DictWriter: encode_and_add_to_dictionary(ffi::EncodedTextAst)
    activate DictWriter
    DictWriter->>FFI_AST: decode slices -> variables
    DictWriter->>DictWriter: populate encoded_vars, var_ids, update dict entry
    DictWriter-->>Caller: Result<size_t> (raw byte estimate | error)
    deactivate DictWriter
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas needing focused review:

  • encode_and_add_to_dictionary (new FFI overload): correctness of decoding, dictionary side effects, and raw-byte estimate calculation.
  • deserialize_encoded_text_ast/std_checked paths: error propagation and mapping to existing IRErrorCode semantics.
  • Removal of EncodedTextAstDecodingFailure enum member: ensure no callers rely on its numeric value.
  • StringBlob lifetime/ownership and its integration with EncodedTextAst (concatenation/slice correctness).
  • Broad namespace/type migration from clp::ir to clp::ffi across build and public headers (CMake and public API surface).

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed All coding objectives from #1541 are satisfied: StringBlob implemented, EncodedTextAst reimplemented using StringBlob, deserialization methods added, and ir::EncodedTextAst replaced with ffi::EncodedTextAst throughout kv-ir and clp-s.
Out of Scope Changes check ✅ Passed The pull request scope is appropriately contained. All changes directly support the primary objective of implementing string-blob-based EncodedTextAst for performance optimization in kv-ir deserialization, with no extraneous modifications.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective: adopting a string-blob-based encoded text AST implementation in kv-ir deserialization to reduce small-string allocation overhead and resolve the linked issue #1541.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfb4526 and b3dd4c1.

📒 Files selected for processing (25)
  • components/core/src/clp/EncodedVariableInterpreter.hpp (4 hunks)
  • components/core/src/clp/clg/CMakeLists.txt (1 hunks)
  • components/core/src/clp/clo/CMakeLists.txt (1 hunks)
  • components/core/src/clp/clp/CMakeLists.txt (2 hunks)
  • components/core/src/clp/ffi/EncodedTextAst.hpp (4 hunks)
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp (3 hunks)
  • components/core/src/clp/ffi/StringBlob.hpp (1 hunks)
  • components/core/src/clp/ffi/Value.hpp (3 hunks)
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4 hunks)
  • components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (2 hunks)
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (0 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (0 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (5 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (3 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (0 hunks)
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp (3 hunks)
  • components/core/src/clp_s/CMakeLists.txt (2 hunks)
  • components/core/src/clp_s/ColumnWriter.cpp (2 hunks)
  • components/core/src/clp_s/JsonParser.cpp (2 hunks)
  • components/core/src/clp_s/ParsedMessage.hpp (2 hunks)
  • components/core/src/clp_s/indexer/CMakeLists.txt (3 hunks)
  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp (2 hunks)
  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp (4 hunks)
💤 Files with no reviewable changes (3)
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
  • components/core/src/clp_s/ParsedMessage.hpp
  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/StringBlob.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
  • components/core/src/clp/ffi/Value.hpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
  • components/core/src/clp_s/ColumnWriter.cpp
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
  • components/core/src/clp_s/JsonParser.cpp
🧠 Learnings (45)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
  • components/core/src/clp_s/ColumnWriter.cpp
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
📚 Learning: 2025-04-26T02:21:22.021Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
  • components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/src/clp_s/ParsedMessage.hpp
  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
  • components/core/src/clp_s/ColumnWriter.cpp
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2025-09-15T18:53:15.844Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1226
File: taskfiles/deps/main.yaml:522-527
Timestamp: 2025-09-15T18:53:15.844Z
Learning: When ystdlib is built with -Dystdlib_BUILD_TESTING=OFF in taskfiles/deps/main.yaml, it does not require Catch2 settings to be injected via CMAKE_GEN_ARGS since testing is disabled and Catch2 is not needed during the build process.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/src/clp_s/indexer/CMakeLists.txt
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
📚 Learning: 2024-11-26T19:16:19.933Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
📚 Learning: 2024-10-07T20:10:08.254Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp
  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp_s/indexer/CMakeLists.txt
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.

Applied to files:

  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.

Applied to files:

  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.

Applied to files:

  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.

Applied to files:

  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2025-08-25T07:27:19.449Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1262
File: docs/src/dev-guide/components-core/index.md:50-50
Timestamp: 2025-08-25T07:27:19.449Z
Learning: In the CLP project, nlohmann_json version 3.11.3 is installed through their dependency management process, but they don't enforce version constraints in CMake find_package calls - they control versions through installation rather than CMake version pinning.

Applied to files:

  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.

Applied to files:

  • components/core/src/clp_s/ParsedMessage.hpp
  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/clp/CMakeLists.txt
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp_s/indexer/CMakeLists.txt
  • components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-14T03:43:40.364Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:228-236
Timestamp: 2024-10-14T03:43:40.364Z
Learning: In `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, helper functions like `assert_kv_pair_log_event_creation_failure` return booleans to the caller, and the caller asserts using `REQUIRE` to provide clearer failure information.

Applied to files:

  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
📚 Learning: 2024-10-14T03:42:53.145Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:85-103
Timestamp: 2024-10-14T03:42:53.145Z
Learning: The function `assert_kv_pair_log_event_creation_failure` is correctly placed within the anonymous namespace in `test-ffi_KeyValuePairLogEvent.cpp`.

Applied to files:

  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
📚 Learning: 2024-09-27T23:07:22.429Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:131-139
Timestamp: 2024-09-27T23:07:22.429Z
Learning: In the IR encoding method tests, `int8_t` is used for byte buffers instead of `uint8_t`.

Applied to files:

  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/EncodedTextAst.hpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/Value.hpp
📚 Learning: 2024-12-09T15:25:14.043Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.

Applied to files:

  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-13T09:29:56.553Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1283-1286
Timestamp: 2024-10-13T09:29:56.553Z
Learning: `clp::ffi::SchemaTree::Node::id_t` is of type `uint32_t`, so adding `1` to `INT32_MAX` is safe and does not cause integer overflow.

Applied to files:

  • components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Applied to files:

  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/clg/CMakeLists.txt
  • components/core/src/clp/clp/CMakeLists.txt
  • components/core/src/clp_s/indexer/CMakeLists.txt
  • components/core/src/clp/clo/CMakeLists.txt
📚 Learning: 2025-07-01T14:49:07.333Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Applied to files:

  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/clp/CMakeLists.txt
  • components/core/src/clp_s/indexer/CMakeLists.txt
  • components/core/src/clp/ffi/Value.hpp
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/clg/CMakeLists.txt
  • components/core/src/clp/clp/CMakeLists.txt
📚 Learning: 2024-10-14T03:39:35.585Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.

Applied to files:

  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-11-12T18:47:03.828Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.

Applied to files:

  • components/core/src/clp_s/CMakeLists.txt
  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • components/core/src/clp/clg/CMakeLists.txt
  • components/core/src/clp/clp/CMakeLists.txt
  • components/core/src/clp_s/indexer/CMakeLists.txt
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/clp/CMakeLists.txt
  • components/core/src/clp_s/indexer/CMakeLists.txt
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.hpp
  • components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-10T15:19:52.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:109-111
Timestamp: 2024-10-10T15:19:52.408Z
Learning: In `KeyValuePairLogEvent.cpp`, for the class `JsonSerializationIterator`, it's acceptable to use raw pointers for member variables like `m_schema_tree_node`, and there's no need to replace them with references or smart pointers in this use case.

Applied to files:

  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-10-14T03:38:57.921Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.hpp:62-68
Timestamp: 2024-10-14T03:38:57.921Z
Learning: In `KeyValuePairLogEvent.hpp`, the methods `get_auto_generated_schema_tree()` and `get_user_generated_schema_tree()` do not need to check for null shared pointers because the factory function ensures they are not null.

Applied to files:

  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:629-733
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the `JsonParser::parse_kv_log_event` method in `components/core/src/clp_s/JsonParser.cpp`, prefer passing exceptions to higher levels for more graceful handling, rather than handling them at that level.

Applied to files:

  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp_s/JsonParser.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.

Applied to files:

  • components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.

Applied to files:

  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
📚 Learning: 2025-05-11T22:22:49.286Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2025-05-07T16:56:53.901Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
📚 Learning: 2025-08-13T15:25:40.805Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:49-127
Timestamp: 2025-08-13T15:25:40.805Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, the FormattedFloatColumnWriter::add_value method intentionally allows std::stod to throw exceptions (std::invalid_argument or std::out_of_range) without catching them, as this is designed to terminate decompression when encountering invalid float strings for data integrity purposes.

Applied to files:

  • components/core/src/clp_s/ColumnWriter.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:265-275
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `generate_IR`, the `run_serializer` function is already a template, and the branching between `int32_t` and `int64_t` occurs outside the loop, making further template-based refactoring unnecessary.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 544
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:230-261
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `deserialize_int_val`, the integer type needs to be determined at runtime, so refactoring into a templated helper function does not simplify the logic.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
🧬 Code graph analysis (10)
components/core/tests/test-ffi_KeyValuePairLogEvent.cpp (1)
components/core/src/clp/ffi/Value.hpp (5)
  • Value (133-133)
  • Value (142-144)
  • Value (142-142)
  • Value (151-151)
  • Value (151-151)
components/core/src/clp/ffi/EncodedTextAst.hpp (3)
components/core/src/clp/ffi/StringBlob.hpp (5)
  • nodiscard (23-23)
  • nodiscard (30-37)
  • nodiscard (46-60)
  • other (20-20)
  • other (20-20)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)
  • nodiscard (528-568)
components/core/src/clp/EncodedVariableInterpreter.hpp (8)
  • logtype (66-68)
  • logtype (66-66)
  • logtype (74-76)
  • logtype (74-74)
  • logtype (82-84)
  • logtype (82-82)
  • logtype (90-92)
  • logtype (90-90)
components/core/src/clp/EncodedVariableInterpreter.hpp (3)
components/core/src/clp/ffi/EncodedTextAst.hpp (5)
  • nodiscard (80-87)
  • nodiscard (94-110)
  • nodiscard (124-126)
  • nodiscard (157-172)
  • encoded_vars (81-82)
components/core/src/clp/ffi/encoding_methods.hpp (4)
  • decode_integer_var (158-158)
  • encode_four_byte_integer_as_eight_byte (147-149)
  • decode_float_var (130-130)
  • encode_four_byte_float_as_eight_byte (75-77)
components/core/src/clp/EncodedVariableInterpreter.cpp (2)
  • encode_var_dict_id (199-201)
  • encode_var_dict_id (199-199)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)
components/core/src/clp/ffi/StringBlob.hpp (1)
  • reader (46-47)
components/core/src/clp/ffi/ir_stream/utils.hpp (3)
  • deserialize_int (52-52)
  • deserialize_int (156-173)
  • deserialize_int (156-156)
components/core/src/glt/ffi/ir_stream/decoding_methods.cpp (6)
  • deserialize_int (35-35)
  • deserialize_int (137-155)
  • deserialize_int (137-137)
  • is_variable_tag (25-25)
  • is_variable_tag (109-134)
  • is_variable_tag (109-109)
components/core/src/clp/ffi/EncodedTextAst.hpp (1)
  • encoded_vars (81-82)
components/core/src/clp/ffi/StringBlob.hpp (1)
components/core/src/clp/ffi/EncodedTextAst.hpp (2)
  • other (121-121)
  • other (121-121)
components/core/src/clp/ffi/ir_stream/search/utils.cpp (1)
components/core/src/clp_s/search/QueryRunner.hpp (10)
  • op (196-200)
  • op (209-210)
  • op (219-223)
  • op (232-233)
  • op (242-246)
  • op (255-259)
  • op (268-272)
  • op (282-287)
  • op (347-351)
  • op (386-390)
components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (2)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)
  • nodiscard (528-568)
components/core/src/clp/ffi/StringBlob.hpp (1)
  • reader (46-47)
components/core/src/clp_s/ColumnWriter.cpp (2)
components/core/src/clp/EncodedVariableInterpreter.hpp (9)
  • encode_and_add_to_dictionary (136-142)
  • encode_and_add_to_dictionary (164-171)
  • encode_and_add_to_dictionary (196-202)
  • encode_and_add_to_dictionary (317-336)
  • encode_and_add_to_dictionary (317-323)
  • encode_and_add_to_dictionary (343-410)
  • encode_and_add_to_dictionary (343-350)
  • encode_and_add_to_dictionary (417-478)
  • encode_and_add_to_dictionary (417-423)
components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (1)
  • DecodingException (37-44)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (2)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (7)
  • deserialize_encoded_text_ast (487-525)
  • deserialize_encoded_text_ast (487-493)
  • deserialize_encoded_text_ast (528-529)
  • deserialize_encoded_text_ast (736-742)
  • deserialize_encoded_text_ast (744-750)
  • deserialize_encoded_text_ast (752-755)
  • deserialize_encoded_text_ast (757-760)
components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (2)
  • deserialize_encoded_text_ast (109-115)
  • deserialize_encoded_text_ast (130-131)
components/core/src/clp_s/JsonParser.cpp (2)
components/core/src/clp_s/ParsedMessage.hpp (6)
  • node_id (48-50)
  • node_id (48-48)
  • node_id (58-60)
  • node_id (58-58)
  • node_id (68-70)
  • node_id (68-68)
components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (1)
  • DecodingException (37-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build (macos-15)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (40)
components/core/src/clp/clo/CMakeLists.txt (2)

22-24: Additions align with PR objectives and appear consistent with build patterns.

The four new files correctly introduce the ffi-namespace implementations (EncodedTextAst, EncodedTextAstError, and StringBlob) needed for the deserialization optimization. The header-only pattern for StringBlob.hpp and EncodedTextAst.hpp is consistent with generic/template-heavy components in the codebase. The retention of legacy ir/EncodedTextAst.* files (lines 43–44) aligns with the PR's rationale to defer full migration.

Also applies to: 32-32


22-24: All newly added FFI files exist and are correctly referenced.

The verification confirms that all four files added to CLO_SOURCES are present at the specified relative paths:

  • ../ffi/EncodedTextAst.hpp
  • ../ffi/EncodedTextAstError.cpp
  • ../ffi/EncodedTextAstError.hpp
  • ../ffi/StringBlob.hpp

The CMakeLists.txt additions are consistent with the PR's migration of EncodedTextAst to the clp::ffi namespace.

components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp (2)

11-11: LGTM: Include updated to reflect new API location.

The include change from ir/EncodedTextAst.hpp to ffi/EncodedTextAst.hpp correctly reflects the migration to the string-blob-based implementation described in the PR objectives.


217-220: Unchecked return value from encode_message() silently allows encoding failures.

The parse_and_encode_from implementation at components/core/src/clp/ffi/EncodedTextAst.hpp:98 calls encode_message() without checking its return value. Since encode_message() returns bool to indicate success or failure, silently ignoring this status means malformed or invalid data could propagate unchecked. The test file shows this return value is checked elsewhere with REQUIRE, establishing the intended pattern.

Additionally, the return value should be compared using false == encode_message(...) per the coding guidelines.

⛔ Skipped due to learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:228-236
Timestamp: 2024-10-14T03:43:40.364Z
Learning: In `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, helper functions like `assert_kv_pair_log_event_creation_failure` return booleans to the caller, and the caller asserts using `REQUIRE` to provide clearer failure information.
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/clp/FileCompressor.cpp:189-220
Timestamp: 2024-10-11T16:16:02.866Z
Learning: Ensure that before flagging functions like `parse_and_encode` for throwing exceptions while declared with `noexcept`, verify that the function is actually declared with `noexcept` to avoid false positives.
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-clp_s-end_to_end.cpp:141-142
Timestamp: 2024-11-29T22:51:00.355Z
Learning: In 'components/core/tests/test-clp_s-end_to_end.cpp', using `WIFEXITED` and `WEXITSTATUS` with the return value of `std::system` is acceptable for checking the exit status of system commands, as per the C++ documentation.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 593
File: components/core/tests/test-NetworkReader.cpp:216-219
Timestamp: 2024-11-15T03:15:45.919Z
Learning: In the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-NetworkReader.cpp`, additional error handling for JSON parsing failures is not necessary, as the current error message is considered sufficient.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
components/core/src/clp/ffi/EncodedTextAst.hpp (4)

6-6: LGTM!

The <cstdint> include is necessary for int32_t usage in the new parse_and_encode_from method.


89-110: LGTM!

The implementation correctly builds a StringBlob from dictionary variable bounds and constructs an EncodedTextAst. The direct return (not wrapped in Result) is appropriate since ffi::encode_message is a parsing function that populates the output parameters and the private constructor is infallible when the StringBlob already contains the logtype.


121-121: LGTM!

The defaulted equality operator is appropriate for value-based comparison and testing. All member types support equality comparison.


187-188: LGTM!

The type aliases improve code readability and follow consistent naming conventions throughout the codebase.

components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)

9-13: LGTM!

The new includes are necessary to support the new deserialize_encoded_text_ast overload returning boost::outcome_v2::std_checked.


57-98: LGTM!

The helper correctly deserializes logtypes into a StringBlob, reusing the existing tag-handling pattern. Error propagation is appropriate.


100-141: LGTM!

The helper correctly deserializes dictionary variables into a StringBlob. The implementation mirrors the existing deserialize_dict_var pattern with appropriate error handling.


752-760: LGTM!

The explicit template specializations are necessary and follow the existing pattern in the file.

components/core/src/clp/ffi/Value.hpp (2)

15-15: LGTM!

The include path change and unqualified type names correctly reflect the migration of EncodedTextAst from the ir namespace to the ffi namespace. The reordering in the tuple doesn't affect functionality since access is type-based.

Also applies to: 33-34


88-101: LGTM!

The ImmutableViewTypeConverter specializations correctly use the unqualified EncodedTextAst types and const references, which is appropriate for the immutable view pattern.

components/core/src/clp/ffi/ir_stream/search/test/test_deserializer_integration.cpp (1)

10-10: LGTM!

The Catch2 message header is a standard testing framework include, consistent with the existing Catch2 usage in the file.

components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (2)

12-12: LGTM!

The includes are necessary for the updated test code that uses EncodedTextAst::parse_and_encode_from and Catch2 message macros.

Also applies to: 26-26


286-289: LGTM!

The migration to EncodedTextAst::parse_and_encode_from is consistent with the new FFI-based API and provides a cleaner interface than the previous helper function approach.

components/core/src/clp/ffi/StringBlob.hpp (1)

20-21: LGTM!

The defaulted equality operator is appropriate for value-based comparison. All member types support equality comparison, enabling proper testing and comparison of StringBlob instances.

components/core/src/clp_s/indexer/CMakeLists.txt (2)

24-26: LGTM!

The new FFI sources are correctly added to support the migration from IR-based to FFI-based EncodedTextAst. The file paths are consistent with the project structure.

Also applies to: 39-39


143-143: LGTM!

The ystdlib::error_handling library is necessary for the new EncodedTextAst API which uses Result types from ystdlib.

components/core/src/clp/clg/CMakeLists.txt (1)

18-20: LGTM!

The new FFI sources are correctly added to the CLG build. The file paths are appropriate for the CLG directory structure, and the existing link to ystdlib::error_handling (line 163) satisfies the dependency requirements.

Also applies to: 24-24

components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (4)

11-11: LGTM: Include additions support the API migration.

The new includes correctly reference the ffi-based EncodedTextAst header and Catch2 message support needed for the updated encoding path.

Also applies to: 24-24


249-263: LGTM: Consistent migration to the new parse_and_encode_from API.

The changes correctly replace the helper function with the new static factory method for both four-byte and eight-byte encoded text AST variants.


283-298: LGTM: Unmatchable values updated consistently.

The migration pattern is applied correctly to the unmatchable values section, maintaining consistency with the matchable values changes.


916-919: LGTM: Array evaluation test updated correctly.

The unstructured array test case properly uses the new parse_and_encode_from factory method, maintaining test coverage with the updated API.

components/core/src/clp/ffi/KeyValuePairLogEvent.cpp (2)

19-19: LGTM: Include path updated to ffi namespace.

The local include path correctly reflects the migration of EncodedTextAst to the ffi namespace.


187-189: LGTM: Proper error handling with Result type.

The reimplementation correctly uses the new to_string() Result-based API with appropriate error propagation via std::nullopt. The documentation accurately describes the updated behavior.

Also applies to: 423-433

components/core/tests/test-ffi_KeyValuePairLogEvent.cpp (1)

15-15: LGTM: Test file fully migrated to ffi namespace API.

All test cases have been consistently updated to use the ffi namespace types and the new parse_and_encode_from factory method. The test coverage remains intact while using the updated API surface.

Also applies to: 22-24, 31-31, 176-176, 180-180, 235-235, 240-240, 244-245, 249-250, 426-427, 430-431

components/core/src/clp_s/ColumnWriter.cpp (2)

9-16: LGTM: Includes added for error handling and ffi types.

The new includes properly support the error-aware encoding path and formatted exception messages.


106-140: LGTM: Robust error handling with formatted exceptions.

The implementation correctly captures Result types from encode_and_add_to_dictionary and throws DecodingException with detailed error information on failure. The error messages include both category name and message text, providing good diagnostics.

Note: There is some code duplication between the eight-byte and four-byte branches, but this is acceptable given the clear error handling logic.

components/core/src/clp/ffi/ir_stream/decoding_methods.hpp (1)

8-8: LGTM: Result-based deserialization API added.

The new deserialize_encoded_text_ast overload provides a result-wrapped return type, enabling cleaner error handling patterns. The function is well-documented and properly constrained with the ir::EncodedVariableTypeReq concept.

Also applies to: 13-13, 117-131

components/core/src/clp/clp/CMakeLists.txt (1)

22-24: LGTM: Build configuration updated for new ffi components.

The CMakeLists additions correctly include the new EncodedTextAst and StringBlob files from the ffi directory, supporting the public API surface expansion.

Also applies to: 33-33

components/core/src/clp_s/ParsedMessage.hpp (1)

11-11: LGTM: Namespace migration to ffi.

The include path and type aliases are correctly updated to use the ffi namespace for EncodedTextAst types, maintaining consistency with the broader migration effort.

Also applies to: 23-24

components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (2)

147-147: LGTM: Template constraints use modern concept syntax.

The function templates correctly use ir::EncodedVariableTypeReq concept-based constraints, providing clear type requirements.

Also applies to: 453-453


459-471: LGTM: Deserialization logic simplified with Result-based API.

The refactored implementation uses a single deserialize_encoded_text_ast call that returns a Result, eliminating intermediate variables and simplifying the control flow. Error handling is correct with explicit has_error() checking before moving the value.

components/core/src/clp_s/CMakeLists.txt (1)

32-34: CLP_S_CLP_SOURCES updates look consistent with the new ffi EncodedTextAst API

Listing EncodedTextAst.hpp, EncodedTextAstError and StringBlob.hpp here matches their usage in JsonParser.cpp and EncodedVariableInterpreter.hpp, and should keep the clp_s targets building without extra link changes.

Also applies to: 63-63

components/core/src/clp/ffi/ir_stream/search/utils.cpp (1)

116-131: Make Result<bool> returns explicit in filter helpers

evaluate_clp_string_filter_op and evaluate_filter_against_literal_type_value_pair both return ystdlib::error_handling::Result<bool> but still return bare bool values in several branches (for example Lines 258–260 and the non-error cases in evaluate_filter_against_literal_type_value_pair). This assumes Result<bool> has an implicit constructor from bool; if it does not, this will fail to compile, and even if it does compile the intent is a bit opaque.

It would be safer and clearer to wrap these in explicit success results (or equivalent helper) so all return sites are uniform with the new error‑propagating API, while keeping the TRYX-based error forwarding from EncodedTextAst::to_string unchanged.

Could you confirm that ystdlib::error_handling::Result<bool> is explicitly constructible from bool? If not, the affected return sites should be updated accordingly. If you want, I can sketch a small refactor that makes all Result<bool> returns explicit.

Also applies to: 251-269, 327-373

components/core/src/clp/EncodedVariableInterpreter.hpp (1)

10-13: Confirm add_constant semantics with string_view‑based decoding

The new ffi‑based encode_and_add_to_dictionary overload closely mirrors the existing ir implementation, but with one behavioural shift in the constant handler:

  • Previously, the handler received (std::string const& value, size_t begin_pos, size_t length) from generic_decode_message and called logtype_dict_entry.add_constant(value, begin_pos, length).
  • Now, EncodedTextAst::decode provides std::string_view constant, and the handler calls logtype_dict_entry.add_constant(constant, 0, constant.length()).

If LogTypeDictionaryEntryType::add_constant simply appends value.substr(begin_pos, length) (or equivalent), this remains correct and reconstructs the logtype while preserving raw_num_bytes semantics. If, however, add_constant assumes begin_pos/length are offsets into a shared, longer backing string (as in the message‑parsing path), passing 0 here could subtly change behaviour.

It would be good to double‑check add_constant’s contract to ensure it treats the first argument as the full slice source for each call, not as a single global buffer. If you’d like, I can draft an alternative handler that accumulates the full logtype in one string and uses that consistently when calling add_constant.

Also applies to: 173-203, 412-478

components/core/src/clp_s/JsonParser.cpp (2)

18-24: kv‑ir deserialization error handling and new includes are coherent

The new includes for fmt::format, clp::ErrorCode and ffi::EncodedTextAst align with the later use of DecodingException and the ffi EncodedTextAst types. In ingest_kvir, using Deserializer<IrUnitHandler>::create with error logging on failure, and treating per‑record deserialize_next_ir_unit errors as a truncated stream (WARN + break), is a reasonable trade‑off for robustness when ingesting potentially incomplete kv‑ir inputs.

Just ensure this “treat any deserialization error as truncation” behaviour matches expectations for upstream producers; if some error codes should instead be considered fatal, they may need to be distinguished here.

If you want, I can suggest a small pattern for branching on specific error codes while still treating common truncation cases as non‑fatal.

Also applies to: 842-852, 933-995


1019-1051: EncodedTextAst handling for ClpString and UnstructuredArray nodes looks correct, with one assumption to validate

For NodeType::ClpString and NodeType::UnstructuredArray, the new logic:

  • Distinguishes between clp::ffi::EightByteEncodedTextAst and clp::ffi::FourByteEncodedTextAst using Value::is<T>().
  • Stores the EncodedTextAst directly in ParsedMessage when the node is not a timestamp, preserving the encoded representation.
  • For timestamp‑matching ClpString nodes, decodes via to_string() and on failure throws DecodingException with clp::ErrorCode_Failure and a detailed fmt::format message, which is then caught and logged in ingest_kvir.

This is consistent with get_archive_node_type’s mapping of SchemaTree::Node::Type::Str to NodeType::VarString (backed by std::string) vs NodeType::ClpString (backed by EncodedTextAst), so pair.second.value() should indeed hold one of the two EncodedTextAst instantiations whenever you hit those cases.

Please confirm that ParsedMessage has add_value overloads for both clp::ffi::EightByteEncodedTextAst and clp::ffi::FourByteEncodedTextAst, and that no other Value alternative can occur for NodeType::ClpString or NodeType::UnstructuredArray. If desired, I can propose a small static_assert or helper to make that invariant explicit.

Also applies to: 1171-1283

Comment thread components/core/src/clp/clo/CMakeLists.txt
Comment thread components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
@kirkrodrigues kirkrodrigues removed the request for review from wraymo November 16, 2025 09:38

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few minor comments.

Comment thread components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp Outdated
*/
template <ir::EncodedVariableTypeReq encoded_variable_t>
[[nodiscard]] auto deserialize_encoded_text_ast(ReaderInterface& reader, encoded_tag_t encoded_tag)
-> boost::outcome_v2::std_checked<EncodedTextAst<encoded_variable_t>, IRErrorCode>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not ystdlib::error_handling::Result? I'm probably missing some context.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has sth to do with the no_value_policy, which defaults to sth that checks whether the error type can be converted into std::errc. In our case it's not, so we can't use std_result.
Ref: https://www.boost.org/doc/libs/latest/libs/outcome/doc/html/reference/types/basic_result.html

Comment thread components/core/src/clp/ffi/ir_stream/decoding_methods.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/decoding_methods.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/decoding_methods.cpp Outdated
* - Forwards `ffi::EncodedTextAst::decode`'s return values on failure.
*/
template <
ir::EncodedVariableTypeReq encoded_variable_t,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also use this requirement for the other encode_and_add_to_dictionary method?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Do u want me to include in this PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)

567-567: Remove unnecessary std::move.

The std::move is unnecessary due to automatic return value optimization (RVO). The value can be returned directly.

Apply this diff:

-    return std::move(encoded_text_ast_result.value());
+    return encoded_text_ast_result.value();
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 373e0c1 and f68bed4.

📒 Files selected for processing (2)
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
🧠 Learnings (19)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-14T03:45:21.323Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:331-335
Timestamp: 2024-10-14T03:45:21.323Z
Learning: When reviewing code, ensure that suggestions about unnecessary use of `std::move` with `std::shared_ptr` are only made if `std::move` is actually used in the code.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-09-27T23:07:22.429Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/clp-s.cpp:131-139
Timestamp: 2024-09-27T23:07:22.429Z
Learning: In the IR encoding method tests, `int8_t` is used for byte buffers instead of `uint8_t`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-14T03:39:35.585Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-12-09T15:25:14.043Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2025-05-11T22:22:49.286Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
🧬 Code graph analysis (1)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)
components/core/src/clp/ffi/StringBlob.hpp (1)
  • reader (46-47)
components/core/src/clp/ffi/ir_stream/utils.hpp (3)
  • deserialize_int (52-52)
  • deserialize_int (156-173)
  • deserialize_int (156-156)
components/core/src/glt/ffi/ir_stream/decoding_methods.cpp (6)
  • deserialize_int (35-35)
  • deserialize_int (137-155)
  • deserialize_int (137-137)
  • is_variable_tag (25-25)
  • is_variable_tag (109-134)
  • is_variable_tag (109-109)
components/core/src/clp/ffi/EncodedTextAst.hpp (1)
  • encoded_vars (81-82)
🔇 Additional comments (8)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (4)

11-11: LGTM: Includes added for new API.

The new includes support the migration to the parse_and_encode_from API and Catch2 message handling.

Also applies to: 24-24


254-262: Excellent: Implemented previous feedback to use explicit types.

The change from template-based get_encoded_text_ast<ir::four_byte_encoded_variable_t>() to explicit FourByteEncodedTextAst::parse_and_encode_from() improves readability as suggested in the previous review.


289-296: LGTM: Consistent API migration.

The changes here mirror those in get_matchable_values, maintaining consistency across the test helper functions.


917-919: LGTM: Test case updated with new API.

The array evaluation test correctly uses FourByteEncodedTextAst::parse_and_encode_from(), completing the migration across all test cases.

components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)

9-13: LGTM! Necessary includes for the new StringBlob-based implementation.

The new includes support the outcome-based error handling and StringBlob-backed EncodedTextAst.


100-141: LGTM! Well-structured helper for dictionary variable deserialization.

The implementation correctly handles different tag widths, propagates errors appropriately, and follows the coding guidelines.


527-568: LGTM! Excellent implementation of StringBlob-based deserialization.

The new overload successfully eliminates per-string allocations by accumulating data into a StringBlob before constructing the EncodedTextAst. The error handling is thorough, using outcome-based returns to propagate errors clearly. This implementation directly addresses the performance objectives outlined in issue #1541.


752-760: LGTM! Proper template specialization declarations.

The explicit template instantiations for both four-byte and eight-byte encoded variables are necessary and correctly declared.

Comment thread components/core/src/clp/ffi/ir_stream/decoding_methods.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (1)

11-26: File organisation: Consider grouping new ffi files more cohesively.

The new files are added in a somewhat scattered manner within the ffi section (EncodedTextAst/Error files at 14–16, StringBlob at 26). While the current organisation is functional, grouping all new ffi-related files consecutively could improve maintainability and reduce future merge conflicts. However, this is a minor preference and only worth addressing if you anticipate frequent changes in this area.

components/core/src/clp/EncodedVariableInterpreter.hpp (1)

148-171: Template constraint and internal naming for IR EncodedTextAst overload

Switching the template parameter to ir::EncodedVariableTypeReq encoded_variable_t and updating all internal uses (including the handlers and std::is_same_v checks) keeps the function well-constrained and consistent with the rest of the encoded-variable APIs. Behaviour is unchanged and looks correct.

One minor readability point: using the name encoded_variable_t for the template parameter inside the class shadows the global alias clp::encoded_variable_t. It is valid but slightly increases cognitive load when reading this file; consider a distinct template name (e.g., EncodedVarT) in future clean-ups if this becomes confusing.

Also applies to: 338-410

♻️ Duplicate comments (1)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)

527-568: Outcome‑based deserialize_encoded_text_ast is sound; trim redundant move and confirm error mapping.

The new overload correctly mirrors the existing variable/logtype deserialization logic while building a StringBlob and wrapping the result in boost::outcome_v2::std_checked<EncodedTextAst<encoded_variable_t>, IRErrorCode>. A couple of small follow‑ups:

  • On failure of EncodedTextAst::create, you map all errors to IRErrorCode_Corrupted_IR (Lines 560–565). That’s reasonable if create’s failures are exclusively due to invalid IR (e.g., inconsistent offsets). If create can also fail for non‑corruption reasons (like internal limits), consider documenting that assumption or, if feasible, mapping those cases differently.

  • The final return std::move(encoded_text_ast_result.value()); (Line 567) doesn’t need std::move: returning the local value expression directly lets NRVO/RVO do their job and was already flagged in an earlier review. You can safely simplify to:

  • return std::move(encoded_text_ast_result.value());

  • return encoded_text_ast_result.value();




```web
Check whether `ystdlib::error_handling::Result` (used by `EncodedTextAst::create`) documents that its error states for creation failures all correspond to invalid/corrupted encoded text AST, and not to other categories of failure (e.g., resource limits).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f68bed4 and 8b5401e.

📒 Files selected for processing (4)
  • components/core/src/clp/EncodedVariableInterpreter.hpp (7 hunks)
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (5 hunks)
  • components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
🧠 Learnings (22)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-08-06T23:57:49.762Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 1163
File: components/core/src/clp_s/ColumnWriter.hpp:120-122
Timestamp: 2025-08-06T23:57:49.762Z
Learning: In the CLP codebase, the `encoded_log_dict_id_t` type alias in `components/core/src/clp_s/ColumnWriter.hpp` is appropriately placed as a class-scoped alias because the ColumnWriter.hpp header is already included in all places where this type is used, eliminating the need to move it to a common header.

Applied to files:

  • components/core/src/clp/make_dictionaries_readable/CMakeLists.txt
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2025-09-15T18:53:15.844Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1226
File: taskfiles/deps/main.yaml:522-527
Timestamp: 2025-09-15T18:53:15.844Z
Learning: When ystdlib is built with -Dystdlib_BUILD_TESTING=OFF in taskfiles/deps/main.yaml, it does not require Catch2 settings to be injected via CMAKE_GEN_ARGS since testing is disabled and Catch2 is not needed during the build process.

Applied to files:

  • components/core/src/clp/make_dictionaries_readable/CMakeLists.txt
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-18T02:31:18.595Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2024-10-14T03:39:35.585Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-12-09T15:25:14.043Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2025-05-11T22:22:49.286Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-10-08T15:52:50.753Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-24T14:45:26.265Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-14T03:45:21.323Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:331-335
Timestamp: 2024-10-14T03:45:21.323Z
Learning: When reviewing code, ensure that suggestions about unnecessary use of `std::move` with `std::shared_ptr` are only made if `std::move` is actually used in the code.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-12-10T16:03:08.691Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-12-10T16:03:13.322Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
Repo: y-scope/clp PR: 1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
Repo: y-scope/clp PR: 700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-11-12T18:46:20.933Z
Learnt from: gibber9809
Repo: y-scope/clp PR: 584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
📚 Learning: 2024-10-08T15:52:50.753Z
Learnt from: AVMatthews
Repo: y-scope/clp PR: 543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.

Applied to files:

  • components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
📚 Learning: 2025-04-26T02:21:22.021Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Applied to files:

  • components/core/src/clp/EncodedVariableInterpreter.hpp
🧬 Code graph analysis (2)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (4)
components/core/src/clp/ffi/StringBlob.hpp (1)
  • reader (46-47)
components/core/src/clp/ffi/ir_stream/utils.hpp (3)
  • deserialize_int (52-52)
  • deserialize_int (156-173)
  • deserialize_int (156-156)
components/core/src/glt/ffi/ir_stream/decoding_methods.cpp (6)
  • deserialize_int (35-35)
  • deserialize_int (137-155)
  • deserialize_int (137-137)
  • is_variable_tag (25-25)
  • is_variable_tag (109-134)
  • is_variable_tag (109-109)
components/core/src/clp/ffi/EncodedTextAst.hpp (1)
  • encoded_vars (81-82)
components/core/src/clp/EncodedVariableInterpreter.hpp (4)
components/core/src/clp/ffi/EncodedTextAst.hpp (1)
  • encoded_vars (81-82)
components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (1)
  • nodiscard (528-568)
components/core/src/clp/ffi/encoding_methods.hpp (4)
  • decode_integer_var (158-158)
  • encode_four_byte_integer_as_eight_byte (147-149)
  • decode_float_var (130-130)
  • encode_four_byte_float_as_eight_byte (75-77)
components/core/src/clp/EncodedVariableInterpreter.cpp (2)
  • encode_var_dict_id (199-201)
  • encode_var_dict_id (199-199)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (10)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (4)

11-11: LGTM! Includes are appropriate for the new API.

The new includes support the migration to FourByteEncodedTextAst and EightByteEncodedTextAst.

Also applies to: 24-24


253-254: LGTM! Migration to explicit type names is consistent with past review feedback.

The use of FourByteEncodedTextAst::parse_and_encode_from and EightByteEncodedTextAst::parse_and_encode_from aligns with the preference for explicit type names over template syntax.


280-285: LGTM! Consistent API migration.

The changes maintain consistency with the migration pattern established in get_matchable_values.


905-905: LGTM! Completes the migration to the new API.

The array evaluation test correctly uses FourByteEncodedTextAst::parse_and_encode_from, maintaining consistency throughout the file.

components/core/src/clp/make_dictionaries_readable/CMakeLists.txt (3)

1-95: Build configuration changes align with PR objectives.

The additions to MAKE_DICTIONARIES_READABLE_SOURCES and the new dependency are consistent with the PR's goal to migrate EncodedTextAst to ffi::EncodedTextAst backed by StringBlob. The changes follow existing patterns in the CMakeLists.txt for including header-only files and linking optional dependencies.


14-16: Verification complete — all files and dependencies are correctly configured.

Confirmed that all four new files are present at their referenced paths:

  • ../ffi/EncodedTextAst.hpp
  • ../ffi/EncodedTextAstError.cpp
  • ../ffi/EncodedTextAstError.hpp
  • ../ffi/StringBlob.hpp

The ystdlib::error_handling target is properly linked at line 86, and the ystdlib package is correctly declared as REQUIRED in the parent CMakeLists.txt. The CMakeLists.txt changes are complete and accurate.


86-86: No issues found—ystdlib::error_handling target is properly available.

The target is correctly imported via find_package(ystdlib REQUIRED) in components/core/CMakeLists.txt, and verification confirms that the error_handling target exists in ystdlib-cpp at commit 9ed78cd (the version specified in taskfiles/deps/main.yaml). The target is used consistently across 14 build targets in the codebase alongside other ystdlib targets like ystdlib::containers, confirming it is properly exported and available.

components/core/src/clp/ffi/ir_stream/decoding_methods.cpp (2)

9-13: Includes for Outcome/EncodedTextAst/StringBlob look correct.

The added headers line up with the new std_checked return type and EncodedTextAst/StringBlob usage; no issues from an interface or dependency standpoint.


752-760: Explicit template instantiations for new overload look good.

The explicit instantiations for both four‑ and eight‑byte encoded variable types match the new std_checked<EncodedTextAst<...>, IRErrorCode> overload and ensure both encodings are available at link time; this is consistent with the existing pattern for the IRErrorCode‑returning overload.

components/core/src/clp/EncodedVariableInterpreter.hpp (1)

10-15: Includes updated appropriately for Result and ffi EncodedTextAst

The added includes for ystdlib::error_handling::Result and ffi::EncodedTextAst align with the new overload and error-handling macro usage; no issues from a dependency or layering perspective.

Comment thread components/core/src/clp/EncodedVariableInterpreter.hpp
Comment thread components/core/src/clp/ffi/ir_stream/decoding_methods.cpp
Comment thread components/core/src/clp/ffi/ir_stream/decoding_methods.cpp

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. For PR title how about:

perf(kv-ir): Use string-blob-based encoded text AST implementation in kv-ir deserialization to avoid small string allocation overhead (resolves #1541).

Using resolves instead of fixes, because #1541 isn't really a bug.

@LinZhihao-723 LinZhihao-723 changed the title perf(kv-ir): Use string-blob-based encoded text AST implementation in kv-ir deserialization to avoid small string allocation overhead (fixes #1541). perf(kv-ir): Use string-blob-based encoded text AST implementation in kv-ir deserialization to avoid small string allocation overhead (resolves #1541). Nov 18, 2025
@LinZhihao-723

Copy link
Copy Markdown
Member Author

@coderabbitai Create an issue to track that we need to deprecate clp::ir::EncodedTextAst.

@coderabbitai

coderabbitai Bot commented Nov 18, 2025

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@LinZhihao-723: I'll create an issue to track the deprecation of clp::ir::EncodedTextAst.


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track the deprecation of clp::ir::EncodedTextAst
gh issue create \
  --repo y-scope/clp \
  --title "Deprecate clp::ir::EncodedTextAst in favour of clp::ffi::EncodedTextAst" \
  --body "## Context

As part of PR #1607, we migrated kv-ir and clp-s implementations to use the string-blob-based \`clp::ffi::EncodedTextAst\` instead of \`clp::ir::EncodedTextAst\` to reduce small-string allocation overhead and improve deserialization performance.

However, \`clp::ir::EncodedTextAst\` is still retained in the codebase because:
- Legacy decoding methods and ffi libraries still use the older AST
- clp-text ingestion paths rely on IRv1

## Task

Deprecate and eventually remove \`clp::ir::EncodedTextAst\` by migrating all remaining usage to \`clp::ffi::EncodedTextAst\`.

## References

- Requested by: @LinZhihao-723
- Related PR: https://github.com/y-scope/clp/pull/1607
- Comment: https://github.com/y-scope/clp/pull/1607#discussion_r1234567890" \
  --assignee LinZhihao-723

Length of output: 42


@LinZhihao-723: I've created issue #1622 to track the deprecation of clp::ir::EncodedTextAst in favour of clp::ffi::EncodedTextAst. The issue has been assigned to you and includes context from this PR.

🐰📝


🧠 Learnings used
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.

Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.

@LinZhihao-723 LinZhihao-723 merged commit 8c539bb into y-scope:main Nov 19, 2025
25 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
… kv-ir deserialization to avoid small string allocation overhead (resolves y-scope#1541). (y-scope#1607)

Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve kv-ir deserialization speed

2 participants