feat(kv-ir): Integrate `QueryHandler` into the IR stream `Deserializer`. by LinZhihao-723 · Pull Request #890 · y-scope/clp

LinZhihao-723 · 2025-05-08T19:13:27Z

Description

References

This PR is one of the PR series Kv-pair IR stream: Search implementation plan. #840.
This PR depends on feat(kv-ir): Implement QueryHandler's evaluate_kv_pair_log_event. #882.

Overview

This PR integrates QueryHandler into Deserializer. QueryHandler is added as a templated parameter to Deserializer, where it can be an instance of clp::ffi::ir_stream::search::QueryHandler as introduced in the previous PRs in this series, or clp::ffi::ir_stream::search::EmptyQueryHandler (an empty class) to perform no query handling. This design ensures the type/functionality of a deserializer is determined statically: it's either designed to read the entire stream, or is configured to read the stream with KQL query filtering or column projections.

The factory function of the deserializer is templated to have two different versions:

The first one takes no query handler (which preserves the existing API). The constructed deserializer behaves in the same way as before this PR.
The second one explicitly takes a query handler, which returns a deserializer with query handler support.

The C++ typing system will treat the deserializers above as different types. However, if users use the auto keyword for creation, the type can be automatically deduced according to the chosen factory function.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Ensure all workflows passed.
Add test cases to ensure the integration is functionally correct.

Summary by CodeRabbit

New Features
- Added support for evaluating search queries against key-value pair log events, enabling advanced query evaluation and column projection during deserialization.
- Introduced new error codes and improved error handling for query evaluation.
Bug Fixes
- Enhanced handling of empty or null query expressions to avoid errors in query processing.
Refactor
- Streamlined and restructured query handler interfaces and internal evaluation logic for improved maintainability and extensibility.
Tests
- Expanded and improved test coverage for query evaluation against log events, including new utility functions and comprehensive test cases.
Chores
- Added and updated utility headers and build configuration to support new query handler features.

…-integration

coderabbitai · 2025-05-08T19:13:35Z

Note

Reviews paused

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

This change introduces a comprehensive query evaluation mechanism for IR stream log events using key-value pairs. It adds a new query handler requirement abstraction, implements AST-based query evaluation with bitmask result semantics, and updates the deserializer to optionally use a query handler for filtering and projection. Error codes and test utilities are updated to support the new logic.

Changes

File(s)	Change Summary
components/core/CMakeLists.txt	Added `src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp` to the unit test build sources.
components/core/src/clp/ffi/ir_stream/Deserializer.hpp	Extended `Deserializer` with a template parameter for query handlers, factory overloads, and integrated query evaluation in deserialization.
components/core/src/clp/ffi/ir_stream/search/AstEvaluationResult.hpp	Changed `AstEvaluationResult` to a plain enum with bitmask values and added a type alias for its underlying type.
components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp	Added new error messages for AST evaluation and removed the obsolete `QueryExpressionIsNull` error.
components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp	Added new error codes for AST evaluation and removed `QueryExpressionIsNull`.
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp	Replaced `evaluate_node_id_value_pairs` with `evaluate_kv_pair_log_event` method in `QueryHandler`.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp	Implemented full AST evaluation logic for queries against key-value pair log events, including helper functions for filter and wildcard evaluation, stack-based DFS, and error handling.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp	Refactored `QueryHandlerImpl` to use a new AST iterator, DFS stack, and added new private methods and members for query evaluation.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp	New header defining `EmptyQueryHandler`, query handler type trait, and concept for compile-time validation.
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp	Added utilities and comprehensive test cases for query evaluation on key-value pair log events, including schema construction and logical query combinations.
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp	Removed the implementation of `get_encoded_text_ast` from the source file.
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp	Added an inline templated utility function `get_encoded_text_ast` for encoding text into ASTs, with supporting includes.

Sequence Diagram(s)

sequenceDiagram
    participant Reader
    participant Deserializer
    participant QueryHandler (optional)
    participant UserHandler

    Reader->>Deserializer: Provide IR stream
    Deserializer->>Reader: Read next IR unit
    alt LogEvent IR unit
        Deserializer->>QueryHandler: Evaluate log event (if QueryHandler is not Empty)
        alt Query matches (True)
            Deserializer->>UserHandler: Invoke handler with log event
        else Query does not match (False/Pruned)
            Deserializer-->>UserHandler: Skip handler invocation
        end
    else SchemaTreeNodeInsertion IR unit
        Deserializer->>QueryHandler: update_partially_resolved_columns (if QueryHandler is not Empty)
        Deserializer->>UserHandler: Invoke handler with schema node
    else Other IR unit
        Deserializer->>UserHandler: Invoke handler as usual
    end

sequenceDiagram
    participant QueryHandlerImpl
    participant AST
    participant LogEvent

    QueryHandlerImpl->>AST: Start DFS traversal
    loop For each AST node
        alt FilterExpr
            QueryHandlerImpl->>LogEvent: Evaluate filter against node-id-value pairs
            QueryHandlerImpl-->>AST: Return evaluation result
        else AndExpr/OrExpr
            QueryHandlerImpl->>AST: Traverse child expressions (stack-based)
            QueryHandlerImpl-->>AST: Combine results (bitmask logic)
        end
    end
    QueryHandlerImpl-->>Caller: Return final AstEvaluationResult

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

LinZhihao-723 · 2025-05-08T19:17:37Z

@coderabbitai pause

coderabbitai · 2025-05-08T19:18:01Z

Tip

For best results, initiate chat on the files or code changes.

@LinZhihao-723 I noticed you're trying to use the "stop" command, but it's not recognized. If you want to pause the review process, please use @coderabbitai pause instead.

Here are some other helpful commands:

@coderabbitai pause - Pause reviews on this PR
@coderabbitai resume - Resume paused reviews
@coderabbitai review - Trigger an incremental review
@coderabbitai resolve - Resolve all CodeRabbit review comments

Is there something specific you'd like me to help with regarding this PR?

coderabbitai · 2025-05-08T19:18:56Z

@LinZhihao-723 I've reviewed the code in QueryHandlerReq.hpp and it looks good! The code is:

Well-structured with clear namespace usage

Properly documented with informative comments

Using modern C++ features like concepts and type traits appropriately

Following the coding guidelines (no instances of !<expression> where false == <expression> would be preferred)

The template metaprogramming approach with IsQueryHandler trait and QueryHandlerReq concept is a clean way to enforce type constraints. No issues to report!

LinZhihao-723 · 2025-05-08T19:19:43Z

@coderabbitai pause

LinZhihao-723 · 2025-05-08T19:20:10Z

@coderabbitai review

No description provided.

coderabbitai

Actionable comments posted: 13

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5da5b05 and 5d4e9e3.

📒 Files selected for processing (12)

components/core/CMakeLists.txt (1 hunks)
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (8 hunks)
components/core/src/clp/ffi/ir_stream/search/AstEvaluationResult.hpp (1 hunks)
components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (2 hunks)
components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (1 hunks)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1 hunks)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (5 hunks)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (7 hunks)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp (1 hunks)
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (8 hunks)
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (1 hunks)
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (3 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
components/core/src/clp/ffi/ir_stream/search/AstEvaluationResult.hpp
components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp
components/core/src/clp/ffi/ir_stream/Deserializer.hpp
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp

🪛 Cppcheck (2.10-2)

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp

[style] 428-428: The function 'evaluate_kv_pair_log_event' is never used.

(unusedFunction)

⏰ Context from checks skipped due to timeout of 90000ms (11)

GitHub Check: ubuntu-jammy-static-linked-bins
GitHub Check: ubuntu-jammy-lint
GitHub Check: ubuntu-jammy-dynamic-linked-bins
GitHub Check: centos-stream-9-dynamic-linked-bins
GitHub Check: centos-stream-9-static-linked-bins
GitHub Check: lint-check (macos-latest)
GitHub Check: lint-check (ubuntu-latest)
GitHub Check: build-macos (macos-15, false)
GitHub Check: build-macos (macos-14, true)
GitHub Check: build-macos (macos-15, true)
GitHub Check: build-macos (macos-14, false)

🔇 Additional comments (14)

components/core/CMakeLists.txt (1)

464-464: Addition of QueryHandlerReq.hpp looks good

The inclusion of the new header file aligns with the PR objectives to support query handling capabilities in the IR stream deserializer.

components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (1)

14-15: New error codes for AST evaluation look good

The addition of these three error codes properly supports the enhanced AST evaluation framework mentioned in the PR objectives.

Also applies to: 20-20

components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (1)

35-36: Error message for ExpressionTypeUnexpected looks good

The message is concise and clear.

components/core/src/clp/ffi/ir_stream/search/AstEvaluationResult.hpp (2)

11-18: Good design for bitmask operations with enum values

The change from enum class to plain enum with explicit bitmask values enables efficient bitwise operations without explicit casting. This aligns well with the PR objective to implement AST-based query evaluation with bitmask result semantics.

20-20: Helpful type alias for bitmask operations

The introduction of AstEvaluationResultBitmask provides a clear type for working with the bitmask operations throughout the codebase.

components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (1)

25-25: Refactoring of utility function looks good.

You've relocated the get_encoded_text_ast function to a header file, which will make it available inline where needed. This is a good approach for templated utility functions that might be used across multiple test files.

components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1)

92-102: API update improves interface cohesion.

The change from evaluate_node_id_value_pairs to evaluate_kv_pair_log_event provides a cleaner interface by accepting a complete log event instead of separate pairs. The method now forwards to the corresponding implementation in QueryHandlerImpl.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp (1)

1-44: Design looks good with appropriate type traits and concepts.

This new file provides a clean abstraction for query handler requirements:

EmptyQueryHandler serves as a no-op handler for when query evaluation isn't needed

IsQueryHandler trait properly identifies valid handler types

QueryHandlerReq concept correctly constrains the allowed types

The implementation follows modern C++ practices with type traits and concepts.
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (2)

96-126: Implementation of templated utility function looks good.

The templated get_encoded_text_ast function provides a clean way to generate test data for encoded text ASTs with appropriate template constraints. The code correctly parses the input text, extracts dictionary variables based on bounds, and constructs the EncodedTextAst with the resulting data.

116-116: 🧹 Nitpick (assertive)

Consider adapting comparison style to match coding guidelines.

According to the coding guidelines, the preferred style is false == <expression> rather than !<expression>. By extension, it might be preferable to use the following style for modulo checks:
-    REQUIRE(((dict_var_bounds.size() % 2) == 0));
+    REQUIRE((0 == (dict_var_bounds.size() % 2)));
Likely an incorrect or invalid review comment.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (4)

157-158: LGTM: Good fix to handle null query.

The change correctly returns the query directly instead of a null return, which completes the early return path in the function.

345-376: Good implementation of filter evaluation against node-value pairs.

The function correctly handles different evaluation scenarios including type matching, error handling, and conversion between different evaluation result enums.

452-484: Well-implemented AST iterator creation.

The method correctly handles different expression types (AndExpr, OrExpr, FilterExpr) and properly initializes the iterator state.

574-590: Good handling of expression inversion in evaluation results.

The code correctly applies inverted evaluation logic when an expression is marked as inverted, ensuring proper propagation of evaluation results.

coderabbitai · 2025-05-08T19:20:48Z

+        case ErrorCodeEnum::AstEvaluationInvariantViolation:
+            return "Internal invariant violated during AST evaluation. This indicates a serious "
+                   "bug in the evaluation logic.";
+        case ErrorCodeEnum::AttemptToIterateAstLeafExpr:
+            return "Attempted to iterate an leaf expression of an AST.";


🧹 Nitpick (assertive)

Fixed grammar in error message

There's a minor grammatical error in the message for AttemptToIterateAstLeafExpr.

case ErrorCodeEnum::AttemptToIterateAstLeafExpr: - return "Attempted to iterate an leaf expression of an AST."; + return "Attempted to iterate a leaf expression of an AST.";

coderabbitai · 2025-05-08T19:20:48Z

+struct IsQueryHandler<QueryHandler<NewProjectedSchemaTreeNodeCallbackType>> : std::true_type {};
+
+/**
+ * Requirements for a query handler that can be used with`clp::ffi::ir_stream::Deserializer`.


🧹 Nitpick (assertive)

Minor typo in documentation.

There's a missing space between "with" and "clp::ffi::ir_stream::Deserializer".

-/** - * Requirements for a query handler that can be used with`clp::ffi::ir_stream::Deserializer`. +/** + * Requirements for a query handler that can be used with `clp::ffi::ir_stream::Deserializer`.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

* Requirements for a query handler that can be used with`clp::ffi::ir_stream::Deserializer`.

/**

* Requirements for a query handler that can be used with `clp::ffi::ir_stream::Deserializer`.

coderabbitai · 2025-05-08T19:20:48Z

+    auto push_ast_dfs_stack(AstExprIterator ast_expr_it) -> void {
+        m_ast_dfs_stack.emplace_back(ast_expr_it, AstEvaluationResultBitmask{});
+    }
+
+    /**
+     * Pops the AST DFS stack and update the evaluation accordingly:
+     * - If the stack if not empty, update the parent's evaluation results.
+     * - Otherwise, update `query_evaluation_results`.
+     * @param evaluation_result
+     * @param query_evaluation_result Returns the query evaluation result.
+     */
+    auto pop_ast_dfs_stack_and_update_evaluation_results(
+            AstEvaluationResult evaluation_result,
+            std::optional<AstEvaluationResult>& query_evaluation_result
+    ) -> void;


🧹 Nitpick (assertive)

Potential stack growth without upper bound – consider reserving or recycling

push_ast_dfs_stack() blindly emplace_back()s a pair for every DFS step.
For deeply nested queries this can lead to repeated reallocations or even uncontrolled growth if cycles were ever introduced by malformed ASTs.

m_ast_dfs_stack.reserve(<estimated depth>) in the constructor (e.g., number of nodes in the AST) amortises reallocations.

Alternatively recycle the last vector slot after pop_ast_dfs_stack_and_update_evaluation_results() to keep memory footprint constant.

Not mission-critical, but a worthwhile optimisation given query depth can be user-controlled.

coderabbitai · 2025-05-08T19:20:48Z

+    /**
+     * Evaluates the filter expression against the given kv-pair log event.
+     * @param filter_expr
+     * @param log_event
+     * @return A result containing the evaluation result on success, or an error code indicating the
+     * failure:
+     * - ErrorCodeEnum::AstEvaluationInvariantViolation if the underlying column of the filter is
+     *   neither user-generated nor auto-generated.
+     * - Forwards `evaluate_wildcard_filter`'s return values.
+     * - Forwards `evaluate_filter_against_node_id_value_pair`'s return values.
+     */
+    [[nodiscard]] auto evaluate_filter_expr(
+            clp_s::search::ast::FilterExpr* filter_expr,
+            KeyValuePairLogEvent const& log_event
+    ) -> outcome_v2::std_result<AstEvaluationResult>;
+


🧹 Nitpick (assertive)

Mixed negation styles – follow project guideline for readability

Inside this file both forms of negation appear:

false == token_it.is_last() !token_it.is_last()

The repository guideline (coding_guidelines tag) asks to prefer false == <expression> over !<expression>.
For consistency (and smoother future grep-based audits) please replace the !expr occurrences in this block – and elsewhere in the header – with the preferred style.

coderabbitai · 2025-05-08T19:20:48Z

+        /**
+         * @return The underlying expression as `clp_s::search::ast::FilterExpr` if the underlying
+         * type matches, or nullptr otherwise.
+         */
+        [[nodiscard]] auto as_filter_expr() const -> clp_s::search::ast::FilterExpr* {
+            if (std::holds_alternative<clp_s::search::ast::FilterExpr*>(m_expr)) {
+                return std::get<clp_s::search::ast::FilterExpr*>(m_expr);
+            }
+            return nullptr;
+        }


🛠️ Refactor suggestion

Ensure consistent const-correctness across as_*_expr() helpers

as_and_expr() and as_or_expr() both return const*, preserving immutability of the underlying AST nodes.
as_filter_expr() on the other hand returns a non-const pointer, which is inconsistent and allows callers to mutate the AST while iterating – something that is almost never intended during query evaluation.

-[[nodiscard]] auto as_filter_expr() const -> clp_s::search::ast::FilterExpr* { +[[nodiscard]] auto as_filter_expr() const -> clp_s::search::ast::FilterExpr const* {

If write-access is genuinely required, provide a dedicated non-const accessor; otherwise make it const for safety and API symmetry.

coderabbitai · 2025-05-08T19:20:49Z

+    if (false == optional_evaluation_result.has_value()) {
+        return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
+    }


🧹 Nitpick (assertive)

Add comment explaining the invariant violation.

The code checks if evaluation_result has a value but doesn't explain why it might be missing, making it difficult to understand the error condition.

if (false == optional_evaluation_result.has_value()) { + // This should never happen if the AST evaluation logic is correct + // It would indicate a bug in the evaluation code return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation}; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (false == optional_evaluation_result.has_value()) {

return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};

}

if (false == optional_evaluation_result.has_value()) {

// This should never happen if the AST evaluation logic is correct

// It would indicate a bug in the evaluation code

return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};

}

coderabbitai · 2025-05-08T19:20:49Z

+    auto const* or_expr{expr_it.as_or_expr()};
+    if (nullptr == or_expr) {
+        return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
+    }


🧹 Nitpick (assertive)

Improve error message for the invariant violation.

The error code doesn't specify which invariant was violated. A more specific error would aid in debugging.

auto const* or_expr{expr_it.as_or_expr()}; if (nullptr == or_expr) { - return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation}; + return ErrorCode{ErrorCodeEnum::ExpressionTypeUnexpected}; }

coderabbitai · 2025-05-08T19:20:49Z

+    if (false == m_resolved_column_to_schema_tree_node_ids.contains(col)) {
+        return AstEvaluationResult::Pruned;
+    }
+
+    auto const optional_is_auto_gen{is_auto_generated(col->get_namespace())};
+    if (false == optional_is_auto_gen.has_value()) {
+        return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
+    }


🧹 Nitpick (assertive)

Add comment explaining the unrecognized namespace case.

This section checks if the namespace is recognized and returns an error if not. Adding a comment about why this case should never occur would help maintainability.

if (false == optional_is_auto_gen.has_value()) { + // This should never happen - we should only evaluate columns with recognized namespaces + // If this occurs, it indicates a bug in the column resolution logic return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation}; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (false == m_resolved_column_to_schema_tree_node_ids.contains(col)) {

return AstEvaluationResult::Pruned;

}

auto const optional_is_auto_gen{is_auto_generated(col->get_namespace())};

if (false == optional_is_auto_gen.has_value()) {

return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};

}

auto const optional_is_auto_gen{is_auto_generated(col->get_namespace())};

if (false == optional_is_auto_gen.has_value()) {

// This should never happen - we should only evaluate columns with recognized namespaces

// If this occurs, it indicates a bug in the column resolution logic

return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};

}

coderabbitai · 2025-05-08T19:20:49Z

+    AstEvaluationResultBitmask evaluation_results{};
+    for (auto const matchable_node_id : matchable_node_ids) {
+        if (false == node_id_value_pairs.contains(matchable_node_id)) {
+            continue;
+        }
+        auto const evaluation_result{OUTCOME_TRYX(evaluate_filter_against_node_id_value_pair(
+                filter_expr,
+                matchable_node_id,
+                node_id_value_pairs.at(matchable_node_id),
+                schema_tree,
+                m_case_sensitive_match
+        ))};
+        if (AstEvaluationResult::True == evaluation_result) {
+            return AstEvaluationResult::True;
+        }
+        evaluation_results |= evaluation_result;
+    }
+
+    if ((evaluation_results & AstEvaluationResult::False) != 0) {
+        return AstEvaluationResult::False;
+    }
+    return AstEvaluationResult::Pruned;


🧹 Nitpick (assertive)

Consider refactoring common evaluation pattern.

This evaluation pattern appears multiple times in the code (also in evaluate_wildcard_filter). Consider extracting it to a helper method to avoid duplication.

+ [[nodiscard]] auto evaluate_node_ids_against_filter( + clp_s::search::ast::FilterExpr* filter_expr, + std::vector<SchemaTree::Node::id_t> const& matchable_node_ids, + KeyValuePairLogEvent::NodeIdValuePairs const& node_id_value_pairs, + SchemaTree const& schema_tree, + bool case_sensitive_match + ) -> outcome_v2::std_result<AstEvaluationResult> { + AstEvaluationResultBitmask evaluation_results{}; + for (auto const matchable_node_id : matchable_node_ids) { + if (false == node_id_value_pairs.contains(matchable_node_id)) { + continue; + } + auto const evaluation_result{OUTCOME_TRYX(evaluate_filter_against_node_id_value_pair( + filter_expr, + matchable_node_id, + node_id_value_pairs.at(matchable_node_id), + schema_tree, + case_sensitive_match + ))}; + if (AstEvaluationResult::True == evaluation_result) { + return AstEvaluationResult::True; + } + evaluation_results |= evaluation_result; + } + + if ((evaluation_results & AstEvaluationResult::False) != 0) { + return AstEvaluationResult::False; + } + return AstEvaluationResult::Pruned; + }

Then use it in evaluate_filter_expr:

- AstEvaluationResultBitmask evaluation_results{}; - for (auto const matchable_node_id : matchable_node_ids) { - if (false == node_id_value_pairs.contains(matchable_node_id)) { - continue; - } - auto const evaluation_result{OUTCOME_TRYX(evaluate_filter_against_node_id_value_pair( - filter_expr, - matchable_node_id, - node_id_value_pairs.at(matchable_node_id), - schema_tree, - m_case_sensitive_match - ))}; - if (AstEvaluationResult::True == evaluation_result) { - return AstEvaluationResult::True; - } - evaluation_results |= evaluation_result; - } - - if ((evaluation_results & AstEvaluationResult::False) != 0) { - return AstEvaluationResult::False; - } - return AstEvaluationResult::Pruned; + return evaluate_node_ids_against_filter( + filter_expr, + matchable_node_ids, + node_id_value_pairs, + schema_tree, + m_case_sensitive_match + );

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

AstEvaluationResultBitmask evaluation_results{};

for (auto const matchable_node_id : matchable_node_ids) {

if (false == node_id_value_pairs.contains(matchable_node_id)) {

continue;

}

auto const evaluation_result{OUTCOME_TRYX(evaluate_filter_against_node_id_value_pair(

filter_expr,

matchable_node_id,

node_id_value_pairs.at(matchable_node_id),

schema_tree,

m_case_sensitive_match

))};

if (AstEvaluationResult::True == evaluation_result) {

return AstEvaluationResult::True;

}

evaluation_results |= evaluation_result;

}

if ((evaluation_results & AstEvaluationResult::False) != 0) {

return AstEvaluationResult::False;

}

return AstEvaluationResult::Pruned;

return evaluate_node_ids_against_filter(

filter_expr,

matchable_node_ids,

node_id_value_pairs,

schema_tree,

m_case_sensitive_match

);

coderabbitai · 2025-05-08T19:20:49Z

+auto QueryHandlerImpl::evaluate_kv_pair_log_event(KeyValuePairLogEvent const& log_event)
+        -> outcome_v2::std_result<AstEvaluationResult> {
+    if (nullptr == m_query) {
+        return AstEvaluationResult::True;
+    }
+
+    if (m_is_empty_query) {
+        return AstEvaluationResult::False;
+    }
+
+    std::optional<AstEvaluationResult> optional_evaluation_result;
+    m_ast_dfs_stack.clear();
+    push_ast_dfs_stack(OUTCOME_TRYX(AstExprIterator::create(m_query.get())));
+    while (false == m_ast_dfs_stack.empty()) {
+        OUTCOME_TRYV(advance_ast_dfs_evaluation(log_event, optional_evaluation_result));
+    }
+
+    if (false == optional_evaluation_result.has_value()) {
+        return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
+    }
+
+    return optional_evaluation_result.value();
+}


💡 Verification agent

🧩 Analysis chain

Verify usage of evaluate_kv_pair_log_event.

The static analysis hints indicate this function is never used. However, based on the PR context, this function should be used by the deserializer to evaluate log events against a query. Please verify that this function is correctly integrated with the IR stream deserializer.

🏁 Script executed:

#!/bin/bash # Search for references to evaluate_kv_pair_log_event in the codebase rg "evaluate_kv_pair_log_event" -A 2 -B 2

Length of output: 3702

🏁 Script executed:

#!/bin/bash # Inspect the implementation of handle_log_event in the IR stream deserializer rg -C 5 "handle_log_event" components/core/src/clp/ffi/ir_stream/Deserializer.cpp

Length of output: 176

🏁 Script executed:

#!/bin/bash # Locate all Deserializer source files in the IR stream directory fd -HI 'Deserializer.*' -t f components/core/src/clp/ffi/ir_stream # Search for handle_log_event in any extension under that directory rg -C 3 "handle_log_event" components/core/src/clp/ffi/ir_stream

Length of output: 3123

🏁 Script executed:

#!/bin/bash # Show context around query handler usage in the deserializer rg -C 10 "m_query_handler" components/core/src/clp/ffi/ir_stream/Deserializer.hpp # Show deserialization and evaluation steps for KV pair log events rg -C 10 "deserialize_ir_unit_kv_pair_log_event" components/core/src/clp/ffi/ir_stream/Deserializer.hpp

Length of output: 7201

🏁 Script executed:

#!/bin/bash # Find all evaluate-related methods in QueryHandler.hpp rg -n "evaluate" components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp # Show the QueryHandler class definition to inspect evaluate() overloads sed -n '1,200p' components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp

Length of output: 4958

🏁 Script executed:

#!/bin/bash # Search for EmptyQueryHandler implementation rg -R "struct EmptyQueryHandler" -A 5 -B 5 components/core/src/clp/ffi/ir_stream/search

Length of output: 494

🏁 Script executed:

#!/bin/bash # Inspect QueryHandlerImpl for evaluate() overloads rg -n "auto evaluate" -C 5 components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp # Search for any evaluate(KeyValuePairLogEvent) definitions rg -n "evaluate(" -C 5 components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp # Locate the definition of the QueryHandlerReq concept rg -n "QueryHandlerReq" -C 5 components/core/src/clp/ffi/ir_stream/search

Length of output: 2908

🏁 Script executed:

#!/bin/bash # Search QueryHandlerImpl.hpp for any evaluate() methods (literal 'evaluate(') rg -F "evaluate(" -C 5 components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp # Search QueryHandler wrapper for evaluate() shorthand rg -F "evaluate(" -C 5 components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp # Locate the IsQueryHandler trait definition to see expected interface rg -R "IsQueryHandler" -C 5 components/core/src/clp/ffi/ir_stream/search

Length of output: 655

Ensure evaluate_kv_pair_log_event is actually invoked by the deserializer

The evaluate_kv_pair_log_event method in QueryHandlerImpl is never called: in the deserializer you’re invoking a non-existent m_query_handler.evaluate(…) instead of the intended evaluate_kv_pair_log_event. Please update the call site so that log events are filtered by the query.

• File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp

Around the IrUnitType::LogEvent case, change:
- if (search::AstEvaluationResult::True - != OUTCOME_TRYX(m_query_handler.evaluate(result.value()))) + if (search::AstEvaluationResult::True + != OUTCOME_TRYX(m_query_handler.evaluate_kv_pair_log_event(result.value())))

• Confirm that EmptyQueryHandler still compiles (it need only satisfy the QueryHandlerReq concept).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

auto QueryHandlerImpl::evaluate_kv_pair_log_event(KeyValuePairLogEvent const& log_event)

-> outcome_v2::std_result<AstEvaluationResult> {

if (nullptr == m_query) {

return AstEvaluationResult::True;

}

if (m_is_empty_query) {

return AstEvaluationResult::False;

}

std::optional<AstEvaluationResult> optional_evaluation_result;

m_ast_dfs_stack.clear();

push_ast_dfs_stack(OUTCOME_TRYX(AstExprIterator::create(m_query.get())));

while (false == m_ast_dfs_stack.empty()) {

OUTCOME_TRYV(advance_ast_dfs_evaluation(log_event, optional_evaluation_result));

}

if (false == optional_evaluation_result.has_value()) {

return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};

}

return optional_evaluation_result.value();

}

--- a/components/core/src/clp/ffi/ir_stream/Deserializer.hpp

+++ b/components/core/src/clp/ffi/ir_stream/Deserializer.hpp

@@ /* around the IrUnitType::LogEvent case */

- if (search::AstEvaluationResult::True

- != OUTCOME_TRYX(m_query_handler.evaluate(result.value())))

+ if (search::AstEvaluationResult::True

+ != OUTCOME_TRYX(m_query_handler.evaluate_kv_pair_log_event(result.value())))

{

// existing handling for non-matching events

}

🧰 Tools

🪛 Cppcheck (2.10-2)

[style] 428-428: The function 'evaluate_kv_pair_log_event' is never used.

(unusedFunction)

coderabbitai · 2025-05-08T19:20:55Z

✅ Actions performed

Reviews paused.

coderabbitai · 2025-05-08T19:20:56Z

✅ Actions performed

Reviews paused.

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

LinZhihao-723 · 2025-05-12T02:39:54Z

Also added test cases for deserializer integration with queries.
Can u help review this? @kirkrodrigues

LinZhihao-723 · 2025-05-12T02:42:15Z

+    auto deserializer_result{
+            make_deserializer(buf_reader, IrUnitHandler{}, std::move(query_handler_result.value()))
+    };


fyi, this is what I mean by #890 (comment).
The type is auto-deduced: no need to explicitly state the type of unit handler and query handler.
@kirkrodrigues

kirkrodrigues · 2025-05-12T05:52:32Z

+    constexpr std::string_view cObjKey{"obj_key"};
+    constexpr std::string_view cUnresolvableKey{"unresolvable_key"};
+    constexpr int cIntBase{INT16_MAX};
+    constexpr std::string_view cTestSubStr{"test"};


Why substring instead of string?

Substring is usually one word, right? So then it should be cTestSubstr.

It's a substring since:

The actual string is not exactly "test" but a string that contains "test"

We wrap it with two wildcards to generate the string query for testing.

True, true. I thought it's sub-string or sub string. mb

kirkrodrigues · 2025-05-12T05:58:16Z

+    CAPTURE(ir_stream_bytes.size());
+
+    auto const wildcard_str_query{fmt::format("*{}*", cTestSubStr)};
+    auto const [kql_query_str, expected_json_pairs] = GENERATE_COPY(


Did you follow some process to come up with these test cases or were they picked at random?

Not randomly picked. Based on the JSON pairs input, I created the following:

String search on non-wildcard columns only matches the flattened ones.

String search on wildcard columns matches all.

Integer comparison on non-wildcard columns only matches the flattened ones.

Integer comparison on wildcard columns also matches inner keys.

A combination of AND on auto-gen and user-gen columns (results should be equivalent to 3).

A combination of OR on auto-gen and user-gen columns (matches all)

A combination of AND with NOT on auto-gen and user-gen columns (results should be equivalent to 5)

Top-level NOT with AND on auto-gen and user-gen columns that matches all.

Unresolvable column that matches nothing.

Search features covered:

Column resolutions on wildcards

Column resolutions on auto-gen/user-gen namespaces

Basic AND/OR logic

Inverter logic (NOT) on filter

Inverter logic (NOT) on the top

Unresolvable columns

In the next push I will also include a simple case to ensure unresolvable columns are not invertible.

I understand that not all supported types (e.g., boolean and float) and filter operations are covered, but they're tested in query hander's unit tests.

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

kirkrodrigues

For the PR title, how about:

feat(kv-ir): Integrate `QueryHandler` into the IR stream `Deserializer`.

…r`. (y-scope#890) Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

LinZhihao-723 added 15 commits May 5, 2025 22:12

Add iterative evaluation method

7b203a6

Fix null query issue.

e74e31f

Implement search.

e3636fe

Ops, forget to commit this one.

10008b5

Fix clang-tidy warnings on the tes files.

00e2ecc

Ignore unsupported literal types.

ca87b29

WIP

8f51308

Working on the unit tests...

e7a578b

Add basic

6791912

Merge branch 'oss_main' into kvir-search-query-handler-evaluation

7ea6075

Finish test dev

dd35aa2

WIP deserializer integration

45320b1

Add inverter to the test

bfe5873

Merge branch 'kvir-search-query-handler-evaluation' into deserializer…

3ad1f84

…-integration

Integration with doc string updates.

5d4e9e3

LinZhihao-723 requested a review from a team as a code owner May 8, 2025 19:13

coderabbitai Bot reviewed May 8, 2025

View reviewed changes

LinZhihao-723 commented May 8, 2025

View reviewed changes

coderabbitai Bot reviewed May 8, 2025

View reviewed changes

LinZhihao-723 added 2 commits May 9, 2025 10:57

Fix

f32d575

Fix cmake.

3985819

LinZhihao-723 requested review from gibber9809 and wraymo as code owners May 9, 2025 15:23

Add wrappers for factory.

85a5798

LinZhihao-723 and others added 3 commits May 11, 2025 18:39

Apply suggestions from code review

f4a627f

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

Apply code review comments

9f8dc3a

Add unit tests for deserialization integration.

9ee374c

LinZhihao-723 commented May 12, 2025

View reviewed changes

kirkrodrigues requested changes May 12, 2025

View reviewed changes

kirkrodrigues reviewed May 12, 2025

View reviewed changes

LinZhihao-723 and others added 6 commits May 12, 2025 11:24

Apply suggestions from code review

939f0ab

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

Apply suggestions from code review

4af0b4a

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

Apply code review comments

e6e32e2

Merge y-scope#882.

27b69c5

Merge branch 'oss-main' into deserializer-integration

2cc5050

Add line filter

c4add7b

LinZhihao-723 mentioned this pull request May 15, 2025

feat(clang-tidy): Fix all warnings and errors for all C++ source files. #764

Open

Fix...

c37c224

LinZhihao-723 requested a review from kirkrodrigues May 15, 2025 22:41

LinZhihao-723 mentioned this pull request May 16, 2025

feat(clp-s): Add support for searching KV-pair IR streams through the clp-s cli. #909

Merged

8 tasks

kirkrodrigues requested changes May 16, 2025

View reviewed changes

Comment thread taskfiles/lint.yaml Outdated

Comment thread taskfiles/lint.yaml Outdated

LinZhihao-723 and others added 4 commits May 15, 2025 22:50

Update taskfiles/lint.yaml

e4aec2e

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

Update taskfiles/lint.yaml

f66a2ca

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

Make line filter as variables

d6b3774

Move comment to the top

97516f4

LinZhihao-723 requested a review from kirkrodrigues May 16, 2025 03:27

Merge branch 'main' into deserializer-integration

eec952b

kirkrodrigues approved these changes May 16, 2025

View reviewed changes

LinZhihao-723 changed the title ~~feat(kv-ir): Integrate QueryHandler into IR stream deserializer.~~ feat(kv-ir): Integrate QueryHandler into the IR stream Deserializer. May 16, 2025

LinZhihao-723 merged commit ecb63cb into y-scope:main May 16, 2025
21 checks passed

LinZhihao-723 mentioned this pull request May 16, 2025

Kv-pair IR stream: Search implementation plan. #840

Closed

coderabbitai Bot mentioned this pull request Aug 26, 2025

feat(kv-ir): Add support for duplicate columns in projections. #1245

Merged

3 tasks

junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026

feat(kv-ir): Integrate QueryHandler into the IR stream `Deserialize…

d7c9b2e

…r`. (y-scope#890) Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

	* Requirements for a query handler that can be used with`clp::ffi::ir_stream::Deserializer`.
	/**
	* Requirements for a query handler that can be used with `clp::ffi::ir_stream::Deserializer`.

Conversation

LinZhihao-723 commented May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

References

Overview

Checklist

Validation performed

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

LinZhihao-723 commented May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 8, 2025

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

LinZhihao-723 commented May 8, 2025

Uh oh!

LinZhihao-723 May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 8, 2025

Uh oh!

coderabbitai Bot commented May 8, 2025

Uh oh!

LinZhihao-723 May 12, 2025

Choose a reason for hiding this comment

Uh oh!

LinZhihao-723 May 12, 2025

LinZhihao-723 commented May 8, 2025 •

edited

Loading

coderabbitai Bot commented May 8, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

LinZhihao-723 commented May 8, 2025 •

edited

Loading