Skip to content

feat(kv-ir): Integrate QueryHandler into the IR stream Deserializer.#890

Merged
LinZhihao-723 merged 33 commits into
y-scope:mainfrom
LinZhihao-723:deserializer-integration
May 16, 2025
Merged

feat(kv-ir): Integrate QueryHandler into the IR stream Deserializer.#890
LinZhihao-723 merged 33 commits into
y-scope:mainfrom
LinZhihao-723:deserializer-integration

Conversation

@LinZhihao-723

@LinZhihao-723 LinZhihao-723 commented May 8, 2025

Copy link
Copy Markdown
Member

Description

References

Overview

This PR integrates QueryHandler into Deserializer. QueryHandler is added as a templated parameter to Deserializer, where it can be an instance of clp::ffi::ir_stream::search::QueryHandler as introduced in the previous PRs in this series, or clp::ffi::ir_stream::search::EmptyQueryHandler (an empty class) to perform no query handling. This design ensures the type/functionality of a deserializer is determined statically: it's either designed to read the entire stream, or is configured to read the stream with KQL query filtering or column projections.

The factory function of the deserializer is templated to have two different versions:

  • The first one takes no query handler (which preserves the existing API). The constructed deserializer behaves in the same way as before this PR.
  • The second one explicitly takes a query handler, which returns a deserializer with query handler support.

The C++ typing system will treat the deserializers above as different types. However, if users use the auto keyword for creation, the type can be automatically deduced according to the chosen factory function.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows passed.
  • Add test cases to ensure the integration is functionally correct.

Summary by CodeRabbit

  • New Features

    • Added support for evaluating search queries against key-value pair log events, enabling advanced query evaluation and column projection during deserialization.
    • Introduced new error codes and improved error handling for query evaluation.
  • Bug Fixes

    • Enhanced handling of empty or null query expressions to avoid errors in query processing.
  • Refactor

    • Streamlined and restructured query handler interfaces and internal evaluation logic for improved maintainability and extensibility.
  • Tests

    • Expanded and improved test coverage for query evaluation against log events, including new utility functions and comprehensive test cases.
  • Chores

    • Added and updated utility headers and build configuration to support new query handler features.

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner May 8, 2025 19:13
@coderabbitai

coderabbitai Bot commented May 8, 2025

Copy link
Copy Markdown
Contributor

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This change introduces a comprehensive query evaluation mechanism for IR stream log events using key-value pairs. It adds a new query handler requirement abstraction, implements AST-based query evaluation with bitmask result semantics, and updates the deserializer to optionally use a query handler for filtering and projection. Error codes and test utilities are updated to support the new logic.

Changes

File(s) Change Summary
components/core/CMakeLists.txt Added src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp to the unit test build sources.
components/core/src/clp/ffi/ir_stream/Deserializer.hpp Extended Deserializer with a template parameter for query handlers, factory overloads, and integrated query evaluation in deserialization.
components/core/src/clp/ffi/ir_stream/search/AstEvaluationResult.hpp Changed AstEvaluationResult to a plain enum with bitmask values and added a type alias for its underlying type.
components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp Added new error messages for AST evaluation and removed the obsolete QueryExpressionIsNull error.
components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp Added new error codes for AST evaluation and removed QueryExpressionIsNull.
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp Replaced evaluate_node_id_value_pairs with evaluate_kv_pair_log_event method in QueryHandler.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp Implemented full AST evaluation logic for queries against key-value pair log events, including helper functions for filter and wildcard evaluation, stack-based DFS, and error handling.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp Refactored QueryHandlerImpl to use a new AST iterator, DFS stack, and added new private methods and members for query evaluation.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp New header defining EmptyQueryHandler, query handler type trait, and concept for compile-time validation.
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp Added utilities and comprehensive test cases for query evaluation on key-value pair log events, including schema construction and logical query combinations.
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp Removed the implementation of get_encoded_text_ast from the source file.
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp Added an inline templated utility function get_encoded_text_ast for encoding text into ASTs, with supporting includes.

Sequence Diagram(s)

sequenceDiagram
    participant Reader
    participant Deserializer
    participant QueryHandler (optional)
    participant UserHandler

    Reader->>Deserializer: Provide IR stream
    Deserializer->>Reader: Read next IR unit
    alt LogEvent IR unit
        Deserializer->>QueryHandler: Evaluate log event (if QueryHandler is not Empty)
        alt Query matches (True)
            Deserializer->>UserHandler: Invoke handler with log event
        else Query does not match (False/Pruned)
            Deserializer-->>UserHandler: Skip handler invocation
        end
    else SchemaTreeNodeInsertion IR unit
        Deserializer->>QueryHandler: update_partially_resolved_columns (if QueryHandler is not Empty)
        Deserializer->>UserHandler: Invoke handler with schema node
    else Other IR unit
        Deserializer->>UserHandler: Invoke handler as usual
    end
Loading
sequenceDiagram
    participant QueryHandlerImpl
    participant AST
    participant LogEvent

    QueryHandlerImpl->>AST: Start DFS traversal
    loop For each AST node
        alt FilterExpr
            QueryHandlerImpl->>LogEvent: Evaluate filter against node-id-value pairs
            QueryHandlerImpl-->>AST: Return evaluation result
        else AndExpr/OrExpr
            QueryHandlerImpl->>AST: Traverse child expressions (stack-based)
            QueryHandlerImpl-->>AST: Combine results (bitmask logic)
        end
    end
    QueryHandlerImpl-->>Caller: Return final AstEvaluationResult
Loading

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@LinZhihao-723

LinZhihao-723 commented May 8, 2025

Copy link
Copy Markdown
Member Author

@coderabbitai pause

@coderabbitai

coderabbitai Bot commented May 8, 2025

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@LinZhihao-723 I noticed you're trying to use the "stop" command, but it's not recognized. If you want to pause the review process, please use @coderabbitai pause instead.

Here are some other helpful commands:

  • @coderabbitai pause - Pause reviews on this PR
  • @coderabbitai resume - Resume paused reviews
  • @coderabbitai review - Trigger an incremental review
  • @coderabbitai resolve - Resolve all CodeRabbit review comments

Is there something specific you'd like me to help with regarding this PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinZhihao-723 I've reviewed the code in QueryHandlerReq.hpp and it looks good! The code is:

  • Well-structured with clear namespace usage
  • Properly documented with informative comments
  • Using modern C++ features like concepts and type traits appropriately
  • Following the coding guidelines (no instances of !<expression> where false == <expression> would be preferred)

The template metaprogramming approach with IsQueryHandler trait and QueryHandlerReq concept is a clean way to enforce type constraints. No issues to report!

@LinZhihao-723

Copy link
Copy Markdown
Member Author

@coderabbitai pause

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai review

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No description provided.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5da5b05 and 5d4e9e3.

📒 Files selected for processing (12)
  • components/core/CMakeLists.txt (1 hunks)
  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp (8 hunks)
  • components/core/src/clp/ffi/ir_stream/search/AstEvaluationResult.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (2 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (5 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (7 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (8 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/AstEvaluationResult.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp
  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
🪛 Cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp

[style] 428-428: The function 'evaluate_kv_pair_log_event' is never used.

(unusedFunction)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, false)
🔇 Additional comments (14)
components/core/CMakeLists.txt (1)

464-464: Addition of QueryHandlerReq.hpp looks good

The inclusion of the new header file aligns with the PR objectives to support query handling capabilities in the IR stream deserializer.

components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (1)

14-15: New error codes for AST evaluation look good

The addition of these three error codes properly supports the enhanced AST evaluation framework mentioned in the PR objectives.

Also applies to: 20-20

components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (1)

35-36: Error message for ExpressionTypeUnexpected looks good

The message is concise and clear.

components/core/src/clp/ffi/ir_stream/search/AstEvaluationResult.hpp (2)

11-18: Good design for bitmask operations with enum values

The change from enum class to plain enum with explicit bitmask values enables efficient bitwise operations without explicit casting. This aligns well with the PR objective to implement AST-based query evaluation with bitmask result semantics.


20-20: Helpful type alias for bitmask operations

The introduction of AstEvaluationResultBitmask provides a clear type for working with the bitmask operations throughout the codebase.

components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (1)

25-25: Refactoring of utility function looks good.

You've relocated the get_encoded_text_ast function to a header file, which will make it available inline where needed. This is a good approach for templated utility functions that might be used across multiple test files.

components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1)

92-102: API update improves interface cohesion.

The change from evaluate_node_id_value_pairs to evaluate_kv_pair_log_event provides a cleaner interface by accepting a complete log event instead of separate pairs. The method now forwards to the corresponding implementation in QueryHandlerImpl.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp (1)

1-44: Design looks good with appropriate type traits and concepts.

This new file provides a clean abstraction for query handler requirements:

  • EmptyQueryHandler serves as a no-op handler for when query evaluation isn't needed
  • IsQueryHandler trait properly identifies valid handler types
  • QueryHandlerReq concept correctly constrains the allowed types

The implementation follows modern C++ practices with type traits and concepts.

components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (2)

96-126: Implementation of templated utility function looks good.

The templated get_encoded_text_ast function provides a clean way to generate test data for encoded text ASTs with appropriate template constraints. The code correctly parses the input text, extracts dictionary variables based on bounds, and constructs the EncodedTextAst with the resulting data.


116-116: 🧹 Nitpick (assertive)

Consider adapting comparison style to match coding guidelines.

According to the coding guidelines, the preferred style is false == <expression> rather than !<expression>. By extension, it might be preferable to use the following style for modulo checks:

-    REQUIRE(((dict_var_bounds.size() % 2) == 0));
+    REQUIRE((0 == (dict_var_bounds.size() % 2)));

Likely an incorrect or invalid review comment.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (4)

157-158: LGTM: Good fix to handle null query.

The change correctly returns the query directly instead of a null return, which completes the early return path in the function.


345-376: Good implementation of filter evaluation against node-value pairs.

The function correctly handles different evaluation scenarios including type matching, error handling, and conversion between different evaluation result enums.


452-484: Well-implemented AST iterator creation.

The method correctly handles different expression types (AndExpr, OrExpr, FilterExpr) and properly initializes the iterator state.


574-590: Good handling of expression inversion in evaluation results.

The code correctly applies inverted evaluation logic when an expression is marked as inverted, ensuring proper propagation of evaluation results.

Comment on lines +22 to +26
case ErrorCodeEnum::AstEvaluationInvariantViolation:
return "Internal invariant violated during AST evaluation. This indicates a serious "
"bug in the evaluation logic.";
case ErrorCodeEnum::AttemptToIterateAstLeafExpr:
return "Attempted to iterate an leaf expression of an AST.";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Fixed grammar in error message

There's a minor grammatical error in the message for AttemptToIterateAstLeafExpr.

        case ErrorCodeEnum::AttemptToIterateAstLeafExpr:
-            return "Attempted to iterate an leaf expression of an AST.";
+            return "Attempted to iterate a leaf expression of an AST.";

struct IsQueryHandler<QueryHandler<NewProjectedSchemaTreeNodeCallbackType>> : std::true_type {};

/**
* Requirements for a query handler that can be used with`clp::ffi::ir_stream::Deserializer`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Minor typo in documentation.

There's a missing space between "with" and "clp::ffi::ir_stream::Deserializer".

-/**
- * Requirements for a query handler that can be used with`clp::ffi::ir_stream::Deserializer`.
+/**
+ * Requirements for a query handler that can be used with `clp::ffi::ir_stream::Deserializer`.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* Requirements for a query handler that can be used with`clp::ffi::ir_stream::Deserializer`.
/**
* Requirements for a query handler that can be used with `clp::ffi::ir_stream::Deserializer`.

Comment on lines +347 to +361
auto push_ast_dfs_stack(AstExprIterator ast_expr_it) -> void {
m_ast_dfs_stack.emplace_back(ast_expr_it, AstEvaluationResultBitmask{});
}

/**
* Pops the AST DFS stack and update the evaluation accordingly:
* - If the stack if not empty, update the parent's evaluation results.
* - Otherwise, update `query_evaluation_results`.
* @param evaluation_result
* @param query_evaluation_result Returns the query evaluation result.
*/
auto pop_ast_dfs_stack_and_update_evaluation_results(
AstEvaluationResult evaluation_result,
std::optional<AstEvaluationResult>& query_evaluation_result
) -> void;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Potential stack growth without upper bound – consider reserving or recycling

push_ast_dfs_stack() blindly emplace_back()s a pair for every DFS step.
For deeply nested queries this can lead to repeated reallocations or even uncontrolled growth if cycles were ever introduced by malformed ASTs.

  1. m_ast_dfs_stack.reserve(<estimated depth>) in the constructor (e.g., number of nodes in the AST) amortises reallocations.
  2. Alternatively recycle the last vector slot after pop_ast_dfs_stack_and_update_evaluation_results() to keep memory footprint constant.

Not mission-critical, but a worthwhile optimisation given query depth can be user-controlled.

Comment on lines +331 to +346
/**
* Evaluates the filter expression against the given kv-pair log event.
* @param filter_expr
* @param log_event
* @return A result containing the evaluation result on success, or an error code indicating the
* failure:
* - ErrorCodeEnum::AstEvaluationInvariantViolation if the underlying column of the filter is
* neither user-generated nor auto-generated.
* - Forwards `evaluate_wildcard_filter`'s return values.
* - Forwards `evaluate_filter_against_node_id_value_pair`'s return values.
*/
[[nodiscard]] auto evaluate_filter_expr(
clp_s::search::ast::FilterExpr* filter_expr,
KeyValuePairLogEvent const& log_event
) -> outcome_v2::std_result<AstEvaluationResult>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Mixed negation styles – follow project guideline for readability

Inside this file both forms of negation appear:

false == token_it.is_last()
!token_it.is_last()

The repository guideline (coding_guidelines tag) asks to prefer false == <expression> over !<expression>.
For consistency (and smoother future grep-based audits) please replace the !expr occurrences in this block – and elsewhere in the header – with the preferred style.

Comment on lines +248 to +257
/**
* @return The underlying expression as `clp_s::search::ast::FilterExpr` if the underlying
* type matches, or nullptr otherwise.
*/
[[nodiscard]] auto as_filter_expr() const -> clp_s::search::ast::FilterExpr* {
if (std::holds_alternative<clp_s::search::ast::FilterExpr*>(m_expr)) {
return std::get<clp_s::search::ast::FilterExpr*>(m_expr);
}
return nullptr;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ensure consistent const-correctness across as_*_expr() helpers

as_and_expr() and as_or_expr() both return const*, preserving immutability of the underlying AST nodes.
as_filter_expr() on the other hand returns a non-const pointer, which is inconsistent and allows callers to mutate the AST while iterating – something that is almost never intended during query evaluation.

-[[nodiscard]] auto as_filter_expr() const -> clp_s::search::ast::FilterExpr* {
+[[nodiscard]] auto as_filter_expr() const -> clp_s::search::ast::FilterExpr const* {

If write-access is genuinely required, provide a dedicated non-const accessor; otherwise make it const for safety and API symmetry.

Comment on lines +445 to +447
if (false == optional_evaluation_result.has_value()) {
return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add comment explaining the invariant violation.

The code checks if evaluation_result has a value but doesn't explain why it might be missing, making it difficult to understand the error condition.

    if (false == optional_evaluation_result.has_value()) {
+       // This should never happen if the AST evaluation logic is correct
+       // It would indicate a bug in the evaluation code
        return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (false == optional_evaluation_result.has_value()) {
return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
}
if (false == optional_evaluation_result.has_value()) {
// This should never happen if the AST evaluation logic is correct
// It would indicate a bug in the evaluation code
return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
}

Comment on lines +634 to +637
auto const* or_expr{expr_it.as_or_expr()};
if (nullptr == or_expr) {
return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Improve error message for the invariant violation.

The error code doesn't specify which invariant was violated. A more specific error would aid in debugging.

    auto const* or_expr{expr_it.as_or_expr()};
    if (nullptr == or_expr) {
-       return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
+       return ErrorCode{ErrorCodeEnum::ExpressionTypeUnexpected};
    }

Comment on lines +532 to +539
if (false == m_resolved_column_to_schema_tree_node_ids.contains(col)) {
return AstEvaluationResult::Pruned;
}

auto const optional_is_auto_gen{is_auto_generated(col->get_namespace())};
if (false == optional_is_auto_gen.has_value()) {
return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add comment explaining the unrecognized namespace case.

This section checks if the namespace is recognized and returns an error if not. Adding a comment about why this case should never occur would help maintainability.

    if (false == optional_is_auto_gen.has_value()) {
+       // This should never happen - we should only evaluate columns with recognized namespaces
+       // If this occurs, it indicates a bug in the column resolution logic
        return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (false == m_resolved_column_to_schema_tree_node_ids.contains(col)) {
return AstEvaluationResult::Pruned;
}
auto const optional_is_auto_gen{is_auto_generated(col->get_namespace())};
if (false == optional_is_auto_gen.has_value()) {
return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
}
auto const optional_is_auto_gen{is_auto_generated(col->get_namespace())};
if (false == optional_is_auto_gen.has_value()) {
// This should never happen - we should only evaluate columns with recognized namespaces
// If this occurs, it indicates a bug in the column resolution logic
return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
}

Comment on lines +550 to +571
AstEvaluationResultBitmask evaluation_results{};
for (auto const matchable_node_id : matchable_node_ids) {
if (false == node_id_value_pairs.contains(matchable_node_id)) {
continue;
}
auto const evaluation_result{OUTCOME_TRYX(evaluate_filter_against_node_id_value_pair(
filter_expr,
matchable_node_id,
node_id_value_pairs.at(matchable_node_id),
schema_tree,
m_case_sensitive_match
))};
if (AstEvaluationResult::True == evaluation_result) {
return AstEvaluationResult::True;
}
evaluation_results |= evaluation_result;
}

if ((evaluation_results & AstEvaluationResult::False) != 0) {
return AstEvaluationResult::False;
}
return AstEvaluationResult::Pruned;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider refactoring common evaluation pattern.

This evaluation pattern appears multiple times in the code (also in evaluate_wildcard_filter). Consider extracting it to a helper method to avoid duplication.

+   [[nodiscard]] auto evaluate_node_ids_against_filter(
+       clp_s::search::ast::FilterExpr* filter_expr,
+       std::vector<SchemaTree::Node::id_t> const& matchable_node_ids,
+       KeyValuePairLogEvent::NodeIdValuePairs const& node_id_value_pairs,
+       SchemaTree const& schema_tree,
+       bool case_sensitive_match
+   ) -> outcome_v2::std_result<AstEvaluationResult> {
+       AstEvaluationResultBitmask evaluation_results{};
+       for (auto const matchable_node_id : matchable_node_ids) {
+           if (false == node_id_value_pairs.contains(matchable_node_id)) {
+               continue;
+           }
+           auto const evaluation_result{OUTCOME_TRYX(evaluate_filter_against_node_id_value_pair(
+                   filter_expr,
+                   matchable_node_id,
+                   node_id_value_pairs.at(matchable_node_id),
+                   schema_tree,
+                   case_sensitive_match
+           ))};
+           if (AstEvaluationResult::True == evaluation_result) {
+               return AstEvaluationResult::True;
+           }
+           evaluation_results |= evaluation_result;
+       }
+
+       if ((evaluation_results & AstEvaluationResult::False) != 0) {
+           return AstEvaluationResult::False;
+       }
+       return AstEvaluationResult::Pruned;
+   }

Then use it in evaluate_filter_expr:

-   AstEvaluationResultBitmask evaluation_results{};
-   for (auto const matchable_node_id : matchable_node_ids) {
-       if (false == node_id_value_pairs.contains(matchable_node_id)) {
-           continue;
-       }
-       auto const evaluation_result{OUTCOME_TRYX(evaluate_filter_against_node_id_value_pair(
-               filter_expr,
-               matchable_node_id,
-               node_id_value_pairs.at(matchable_node_id),
-               schema_tree,
-               m_case_sensitive_match
-       ))};
-       if (AstEvaluationResult::True == evaluation_result) {
-           return AstEvaluationResult::True;
-       }
-       evaluation_results |= evaluation_result;
-   }
-
-   if ((evaluation_results & AstEvaluationResult::False) != 0) {
-       return AstEvaluationResult::False;
-   }
-   return AstEvaluationResult::Pruned;
+   return evaluate_node_ids_against_filter(
+       filter_expr,
+       matchable_node_ids,
+       node_id_value_pairs,
+       schema_tree,
+       m_case_sensitive_match
+   );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
AstEvaluationResultBitmask evaluation_results{};
for (auto const matchable_node_id : matchable_node_ids) {
if (false == node_id_value_pairs.contains(matchable_node_id)) {
continue;
}
auto const evaluation_result{OUTCOME_TRYX(evaluate_filter_against_node_id_value_pair(
filter_expr,
matchable_node_id,
node_id_value_pairs.at(matchable_node_id),
schema_tree,
m_case_sensitive_match
))};
if (AstEvaluationResult::True == evaluation_result) {
return AstEvaluationResult::True;
}
evaluation_results |= evaluation_result;
}
if ((evaluation_results & AstEvaluationResult::False) != 0) {
return AstEvaluationResult::False;
}
return AstEvaluationResult::Pruned;
return evaluate_node_ids_against_filter(
filter_expr,
matchable_node_ids,
node_id_value_pairs,
schema_tree,
m_case_sensitive_match
);

Comment on lines +428 to +450
auto QueryHandlerImpl::evaluate_kv_pair_log_event(KeyValuePairLogEvent const& log_event)
-> outcome_v2::std_result<AstEvaluationResult> {
if (nullptr == m_query) {
return AstEvaluationResult::True;
}

if (m_is_empty_query) {
return AstEvaluationResult::False;
}

std::optional<AstEvaluationResult> optional_evaluation_result;
m_ast_dfs_stack.clear();
push_ast_dfs_stack(OUTCOME_TRYX(AstExprIterator::create(m_query.get())));
while (false == m_ast_dfs_stack.empty()) {
OUTCOME_TRYV(advance_ast_dfs_evaluation(log_event, optional_evaluation_result));
}

if (false == optional_evaluation_result.has_value()) {
return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
}

return optional_evaluation_result.value();
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify usage of evaluate_kv_pair_log_event.

The static analysis hints indicate this function is never used. However, based on the PR context, this function should be used by the deserializer to evaluate log events against a query. Please verify that this function is correctly integrated with the IR stream deserializer.


🏁 Script executed:

#!/bin/bash
# Search for references to evaluate_kv_pair_log_event in the codebase
rg "evaluate_kv_pair_log_event" -A 2 -B 2

Length of output: 3702


🏁 Script executed:

#!/bin/bash
# Inspect the implementation of handle_log_event in the IR stream deserializer
rg -C 5 "handle_log_event" components/core/src/clp/ffi/ir_stream/Deserializer.cpp

Length of output: 176


🏁 Script executed:

#!/bin/bash
# Locate all Deserializer source files in the IR stream directory
fd -HI 'Deserializer.*' -t f components/core/src/clp/ffi/ir_stream

# Search for handle_log_event in any extension under that directory
rg -C 3 "handle_log_event" components/core/src/clp/ffi/ir_stream

Length of output: 3123


🏁 Script executed:

#!/bin/bash
# Show context around query handler usage in the deserializer
rg -C 10 "m_query_handler" components/core/src/clp/ffi/ir_stream/Deserializer.hpp

# Show deserialization and evaluation steps for KV pair log events
rg -C 10 "deserialize_ir_unit_kv_pair_log_event" components/core/src/clp/ffi/ir_stream/Deserializer.hpp

Length of output: 7201


🏁 Script executed:

#!/bin/bash
# Find all evaluate-related methods in QueryHandler.hpp
rg -n "evaluate" components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp

# Show the QueryHandler class definition to inspect evaluate() overloads
sed -n '1,200p' components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp

Length of output: 4958


🏁 Script executed:

#!/bin/bash
# Search for EmptyQueryHandler implementation
rg -R "struct EmptyQueryHandler" -A 5 -B 5 components/core/src/clp/ffi/ir_stream/search

Length of output: 494


🏁 Script executed:

#!/bin/bash
# Inspect QueryHandlerImpl for evaluate() overloads
rg -n "auto evaluate" -C 5 components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp

# Search for any evaluate(KeyValuePairLogEvent) definitions
rg -n "evaluate(" -C 5 components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp

# Locate the definition of the QueryHandlerReq concept
rg -n "QueryHandlerReq" -C 5 components/core/src/clp/ffi/ir_stream/search

Length of output: 2908


🏁 Script executed:

#!/bin/bash
# Search QueryHandlerImpl.hpp for any evaluate() methods (literal 'evaluate(')
rg -F "evaluate(" -C 5 components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp

# Search QueryHandler wrapper for evaluate() shorthand
rg -F "evaluate(" -C 5 components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp

# Locate the IsQueryHandler trait definition to see expected interface
rg -R "IsQueryHandler" -C 5 components/core/src/clp/ffi/ir_stream/search

Length of output: 655


Ensure evaluate_kv_pair_log_event is actually invoked by the deserializer

The evaluate_kv_pair_log_event method in QueryHandlerImpl is never called: in the deserializer you’re invoking a non-existent m_query_handler.evaluate(…) instead of the intended evaluate_kv_pair_log_event. Please update the call site so that log events are filtered by the query.

• File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp

  • Around the IrUnitType::LogEvent case, change:
    - if (search::AstEvaluationResult::True
    -     != OUTCOME_TRYX(m_query_handler.evaluate(result.value())))
    + if (search::AstEvaluationResult::True
    +     != OUTCOME_TRYX(m_query_handler.evaluate_kv_pair_log_event(result.value())))

• Confirm that EmptyQueryHandler still compiles (it need only satisfy the QueryHandlerReq concept).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto QueryHandlerImpl::evaluate_kv_pair_log_event(KeyValuePairLogEvent const& log_event)
-> outcome_v2::std_result<AstEvaluationResult> {
if (nullptr == m_query) {
return AstEvaluationResult::True;
}
if (m_is_empty_query) {
return AstEvaluationResult::False;
}
std::optional<AstEvaluationResult> optional_evaluation_result;
m_ast_dfs_stack.clear();
push_ast_dfs_stack(OUTCOME_TRYX(AstExprIterator::create(m_query.get())));
while (false == m_ast_dfs_stack.empty()) {
OUTCOME_TRYV(advance_ast_dfs_evaluation(log_event, optional_evaluation_result));
}
if (false == optional_evaluation_result.has_value()) {
return ErrorCode{ErrorCodeEnum::AstEvaluationInvariantViolation};
}
return optional_evaluation_result.value();
}
--- a/components/core/src/clp/ffi/ir_stream/Deserializer.hpp
+++ b/components/core/src/clp/ffi/ir_stream/Deserializer.hpp
@@ /* around the IrUnitType::LogEvent case */
- if (search::AstEvaluationResult::True
- != OUTCOME_TRYX(m_query_handler.evaluate(result.value())))
+ if (search::AstEvaluationResult::True
+ != OUTCOME_TRYX(m_query_handler.evaluate_kv_pair_log_event(result.value())))
{
// existing handling for non-matching events
}
🧰 Tools
🪛 Cppcheck (2.10-2)

[style] 428-428: The function 'evaluate_kv_pair_log_event' is never used.

(unusedFunction)

@coderabbitai

coderabbitai Bot commented May 8, 2025

Copy link
Copy Markdown
Contributor
✅ Actions performed

Reviews paused.

1 similar comment
@coderabbitai

coderabbitai Bot commented May 8, 2025

Copy link
Copy Markdown
Contributor
✅ Actions performed

Reviews paused.

LinZhihao-723 and others added 3 commits May 11, 2025 18:39

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added test cases for deserializer integration with queries.
Can u help review this? @kirkrodrigues

Comment on lines +289 to +291
auto deserializer_result{
make_deserializer(buf_reader, IrUnitHandler{}, std::move(query_handler_result.value()))
};

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, this is what I mean by #890 (comment).
The type is auto-deduced: no need to explicitly state the type of unit handler and query handler.
@kirkrodrigues

Comment thread components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/QueryHandlerReq.hpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/test/utils.hpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/test/utils.hpp Outdated
constexpr std::string_view cObjKey{"obj_key"};
constexpr std::string_view cUnresolvableKey{"unresolvable_key"};
constexpr int cIntBase{INT16_MAX};
constexpr std::string_view cTestSubStr{"test"};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Why substring instead of string?
  • Substring is usually one word, right? So then it should be cTestSubstr.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It's a substring since:
    • The actual string is not exactly "test" but a string that contains "test"
    • We wrap it with two wildcards to generate the string query for testing.
  • True, true. I thought it's sub-string or sub string. mb

CAPTURE(ir_stream_bytes.size());

auto const wildcard_str_query{fmt::format("*{}*", cTestSubStr)};
auto const [kql_query_str, expected_json_pairs] = GENERATE_COPY(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you follow some process to come up with these test cases or were they picked at random?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not randomly picked. Based on the JSON pairs input, I created the following:

  1. String search on non-wildcard columns only matches the flattened ones.
  2. String search on wildcard columns matches all.
  3. Integer comparison on non-wildcard columns only matches the flattened ones.
  4. Integer comparison on wildcard columns also matches inner keys.
  5. A combination of AND on auto-gen and user-gen columns (results should be equivalent to 3).
  6. A combination of OR on auto-gen and user-gen columns (matches all)
  7. A combination of AND with NOT on auto-gen and user-gen columns (results should be equivalent to 5)
  8. Top-level NOT with AND on auto-gen and user-gen columns that matches all.
  9. Unresolvable column that matches nothing.

Search features covered:

  • Column resolutions on wildcards
  • Column resolutions on auto-gen/user-gen namespaces
  • Basic AND/OR logic
  • Inverter logic (NOT) on filter
  • Inverter logic (NOT) on the top
  • Unresolvable columns

In the next push I will also include a simple case to ensure unresolvable columns are not invertible.

I understand that not all supported types (e.g., boolean and float) and filter operations are covered, but they're tested in query hander's unit tests.

LinZhihao-723 and others added 6 commits May 12, 2025 11:24
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Comment thread taskfiles/lint.yaml Outdated
Comment thread taskfiles/lint.yaml Outdated
LinZhihao-723 and others added 4 commits May 15, 2025 22:50
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat(kv-ir): Integrate `QueryHandler` into the IR stream `Deserializer`.

@LinZhihao-723 LinZhihao-723 changed the title feat(kv-ir): Integrate QueryHandler into IR stream deserializer. feat(kv-ir): Integrate QueryHandler into the IR stream Deserializer. May 16, 2025
@LinZhihao-723 LinZhihao-723 merged commit ecb63cb into y-scope:main May 16, 2025
21 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…r`. (y-scope#890)

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants