Skip to content

feat(kv-ir): Implement QueryHandler's factory function and update_partially_resolved_columns.#873

Merged
LinZhihao-723 merged 51 commits into
y-scope:mainfrom
LinZhihao-723:kvir-search-query-handler-impl
May 5, 2025
Merged

feat(kv-ir): Implement QueryHandler's factory function and update_partially_resolved_columns.#873
LinZhihao-723 merged 51 commits into
y-scope:mainfrom
LinZhihao-723:kvir-search-query-handler-impl

Conversation

@LinZhihao-723

@LinZhihao-723 LinZhihao-723 commented Apr 30, 2025

Copy link
Copy Markdown
Member

Description

References

Overview

This PR adds partial implementation for QueryHandler. To avoid making the entire handler as a template class, the actual implementation is in QueryHandlerImpl. The current PR adds implementation for:

  • Factory function
  • update_partially_resolved_columns

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows passed.
  • Add unit tests to cover:
    • Column resolution on search AST.
    • Column resolution on projections.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced advanced query handling and schema tree column resolution for IR stream search, including support for new error codes and detailed error messages.
    • Added new query handler implementation enabling preprocessing, projection, and evaluation of search queries with case sensitivity options.
    • Implemented token iterator and partial resolution mechanisms for schema tree nodes with wildcard support.
    • Provided a factory method to create fully initialized query handlers with projection mapping.
    • Added utilities and classes to generate and manage schema tree column queries and projections.
  • Bug Fixes

    • Improved error reporting by expanding error code coverage for various query and projection failure scenarios.
  • Tests

    • Added comprehensive test suites to verify column resolution and projection handling against complex schema trees.
    • Introduced utilities and helpers to support robust testing of schema tree queries and projections.
  • Documentation

    • Enhanced documentation for new classes, methods, and utility functions related to schema tree search and testing.
  • Refactor

    • Refactored query handler interface to provide full method implementations and improved internal delegation.

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner April 30, 2025 23:59
@coderabbitai

coderabbitai Bot commented Apr 30, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

This change introduces a complete implementation for IR stream search query handling in the clp::ffi::ir_stream::search module. It adds the QueryHandlerImpl class, which manages query preprocessing, projected columns, and partial column resolution for schema trees. The QueryHandler template is refactored to delegate to this new implementation. Error handling is expanded with new error codes. Comprehensive tests and supporting utilities for schema tree queries and projections are also added. Build configuration is updated to include the new source and test files.

Changes

File(s) Change Summary
components/core/CMakeLists.txt Added new source and test files for ffi/ir_stream/search: QueryHandlerImpl.cpp, QueryHandlerImpl.hpp, test/test_QueryHandlerImpl.cpp, test/utils.cpp, test/utils.hpp to the unit test sources list.
components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp
Extended ErrorCodeEnum with eight new error codes related to query preprocessing, column projection, and AST handling. Updated the ErrorCategory::message specialization to provide descriptive messages for each new error code. No changes to exported function signatures.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
Introduced the QueryHandlerImpl class, implementing query preprocessing, projected columns creation, partial resolutions, and column resolution logic. Added a nested ColumnDescriptorTokenIterator class for token-wise traversal of column descriptors. Provided a static factory method, evaluation stub, and methods for updating partial resolutions and handling projections. Deleted copy operations and defaulted move operations.
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp Refactored the QueryHandler template to be a thin wrapper over QueryHandlerImpl. Implemented all methods inline, forwarding to the implementation. Updated the static factory method to accept query, projections, and case sensitivity. Added a private member for the implementation.
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp Added comprehensive Catch2-based tests for QueryHandlerImpl, verifying column resolution and projection handling against a constructed schema tree. Included helper functions for generating KQL expressions, projections, and for serializing expected/actual results.
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
Added utility class ColumnQueryPossibleMatches and related functions to support schema tree column query testing. Implemented methods for filtering and serializing possible matches, generating all valid column queries for a schema tree, and a trivial callback for projections. Provided operator overload for output streaming.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant QueryHandler
    participant QueryHandlerImpl

    Client->>QueryHandler: create(callback, query, projections, case_sensitive)
    QueryHandler->>QueryHandlerImpl: create(query, projections, case_sensitive)
    QueryHandlerImpl->>QueryHandlerImpl: preprocess_query()
    QueryHandlerImpl->>QueryHandlerImpl: create_projected_columns_and_projection_map()
    QueryHandlerImpl->>QueryHandlerImpl: create_initial_partial_resolutions()
    QueryHandlerImpl-->>QueryHandler: QueryHandlerImpl instance
    QueryHandler-->>Client: QueryHandler instance

    Client->>QueryHandler: update_partially_resolved_columns(is_auto_generated, node_locator, node_id)
    QueryHandler->>QueryHandlerImpl: update_partially_resolved_columns(...)

    Client->>QueryHandler: evaluate_node_id_value_pairs(auto_gen_pairs, user_gen_pairs)
    QueryHandler->>QueryHandlerImpl: evaluate_node_id_value_pairs(...)
Loading

Possibly related PRs

  • y-scope/clp#863: Introduced the initial QueryHandler template API with stub methods; this PR completes and implements that interface.
  • y-scope/clp#754: Refactored utility functions like tokenize_column_descriptor into clp_s::search::SearchUtils; this PR depends on those utilities for QueryHandlerImpl implementation.
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Still waiting for #856 to merge, but the major logic should be already included in this PR.
  • Still need to work on unit tests.

@gibber9809 I think what's already included so far should be ready for review.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (7)
components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (1)

13-22: Consider future-proofing the enum size

ErrorCodeEnum is constrained to uint8_t. We now have 9 values; still safe, but adding more than 256 codes in the long run will silently wrap. If you expect the set to keep expanding, consider switching to uint16_t early to avoid a breaking change later.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)

31-33: Iterator class lacks operator== / operator!=

Providing comparison operators would make the iterator composable with STL algorithms (and future-proofs the design).

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (5)

113-136: Good use of error handling, but the exception catch is broad

Your implementation properly checks for null query and follows the code style of using nullptr == query instead of !query as per guidelines. However, the catch block at line 133 catches all exceptions and returns a generic error code. Consider catching specific exceptions or at least logging the exception message to aid in debugging.

    } catch (std::exception const& ex) {
-        return ErrorCode{ErrorCodeEnum::QueryTransformationPassFailed};
+        // Log or capture exception details for debugging
+        return ErrorCode{ErrorCodeEnum::QueryTransformationPassFailed, ex.what()};

148-164: The duplicate check could be combined with try-emplace

The current implementation uses a separate set to track duplicates, but this could be streamlined using the return value of try_emplace.

-    std::unordered_set<std::string_view> unique_projected_columns;

     for (auto const& [key, types] : projections) {
-        if (unique_projected_columns.contains(key)) {
-            return ErrorCode{ErrorCodeEnum::DuplicateProjectedColumn};
-        }
-        unique_projected_columns.emplace(key);

This is only a suggestion since the current approach with a separate set might be clearer and more maintainable in some contexts.


166-182: Consider separating exception handling for different operations

The current implementation has a single try-catch block that encompasses multiple operations: creating a column descriptor, setting matching types, and checking validity. This makes it difficult to provide specific error messages for different types of failures.

Consider separating the try-catch blocks to handle different operations separately, allowing for more specific error codes or messages.


202-207: Map insertion could be optimized

The code checks if a key exists before inserting, which is an extra lookup operation. Consider using try_emplace directly, which will only insert if the key doesn't exist.

-        if (false == partial_resolutions.contains(SchemaTree::cRootId)) {
-            partial_resolutions.emplace(
-                    SchemaTree::cRootId,
-                    std::vector<QueryHandlerImpl::ColumnDescriptorTokenIterator>{}
-            );
-        }
+        auto [it, inserted] = partial_resolutions.try_emplace(
+                SchemaTree::cRootId,
+                std::vector<QueryHandlerImpl::ColumnDescriptorTokenIterator>{}
+        );

This approach is actually used later in the code at line 262, so this would make the code more consistent.


256-260: Code reuse opportunity for namespace resolution logic

This code duplicates the logic from lines 197-201. Consider extracting this pattern into a small helper function since it's used multiple times.

+    auto& get_resolution_map_for_namespace(
+            std::string_view ns,
+            QueryHandlerImpl::PartialResolutionMap& auto_gen,
+            QueryHandlerImpl::PartialResolutionMap& user_gen) {
+        return (OUTCOME_TRYX(is_auto_generated(ns))) ? auto_gen : user_gen;
+    }

// Then use it like:
-        bool const is_auto_gen{OUTCOME_TRYX(is_auto_generated(col->get_namespace()))};
-        auto& partial_resolutions{
-                is_auto_gen ? auto_gen_namespace_partial_resolutions
-                            : user_gen_namespace_partial_resolutions
-        };
+        auto& partial_resolutions = get_resolution_map_for_namespace(
+                col->get_namespace(),
+                auto_gen_namespace_partial_resolutions,
+                user_gen_namespace_partial_resolutions);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f70d17 and 411bb91.

📒 Files selected for processing (5)
  • components/core/CMakeLists.txt (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
🧬 Code Graph Analysis (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (9)
  • nodiscard (43-50)
  • nodiscard (56-58)
  • nodiscard (66-71)
  • nodiscard (73-73)
  • nodiscard (75-77)
  • nodiscard (79-81)
  • nodiscard (89-92)
  • query (125-130)
  • auto_gen_node_id_value_pairs (151-154)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (3)
  • is_auto_generated (110-111)
  • is_auto_generated (280-288)
  • is_auto_generated (280-280)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1)
  • new_projected_schema_tree_node_callback (33-35)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (9)
components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (1)

20-38: Good coverage of the new error codes

Each newly-added enumerator has a matching human-readable message. 👍

components/core/CMakeLists.txt (1)

480-483: QueryHandlerImpl is only compiled into the unit-test target

QueryHandlerImpl.cpp appears under SOURCE_FILES_unitTest, meaning production code that instantiates QueryHandler will not be able to link against the implementation. If the intention is to ship this component (not merely test it), move the .cpp into the main library target.

Also, adding the header (QueryHandlerImpl.hpp) to the source list is unnecessary; CMake will track it automatically when included.

-        src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
-        src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
+        src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (3)

43-50: Typo in parameter name reduces readability

colum_descriptor is missing an ‘n’. The typo propagates to several places in the class. Renaming improves clarity and avoids future copy-paste errors.

-        static auto create(clp_s::search::ast::ColumnDescriptor* colum_descriptor)
+        static auto create(clp_s::search::ast::ColumnDescriptor* column_descriptor)

[ suggest_nitpick ]


246-248: Unused variable masks potential errors

next_token_it_result is computed but never used.
Besides triggering a warning, ignoring the result swallows a possible ColumnDescriptorTokenIteratorOutOfBound error.

-        auto const next_token_it_result{token_it.next()};
+        // Avoid unused-result and propagate potential error when advancing.
+        OUTCOME_TRY(auto const _unused_next, token_it.next());

[ suggest_essential_refactor ]


220-228: Member m_case_sensitive_search is never read

The flag is set in the constructor but not referenced anywhere in this file.
Remove it or wire it into comparison sites (e.g., token vs. key name comparisons) to honour the caller’s intent.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (4)

156-164: Use of Yoda condition is consistent with guidelines

The code correctly follows the coding guideline of preferring false == expression rather than !expression on line 156.


231-244: Ensure efficient AST traversal

The current DFS implementation uses std::vector as a stack, which works well. However, there's a potential optimization in line 236 where you iterate through all operands. If the order doesn't matter, using a reverse iterator could avoid the need to push operands in reverse order for proper DFS traversal.

The implementation is correct as is, but you might want to consider this optimization if the order of traversal isn't important.


271-274: Good handling of prefix wildcard cases

The code correctly handles the case where a prefix wildcard might match nothing by adding an additional iterator.

The comment at line 272 also provides clear documentation of the intent, which is excellent.


315-320: Properly marked stub implementation

This method is properly implemented as a stub that returns a "method not implemented" error, which is appropriate for a work-in-progress PR. The parameters are annotated with [[maybe_unused]] which is good practice.

Comment thread components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (9)
components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (2)

18-47: Consider compile-time enforcement that all enumerators are mapped.

The switch covers every enumerator that currently exists, but it will silently fall back to "Unknown error code enum." if a future enumerator is added and the author forgets to update this function.
A small compile-time safeguard can avoid that foot-gun, e.g.

static_assert(
    static_cast<int>(ErrorCodeEnum::UnsupportedNamespace) + 1 ==
        ystdlib::enum_size_v<ErrorCodeEnum>,   // or another meta-func you may have
    "ErrorCode::message() is missing cases for some ErrorCodeEnum values"
);

(or generate the table with magic_enum/nameof if available).
This keeps the mapping exhaustive even as the project evolves.


20-44: Return std::string_view to avoid per-call heap allocations.

message() currently allocates a fresh std::string on every call.
Because the returned texts are static and immutable, you can safely return
std::string_view (or char const*) instead:

-auto ErrorCategory::message(ErrorCodeEnum error_enum) const -> std::string {
+auto ErrorCategory::message(ErrorCodeEnum error_enum) const -> std::string_view {

and change each branch to return "literal";.
This removes dynamic allocation and slightly shrinks the call-site code size without sacrificing usability.

components/core/src/clp/ffi/ir_stream/search/utils.hpp (2)

48-53: Pass filter by reference (or gsl::not_null) instead of raw pointer.

evaluate_filter_against_literal_type_value_pair takes a raw
FilterExpr const*. Within the project this pointer is expected to be non-null,
because a null pointer already maps to ErrorCodeEnum::QueryExpressionIsNull
elsewhere.

Prefer an explicit contract by:

-auto evaluate_filter_against_literal_type_value_pair(
-        clp_s::search::ast::FilterExpr const* filter,
+auto evaluate_filter_against_literal_type_value_pair(
+        clp_s::search::ast::FilterExpr const& filter,

(or not_null<FilterExpr const*>) to make misuse impossible at compile time
and avoid repeated null checks inside the implementation.


4-12: Include guards use relative includes – consider project-wide include paths.

The header reaches into sibling modules via lengthy relative paths:
../../../../clp_s/search/ast/…. This can become fragile when files move.
If possible, expose these headers via the build system’s include-path and switch to
angle-bracket form:

#include <clp_s/search/ast/FilterExpr.hpp>
#include <clp_s/search/ast/Literal.hpp>

Doing so improves readability and portability across IDEs.

components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)

251-251: Unused variable next_token_it_result

auto const next_token_it_result{token_it.next()}; is evaluated but never used, incurring an unnecessary next() call (and potential error handling). Remove the statement or use the result.

components/core/tests/test-ffi_ir_stream_search_utils.cpp (4)

16-29: Prefer project-wide include paths over brittle relative paths

The test currently relies on multi-level ../src/... relative includes.
If the test directory ever moves (or is re-organised by CMake’s add_test), these paths will silently break. Consider exposing src as a public include directory in your build system and switching to angle-bracket includes such as:

#include <clp/ffi/encoding_methods.hpp>
#include <clp/ffi/ir_stream/search/utils.hpp>

This makes the tests location-agnostic and improves maintainability.


145-162: Minor efficiency win in get_encoded_text_ast

  1. dict_vars.reserve(dict_var_bounds.size() / 2);
    Reserving avoids repeated reallocations while building dict_vars.

  2. The inner loop currently constructs substrings via iterator math.
    std::string{text.substr(begin_pos, end_pos - begin_pos)} is clearer and guards against accidental iterator misuse.

Diff sketch:

-    std::vector<std::string> dict_vars;
+    std::vector<std::string> dict_vars;
+    dict_vars.reserve(dict_var_bounds.size() / 2);
...
-        dict_vars.emplace_back(text.cbegin() + begin_pos, text.cbegin() + end_pos);
+        dict_vars.emplace_back(text.substr(begin_pos, end_pos - begin_pos));

499-505: Typo: filer_operand_literal_typefilter_operand_literal_type

The variable is misspelled here and in the subsequent call. While harmless, keeping names consistent aids readability and grep-ability.

-auto const [filter_expr, filer_operand_literal_type]{
+auto const [filter_expr, filter_operand_literal_type]{
 ...
-REQUIRE(check_filter_evaluation(filter_to_test.get(), filer_operand_literal_type));
+REQUIRE(check_filter_evaluation(filter_to_test.get(), filter_operand_literal_type));

304-335: Constraint duplication can be simplified

The requires clause std::is_same_v<...> is repeated on both the forward declaration and the definition of check_filter_evaluation_against_encoded_text_asts. Keeping it only on the declaration (or on a using concept) reduces verbosity:

template <EncodedVariableT encoded_variable_t>
auto check_filter_evaluation_against_encoded_text_asts(...);

where template<typename T> concept EncodedVariableT = std::same_as<...> || ...;

This keeps the intent in one place and avoids accidental divergence if the constraint list changes.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 411bb91 and d651b1b.

📒 Files selected for processing (9)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (3 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/utils.hpp (1 hunks)
  • components/core/tests/test-ffi_ir_stream_search_utils.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • components/core/CMakeLists.txt
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp
  • components/core/tests/test-ffi_ir_stream_search_utils.cpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp
🧬 Code Graph Analysis (1)
components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (2)
  • query (123-128)
  • auto_gen_node_id_value_pairs (149-152)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (5)
  • create (291-312)
  • create (291-295)
  • is_auto_generated (110-111)
  • is_auto_generated (280-288)
  • is_auto_generated (280-280)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (4)
components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (1)

12-25: Verify ABI-stability of the new enumerator ordering.

The new error codes were inserted in the middle of ErrorCodeEnum without explicit
integral values. If any component serialises or persists these values (e.g., across
FFI boundaries or on disk), the implicit renumbering will break compatibility.

  1. Confirm that consumers treat the enum purely as symbolic, or
  2. Assign explicit values (and append new codes at the end) to guarantee stability.
enum class ErrorCodeEnum : uint8_t {
    AstDynamicCastFailure            = 0,
    ColumnDescriptorTokenIteratorOutOfBound,
    /**/
    UnsupportedNamespace,
};

Please double-check downstream usage to avoid subtle mismatches.

components/core/src/clp/ffi/ir_stream/search/utils.cpp (1)

343-349: ⚠️ Potential issue

Incorrect semantics for EXISTS / NEXISTS filters

EXISTS and NEXISTS currently return constant values that do not depend on whether a value is actually present.
As written, EXISTS always succeeds, while NEXISTS can never succeed, which breaks typical query semantics.

-    if (FilterOperation::EXISTS == op) {
-        return true;
-    }
-    if (FilterOperation::NEXISTS == op) {
-        return false;
-    }
+    if (FilterOperation::EXISTS == op) {
+        return value.has_value();
+    }
+    if (FilterOperation::NEXISTS == op) {
+        return false == value.has_value();
+    }

Please adjust the logic (and corresponding tests) so that presence/absence is respected.

⛔ Skipped due to learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp (1)

88-90: match() ignores descriptor token’s own literal‐type constraints

matches_any(schema_tree_node_type_to_literal_types(type)) checks only the column-level type mask, not the token-level constraints that may be present (e.g., array/object indices).
If your design permits token-specific restrictions, consider delegating the check to the token itself or extend the API accordingly.

components/core/src/clp/ffi/ir_stream/search/QueryHandler.hpp (1)

40-57: Factory forwards correctly – no issues found

The wrapper cleanly forwards arguments to QueryHandlerImpl::create using OUTCOME_TRYX, preserving error propagation. Nice!

Comment thread components/core/src/clp/ffi/ir_stream/search/utils.cpp
Comment thread components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp (3)

272-282: Factor out duplicated “unsupported literal-type” branch

The early-exit path that evaluates the filter against an empty optional and expects false is replicated in every helper (*_integral_values, *_bool_values, …).
A small utility such as:

auto filter_matches_empty_value(FilterExpr const* f,
                                LiteralType filter_literal_type,
                                LiteralType value_literal_type) -> bool;

would eliminate ~25 duplicated lines, improving readability and reducing maintenance overhead.


494-499: Typo: filer_operand_literal_typefilter_operand_literal_type

A simple spelling error leaks into the public test output (via CAPTURE).
Renaming improves clarity and avoids confusion for future readers.

-auto const [filter_expr, filer_operand_literal_type]{
+auto const [filter_expr, filter_operand_literal_type]{
         GENERATE_COPY(from_range(filter_and_type_pairs))
 };
 …
-REQUIRE(check_filter_evaluation(filter_to_test.get(), filer_operand_literal_type));
+REQUIRE(check_filter_evaluation(filter_to_test.get(), filter_operand_literal_type));

1-13: File name typo (test_utilis.cpp)

Consider renaming the file to test_utils.cpp for consistency with conventional English spelling and to aid discoverability by IDE search tools.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d651b1b and 172d8a4.

📒 Files selected for processing (2)
  • components/core/CMakeLists.txt (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp
🪛 Cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp

[error] 420-420: syntax error

(syntaxError)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, false)

Comment thread components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp
LinZhihao-723 and others added 2 commits May 1, 2025 14:25
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
LinZhihao-723 and others added 2 commits May 5, 2025 13:36
…ndlerImpl.cpp

Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)

64-73: Incomplete Doxygen parameter documentation

The function documentation has an empty @param entry for node_id and is missing documentation for projected_key_path.

/**
 * Trivial implementation of `NewProjectedSchemaTreeNodeCallback` that always return success without
 * doing anything.
 * @param is_auto_generated Whether the node belongs to the auto-generated namespace.
-* @param
+* @param node_id The schema tree node ID receiving the projection.
+* @param projected_key_path The full key-path being projected (unescaped).
 */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a4589d and 7005a75.

📒 Files selected for processing (1)
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
🧠 Learnings (1)
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding `${PROJECT_SOURCE_DIR}/src` to CMake targets and using direct include paths instead of deep relative ones in a future PR.
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (1)
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)

69-73:

Details

✅ Verification successful

Review coding guidelines for conditional checks

According to your coding guidelines, conditional checks should prefer false == <expression> rather than !<expression>. While this function doesn't contain explicit conditional checks, ensure this guideline is followed in the implementation when checking if is_auto_generated is false.


🏁 Script executed:

#!/bin/bash
# Check if the implementation of this function follows the coding guidelines
# for conditional checks by looking for the pattern "!is_auto_generated"

grep -r "!is_auto_generated" components/core/src/clp/ffi/ir_stream/search/test/utils.cpp

Length of output: 90


To inspect how is_auto_generated is actually checked in the implementation, let’s locate the function definition and any uses of that parameter:


🏁 Script executed:

#!/bin/bash
# Show the implementation of trivial_new_projected_schema_tree_node_callback
grep -R "trivial_new_projected_schema_tree_node_callback" \
  components/core/src/clp/ffi/ir_stream/search/test/utils.cpp -n -A5 -B5

# List all occurrences of the parameter to see any conditional checks
grep -R "is_auto_generated" \
  components/core/src/clp/ffi/ir_stream/search/test/utils.cpp -n -A2 -B2

Length of output: 916


No conditional checks in trivial_new_projected_schema_tree_node_callback – no action required

The implementation of this function doesn’t reference is_auto_generated (it’s marked [[maybe_unused]] and immediately returns success()), so there are currently no !is_auto_generated checks to replace. No changes are needed here.

Comment thread components/core/src/clp/ffi/ir_stream/search/test/utils.hpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
Comment thread components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
Comment thread components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
@LinZhihao-723 LinZhihao-723 requested a review from gibber9809 May 5, 2025 18:50

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work! PR title also looks good to me.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (5)
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)

67-67: Missing parameter documentation

This line contains an empty @param tag.

-* @param
+* @param node_id The ID of the schema tree node receiving the projection
+* @param projected_key_path The full key path being projected (unescaped)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)

120-137: Consider flattening chained assignment in if statements

The current implementation combines transformation and null check in single if statements, which couples concerns and makes debugging more challenging.

-if (query = clp_s::search::ast::OrOfAndForm{}.run(query);
-    nullptr != std::dynamic_pointer_cast<EmptyExpr>(query))
-{
-    return query;
-}
+auto transformed{clp_s::search::ast::OrOfAndForm{}.run(query)};
+if (nullptr != std::dynamic_pointer_cast<EmptyExpr>(transformed)) {
+    return transformed;
+}
+query = std::move(transformed);

This pattern also applies to the NarrowTypes transformation below.

components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (3)

294-296: Add a null check after KQL parsing

The KQL parser can return nullptr if parsing fails. Adding a check ensures the test fails early with a clear error message rather than crashing on a null dereference later.

auto query_stream{std::istringstream{kql_query_str}};
auto query{clp_s::search::kql::parse_kql_expression(query_stream)};
+REQUIRE(query != nullptr);  // Verify that KQL parsing succeeded

76-163: Consider refactoring repetitive code blocks in generate_matchable_kql_expressions

The function contains nearly identical code blocks for handling each data type (Int, Float, Bool, Str). Consider using a helper function to reduce duplication:

auto process_type = [&](SchemaTree::Node::Type type, auto ref_value, std::string_view format = "{}") {
    for (auto const& [column_query, possible_matches] : column_query_to_possible_matches) {
        auto const column_query_with_namespace{fmt::format("{}{}", column_namespace, column_query)};
        if (auto const matchable_node_ids{
                    possible_matches.get_matchable_node_ids_from_schema_tree_type(type)
                };
                false == matchable_node_ids.empty())
        {
            matchable_kql_expressions.emplace_back(
                    fmt::format("{}: " + std::string{format}, column_query_with_namespace, ref_value)
            );
            auto [it, inserted] = expected_column_resolutions.try_emplace(
                    column_query_with_namespace,
                    std::set<SchemaTree::Node::id_t>{}
            );
            auto& resolved_node_ids{it->second};
            resolved_node_ids.insert(matchable_node_ids.cbegin(), matchable_node_ids.cend());
        }
    }
};

// Process each type
process_type(SchemaTree::Node::Type::Int, cRefTestInt);
process_type(SchemaTree::Node::Type::Float, cRefTestFloat, "{:.2f}");
process_type(SchemaTree::Node::Type::Bool, cRefTestBool);
process_type(SchemaTree::Node::Type::Str, cRefTestStr);

This reduces duplication and improves maintainability.


228-246: Consider extracting duplicated schema tree setup into a helper function

Both test cases duplicate the same schema tree setup code. Consider extracting this into a helper function:

namespace {
// Returns a schema tree with a predefined structure and the corresponding node locators
[[nodiscard]] auto create_test_schema_tree() 
    -> std::pair<std::shared_ptr<SchemaTree>, std::vector<SchemaTree::NodeLocator>> 
{
    auto schema_tree{std::make_shared<SchemaTree>()};
    std::vector<SchemaTree::NodeLocator> const locators{
            {SchemaTree::cRootId, "a", SchemaTree::Node::Type::Obj},
            {SchemaTree::cRootId, "b", SchemaTree::Node::Type::Int},
            {1, "b", SchemaTree::Node::Type::Obj},
            // ... (other locators)
    };
    for (auto const& locator : locators) {
        schema_tree->insert_node(locator);
    }
    return {schema_tree, locators};
}
}

Then in each test case:

auto [schema_tree, locators] = create_test_schema_tree();

Also applies to: 369-387

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7005a75 and a2868f9.

📒 Files selected for processing (3)
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp
  • components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
🧠 Learnings (2)
components/core/src/clp/ffi/ir_stream/search/test/utils.hpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding `${PROJECT_SOURCE_DIR}/src` to CMake targets and using direct include paths instead of deep relative ones in a future PR.
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
🪛 Cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp

[style] 323-323: The function 'evaluate_node_id_value_pairs' is never used.

(unusedFunction)

components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp

[error] 207-207: syntax error

(syntaxError)

⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
🔇 Additional comments (2)
components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp (2)

321-328: WIP implementation clearly marked

The stub implementation of evaluate_node_id_value_pairs with the TODO comment appropriately marks this as work-in-progress. This aligns well with the PR objectives of providing a partial implementation.

🧰 Tools
🪛 Cppcheck (2.10-2)

[style] 323-323: The function 'evaluate_node_id_value_pairs' is never used.

(unusedFunction)


157-165: Good use of the coding guideline pattern

Correctly using false == expression pattern rather than !expression as recommended in the coding guidelines.

Comment thread components/core/src/clp/ffi/ir_stream/search/test/utils.hpp
@LinZhihao-723 LinZhihao-723 changed the title feat(kv-ir): Add work-in-progress implementation for QueryHandler. feat(kv-ir): Implement QueryHandler's factory function and update_partially_resolved_columns. May 5, 2025
@LinZhihao-723 LinZhihao-723 merged commit aa689e1 into y-scope:main May 5, 2025
22 checks passed
@LinZhihao-723 LinZhihao-723 deleted the kvir-search-query-handler-impl branch May 5, 2025 21:33
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…partially_resolved_columns`. (y-scope#873)

Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants