Skip to content

feat(kv-ir): Add utilities for kv-pair IR stream search.#856

Merged
LinZhihao-723 merged 21 commits into
y-scope:mainfrom
LinZhihao-723:kvir-search-utils
May 2, 2025
Merged

feat(kv-ir): Add utilities for kv-pair IR stream search.#856
LinZhihao-723 merged 21 commits into
y-scope:mainfrom
LinZhihao-723:kvir-search-utils

Conversation

@LinZhihao-723

@LinZhihao-723 LinZhihao-723 commented Apr 25, 2025

Copy link
Copy Markdown
Member

Description

This is one of PR in series #840.

We will reuse the search AST in clp-s to implement kv-pair IR stream search. However, kv-pair IR stream has a different schema tree implementation than clp-s, meaning we will need some special handling for kv-pair IR values and schema tree nodes. This PR introduces the relevant utilities for:

  • Converting a kv-pair IR value to clp-s AST literal types.
  • Converting a kv-pair IR value and its tree node type to a concrete clp-s AST literal type.
  • Evaluating a kv-pair IR value on a clp-s AST filter expression.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows passed.
  • Ensure the newly added code passed clang-tidy checks.
  • Added unit tests to cover filter expression evaluations.

Summary by CodeRabbit

  • New Features
    • Introduced utility functions for evaluating filter expressions on various data types, including support for case-sensitive and wildcard string matching.
    • Added new error codes and descriptive error messages for improved error handling during filter evaluation.
  • Tests
    • Implemented comprehensive unit tests to verify the correctness of filter evaluation logic across multiple data types and operations.

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner April 25, 2025 23:12
@coderabbitai

coderabbitai Bot commented Apr 25, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

This change introduces new utility functions and comprehensive unit tests for evaluating filter expressions against various literal types within the IR stream search module. It adds implementation and declarations for these utilities, expands error code handling with new error cases, and integrates the relevant source and test files into the build system. The new tests validate filter evaluation for integers, floats, booleans, variable-length strings, and encoded text ASTs, ensuring correctness across supported filter operations and literal types.

Changes

File(s) Change Summary
components/core/CMakeLists.txt Updated to include new utility and test source files (utils.cpp, utils.hpp, test_utils.cpp) for IR stream search in the unit test build target.
components/core/src/clp/ffi/ir_stream/search/utils.cpp
components/core/src/clp/ffi/ir_stream/search/utils.hpp
Added new utility functions for evaluating filter expressions against typed values, mapping schema tree node types to literal types, and handling string, numeric, and boolean comparisons. Declared functions for schema type to literal type conversion and filter evaluation with error handling.
components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp
Added new error codes (EncodedTextAstDecodingFailure, LiteralTypeUnexpected, LiteralTypeUnsupported) and corresponding error messages to support filter evaluation error handling.
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp Introduced comprehensive unit tests for filter evaluation logic, including utility functions and templates to test filter operations against all supported literal types and values. Added a Catch2 test case for systematic validation.

Sequence Diagram(s)

sequenceDiagram
    participant TestCase
    participant FilterExpr
    participant Utils
    participant Value
    participant ErrorCode

    TestCase->>Utils: evaluate_filter_against_literal_type_value_pair(filter, literal_type, value, case_sensitive)
    Utils->>Value: Extract operand(s) based on literal_type
    Utils->>Utils: Dispatch to type-specific evaluation (int, float, bool, string, encoded text)
    Utils->>Utils: Perform filter operation comparison
    alt Error encountered
        Utils->>ErrorCode: Return error code (e.g., DecodingFailure, Unexpected, Unsupported)
    else Success
        Utils-->>TestCase: Return boolean result
    end
Loading

Suggested reviewers

  • gibber9809
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
components/core/CMakeLists.txt (1)

610-610: Nit: keep test-file list alphabetically sorted

For long source lists a consistent ordering eases grepping and reduces merge conflicts. Consider inserting test-ffi_ir_stream_search_utils.cpp in lexicographical position.

components/core/src/clp/ffi/ir_stream/search/utils.cpp (1)

281-297: Obj nodes mapped to NullT may break downstream logic

In schema_tree_node_type_to_literal_types, an object node currently returns LiteralType::NullT.
Down-stream code could interpret this as “the only legal literal is null”, blocking legitimate object literals once they are supported.

If object support is still TODO, returning UnknownT (or at least adding ObjectT behind a feature flag) would avoid silently mis-categorising objects as null.

components/core/src/clp/ffi/ir_stream/search/utils.hpp (1)

37-43: Visibility note: function returns depend on schema-node semantics

evaluate_filter_against_literal_type_value_pair operates on a literal type determined elsewhere.
For callers to use this API safely, please document that:

  1. The caller must supply a literal type consistent with value and the underlying schema node.
  2. The function does not attempt any implicit conversions (e.g., string-to-number).

This clarification will prevent accidental misuse and future bugs.

components/core/tests/test-ffi_ir_stream_search_utils.cpp (1)

146-161: Avoid re-encoding the same string for every test – extract and cache the AST once

get_encoded_text_ast performs a full CLP string encoding every time a value is constructed (lines 314-316).
Given that the reference strings are static test fixtures, the repeated work is unnecessary and slows the test suite.

Consider caching the encoded ASTs once per literal type and re-using them:

-    std::vector<ValueToMatchedFilterOpsPair> const value_to_matched_filter_ops_pairs{
-            {Value{get_encoded_text_ast<encoded_variable_t>(matched)}, {FilterOperation::EQ}},
-            {Value{get_encoded_text_ast<encoded_variable_t>(unmatched)}, {FilterOperation::NEQ}},
-    };
+    static auto const matched_ast   = get_encoded_text_ast<encoded_variable_t>(matched);
+    static auto const unmatched_ast = get_encoded_text_ast<encoded_variable_t>(unmatched);
+
+    std::vector<ValueToMatchedFilterOpsPair> const value_to_matched_filter_ops_pairs{
+            {Value{matched_ast},   {FilterOperation::EQ}},
+            {Value{unmatched_ast}, {FilterOperation::NEQ}},
+    };

This keeps the tests deterministic while shaving noticeable time off larger test suites.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfd81e1 and 85067b7.

📒 Files selected for processing (4)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/utils.hpp (1 hunks)
  • components/core/tests/test-ffi_ir_stream_search_utils.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
  • components/core/tests/test-ffi_ir_stream_search_utils.cpp
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (1)
components/core/CMakeLists.txt (1)

436-438:

Details

❌ Incorrect review comment

Confirm compilation/visibility of the new IR-stream search utilities

utils.cpp/utils.hpp are only appended to SOURCE_FILES_unitTest, which means they will be compiled into the test executable but not into any production (installable) target.
If other components (e.g., future FFI bindings) are expected to link against these helpers, please move the two files into the appropriate library target (or create a new one) and link that target both to unitTest and to the consuming binaries.


🏁 Script executed:

#!/bin/bash
set -e

# Show occurrences of SOURCE_FILES_unitTest in the CMakeLists to verify where utils.cpp/hpp are added
rg -n "SOURCE_FILES_unitTest" -C5 components/core/CMakeLists.txt

Length of output: 816


🏁 Script executed:

#!/bin/bash
set -e

# Locate where the IR-stream files are referenced
rg -n "ffi/ir_stream" -C3 components/core/CMakeLists.txt

# Show context around lines 430-460 to inspect where these files are added
sed -n '430,460p' components/core/CMakeLists.txt

Length of output: 2847


Ignore suggestion: IR-stream utilities are already in the production library

The files src/clp/ffi/ir_stream/search/utils.cpp/.hpp and src/clp/ffi/ir_stream/utils.cpp/.hpp appear in the main FFI source list in components/core/CMakeLists.txt (lines 417–442), not just under SOURCE_FILES_unitTest. They are compiled into the production library and available to consumers. No further changes needed.

Likely an incorrect or invalid review comment.

Comment thread components/core/src/clp/ffi/ir_stream/search/utils.cpp
Comment thread components/core/src/clp/ffi/ir_stream/search/utils.cpp Outdated
Comment thread components/core/tests/test-ffi_ir_stream_search_utils.cpp
@LinZhihao-723 LinZhihao-723 changed the title feat(ffi): Add utility code for kv-pair IR stream search. feat(kv-ir): Add utility code for kv-pair IR stream search. Apr 27, 2025

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to read through tests before finishing this review, but so far don't have any real concerns. Just left one small comment in utils.cpp.

Comment on lines +373 to +378
case LiteralType::ArrayT:
case LiteralType::NullT:
case LiteralType::EpochDateT:
case LiteralType::UnknownT:
default:
return false;

@gibber9809 gibber9809 Apr 28, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think its worth adding a comment to differentiate these cases?

E.g. we'll implement search for LiteralType::ArrayT at some point (maybe in this function, maybe somewhere else), but we won't for the other three cases for various reasons.

NullT because this branch should be unreachable by construction (should always be an EXISTS/NEXISTS filter), EpochDateT because schema_tree_node_type_value_pair_to_literal_type only returns (and only has to return) VarStringT and ClpStringT for string types, and UnknownT because it should again be unreachable because column resolution should have failed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about let's merge #863 first, and we can then change this function to return a result instead. For these currently not supported types, we return ErrorCodeEnum::NotImplemented

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change evaluate_filter_against_literal_type_value_pair to return a result instead. The details are added to the docstring.

  • For internal errors, we return LiteralTypeUnexpected.
  • For currently unsupported types, we return LiteralTypeUnsupported.
    • I understand the concern for returning an error for array type; I think we can add a special handling to ignore this error on the caller end.

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions to make the docstrings in the tests more clear.

I'd consider expanding a bit more on the check_filter_evaluation_against_X docstrings as well to explain a bit about how they work, but it's not completely necessary and I don't have a great way of writing it up so I'm not going to block approval based on that.

Comment thread components/core/tests/test-ffi_ir_stream_search_utils.cpp Outdated
Comment thread components/core/tests/test-ffi_ir_stream_search_utils.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/core/src/clp/ffi/ir_stream/search/utils.cpp (1)

342-348: Implementation aligns with project's EXISTS/NEXISTS semantics.

As noted in the retrieved learnings, EXISTS always returns true and NEXISTS always returns false in this codebase. This implementation correctly follows that convention rather than checking value.has_value().

🧹 Nitpick comments (1)
components/core/src/clp/ffi/ir_stream/search/utils.cpp (1)

374-379: Consider adding a comment about literal type support.

Based on the past review comments, it would be helpful to add a comment explaining which literal types are planned for future implementation versus those that are intentionally unsupported.

    case LiteralType::EpochDateT:
    case LiteralType::ArrayT:
+        // ArrayT will be implemented in a future update, while EpochDateT is handled differently
+        // as schema_tree_node_type_value_pair_to_literal_type returns specific string types
        return ErrorCode{ErrorCodeEnum::LiteralTypeUnsupported};
    case LiteralType::NullT:
    case LiteralType::UnknownT:
    default:
+        // NullT should be unreachable (EXISTS/NEXISTS filters only)
+        // UnknownT indicates column resolution failure
        return ErrorCode{ErrorCodeEnum::LiteralTypeUnexpected};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcb0a1b and 93355cf.

📒 Files selected for processing (6)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/utils.hpp (1 hunks)
  • components/core/tests/test-ffi_ir_stream_search_utils.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.hpp
🚧 Files skipped from review as they are similar to previous changes (3)
  • components/core/CMakeLists.txt
  • components/core/src/clp/ffi/ir_stream/search/utils.hpp
  • components/core/tests/test-ffi_ir_stream_search_utils.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
🧠 Learnings (1)
components/core/src/clp/ffi/ir_stream/search/utils.cpp (2)
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-14, false)
🔇 Additional comments (5)
components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (1)

20-23: Clear error messages for the new error codes.

The messages for the two new error codes are clear and descriptive, appropriately explaining when these conditions would occur during IR stream search filtering.

components/core/src/clp/ffi/ir_stream/search/utils.cpp (4)

37-38: Good use of C++ concepts to constrain template parameters.

Using std::same_as concept ensures the template is only instantiated for the intended numerical types, preventing accidental misuse with incompatible types.


170-175: Adheres to project coding guideline for negation.

Using false == wildcard_match_unsafe(...) rather than negating the result follows the project's coding guideline that prefers false == <expression> over !<expression>.


187-188: Consistent use of false comparison style.

The comparison false == operand->as_int(...) follows the project's coding guideline that prefers false == <expression> over !<expression>.


291-292: Correctly includes multiple string literal types.

Properly handles the mapping from the Str schema tree node type to multiple supported literal types using bitwise OR, which allows for flexible matching of string representations.

Comment thread components/core/src/clp/ffi/ir_stream/search/utils.cpp
@LinZhihao-723 LinZhihao-723 requested a review from gibber9809 May 1, 2025 02:16

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
components/core/tests/test-ffi_ir_stream_search_utils.cpp (5)

109-109: Fix the typo in the function comment.

The function comment contains a grammatical error - it should be "Checks filter evaluation against variable strings" instead of "Checks filter evaluation against and variable strings".

- * Checks filter evaluation against and variable strings.
+ * Checks filter evaluation against variable strings.

120-120: Fix the typo in the function comment.

Similar to the previous comment, the function comment contains a grammatical error - it should be "Checks filter evaluation against encoded text ASTs" instead of "Checks filter evaluation against and encoded text ASTs".

- * Checks filter evaluation against and encoded text ASTs (Clp strings).
+ * Checks filter evaluation against encoded text ASTs (Clp strings).

289-291: Consider optimizing string concatenation.

Creating two temporary strings and then concatenating them could be optimized by creating a single string with the combined content.

- {Value{std::string{cRefTestString}}, {FilterOperation::EQ}},
- {Value{std::string{cRefTestString} + std::string{cRefTestString}},
-  {FilterOperation::EQ}},
+ {Value{std::string{cRefTestString}}, {FilterOperation::EQ}},
+ {Value{std::string{cRefTestString.data(), cRefTestString.size() * 2}},
+  {FilterOperation::EQ}},

310-321: Consider reducing code duplication.

This pattern of checking for mismatched literal types and returning a predefined result is duplicated across multiple check functions. Consider extracting it to a helper function.

+ /**
+  * Helper function to check if the literal type matches and return false if not.
+  */
+ [[nodiscard]] auto check_literal_type_mismatch(
+        FilterExpr const* filter_to_test,
+        LiteralType expected_type,
+        LiteralType actual_type
+ ) -> bool {
+     if (expected_type != actual_type) {
+         std::optional<Value> const empty_value_to_evaluate;
+         auto const actual_match_result{evaluate_filter_against_literal_type_value_pair(
+                filter_to_test,
+                actual_type,
+                empty_value_to_evaluate,
+                false
+         )};
+         REQUIRE_FALSE(actual_match_result.has_error());
+         return false == actual_match_result.value();
+     }
+     return true;
+ }

Then use it in each check function:

template <typename encoded_variable_t>
requires std::is_same_v<encoded_variable_t, eight_byte_encoded_variable_t>
         || std::is_same_v<encoded_variable_t, four_byte_encoded_variable_t>
auto check_filter_evaluation_against_encoded_text_asts(
        FilterExpr const* filter_to_test,
        LiteralType filter_operand_literal_type
) -> bool {
-    if (LiteralType::ClpStringT != filter_operand_literal_type) {
-        std::optional<Value> const empty_value_to_evaluate;
-        auto const actual_match_result{evaluate_filter_against_literal_type_value_pair(
-                filter_to_test,
-                filter_operand_literal_type,
-                empty_value_to_evaluate,
-                false
-        )};
-        REQUIRE_FALSE(actual_match_result.has_error());
-        return false
-               == actual_match_result.value();
+    if (!check_literal_type_mismatch(filter_to_test, LiteralType::ClpStringT, filter_operand_literal_type)) {
+        return false;
    }

367-419: Consider simplifying validation logic pattern.

The repeated pattern of checking return values against false followed by a WARN message and return statement could be simplified using a helper function or macro to reduce repetition and improve readability.

+ /**
+  * Helper macro to check function result and issue warning on failure.
+  */
+ #define CHECK_EVAL_OR_WARN(evalFunc, failMsg) \
+     if (false == (evalFunc)) { \
+         WARN(failMsg); \
+         return false; \
+     }

auto
check_filter_evaluation(FilterExpr const* filter_to_test, LiteralType filter_operand_literal_type)
        -> bool {
    // ... existing code ...

-    if (false
-        == check_filter_evaluation_against_integral_values<value_int_t>(
-                filter_to_test,
-                filter_operand_literal_type
-        ))
-    {
-        WARN("Failed to evaluate filter against integers.");
-        return false;
-    }
+    CHECK_EVAL_OR_WARN(
+        check_filter_evaluation_against_integral_values<value_int_t>(
+            filter_to_test, 
+            filter_operand_literal_type
+        ),
+        "Failed to evaluate filter against integers."
+    );

    // ... apply to other similar blocks ...
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6287f0 and 45f949b.

📒 Files selected for processing (1)
  • components/core/tests/test-ffi_ir_stream_search_utils.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/tests/test-ffi_ir_stream_search_utils.cpp
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (2)
components/core/tests/test-ffi_ir_stream_search_utils.cpp (2)

261-261: The code follows the required coding guideline.

The expression false == filter_operand follows the project's coding guideline to prefer false == <expression> rather than !<expression>. This is consistent across the codebase.


425-505: The test coverage is comprehensive.

The test case thoroughly covers all filter operation types (EXISTS, NEXISTS, EQ, NEQ, LT, LTE, GT, GTE) across all supported literal types (Integer, Float, Boolean, VarString, ClpString). The approach of using GENERATE_COPY to create test cases for each filter expression and type combination ensures good coverage while keeping the test code concise.

gibber9809
gibber9809 previously approved these changes May 1, 2025

@gibber9809 gibber9809 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. PR title is fine as well.

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor suggestions.

I think I'm missing some context about the TODOs for Literal::ObjectT. Could you explain briefly?

Comment thread components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/utils.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/utils.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/utils.cpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/utils.cpp
Comment thread components/core/src/clp/ffi/ir_stream/search/utils.hpp Outdated
Comment thread components/core/src/clp/ffi/ir_stream/search/utils.hpp Outdated
LinZhihao-723 and others added 2 commits May 1, 2025 14:27
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
@LinZhihao-723

Copy link
Copy Markdown
Member Author

Some minor suggestions.

I think I'm missing some context about the TODOs for Literal::ObjectT. Could you explain briefly?

It's in this issue: #588
I will add a response to include this TODO

@gibber9809

Copy link
Copy Markdown
Contributor

Some minor suggestions.

I think I'm missing some context about the TODOs for Literal::ObjectT. Could you explain briefly?

Generally both here and in clp-s search for empty objects {} doesn't work because we don't have an object literal type that column resolution can match against the object type in the schema tree.

E.g. for the object {"a":{}} the query a:* wouldn't work.

This github issue tracks the problem.

kirkrodrigues
kirkrodrigues previously approved these changes May 1, 2025

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed basics. For the PR title, instead of "utility code", how about "utilities"?

@LinZhihao-723 LinZhihao-723 changed the title feat(kv-ir): Add utility code for kv-pair IR stream search. feat(kv-ir): Add utilities for kv-pair IR stream search. May 1, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp (1)

1-1: Typo in file name reduces discoverability

The file is named test_utilis.cpp; consider renaming to test_utils.cpp (or test_utilities.cpp) to match the common spelling of “utilities”. This improves searchability and consistency with the rest of the codebase.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45f949b and 7dbc327.

📒 Files selected for processing (5)
  • components/core/CMakeLists.txt (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/utils.hpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • components/core/src/clp/ffi/ir_stream/search/ErrorCode.cpp
  • components/core/CMakeLists.txt
  • components/core/src/clp/ffi/ir_stream/search/utils.hpp
  • components/core/src/clp/ffi/ir_stream/search/utils.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp
🧬 Code Graph Analysis (1)
components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp (1)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (1)
  • static_cast (241-241)
🪛 Cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp

[error] 419-419: syntax error

(syntaxError)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: lint-check (ubuntu-latest)

Comment thread components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp (4)

429-432: Prefer direct boolean assertions for pointer checks

The current code uses REQUIRE_FALSE((nullptr == ref_ptr)) pattern for multiple pointer validity checks. While this works, consider using a more direct approach:

-REQUIRE_FALSE((nullptr == ref_str_literal));
-REQUIRE_FALSE((nullptr == ref_bool_literal));
-REQUIRE_FALSE((nullptr == ref_int_literal));
-REQUIRE_FALSE((nullptr == ref_float_literal));
+REQUIRE(nullptr != ref_str_literal);
+REQUIRE(nullptr != ref_bool_literal);
+REQUIRE(nullptr != ref_int_literal);
+REQUIRE(nullptr != ref_float_literal);

This makes the assertion's intent clearer and eliminates unnecessary parentheses.


437-438: Consistent style with other pointer validity checks

Apply the same suggestion as with the other pointer checks:

-REQUIRE_FALSE((nullptr == column_descriptor));
+REQUIRE(nullptr != column_descriptor);

496-497: Consistent style with other pointer validity checks

Apply the same suggestion as with the other pointer checks:

-REQUIRE_FALSE((nullptr == filter_to_test));
+REQUIRE(nullptr != filter_to_test);

152-159: Consider adding additional bounds validation

The code uses dict_var_bounds indices directly for string slicing. While there's a check that ensures there's an even number of bounds, consider adding a range check to ensure the bounds are within the text's limits before accessing them:

REQUIRE(((dict_var_bounds.size() % 2) == 0));

std::vector<std::string> dict_vars;
for (size_t i{0}; i < dict_var_bounds.size(); i += 2) {
    auto const begin_pos{static_cast<size_t>(dict_var_bounds[i])};
    auto const end_pos{static_cast<size_t>(dict_var_bounds[i + 1])};
+   REQUIRE(begin_pos <= end_pos);
+   REQUIRE(end_pos <= text.size());
    dict_vars.emplace_back(text.cbegin() + begin_pos, text.cbegin() + end_pos);
}

This would catch potential issues with invalid boundaries.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7dbc327 and 1ec7d92.

📒 Files selected for processing (1)
  • components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp
🧬 Code Graph Analysis (1)
components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp (1)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (1)
  • static_cast (241-241)
🪛 Cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp

[error] 419-419: syntax error

(syntaxError)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (4)
components/core/src/clp/ffi/ir_stream/search/test/test_utilis.cpp (4)

493-494: Code looks good, variable name fixed

The structured binding declaration correctly uses assignment syntax with = and the previously identified typo in the variable name has been fixed correctly.


419-498: Well-structured test case with good coverage

The test case provides comprehensive coverage of filter evaluation against different literal types and operations. The test data setup is clear and the systematic generation of test cases ensures all combinations are tested. The code is well-structured and follows good testing practices.

🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 419-419: syntax error

(syntaxError)


1-29: Comprehensive header inclusion

All necessary headers are included with a clean organization. The code follows good practices by including standard library headers first, followed by third-party and project-specific headers.


361-413: Good error reporting with descriptive warnings

The check_filter_evaluation function provides clear warning messages when filter evaluation fails, making test failures easier to understand and debug.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (2)

61-64: Introduce a reusable concept to avoid duplicated requires clauses

The same std::is_same_v<encoded_variable_t, …> constraint appears twice (lines 61-64 and 145-146). Extracting it into a named concept simplifies the declaration and keeps the intent DRY.

+#include <concepts>   // already present
+
+template<typename T>
+concept EncodedVariable =
+       std::same_as<T, eight_byte_encoded_variable_t>
+    || std::same_as<T, four_byte_encoded_variable_t>;
 
-template <typename encoded_variable_t>
-requires std::is_same_v<encoded_variable_t, eight_byte_encoded_variable_t>
-         || std::is_same_v<encoded_variable_t, four_byte_encoded_variable_t>
+template <EncodedVariable encoded_variable_t>

Apply the same EncodedVariable concept where the second duplicate constraint occurs.
This removes boiler-plate and makes future maintenance easier.

Also applies to: 145-146


283-289: Consider widening negative-test coverage for var-strings

At the moment only a prefix substring ("te") is evaluated against the NEQ operation. Adding a couple of additional non-matching scenarios (e.g., an empty string and a completely unrelated token) would strengthen the assertion that wildcard handling behaves as expected for all corner cases.

This is strictly optional, but helps future contributors understand the intended semantics.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ec7d92 and 79f6bca.

📒 Files selected for processing (2)
  • components/core/CMakeLists.txt (1 hunks)
  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
🧬 Code Graph Analysis (1)
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (1)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (1)
  • static_cast (241-241)
🪛 Cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp

[error] 419-419: syntax error

(syntaxError)

⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (1)
components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp (1)

419-498: Great test coverage!

The generated test matrix systematically walks through all filter operations and literal types, providing solid confidence in the new evaluation utilities. Nice use of GENERATE_COPY to keep the code concise.

No issues found here.

🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 419-419: syntax error

(syntaxError)

Comment thread components/core/src/clp/ffi/ir_stream/search/test/test_utils.cpp
@LinZhihao-723 LinZhihao-723 merged commit 9eb8158 into y-scope:main May 2, 2025
20 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants