Skip to content

fix(clp-s): Handle pure wildcards and unexpected literal types correctly in EvaluateTimestampIndex (fixes #1096).#1277

Merged
gibber9809 merged 11 commits into
y-scope:mainfrom
gibber9809:fix-1096
Sep 3, 2025
Merged

fix(clp-s): Handle pure wildcards and unexpected literal types correctly in EvaluateTimestampIndex (fixes #1096).#1277
gibber9809 merged 11 commits into
y-scope:mainfrom
gibber9809:fix-1096

Conversation

@gibber9809

@gibber9809 gibber9809 commented Aug 29, 2025

Copy link
Copy Markdown
Contributor

Description

This PR fixes several crashes in EvaluateTimestampIndex that were caused by the assumptions that:

  1. Filters reaching the evaluation code would always have a literal
  2. That literal would always be of integral type
    which on the surface appear valid, because the pass is designed to run after type narrowing, meaning that the column must actually be able to match integral types.

These assumptions are untrue in the following scenarios:

  1. The literal is a pure wildcard and can actually match any type (the literal is a StringLiteral in this case)
  2. The literal is null and has been used in a clause such as NOT ts: null
  3. The filter has no literal because it is an EXISTS or NEXISTS filter

The code has been updated to:

  1. Not attempt to evaluate timestamp filters when the operation is EXISTS/NEXISTS
  2. Not attempt to evaluate timestamp filters when there is no literal
  3. Handle evaluation when the literal is a pure wildcard
  4. Verify that the literal is actually an integral in the expected case

We also add a test case that exercises the behaviour that caused a crash prior to this PR.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Summary by CodeRabbit

  • New Features

    • Improved timestamp-based search evaluation with wildcard-aware matching and refined EXISTS/NEXISTS behaviour.
    • Test utilities accept an optional timestamp key (can be omitted) to customise archive compression inputs.
  • Bug Fixes

    • Safer handling of missing/invalid filter operands; indeterminate cases now return Unknown.
    • More robust numeric evaluation and correct inversion of filter results.
  • Tests

    • Updated tests to pass the new optional parameter and added cases for escaped/wildcard queries.

@gibber9809 gibber9809 requested review from a team and wraymo as code owners August 29, 2025 20:11
@coderabbitai

coderabbitai Bot commented Aug 29, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Adds wildcard-aware and null-operand guards plus safe numeric extraction and inversion-aware evaluation to timestamp-index filter logic; extends test utility compress_archive to accept an optional timestamp_key and updates tests to pass std::nullopt or a concrete key; adjusts one search test case (escaped varstring + new query).

Changes

Cohort / File(s) Change Summary
Search filter evaluation
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
Add FilterOperation include/alias; skip unresolved/wildcard descriptors; return Unknown for EXISTS/NEXISTS and null operands; detect pure-wildcard literals and handle EQ/NEQ with inversion; replace unsafe static_cast extraction with safe numeric extraction (int64_t or double) and evaluate accordingly; propagate Unknown when inconclusive.
Test utility signature update
components/core/tests/clp_s_test_utils.hpp, components/core/tests/clp_s_test_utils.cpp
Add #include <optional> and new std::optional<std::string> timestamp_key parameter (after archive_directory); forward value to parser_option.timestamp_key when present.
Tests updated for new utility signature
components/core/tests/test-clp_s-delta-encode-log-order.cpp, components/core/tests/test-clp_s-end_to_end.cpp, components/core/tests/test-clp_s-range_index.cpp
Add #include <optional> and pass std::nullopt as the new third argument to compress_archive; shift subsequent boolean argument positions accordingly.
Search tests extended
components/core/tests/test-clp_s-search.cpp
Add #include <optional>; pass a concrete timestamp/index key (e.g., "idx") into compress_archive; replace ambiguous "a*e" test with escaped "a\\*e" and add new query idx: * AND NOT idx: null AND idx: 0 expecting {0}.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Q as Query Engine
  participant E as EvaluateTimestampIndex
  participant F as FilterExpr
  participant R as RangeIndex

  Q->>E: evaluate(filter_expr, range_it)
  E->>F: get_operation(), get_operand()
  alt EXISTS / NEXISTS or operand == null
    E-->>Q: Unknown
  else descriptor is unresolved/wildcard
    E-->>Q: skip / non-match
  else operand is pure wildcard
    alt op == EQ
      E-->>Q: True (apply inversion if needed)
    else op == NEQ
      E-->>Q: False (apply inversion if needed)
    else
      E-->>Q: Unknown
    end
  else try numeric extraction
    E->>F: literal->as_int()
    alt int success
      E->>R: evaluate(op, int64)
      R-->>E: True/False/Unknown
      E-->>Q: result (apply inversion)
    else try float
      E->>F: literal->as_float()
      alt float success
        E->>R: evaluate(op, double)
        R-->>E: True/False/Unknown
        E-->>Q: result (apply inversion)
      else
        E-->>Q: Unknown
      end
    end
  end
  note right of E: Unknown is propagated unchanged
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • wraymo
  • kirkrodrigues

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d957968 and f33b147.

📒 Files selected for processing (1)
  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧠 Learnings (2)
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.

Applied to files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
components/core/src/clp_s/search/ast/FilterExpr.cpp (1)
  • FilterExpr (101-110)
components/core/src/clp_s/search/ast/Integral.hpp (3)
  • ret (67-67)
  • ret (69-69)
  • ret (71-71)
components/core/src/clp_s/search/ast/Literal.hpp (9)
  • ret (90-90)
  • ret (90-90)
  • ret (92-92)
  • ret (92-92)
  • ret (94-94)
  • ret (94-94)
  • ret (96-96)
  • ret (96-96)
  • ret (98-98)
🔇 Additional comments (2)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (2)

6-6: Good: explicit FilterOperation dependency and alias.

The include and using-alias improve clarity and avoid ADL surprises.

Also applies to: 14-14


99-104: Float branch is valid: TimestampEntry::evaluate_filter(FilterOperation,double) exists, so both int_literal and float_literal cases invoke the same overload and behave correctly—no change needed.

Likely an incorrect or invalid review comment.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
components/core/tests/clp_s_test_utils.hpp (1)

16-21: Update docs for new parameter; consider a default.

  • The Doxygen block is missing the new timestamp_key and the param order no longer matches the signature.
  • Optional: add a default value (= std::nullopt) to ease call sites in tests.

Apply:

 /**
  * Compresses a file into an archive directory according to a given set of configuration options.
@@
  * @param file_path
  * @param archive_directory
+ * @param timestamp_key Optional name of the timestamp field in the input; when unset, default
+ *                      timestamp handling is used.
  * @param single_file_archive
  * @param structurize_arrays
  * @param file_type
  * @return Statistics for every compressed archive.
  */
 [[nodiscard]] auto compress_archive(
         std::string const& file_path,
         std::string const& archive_directory,
-        std::optional<std::string> timestamp_key,
+        std::optional<std::string> timestamp_key,
         bool single_file_archive,
         bool structurize_arrays,
         clp_s::FileType file_type
 ) -> std::vector<clp_s::ArchiveStats>;

Also applies to: 26-30

components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (1)

25-37: Typo: any_unkownany_unknown.
Purely cosmetic but worth fixing for readability.

-        bool any_unkown = false;
+        bool any_unknown = false;
@@
-            } else if (ret == EvaluatedValue::Unknown) {
-                any_unkown = true;
+            } else if (ret == EvaluatedValue::Unknown) {
+                any_unknown = true;
@@
-        if (any_unkown) {
+        if (any_unknown) {

Repeat similarly in the AND-branch.

Also applies to: 42-55

components/core/tests/test-clp_s-search.cpp (1)

175-175: Make timestamp-eval assertion easier to debug.

Capture the evaluation result before asserting; helps diagnose False/Unknown outcomes across archives.

-        REQUIRE(clp_s::EvaluatedValue::False != timestamp_index_pass.run(archive_expr));
+        auto const ts_eval = timestamp_index_pass.run(archive_expr);
+        REQUIRE(clp_s::EvaluatedValue::False != ts_eval);

Please confirm that for the newly added query (idx: * AND NOT idx: null AND idx: 0), EvaluateTimestampIndex returns Unknown (not False) after this PR’s changes, consistent with the retrieved learning on EXISTS/NEXISTS semantics.

components/core/tests/clp_s_test_utils.cpp (1)

17-21: Minor API ergonomics: consider passing by const-ref.

std::optional<std::string> by value copies the string when engaged. Consider std::optional<std::string> const& in both header and impl if you want to avoid copies (low impact in tests).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9f87a42 and 8056b96.

📒 Files selected for processing (7)
  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (2 hunks)
  • components/core/tests/clp_s_test_utils.cpp (3 hunks)
  • components/core/tests/clp_s_test_utils.hpp (2 hunks)
  • components/core/tests/test-clp_s-delta-encode-log-order.cpp (2 hunks)
  • components/core/tests/test-clp_s-end_to_end.cpp (2 hunks)
  • components/core/tests/test-clp_s-range_index.cpp (2 hunks)
  • components/core/tests/test-clp_s-search.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
  • components/core/tests/clp_s_test_utils.cpp
  • components/core/tests/test-clp_s-range_index.cpp
  • components/core/tests/clp_s_test_utils.hpp
  • components/core/tests/test-clp_s-delta-encode-log-order.cpp
  • components/core/tests/test-clp_s-search.cpp
  • components/core/tests/test-clp_s-end_to_end.cpp
🧠 Learnings (7)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.

Applied to files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/tests/clp_s_test_utils.cpp
  • components/core/tests/test-clp_s-range_index.cpp
  • components/core/tests/clp_s_test_utils.hpp
  • components/core/tests/test-clp_s-delta-encode-log-order.cpp
  • components/core/tests/test-clp_s-search.cpp
  • components/core/tests/test-clp_s-end_to_end.cpp
📚 Learning: 2024-11-26T19:12:43.982Z
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

  • components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/tests/test-clp_s-delta-encode-log-order.cpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • components/core/tests/test-clp_s-end_to_end.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
components/core/src/clp_s/search/ast/FilterExpr.cpp (1)
  • FilterExpr (101-110)
components/core/src/clp_s/search/ast/Literal.hpp (9)
  • ret (90-90)
  • ret (90-90)
  • ret (92-92)
  • ret (92-92)
  • ret (94-94)
  • ret (94-94)
  • ret (96-96)
  • ret (96-96)
  • ret (98-98)
components/core/src/clp_s/search/ast/Integral.cpp (2)
  • get (53-55)
  • get (53-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (17)
components/core/tests/clp_s_test_utils.hpp (1)

4-4: Include for std::optional — LGTM.

components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (6)

6-6: Explicit FilterOperation include — LGTM.


14-14: Alias import for FilterOperation — LGTM.


87-91: EXISTS/NEXISTS: defer to schema — correct.
Returning Unknown here aligns with project semantics (EXISTS always true/NEXISTS always false handled elsewhere).


93-98: Null-operand guard — good defensive check.
Prevents crashes on operations that don’t take a literal.


112-121: Integral narrowing and guard — LGTM.
The dynamic cast + early Unknown avoids invalid static casts.


99-111: No changes needed – Literal::as_any is declared in Literal.hpp and overridden in StringLiteral to detect "*".

components/core/tests/test-clp_s-range_index.cpp (2)

3-3: Include for std::nullopt — LGTM.


225-233: Updated callsite for new compress_archive signature — LGTM.
Passing std::nullopt maintains prior behaviour.

components/core/tests/test-clp_s-end_to_end.cpp (2)

5-5: Include — LGTM.


108-116: Updated compress_archive args — LGTM.
Keeps semantics while accommodating the new optional parameter.

components/core/tests/test-clp_s-delta-encode-log-order.cpp (2)

5-5: Include — LGTM.


69-76: Updated compress_archive call — LGTM.
No behavioural change; test remains focused on delta-encoding order.

components/core/tests/test-clp_s-search.cpp (3)

5-5: Good include.

<optional> is required by the updated test utilities. No issues.


167-167: Style compliance on boolean comparison.

Using false == ignore_case matches the codebase guideline.


227-228: Nice coverage of escaped wildcard semantics.

The added cases for "a*e" vs "a\*e" exercise the intended KQL escaping behaviour.

components/core/tests/clp_s_test_utils.cpp (1)

4-4: Required include.

<optional> is correctly added for the new parameter.

Comment thread components/core/src/clp_s/search/EvaluateTimestampIndex.cpp Outdated
Comment thread components/core/tests/clp_s_test_utils.cpp
Comment thread components/core/tests/test-clp_s-search.cpp
Comment thread components/core/tests/test-clp_s-search.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (1)

25-37: Fix typo: any_unkown → any_unknown.

Minor readability nit in both OR/AND branches.

Apply:

-        bool any_unkown = false;
+        bool any_unknown = false;
@@
-            } else if (ret == EvaluatedValue::Unknown) {
-                any_unkown = true;
+            } else if (ret == EvaluatedValue::Unknown) {
+                any_unknown = true;
@@
-        if (any_unkown) {
+        if (any_unknown) {

Also applies to: 42-54

components/core/tests/test-clp_s-search.cpp (2)

200-230: Optional: build idx-based queries using cTestIdxKey to avoid drift.

Parametrising the query strings keeps tests consistent if the key changes.

Example:

-            {R"aa(idx: * AND NOT idx: null AND idx: 0)aa", {0}}
+            {fmt::format(R"aa({0}: * AND NOT {0}: null AND {0}: 0)aa", cTestIdxKey), {0}}

110-114: Remove unused variable.

results_are_valid_json is never read.

Apply:

-    bool results_are_valid_json = true;
♻️ Duplicate comments (3)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (1)

123-125: Good: removed duplicate virtual get_operation calls.

Using filter_op eliminates repeated virtual dispatch.

components/core/tests/test-clp_s-search.cpp (1)

239-239: Reused cTestIdxKey when invoking compress_archive.

Removes the magic string; nice.

components/core/tests/clp_s_test_utils.cpp (1)

44-46: Simplify optional handling and move value.

Cleaner and avoids an extra copy.

Apply:

-    if (timestamp_key.has_value()) {
-        parser_option.timestamp_key = std::move(timestamp_key.value());
-    }
+    if (timestamp_key) {
+        parser_option.timestamp_key = std::move(*timestamp_key);
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8056b96 and 207b10d.

📒 Files selected for processing (3)
  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (2 hunks)
  • components/core/tests/clp_s_test_utils.cpp (3 hunks)
  • components/core/tests/test-clp_s-search.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/tests/clp_s_test_utils.cpp
  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
  • components/core/tests/test-clp_s-search.cpp
🧠 Learnings (6)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/tests/clp_s_test_utils.cpp
  • components/core/tests/test-clp_s-search.cpp
📚 Learning: 2025-08-15T15:19:14.562Z
Learnt from: haiqi96
PR: y-scope/clp#1186
File: components/clp-package-utils/clp_package_utils/scripts/archive_manager.py:40-40
Timestamp: 2025-08-15T15:19:14.562Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/archive_manager.py, the _validate_timestamps function correctly uses Optional[int] for end_ts parameter without a default value because it receives parsed argument values that can be None when --end-ts is not provided for the find command.

Applied to files:

  • components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2024-11-26T19:12:43.982Z
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

  • components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.

Applied to files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (5)
components/core/src/clp_s/search/ast/FilterExpr.hpp (4)
  • FilterExpr (65-65)
  • FilterExpr (69-69)
  • FilterExpr (99-104)
  • FilterExpr (107-107)
components/core/src/clp_s/search/ast/FilterExpr.cpp (1)
  • FilterExpr (101-110)
components/core/src/clp_s/search/ast/Integral.hpp (3)
  • ret (67-67)
  • ret (69-69)
  • ret (71-71)
components/core/src/clp_s/search/ast/Literal.hpp (9)
  • ret (90-90)
  • ret (90-90)
  • ret (92-92)
  • ret (92-92)
  • ret (94-94)
  • ret (94-94)
  • ret (96-96)
  • ret (96-96)
  • ret (98-98)
components/core/src/clp_s/search/ast/Integral.cpp (2)
  • get (53-55)
  • get (53-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (5)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (1)

6-6: Headers/aliases are correct.

Including FilterOperation.hpp and the using-alias improves clarity.

Also applies to: 14-14

components/core/tests/test-clp_s-search.cpp (2)

5-5: Include is appropriate.

Needed for updated test utils signature.


227-229: New queries look correct and exercise the edge cases.

The escaped wildcard and idx: * with NOT null validate the fix paths.

components/core/tests/clp_s_test_utils.cpp (2)

4-4: Include is correct.

Required for the new parameter.


17-21: Signature change LGTM.

The optional timestamp key makes the test helper flexible without breaking callers that pass std::nullopt.

Comment on lines +88 to +91
auto const filter_op{filter->get_operation()};
if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
return EvaluatedValue::Unknown;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Hoist EXISTS/NEXISTS guard before the range loop.

This result is invariant of the timestamp dictionary; return Unknown earlier to skip unnecessary work.

Apply:

@@
-        for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
+        // Leave handling EXISTS/NEXISTS to schema matching.
+        auto const filter_op{filter->get_operation()};
+        if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
+            return EvaluatedValue::Unknown;
+        }
+
+        for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
              range_it != m_timestamp_dict->tokenized_column_to_range_end();
              range_it++)
         {
@@
-            // Leave handling EXISTS/NEXISTS to schema matching.
-            auto const filter_op{filter->get_operation()};
-            if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
-                return EvaluatedValue::Unknown;
-            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto const filter_op{filter->get_operation()};
if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
return EvaluatedValue::Unknown;
}
// Leave handling EXISTS/NEXISTS to schema matching.
auto const filter_op{filter->get_operation()};
if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
return EvaluatedValue::Unknown;
}
for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
range_it != m_timestamp_dict->tokenized_column_to_range_end();
range_it++)
{
// ... rest of loop body (no longer needs the EXISTS/NEXISTS check here)
}
🤖 Prompt for AI Agents
In components/core/src/clp_s/search/EvaluateTimestampIndex.cpp around lines 88
to 91, the EXISTS/NEXISTS check is performed inside or after a loop over the
timestamp dictionary; hoist this guard so it executes before entering the range
loop. Change flow so that if filter->get_operation() is FilterOperation::EXISTS
or FilterOperation::NEXISTS the function immediately returns
EvaluatedValue::Unknown, avoiding any iteration or dictionary access; ensure no
other logic relies on side-effects from the loop before this early return.

Comment on lines +93 to +97
auto const literal{filter->get_operand()};
// Defend against future FilterOperation types that don't take a literal.
if (nullptr == literal) {
return EvaluatedValue::Unknown;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Hoist null-literal guard before the range loop as well.

Like the EXISTS/NEXISTS case, this is dictionary-independent.

Apply:

@@
-        for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
+        auto const literal{filter->get_operand()};
+        // Defend against FilterOperation types that don't take a literal.
+        if (nullptr == literal) {
+            return EvaluatedValue::Unknown;
+        }
+
+        for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
              range_it != m_timestamp_dict->tokenized_column_to_range_end();
              range_it++)
         {
@@
-            auto const literal{filter->get_operand()};
-            // Defend against future FilterOperation types that don't take a literal.
-            if (nullptr == literal) {
-                return EvaluatedValue::Unknown;
-            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto const literal{filter->get_operand()};
// Defend against future FilterOperation types that don't take a literal.
if (nullptr == literal) {
return EvaluatedValue::Unknown;
}
// Hoist the null-literal guard out of the loop, since it doesn’t depend on the dictionary.
auto const literal{filter->get_operand()};
// Defend against FilterOperation types that don't take a literal.
if (nullptr == literal) {
return EvaluatedValue::Unknown;
}
for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
range_it != m_timestamp_dict->tokenized_column_to_range_end();
range_it++)
{
- auto const literal{filter->get_operand()};
- // Defend against future FilterOperation types that don't take a literal.
- if (nullptr == literal) {
- return EvaluatedValue::Unknown;
- }
// ... rest of loop body using `literal` ...
}
🤖 Prompt for AI Agents
In components/core/src/clp_s/search/EvaluateTimestampIndex.cpp around lines 93
to 97, the null-literal guard is currently inside the range-handling path; hoist
that guard before entering the range loop so we defend against future
FilterOperation types irrespective of dictionary logic. Move the nullptr ==
literal check to immediately after obtaining literal (before any loop or range
processing) and early-return EvaluatedValue::Unknown if literal is null,
ensuring both the EXISTS/NEXISTS and range cases use the same
dictionary-independent guard.

Comment thread components/core/src/clp_s/search/EvaluateTimestampIndex.cpp Outdated
Comment thread components/core/src/clp_s/search/EvaluateTimestampIndex.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (2)

85-96: Hoist dictionary-independent guards (EXISTS/NEXISTS, null literal, pure-wildcard literal) before iterating ranges.

These checks don’t depend on the timestamp dictionary. Moving them out avoids repeated work and earlier exits. This also matches prior guidance.

Apply removals inside the loop:

-            // Leave handling EXISTS/NEXISTS to schema matching.
-            auto const filter_op{filter->get_operation()};
-            if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
-                return EvaluatedValue::Unknown;
-            }
-
-            auto const literal{filter->get_operand()};
-            // Defend against future FilterOperation types that don't take a literal.
-            if (nullptr == literal) {
-                return EvaluatedValue::Unknown;
-            }
-
-            // Handle the case where the literal is a pure wildcard.
-            if (literal->as_any(filter_op)) {
-                switch (filter_op) {
-                    case FilterOperation::EQ:
-                        return filter->is_inverted() ? EvaluatedValue::False : EvaluatedValue::True;
-                    case FilterOperation::NEQ:
-                        return filter->is_inverted() ? EvaluatedValue::True : EvaluatedValue::False;
-                    default:
-                        // Not reachable.
-                        return EvaluatedValue::Unknown;
-                }
-            }

Then add once, above the range loop (contextual snippet outside this hunk):

auto const filter_op{filter->get_operation()};
if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
    return EvaluatedValue::Unknown; // schema-level responsibility
}

auto const literal{filter->get_operand()};
if (nullptr == literal) {
    return EvaluatedValue::Unknown;
}

// Pure wildcard literal is also dictionary-independent.
if (literal->as_any(filter_op)) {
    switch (filter_op) {
        case FilterOperation::EQ:
            return filter->is_inverted() ? EvaluatedValue::False : EvaluatedValue::True;
        case FilterOperation::NEQ:
            return filter->is_inverted() ? EvaluatedValue::True : EvaluatedValue::False;
        default:
            // Unknown for ops without explicit wildcard semantics here.
            return EvaluatedValue::Unknown;
    }
}

Also applies to: 97-108


104-106: Soften the “Not reachable” assertion in default case.

Future ops could route here; prefer neutral wording.

-                        // Not reachable.
+                        // Unknown for ops that don't define wildcard semantics here.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 207b10d and bac07e5.

📒 Files selected for processing (1)
  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧠 Learnings (2)
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.

Applied to files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
components/core/src/clp_s/search/ast/FilterExpr.cpp (1)
  • FilterExpr (101-110)
components/core/src/clp_s/search/ast/Integral.hpp (3)
  • ret (67-67)
  • ret (69-69)
  • ret (71-71)
components/core/src/clp_s/search/ast/Literal.hpp (8)
  • ret (90-90)
  • ret (90-90)
  • ret (92-92)
  • ret (92-92)
  • ret (94-94)
  • ret (94-94)
  • ret (96-96)
  • ret (96-96)
🔇 Additional comments (3)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)

6-6: Good include.

Adding FilterOperation.hpp is necessary for the new logic.


14-14: Alias improves readability.

Bringing FilterOperation into scope keeps call sites concise.


110-118: Safer numeric extraction path looks good.

Switching to as_int/as_float avoids Integral downcasts and is future-proof.

Comment thread components/core/src/clp_s/search/EvaluateTimestampIndex.cpp Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (4)

68-71: Hoist unresolved-descriptor guard before the range loop.

This check is dictionary-independent; early-return to avoid O(N) scans when descriptors include wildcards.

Apply within this hunk:

-            // Don't attempt to evaluate the timestamp index against columns with wildcard tokens.
-            if (column->is_unresolved_descriptor()) {
-                continue;
-            }

Add before the loop at Line 64:

// If the descriptor contains wildcard tokens, probing the index is inconclusive.
if (column->is_unresolved_descriptor()) {
    return EvaluatedValue::Unknown;
}

90-94: EXISTS/NEXISTS guard should be outside the index loop.

It’s invariant of the dictionary; return Unknown earlier and reuse filter_op inside the loop.

Apply within this hunk:

-            // Leave handling EXISTS/NEXISTS to schema matching.
-            auto const filter_op{filter->get_operation()};
-            if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
-                return EvaluatedValue::Unknown;
-            }

Add before the loop at Line 64:

// Leave EXISTS/NEXISTS to schema matching.
auto const filter_op{filter->get_operation()};
if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
    return EvaluatedValue::Unknown;
}

96-101: Null-literal guard should be hoisted before the loop.

Dictionary-independent; short-circuits earlier and keeps the loop focused on matching.

Apply within this hunk:

-            auto const literal{filter->get_operand()};
-            // Defend against future FilterOperation types that don't take a literal.
-            if (nullptr == literal) {
-                return EvaluatedValue::Unknown;
-            }

Add before the loop (after declaring filter_op) so it’s shared:

auto const literal{filter->get_operand()};
// Defend against FilterOperation types that don't take a literal.
if (nullptr == literal) {
    return EvaluatedValue::Unknown;
}

102-113: Neutralise the default-branch comment in wildcard switch.

Avoid claiming impossibility; keep a neutral fallback.

-                    default:
-                        // Not reachable.
-                        return EvaluatedValue::Unknown;
+                    default:
+                        // Unknown for operations without explicit wildcard semantics here.
+                        return EvaluatedValue::Unknown;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bac07e5 and d957968.

📒 Files selected for processing (1)
  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧠 Learnings (2)
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.

Applied to files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
components/core/src/clp_s/search/ast/FilterExpr.cpp (1)
  • FilterExpr (101-110)
components/core/src/clp_s/search/ast/Integral.hpp (3)
  • ret (67-67)
  • ret (69-69)
  • ret (71-71)
components/core/src/clp_s/search/ast/Literal.hpp (9)
  • ret (90-90)
  • ret (90-90)
  • ret (92-92)
  • ret (92-92)
  • ret (94-94)
  • ret (94-94)
  • ret (96-96)
  • ret (96-96)
  • ret (98-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
🔇 Additional comments (3)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)

6-6: Include of FilterOperation is correct and necessary — LGTM.


14-14: Using alias for FilterOperation improves readability — LGTM.


115-121: Safer numeric extraction via as_int/as_float — LGTM.

Eliminates downcasts to Integral and handles both i64/f64 cleanly.

Also applies to: 123-123

Comment thread components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
// Leave handling EXISTS/NEXISTS to schema matching.
auto const filter_op{filter->get_operation()};
if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
return EvaluatedValue::Unknown;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong. ts: * is an existence check and we convert it from EQ to EXISTS before evaluating the timestamp filter. Why do we return Unknown here even when we find a match?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just evaluating against the timestamp index -- it won't tell us whether the timestamp exists on a given record/schema, so we leave that up to the schema matching pass.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question: what happens if an archive contains some messages without timestamp fields? I think we simply skip recording timestamps for those messages? But if a query includes timestamp filters and none of the timestamped messages in that archive match (out of range), should we still scan the remaining messages that don’t have timestamp fields?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we currently don't record any timestamps and will skip them in a scan that contains a timestamp filter. It probably makes sense to add some options for users to get around this (e.g. assign timestamp based on surrounding log events at ingestion time, or potentially still search log events missing timestamps if the timestamp range index matches), but that's future work.

}

// Handle the case where the literal is a pure wildcard.
if (literal->as_any(filter_op)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do we have still have an EQ case here if we it's already converted to EXISTS?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't if we run this after the convert to exists pass. Thinking about this more though we should also return Unknown here and just leave it up to schema matching for the same reason I mention in my other comment.

Since that's the case I think I can just delete this code block and rely on following through to the Unknown case below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I can probably also delete the special case dedicated to exists/nexists and rely on the check for whether the filter has a literal -- going to do a pass and get rid of redundant special cases.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's cool

@gibber9809 gibber9809 requested a review from wraymo September 2, 2025 20:58
@kirkrodrigues kirkrodrigues changed the title fix(clp-s): Handle pure wildcards and unexpected literal types correctly in EvaluateTimestampIndex. (fixes #1096) fix(clp-s): Handle pure wildcards and unexpected literal types correctly in EvaluateTimestampIndex (fixes #1096). Sep 3, 2025

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring to @wraymo's review. Fixed the PR title to put the fixes phrase before the period.

@gibber9809 gibber9809 merged commit de423fc into y-scope:main Sep 3, 2025
25 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…tly in `EvaluateTimestampIndex` (fixes y-scope#1096). (y-scope#1277)

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

clp-s: EvaluateTimestampIndex pass doesn't handle matching against wildcards.

3 participants