fix(clp-s): Handle pure wildcards and unexpected literal types correctly in EvaluateTimestampIndex (fixes #1096).#1277
Conversation
WalkthroughAdds wildcard-aware and null-operand guards plus safe numeric extraction and inversion-aware evaluation to timestamp-index filter logic; extends test utility Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Q as Query Engine
participant E as EvaluateTimestampIndex
participant F as FilterExpr
participant R as RangeIndex
Q->>E: evaluate(filter_expr, range_it)
E->>F: get_operation(), get_operand()
alt EXISTS / NEXISTS or operand == null
E-->>Q: Unknown
else descriptor is unresolved/wildcard
E-->>Q: skip / non-match
else operand is pure wildcard
alt op == EQ
E-->>Q: True (apply inversion if needed)
else op == NEQ
E-->>Q: False (apply inversion if needed)
else
E-->>Q: Unknown
end
else try numeric extraction
E->>F: literal->as_int()
alt int success
E->>R: evaluate(op, int64)
R-->>E: True/False/Unknown
E-->>Q: result (apply inversion)
else try float
E->>F: literal->as_float()
alt float success
E->>R: evaluate(op, double)
R-->>E: True/False/Unknown
E-->>Q: result (apply inversion)
else
E-->>Q: Unknown
end
end
end
note right of E: Unknown is propagated unchanged
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}⚙️ CodeRabbit configuration file
Files:
🧠 Learnings (2)📚 Learning: 2025-04-26T02:19:33.275ZApplied to files:
📚 Learning: 2025-05-07T16:56:35.687ZApplied to files:
🧬 Code graph analysis (1)components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
🔇 Additional comments (2)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
components/core/tests/clp_s_test_utils.hpp (1)
16-21: Update docs for new parameter; consider a default.
- The Doxygen block is missing the new
timestamp_keyand the param order no longer matches the signature.- Optional: add a default value (
= std::nullopt) to ease call sites in tests.Apply:
/** * Compresses a file into an archive directory according to a given set of configuration options. @@ * @param file_path * @param archive_directory + * @param timestamp_key Optional name of the timestamp field in the input; when unset, default + * timestamp handling is used. * @param single_file_archive * @param structurize_arrays * @param file_type * @return Statistics for every compressed archive. */ [[nodiscard]] auto compress_archive( std::string const& file_path, std::string const& archive_directory, - std::optional<std::string> timestamp_key, + std::optional<std::string> timestamp_key, bool single_file_archive, bool structurize_arrays, clp_s::FileType file_type ) -> std::vector<clp_s::ArchiveStats>;Also applies to: 26-30
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (1)
25-37: Typo:any_unkown→any_unknown.
Purely cosmetic but worth fixing for readability.- bool any_unkown = false; + bool any_unknown = false; @@ - } else if (ret == EvaluatedValue::Unknown) { - any_unkown = true; + } else if (ret == EvaluatedValue::Unknown) { + any_unknown = true; @@ - if (any_unkown) { + if (any_unknown) {Repeat similarly in the AND-branch.
Also applies to: 42-55
components/core/tests/test-clp_s-search.cpp (1)
175-175: Make timestamp-eval assertion easier to debug.Capture the evaluation result before asserting; helps diagnose False/Unknown outcomes across archives.
- REQUIRE(clp_s::EvaluatedValue::False != timestamp_index_pass.run(archive_expr)); + auto const ts_eval = timestamp_index_pass.run(archive_expr); + REQUIRE(clp_s::EvaluatedValue::False != ts_eval);Please confirm that for the newly added query (
idx: * AND NOT idx: null AND idx: 0),EvaluateTimestampIndexreturnsUnknown(notFalse) after this PR’s changes, consistent with the retrieved learning on EXISTS/NEXISTS semantics.components/core/tests/clp_s_test_utils.cpp (1)
17-21: Minor API ergonomics: consider passing by const-ref.
std::optional<std::string>by value copies the string when engaged. Considerstd::optional<std::string> const&in both header and impl if you want to avoid copies (low impact in tests).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp(2 hunks)components/core/tests/clp_s_test_utils.cpp(3 hunks)components/core/tests/clp_s_test_utils.hpp(2 hunks)components/core/tests/test-clp_s-delta-encode-log-order.cpp(2 hunks)components/core/tests/test-clp_s-end_to_end.cpp(2 hunks)components/core/tests/test-clp_s-range_index.cpp(2 hunks)components/core/tests/test-clp_s-search.cpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/search/EvaluateTimestampIndex.cppcomponents/core/tests/clp_s_test_utils.cppcomponents/core/tests/test-clp_s-range_index.cppcomponents/core/tests/clp_s_test_utils.hppcomponents/core/tests/test-clp_s-delta-encode-log-order.cppcomponents/core/tests/test-clp_s-search.cppcomponents/core/tests/test-clp_s-end_to_end.cpp
🧠 Learnings (7)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
Applied to files:
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/tests/clp_s_test_utils.cppcomponents/core/tests/test-clp_s-range_index.cppcomponents/core/tests/clp_s_test_utils.hppcomponents/core/tests/test-clp_s-delta-encode-log-order.cppcomponents/core/tests/test-clp_s-search.cppcomponents/core/tests/test-clp_s-end_to_end.cpp
📚 Learning: 2024-11-26T19:12:43.982Z
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.
Applied to files:
components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Applied to files:
components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2025-01-30T19:26:33.869Z
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.
Applied to files:
components/core/tests/test-clp_s-delta-encode-log-order.cpp
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
components/core/tests/test-clp_s-end_to_end.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
components/core/src/clp_s/search/ast/FilterExpr.cpp (1)
FilterExpr(101-110)components/core/src/clp_s/search/ast/Literal.hpp (9)
ret(90-90)ret(90-90)ret(92-92)ret(92-92)ret(94-94)ret(94-94)ret(96-96)ret(96-96)ret(98-98)components/core/src/clp_s/search/ast/Integral.cpp (2)
get(53-55)get(53-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (17)
components/core/tests/clp_s_test_utils.hpp (1)
4-4: Include for std::optional — LGTM.components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (6)
6-6: Explicit FilterOperation include — LGTM.
14-14: Alias import for FilterOperation — LGTM.
87-91: EXISTS/NEXISTS: defer to schema — correct.
Returning Unknown here aligns with project semantics (EXISTS always true/NEXISTS always false handled elsewhere).
93-98: Null-operand guard — good defensive check.
Prevents crashes on operations that don’t take a literal.
112-121: Integral narrowing and guard — LGTM.
The dynamic cast + early Unknown avoids invalid static casts.
99-111: No changes needed –Literal::as_anyis declared inLiteral.hppand overridden inStringLiteralto detect"*".components/core/tests/test-clp_s-range_index.cpp (2)
3-3: Include for std::nullopt — LGTM.
225-233: Updated callsite for new compress_archive signature — LGTM.
Passingstd::nulloptmaintains prior behaviour.components/core/tests/test-clp_s-end_to_end.cpp (2)
5-5: Include — LGTM.
108-116: Updated compress_archive args — LGTM.
Keeps semantics while accommodating the new optional parameter.components/core/tests/test-clp_s-delta-encode-log-order.cpp (2)
5-5: Include — LGTM.
69-76: Updated compress_archive call — LGTM.
No behavioural change; test remains focused on delta-encoding order.components/core/tests/test-clp_s-search.cpp (3)
5-5: Good include.
<optional>is required by the updated test utilities. No issues.
167-167: Style compliance on boolean comparison.Using
false == ignore_casematches the codebase guideline.
227-228: Nice coverage of escaped wildcard semantics.The added cases for
"a*e"vs"a\*e"exercise the intended KQL escaping behaviour.components/core/tests/clp_s_test_utils.cpp (1)
4-4: Required include.
<optional>is correctly added for the new parameter.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (1)
25-37: Fix typo: any_unkown → any_unknown.Minor readability nit in both OR/AND branches.
Apply:
- bool any_unkown = false; + bool any_unknown = false; @@ - } else if (ret == EvaluatedValue::Unknown) { - any_unkown = true; + } else if (ret == EvaluatedValue::Unknown) { + any_unknown = true; @@ - if (any_unkown) { + if (any_unknown) {Also applies to: 42-54
components/core/tests/test-clp_s-search.cpp (2)
200-230: Optional: build idx-based queries using cTestIdxKey to avoid drift.Parametrising the query strings keeps tests consistent if the key changes.
Example:
- {R"aa(idx: * AND NOT idx: null AND idx: 0)aa", {0}} + {fmt::format(R"aa({0}: * AND NOT {0}: null AND {0}: 0)aa", cTestIdxKey), {0}}
110-114: Remove unused variable.results_are_valid_json is never read.
Apply:
- bool results_are_valid_json = true;
♻️ Duplicate comments (3)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (1)
123-125: Good: removed duplicate virtual get_operation calls.Using filter_op eliminates repeated virtual dispatch.
components/core/tests/test-clp_s-search.cpp (1)
239-239: Reused cTestIdxKey when invoking compress_archive.Removes the magic string; nice.
components/core/tests/clp_s_test_utils.cpp (1)
44-46: Simplify optional handling and move value.Cleaner and avoids an extra copy.
Apply:
- if (timestamp_key.has_value()) { - parser_option.timestamp_key = std::move(timestamp_key.value()); - } + if (timestamp_key) { + parser_option.timestamp_key = std::move(*timestamp_key); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp(2 hunks)components/core/tests/clp_s_test_utils.cpp(3 hunks)components/core/tests/test-clp_s-search.cpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/tests/clp_s_test_utils.cppcomponents/core/src/clp_s/search/EvaluateTimestampIndex.cppcomponents/core/tests/test-clp_s-search.cpp
🧠 Learnings (6)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
components/core/tests/clp_s_test_utils.cppcomponents/core/tests/test-clp_s-search.cpp
📚 Learning: 2025-08-15T15:19:14.562Z
Learnt from: haiqi96
PR: y-scope/clp#1186
File: components/clp-package-utils/clp_package_utils/scripts/archive_manager.py:40-40
Timestamp: 2025-08-15T15:19:14.562Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/archive_manager.py, the _validate_timestamps function correctly uses Optional[int] for end_ts parameter without a default value because it receives parsed argument values that can be None when --end-ts is not provided for the find command.
Applied to files:
components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2024-11-26T19:12:43.982Z
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.
Applied to files:
components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2024-11-29T22:50:17.206Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Applied to files:
components/core/tests/clp_s_test_utils.cpp
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
Applied to files:
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (5)
components/core/src/clp_s/search/ast/FilterExpr.hpp (4)
FilterExpr(65-65)FilterExpr(69-69)FilterExpr(99-104)FilterExpr(107-107)components/core/src/clp_s/search/ast/FilterExpr.cpp (1)
FilterExpr(101-110)components/core/src/clp_s/search/ast/Integral.hpp (3)
ret(67-67)ret(69-69)ret(71-71)components/core/src/clp_s/search/ast/Literal.hpp (9)
ret(90-90)ret(90-90)ret(92-92)ret(92-92)ret(94-94)ret(94-94)ret(96-96)ret(96-96)ret(98-98)components/core/src/clp_s/search/ast/Integral.cpp (2)
get(53-55)get(53-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (5)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (1)
6-6: Headers/aliases are correct.Including FilterOperation.hpp and the using-alias improves clarity.
Also applies to: 14-14
components/core/tests/test-clp_s-search.cpp (2)
5-5: Include is appropriate.Needed for updated test utils signature.
227-229: New queries look correct and exercise the edge cases.The escaped wildcard and idx: * with NOT null validate the fix paths.
components/core/tests/clp_s_test_utils.cpp (2)
4-4: Include is correct.Required for the new parameter.
17-21: Signature change LGTM.The optional timestamp key makes the test helper flexible without breaking callers that pass std::nullopt.
| auto const filter_op{filter->get_operation()}; | ||
| if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) { | ||
| return EvaluatedValue::Unknown; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Hoist EXISTS/NEXISTS guard before the range loop.
This result is invariant of the timestamp dictionary; return Unknown earlier to skip unnecessary work.
Apply:
@@
- for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
+ // Leave handling EXISTS/NEXISTS to schema matching.
+ auto const filter_op{filter->get_operation()};
+ if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
+ return EvaluatedValue::Unknown;
+ }
+
+ for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
range_it != m_timestamp_dict->tokenized_column_to_range_end();
range_it++)
{
@@
- // Leave handling EXISTS/NEXISTS to schema matching.
- auto const filter_op{filter->get_operation()};
- if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) {
- return EvaluatedValue::Unknown;
- }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auto const filter_op{filter->get_operation()}; | |
| if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) { | |
| return EvaluatedValue::Unknown; | |
| } | |
| // Leave handling EXISTS/NEXISTS to schema matching. | |
| auto const filter_op{filter->get_operation()}; | |
| if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) { | |
| return EvaluatedValue::Unknown; | |
| } | |
| for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin(); | |
| range_it != m_timestamp_dict->tokenized_column_to_range_end(); | |
| range_it++) | |
| { | |
| // ... rest of loop body (no longer needs the EXISTS/NEXISTS check here) | |
| } |
🤖 Prompt for AI Agents
In components/core/src/clp_s/search/EvaluateTimestampIndex.cpp around lines 88
to 91, the EXISTS/NEXISTS check is performed inside or after a loop over the
timestamp dictionary; hoist this guard so it executes before entering the range
loop. Change flow so that if filter->get_operation() is FilterOperation::EXISTS
or FilterOperation::NEXISTS the function immediately returns
EvaluatedValue::Unknown, avoiding any iteration or dictionary access; ensure no
other logic relies on side-effects from the loop before this early return.
| auto const literal{filter->get_operand()}; | ||
| // Defend against future FilterOperation types that don't take a literal. | ||
| if (nullptr == literal) { | ||
| return EvaluatedValue::Unknown; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Hoist null-literal guard before the range loop as well.
Like the EXISTS/NEXISTS case, this is dictionary-independent.
Apply:
@@
- for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
+ auto const literal{filter->get_operand()};
+ // Defend against FilterOperation types that don't take a literal.
+ if (nullptr == literal) {
+ return EvaluatedValue::Unknown;
+ }
+
+ for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin();
range_it != m_timestamp_dict->tokenized_column_to_range_end();
range_it++)
{
@@
- auto const literal{filter->get_operand()};
- // Defend against future FilterOperation types that don't take a literal.
- if (nullptr == literal) {
- return EvaluatedValue::Unknown;
- }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| auto const literal{filter->get_operand()}; | |
| // Defend against future FilterOperation types that don't take a literal. | |
| if (nullptr == literal) { | |
| return EvaluatedValue::Unknown; | |
| } | |
| // Hoist the null-literal guard out of the loop, since it doesn’t depend on the dictionary. | |
| auto const literal{filter->get_operand()}; | |
| // Defend against FilterOperation types that don't take a literal. | |
| if (nullptr == literal) { | |
| return EvaluatedValue::Unknown; | |
| } | |
| for (auto range_it = m_timestamp_dict->tokenized_column_to_range_begin(); | |
| range_it != m_timestamp_dict->tokenized_column_to_range_end(); | |
| range_it++) | |
| { | |
| - auto const literal{filter->get_operand()}; | |
| - // Defend against future FilterOperation types that don't take a literal. | |
| - if (nullptr == literal) { | |
| - return EvaluatedValue::Unknown; | |
| - } | |
| // ... rest of loop body using `literal` ... | |
| } |
🤖 Prompt for AI Agents
In components/core/src/clp_s/search/EvaluateTimestampIndex.cpp around lines 93
to 97, the null-literal guard is currently inside the range-handling path; hoist
that guard before entering the range loop so we defend against future
FilterOperation types irrespective of dictionary logic. Move the nullptr ==
literal check to immediately after obtaining literal (before any loop or range
processing) and early-return EvaluatedValue::Unknown if literal is null,
ensuring both the EXISTS/NEXISTS and range cases use the same
dictionary-independent guard.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (2)
85-96: Hoist dictionary-independent guards (EXISTS/NEXISTS, null literal, pure-wildcard literal) before iterating ranges.These checks don’t depend on the timestamp dictionary. Moving them out avoids repeated work and earlier exits. This also matches prior guidance.
Apply removals inside the loop:
- // Leave handling EXISTS/NEXISTS to schema matching. - auto const filter_op{filter->get_operation()}; - if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) { - return EvaluatedValue::Unknown; - } - - auto const literal{filter->get_operand()}; - // Defend against future FilterOperation types that don't take a literal. - if (nullptr == literal) { - return EvaluatedValue::Unknown; - } - - // Handle the case where the literal is a pure wildcard. - if (literal->as_any(filter_op)) { - switch (filter_op) { - case FilterOperation::EQ: - return filter->is_inverted() ? EvaluatedValue::False : EvaluatedValue::True; - case FilterOperation::NEQ: - return filter->is_inverted() ? EvaluatedValue::True : EvaluatedValue::False; - default: - // Not reachable. - return EvaluatedValue::Unknown; - } - }Then add once, above the range loop (contextual snippet outside this hunk):
auto const filter_op{filter->get_operation()}; if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) { return EvaluatedValue::Unknown; // schema-level responsibility } auto const literal{filter->get_operand()}; if (nullptr == literal) { return EvaluatedValue::Unknown; } // Pure wildcard literal is also dictionary-independent. if (literal->as_any(filter_op)) { switch (filter_op) { case FilterOperation::EQ: return filter->is_inverted() ? EvaluatedValue::False : EvaluatedValue::True; case FilterOperation::NEQ: return filter->is_inverted() ? EvaluatedValue::True : EvaluatedValue::False; default: // Unknown for ops without explicit wildcard semantics here. return EvaluatedValue::Unknown; } }Also applies to: 97-108
104-106: Soften the “Not reachable” assertion in default case.Future ops could route here; prefer neutral wording.
- // Not reachable. + // Unknown for ops that don't define wildcard semantics here.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧠 Learnings (2)
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
Applied to files:
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
Applied to files:
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
components/core/src/clp_s/search/ast/FilterExpr.cpp (1)
FilterExpr(101-110)components/core/src/clp_s/search/ast/Integral.hpp (3)
ret(67-67)ret(69-69)ret(71-71)components/core/src/clp_s/search/ast/Literal.hpp (8)
ret(90-90)ret(90-90)ret(92-92)ret(92-92)ret(94-94)ret(94-94)ret(96-96)ret(96-96)
🔇 Additional comments (3)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
6-6: Good include.Adding FilterOperation.hpp is necessary for the new logic.
14-14: Alias improves readability.Bringing FilterOperation into scope keeps call sites concise.
110-118: Safer numeric extraction path looks good.Switching to
as_int/as_floatavoids Integral downcasts and is future-proof.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (4)
68-71: Hoist unresolved-descriptor guard before the range loop.This check is dictionary-independent; early-return to avoid O(N) scans when descriptors include wildcards.
Apply within this hunk:
- // Don't attempt to evaluate the timestamp index against columns with wildcard tokens. - if (column->is_unresolved_descriptor()) { - continue; - }Add before the loop at Line 64:
// If the descriptor contains wildcard tokens, probing the index is inconclusive. if (column->is_unresolved_descriptor()) { return EvaluatedValue::Unknown; }
90-94: EXISTS/NEXISTS guard should be outside the index loop.It’s invariant of the dictionary; return Unknown earlier and reuse filter_op inside the loop.
Apply within this hunk:
- // Leave handling EXISTS/NEXISTS to schema matching. - auto const filter_op{filter->get_operation()}; - if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) { - return EvaluatedValue::Unknown; - }Add before the loop at Line 64:
// Leave EXISTS/NEXISTS to schema matching. auto const filter_op{filter->get_operation()}; if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) { return EvaluatedValue::Unknown; }
96-101: Null-literal guard should be hoisted before the loop.Dictionary-independent; short-circuits earlier and keeps the loop focused on matching.
Apply within this hunk:
- auto const literal{filter->get_operand()}; - // Defend against future FilterOperation types that don't take a literal. - if (nullptr == literal) { - return EvaluatedValue::Unknown; - }Add before the loop (after declaring filter_op) so it’s shared:
auto const literal{filter->get_operand()}; // Defend against FilterOperation types that don't take a literal. if (nullptr == literal) { return EvaluatedValue::Unknown; }
102-113: Neutralise the default-branch comment in wildcard switch.Avoid claiming impossibility; keep a neutral fallback.
- default: - // Not reachable. - return EvaluatedValue::Unknown; + default: + // Unknown for operations without explicit wildcard semantics here. + return EvaluatedValue::Unknown;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧠 Learnings (2)
📚 Learning: 2025-04-26T02:19:33.275Z
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:340-347
Timestamp: 2025-04-26T02:19:33.275Z
Learning: In the CLP codebase, the FilterOperation::EXISTS and FilterOperation::NEXISTS operations have specific semantics that differ from simply checking if a value exists. EXISTS always returns true and NEXISTS always returns false in the evaluate_filter_against_literal_type_value_pair function. The optional::has_value() is used specifically to differentiate empty objects from concrete values, not for existence checks in filter operations.
Applied to files:
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
📚 Learning: 2025-05-07T16:56:35.687Z
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
Applied to files:
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
components/core/src/clp_s/search/ast/FilterExpr.cpp (1)
FilterExpr(101-110)components/core/src/clp_s/search/ast/Integral.hpp (3)
ret(67-67)ret(69-69)ret(71-71)components/core/src/clp_s/search/ast/Literal.hpp (9)
ret(90-90)ret(90-90)ret(92-92)ret(92-92)ret(94-94)ret(94-94)ret(96-96)ret(96-96)ret(98-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
🔇 Additional comments (3)
components/core/src/clp_s/search/EvaluateTimestampIndex.cpp (3)
6-6: Include of FilterOperation is correct and necessary — LGTM.
14-14: Using alias for FilterOperation improves readability — LGTM.
115-121: Safer numeric extraction via as_int/as_float — LGTM.Eliminates downcasts to Integral and handles both i64/f64 cleanly.
Also applies to: 123-123
| // Leave handling EXISTS/NEXISTS to schema matching. | ||
| auto const filter_op{filter->get_operation()}; | ||
| if (FilterOperation::EXISTS == filter_op || FilterOperation::NEXISTS == filter_op) { | ||
| return EvaluatedValue::Unknown; |
There was a problem hiding this comment.
Correct me if I'm wrong. ts: * is an existence check and we convert it from EQ to EXISTS before evaluating the timestamp filter. Why do we return Unknown here even when we find a match?
There was a problem hiding this comment.
This is just evaluating against the timestamp index -- it won't tell us whether the timestamp exists on a given record/schema, so we leave that up to the schema matching pass.
There was a problem hiding this comment.
Quick question: what happens if an archive contains some messages without timestamp fields? I think we simply skip recording timestamps for those messages? But if a query includes timestamp filters and none of the timestamped messages in that archive match (out of range), should we still scan the remaining messages that don’t have timestamp fields?
There was a problem hiding this comment.
Yeah we currently don't record any timestamps and will skip them in a scan that contains a timestamp filter. It probably makes sense to add some options for users to get around this (e.g. assign timestamp based on surrounding log events at ingestion time, or potentially still search log events missing timestamps if the timestamp range index matches), but that's future work.
| } | ||
|
|
||
| // Handle the case where the literal is a pure wildcard. | ||
| if (literal->as_any(filter_op)) { |
There was a problem hiding this comment.
And do we have still have an EQ case here if we it's already converted to EXISTS?
There was a problem hiding this comment.
We don't if we run this after the convert to exists pass. Thinking about this more though we should also return Unknown here and just leave it up to schema matching for the same reason I mention in my other comment.
Since that's the case I think I can just delete this code block and rely on following through to the Unknown case below.
There was a problem hiding this comment.
Actually I can probably also delete the special case dedicated to exists/nexists and rely on the check for whether the filter has a literal -- going to do a pass and get rid of redundant special cases.
EvaluateTimestampIndex. (fixes #1096)EvaluateTimestampIndex (fixes #1096).
kirkrodrigues
left a comment
There was a problem hiding this comment.
Deferring to @wraymo's review. Fixed the PR title to put the fixes phrase before the period.
…tly in `EvaluateTimestampIndex` (fixes y-scope#1096). (y-scope#1277) Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Description
This PR fixes several crashes in
EvaluateTimestampIndexthat were caused by the assumptions that:which on the surface appear valid, because the pass is designed to run after type narrowing, meaning that the column must actually be able to match integral types.
These assumptions are untrue in the following scenarios:
StringLiteralin this case)NOT ts: nullEXISTSorNEXISTSfilterThe code has been updated to:
EXISTS/NEXISTSWe also add a test case that exercises the behaviour that caused a crash prior to this PR.
Checklist
breaking change.
Validation performed
EvaluateTimestampIndexpass doesn't handle matching against wildcards. #1096 no longer crashEvaluateTimestampIndexpass doesn't handle matching against wildcards. #1096Summary by CodeRabbit
New Features
Bug Fixes
Tests