Skip to content

feat(clp-s): Add support for querying kv-pairs in the archive range index.#929

Merged
gibber9809 merged 20 commits into
y-scope:mainfrom
gibber9809:metadata-section-search
Jun 11, 2025
Merged

feat(clp-s): Add support for querying kv-pairs in the archive range index.#929
gibber9809 merged 20 commits into
y-scope:mainfrom
gibber9809:metadata-section-search

Conversation

@gibber9809

@gibber9809 gibber9809 commented May 26, 2025

Copy link
Copy Markdown
Contributor

Description

This PR adds support for querying fields in the archive range index. Any column in the $ namespace is treated as a query against the metadata range index. Filters from the $ namespace are evaluated against the range index and either converted into filters on a range of log_event_idx or constant propagated away.

This logic has been implemented in the EvaluateMetadataFilters transformation pass. The pass has an iterative implementation that walks the AST to find and rewrite filters on columns in the $ namespace and has another iterative implementation that resolves columns against metadata fields in each section of the range index. Expression evaluation is performed by clp::ffi::ir_stream::search::evaluate_filter_against_literal_type_value_pair -- since this evaluation function doesn't yet support array search filters against the archive range index also do not support array search.

Note that this PR enables search on fields in the archive range index but not projection.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added search test cases that test various aspects of column resolution and expression evaluation on the archive range index

Summary by CodeRabbit

  • New Features
    • Added support for filtering search results using metadata fields from the archive range index. Users can now query metadata fields (e.g., filename, file split number, archive creator ID) by prefixing keys with a $ symbol.
  • Bug Fixes
    • Enhanced search logic to skip unnecessary processing when no results match the metadata filters.
  • Documentation
    • Updated user guide with instructions and examples for querying kv-pairs in the archive range index.
  • Tests
    • Expanded test coverage for metadata range index filtering, including new queries and improved diagnostics.

@gibber9809 gibber9809 requested review from a team and wraymo as code owners May 26, 2025 20:23
@coderabbitai

coderabbitai Bot commented May 26, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

This change introduces a new AST transformation pass that evaluates and rewrites filters referencing metadata fields in the archive range index during search. It adds the EvaluateRangeIndexFilters class, integrates it into the search flow, updates unit tests to cover this logic, and documents the new query capability for range index metadata.

Changes

File(s) Change Summary
components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp, .hpp Added new class for evaluating and rewriting filters on range index metadata in AST expressions.
components/core/src/clp_s/search/CMakeLists.txt, components/core/CMakeLists.txt Included new source files in build targets and updated linking dependencies.
components/core/src/clp_s/clp-s.cpp Integrated the new filter evaluation pass into the search_archive function.
components/core/tests/test-clp_s-search.cpp Updated tests to apply the new pass and added test cases for range index metadata filtering.
docs/src/user-guide/reference-json-search-syntax.md Documented querying of archive range index metadata using the new filter syntax.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Search
    participant EvaluateRangeIndexFilters
    participant RangeIndex
    participant AST

    User->>Search: Submit search query
    Search->>AST: Parse query to AST
    Search->>EvaluateRangeIndexFilters: Run transformation on AST
    EvaluateRangeIndexFilters->>RangeIndex: Evaluate filters referencing metadata
    RangeIndex-->>EvaluateRangeIndexFilters: Return matching ranges
    EvaluateRangeIndexFilters->>AST: Rewrite AST with range-based filters
    Search->>AST: Continue search processing (timestamp, decompression, etc.)
Loading

Possibly related PRs

Suggested reviewers

  • wraymo
  • kirkrodrigues

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b5e610 and 5d6ff7c.

📒 Files selected for processing (1)
  • docs/src/user-guide/reference-json-search-syntax.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/reference-json-search-syntax.md

89-89: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


95-95: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)


101-101: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (macos-latest)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c12c30 and 79f9bd8.

📒 Files selected for processing (7)
  • components/core/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/clp-s.cpp (2 hunks)
  • components/core/src/clp_s/search/EvaluateMetadataFilters.cpp (1 hunks)
  • components/core/src/clp_s/search/EvaluateMetadataFilters.hpp (1 hunks)
  • components/core/tests/test-clp_s-search.cpp (4 hunks)
  • docs/src/user-guide/reference-json-search-syntax.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/tests/test-clp_s-search.cpp
  • components/core/src/clp_s/clp-s.cpp
  • components/core/src/clp_s/search/EvaluateMetadataFilters.hpp
  • components/core/src/clp_s/search/EvaluateMetadataFilters.cpp
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/reference-json-search-syntax.md

93-93: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (12)
components/core/src/clp_s/CMakeLists.txt (1)

171-172: LGTM! Clean addition of new source files.

The new EvaluateMetadataFilters source files are properly added to the CLP_S_SEARCH_SOURCES list, following the established pattern and naming conventions.

components/core/CMakeLists.txt (1)

342-343: LGTM! Consistent unit test integration.

The new EvaluateMetadataFilters source files are correctly added to the unit test build target, ensuring proper test coverage for the new functionality.

components/core/src/clp_s/clp-s.cpp (2)

32-32: LGTM! Clean include addition.

The include for EvaluateMetadataFilters.hpp is properly placed in alphabetical order.


175-182: Well-integrated metadata filter evaluation.

The new transformation pass is properly integrated into the search pipeline:

  • Correctly positioned after AST transformations and before timestamp index evaluation
  • Follows the established pattern for transformation passes
  • Properly handles empty expressions with appropriate logging
  • Respects case sensitivity settings from command line arguments
  • Uses the coding guideline preference false == <expression> instead of negation

The early return optimization when no metadata ranges match is a nice performance improvement.

components/core/tests/test-clp_s-search.cpp (4)

29-29: LGTM!

The include statement is correctly placed and necessary for the new metadata filter evaluation functionality.


164-171: Good implementation and adherence to coding guidelines!

The metadata filter evaluation is correctly integrated into the search pipeline. I appreciate the proper use of false == ignore_case which follows the coding guideline to prefer false == <expression> over !<expression>.


213-224: Comprehensive test coverage for metadata filters!

The new test queries effectively validate the metadata filtering functionality with various scenarios including:

  • Individual metadata field queries
  • Compound queries with AND/OR operators
  • Negation with NOT operators
  • Mixed metadata and regular field queries

240-240: Good addition for test debugging!

The CAPTURE macro will display the current query value when a test fails, making it easier to identify the problematic query.

components/core/src/clp_s/search/EvaluateMetadataFilters.hpp (1)

1-58: Well-structured header with clear documentation!

The header file follows best practices:

  • Proper include guards
  • Clear and informative documentation comments
  • Appropriate use of const reference for the range index parameter
  • Good separation of public interface and private implementation details
components/core/src/clp_s/search/EvaluateMetadataFilters.cpp (3)

58-87: Excellent implementation with proper coding standards!

The iterative AST traversal is well-implemented and consistently follows the coding guideline by using false == work_list.empty(). The renormalization step after rewriting ensures the AST remains in a consistent form.


89-154: Well-optimized filter rewriting with range merging!

The implementation efficiently merges contiguous matching ranges to minimize the number of filter conditions. The code consistently follows the coding guideline with proper use of false == comparisons throughout.


206-222: Potential precision loss for large unsigned integers

The static cast at line 210 could lose precision for unsigned values larger than INT64_MAX. While the TODO comment acknowledges this, it's worth tracking this limitation.

Consider opening an issue to track the implementation of full support for large unsigned values to avoid potential data loss.

Comment on lines +83 to +96
### Querying kv-pairs in the archive range index

The archive range index maps metadata about each file ingested into an archive to a logical range
of the archive. Filters that reference fields in this metadata transparently get converted to
filters that match corresponding logical ranges of an archive.

To search for records corresponding to a kv-pair within the archive range index you can prefix the
key with `$`. For example, the following query searches for records from a file that was passed to
compression as `test.jsonl`:

```
$_filename: "test.jsonl"
```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Well-documented new feature with clear examples.

The new section effectively explains how to query metadata fields using the $ prefix, with a practical example that users can easily understand.

Fix missing language specification for code block.

The static analysis tool correctly identified that the fenced code block should specify a language.

Apply this diff to specify the language:

-```
+```kql
 $_filename: "test.jsonl"

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

93-93: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

In docs/src/user-guide/reference-json-search-syntax.md around lines 83 to 96,
the fenced code block demonstrating querying kv-pairs in the archive range index
is missing a language specification. To fix this, add the language identifier
"kql" immediately after the opening triple backticks of the code block to enable
proper syntax highlighting.


</details>

<!-- This is an auto-generated comment by CodeRabbit -->

Comment thread components/core/src/clp_s/search/EvaluateMetadataFilters.cpp
Comment on lines +178 to +182
auto ret{evaluate_filter_against_literal_type_value_pair(
filter_expr,
ast::LiteralType::BooleanT,
bool_value,
m_case_sensitive_match

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to deduplicate the code a bit using something like

    auto eval = [&](ast::LiteralType type, std::optional<clp::ffi::Value> value) -> bool {
        auto result = evaluate_filter_against_literal_type_value_pair(
            filter_expr, type, value, m_case_sensitive_match);
        return !result.has_error() && (filter_expr->is_inverted() != result.value());
    };

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do something like this.

@gibber9809 gibber9809 requested a review from wraymo May 28, 2025 16:04

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
components/core/src/clp_s/search/EvaluateMetadataFilters.cpp (1)

261-264: Array search not yet supported.

As documented in the PR objectives and TODO comment, array search filters are not yet implemented. This is a known limitation of the current implementation.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79f9bd8 and 8f697bb.

📒 Files selected for processing (1)
  • components/core/src/clp_s/search/EvaluateMetadataFilters.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp_s/search/EvaluateMetadataFilters.cpp
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (4)
components/core/src/clp_s/search/EvaluateMetadataFilters.cpp (4)

38-55: The text encoding helper function is well-implemented.

The function correctly encodes text into the four-byte encoded AST format and properly handles dictionary variable extraction.


58-87: The AST traversal and filter rewriting logic is well-structured.

The method correctly identifies metadata filters, rewrites them, and applies necessary normalization passes. Good use of the iterative work list pattern.


89-154: Excellent implementation of range filter rewriting with efficient range merging.

The method correctly evaluates metadata filters and efficiently merges adjacent matching ranges to minimize the number of generated range filters.


169-178: Good implementation of the deduplication suggestion.

The lambda effectively reduces code duplication as suggested in the previous review.

Comment thread components/core/src/clp_s/search/EvaluateMetadataFilters.cpp
@kirkrodrigues kirkrodrigues self-requested a review May 30, 2025 03:08

@wraymo wraymo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! For EvaluateMetadataFilters, it seems that it's specifically used for the range index rather than searching against the metadata tree. Should we rename it to EvaluateRangeIndexFilters?

std::vector<clp::ir::four_byte_encoded_variable_t> encoded_vars;
std::vector<int32_t> dict_var_bounds;
clp::ffi::encode_message(text, logtype, encoded_vars, dict_var_bounds);
std::vector<std::string> dict_vars;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reserve the size?


auto evaluate_expr
= [&](ast::LiteralType type, std::optional<clp::ffi::Value> const& value) -> bool {
auto ret{evaluate_filter_against_literal_type_value_pair(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why are we using clp::ffi methods to evaluate filter? Is it because we don't expose similar methods in clp_s?

@gibber9809 gibber9809 Jun 5, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is part of it, but also

  1. The current evaluate_filter functions in clp-s rely on a bunch of internal state, so even if we did expose them they would be hard to use
  2. There are really good tests for the clp::ffi evaluate methods, so it should be pretty reliable

Comment on lines +278 to +279
} else if (cur_field.is_object()) {
if (cur_field.contains(cur_it->get_token())) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge these two into one condition

return false == ret.has_error() && filter_expr->is_inverted() != ret.value();
};

while (false == work_list.empty()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about the purpose of this while loop. Is it because we want to handle nested column in the future? and are we planning to store nested structures in the RangeIndex fields?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is for that and for handling wildcards tokens in the column descriptor.

));

for (auto const& [query, expected_results] : queries_and_results) {
CAPTURE(query);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It helps with debugging -- now when one of the queries fail this test will show you the specific query as well as the test assertion that failed.

filters that match corresponding logical ranges of an archive.

To search for records corresponding to a kv-pair within the archive range index you can prefix the
key with `$`. For example, the following query searches for records from a file that was passed to

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we mention the range index fields we currently support?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll mention them in the SFA format design doc that I'm writing, but I think the current example with $_filename should be enough for most users.

The $_archive_creator_id and $_file_split_number fields will probably only be needed by ourselves and advanced users.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, that makes sense

@gibber9809 gibber9809 requested a review from wraymo June 5, 2025 16:05

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
components/core/tests/test-clp_s-search.cpp (1)

239-239: Good debugging improvement with query capture.

The CAPTURE(query) statement provides valuable diagnostic information when tests fail, as discussed in previous reviews. This helps identify which specific query caused a failure.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1b6dbc and 2cd7b0b.

📒 Files selected for processing (6)
  • components/core/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/clp-s.cpp (2 hunks)
  • components/core/src/clp_s/search/CMakeLists.txt (2 hunks)
  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp (1 hunks)
  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.hpp (1 hunks)
  • components/core/tests/test-clp_s-search.cpp (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp_s/clp-s.cpp
  • components/core/tests/test-clp_s-search.cpp
  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.hpp
  • components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-14, false)
🔇 Additional comments (11)
components/core/CMakeLists.txt (1)

344-345: LGTM! Build configuration correctly updated.

The new source files are properly added to the unit test build configuration and follow the existing alphabetical ordering pattern.

components/core/src/clp_s/search/CMakeLists.txt (2)

18-19: LGTM! Source files correctly added to library build.

The new EvaluateRangeIndexFilters source files are properly added to the search library configuration following the existing pattern.


48-48: New dependency added for library functionality.

The addition of clp_s::clp_dependencies as a private dependency suggests the new EvaluateRangeIndexFilters class requires additional functionality. This aligns with the integration of range index filtering capabilities.

components/core/src/clp_s/clp-s.cpp (2)

33-33: LGTM! Include statement properly added.

The include for the new EvaluateRangeIndexFilters header is correctly placed with other search-related includes.


176-183: Excellent integration of metadata filtering with early optimization.

The EvaluateRangeIndexFilters pass is correctly positioned after AST transformations but before timestamp index evaluation, allowing for early termination when no metadata ranges match. The case sensitivity flag derivation from command line arguments is appropriate.

The early return with SPDLOG_INFO provides good user feedback while avoiding unnecessary archive decompression when metadata filters eliminate all potential matches.

components/core/tests/test-clp_s-search.cpp (3)

29-29: LGTM! Test include properly added.

The include for EvaluateRangeIndexFilters.hpp is correctly placed with other search-related includes in the test file.


163-169: Excellent test integration mirroring production code.

The metadata filtering pass is correctly integrated into the test search function, matching the implementation in the main application. The assertions ensure the transformation doesn't result in null or empty expressions, providing good test coverage.


212-223: Comprehensive test coverage for metadata field querying.

The new test queries effectively cover various scenarios:

  • Metadata field combinations with logical operators
  • Negation of metadata filters
  • Disjunctions with filename patterns
  • Integration with regular index fields

These tests validate the new archive range index querying functionality thoroughly.

components/core/src/clp_s/search/EvaluateRangeIndexFilters.hpp (1)

15-56: Well-structured transformation class design.

The class follows good design principles with clear documentation, appropriate use of const references, and a focused interface that aligns with the single responsibility principle.

components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp (2)

59-88: Correct AST traversal and transformation logic.

The implementation properly traverses the expression tree, identifies filters in the range index namespace, and handles renormalization when changes occur. Good adherence to the coding guideline using false == work_list.empty().


129-147: Efficient range merging optimization.

Good implementation of range merging that combines adjacent ranges to reduce the number of filter conditions. This optimization will improve query performance.

Comment thread components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
Comment thread components/core/src/clp_s/search/EvaluateRangeIndexFilters.cpp
Comment thread docs/src/user-guide/reference-json-search-syntax.md Outdated
gibber9809 and others added 2 commits June 11, 2025 14:43
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
wraymo
wraymo previously approved these changes Jun 11, 2025
@gibber9809 gibber9809 requested a review from kirkrodrigues June 11, 2025 20:19

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, some minor rephrasing.

Comment thread docs/src/user-guide/reference-json-search-syntax.md Outdated
Comment thread docs/src/user-guide/reference-json-search-syntax.md Outdated
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
gibber9809 and others added 2 commits June 11, 2025 18:20
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
@gibber9809 gibber9809 requested a review from kirkrodrigues June 11, 2025 22:21

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving docs.

@gibber9809 gibber9809 merged commit c6de929 into y-scope:main Jun 11, 2025
22 checks passed
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…ndex. (y-scope#929)

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(clp-s): Add support for generic ingestion-unit-level range index inside of archives

3 participants