feat(clp-s): Add support for searching against MPT subtrees with specific types.#908
Conversation
WalkthroughThis change unifies the management of subtree IDs in the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant QueryRunner
participant SchemaMatch
participant SchemaTree
participant ColumnDescriptor
User->>QueryRunner: Submit search query
QueryRunner->>SchemaMatch: Request column mapping
SchemaMatch->>ColumnDescriptor: Check for subtree_type
alt subtree_type is set
SchemaMatch->>SchemaTree: get_subtree_for_namespace_and_type(namespace, subtree_type)
SchemaTree-->>SchemaMatch: Return subtree node ID
SchemaMatch->>SchemaTree: Access children for mapping
else subtree_type not set
SchemaMatch->>SchemaTree: get_subtrees()
SchemaTree-->>SchemaMatch: Return all subtrees
SchemaMatch->>SchemaTree: Iterate non-metadata subtrees
end
SchemaMatch-->>QueryRunner: Return column mapping
QueryRunner->>QueryRunner: Evaluate filters (skip metadata columns unless targeted)
QueryRunner-->>User: Return results
Suggested reviewers
Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (4)
🧰 Additional context used📓 Path-based instructions (1)`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
🧬 Code Graph Analysis (1)components/core/src/clp_s/search/SchemaMatch.cpp (4)
⏰ Context from checks skipped due to timeout of 90000ms (11)
🔇 Additional comments (12)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
components/core/src/clp_s/SchemaTree.cpp(2 hunks)components/core/src/clp_s/SchemaTree.hpp(5 hunks)components/core/src/clp_s/archive_constants.hpp(1 hunks)components/core/src/clp_s/search/QueryRunner.cpp(4 hunks)components/core/src/clp_s/search/SchemaMatch.cpp(4 hunks)components/core/src/clp_s/search/ast/ColumnDescriptor.cpp(2 hunks)components/core/src/clp_s/search/ast/ColumnDescriptor.hpp(3 hunks)components/core/tests/test-clp_s-search.cpp(5 hunks)components/core/tests/test-kql.cpp(9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
components/core/src/clp_s/archive_constants.hppcomponents/core/tests/test-kql.cppcomponents/core/src/clp_s/search/ast/ColumnDescriptor.hppcomponents/core/src/clp_s/search/QueryRunner.cppcomponents/core/tests/test-clp_s-search.cppcomponents/core/src/clp_s/search/ast/ColumnDescriptor.cppcomponents/core/src/clp_s/SchemaTree.cppcomponents/core/src/clp_s/search/SchemaMatch.cppcomponents/core/src/clp_s/SchemaTree.hpp
🧬 Code Graph Analysis (3)
components/core/src/clp_s/search/ast/ColumnDescriptor.cpp (1)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (2)
m_subtree_type(285-285)m_namespace(279-279)
components/core/src/clp_s/SchemaTree.cpp (1)
components/core/src/clp_s/SchemaTree.hpp (8)
m_namespace_and_type_to_subtree_id(169-172)get_subtree_for_namespace_and_type(178-180)subtree_namespace(147-149)subtree_namespace(147-147)subtree_namespace(166-167)field_name(157-157)m_nodes(182-182)m_nodes(195-199)
components/core/src/clp_s/search/SchemaMatch.cpp (3)
components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (2)
subtree_type(287-289)subtree_type(287-287)components/core/src/clp_s/search/SchemaMatch.hpp (4)
column(94-94)column(101-101)column(154-154)column(161-161)components/core/src/clp_s/Schema.hpp (2)
node_type(174-176)node_type(174-174)
🪛 Cppcheck (2.10-2)
components/core/src/clp_s/SchemaTree.cpp
[style] 75-75: The function 'get_metadata_field_id' is never used.
(unusedFunction)
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (25)
components/core/src/clp_s/archive_constants.hpp (1)
37-38: Constants added to support the new subtree type feature.The new constants
cObjectSubtreeTypeandcMetadataSubtreeTypealign well with the PR objective to enable searching against specific MPT subtrees. These string constants define the supported values for the newsubtree_typeproperty inColumnDescriptor.components/core/tests/test-kql.cpp (1)
54-54: Test assertions verify default behavior for subtree types.The new assertions ensure that KQL parsing doesn't set
subtree_typeby default. This is important to maintain backward compatibility and confirm that columns will resolve against non-metadata subtrees by default, as specified in the PR objectives.Also applies to: 73-73, 93-93, 143-143, 195-195, 225-225, 281-281, 301-301, 325-325
components/core/src/clp_s/search/ast/ColumnDescriptor.cpp (2)
86-89: Updated print method to include subtree type information.The changes to the
print()method properly include the newsubtree_typeproperty in the output when it's set, which is helpful for debugging and logging.
98-98: Updated closing delimiter to match new token list format.The closing delimiter has been updated to
])to properly match the new tokens list format using square brackets.components/core/src/clp_s/search/ast/ColumnDescriptor.hpp (3)
5-5: Added necessary includes for new functionality.Added
<optional>and<utility>headers to support the newstd::optionalmember andstd::moveusage in the setter.Also applies to: 9-9
285-289: Added getter and setter for the new subtree_type property.These methods properly implement access to the new optional subtree type property. The setter uses
std::movefor efficiency when transferring ownership of the string.
299-299: Added member variable for subtree type.The new
m_subtree_typemember variable is appropriately typed asstd::optional<std::string>to represent an optional subtree type for column resolution, as required by the PR objective.components/core/src/clp_s/search/QueryRunner.cpp (2)
198-202:matches_metadatacan be calculated once, but be wary of transitive‐include dependenceThe new logic is clear and correct.
A minor point: this file now depends onconstants::cMetadataSubtreeType, which is only visible through the transitive inclusion chain (QueryRunner.cpp→SchemaTree.hpp→archive_constants.hpp).
An explicit include improves readability and shields us from future header-ordering regressions.+#include "../archive_constants.hpp"[ suggest_nitpick ]
205-208: Repeated metadata-skip guards – consider centralisingThe four almost-identical
if (false == matches_metadata && m_metadata_columns.contains(...))checks introduce duplication and make future behavioural tweaks error-prone.A small helper keeps intent crystal-clear:
+auto should_skip_metadata = [&](int32_t column_id) -> bool { + return false == matches_metadata && m_metadata_columns.contains(column_id); +}; … - if (false == matches_metadata && m_metadata_columns.contains(entry.first)) { - continue; - } + if (should_skip_metadata(entry.first)) { + continue; + }This removes ~12 redundant lines without affecting performance.
[ suggest_optional_refactor ]Also applies to: 217-220, 228-231, 239-242
components/core/src/clp_s/SchemaTree.cpp (2)
47-52: String-view keys may dangle afterm_nodesreallocation
m_namespace_and_type_to_subtree_idstoresstd::string_views that aliasnode.get_key_name()insidem_nodes(astd::vector).
Wheneverm_nodesgrows past its current capacity, reallocation invalidates all earlier pointers, leaving the map with dangling views and undefined behaviour.Two safe alternatives:
- m_namespace_and_type_to_subtree_id.emplace( - std::make_pair(node.get_key_name(), type), - node_id - ); + // Make a owning copy of the key to keep the map stable. + m_namespace_and_type_to_subtree_id.emplace( + std::make_pair(std::string{node.get_key_name()}, type), + node_id);or pre-reserve
m_nodescapacity and document the guarantee.Given the subtlety, I recommend the first approach.
[ raise_critical_issue ]
64-73: Compact, idiomatic accessor – looks good
get_subtree_for_namespace_and_typecleanly wraps the lookup and sentinel value, improving readability. No concerns here.
[ approve_code_changes ]components/core/src/clp_s/search/SchemaMatch.cpp (2)
43-52: Gracefully handle unknownsubtree_typestrings
get_node_type_for_subtree_typereturnsNodeType::Unknownfor unrecognised strings.
Down-stream, anUnknownlookup inevitably fails, silently causing the column to be unmapped and the expression pruned. Consider surfacing an explicit error (e.g., via status return or logged warning) so users know they passed an unsupported value rather than getting an empty result set.[ request_verification ]
185-206: Short-circuit once a matching column is foundInside the two resolution paths we continue iterating even after
matchedbecomestrue. With wide schemas this wastes work.-resolve_against_subtree(root_node); +if (false == matched) { // stop once we have at least one hit + resolve_against_subtree(root_node); +}Same optimisation applies inside the
for (get_subtrees)loop. This is an easy perf win without altering semantics.
[ suggest_optional_refactor ]components/core/tests/test-clp_s-search.cpp (6)
16-27: Well-organized imports to support new functionality.The added imports properly support the new AST creation capabilities needed for testing subtree type-specific queries.
46-46: Good addition of a utility function for test setup.This function declaration appropriately sets up the creation of a complex test expression that will target the metadata subtree.
50-54: Good refactoring to support direct AST expression testing.This overloaded
search()function allows testing with both string queries and programmatically constructed AST expressions, which is valuable for testing the new subtree type feature.
154-156: Good refactoring to simplify code maintenance.The original
search()function now properly delegates to the new implementation, avoiding code duplication.
158-162: Well-structured function signature that maintains consistency.The overloaded
search()function maintains the same parameter order and naming conventions as the original, making it easy to understand and maintain.
243-245: Effective test case that validates the core functionality.This test properly verifies that:
- The expression creation succeeds without exceptions
- The search with the metadata-specific expression returns only the expected result (index 0)
This validates that subtree type filtering works correctly in the search implementation.
components/core/src/clp_s/SchemaTree.hpp (6)
15-15: Proper include for constants needed by the implementation.Added the include for archive_constants.hpp which contains necessary constants for subtree types.
147-149: Good refactoring to use the new generalized method.The function now delegates to the new
get_subtree_for_namespace_and_typemethod, maintaining the same behavior while supporting the unified subtree management approach.
157-172: Well-designed API for unified subtree management.The new methods provide a clean, consistent interface for accessing subtrees by namespace and type:
get_subtree_for_namespace_and_typeprovides low-level accessget_subtreesoffers access to the full collectionThis design properly encapsulates the implementation details while providing the necessary functionality.
178-180: Consistent refactoring to use the new API.The
get_metadata_subtree_node_idmethod now correctly delegates to the new generalized method, maintaining the same behavior while leveraging the unified implementation.
195-199: Properly updated clear method to handle new data structure.The
clearmethod now correctly clears the newm_namespace_and_type_to_subtree_idmap, preventing potential memory leaks or stale data issues.
218-219: Well-chosen data structure for unified subtree management.Using
absl::btree_mapwith astd::pair<std::string_view, NodeType>key provides:
- Efficient lookup by both namespace and type
- Ordered iteration capabilities (unlike flat_hash_map)
- Memory efficiency by using string_view rather than std::string
This is an excellent choice for the unified storage of subtree IDs.
wraymo
left a comment
There was a problem hiding this comment.
A good PR overall! Just have a few questions about set_subtree_type and the unit tests.
|
|
||
| auto get_subtree_type() const -> std::optional<std::string> const& { return m_subtree_type; } | ||
|
|
||
| auto set_subtree_type(std::optional<std::string> subtree_type) { |
There was a problem hiding this comment.
The subtree type is set only in the unit test? I wonder what's the future use of this method.
There was a problem hiding this comment.
We discussed it in the thread last week, but you can also just take a look at this part of the follow-up PR.
It allows us to transform queries against the metadata range index into queries against the log_event_idx column from the metadata subtree in that transformation pass.
| m_namespace = descriptor_namespace; | ||
| } | ||
|
|
||
| auto get_subtree_type() const -> std::optional<std::string> const& { return m_subtree_type; } |
There was a problem hiding this comment.
Can we get rid of get_node_type_for_subtree_type and have a method get_subtree_node_type instead?
| {std::string{clp_s::constants::cLogEventIdxName}}, | ||
| std::string{clp_s::constants::cDefaultNamespace} | ||
| ); | ||
| auto object_subtree_non_matching_idx = non_matching_idx->copy(); |
There was a problem hiding this comment.
Maybe change idx to column? I personally prefer something like column, column_with_object_subtree_type, column_with_metadata_subtree_type
| return (tests_dir / get_test_input_path_relative_to_tests_dir()).string(); | ||
| } | ||
|
|
||
| auto create_first_record_match_metadata_query() -> std::shared_ptr<clp_s::search::ast::Expression> { |
There was a problem hiding this comment.
I didn't get the point of this test. Seems that we create an OR expression with three operands. And one operand matches a record. But if so, what's the use of other two operands? Or can we create another two test cases with the other two operands
There was a problem hiding this comment.
It tests that column resolution is treating subtree type as expected in all cases. Each column has name "log_event_idx" and different subtree type. The column that matches against 0 has subtree type "metadata" and both columns matching against 1 have "object" and no subtree type respectively. If column resolution respects subtree type correctly then only the filter against 0 should get resolved and the only matching record should be record 0, but if either of the two other columns are erroneously resolved then the results will incorrectly also contain record 1.
There was a problem hiding this comment.
Oh, I see your point. It's a compact way to test three filters in one expression
| auto zero_literal = clp_s::search::ast::Integral::create_from_int(0); | ||
| auto one_literal = clp_s::search::ast::Integral::create_from_int(1); |
There was a problem hiding this comment.
Why do we need to create two literals?
There was a problem hiding this comment.
See other comment.
Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
kirkrodrigues
left a comment
There was a problem hiding this comment.
Approving based on @wraymo's review.
…ific types. (y-scope#908) Co-authored-by: wraymo <37269683+wraymo@users.noreply.github.com>
Description
This PR adds support to the clp-s AST for specifying search against specific subtrees of the MPT. The immediate use-case is to allow search on the
log_event_idxcolumn in the metadata subtree.Within
ColumnDescriptorwe now have asubtree_typestring property wrapped in anoptional. The associated search semantics are:subtree_typeis not set then resolve the column against every subtree with matching namespace except theNodeType::Metadatasubtreesubtree_typeis set then resolve only against the subtree with matching namespace and typeThe current supported mappings for
subtree_typeare"object" ->NodeType::Objectand"metadata"->NodeType::Metadata`.We modify
SchemaMatchto implement these semantics during column resolution andQueryRunnerto make dynamically expanded wildcards respectsubtree_type. Note that this dynamic expansion runs into some existing limitations with how we expand wildcards: see #907.Finally we add some search tests which check that we search the metadata subtree iff
subtree_typeismetadata.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests