fix check for pure primary condition filter#6840
Conversation
📝 WalkthroughWalkthroughThe changes remove a Clippy lint attribute from the Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (1)📓 Common learnings⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
🔇 Additional comments (1)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/index/struct_payload_index.rs (1)
45-45: Consider keeping#[expect]instead of#[allow]for better lint management.The
expectattribute is preferred overallowwhen you know the lint will fire, as it documents that the suppression is intentional and will catch if the lint stops firing due to code changes.-#[allow(clippy::enum_variant_names)] +#[expect(clippy::enum_variant_names)]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs(0 hunks)lib/segment/src/index/field_index/mod.rs(3 hunks)lib/segment/src/index/query_estimator.rs(2 hunks)lib/segment/src/index/struct_payload_index.rs(5 hunks)lib/segment/src/types.rs(2 hunks)
💤 Files with no reviewable changes (1)
- lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/segment/src/index/field_index/full_text_index/inverted_index.rs:330-344
Timestamp: 2025-05-26T14:45:01.364Z
Learning: For future phrase capabilities in the full-text index, equality checks need to compare both the ID and value of PostingElement items, not just the IDs. This supports use cases like position-aware or frequency-weighted searches where the value component contains important metadata.
lib/segment/src/index/query_estimator.rs (4)
Learnt from: generall
PR: qdrant/qdrant#6635
File: lib/segment/src/index/field_index/full_text_index/text_index.rs:784-832
Timestamp: 2025-06-14T20:35:10.603Z
Learning: Functions gated with `#[cfg(feature = "testing")]` within `#[cfg(test)]` modules are safe from compilation issues since both the function and its call sites are restricted to test builds. The additional feature gate is often used for optional heavy test utilities.
Learnt from: timvisee
PR: qdrant/qdrant#6546
File: lib/segment/src/index/field_index/full_text_index/text_index.rs:497-518
Timestamp: 2025-05-19T09:25:54.285Z
Learning: In the Qdrant project, using `.expect()` and `.unwrap()` is considered acceptable in test code since test failures should be loud and obvious, and proper error propagation adds unnecessary complexity to test code.
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6663
File: lib/quantization/src/encoded_vectors_binary.rs:270-276
Timestamp: 2025-06-18T09:23:14.977Z
Learning: Pattern matching on `&Option<T>` with `Some(x)` is valid and idiomatic in Rust. The bound variable `x` will have type `&T` and can be used to access fields through automatic dereferencing. No need to use `.as_ref()` in such cases.
lib/segment/src/index/struct_payload_index.rs (7)
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/iterator.rs:0-0
Timestamp: 2025-05-26T14:50:13.183Z
Learning: The `advance_until_greater_or_equal` method in `PostingIterator` in `lib/posting_list/src/iterator.rs` is intentionally designed with different semantics than `next()` and can output the same value multiple times. This is intended behavior for a search method that finds the first element with ID >= target_id.
Learnt from: generall
PR: qdrant/qdrant#6766
File: lib/segment/src/index/field_index/full_text_index/inverted_index/mutable_inverted_index.rs:274-282
Timestamp: 2025-06-28T19:25:53.448Z
Learning: In the full text inverted index implementations (MutableInvertedIndex, ImmutableInvertedIndex, MmapInvertedIndex), there's an important distinction between TokenId and PointOffsetType: TokenId represents a token/word identifier used to index into postings lists, while PointOffsetType represents a point/document identifier. The iter_point_ids method should take a TokenId and return an iterator over PointOffsetType values (points that contain that token), not the other way around.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/api/src/grpc/qdrant.rs:6493-6496
Timestamp: 2025-03-12T09:25:16.340Z
Learning: The `payload_index_io_read` field in `HardwareUsage` was introduced in PR #5951 and is safe to modify as it was not used in production before this PR.
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/builder.rs:63-67
Timestamp: 2025-05-26T14:47:23.505Z
Learning: In the posting_list crate's PostingChunk struct, avoid adding extra fields like storing computed chunk_bits to prevent struct bloat and maintain mmap compatibility. The bit calculations from offsets are inexpensive compared to the memory and compatibility benefits of keeping the struct minimal.
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.
lib/segment/src/types.rs (1)
undefined
<retrieved_learning>
Learnt from: timvisee
PR: #6569
File: lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs:56-76
Timestamp: 2025-05-21T12:52:15.229Z
Learning: In volatile sparse vector storage implementations, the total_sparse_size field should be updated when vectors are inserted, updated, or deleted to maintain an accurate count of the total number of non-zero elements across all vectors.
</retrieved_learning>
lib/segment/src/index/field_index/mod.rs (4)
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: timvisee
PR: qdrant/qdrant#6503
File: lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs:60-74
Timestamp: 2025-05-12T12:54:27.872Z
Learning: In the Qdrant codebase, using `pub(super)` visibility is preferred when fields need to be accessed by sibling modules (particularly for index type conversions), as it provides the necessary access without bloating the interface with numerous getters.
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: test-consistency
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: lint
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
🔇 Additional comments (10)
lib/segment/src/index/query_estimator.rs (1)
272-272: LGTM! Import and ID handling changes are correct.The addition of
std::ops::Derefimport and the simplified construction ofPrimaryCondition::Idsusingderef().clone()correctly aligns with the PR's objective of storing external IDs directly.Also applies to: 321-321
lib/segment/src/index/struct_payload_index.rs (3)
108-115: Correct implementation of external to internal ID mapping.The changes properly handle the transition to storing external IDs in
PrimaryCondition::Idsby mapping them to internal offsets through the ID tracker. Filtering out None values ensures only valid internal IDs are returned.
362-369: Cardinality estimation correctly updated for external ID usage.The changes properly adapt the cardinality estimation to work with external IDs stored in
PrimaryCondition::Ids, maintaining the correct count and ID set.
461-464: Clean implementation using the newis_primarymethod.The refactored logic using
query_cardinality.is_primary(condition)provides a more maintainable approach to checking if all filter conditions are primary clauses.lib/segment/src/types.rs (2)
2919-2919: Approve the change from 0 to 1 forHasVectorcondition count.This change correctly updates the
sub_conditions_countmethod forHasVectorconditions to return 1 instead of 0, which aligns with the PR's objective to fix condition counting logic. TheHasVectorcondition represents a single condition that should be counted as such.
2297-2301: Approve the simplified condition counting logic.The updated
count_all_conditionsfunction now correctly counts only the recursive sub-condition counts by summingj.sub_conditions_count()for each condition, rather than additionally counting the direct length of the condition vector. This change addresses the PR objective to prevent counting nested "must" conditions multiple times and provides more accurate condition counting.The logic now relies solely on the recursive sub-condition counts, which is the correct approach for hierarchical condition structures.
lib/segment/src/index/field_index/mod.rs (4)
1-1: Import addition looks correct.The
Dereftrait import is needed for thederef()call in the newis_primarymethod.
5-5: Import consolidation improves organization.Good consolidation of imports from the types module, making the dependencies clearer.
30-30: Type change aligns with PR objectives.The change from
PointOffsetTypetoPointIdTypefor theIdsvariant correctly implements the PR objective of storing external IDs to facilitate easier comparison with original conditions.
89-111: Well-implemented method for primary condition checking.The
is_primarymethod correctly implements the logic to check if a given condition matches any primary clause. The pattern matching is exhaustive and handles all three variants ofPrimaryConditionappropriately:
Conditionvariant: Compares boxed field conditionsIdsvariant: Usesderef()to compare ID collectionsHasVectorvariant: Compares vector namesThe use of
deref()on line 101 is correct for comparing the underlying collection types.
|
|
||
| #[derive(Debug)] | ||
| #[expect(clippy::enum_variant_names)] | ||
| #[allow(clippy::enum_variant_names)] |
There was a problem hiding this comment.
Clippy with --no-default-features was complaining, I couldn't find a better way to fix it. Not directly related to PR
* fix check for pure primary condition filter * avoid cloning has_id for primary condition * fix test * add test --------- Co-authored-by: Arnaud Gourlay <arnaud.gourlay@gmail.com>
total_conditions_count, so it doesn't count nested must conditions as multiple conditionsPrimaryConditiontype to store external ids for: a) easier comparison with original condition b) avoiding mapping of the conditions, which might not be even usedmin_shouldcan generate more primary conditions then there are conditions defined (by duplicating conditions)