Use batching searcher for sparse vector search#7572
Conversation
It allows to remove `peek_*` methods from `FilteredScorer`, simplifying code.
📝 WalkthroughWalkthroughRemoved two public methods ( Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (11)📓 Common learnings📚 Learning: 2025-09-01T11:19:26.371ZApplied to files:
📚 Learning: 2025-08-11T00:37:34.100ZApplied to files:
📚 Learning: 2025-07-11T11:35:21.549ZApplied to files:
📚 Learning: 2025-08-15T11:42:00.297ZApplied to files:
📚 Learning: 2025-08-11T07:57:01.399ZApplied to files:
📚 Learning: 2025-08-15T11:41:10.629ZApplied to files:
📚 Learning: 2025-10-13T22:58:03.121ZApplied to files:
📚 Learning: 2025-08-14T11:31:21.777ZApplied to files:
📚 Learning: 2025-06-02T18:10:47.203ZApplied to files:
📚 Learning: 2025-10-16T13:00:46.809ZApplied to files:
🧬 Code graph analysis (1)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
🔇 Additional comments (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
Outdated
Show resolved
Hide resolved
Make the tests stricter.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
92-96: Consider usingexactly_one()for consistency.The current pattern using
let mut results,.len()assertion, and.pop()is functionally correct but could be simplified to match the pattern used elsewhere in the codebase (lines 186-190) and in the dense vector tests.Apply this diff:
- let mut results = searcher + let closest = searcher .peek_top_iter(&mut [0, 1, 2, 3, 4].iter().cloned(), &DEFAULT_STOPPED) - .unwrap(); - assert_eq!(results.len(), 1); - let closest = results.pop().unwrap(); + .unwrap() + .into_iter() + .exactly_one() + .unwrap();This makes the code more concise and consistent with the pattern used at lines 186-190 of this file and throughout
test_appendable_dense_vector_storage.rs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(4 hunks)
🧰 Additional context used
🧠 Learnings (15)
📓 Common learnings
Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Learnt from: generall
Repo: qdrant/qdrant PR: 7100
File: lib/segment/src/index/field_index/full_text_index/inverted_index/mutable_inverted_index.rs:82-100
Timestamp: 2025-08-20T15:03:18.522Z
Learning: The merge_postings_iterator function in lib/segment/src/index/field_index/full_text_index/inverted_index/postings_iterator.rs handles deduplication internally, so there's no need to add .dedup() when using it for OR-merge operations in inverted indexes.
Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
📚 Learning: 2025-09-01T11:19:26.371Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-07-11T11:35:21.549Z
Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-10-13T22:58:03.121Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-08-14T11:31:21.777Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-06-02T18:10:47.203Z
Learnt from: coszio
Repo: qdrant/qdrant PR: 6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-10-16T13:00:46.809Z
Learnt from: coszio
Repo: qdrant/qdrant PR: 7368
File: lib/shard/src/query/mod.rs:313-341
Timestamp: 2025-10-16T13:00:46.809Z
Learning: In the Qdrant codebase, for the `Feedback` query variant in `lib/shard/src/query/mod.rs`, validation for sparse vectors should happen down the line rather than early in the `query_enum_from_grpc_raw_query` function. The `Nearest` variant is treated as a special case where early validation with better error messages is acceptable.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.
Applied to files:
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-04-07T23:31:22.515Z
Learnt from: generall
Repo: qdrant/qdrant PR: 6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` correctly returns void instead of a `Result` because it directly implements the `Madviseable` trait which defines `populate(&self)` without a return type. Adding an unnecessary `Ok(())` return would trigger Clippy warnings about unnecessary Result wrapping.
Applied to files:
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-11-14T10:01:31.926Z
Learnt from: timvisee
Repo: qdrant/qdrant PR: 7526
File: src/wal_pop.rs:7-8
Timestamp: 2025-11-14T10:01:31.926Z
Learning: In the Qdrant codebase, development binaries (like src/wal_pop.rs) don't require explicit bounds checking for command-line arguments. Rust's built-in out-of-bounds panic is acceptable for these development tools.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
🧬 Code graph analysis (2)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
lib/segment/src/index/hnsw_index/point_scorer.rs (2)
new_for_test(185-199)new_for_test(322-352)
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (1)
lib/segment/src/index/hnsw_index/point_scorer.rs (2)
new(119-140)new(280-314)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: lint
- GitHub Check: test-consistency
- GitHub Check: e2e-tests
- GitHub Check: test-consensus-compose
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: Build Qdrant Edge Python bindings
🔇 Additional comments (7)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
319-319: LGTM! Import added for test utilities.The
itertools::Itertoolsimport is correctly placed in the test module to support the.exactly_one()usage throughout the tests.
426-427: Excellent improvement to test assertions!The refactor from
.pop()to.into_iter().exactly_one()strengthens the test assertions. The old pattern would silently pass if multiple batch results were returned (taking only the last), while.exactly_one()enforces the expected single-result constraint and provides clearer error messages on violation. This better matches the test intent where eachBatchFilteredSearcheris created with a single query vector.Also applies to: 443-444, 516-517, 545-546, 572-573, 635-636
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (3)
7-7: LGTM!The
Itertoolsimport is correctly added to support theexactly_one()method used for extracting single batch results.
66-77: LGTM!The migration to
BatchFilteredSearcherwithexactly_one()correctly adapts the test to the batch-oriented API while maintaining single-query semantics.
256-281: LGTM!The code correctly demonstrates both
FilteredScorerfor raw scoring andBatchFilteredSearcherfor batch search operations. Thequery.clone()at line 257 is necessary since the query is used in both scorer instances.lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)
9-9: LGTM!The import updates correctly reflect the migration from
FilteredScorertoBatchFilteredSearcherand add necessaryItertoolssupport.Also applies to: 18-18
180-190: LGTM!The
BatchFilteredSearcherusage withexactly_one()correctly adapts the test to the batch-oriented API. This is the preferred pattern for extracting single batch results.
It allows to remove `peek_*` methods from `FilteredScorer`, simplifying code.
It allows to remove
peek_*methods fromFilteredScorer, simplifyingcode.
All Submissions:
devbranch. Did you create your branch fromdev?New Feature Submissions:
cargo +nightly fmt --allcommand prior to submission?cargo clippy --all --all-featurescommand?Changes to Core Features: