Use batching searcher for sparse vector search by monoid · Pull Request #7572 · qdrant/qdrant

monoid · 2025-11-20T13:23:55Z

This makes it possible to remove the `peek_*` methods from `FilteredScorer`, simplifying the code.

All Submissions:

Contributions should target the dev branch. Did you create your branch from dev?
Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

Does your submission pass tests?
Have you formatted your code locally using the `cargo +nightly fmt --all` command prior to submission?
Have you checked your code using the `cargo clippy --all --all-features` command?

Changes to Core Features:

Have you added an explanation of what your changes do and why you'd like us to include them?
Have you written new tests for your core changes, as applicable?
Have you successfully run tests with your changes locally?


coderabbitai · 2025-11-20T13:31:51Z

Walkthrough

Removed two public methods (peek_top_all and peek_top_iter) from FilteredScorer in lib/segment/src/index/hnsw_index/point_scorer.rs. Replaced uses of FilteredScorer with BatchFilteredSearcher across code and tests (sparse vector index and multiple vector-storage tests). Tests updated to use batch-oriented constructors and to assert single-result expectations with itertools::Itertools::exactly_one(). No other public APIs beyond the scorer→searcher changes were modified.
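For readers unfamiliar with the batched approach, the sketch below assumes the core idea is a single pass over candidate point ids that keeps one bounded top-k heap per query, so a single query is just a batch of size one and its result comes back wrapped in an outer `Vec`. All names (`batch_search`, `ScoredPoint`, `ByScore`, `Cancelled`) are hypothetical; this is not qdrant's `BatchFilteredSearcher`, and it ignores filtering and sparse-specific scoring. The `stopped` flag only mirrors the `&DEFAULT_STOPPED` argument passed in the test code.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in: point ids are u32 and scores are f32.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ScoredPoint {
    idx: u32,
    score: f32,
}

/// Gives f32 scores a total order so they can live in a BinaryHeap.
struct ByScore(ScoredPoint);

impl PartialEq for ByScore {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == std::cmp::Ordering::Equal
    }
}
impl Eq for ByScore {}
impl PartialOrd for ByScore {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for ByScore {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.0.score.total_cmp(&other.0.score)
    }
}

#[derive(Debug, PartialEq)]
struct Cancelled;

/// Scans `candidates` once, keeping a separate top-`top` heap per query.
fn batch_search<F>(
    num_queries: usize,
    top: usize,
    candidates: impl Iterator<Item = u32>,
    score: F,
    stopped: &AtomicBool,
) -> Result<Vec<Vec<ScoredPoint>>, Cancelled>
where
    F: Fn(usize, u32) -> f32,
{
    // Min-heaps (via Reverse) so the worst kept result is evicted first.
    let mut heaps: Vec<BinaryHeap<Reverse<ByScore>>> = (0..num_queries)
        .map(|_| BinaryHeap::with_capacity(top + 1))
        .collect();
    for idx in candidates {
        if stopped.load(Ordering::Relaxed) {
            return Err(Cancelled);
        }
        // One pass over the points serves every query in the batch.
        for (q, heap) in heaps.iter_mut().enumerate() {
            heap.push(Reverse(ByScore(ScoredPoint { idx, score: score(q, idx) })));
            if heap.len() > top {
                heap.pop();
            }
        }
    }
    // One result list per query, best score first.
    Ok(heaps
        .into_iter()
        .map(|heap| {
            let mut v: Vec<ScoredPoint> =
                heap.into_iter().map(|Reverse(ByScore(p))| p).collect();
            v.sort_by(|a, b| b.score.total_cmp(&a.score));
            v
        })
        .collect())
}
```

The outer `Vec` is why single-query callers have to unwrap one element, either with `.pop()` or, more strictly, with `.into_iter().exactly_one()`.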

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Inspect lib/segment/src/index/hnsw_index/point_scorer.rs to confirm removed methods have no remaining internal callers or unintended visibility/api gaps.
Review lib/segment/src/index/sparse_index/sparse_vector_index.rs to ensure result extraction from BatchFilteredSearcher preserves prior semantics (pop vs. single-element vector handling).
Check tests in vector_storage (dense/sparse/memmap) for correct usage of BatchFilteredSearcher::new/new_for_test signatures and the switch from .pop() to .into_iter().exactly_one().
Verify imports and Cargo manifest (itertools addition) are correct and tests compile.

Possibly related PRs

Batched scans for dense vectors #7514 — similar replacement of FilteredScorer with BatchFilteredSearcher across scorer implementation and call sites.
TestRawScorerProducer to use VectorStorageEnum #6965 — modifies creation/usage of FilteredScorer in tests and touches related APIs.
Move filtering logic from RawScorer to FilteredScorer #6245 — touches the same peek_top_iter/peek_top_all scoring APIs that were removed here.

Suggested reviewers

generall
timvisee
xzfc

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'Use batching searcher for sparse vector search' clearly summarizes the main change: replacing FilteredScorer with BatchFilteredSearcher in sparse vector indexing.
Description check	✅ Passed	The PR description is related to the changeset, explaining that the changes allow removal of peek_* methods from FilteredScorer to simplify code, which aligns with the actual modifications.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc72975 and 603a648.

📒 Files selected for processing (1)

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)

🧬 Code graph analysis (1)

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)

lib/segment/src/index/hnsw_index/point_scorer.rs (2)

new_for_test (185-199)

new_for_test (322-352)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: e2e-tests
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: lint
GitHub Check: test-consensus-compose
GitHub Check: test-consistency
GitHub Check: integration-tests
GitHub Check: integration-tests-consensus

🔇 Additional comments (3)

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3)

9-9: LGTM! Import additions support the batch interface migration.

The addition of itertools::Itertools for exactly_one() and the switch from FilteredScorer to BatchFilteredSearcher correctly support the batch-oriented interface introduced in this PR.

Also applies to: 18-18

86-97: LGTM! Correct migration to batch-oriented searcher.

The test correctly migrates to BatchFilteredSearcher by passing the query vector as a slice and using exactly_one() to assert the single-result expectation. The batch size of 5 is appropriate for the expected 3 results.

181-191: LGTM! Consistent migration to batch interface.

The test correctly migrates to BatchFilteredSearcher with appropriate batch size. The split-statement style (separating results and closest) is clear and maintains the same correctness as the inline style used elsewhere.

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs

Make the tests stricter.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
92-96: Consider using exactly_one() for consistency.

The current pattern using let mut results, .len() assertion, and .pop() is functionally correct but could be simplified to match the pattern used elsewhere in the codebase (lines 186-190) and in the dense vector tests.

Apply this diff:
-    let mut results = searcher
+    let closest = searcher
         .peek_top_iter(&mut [0, 1, 2, 3, 4].iter().cloned(), &DEFAULT_STOPPED)
-        .unwrap();
-    assert_eq!(results.len(), 1);
-    let closest = results.pop().unwrap();
+        .unwrap()
+        .into_iter()
+        .exactly_one()
+        .unwrap();

This makes the code more concise and consistent with the pattern used at lines 186-190 of this file and throughout test_appendable_dense_vector_storage.rs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62c37e4 and dc72975.

📒 Files selected for processing (3)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)

🧬 Code graph analysis (2)

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)

lib/segment/src/index/hnsw_index/point_scorer.rs (2)

new_for_test (185-199)

new_for_test (322-352)

lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (1)

lib/segment/src/index/hnsw_index/point_scorer.rs (2)

new (119-140)

new (280-314)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: lint
GitHub Check: test-consistency
GitHub Check: e2e-tests
GitHub Check: test-consensus-compose
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: integration-tests
GitHub Check: integration-tests-consensus
GitHub Check: Build Qdrant Edge Python bindings

🔇 Additional comments (7)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)

319-319: LGTM! Import added for test utilities.

The itertools::Itertools import is correctly placed in the test module to support the .exactly_one() usage throughout the tests.

426-427: Excellent improvement to test assertions!

The refactor from .pop() to .into_iter().exactly_one() strengthens the test assertions. The old pattern would silently pass if multiple batch results were returned (taking only the last), while .exactly_one() enforces the expected single-result constraint and provides clearer error messages on violation. This better matches the test intent where each BatchFilteredSearcher is created with a single query vector.

Also applies to: 443-444, 516-517, 545-546, 572-573, 635-636
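The strictness argument can be checked with a small std-only sketch. `extract_with_pop` and `exactly_one` are hypothetical helpers written for illustration; `exactly_one` only approximates `Itertools::exactly_one` from the itertools crate, whose error type hands back the unconsumed elements rather than a count.

```rust
/// Old pattern: take the last batch entry, whatever the batch length.
fn extract_with_pop<T>(mut batch: Vec<T>) -> Option<T> {
    batch.pop()
}

/// Std-only stand-in for `Itertools::exactly_one`: Ok with the sole item,
/// or Err carrying how many items were actually present (0 or 2+).
fn exactly_one<I: IntoIterator>(items: I) -> Result<I::Item, usize> {
    let mut iter = items.into_iter();
    let first = match iter.next() {
        Some(item) => item,
        None => return Err(0),
    };
    match iter.next() {
        None => Ok(first),
        // Count the second item plus whatever remains.
        Some(_) => Err(2 + iter.count()),
    }
}
```

With a one-element batch both patterns agree; with two entries `extract_with_pop` quietly returns the last one, while `exactly_one` reports `Err(2)`, which is exactly the failure a test should surface.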

lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (3)

7-7: LGTM!

The Itertools import is correctly added to support the exactly_one() method used for extracting single batch results.

66-77: LGTM!

The migration to BatchFilteredSearcher with exactly_one() correctly adapts the test to the batch-oriented API while maintaining single-query semantics.

256-281: LGTM!

The code correctly demonstrates both FilteredScorer for raw scoring and BatchFilteredSearcher for batch search operations. The query.clone() at line 257 is necessary since the query is used in both scorer instances.
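A minimal sketch of why the clone is needed, assuming the batch searcher receives its queries as a slice literal such as `&[query]` (the CodeRabbit comments describe the query being passed as a slice): the array literal moves `query` into a temporary, so a later scorer constructor that takes the query by value would fail with "use of moved value". `QueryVector`, `Scorer`, `Searcher`, and `build_both` are hypothetical stand-ins, not qdrant types.

```rust
/// Hypothetical query type; cloning is assumed acceptable in test code.
#[derive(Clone, Debug, PartialEq)]
struct QueryVector(Vec<f32>);

/// Takes ownership of a single query (stand-in for a raw scorer).
struct Scorer {
    query: QueryVector,
}

impl Scorer {
    fn new(query: QueryVector) -> Self {
        Scorer { query }
    }

    /// Dot product between the owned query and a stored vector.
    fn score(&self, v: &[f32]) -> f32 {
        self.query.0.iter().zip(v).map(|(a, b)| a * b).sum()
    }
}

/// Receives a slice of queries and copies them (stand-in for a batch searcher).
struct Searcher {
    queries: Vec<QueryVector>,
}

impl Searcher {
    fn new(queries: &[QueryVector]) -> Self {
        Searcher { queries: queries.to_vec() }
    }
}

fn build_both(query: QueryVector) -> (Searcher, Scorer) {
    // `&[query]` would move `query` into the temporary array, so the batch
    // searcher gets a clone and the scorer takes the original.
    let searcher = Searcher::new(&[query.clone()]);
    let scorer = Scorer::new(query);
    (searcher, scorer)
}
```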

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)

9-9: LGTM!

The import updates correctly reflect the migration from FilteredScorer to BatchFilteredSearcher and add necessary Itertools support.

Also applies to: 18-18

180-190: LGTM!

The BatchFilteredSearcher usage with exactly_one() correctly adapts the test to the batch-oriented API. This is the preferred pattern for extracting single batch results.

Use batching searcher for sparse vector search

62c37e4

It allows to remove `peek_*` methods from `FilteredScorer`, simplifying code.

monoid requested review from generall, timvisee and xzfc November 20, 2025 13:23

monoid self-assigned this Nov 20, 2025

monoid changed the base branch from master to dev November 20, 2025 13:26

monoid marked this pull request as ready for review November 20, 2025 13:26

xzfc reviewed Nov 21, 2025

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (outdated, resolved)

xzfc approved these changes Nov 21, 2025

Use Itertools::exactly_one in tests

dc72975

Make the tests stricter.

coderabbitai bot reviewed Nov 21, 2025

More

603a648

monoid merged commit b5825c1 into dev Nov 21, 2025
16 checks passed

monoid deleted the feat/sparse-batched-scans branch November 21, 2025 13:29

timvisee pushed a commit that referenced this pull request Nov 25, 2025

Use batch searcher for sparse vector search (#7572)

c74abf5

It allows to remove `peek_*` methods from `FilteredScorer`, simplifying code.

timvisee mentioned this pull request Nov 25, 2025

Bump version to 1.16.1 #7603

Merged

coderabbitai bot mentioned this pull request Nov 27, 2025

Fix search in empty HNSW segments #7620

Merged

coderabbitai bot mentioned this pull request Mar 11, 2026

Filter deferred points sparse search #8299

Merged

Use batching searcher for sparse vector search #7572
monoid merged 3 commits into dev from feat/sparse-batched-scans
