Skip to content

Use batching searcher for sparse vector search#7572

Merged
monoid merged 3 commits intodevfrom
feat/sparse-batched-scans
Nov 21, 2025
Merged

Use batching searcher for sparse vector search#7572
monoid merged 3 commits intodevfrom
feat/sparse-batched-scans

Conversation

@monoid
Copy link
Contributor

@monoid monoid commented Nov 20, 2025

It allows to remove peek_* methods from FilteredScorer, simplifying
code.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

It allows to remove `peek_*` methods from `FilteredScorer`, simplifying
code.
@monoid monoid requested review from generall, timvisee and xzfc November 20, 2025 13:23
@monoid monoid self-assigned this Nov 20, 2025
@monoid monoid changed the base branch from master to dev November 20, 2025 13:26
@monoid monoid marked this pull request as ready for review November 20, 2025 13:26
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 20, 2025

📝 Walkthrough

Walkthrough

Removed two public methods (peek_top_all and peek_top_iter) from FilteredScorer in lib/segment/src/index/hnsw_index/point_scorer.rs. Replaced uses of FilteredScorer with BatchFilteredSearcher across code and tests (sparse vector index and multiple vector-storage tests). Tests updated to use batch-oriented constructors and to assert single-result expectations with itertools::Itertools::exactly_one(). No other public APIs beyond the scorer→searcher changes were modified.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

  • Inspect lib/segment/src/index/hnsw_index/point_scorer.rs to confirm removed methods have no remaining internal callers or unintended visibility/api gaps.
  • Review lib/segment/src/index/sparse_index/sparse_vector_index.rs to ensure result extraction from BatchFilteredSearcher preserves prior semantics (pop vs. single-element vector handling).
  • Check tests in vector_storage (dense/sparse/memmap) for correct usage of BatchFilteredSearcher::new/new_for_test signatures and the switch from .pop() to .into_iter().exactly_one().
  • Verify imports and Cargo manifest (itertools addition) are correct and tests compile.

Possibly related PRs

Suggested reviewers

  • generall
  • timvisee
  • xzfc

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'Use batching searcher for sparse vector search' clearly summarizes the main change: replacing FilteredScorer with BatchFilteredSearcher in sparse vector indexing.
Description check ✅ Passed The PR description is related to the changeset, explaining that the changes allow removal of peek_* methods from FilteredScorer to simplify code, which aligns with the actual modifications.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/sparse-batched-scans

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc72975 and 603a648.

📒 Files selected for processing (1)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Learnt from: generall
Repo: qdrant/qdrant PR: 7100
File: lib/segment/src/index/field_index/full_text_index/inverted_index/mutable_inverted_index.rs:82-100
Timestamp: 2025-08-20T15:03:18.522Z
Learning: The merge_postings_iterator function in lib/segment/src/index/field_index/full_text_index/inverted_index/postings_iterator.rs handles deduplication internally, so there's no need to add .dedup() when using it for OR-merge operations in inverted indexes.
📚 Learning: 2025-09-01T11:19:26.371Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-07-11T11:35:21.549Z
Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-10-13T22:58:03.121Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-08-14T11:31:21.777Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-06-02T18:10:47.203Z
Learnt from: coszio
Repo: qdrant/qdrant PR: 6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-10-16T13:00:46.809Z
Learnt from: coszio
Repo: qdrant/qdrant PR: 7368
File: lib/shard/src/query/mod.rs:313-341
Timestamp: 2025-10-16T13:00:46.809Z
Learning: In the Qdrant codebase, for the `Feedback` query variant in `lib/shard/src/query/mod.rs`, validation for sparse vectors should happen down the line rather than early in the `query_enum_from_grpc_raw_query` function. The `Nearest` variant is treated as a special case where early validation with better error messages is acceptable.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
🧬 Code graph analysis (1)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
lib/segment/src/index/hnsw_index/point_scorer.rs (2)
  • new_for_test (185-199)
  • new_for_test (322-352)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: e2e-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: lint
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-consistency
  • GitHub Check: integration-tests
  • GitHub Check: integration-tests-consensus
🔇 Additional comments (3)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3)

9-9: LGTM! Import additions support the batch interface migration.

The addition of itertools::Itertools for exactly_one() and the switch from FilteredScorer to BatchFilteredSearcher correctly support the batch-oriented interface introduced in this PR.

Also applies to: 18-18


86-97: LGTM! Correct migration to batch-oriented searcher.

The test correctly migrates to BatchFilteredSearcher by passing the query vector as a slice and using exactly_one() to assert the single-result expectation. The batch size of 5 is appropriate for the expected 3 results.


181-191: LGTM! Consistent migration to batch interface.

The test correctly migrates to BatchFilteredSearcher with appropriate batch size. The split-statement style (separating results and closest) is clear and maintains the same correctness as the inline style used elsewhere.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Make the tests stricter.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)

92-96: Consider using exactly_one() for consistency.

The current pattern using let mut results, .len() assertion, and .pop() is functionally correct but could be simplified to match the pattern used elsewhere in the codebase (lines 186-190) and in the dense vector tests.

Apply this diff:

-    let mut results = searcher
+    let closest = searcher
         .peek_top_iter(&mut [0, 1, 2, 3, 4].iter().cloned(), &DEFAULT_STOPPED)
-        .unwrap();
-    assert_eq!(results.len(), 1);
-    let closest = results.pop().unwrap();
+        .unwrap()
+        .into_iter()
+        .exactly_one()
+        .unwrap();

This makes the code more concise and consistent with the pattern used at lines 186-190 of this file and throughout test_appendable_dense_vector_storage.rs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62c37e4 and dc72975.

📒 Files selected for processing (3)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (7 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (15)
📓 Common learnings
Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Learnt from: generall
Repo: qdrant/qdrant PR: 7100
File: lib/segment/src/index/field_index/full_text_index/inverted_index/mutable_inverted_index.rs:82-100
Timestamp: 2025-08-20T15:03:18.522Z
Learning: The merge_postings_iterator function in lib/segment/src/index/field_index/full_text_index/inverted_index/postings_iterator.rs handles deduplication internally, so there's no need to add .dedup() when using it for OR-merge operations in inverted indexes.
Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
📚 Learning: 2025-09-01T11:19:26.371Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-07-11T11:35:21.549Z
Learnt from: generall
Repo: qdrant/qdrant PR: 6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-10-13T22:58:03.121Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.121Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-08-14T11:31:21.777Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-06-02T18:10:47.203Z
Learnt from: coszio
Repo: qdrant/qdrant PR: 6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-10-16T13:00:46.809Z
Learnt from: coszio
Repo: qdrant/qdrant PR: 7368
File: lib/shard/src/query/mod.rs:313-341
Timestamp: 2025-10-16T13:00:46.809Z
Learning: In the Qdrant codebase, for the `Feedback` query variant in `lib/shard/src/query/mod.rs`, validation for sparse vectors should happen down the line rather than early in the `query_enum_from_grpc_raw_query` function. The `Nearest` variant is treated as a special case where early validation with better error messages is acceptable.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-04-07T23:31:22.515Z
Learnt from: generall
Repo: qdrant/qdrant PR: 6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` correctly returns void instead of a `Result` because it directly implements the `Madviseable` trait which defines `populate(&self)` without a return type. Adding an unnecessary `Ok(())` return would trigger Clippy warnings about unnecessary Result wrapping.

Applied to files:

  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
Repo: qdrant/qdrant PR: 7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
📚 Learning: 2025-11-14T10:01:31.926Z
Learnt from: timvisee
Repo: qdrant/qdrant PR: 7526
File: src/wal_pop.rs:7-8
Timestamp: 2025-11-14T10:01:31.926Z
Learning: In the Qdrant codebase, development binaries (like src/wal_pop.rs) don't require explicit bounds checking for command-line arguments. Rust's built-in out-of-bounds panic is acceptable for these development tools.

Applied to files:

  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
🧬 Code graph analysis (2)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
lib/segment/src/index/hnsw_index/point_scorer.rs (2)
  • new_for_test (185-199)
  • new_for_test (322-352)
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (1)
lib/segment/src/index/hnsw_index/point_scorer.rs (2)
  • new (119-140)
  • new (280-314)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: lint
  • GitHub Check: test-consistency
  • GitHub Check: e2e-tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: integration-tests
  • GitHub Check: integration-tests-consensus
  • GitHub Check: Build Qdrant Edge Python bindings
🔇 Additional comments (7)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)

319-319: LGTM! Import added for test utilities.

The itertools::Itertools import is correctly placed in the test module to support the .exactly_one() usage throughout the tests.


426-427: Excellent improvement to test assertions!

The refactor from .pop() to .into_iter().exactly_one() strengthens the test assertions. The old pattern would silently pass if multiple batch results were returned (taking only the last), while .exactly_one() enforces the expected single-result constraint and provides clearer error messages on violation. This better matches the test intent where each BatchFilteredSearcher is created with a single query vector.

Also applies to: 443-444, 516-517, 545-546, 572-573, 635-636

lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (3)

7-7: LGTM!

The Itertools import is correctly added to support the exactly_one() method used for extracting single batch results.


66-77: LGTM!

The migration to BatchFilteredSearcher with exactly_one() correctly adapts the test to the batch-oriented API while maintaining single-query semantics.


256-281: LGTM!

The code correctly demonstrates both FilteredScorer for raw scoring and BatchFilteredSearcher for batch search operations. The query.clone() at line 257 is necessary since the query is used in both scorer instances.

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)

9-9: LGTM!

The import updates correctly reflect the migration from FilteredScorer to BatchFilteredSearcher and add necessary Itertools support.

Also applies to: 18-18


180-190: LGTM!

The BatchFilteredSearcher usage with exactly_one() correctly adapts the test to the batch-oriented API. This is the preferred pattern for extracting single batch results.

@monoid monoid merged commit b5825c1 into dev Nov 21, 2025
16 checks passed
@monoid monoid deleted the feat/sparse-batched-scans branch November 21, 2025 13:29
timvisee pushed a commit that referenced this pull request Nov 25, 2025
It allows to remove `peek_*` methods from `FilteredScorer`, simplifying
code.
@timvisee timvisee mentioned this pull request Nov 25, 2025
@coderabbitai coderabbitai bot mentioned this pull request Nov 27, 2025
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants