TestRawScorerProducer to use VectorStorageEnum by xzfc · Pull Request #6965 · qdrant/qdrant

xzfc · 2025-07-31T13:35:55Z

This PR makes TestRawScorerProducer to use VectorStorageEnum::DenseVolatile instead of rolling its own impl DenseVectorStorage for TestRawScorerProducer.

It reduces some boilerplate, and, more important, this change would simplify unit testing the upcoming hnsw-graph-with-vectors feature. (particularly, having an instance of VectorStorageEnum would simplify extending some existing tests to use quantization).

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

lib/segment/src/fixtures/index_fixtures.rs (1)
51-56: Consider using a more direct approach for dense vector access

Since this test fixture is specifically for dense vectors, consider adding an assertion or using a more direct API if available.

If the vector storage enum provides a method to assert or access dense storage directly:
 pub fn get_vector(&self, key: PointOffsetType) -> Cow<[VectorElementType]> {
     match self.vector_storage.get_vector(key) {
         CowVector::Dense(cow) => cow,
-        _ => unreachable!("Expected vector storage to be dense"),
+        _ => panic!("TestRawScorerProducer only supports dense vector storage"),
     }
 }

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e703813 and 345e315.

📒 Files selected for processing (12)

lib/segment/benches/fixture.rs (3 hunks)
lib/segment/benches/hnsw_build_asymptotic.rs (4 hunks)
lib/segment/benches/hnsw_build_graph.rs (3 hunks)
lib/segment/src/fixtures/index_fixtures.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (8 hunks)
lib/segment/src/index/hnsw_index/gpu/mod.rs (4 hunks)
lib/segment/src/index/hnsw_index/graph_layers.rs (7 hunks)
lib/segment/src/index/hnsw_index/graph_layers_builder.rs (10 hunks)
lib/segment/src/index/hnsw_index/links_container.rs (3 hunks)
lib/segment/src/index/hnsw_index/tests/mod.rs (3 hunks)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2 hunks)

🧰 Additional context used

🧠 Learnings (13)

📓 Common learnings

Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In the AsyncRawScorerImpl implementation, the unwrap() call on read_vectors_async results is intentional, with an explanatory comment noting that this experimental feature is meant to crash rather than silently fall back to synchronous implementation.

Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

📚 Learning: in test code for qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified id ...

Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/segment/src/index/hnsw_index/links_container.rs
lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
lib/segment/benches/hnsw_build_graph.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/benches/hnsw_build_asymptotic.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/benches/fixture.rs
lib/segment/src/index/hnsw_index/graph_layers.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/index/hnsw_index/tests/mod.rs

📚 Learning: in the subgraph_connectivity method in lib/segment/src/index/hnsw_index/graph_layers_builder.rs, the...

Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:156-159
Timestamp: 2025-05-30T15:26:54.048Z
Learning: In the subgraph_connectivity method in lib/segment/src/index/hnsw_index/graph_layers_builder.rs, the parameter `q` represents the edge removal/failure probability, not the edge retention probability. The logic `if coin_flip < q { continue; }` correctly simulates edge failures for percolation-based connectivity estimation.

Applied to files:

lib/segment/src/index/hnsw_index/links_container.rs
lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
lib/segment/benches/hnsw_build_graph.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/graph_layers.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/index/hnsw_index/tests/mod.rs

📚 Learning: in the asyncrawscorerimpl implementation, the unwrap() call on read_vectors_async results is intenti...

Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In the AsyncRawScorerImpl implementation, the unwrap() call on read_vectors_async results is intentional, with an explanatory comment noting that this experimental feature is meant to crash rather than silently fall back to synchronous implementation.

Applied to files:

lib/segment/src/index/hnsw_index/links_container.rs
lib/segment/benches/hnsw_build_graph.rs
lib/segment/benches/hnsw_build_asymptotic.rs
lib/segment/src/fixtures/index_fixtures.rs

📚 Learning: in asyncrawscorerimpl, the unwrap() call after read_vectors_async is intentional. the io_uring featu...

Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.

Applied to files:

lib/segment/src/index/hnsw_index/links_container.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/fixtures/index_fixtures.rs

📚 Learning: in the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector a...

Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
lib/segment/benches/hnsw_build_graph.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/benches/fixture.rs
lib/segment/src/index/hnsw_index/graph_layers.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/index/hnsw_index/tests/mod.rs

📚 Learning: in the qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across gpu...

Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/fixtures/index_fixtures.rs

📚 Learning: in the posting_list crate's postingchunk struct, avoid adding extra fields like storing computed chu...

Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/builder.rs:63-67
Timestamp: 2025-05-26T14:47:23.505Z
Learning: In the posting_list crate's PostingChunk struct, avoid adding extra fields like storing computed chunk_bits to prevent struct bloat and maintain mmap compatibility. The bit calculations from offsets are inexpensive compared to the memory and compatibility benefits of keeping the struct minimal.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs

📚 Learning: in the blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers ...

Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs

📚 Learning: in the bitpacking crate, the bitpacker::decompress_strictly_sorted function takes an option

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs

📚 Learning: in the qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measur...

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs

📚 Learning: the rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_...

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.

Applied to files:

lib/segment/benches/hnsw_build_asymptotic.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs

📚 Learning: the rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`...

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.

Applied to files:

lib/segment/benches/hnsw_build_asymptotic.rs

🧬 Code Graph Analysis (1)

lib/segment/src/index/hnsw_index/links_container.rs (2)

lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1)

async_raw_scorer_euclid (25-27)

lib/segment/src/spaces/simple.rs (1)

EuclidMetric (31-31)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: test-consensus-compose
GitHub Check: rust-tests (macos-latest)
GitHub Check: integration-tests-consensus
GitHub Check: rust-tests (windows-latest)

🔇 Additional comments (33)

lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs (1)

157-158: LGTM! Vector access properly refactored.

The change from direct vector indexing to the get_vector() method correctly encapsulates vector storage access, aligning with the PR objective to use VectorStorageEnum. The .to_vec() conversion handles the interface change appropriately.

lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2)

15-15: Import added for Distance enum.

The Distance import is correctly added to support the new non-generic API.

69-69: Clean migration to Distance enum API.

The change from generic CosineMetric parameter to explicit Distance::Cosine argument maintains the same functionality while using the new unified distance metric interface.

lib/segment/benches/hnsw_build_graph.rs (3)

11-11: Import updated for new API.

The import change correctly removes unused types (CosineMetric, VectorOffsetType) and adds the required Distance enum for the new interface.

21-21: TestRawScorerProducer construction updated correctly.

The change from generic CosineMetric parameter to explicit Distance::Cosine argument maintains the same distance metric while using the new unified API.

35-35: Vector access properly encapsulated.

The change from direct vector indexing with type casting to the get_vector() method call correctly uses the new vector storage abstraction. The .to_vec() conversion handles the interface change appropriately.

lib/segment/src/index/hnsw_index/links_container.rs (3)

264-264: Import correctly updated for Distance enum.

The import change from EuclidMetric to Distance is consistent with the refactoring to use the unified distance metric interface.

275-276: TestRawScorerProducer construction migrated correctly.

The change from generic EuclidMetric parameter to explicit Distance::Euclid argument maintains the same Euclidean distance metric while using the new unified API.

403-403: Consistent API migration in second test.

The TestRawScorerProducer construction correctly uses Distance::Euclid instead of the generic EuclidMetric parameter, maintaining consistency with the refactoring pattern.

lib/segment/benches/hnsw_build_asymptotic.rs (4)

13-13: Distance import added for new API.

The Distance import is correctly added to support the new unified distance metric interface.

88-88: TestRawScorerProducer construction updated to use Distance enum.

The change from generic DotProductMetric parameter to explicit Distance::Dot argument maintains the same distance metric while using the new unified API.

105-105: Consistent API migration for 10x benchmark.

The TestRawScorerProducer construction correctly uses Distance::Dot instead of the generic parameter, maintaining consistency with the refactoring pattern.

122-122: Consistent API migration for 50x benchmark.

The TestRawScorerProducer construction correctly uses Distance::Dot instead of the generic parameter, completing the consistent refactoring across all benchmark variants.

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (5)

486-486: LGTM!

The VectorRef import is correctly added to support the new vector insertion pattern using VectorRef::from() wrapper.

507-507: LGTM!

The removal of the generic metric parameter from TestRawScorerProducer field type is consistent with the broader refactor pattern.

520-521: LGTM!

The TestRawScorerProducer instantiation correctly uses the explicit Distance::Dot parameter instead of the generic metric parameter, aligning with the refactor pattern.

530-530: LGTM!

The vector insertion correctly uses VectorRef::from(v.as_ref()) wrapper, providing a unified interface for vector handling as part of the refactor.

544-544: LGTM!

The vector access correctly uses the get_vector() method instead of direct field access, providing better encapsulation and consistency with the refactored TestRawScorerProducer interface.

Also applies to: 750-753, 808-811, 974-977

lib/segment/src/index/hnsw_index/gpu/mod.rs (3)

102-102: LGTM!

The VectorRef import addition is correctly included to support the new vector insertion pattern.

115-115: LGTM!

The removal of the generic metric parameter from the vector_holder field type is consistent with the refactor pattern.

129-130: LGTM!

The changes correctly implement the refactor pattern:

TestRawScorerProducer instantiation uses explicit Distance::Cosine parameter

Vector insertion uses VectorRef::from() wrapper for unified interface

Vector access uses get_vector() method for better encapsulation

Also applies to: 139-139, 159-159

lib/segment/benches/fixture.rs (3)

28-28: LGTM!

The removal of the generic metric parameter from the return type is consistent with the TestRawScorerProducer refactor.

43-48: LGTM!

The TestRawScorerProducer instantiation correctly passes METRIC::distance() as an explicit parameter instead of using generic metric parameters, aligning with the refactor pattern.

67-67: LGTM!

The vector access correctly uses the get_vector() method instead of direct field access, providing better encapsulation consistent with the refactor.

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (7)

607-607: LGTM!

The Distance import addition is correctly included to support the refactored code that uses explicit Distance enum values.

613-619: LGTM!

The test helper function signatures correctly remove generic metric parameters and use explicit Distance parameters, with return types updated to match the refactored TestRawScorerProducer. This is consistent with the broader refactor pattern.

Also applies to: 660-666

633-633: LGTM!

The TestRawScorerProducer instantiations correctly use explicit Distance parameters (Distance::Cosine) instead of generic metric parameters, aligning with the refactor pattern.

Also applies to: 674-674, 875-875

651-651: LGTM!

The vector access correctly uses the get_vector() method instead of direct field access, providing better encapsulation consistent with the refactor pattern.

Also applies to: 690-690, 879-879

703-705: LGTM!

The test functions correctly:

Add necessary VectorStorage import for trait methods

Use explicit Distance::Cosine parameters instead of generic metric parameters

Pass distance parameters to test helper functions

These changes align with the refactor pattern.

Also applies to: 715-715, 773-775, 783-783, 786-786

748-748: LGTM!

The distance processing correctly:

Uses distance.preprocess_vector() for explicit distance preprocessing

Uses CosineMetric::similarity() for explicit similarity calculation

This properly handles the transition from generic metric operations to explicit distance-specific operations.

Also applies to: 754-754, 842-842, 848-848

750-751: LGTM!

The vector storage access correctly adapts to the new interface:

Uses vector_storage.total_vector_count() instead of direct vector count access

Uses get_vector() method for encapsulated vector access

This properly handles the transition to VectorStorageEnum in the refactored TestRawScorerProducer.

Also applies to: 844-845

lib/segment/src/index/hnsw_index/tests/mod.rs (1)

14-68: LGTM!

The refactoring of test fixture functions to use Distance enum instead of generic metric parameters is clean and consistent. The changes properly propagate the distance parameter through the call chain.

lib/segment/src/fixtures/index_fixtures.rs (1)

19-64: Well-executed refactoring to use VectorStorageEnum

The refactoring successfully achieves the PR objectives by:

Removing the generic metric parameter in favor of runtime Distance enum

Using VectorStorageEnum::DenseVolatile variant through the helper function

Properly encapsulating vector access through the new get_vector method

Maintaining test compatibility while reducing boilerplate

This should indeed make it easier to extend tests with quantization functionality as mentioned in the PR objectives.

lib/segment/src/index/hnsw_index/graph_layers.rs

lib/segment/src/index/hnsw_index/graph_layers_builder.rs

* TestRawScorerProducer to use VectorStorageEnum * TestRawScorerProducer::new: accept distance as an argument * Drop TMetric from TestRawScorerProducer * Remove `impl VectorStorage for TestRawScorerProducer` * Add TestRawScorerProducer::internal_scorer * Remove TestRawScorerProducer::get_vector * Rename TestRawScorerProducer::{get_scorer => scorer}

xzfc requested review from generall and timvisee July 31, 2025 13:35

This comment was marked as resolved.

Sign in to view

coderabbitai bot reviewed Jul 31, 2025

View reviewed changes

lib/segment/src/index/hnsw_index/graph_layers.rs Outdated Show resolved Hide resolved

xzfc force-pushed the TestRawScorerProducer-VectorStorageEnum branch 3 times, most recently from 7ae8de7 to 888be0f Compare August 1, 2025 15:20

timvisee reviewed Aug 5, 2025

View reviewed changes

lib/segment/src/index/hnsw_index/graph_layers_builder.rs Show resolved Hide resolved

xzfc added 7 commits August 5, 2025 17:11

TestRawScorerProducer to use VectorStorageEnum

96f8b04

TestRawScorerProducer::new: accept distance as an argument

86e548c

Drop TMetric from TestRawScorerProducer

1836827

Remove impl VectorStorage for TestRawScorerProducer

bfc3493

Add TestRawScorerProducer::internal_scorer

7e0fb67

Remove TestRawScorerProducer::get_vector

7f775d1

Rename TestRawScorerProducer::{get_scorer => scorer}

cc6834b

xzfc force-pushed the TestRawScorerProducer-VectorStorageEnum branch from 888be0f to cc6834b Compare August 5, 2025 17:13

xzfc mentioned this pull request Aug 6, 2025

Implement GraphLinksFormat::CompressedWithVectors #6982

Merged

timvisee approved these changes Aug 6, 2025

View reviewed changes

xzfc merged commit 542f03b into dev Aug 6, 2025
16 checks passed

xzfc deleted the TestRawScorerProducer-VectorStorageEnum branch August 6, 2025 09:41

coderabbitai bot mentioned this pull request Nov 14, 2025

Batched scans for dense vectors #7514

Merged

9 tasks

coderabbitai bot mentioned this pull request Nov 21, 2025

Use batching searcher for sparse vector search #7572

Merged

9 tasks

coderabbitai bot mentioned this pull request Dec 19, 2025

Flaky test payload_index_test::test_read_operations #7810

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TestRawScorerProducer to use VectorStorageEnum#6965

TestRawScorerProducer to use VectorStorageEnum#6965
xzfc merged 7 commits intodevfrom
TestRawScorerProducer-VectorStorageEnum

xzfc commented Jul 31, 2025

Uh oh!

This comment was marked as resolved.

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

xzfc commented Jul 31, 2025

Uh oh!

This comment was marked as resolved.

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants