Skip to content

TestRawScorerProducer to use VectorStorageEnum#6965

Merged
xzfc merged 7 commits intodevfrom
TestRawScorerProducer-VectorStorageEnum
Aug 6, 2025
Merged

TestRawScorerProducer to use VectorStorageEnum#6965
xzfc merged 7 commits intodevfrom
TestRawScorerProducer-VectorStorageEnum

Conversation

@xzfc
Copy link
Member

@xzfc xzfc commented Jul 31, 2025

This PR makes TestRawScorerProducer to use VectorStorageEnum::DenseVolatile instead of rolling its own impl DenseVectorStorage for TestRawScorerProducer.

It reduces some boilerplate, and, more important, this change would simplify unit testing the upcoming hnsw-graph-with-vectors feature. (particularly, having an instance of VectorStorageEnum would simplify extending some existing tests to use quantization).

@xzfc xzfc requested review from generall and timvisee July 31, 2025 13:35
@coderabbitai

This comment was marked as resolved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/segment/src/fixtures/index_fixtures.rs (1)

51-56: Consider using a more direct approach for dense vector access

Since this test fixture is specifically for dense vectors, consider adding an assertion or using a more direct API if available.

If the vector storage enum provides a method to assert or access dense storage directly:

 pub fn get_vector(&self, key: PointOffsetType) -> Cow<[VectorElementType]> {
     match self.vector_storage.get_vector(key) {
         CowVector::Dense(cow) => cow,
-        _ => unreachable!("Expected vector storage to be dense"),
+        _ => panic!("TestRawScorerProducer only supports dense vector storage"),
     }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e703813 and 345e315.

📒 Files selected for processing (12)
  • lib/segment/benches/fixture.rs (3 hunks)
  • lib/segment/benches/hnsw_build_asymptotic.rs (4 hunks)
  • lib/segment/benches/hnsw_build_graph.rs (3 hunks)
  • lib/segment/src/fixtures/index_fixtures.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (8 hunks)
  • lib/segment/src/index/hnsw_index/gpu/mod.rs (4 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers.rs (7 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs (10 hunks)
  • lib/segment/src/index/hnsw_index/links_container.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/tests/mod.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In the AsyncRawScorerImpl implementation, the unwrap() call on read_vectors_async results is intentional, with an explanatory comment noting that this experimental feature is meant to crash rather than silently fall back to synchronous implementation.
Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
📚 Learning: in test code for qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified id ...
Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

  • lib/segment/src/index/hnsw_index/links_container.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
  • lib/segment/benches/hnsw_build_graph.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
  • lib/segment/benches/hnsw_build_asymptotic.rs
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
  • lib/segment/src/index/hnsw_index/gpu/mod.rs
  • lib/segment/benches/fixture.rs
  • lib/segment/src/index/hnsw_index/graph_layers.rs
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs
  • lib/segment/src/fixtures/index_fixtures.rs
  • lib/segment/src/index/hnsw_index/tests/mod.rs
📚 Learning: in the subgraph_connectivity method in lib/segment/src/index/hnsw_index/graph_layers_builder.rs, the...
Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:156-159
Timestamp: 2025-05-30T15:26:54.048Z
Learning: In the subgraph_connectivity method in lib/segment/src/index/hnsw_index/graph_layers_builder.rs, the parameter `q` represents the edge removal/failure probability, not the edge retention probability. The logic `if coin_flip < q { continue; }` correctly simulates edge failures for percolation-based connectivity estimation.

Applied to files:

  • lib/segment/src/index/hnsw_index/links_container.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
  • lib/segment/benches/hnsw_build_graph.rs
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
  • lib/segment/src/index/hnsw_index/graph_layers.rs
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs
  • lib/segment/src/index/hnsw_index/tests/mod.rs
📚 Learning: in the asyncrawscorerimpl implementation, the unwrap() call on read_vectors_async results is intenti...
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In the AsyncRawScorerImpl implementation, the unwrap() call on read_vectors_async results is intentional, with an explanatory comment noting that this experimental feature is meant to crash rather than silently fall back to synchronous implementation.

Applied to files:

  • lib/segment/src/index/hnsw_index/links_container.rs
  • lib/segment/benches/hnsw_build_graph.rs
  • lib/segment/benches/hnsw_build_asymptotic.rs
  • lib/segment/src/fixtures/index_fixtures.rs
📚 Learning: in asyncrawscorerimpl, the unwrap() call after read_vectors_async is intentional. the io_uring featu...
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.

Applied to files:

  • lib/segment/src/index/hnsw_index/links_container.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
  • lib/segment/src/fixtures/index_fixtures.rs
📚 Learning: in the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector a...
Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

  • lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
  • lib/segment/benches/hnsw_build_graph.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
  • lib/segment/src/index/hnsw_index/gpu/mod.rs
  • lib/segment/benches/fixture.rs
  • lib/segment/src/index/hnsw_index/graph_layers.rs
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs
  • lib/segment/src/index/hnsw_index/tests/mod.rs
📚 Learning: in the qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across gpu...
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.

Applied to files:

  • lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
  • lib/segment/src/index/hnsw_index/gpu/mod.rs
  • lib/segment/src/fixtures/index_fixtures.rs
📚 Learning: in the posting_list crate's postingchunk struct, avoid adding extra fields like storing computed chu...
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/builder.rs:63-67
Timestamp: 2025-05-26T14:47:23.505Z
Learning: In the posting_list crate's PostingChunk struct, avoid adding extra fields like storing computed chunk_bits to prevent struct bloat and maintain mmap compatibility. The bit calculations from offsets are inexpensive compared to the memory and compatibility benefits of keeping the struct minimal.

Applied to files:

  • lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
📚 Learning: in the blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers ...
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.

Applied to files:

  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
📚 Learning: in the bitpacking crate, the bitpacker::decompress_strictly_sorted function takes an option
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.

Applied to files:

  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
📚 Learning: in the qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measur...
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.

Applied to files:

  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
  • lib/segment/src/index/hnsw_index/gpu/mod.rs
📚 Learning: the rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_...
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.

Applied to files:

  • lib/segment/benches/hnsw_build_asymptotic.rs
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs
📚 Learning: the rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`...
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.

Applied to files:

  • lib/segment/benches/hnsw_build_asymptotic.rs
🧬 Code Graph Analysis (1)
lib/segment/src/index/hnsw_index/links_container.rs (2)
lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1)
  • async_raw_scorer_euclid (25-27)
lib/segment/src/spaces/simple.rs (1)
  • EuclidMetric (31-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: e2e-tests
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (windows-latest)
🔇 Additional comments (33)
lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs (1)

157-158: LGTM! Vector access properly refactored.

The change from direct vector indexing to the get_vector() method correctly encapsulates vector storage access, aligning with the PR objective to use VectorStorageEnum. The .to_vec() conversion handles the interface change appropriately.

lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2)

15-15: Import added for Distance enum.

The Distance import is correctly added to support the new non-generic API.


69-69: Clean migration to Distance enum API.

The change from generic CosineMetric parameter to explicit Distance::Cosine argument maintains the same functionality while using the new unified distance metric interface.

lib/segment/benches/hnsw_build_graph.rs (3)

11-11: Import updated for new API.

The import change correctly removes unused types (CosineMetric, VectorOffsetType) and adds the required Distance enum for the new interface.


21-21: TestRawScorerProducer construction updated correctly.

The change from generic CosineMetric parameter to explicit Distance::Cosine argument maintains the same distance metric while using the new unified API.


35-35: Vector access properly encapsulated.

The change from direct vector indexing with type casting to the get_vector() method call correctly uses the new vector storage abstraction. The .to_vec() conversion handles the interface change appropriately.

lib/segment/src/index/hnsw_index/links_container.rs (3)

264-264: Import correctly updated for Distance enum.

The import change from EuclidMetric to Distance is consistent with the refactoring to use the unified distance metric interface.


275-276: TestRawScorerProducer construction migrated correctly.

The change from generic EuclidMetric parameter to explicit Distance::Euclid argument maintains the same Euclidean distance metric while using the new unified API.


403-403: Consistent API migration in second test.

The TestRawScorerProducer construction correctly uses Distance::Euclid instead of the generic EuclidMetric parameter, maintaining consistency with the refactoring pattern.

lib/segment/benches/hnsw_build_asymptotic.rs (4)

13-13: Distance import added for new API.

The Distance import is correctly added to support the new unified distance metric interface.


88-88: TestRawScorerProducer construction updated to use Distance enum.

The change from generic DotProductMetric parameter to explicit Distance::Dot argument maintains the same distance metric while using the new unified API.


105-105: Consistent API migration for 10x benchmark.

The TestRawScorerProducer construction correctly uses Distance::Dot instead of the generic parameter, maintaining consistency with the refactoring pattern.


122-122: Consistent API migration for 50x benchmark.

The TestRawScorerProducer construction correctly uses Distance::Dot instead of the generic parameter, completing the consistent refactoring across all benchmark variants.

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (5)

486-486: LGTM!

The VectorRef import is correctly added to support the new vector insertion pattern using VectorRef::from() wrapper.


507-507: LGTM!

The removal of the generic metric parameter from TestRawScorerProducer field type is consistent with the broader refactor pattern.


520-521: LGTM!

The TestRawScorerProducer instantiation correctly uses the explicit Distance::Dot parameter instead of the generic metric parameter, aligning with the refactor pattern.


530-530: LGTM!

The vector insertion correctly uses VectorRef::from(v.as_ref()) wrapper, providing a unified interface for vector handling as part of the refactor.


544-544: LGTM!

The vector access correctly uses the get_vector() method instead of direct field access, providing better encapsulation and consistency with the refactored TestRawScorerProducer interface.

Also applies to: 750-753, 808-811, 974-977

lib/segment/src/index/hnsw_index/gpu/mod.rs (3)

102-102: LGTM!

The VectorRef import addition is correctly included to support the new vector insertion pattern.


115-115: LGTM!

The removal of the generic metric parameter from the vector_holder field type is consistent with the refactor pattern.


129-130: LGTM!

The changes correctly implement the refactor pattern:

  • TestRawScorerProducer instantiation uses explicit Distance::Cosine parameter
  • Vector insertion uses VectorRef::from() wrapper for unified interface
  • Vector access uses get_vector() method for better encapsulation

Also applies to: 139-139, 159-159

lib/segment/benches/fixture.rs (3)

28-28: LGTM!

The removal of the generic metric parameter from the return type is consistent with the TestRawScorerProducer refactor.


43-48: LGTM!

The TestRawScorerProducer instantiation correctly passes METRIC::distance() as an explicit parameter instead of using generic metric parameters, aligning with the refactor pattern.


67-67: LGTM!

The vector access correctly uses the get_vector() method instead of direct field access, providing better encapsulation consistent with the refactor.

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (7)

607-607: LGTM!

The Distance import addition is correctly included to support the refactored code that uses explicit Distance enum values.


613-619: LGTM!

The test helper function signatures correctly remove generic metric parameters and use explicit Distance parameters, with return types updated to match the refactored TestRawScorerProducer. This is consistent with the broader refactor pattern.

Also applies to: 660-666


633-633: LGTM!

The TestRawScorerProducer instantiations correctly use explicit Distance parameters (Distance::Cosine) instead of generic metric parameters, aligning with the refactor pattern.

Also applies to: 674-674, 875-875


651-651: LGTM!

The vector access correctly uses the get_vector() method instead of direct field access, providing better encapsulation consistent with the refactor pattern.

Also applies to: 690-690, 879-879


703-705: LGTM!

The test functions correctly:

  • Add necessary VectorStorage import for trait methods
  • Use explicit Distance::Cosine parameters instead of generic metric parameters
  • Pass distance parameters to test helper functions

These changes align with the refactor pattern.

Also applies to: 715-715, 773-775, 783-783, 786-786


748-748: LGTM!

The distance processing correctly:

  • Uses distance.preprocess_vector() for explicit distance preprocessing
  • Uses CosineMetric::similarity() for explicit similarity calculation

This properly handles the transition from generic metric operations to explicit distance-specific operations.

Also applies to: 754-754, 842-842, 848-848


750-751: LGTM!

The vector storage access correctly adapts to the new interface:

  • Uses vector_storage.total_vector_count() instead of direct vector count access
  • Uses get_vector() method for encapsulated vector access

This properly handles the transition to VectorStorageEnum in the refactored TestRawScorerProducer.

Also applies to: 844-845

lib/segment/src/index/hnsw_index/tests/mod.rs (1)

14-68: LGTM!

The refactoring of test fixture functions to use Distance enum instead of generic metric parameters is clean and consistent. The changes properly propagate the distance parameter through the call chain.

lib/segment/src/fixtures/index_fixtures.rs (1)

19-64: Well-executed refactoring to use VectorStorageEnum

The refactoring successfully achieves the PR objectives by:

  • Removing the generic metric parameter in favor of runtime Distance enum
  • Using VectorStorageEnum::DenseVolatile variant through the helper function
  • Properly encapsulating vector access through the new get_vector method
  • Maintaining test compatibility while reducing boilerplate

This should indeed make it easier to extend tests with quantization functionality as mentioned in the PR objectives.

@xzfc xzfc force-pushed the TestRawScorerProducer-VectorStorageEnum branch 3 times, most recently from 7ae8de7 to 888be0f Compare August 1, 2025 15:20
@xzfc xzfc force-pushed the TestRawScorerProducer-VectorStorageEnum branch from 888be0f to cc6834b Compare August 5, 2025 17:13
@xzfc xzfc merged commit 542f03b into dev Aug 6, 2025
16 checks passed
@xzfc xzfc deleted the TestRawScorerProducer-VectorStorageEnum branch August 6, 2025 09:41
timvisee pushed a commit that referenced this pull request Aug 11, 2025
* TestRawScorerProducer to use VectorStorageEnum

* TestRawScorerProducer::new: accept distance as an argument

* Drop TMetric from TestRawScorerProducer

* Remove `impl VectorStorage for TestRawScorerProducer`

* Add TestRawScorerProducer::internal_scorer

* Remove TestRawScorerProducer::get_vector

* Rename TestRawScorerProducer::{get_scorer => scorer}
@coderabbitai coderabbitai bot mentioned this pull request Nov 14, 2025
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants