TestRawScorerProducer to use VectorStorageEnum#6965
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/segment/src/fixtures/index_fixtures.rs (1)
51-56: Consider using a more direct approach for dense vector accessSince this test fixture is specifically for dense vectors, consider adding an assertion or using a more direct API if available.
If the vector storage enum provides a method to assert or access dense storage directly:
pub fn get_vector(&self, key: PointOffsetType) -> Cow<[VectorElementType]> { match self.vector_storage.get_vector(key) { CowVector::Dense(cow) => cow, - _ => unreachable!("Expected vector storage to be dense"), + _ => panic!("TestRawScorerProducer only supports dense vector storage"), } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
lib/segment/benches/fixture.rs(3 hunks)lib/segment/benches/hnsw_build_asymptotic.rs(4 hunks)lib/segment/benches/hnsw_build_graph.rs(3 hunks)lib/segment/src/fixtures/index_fixtures.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(8 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(4 hunks)lib/segment/src/index/hnsw_index/graph_layers.rs(7 hunks)lib/segment/src/index/hnsw_index/graph_layers_builder.rs(10 hunks)lib/segment/src/index/hnsw_index/links_container.rs(3 hunks)lib/segment/src/index/hnsw_index/tests/mod.rs(3 hunks)lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In the AsyncRawScorerImpl implementation, the unwrap() call on read_vectors_async results is intentional, with an explanatory comment noting that this experimental feature is meant to crash rather than silently fall back to synchronous implementation.
Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
📚 Learning: in test code for qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified id ...
Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Applied to files:
lib/segment/src/index/hnsw_index/links_container.rslib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rslib/segment/benches/hnsw_build_graph.rslib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rslib/segment/benches/hnsw_build_asymptotic.rslib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rslib/segment/src/index/hnsw_index/gpu/mod.rslib/segment/benches/fixture.rslib/segment/src/index/hnsw_index/graph_layers.rslib/segment/src/index/hnsw_index/graph_layers_builder.rslib/segment/src/fixtures/index_fixtures.rslib/segment/src/index/hnsw_index/tests/mod.rs
📚 Learning: in the subgraph_connectivity method in lib/segment/src/index/hnsw_index/graph_layers_builder.rs, the...
Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:156-159
Timestamp: 2025-05-30T15:26:54.048Z
Learning: In the subgraph_connectivity method in lib/segment/src/index/hnsw_index/graph_layers_builder.rs, the parameter `q` represents the edge removal/failure probability, not the edge retention probability. The logic `if coin_flip < q { continue; }` correctly simulates edge failures for percolation-based connectivity estimation.
Applied to files:
lib/segment/src/index/hnsw_index/links_container.rslib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rslib/segment/benches/hnsw_build_graph.rslib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rslib/segment/src/index/hnsw_index/graph_layers.rslib/segment/src/index/hnsw_index/graph_layers_builder.rslib/segment/src/index/hnsw_index/tests/mod.rs
📚 Learning: in the asyncrawscorerimpl implementation, the unwrap() call on read_vectors_async results is intenti...
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In the AsyncRawScorerImpl implementation, the unwrap() call on read_vectors_async results is intentional, with an explanatory comment noting that this experimental feature is meant to crash rather than silently fall back to synchronous implementation.
Applied to files:
lib/segment/src/index/hnsw_index/links_container.rslib/segment/benches/hnsw_build_graph.rslib/segment/benches/hnsw_build_asymptotic.rslib/segment/src/fixtures/index_fixtures.rs
📚 Learning: in asyncrawscorerimpl, the unwrap() call after read_vectors_async is intentional. the io_uring featu...
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.
Applied to files:
lib/segment/src/index/hnsw_index/links_container.rslib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rslib/segment/src/fixtures/index_fixtures.rs
📚 Learning: in the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector a...
Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rslib/segment/benches/hnsw_build_graph.rslib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rslib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rslib/segment/src/index/hnsw_index/gpu/mod.rslib/segment/benches/fixture.rslib/segment/src/index/hnsw_index/graph_layers.rslib/segment/src/index/hnsw_index/graph_layers_builder.rslib/segment/src/index/hnsw_index/tests/mod.rs
📚 Learning: in the qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across gpu...
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rslib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rslib/segment/src/index/hnsw_index/gpu/mod.rslib/segment/src/fixtures/index_fixtures.rs
📚 Learning: in the posting_list crate's postingchunk struct, avoid adding extra fields like storing computed chu...
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/builder.rs:63-67
Timestamp: 2025-05-26T14:47:23.505Z
Learning: In the posting_list crate's PostingChunk struct, avoid adding extra fields like storing computed chunk_bits to prevent struct bloat and maintain mmap compatibility. The bit calculations from offsets are inexpensive compared to the memory and compatibility benefits of keeping the struct minimal.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs
📚 Learning: in the blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers ...
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
📚 Learning: in the bitpacking crate, the bitpacker::decompress_strictly_sorted function takes an option
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs📚 Learning: in the qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measur...
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rslib/segment/src/index/hnsw_index/gpu/mod.rs
📚 Learning: the rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_...
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.
Applied to files:
lib/segment/benches/hnsw_build_asymptotic.rslib/segment/src/index/hnsw_index/graph_layers_builder.rs
📚 Learning: the rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`...
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.
Applied to files:
lib/segment/benches/hnsw_build_asymptotic.rs
🧬 Code Graph Analysis (1)
lib/segment/src/index/hnsw_index/links_container.rs (2)
lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1)
async_raw_scorer_euclid(25-27)lib/segment/src/spaces/simple.rs (1)
EuclidMetric(31-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (windows-latest)
🔇 Additional comments (33)
lib/segment/src/index/hnsw_index/gpu/gpu_graph_builder.rs (1)
157-158: LGTM! Vector access properly refactored.The change from direct vector indexing to the
get_vector()method correctly encapsulates vector storage access, aligning with the PR objective to useVectorStorageEnum. The.to_vec()conversion handles the interface change appropriately.lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2)
15-15: Import added for Distance enum.The
Distanceimport is correctly added to support the new non-generic API.
69-69: Clean migration to Distance enum API.The change from generic
CosineMetricparameter to explicitDistance::Cosineargument maintains the same functionality while using the new unified distance metric interface.lib/segment/benches/hnsw_build_graph.rs (3)
11-11: Import updated for new API.The import change correctly removes unused types (
CosineMetric,VectorOffsetType) and adds the requiredDistanceenum for the new interface.
21-21: TestRawScorerProducer construction updated correctly.The change from generic
CosineMetricparameter to explicitDistance::Cosineargument maintains the same distance metric while using the new unified API.
35-35: Vector access properly encapsulated.The change from direct vector indexing with type casting to the
get_vector()method call correctly uses the new vector storage abstraction. The.to_vec()conversion handles the interface change appropriately.lib/segment/src/index/hnsw_index/links_container.rs (3)
264-264: Import correctly updated for Distance enum.The import change from
EuclidMetrictoDistanceis consistent with the refactoring to use the unified distance metric interface.
275-276: TestRawScorerProducer construction migrated correctly.The change from generic
EuclidMetricparameter to explicitDistance::Euclidargument maintains the same Euclidean distance metric while using the new unified API.
403-403: Consistent API migration in second test.The
TestRawScorerProducerconstruction correctly usesDistance::Euclidinstead of the genericEuclidMetricparameter, maintaining consistency with the refactoring pattern.lib/segment/benches/hnsw_build_asymptotic.rs (4)
13-13: Distance import added for new API.The
Distanceimport is correctly added to support the new unified distance metric interface.
88-88: TestRawScorerProducer construction updated to use Distance enum.The change from generic
DotProductMetricparameter to explicitDistance::Dotargument maintains the same distance metric while using the new unified API.
105-105: Consistent API migration for 10x benchmark.The
TestRawScorerProducerconstruction correctly usesDistance::Dotinstead of the generic parameter, maintaining consistency with the refactoring pattern.
122-122: Consistent API migration for 50x benchmark.The
TestRawScorerProducerconstruction correctly usesDistance::Dotinstead of the generic parameter, completing the consistent refactoring across all benchmark variants.lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (5)
486-486: LGTM!The
VectorRefimport is correctly added to support the new vector insertion pattern usingVectorRef::from()wrapper.
507-507: LGTM!The removal of the generic metric parameter from
TestRawScorerProducerfield type is consistent with the broader refactor pattern.
520-521: LGTM!The
TestRawScorerProducerinstantiation correctly uses the explicitDistance::Dotparameter instead of the generic metric parameter, aligning with the refactor pattern.
530-530: LGTM!The vector insertion correctly uses
VectorRef::from(v.as_ref())wrapper, providing a unified interface for vector handling as part of the refactor.
544-544: LGTM!The vector access correctly uses the
get_vector()method instead of direct field access, providing better encapsulation and consistency with the refactoredTestRawScorerProducerinterface.Also applies to: 750-753, 808-811, 974-977
lib/segment/src/index/hnsw_index/gpu/mod.rs (3)
102-102: LGTM!The
VectorRefimport addition is correctly included to support the new vector insertion pattern.
115-115: LGTM!The removal of the generic metric parameter from the
vector_holderfield type is consistent with the refactor pattern.
129-130: LGTM!The changes correctly implement the refactor pattern:
TestRawScorerProducerinstantiation uses explicitDistance::Cosineparameter- Vector insertion uses
VectorRef::from()wrapper for unified interface- Vector access uses
get_vector()method for better encapsulationAlso applies to: 139-139, 159-159
lib/segment/benches/fixture.rs (3)
28-28: LGTM!The removal of the generic metric parameter from the return type is consistent with the
TestRawScorerProducerrefactor.
43-48: LGTM!The
TestRawScorerProducerinstantiation correctly passesMETRIC::distance()as an explicit parameter instead of using generic metric parameters, aligning with the refactor pattern.
67-67: LGTM!The vector access correctly uses the
get_vector()method instead of direct field access, providing better encapsulation consistent with the refactor.lib/segment/src/index/hnsw_index/graph_layers_builder.rs (7)
607-607: LGTM!The
Distanceimport addition is correctly included to support the refactored code that uses explicitDistanceenum values.
613-619: LGTM!The test helper function signatures correctly remove generic metric parameters and use explicit
Distanceparameters, with return types updated to match the refactoredTestRawScorerProducer. This is consistent with the broader refactor pattern.Also applies to: 660-666
633-633: LGTM!The
TestRawScorerProducerinstantiations correctly use explicitDistanceparameters (Distance::Cosine) instead of generic metric parameters, aligning with the refactor pattern.Also applies to: 674-674, 875-875
651-651: LGTM!The vector access correctly uses the
get_vector()method instead of direct field access, providing better encapsulation consistent with the refactor pattern.Also applies to: 690-690, 879-879
703-705: LGTM!The test functions correctly:
- Add necessary
VectorStorageimport for trait methods- Use explicit
Distance::Cosineparameters instead of generic metric parameters- Pass distance parameters to test helper functions
These changes align with the refactor pattern.
Also applies to: 715-715, 773-775, 783-783, 786-786
748-748: LGTM!The distance processing correctly:
- Uses
distance.preprocess_vector()for explicit distance preprocessing- Uses
CosineMetric::similarity()for explicit similarity calculationThis properly handles the transition from generic metric operations to explicit distance-specific operations.
Also applies to: 754-754, 842-842, 848-848
750-751: LGTM!The vector storage access correctly adapts to the new interface:
- Uses
vector_storage.total_vector_count()instead of direct vector count access- Uses
get_vector()method for encapsulated vector accessThis properly handles the transition to
VectorStorageEnumin the refactoredTestRawScorerProducer.Also applies to: 844-845
lib/segment/src/index/hnsw_index/tests/mod.rs (1)
14-68: LGTM!The refactoring of test fixture functions to use
Distanceenum instead of generic metric parameters is clean and consistent. The changes properly propagate the distance parameter through the call chain.lib/segment/src/fixtures/index_fixtures.rs (1)
19-64: Well-executed refactoring to use VectorStorageEnumThe refactoring successfully achieves the PR objectives by:
- Removing the generic metric parameter in favor of runtime Distance enum
- Using VectorStorageEnum::DenseVolatile variant through the helper function
- Properly encapsulating vector access through the new get_vector method
- Maintaining test compatibility while reducing boilerplate
This should indeed make it easier to extend tests with quantization functionality as mentioned in the PR objectives.
7ae8de7 to
888be0f
Compare
888be0f to
cc6834b
Compare
* TestRawScorerProducer to use VectorStorageEnum
* TestRawScorerProducer::new: accept distance as an argument
* Drop TMetric from TestRawScorerProducer
* Remove `impl VectorStorage for TestRawScorerProducer`
* Add TestRawScorerProducer::internal_scorer
* Remove TestRawScorerProducer::get_vector
* Rename TestRawScorerProducer::{get_scorer => scorer}
This PR makes
TestRawScorerProducerto useVectorStorageEnum::DenseVolatileinstead of rolling its ownimpl DenseVectorStorage for TestRawScorerProducer.It reduces some boilerplate, and, more important, this change would simplify unit testing the upcoming hnsw-graph-with-vectors feature. (particularly, having an instance of
VectorStorageEnumwould simplify extending some existing tests to use quantization).