Conversation
📝 WalkthroughWalkthroughThe diff genericizes vector/value access around an AccessPattern selector: adds an AccessPattern trait with marker types Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
36fdae4 to
12a7ee8
Compare
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/gridstore/src/gridstore.rs (1)
238-263: Updateget_valuecall sites and alias boolean consts for clarity
- Critical: In
lib/gridstore/src/gridstore.rs, addthen update every existingpub const READ_RANDOM: bool = false; pub const READ_SEQUENTIAL: bool = true;.get_value()invocation flagged by the verification script (acrosslib/shard,lib/segment,lib/collection, etc.) to include the const generic—e.g.orstorage.get_value::<{READ_RANDOM}>(point, &hw_counter)to resolve compilation errors.storage.get_value::<{READ_SEQUENTIAL}>(point, &hw_counter)- Optional refactor: Replace the bool const generic with a trait-bound API—e.g.
for improved readability and consistency.pub fn get_value<P: AccessPattern>(&self, …) -> Option<V> { let raw = self.read_from_pages::<{P::SEQUENTIAL}>(…); … }- Optional refactor: Prevent panics on LZ4 corruption by propagating decompression errors (e.g. return a
Resultor map failures toNonewith logging) instead of unwrapping insidedecompress.lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (1)
3-3: Fix missing import for size_of.This file calls size_of::() but only imports MaybeUninit; this won’t compile.
Apply:
-use std::mem::MaybeUninit; +use std::mem::{size_of, MaybeUninit};
🧹 Nitpick comments (33)
lib/gridstore/src/gridstore.rs (2)
717-748: Add at least one test path that exercises sequential reads via the new API.Most reads in tests use
::<false>. To cover the new public API’s sequential variant, add a small readback using::<true>(or a named const) in an already sequential access loop (e.g., right after initial inserts intest_put_payload). This guards the::<true>specialization and future refactors ofPage::read_value::<true>.
238-263: Cheap safeguard against panic on corrupt payloads.While broader error-plumbing is bigger in scope, you can non-invasively replace
unwrap()in the LZ4 path withok()?and returnNonefromget_valueon decompression failure, preserving theOption<V>API. Sketch:- fn decompress_lz4(value: &[u8]) -> Vec<u8> { - lz4_flex::decompress_size_prepended(value).unwrap() - } + fn decompress_lz4(value: &[u8]) -> Option<Vec<u8>> { + lz4_flex::decompress_size_prepended(value).ok() + } @@ - fn decompress(&self, value: Vec<u8>) -> Vec<u8> { + fn decompress(&self, value: Vec<u8>) -> Option<Vec<u8>> { match self.config.compression { - Compression::None => value, - Compression::LZ4 => decompress_lz4(&value), + Compression::None => Some(value), + Compression::LZ4 => decompress_lz4(&value), } } @@ - let decompressed = self.decompress(raw); - let value = V::from_bytes(&decompressed); - Some(value) + let decompressed = self.decompress(raw)?; + Some(V::from_bytes(&decompressed))Repeat the same
Optionhandling indelete_value,iter, andfor_each_unfiltered. This avoids crashes on rare corruption without widening the public API.lib/segment/src/segment_constructor/batched_reader.rs (1)
83-91: Make Sequential access actually sequential within a segmentCurrently points per segment aren’t ordered by internal_id, so IO may still be scattered. Sort per-segment to maximize locality while preserving buffer positions.
- for (segment_index, points) in self.seg_to_points_buffer.drain() { + for (segment_index, mut points) in self.seg_to_points_buffer.drain() { + points.sort_unstable_by_key(|(pd, _)| pd.internal_id); let source_vector_storage = &self.source_vector_storages[segment_index.get() as usize]; for (point_data, offset_in_batch) in points { let vec = source_vector_storage.get_vector::<Sequential>(point_data.internal_id); let vector_deleted = source_vector_storage.is_deleted_vector(point_data.internal_id); self.buffer[offset_in_batch] = (vec, vector_deleted); } }lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (1)
76-86: Confirm batch path doesn’t need AccessPattern.score_stored_batch uses get_dense_batch(..) without an AccessPattern. If batch reads can benefit from Sequential, consider a P: AccessPattern parameter or document that batching already optimizes for locality internally.
lib/segment/tests/integration/sparse_vector_index_search_tests.rs (1)
33-35: Redundant #[cfg(test)] on an integration test import.This file only builds under tests; the guard is unnecessary and slightly obscures intent. Optional cleanup: drop #[cfg(test)] here.
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)
87-87: Makeget_dense_batchgeneric over the access pattern
Currently,get_dense_batchisn’t generic, so you can’t callget_dense_batch::<Random>. Add a<Pattern>type parameter to its signature invector_storage_base.rs,appendable_dense_vector_storage.rs, andmemmap_dense_vector_storage.rs, then update the call atcustom_query_scorer.rs:87toget_dense_batch::<Random>(…).lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (1)
97-97: Optional: replace the per‐indexget_multi::<Random>(idx)calls with a single genericget_batch_multi::<Random>(&keys)invocation to leverage the batch API.lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
17-17: UseSequentialhere for strictly linear iteration over IDs.This loop walks
0..total_vector_count(); hintingSequentiallets storages optimize prefetching and amortize per-call overhead.Apply:
- use crate::vector_storage::{MultiVectorStorage, Random}; + use crate::vector_storage::{MultiVectorStorage, Sequential}; @@ - .get_multi::<Random>(id as PointOffsetType) + .get_multi::<Sequential>(id as PointOffsetType)Also applies to: 92-94
lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
1054-1054: Deduplicate per-functionuse Sequentialimports (optional).Move
use crate::vector_storage::Sequential;to module scope under the same#[cfg(feature = "rocksdb")]to avoid repetition.Apply outside-range module import:
+#[cfg(feature = "rocksdb")] +use crate::vector_storage::Sequential;Then remove in-function imports:
- use crate::vector_storage::Sequential;Also applies to: 1143-1143, 1281-1281
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1)
596-601: Use Sequential when only counting multivector lengths.These loops walk ids monotonically to sum vectors_count(). Hinting Sequential lets MMAP backends leverage readahead with zero downside.
Apply:
- .get_multi::<Random>(id as PointOffsetType) + .get_multi::<Sequential>(id as PointOffsetType)in:
- new_multi_f32: lines around 596–603
- new_multi_f16: lines around 629–633
- new_multi: lines around 656–660
Also applies to: 629-633, 656-660
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
622-624: Prefer Sequential for multivector offsets scan.Offsets are computed via 0..vectors_count; Sequential improves locality when reading metadata.
- .map(|idx| vector_storage.get_multi::<Random>(idx).vectors_count() as PointOffsetType) + .map(|idx| vector_storage.get_multi::<Sequential>(idx).vectors_count() as PointOffsetType)lib/collection/src/collection/mmr/lazy_matrix.rs (1)
36-36: Avoid truncation: use fallible conversion for index.Prevent silent usize→u32 truncation in debug runs.
- let stored_vector = storage.get_vector::<Random>(i as u32); + let stored_vector = + storage.get_vector::<Random>(u32::try_from(i).expect("index overflow"));lib/segment/src/segment/segment_ops.rs (1)
381-388: Defensive read: prefer get_vector_opt to avoid panic on rare races/corruption.You bounded-checked, but using
_opt(...).expect(...)makes intent explicit and safer against unexpected backends.- let vector = vector_storage.get_vector::<Random>(point_offset); + let vector = vector_storage + .get_vector_opt::<Random>(point_offset) + .expect("vector missing despite prior bounds check"); if vector_storage.is_on_disk() { hw_counter .vector_io_read() .incr_delta(vector.estimate_size_in_bytes()); }lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)
321-325: Use Sequential for linear scans in tests.These loops read vectors in ascending order; Sequential hints can improve read-ahead on mmap backends.
-use crate::vector_storage::{ - DEFAULT_STOPPED, MultiVectorStorage, Random, VectorStorage, VectorStorageEnum, -}; +use crate::vector_storage::{ + DEFAULT_STOPPED, MultiVectorStorage, Random, Sequential, VectorStorage, VectorStorageEnum, +}; @@ - assert!( - storage - .get_vector_opt::<Random>(id as PointOffsetType) - .is_some() - ); + assert!( + storage + .get_vector_opt::<Sequential>(id as PointOffsetType) + .is_some() + ); @@ - assert!( - storage - .get_vector_opt::<Random>(id as PointOffsetType) - .is_some() - ); + assert!( + storage + .get_vector_opt::<Sequential>(id as PointOffsetType) + .is_some() + ); @@ - assert!( - storage - .get_vector_opt::<Random>(id as PointOffsetType) - .is_some() - ); + assert!( + storage + .get_vector_opt::<Sequential>(id as PointOffsetType) + .is_some() + ); @@ - assert!( - storage - .get_vector_opt::<Random>(id as PointOffsetType) - .is_some() - ); + assert!( + storage + .get_vector_opt::<Sequential>(id as PointOffsetType) + .is_some() + );Also applies to: 353-357, 400-404, 421-425
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)
88-92: Micro: add inline and unify expect message.Tiny perf nit; message parity with get_vector.
- fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] { + #[inline(always)] + fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] { self.vectors .get::<P>(key as VectorOffsetType) - .expect("mmap vector not found") + .expect("vector not found") }lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
45-47: Hot path: consider force-inlining.These getters are called frequently.
- #[inline] + #[inline(always)] fn get<P: AccessPattern>(&self, key: VectorOffsetType) -> Option<&[T]> { self.mmap_storage.get::<P>(key) }
96-98: Hot path: consider force-inlining.Same rationale for batched contiguous reads.
- #[inline] + #[inline(always)] fn get_many<P: AccessPattern>(&self, key: VectorOffsetType, count: usize) -> Option<&[T]> { self.mmap_storage.get_many::<P>(key, count) }lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
133-141: Add alignment guard around unsafe transmute.This function assumes
offsetis aligned forT. Add a debug assertion to prevent UB ifTever changes (e.g., non-4-byte alignment).fn raw_vector_offset<P: AccessPattern>(&self, offset: usize) -> &[T] { let mmap = if P::IS_SEQUENTIAL { &self.mmap_sequential } else { &self.mmap }; - let byte_slice = &mmap[offset..(offset + self.raw_size())]; - let arr: &[T] = unsafe { transmute(byte_slice) }; + let byte_slice = &mmap[offset..(offset + self.raw_size())]; + debug_assert_eq!(offset % std::mem::align_of::<T>(), 0, "unaligned offset for T"); + let arr: &[T] = unsafe { transmute(byte_slice) }; &arr[0..self.dim] }lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (2)
83-85: get_dense: P is unused; acceptable but may monomorphize twice.
Optional: delegate to a private non-generic
_get_denseto avoid codegen duplication acrossRandom/Sequential.- fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] { - self.vectors.get(key as VectorOffsetType) - } + fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] { + self._get_dense(key) + } + +impl<T: PrimitiveVectorElement> VolatileDenseVectorStorage<T> { + #[inline] + fn _get_dense(&self, key: PointOffsetType) -> &[T] { + self.vectors.get(key as VectorOffsetType) + } +}
105-115: get_vector/get_vector_opt
: same P-unaware semantics; fine.
Note: identical codegen for Random/Sequential; consider the same private-delegate pattern if code size matters.
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (2)
236-239: Clarify that P is intentionally ignored for in-RAM reads.Add a short doc to avoid confusion for future readers.
impl<T: PrimitiveVectorElement> DenseVectorStorage<T> for SimpleDenseVectorStorage<T> { fn vector_dim(&self) -> usize { self.dim } - fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] { + /// In-memory storage; AccessPattern `P` does not change behavior. + fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] { self.vectors.get(key as VectorOffsetType) } }
420-422: Add a Random-path assertion for parity.Since RAM storage ignores access pattern, validate both markers.
- assert_eq!( - new_storage.get_vector::<Sequential>(internal_id), - CowVector::from(point), - ); + let got_seq = new_storage.get_vector::<Sequential>(internal_id); + assert_eq!(got_seq, CowVector::from(point)); + let got_rand = new_storage.get_vector::<crate::vector_storage::Random>(internal_id); + assert_eq!(got_rand, got_seq);lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (1)
189-204: No sequential optimizations in RAM: consider an explicit note.- ) -> Option<TypedMultiDenseVectorRef<'_, T>> { - // No sequential optimizations available for in memory storage. + ) -> Option<TypedMultiDenseVectorRef<'_, T>> { + // In-memory storage; AccessPattern `P` does not affect behavior.lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (2)
343-358: Clarify AccessPattern irrelevance for RAM.- ) -> Option<TypedMultiDenseVectorRef<'_, T>> { - // No sequential optimizations available for in memory storage. + ) -> Option<TypedMultiDenseVectorRef<'_, T>> { + // In-memory storage; AccessPattern `P` does not affect behavior.
572-574: Consider adding Random-path parity assertion.- assert_eq!( - new_storage.get_vector::<Sequential>(internal_id), - CowVector::from(&multivec), - ); + let got_seq = new_storage.get_vector::<Sequential>(internal_id); + assert_eq!(got_seq, CowVector::from(&multivec)); + let got_rand = new_storage.get_vector::<crate::vector_storage::Random>(internal_id); + assert_eq!(got_rand, got_seq);lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)
150-159: Doc nit: method isn’t “already in memory.”This storage uses RocksDB; wording can confuse. Suggest neutral wording.
- fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> { - // Already in memory, so no sequential optimizations available. + fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> { + // No access-pattern optimizations available for this backend.
357-359: Optional: assert Random parity for non-deleted cases.- assert_eq!( - new_storage.get_vector::<Sequential>(internal_id), - CowVector::from(vector), - ); + let got_seq = new_storage.get_vector::<Sequential>(internal_id); + assert_eq!(got_seq, CowVector::from(vector)); + let got_rand = new_storage.get_vector::<crate::vector_storage::Random>(internal_id); + assert_eq!(got_rand, got_seq);lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
415-416: Add Sequential-path test to cover both branches.- let result_vector = storage.get_vector::<Random>(0); + let result_vector = storage.get_vector::<Random>(0); + let result_vector_seq = storage.get_vector::<crate::vector_storage::Sequential>(0); + assert_eq!(result_vector_seq, result_vector);lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
134-148: Add a debug assertion to catch offset corruption early.While keys iterate over 0..len(), a lightweight bound check helps catch rare offset/count corruption before indexing vectors.
let mmap_offset = self .offsets .get::<Sequential>(key as VectorOffsetType) .unwrap() .first() .unwrap(); + debug_assert!( + (mmap_offset.offset as usize + mmap_offset.count as usize) <= self.vectors.len(), + "multivector offset range is out of bounds" + ); (0..mmap_offset.count).map(|i| { self.vectors .get::<Sequential>((mmap_offset.offset + i) as VectorOffsetType) .unwrap() })lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)
194-201: Avoid unnecessary read in delete_vector().update_stored() ignores the provided vector when deleted == true (elements_added guarded by !deleted). Skip the extra lookup/clone to reduce overhead.
- if is_deleted { - let old_vector = self.get_sparse_opt::<Random>(key).ok().flatten(); - self.update_stored(key, true, old_vector.as_ref()); - } + if is_deleted { + self.update_stored(key, true, None); + }lib/segment/src/vector_storage/vector_storage_base.rs (3)
40-56: Seal AccessPattern to prevent external impls.Limiting implementers to Random/Sequential avoids misuse and future semver hazards while keeping call sites generic.
-pub trait AccessPattern: Copy { +mod __sealed { + pub trait Sealed {} +} + +pub trait AccessPattern: __sealed::Sealed + Copy { const IS_SEQUENTIAL: bool; } #[derive(Copy, Clone)] pub struct Random; #[derive(Copy, Clone)] pub struct Sequential; -impl AccessPattern for Random { +impl __sealed::Sealed for Random {} +impl __sealed::Sealed for Sequential {} + +impl AccessPattern for Random { const IS_SEQUENTIAL: bool = false; } -impl AccessPattern for Sequential { +impl AccessPattern for Sequential { const IS_SEQUENTIAL: bool = true; }
158-164: Prefer Sequential in get_dense_batch() for consecutive keys.The doc says keys are consecutive; sequential hint should yield better OS readahead on mmap-backed stores.
- let iter = keys.iter().map(|key| self.get_dense::<Random>(*key)); + let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));
191-195: Prefer Sequential in get_batch_multi() for consecutive keys.Same rationale as dense batch.
- let iter = keys.iter().map(|key| self.get_multi::<Random>(*key)); + let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (49)
lib/collection/src/collection/mmr/lazy_matrix.rs(2 hunks)lib/gridstore/benches/bustle_bench/payload_storage.rs(1 hunks)lib/gridstore/benches/random_data_bench.rs(1 hunks)lib/gridstore/benches/real_data_bench.rs(1 hunks)lib/gridstore/src/gridstore.rs(16 hunks)lib/segment/benches/sparse_vector_storage.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs(7 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(2 hunks)lib/segment/src/index/hnsw_index/graph_layers_healer.rs(2 hunks)lib/segment/src/index/hnsw_index/point_scorer.rs(2 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/payload_storage/mmap_payload_storage.rs(5 hunks)lib/segment/src/segment/segment_ops.rs(3 hunks)lib/segment/src/segment_constructor/batched_reader.rs(2 hunks)lib/segment/src/segment_constructor/segment_constructor_base.rs(6 hunks)lib/segment/src/vector_storage/chunked_mmap_vectors.rs(3 hunks)lib/segment/src/vector_storage/chunked_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(11 hunks)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs(4 hunks)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/in_ram_persisted_vectors.rs(3 hunks)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs(1 hunks)lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs(2 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs(4 hunks)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(2 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/vector_storage_base.rs(7 hunks)lib/segment/tests/integration/sparse_vector_index_search_tests.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (20)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/segment/src/payload_storage/mmap_payload_storage.rslib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rslib/segment/src/vector_storage/quantized/quantized_multivector_storage.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/index/hnsw_index/graph_layers_healer.rslib/segment/tests/integration/sparse_vector_index_search_tests.rslib/segment/src/index/sparse_index/sparse_vector_index.rslib/segment/benches/sparse_vector_storage.rslib/segment/src/index/hnsw_index/gpu/mod.rslib/segment/src/vector_storage/tests/async_raw_scorer.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rslib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rslib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/chunked_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rslib/segment/src/vector_storage/in_ram_persisted_vectors.rslib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rslib/segment/src/segment/segment_ops.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/segment/src/payload_storage/mmap_payload_storage.rslib/segment/src/vector_storage/quantized/quantized_multivector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rslib/segment/src/vector_storage/in_ram_persisted_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/segment/segment_ops.rs
📚 Learning: 2025-08-06T09:56:59.311Z
Learnt from: xzfc
PR: qdrant/qdrant#6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rslib/segment/src/index/hnsw_index/graph_layers_healer.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rslib/segment/src/segment_constructor/batched_reader.rslib/segment/src/vector_storage/quantized/quantized_multivector_storage.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rslib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/tests/integration/sparse_vector_index_search_tests.rslib/segment/src/index/sparse_index/sparse_vector_index.rslib/segment/benches/sparse_vector_storage.rslib/segment/src/index/hnsw_index/gpu/mod.rslib/segment/src/vector_storage/tests/async_raw_scorer.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rslib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/in_ram_persisted_vectors.rslib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rslib/segment/src/segment/segment_ops.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/segment/src/segment_constructor/batched_reader.rslib/segment/src/vector_storage/quantized/quantized_multivector_storage.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rslib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/index/hnsw_index/graph_layers_healer.rslib/segment/tests/integration/sparse_vector_index_search_tests.rslib/segment/src/index/sparse_index/sparse_vector_index.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/segment/src/vector_storage/query_scorer/metric_query_scorer.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/index/hnsw_index/point_scorer.rslib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
📚 Learning: 2025-08-21T13:45:05.899Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.
Applied to files:
lib/segment/src/segment_constructor/batched_reader.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.
Applied to files:
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
📚 Learning: 2025-05-15T22:54:30.292Z
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/index/sparse_index/sparse_vector_index.rslib/segment/benches/sparse_vector_storage.rslib/segment/src/index/hnsw_index/gpu/mod.rslib/segment/src/vector_storage/query_scorer/metric_query_scorer.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/segment/segment_ops.rs
📚 Learning: 2025-05-15T22:54:30.292Z
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This was part of an API redesign in rand 0.9.0 released in February 2024.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/index/sparse_index/sparse_vector_index.rslib/segment/src/index/hnsw_index/gpu/mod.rslib/segment/src/segment/segment_ops.rs
📚 Learning: 2025-05-15T22:54:30.292Z
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/index/hnsw_index/gpu/mod.rslib/segment/src/segment/segment_ops.rs
📚 Learning: 2025-08-21T13:43:52.191Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-21T13:46:10.866Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/gridstore/src/gridstore.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rslib/segment/src/segment/segment_ops.rs
📚 Learning: 2025-08-14T11:31:21.777Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-11T12:30:47.220Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7009
File: lib/quantization/src/encoded_vectors_binary.rs:614-618
Timestamp: 2025-08-11T12:30:47.220Z
Learning: In `lib/quantization/src/encoded_vectors_binary.rs`, the `encode_two_bits_value` function has two paths: when `vector_stats` is available, it uses mean/stddev for accurate z-score based quantization boundaries; when stats are None (for appendable BQ), it falls back to a simple sign-based heuristic (positive -> (true, true), non-positive -> (false, false)). Having stats available provides higher accuracy for storage encoding.
Applied to files:
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
📚 Learning: 2025-07-02T17:10:13.847Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Applied to files:
lib/gridstore/benches/real_data_bench.rslib/segment/src/index/hnsw_index/graph_layers_healer.rslib/gridstore/benches/random_data_bench.rs
📚 Learning: 2025-05-30T15:26:14.488Z
Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.
Applied to files:
lib/segment/src/index/hnsw_index/graph_layers_healer.rs
📚 Learning: 2025-07-11T11:35:21.549Z
Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Applied to files:
lib/segment/src/index/sparse_index/sparse_vector_index.rs
📚 Learning: 2025-04-22T23:19:51.232Z
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.
Applied to files:
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
📚 Learning: 2025-05-19T14:40:20.068Z
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Applied to files:
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
Show resolved
Hide resolved
|
Is there any advantage of this, apart from making access pattern explicit? |
|
ok, I presume less code as well |
Less code; and it's more convenient to base new methods1 on this to get both options at once. Footnotes
|
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
533-539: Sequential read is good for full scan; also avoid usize→PointOffsetType overflow.Iterate over
0..countand convert each index withtry_into()to prevent truncation.- let vectors = (0..count as PointOffsetType).map(|i| { + let vectors = (0..count).map(|i| { + let i: PointOffsetType = i.try_into().expect("vector_count exceeds PointOffsetType"); PrimitiveVectorElement::quantization_preprocess( quantization_config, distance, - vector_storage.get_dense::<Sequential>(i), + vector_storage.get_dense::<Sequential>(i), ) });lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
64-80: Bounds check uses vectors.len() instead of offsets.len().
set_deletedshould validate keys against the number of multivectors (offsets), not the number of flattened inner vectors. Current check can incorrectly allow clearing deletion on a non-existent key.Fix:
- if !deleted && self.vectors.len() <= key as usize { + if !deleted && self.offsets.len() <= key as usize { return false; }
🧹 Nitpick comments (9)
lib/collection/src/collection/mmr/lazy_matrix.rs (1)
36-36: Avoid lossy cast in debug assertion index.Use a checked conversion to PointOffsetType to prevent silent truncation on large collections.
- let stored_vector = storage.get_vector::<Random>(i as u32); + let idx: PointOffsetType = i.try_into().expect("index overflow"); + let stored_vector = storage.get_vector::<Random>(idx);lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
20-22: Remove unnecessarySelf: Sizedbound for consistency.The trait isn’t object-safe anyway due to generics; keep both methods consistent.
- fn get<P: AccessPattern>(&self, key: VectorOffsetType) -> Option<&[T]> - where - Self: Sized; + fn get<P: AccessPattern>(&self, key: VectorOffsetType) -> Option<&[T]>;lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
128-133: Prefer explicit Cow::from over into().Align with repo guideline to avoid x.into() in new code.
- .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into()))) + .map(|slice| CowVector::from(T::slice_to_float_cow(Cow::from(slice))))
135-139: Prefer explicit Cow::from over into().- .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into()))) + .map(|slice| CowVector::from(T::slice_to_float_cow(Cow::from(slice))))lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
182-189: Use Cow::from instead of into() chains.Keeps conversions explicit per style guide.
- .get_vector_opt::<P>(key) - .map(|vector| T::slice_to_float_cow(vector.into()).into()) + .get_vector_opt::<P>(key) + .map(|vector| CowVector::from(T::slice_to_float_cow(Cow::from(vector))))
191-197: Same here: make conversions explicit.- .get_vector_opt::<P>(key) - .map(|vector| T::slice_to_float_cow(vector.into()).into()) + .get_vector_opt::<P>(key) + .map(|vector| CowVector::from(T::slice_to_float_cow(Cow::from(vector))))lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)
263-268: Prefer explicit Cow::from over into().- self.vectors - .get_opt(key as VectorOffsetType) - .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into()))) + self.vectors + .get_opt(key as VectorOffsetType) + .map(|slice| CowVector::from(T::slice_to_float_cow(Cow::from(slice))))lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (1)
110-115: Prefer explicit Cow::from over into().- self.vectors - .get_opt(key as VectorOffsetType) - .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into()))) + self.vectors + .get_opt(key as VectorOffsetType) + .map(|slice| CowVector::from(T::slice_to_float_cow(Cow::from(slice))))lib/segment/src/vector_storage/vector_storage_base.rs (1)
40-56: Seal AccessPattern to prevent unsound external impls.Avoid third-party implementations that could misdeclare
IS_SEQUENTIAL. Seal the trait:+mod private { + pub trait Sealed {} +} -pub trait AccessPattern: Copy { +pub trait AccessPattern: Copy + private::Sealed { const IS_SEQUENTIAL: bool; } #[derive(Copy, Clone)] pub struct Random; #[derive(Copy, Clone)] pub struct Sequential; impl AccessPattern for Random { const IS_SEQUENTIAL: bool = false; } impl AccessPattern for Sequential { const IS_SEQUENTIAL: bool = true; } + +impl private::Sealed for Random {} +impl private::Sealed for Sequential {}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (49)
lib/collection/src/collection/mmr/lazy_matrix.rs(2 hunks)lib/gridstore/benches/bustle_bench/payload_storage.rs(1 hunks)lib/gridstore/benches/random_data_bench.rs(1 hunks)lib/gridstore/benches/real_data_bench.rs(1 hunks)lib/gridstore/src/gridstore.rs(16 hunks)lib/segment/benches/sparse_vector_storage.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs(7 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(2 hunks)lib/segment/src/index/hnsw_index/graph_layers_healer.rs(2 hunks)lib/segment/src/index/hnsw_index/point_scorer.rs(2 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/payload_storage/mmap_payload_storage.rs(5 hunks)lib/segment/src/segment/segment_ops.rs(3 hunks)lib/segment/src/segment_constructor/batched_reader.rs(2 hunks)lib/segment/src/segment_constructor/segment_constructor_base.rs(6 hunks)lib/segment/src/vector_storage/chunked_mmap_vectors.rs(3 hunks)lib/segment/src/vector_storage/chunked_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(11 hunks)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs(4 hunks)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/in_ram_persisted_vectors.rs(3 hunks)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs(1 hunks)lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs(2 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs(4 hunks)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(2 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/vector_storage_base.rs(7 hunks)lib/segment/tests/integration/sparse_vector_index_search_tests.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (34)
- lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
- lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
- lib/segment/benches/sparse_vector_storage.rs
- lib/segment/src/index/hnsw_index/point_scorer.rs
- lib/segment/src/segment_constructor/segment_constructor_base.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
- lib/segment/src/index/hnsw_index/gpu/mod.rs
- lib/segment/src/vector_storage/tests/async_raw_scorer.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
- lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
- lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
- lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
- lib/segment/src/index/sparse_index/sparse_vector_index.rs
- lib/segment/src/segment_constructor/batched_reader.rs
- lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
- lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
- lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
- lib/segment/src/index/hnsw_index/graph_layers_healer.rs
- lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
- lib/segment/src/payload_storage/mmap_payload_storage.rs
- lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
- lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
- lib/segment/tests/integration/sparse_vector_index_search_tests.rs
- lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
- lib/gridstore/benches/bustle_bench/payload_storage.rs
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
- lib/gridstore/src/gridstore.rs
- lib/segment/src/segment/segment_ops.rs
- lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
- lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
- lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
- lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/gridstore/benches/real_data_bench.rslib/segment/src/vector_storage/query_scorer/custom_query_scorer.rslib/gridstore/benches/random_data_bench.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/chunked_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
**/{tests,benches}/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged
Files:
lib/gridstore/benches/real_data_bench.rslib/gridstore/benches/random_data_bench.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/chunked_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
🧠 Learnings (13)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
📚 Learning: 2025-07-02T17:10:13.847Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Applied to files:
lib/gridstore/benches/real_data_bench.rslib/gridstore/benches/random_data_bench.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/chunked_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
📚 Learning: 2025-08-21T13:45:05.899Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
📚 Learning: 2025-08-21T13:46:10.866Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
📚 Learning: 2025-08-21T13:43:52.191Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/quantized/quantized_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
📚 Learning: 2025-08-14T11:31:21.777Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.
Applied to files:
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-05-19T14:40:20.068Z
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Applied to files:
lib/segment/src/vector_storage/chunked_vector_storage.rs
📚 Learning: 2025-04-22T23:19:51.232Z
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.
Applied to files:
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
🧬 Code graph analysis (11)
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (8)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(537-537)vector_storage(623-623)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)
vector_storage(73-73)vector_storage(101-101)vector_storage(102-102)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
vector_storage(120-120)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)
vector_storage(97-97)vector_storage(134-134)vector_storage(135-135)lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (1)
vector_storage(61-63)lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (3)
vector_storage(55-57)vector_storage(73-75)vector_storage(82-84)
lib/collection/src/collection/mmr/lazy_matrix.rs (5)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)vector_storage(598-599)vector_storage(630-631)vector_storage(657-658)lib/segment/src/index/hnsw_index/graph_layers_healer.rs (1)
vector_storage(220-221)lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/segment/segment_ops.rs (2)
vector_storage(381-381)vector_storage(609-609)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (11)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/segment/segment_ops.rs (2)
vector_storage(381-381)vector_storage(609-609)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(537-537)vector_storage(623-623)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)
vector_storage(87-87)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)
vector_storage(73-73)vector_storage(101-101)vector_storage(102-102)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
vector_storage(120-120)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)
vector_storage(97-97)vector_storage(134-134)vector_storage(135-135)lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (1)
vector_storage(61-63)lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
get(20-22)get_many(52-52)lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
get(45-47)get_many(96-98)
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)
get_dense(88-92)get_vector(128-133)get_vector_opt(135-139)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)
get_dense(236-238)get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_dense(152-152)get_dense(162-162)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (3)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)
get_dense(88-92)get_vector(128-133)get_vector_opt(135-139)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)
get_dense(236-238)get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)
get_dense(83-85)get_vector(105-107)get_vector_opt(106-106)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_dense(152-152)get_dense(162-162)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (4)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)
get_sparse(150-159)get_sparse_opt(161-180)get_sparse_opt(210-210)get_sparse_opt(260-260)get_vector(200-204)get_vector_opt(202-202)get_vector_opt(209-214)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)
get_sparse(104-109)get_sparse_opt(105-106)get_sparse_opt(111-118)get_sparse_opt(148-148)get_sparse_opt(197-197)get_vector(138-141)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_sparse(172-172)get_sparse_opt(173-176)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
storage(168-168)storage(179-179)
lib/segment/src/vector_storage/chunked_vector_storage.rs (3)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (4)
get(271-273)get(362-364)get_many(277-300)get_many(412-414)lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
get(45-47)get_many(96-98)lib/segment/src/vector_storage/chunked_vectors.rs (2)
get(44-46)get_many(61-72)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (7)
get_multi(338-340)get_multi_opt(339-339)get_multi_opt(343-358)get_multi_opt(404-404)get_vector(399-401)get_vector_opt(400-400)get_vector_opt(403-409)lib/segment/src/vector_storage/vector_storage_base.rs (7)
get_multi(181-181)get_multi(193-193)get_multi_opt(182-185)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
get_vector(128-133)get_vector_opt(135-139)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)
get_dense(88-92)get_vector(128-133)get_vector_opt(135-139)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)
get_dense(147-153)get_vector(182-189)get_vector_opt(191-197)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (4)
get_dense(83-85)get_vector(105-107)get_vector_opt(106-106)get_vector_opt(110-115)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_dense(152-152)get_dense(162-162)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/vector_storage_base.rs (10)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (7)
get_vector(128-133)get_vector_opt(135-139)get_dense(88-92)vectors(89-90)vectors(129-130)vectors(136-137)vector_dim(84-86)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (4)
get_vector(182-189)get_vector_opt(191-197)get_dense(147-153)vector_dim(143-145)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5)
get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)get_dense(236-238)vector_dim(232-234)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (5)
get_vector(105-107)get_vector_opt(106-106)get_vector_opt(110-115)get_dense(83-85)vector_dim(79-81)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (10)
get_vector(187-189)get_vector_opt(188-188)get_vector_opt(191-197)vectors(123-123)vectors(143-144)vector_dim(105-107)get_multi(110-112)get_multi_opt(111-111)get_multi_opt(115-132)get_multi_opt(192-192)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (8)
get_vector(399-401)get_vector_opt(400-400)get_vector_opt(403-409)vector_dim(333-335)get_multi(338-340)get_multi_opt(339-339)get_multi_opt(343-358)get_multi_opt(404-404)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)
get_vector(216-219)get_vector_opt(217-217)get_vector_opt(224-229)get_sparse(180-183)get_sparse_opt(181-181)get_sparse_opt(185-196)get_sparse_opt(225-225)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)
get_vector(200-204)get_vector_opt(202-202)get_vector_opt(209-214)get_sparse(150-159)get_sparse_opt(161-180)get_sparse_opt(210-210)get_sparse_opt(260-260)lib/common/common/src/maybe_uninit.rs (1)
maybe_uninit_fill_from(10-33)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (4)
lib/common/common/src/maybe_uninit.rs (2)
transmute(30-30)maybe_uninit_fill_from(10-33)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (5)
get_vector(128-133)get_vector_opt(135-139)vectors(89-90)vectors(129-130)vectors(136-137)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
get_vector(182-189)get_vector_opt(191-197)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)
get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: storage-compat-test
- GitHub Check: test-consensus-compose
- GitHub Check: test-consistency
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
🔇 Additional comments (25)
lib/collection/src/collection/mmr/lazy_matrix.rs (1)
5-5: Importing Random for debug-only check is correct.Matches the new generic API and is scoped under debug assertions.
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (2)
15-15: Import of Random aligns this scorer with the new access-pattern API.
87-90: I’ve pulled the relevant snippet around the batch call; analyzing now.lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
41-43: Remove unusedVectorStorageimport
DropVectorStoragefrom theusestatement inlib/segment/src/vector_storage/quantized/quantized_vectors.rs(line 42).lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
8-8: AccessPattern import added — OK.
52-53: Genericget_manysignature looks good and matches the new pattern.lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3)
21-21: AccessPattern import — OK.
362-364: Traitget<P>correctly delegates to the inherent method withP::IS_SEQUENTIAL.Prevents recursion and preserves existing logic.
412-414: Traitget_many<P>delegation is correct and mirrors prior behavior.lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (2)
26-26: Good import consolidation for AccessPattern types.
145-153: API unification via P: AccessPattern looks correct.The generic get_vector/get_vector_opt and the batched path correctly switch between Sequential/Random.
Also applies to: 163-168
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)
88-91: Dense getter genericization looks good.lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
147-153: Good: get_dense now respects AccessPattern.
369-371: Tests updated to generic API look correct.Also applies to: 377-378, 396-399, 457-458, 576-579, 646-649, 720-723
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)
236-238: Generic get_densefor in-RAM storage: OK.
258-260: OK to delegate to get_vector_opt::.
349-349: Test adjustments to Sequential look good.Also applies to: 420-422
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (2)
83-85: Generic get_densematches in-memory semantics.
105-107: OK to delegate to get_vector_opt::.
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (4)
179-183: AccessPattern generalization: read-path selection looks correct.Const-generic dispatch to Gridstore’s
get_value::<true/false>based onP::IS_SEQUENTIALis clean, andtranspose()preserves error semantics fortry_from. No issues.Also applies to: 185-196
216-219: Vector accessors: API unification LGTM.
get_vector::<P>delegates toget_vector_opt::<P>with the same not-found semantics as before. Good.Also applies to: 224-229
22-22: Imports and test updates consistent with new API.Importing
AccessPattern/Randomand migrating tests toget_vector::<Random>is correct.Also applies to: 347-348, 415-416
71-76: Add +1 to deleted.get_bitslice().last_one() before max()
storage.max_point_id() returns the next point offset (pointer_count), so only the last_one() index needs a +1:- let next_point_offset = deleted - .get_bitslice() - .last_one() - .max(Some(storage.max_point_id() as usize)) - .unwrap_or_default(); + let next_point_offset = deleted + .get_bitslice() + .last_one() + .map(|i| i + 1) + .max(Some(storage.max_point_id() as usize)) + .unwrap_or_default();lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
22-24: AccessPattern adoption across multi-dense storage looks good.
- Generic
get_multi/get_multi_opt/get_vector/get_vector_optare consistent with the trait changes.- Using
Sequentialiniterate_inner_vectorsis the right call for scan-heavy paths.- Reading existing offset with
get::<Random>ininsert_vectoris appropriate.Also applies to: 109-112, 115-132, 138-145, 187-189, 191-197, 218-223
lib/segment/src/vector_storage/vector_storage_base.rs (1)
830-873: Enum dispatch updated comprehensively.All variants forward
get_vector/get_vector_optwith<P>; exhaustive and consistent with feature gates.Also applies to: 876-919
| b.iter(|| { | ||
| for i in 0..storage.max_point_id() { | ||
| let res = storage.get_value(i, &hw_counter).unwrap(); | ||
| let res = storage.get_value::<false>(i, &hw_counter).unwrap(); |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
“scan storage” should use the sequential read path
This is a linear scan; prefer the sequential specialization.
- let res = storage.get_value::<false>(i, &hw_counter).unwrap();
+ let res = storage.get_value::<true>(i, &hw_counter).unwrap();To verify other benches aren’t accidentally using the random path for linear scans, run:
Expected: linear scans use ::<true>, random-index benches (if any) use ::<false>.
🏁 Script executed:
#!/bin/bash
rg -nC2 --type=rust 'get_value::<' lib/gridstore/benches | sed -n '1,120p'Length of output: 1445
Enable sequential read path in real_data_bench
Change the call in lib/gridstore/benches/real_data_bench.rs line 91 from
let res = storage.get_value::<false>(i, &hw_counter).unwrap();to
-let res = storage.get_value::<false>(i, &hw_counter).unwrap();
+let res = storage.get_value::<true>(i, &hw_counter).unwrap();[random_data_bench remains correct with ::<false>, and bustle_bench’s use of ::<false> in its own context can be left as is]
🤖 Prompt for AI Agents
In lib/gridstore/benches/real_data_bench.rs around line 91, the call uses the
random-access specialization get_value::<false> which disables the sequential
read path; change the type parameter to true so it reads sequentially by
replacing get_value::<false> with get_value::<true> at that location.
| fn raw_vector_offset<P: AccessPattern>(&self, offset: usize) -> &[T] { | ||
| let mmap = if P::IS_SEQUENTIAL { | ||
| &self.mmap_sequential | ||
| } else { | ||
| &self.mmap | ||
| }; | ||
| let byte_slice = &mmap[offset..(offset + self.raw_size())]; | ||
| let arr: &[T] = unsafe { transmute(byte_slice) }; | ||
| &arr[0..self.dim] | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Replace unsafe transmute with bytemuck cast_slice.
Avoid raw transmute on &[u8]→&[T]; use bytemuck for soundness and readability.
- let byte_slice = &mmap[offset..(offset + self.raw_size())];
- let arr: &[T] = unsafe { transmute(byte_slice) };
- &arr[0..self.dim]
+ let byte_slice = &mmap[offset..offset + self.raw_size()];
+ let arr: &[T] = bytemuck::cast_slice(byte_slice);
+ arrAdd import (outside selection):
use bytemuck::cast_slice;🤖 Prompt for AI Agents
In lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs around lines 133
to 142, replace the unsafe transmute from &[u8] to &[T] with
bytemuck::cast_slice for soundness and readability: add the import use
bytemuck::cast_slice at the top of the file, then change the conversion to let
arr: &[T] = cast_slice(byte_slice); and continue returning &arr[0..self.dim];
ensure T implements the required bytemuck traits (e.g., Pod) where needed.
| let offsets = (0..vectors_count as PointOffsetType) | ||
| .map(|idx| vector_storage.get_multi(idx).vectors_count() as PointOffsetType) | ||
| .map(|idx| vector_storage.get_multi::<Random>(idx).vectors_count() as PointOffsetType) | ||
| .scan(0, |offset_acc, multi_vector_len| { |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Prefer Sequential for offset scan and avoid lossy cast.
This loop walks all ids; Sequential improves locality. Also replace as PointOffsetType with try_into().
- let offsets = (0..vectors_count as PointOffsetType)
- .map(|idx| vector_storage.get_multi::<Random>(idx).vectors_count() as PointOffsetType)
+ let offsets = (0..vectors_count)
+ .map(|i| {
+ let idx: PointOffsetType = i.try_into().expect("vector_count exceeds PointOffsetType");
+ vector_storage
+ .get_multi::<Sequential>(idx)
+ .vectors_count()
+ .try_into()
+ .expect("inner vectors_count exceeds PointOffsetType")
+ })🤖 Prompt for AI Agents
In lib/segment/src/vector_storage/quantized/quantized_vectors.rs around lines
622 to 624, the offset accumulation should use Sequential access for better
locality and avoid a lossy cast; change get_multi::<Random> to
get_multi::<Sequential> and replace the `as PointOffsetType` cast with a
fallible conversion using try_into(), handling the Result (e.g., propagate or
call expect with a clear message) so you don't silently truncate values.
| let iter = keys.iter().map(|key| self.get_dense::<Random>(*key)); | ||
| maybe_uninit_fill_from(vectors, iter).0 | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Batch dense reads should prefer Sequential.
Keys are documented as consecutive; prefer Sequential to hint optimal prefetch/IO.
- let iter = keys.iter().map(|key| self.get_dense::<Random>(*key));
+ let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let iter = keys.iter().map(|key| self.get_dense::<Random>(*key)); | |
| maybe_uninit_fill_from(vectors, iter).0 | |
| } | |
| let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key)); | |
| maybe_uninit_fill_from(vectors, iter).0 | |
| } |
🤖 Prompt for AI Agents
In lib/segment/src/vector_storage/vector_storage_base.rs around lines 162 to
164, the batch dense read currently calls self.get_dense::<Random>(*key) for
each key even though keys are documented as consecutive; change the generic type
parameter from Random to Sequential (i.e., call
self.get_dense::<Sequential>(*key)) so the underlying implementation can prefer
sequential prefetch/IO for better performance when reading consecutive keys.
| let iter = keys.iter().map(|key| self.get_multi::<Random>(*key)); | ||
| maybe_uninit_fill_from(vectors, iter).0 | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Batch multi-dense reads should prefer Sequential.
Same rationale as dense batch.
- let iter = keys.iter().map(|key| self.get_multi::<Random>(*key));
+ let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let iter = keys.iter().map(|key| self.get_multi::<Random>(*key)); | |
| maybe_uninit_fill_from(vectors, iter).0 | |
| } | |
| let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key)); | |
| maybe_uninit_fill_from(vectors, iter).0 | |
| } |
🤖 Prompt for AI Agents
In lib/segment/src/vector_storage/vector_storage_base.rs around lines 193 to
195, the batch multi-dense read is calling self.get_multi::<Random> but should
use the Sequential access pattern like other dense batch reads; change the
generic parameter from Random to Sequential for get_multi so the iterator
prefers sequential access, leaving the surrounding maybe_uninit_fill_from call
unchanged.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
lib/gridstore/src/gridstore.rs (1)
244-263: Avoid panic on corrupt data: handle LZ4 decompression errors instead of unwrap()If on-disk bytes are corrupted or truncated,
decompress_size_prependedcan fail. Currently this path panics viadecompress_lz4(...).unwrap(), bringing the whole process down on a single bad entry. ReturnNoneinstead to keep semantics consistent with “missing/unreadable value” and avoid a crash.Apply this diff within
get_value:let raw = self.read_from_pages::<READ_SEQUENTIAL>(page_id, block_offset, length); hw_counter.payload_io_read_counter().incr_delta(raw.len()); - let decompressed = self.decompress(raw); - let value = V::from_bytes(&decompressed); - - Some(value) + let decompressed = match self.config.compression { + Compression::None => raw, + Compression::LZ4 => match lz4_flex::decompress_size_prepended(&raw) { + Ok(v) => v, + Err(_) => return None, + }, + }; + Some(V::from_bytes(&decompressed))Note: This confines the change to the new API surface without refactoring the existing
decompress()helper globally. If desired, we can later introduce a fallibledecompress_result()and migrate all internal call sites.lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)
149-159: Bug: get_sparse ignores deleted flag; delegate through get_sparse_opt for consistency.As implemented, get_sparse returns vectors even if the record is marked deleted, unlike other sparse storages (mmap/volatile) that route via get_sparse_opt and error on None. Unify semantics and avoid duplication.
Apply:
- fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> { - // Already in memory, so no sequential optimizations available. - let bin_key = bincode::serialize(&key) - .map_err(|_| OperationError::service_error("Cannot serialize sparse vector key"))?; - let data = self.db_wrapper.get(bin_key)?; - let record: StoredSparseVector = bincode::deserialize(&data).map_err(|_| { - OperationError::service_error("Cannot deserialize sparse vector from db") - })?; - Ok(record.vector) - } + fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> { + self.get_sparse_opt::<P>(key)? + .ok_or_else(|| OperationError::service_error(format!("Key {key} not found"))) + }
260-266: Don't swallow IO/deserialize errors during delete; propagate instead.ok().flatten() hides failures, breaking size accounting on error. Return the error and keep totals correct.
Apply:
- let old_vector = self.get_sparse_opt::<Random>(key).ok().flatten(); + let old_vector = self.get_sparse_opt::<Random>(key)?; self.update_stored( key, true, - old_vector.as_ref(), + old_vector.as_ref(), &HardwareCounterCell::disposable(), // We don't measure deletions )?;lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
119-132: Avoid panic in get_multi_opt; eliminate expect and return None on malformed/empty offset.This path returns Option; panicking on empty offset defeats that contract.
Apply:
- self.offsets - .get::<P>(key as VectorOffsetType) - .and_then(|mmap_offset| { - let mmap_offset = mmap_offset.first().expect("mmap_offset must not be empty"); - self.vectors.get_many::<P>( - mmap_offset.offset as VectorOffsetType, - mmap_offset.count as usize, - ) - }) + self.offsets + .get::<P>(key as VectorOffsetType) + .and_then(|arr| { + arr.first().and_then(|o| { + self.vectors + .get_many::<P>(o.offset as VectorOffsetType, o.count as usize) + }) + }) .map(|flattened_vectors| TypedMultiDenseVectorRef { flattened_vectors, dim: self.vectors.dim(), })
♻️ Duplicate comments (3)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
96-103: Guard u32 casts and fix accumulator literal typeKeep the previous overflow concern addressed and ensure the accumulator is
usize. Add a debug check (or a hard check) before casting.- .scan(0, |acc, count| { + .scan(0usize, |acc, count| { let start = *acc; *acc += count; + debug_assert!(start <= u32::MAX as usize && count <= u32::MAX as usize); Some(GpuMultivectorOffset { start: start as u32, count: count as u32, }) }),If you prefer a hard check instead of
debug_assert!, refactor to perform a bounded cast withu32::try_from(...)and return anOperationErroron overflow.lib/segment/src/vector_storage/vector_storage_base.rs (2)
162-164: Batch dense reads should prefer Sequential.Apply:
- let iter = keys.iter().map(|key| self.get_dense::<Random>(*key)); + let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));
193-195: Batch multi-dense reads should prefer Sequential.Apply:
- let iter = keys.iter().map(|key| self.get_multi::<Random>(*key)); + let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));
🧹 Nitpick comments (8)
lib/gridstore/src/gridstore.rs (1)
629-629: Minor ergonomics: consider small helpers to reduce turbofish noise in testsOptional: add tiny wrappers in the test module for readability, e.g.,
fn get(storage: &Gridstore<_>, i, hw) -> _ { storage.get_value::<false>(i, hw) }.Also applies to: 648-648, 676-676, 736-736, 745-745, 769-769, 779-779, 808-808, 926-926, 947-947, 982-982, 1000-1000, 1026-1026, 1038-1038, 1095-1100, 1150-1150
lib/gridstore/benches/random_data_bench.rs (1)
27-31: Harden the benchmark against optimizer effectsWrap loop vars/results in
black_boxto avoid potential elision and to better isolate read path costs.Apply this diff:
- b.iter(|| { + b.iter(|| { for i in 0..PAYLOAD_COUNT { - let res = storage.get_value::<false>(i, &hw_counter); + let i = criterion::black_box(i); + let res = storage.get_value::<false>(i, &hw_counter); + criterion::black_box(&res); assert!(res.is_some()); } });Also add the import:
use criterion::{black_box, BatchSize, Criterion, criterion_group, criterion_main};lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
1054-1055: Sequential access for migration loops — LGTM; consider hoisting importUsing
get_vector::<Sequential>in linear migrations is correct. To avoid repeated local imports, consider importingSequentialat module scope.Possible tidy-up:
- use crate::vector_storage::{VectorStorage, VectorStorageEnum}; + use crate::vector_storage::{Sequential, VectorStorage, VectorStorageEnum};Then drop the function-scoped
use crate::vector_storage::Sequential;lines.Also applies to: 1083-1086, 1143-1144, 1174-1176, 1281-1282, 1306-1308
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)
88-91: Improve panic context.
Include key in the panic to aid debugging.- .expect("mmap vector not found") + .unwrap_or_else(|| panic!("mmap vector not found: {key}"))lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)
185-196: Collapse branch to a single const-generic call.
Slightly cleaner and avoids duplicate branches.- let storage = self.storage.read(); - let result = if P::IS_SEQUENTIAL { - storage.get_value::<true>(key, &HardwareCounterCell::disposable()) // Vector storage read IO not measured - } else { - storage.get_value::<false>(key, &HardwareCounterCell::disposable()) - }; + let storage = self.storage.read(); + // Vector storage read IO not measured + let result = storage.get_value::<{ P::IS_SEQUENTIAL }>( + key, + &HardwareCounterCell::disposable(), + );
224-229: Optional: log suppressed errors at debug level.
Keeps “ignore errors” semantics while aiding diagnosis.- match self.get_sparse_opt::<P>(key) { - Ok(Some(vector)) => Some(CowVector::from(vector)), - _ => None, - } + match self.get_sparse_opt::<P>(key) { + Ok(Some(vector)) => Some(CowVector::from(vector)), + Err(err) => { + log::debug!("get_sparse_opt failed for key {key}: {err}"); + None + } + _ => None, + }lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
134-148: Optional: make iterate_inner_vectors robust to missing/empty offsets.Unwraps can panic on sparse/deleted keys or partially flushed state. Iterating defensively is cheap here.
Apply:
- (0..self.total_vector_count()).flat_map(|key| { - let mmap_offset = self - .offsets - .get::<Sequential>(key as VectorOffsetType) - .unwrap() - .first() - .unwrap(); - (0..mmap_offset.count).map(|i| { - self.vectors - .get::<Sequential>((mmap_offset.offset + i) as VectorOffsetType) - .unwrap() - }) - }) + (0..self.total_vector_count()).flat_map(|key| { + self.offsets + .get::<Sequential>(key as VectorOffsetType) + .and_then(|arr| arr.first().copied()) + .into_iter() + .flat_map(move |o| { + (0..o.count).filter_map(move |i| { + self.vectors + .get::<Sequential>((o.offset + i) as VectorOffsetType) + }) + }) + })lib/segment/src/vector_storage/vector_storage_base.rs (1)
40-56: Seal AccessPattern to prevent external impls.AccessPattern is part of a performance-sensitive API; sealing avoids third-party types altering invariants.
Apply:
+mod sealed { + pub trait Sealed {} +} -pub trait AccessPattern: Copy { +pub trait AccessPattern: sealed::Sealed + Copy { const IS_SEQUENTIAL: bool; } #[derive(Copy, Clone)] pub struct Random; #[derive(Copy, Clone)] pub struct Sequential; impl AccessPattern for Random { const IS_SEQUENTIAL: bool = false; } impl AccessPattern for Sequential { const IS_SEQUENTIAL: bool = true; }And add below (outside the range) to implement the seal:
impl sealed::Sealed for Random {} impl sealed::Sealed for Sequential {}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (49)
lib/collection/src/collection/mmr/lazy_matrix.rs(2 hunks)lib/gridstore/benches/bustle_bench/payload_storage.rs(1 hunks)lib/gridstore/benches/random_data_bench.rs(1 hunks)lib/gridstore/benches/real_data_bench.rs(1 hunks)lib/gridstore/src/gridstore.rs(16 hunks)lib/segment/benches/sparse_vector_storage.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs(7 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(2 hunks)lib/segment/src/index/hnsw_index/graph_layers_healer.rs(2 hunks)lib/segment/src/index/hnsw_index/point_scorer.rs(2 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/payload_storage/mmap_payload_storage.rs(5 hunks)lib/segment/src/segment/segment_ops.rs(3 hunks)lib/segment/src/segment_constructor/batched_reader.rs(2 hunks)lib/segment/src/segment_constructor/segment_constructor_base.rs(6 hunks)lib/segment/src/vector_storage/chunked_mmap_vectors.rs(3 hunks)lib/segment/src/vector_storage/chunked_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(11 hunks)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs(4 hunks)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/in_ram_persisted_vectors.rs(3 hunks)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs(1 hunks)lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs(2 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs(4 hunks)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(2 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/vector_storage_base.rs(7 hunks)lib/segment/tests/integration/sparse_vector_index_search_tests.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (34)
- lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
- lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
- lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs
- lib/gridstore/benches/real_data_bench.rs
- lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
- lib/segment/src/index/hnsw_index/point_scorer.rs
- lib/collection/src/collection/mmr/lazy_matrix.rs
- lib/segment/src/vector_storage/tests/async_raw_scorer.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
- lib/segment/benches/sparse_vector_storage.rs
- lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
- lib/segment/src/payload_storage/mmap_payload_storage.rs
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs
- lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
- lib/segment/src/index/hnsw_index/gpu/mod.rs
- lib/segment/src/index/sparse_index/sparse_vector_index.rs
- lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
- lib/segment/src/segment/segment_ops.rs
- lib/segment/src/index/hnsw_index/graph_layers_healer.rs
- lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
- lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
- lib/segment/tests/integration/sparse_vector_index_search_tests.rs
- lib/segment/src/vector_storage/chunked_vector_storage.rs
- lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
- lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
- lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
- lib/gridstore/benches/bustle_bench/payload_storage.rs
- lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
- lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
- lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
- lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
- lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/segment_constructor/batched_reader.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rslib/gridstore/benches/random_data_bench.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/vector_storage/in_ram_persisted_vectors.rslib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rslib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rslib/gridstore/src/gridstore.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/segment_constructor/batched_reader.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/vector_storage/in_ram_persisted_vectors.rslib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rslib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rslib/gridstore/src/gridstore.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
**/{tests,benches}/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged
Files:
lib/gridstore/benches/random_data_bench.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
🧠 Learnings (12)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/vector_storage/in_ram_persisted_vectors.rslib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rslib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/segment_constructor/batched_reader.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/vector_storage/in_ram_persisted_vectors.rslib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/segment_constructor/batched_reader.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
📚 Learning: 2025-08-21T13:45:05.899Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/segment_constructor/batched_reader.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
📚 Learning: 2025-08-21T13:46:10.866Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/simple_dense_vector_storage.rslib/gridstore/src/gridstore.rslib/segment/src/vector_storage/vector_storage_base.rs
📚 Learning: 2025-08-21T13:43:52.191Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/segment_constructor/segment_constructor_base.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rslib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rslib/segment/src/vector_storage/in_ram_persisted_vectors.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
📚 Learning: 2025-07-02T17:10:13.847Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Applied to files:
lib/gridstore/benches/random_data_bench.rs
📚 Learning: 2025-07-11T11:35:21.549Z
Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Applied to files:
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
📚 Learning: 2025-08-11T12:30:47.220Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7009
File: lib/quantization/src/encoded_vectors_binary.rs:614-618
Timestamp: 2025-08-11T12:30:47.220Z
Learning: In `lib/quantization/src/encoded_vectors_binary.rs`, the `encode_two_bits_value` function has two paths: when `vector_stats` is available, it uses mean/stddev for accurate z-score based quantization boundaries; when stats are None (for appendable BQ), it falls back to a simple sign-based heuristic (positive -> (true, true), non-positive -> (false, false)). Having stats available provides higher accuracy for storage encoding.
Applied to files:
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
🧬 Code graph analysis (14)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (17)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1)
vector_storage(529-529)lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/segment/segment_ops.rs (2)
vector_storage(381-381)vector_storage(609-609)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(537-537)vector_storage(623-623)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)
vector_storage(87-87)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)
vector_storage(73-73)vector_storage(101-101)vector_storage(102-102)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
vector_storage(120-120)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)
vector_storage(97-97)vector_storage(134-134)vector_storage(135-135)lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (1)
vector_storage(61-63)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)
get_dense(147-153)get_vector(182-189)get_vector_opt(191-197)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)
get_dense(236-238)get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)
get_dense(83-85)get_vector(105-107)get_vector_opt(106-106)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_dense(152-152)get_dense(162-162)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (3)
get_vector(187-189)get_vector_opt(188-188)get_vector_opt(191-197)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (3)
get_vector(399-401)get_vector_opt(400-400)get_vector_opt(403-409)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (3)
get_vector(216-219)get_vector_opt(217-217)get_vector_opt(224-229)
lib/segment/src/segment_constructor/batched_reader.rs (5)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)vector_storage(598-599)vector_storage(630-631)vector_storage(657-658)lib/segment/src/index/hnsw_index/graph_layers_healer.rs (1)
vector_storage(220-221)lib/segment/src/segment/segment_ops.rs (2)
vector_storage(381-381)vector_storage(609-609)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(537-537)vector_storage(623-623)
lib/segment/src/segment_constructor/segment_constructor_base.rs (6)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)vector_storage(598-599)vector_storage(630-631)vector_storage(657-658)lib/segment/src/index/hnsw_index/graph_layers_healer.rs (1)
vector_storage(220-221)lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/segment/segment_ops.rs (2)
vector_storage(381-381)vector_storage(609-609)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(537-537)vector_storage(623-623)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (4)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)vector_storage(598-599)vector_storage(630-631)vector_storage(657-658)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(537-537)vector_storage(623-623)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
vector_storage(120-120)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)
vector_storage(97-97)vector_storage(134-134)vector_storage(135-135)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (4)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)
get_sparse(150-159)get_sparse_opt(161-180)get_sparse_opt(210-210)get_sparse_opt(260-260)get_vector(200-204)get_vector_opt(202-202)get_vector_opt(209-214)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)
get_sparse(104-109)get_sparse_opt(105-106)get_sparse_opt(111-118)get_sparse_opt(148-148)get_sparse_opt(197-197)get_vector(138-141)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_sparse(172-172)get_sparse_opt(173-176)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
storage(168-168)storage(179-179)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (2)
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (4)
get_dense(83-85)get_vector(105-107)get_vector_opt(106-106)get_vector_opt(110-115)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_dense(152-152)get_dense(162-162)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (7)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)vector_storage(598-599)vector_storage(630-631)vector_storage(657-658)lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(537-537)vector_storage(623-623)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)
vector_storage(87-87)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)
vector_storage(73-73)vector_storage(101-101)vector_storage(102-102)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
vector_storage(120-120)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (1)
vector_storage(97-97)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (4)
get(271-273)get(362-364)get_many(277-300)get_many(412-414)lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
get(20-22)get_many(52-52)
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (2)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
vector_storage(120-120)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (4)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)
get_sparse(180-183)get_sparse_opt(181-181)get_sparse_opt(185-196)get_sparse_opt(225-225)get_vector(216-219)get_vector_opt(217-217)get_vector_opt(224-229)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)
get_sparse(104-109)get_sparse_opt(105-106)get_sparse_opt(111-118)get_sparse_opt(148-148)get_sparse_opt(197-197)get_vector(138-141)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_sparse(172-172)get_sparse_opt(173-176)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)
get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)
lib/gridstore/src/gridstore.rs (5)
lib/gridstore/benches/bustle_bench/payload_storage.rs (1)
get_value(34-34)lib/gridstore/benches/random_data_bench.rs (1)
storage(29-29)lib/gridstore/benches/real_data_bench.rs (1)
storage(91-91)lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
storage(168-168)storage(179-179)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)
storage(191-191)storage(193-193)storage(380-380)storage(415-415)storage(452-452)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(537-537)vector_storage(623-623)lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
get(20-22)get_many(52-52)lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
get(45-47)get_many(96-98)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (7)
get_multi(338-340)get_multi_opt(339-339)get_multi_opt(343-358)get_multi_opt(404-404)get_vector(399-401)get_vector_opt(400-400)get_vector_opt(403-409)lib/segment/src/vector_storage/vector_storage_base.rs (7)
get_multi(181-181)get_multi(193-193)get_multi_opt(182-185)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
get_vector(128-133)get_vector_opt(135-139)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/vector_storage_base.rs (11)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (7)
get_vector(128-133)get_vector_opt(135-139)get_dense(88-92)vectors(89-90)vectors(129-130)vectors(136-137)vector_dim(84-86)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (4)
get_vector(182-189)get_vector_opt(191-197)get_dense(147-153)vector_dim(143-145)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5)
get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)get_dense(236-238)vector_dim(232-234)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (5)
get_vector(105-107)get_vector_opt(106-106)get_vector_opt(110-115)get_dense(83-85)vector_dim(79-81)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (10)
get_vector(187-189)get_vector_opt(188-188)get_vector_opt(191-197)vectors(123-123)vectors(143-144)vector_dim(105-107)get_multi(110-112)get_multi_opt(111-111)get_multi_opt(115-132)get_multi_opt(192-192)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (8)
get_vector(399-401)get_vector_opt(400-400)get_vector_opt(403-409)vector_dim(333-335)get_multi(338-340)get_multi_opt(339-339)get_multi_opt(343-358)get_multi_opt(404-404)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)
get_vector(216-219)get_vector_opt(217-217)get_vector_opt(224-229)get_sparse(180-183)get_sparse_opt(181-181)get_sparse_opt(185-196)get_sparse_opt(225-225)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)
get_vector(200-204)get_vector_opt(202-202)get_vector_opt(209-214)get_sparse(150-159)get_sparse_opt(161-180)get_sparse_opt(210-210)get_sparse_opt(260-260)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)
get_vector(138-141)get_sparse(104-109)get_sparse_opt(105-106)get_sparse_opt(111-118)get_sparse_opt(148-148)get_sparse_opt(197-197)lib/common/common/src/maybe_uninit.rs (1)
maybe_uninit_fill_from(10-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: storage-compat-test
- GitHub Check: lint
🔇 Additional comments (33)
lib/gridstore/src/gridstore.rs (2)
238-248: API consolidation to const-generic get_value is good; counter usage is correctSingle entry point with
const READ_SEQUENTIAL: boolreduces duplication. Accounting storage read bytes viapayload_io_read_counter().incr_delta(raw.len())aligns with measuring IO, not compute.
629-629: Call site migration to the new const-generic API looks correctAll updated tests use
get_value::<false>for non-sequential access where appropriate; sequential paths elsewhere still use<true>. This matches the intended semantics.Also applies to: 648-648, 676-676, 736-736, 745-745, 769-769, 779-779, 808-808, 926-926, 947-947, 982-982, 1000-1000, 1026-1026, 1038-1038, 1095-1100, 1150-1150
lib/gridstore/benches/random_data_bench.rs (1)
29-29: Bench call updated to new API — goodSwitch to
get_value::<false>matches the consolidated API.lib/segment/src/segment_constructor/batched_reader.rs (1)
11-11: Switch to generic Sequential access — LGTMImporting
Sequentialand switching toget_vector::<Sequential>aligns with the newAccessPatternAPI and preserves intended locality.Also applies to: 86-90
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (1)
17-18: UseRandomfor scorer reads — LGTMExplicit
get_multi::<Random>is appropriate for these random-access scoring paths and keeps counters correct.Also applies to: 97-103, 134-141
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1)
29-30: Test updated toRandomaccess — LGTMMatches the new generic API; no further changes needed.
Also applies to: 146-149
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
17-18: GPU path switched toget_multi::<Random>— LGTMImport consolidation and explicit access pattern are correct for GPU staging.
Also applies to: 91-94
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
9-9: LGTM: AccessPattern import hookup is correct.
45-47: LGTM: Generic forwarding to underlying storage.
Both get and get_many correctly delegate to ChunkedMmapVectors::. No issues spotted.
Also applies to: 96-98
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)
26-26: LGTM: Adopted AccessPattern in the prelude.
128-133: LGTM: get_vector now generic and zero-cost maps to CowVector.
135-139: LGTM: get_vector_opt generic path is consistent with other backends.lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3)
21-21: LGTM: AccessPattern import for const-based dispatch.
361-364: LGTM: Generic get uses P::IS_SEQUENTIAL to select slice type.
412-414: LGTM: Generic get_many mirrors get behavior.lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)
26-26: LGTM: AccessPattern added to imports.
236-238: LGTM: Generic Dense/Vector retrieval for in-memory backend.
AccessPattern is intentionally ignored (RAM). API aligns with trait changes.Also applies to: 258-260, 262-268
348-348: LGTM: Tests updated to generic API (Sequential).Also applies to: 420-420
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (4)
22-22: LGTM: AccessPattern imported for sparse mmap backend.
180-183: LGTM: get_sparse now delegates via generic opt-variant.
216-219: LGTM: get_vector generic returns default sparse on miss.
347-347: LGTM: Tests aligned to generic API (Random).Also applies to: 415-415
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (6)
23-25: LGTM: adopts AccessPattern generics where needed.
161-180: LGTM: get_sparse_opt honors deletion and maps decode errors.
200-204: LGTM: generic get_vector delegates to opt and falls back to default sparse.
209-214: LGTM: get_vector_opt correctly composes from get_sparse_opt.
294-294: LGTM: test imports Sequential for explicit access pattern.
357-359: LGTM: test uses get_vector:: after migration.lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)
22-24: LGTM: imports AccessPattern markers and traits.
110-112: LGTM: get_multi now generic and delegates to opt variant.
187-197: LGTM: vector getters wrap multi-dense into CowVector with AccessPattern.
218-223: LGTM: reading offsets with Random for metadata probe is appropriate.lib/segment/src/vector_storage/vector_storage_base.rs (1)
830-874: LGTM: enum dispatch passes through AccessPattern generics across all variants.Also applies to: 876-919
| fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> { | ||
| self.get_sparse_opt::<P>(key)? | ||
| .ok_or_else(|| OperationError::service_error(format!("Key {key} not found"))) | ||
| } | ||
|
|
||
| fn get_sparse_opt(&self, key: PointOffsetType) -> OperationResult<Option<SparseVector>> { | ||
| self.storage | ||
| .read() | ||
| .get_value(key, &HardwareCounterCell::disposable()) // Vector storage read IO not measured | ||
| .map(SparseVector::try_from) | ||
| .transpose() | ||
| } | ||
|
|
||
| fn get_sparse_sequential(&self, key: PointOffsetType) -> OperationResult<SparseVector> { | ||
| let stored_sparse = self | ||
| .storage | ||
| .read() | ||
| .get_value_sequential(key, &HardwareCounterCell::disposable()) | ||
| .ok_or_else(|| OperationError::service_error(format!("Key {key} not found")))?; | ||
| SparseVector::try_from(stored_sparse) | ||
| fn get_sparse_opt<P: AccessPattern>( | ||
| &self, | ||
| key: PointOffsetType, | ||
| ) -> OperationResult<Option<SparseVector>> { | ||
| let storage = self.storage.read(); | ||
| let result = if P::IS_SEQUENTIAL { | ||
| storage.get_value::<true>(key, &HardwareCounterCell::disposable()) // Vector storage read IO not measured | ||
| } else { | ||
| storage.get_value::<false>(key, &HardwareCounterCell::disposable()) | ||
| }; | ||
| result.map(SparseVector::try_from).transpose() | ||
| } |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Repo-wide check for removed sequential methods and generic call sites.
Run to ensure no stale usages and help catch missed updates.
Also applies to: 216-229
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Ensure removed methods aren't referenced
rg -nP '(get_(vector|dense|many|sparse)_sequential)\s*\(' -S || echo "OK: no *_sequential calls found"
# 2) Spot potential non-generic get_vector/dense calls at call sites
# Heuristic: a dot call without a::<...> following the method name.
rg -nP '\.\s*(get_vector|get_dense|get_sparse)\s*\(' -g '!**/vector_storage_base.rs' -g '!**/*.md' -S || echo "OK: all calls appear genericized"
# 3) For ChunkedMmapVectors, ensure no callers use the old bool parameter (should be internal only)
rg -nP 'get\([^,]+,\s*(true|false)\)' lib/segment/src -S || echo "OK: no external bool-based get/get_many calls"Length of output: 1660
Add generic AccessPattern parameters to get_vector/get_dense/get_sparse calls
Found non-generic calls in:
- src/common/inference/query_requests_rest.rs (165, 174, 183)
- src/common/inference/query_requests_grpc.rs (345, 359, 372)
- src/common/inference/update_requests.rs (430, 437, 444)
- lib/segment/src/segment/snapshot.rs (146)
- lib/segment/src/segment_constructor/batched_reader.rs (64)
- lib/segment/src/index/hnsw_index/graph_links/serializer.rs (135)
These invocations must be updated to use the generic form, e.g. get_vector::<P>(…), with the appropriate AccessPattern.
🤖 Prompt for AI Agents
In lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs around
lines 180 to 196, the public API methods expect an AccessPattern generic but
many call sites use non-generic invocations; update callers to call the generic
form (e.g., replace get_sparse(key) with get_sparse::<P>(key) where P implements
AccessPattern), importing or forwarding the concrete AccessPattern type (e.g.,
Sequential or Random) as needed. Apply the same change to all listed
files/locations (src/common/inference/query_requests_rest.rs lines ~165/174/183,
src/common/inference/query_requests_grpc.rs ~345/359/372,
src/common/inference/update_requests.rs ~430/437/444,
lib/segment/src/segment/snapshot.rs ~146,
lib/segment/src/segment_constructor/batched_reader.rs ~64,
lib/segment/src/index/hnsw_index/graph_links/serializer.rs ~135) so each
get_vector/get_dense/get_sparse call uses the explicit generic parameter (or
thread a generic P through the caller) and add any necessary use/imports for the
concrete AccessPattern types.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/gridstore/src/gridstore.rs (1)
238-264: Replace all non-genericget_valuecalls with the const-generic API
Existing call-sites such as.get_value(&JsonPath::new("k"))must be updated to specify the
READ_SEQUENTIALconst parameter (e.g.get_value::<false>orget_value::<true>). Tests and code underlib/collection,lib/shard,lib/segment, andlib/posting_liststill use the old form and will fail to compile until updated.
♻️ Duplicate comments (1)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
133-142: Replace unsafe transmute with bytemuck::cast_slice (soundness).Casting &[u8] to &[T] with transmute is UB on misalignment and hides invariants. Use bytemuck::cast_slice; it checks size/alignment and requires T: Pod, which PrimitiveVectorElement types satisfy in this codebase. Also drop the redundant slicing.
- fn raw_vector_offset<P: AccessPattern>(&self, offset: usize) -> &[T] { + fn raw_vector_offset<P: AccessPattern>(&self, offset: usize) -> &[T] { let mmap = if P::IS_SEQUENTIAL { &self.mmap_sequential } else { &self.mmap }; - let byte_slice = &mmap[offset..(offset + self.raw_size())]; - let arr: &[T] = unsafe { transmute(byte_slice) }; - &arr[0..self.dim] + let byte_slice = &mmap[offset..offset + self.raw_size()]; + let arr: &[T] = bytemuck::cast_slice(byte_slice); + arr }Add import and remove transmute:
-use std::mem::{self, MaybeUninit, size_of, transmute}; +use std::mem::{self, MaybeUninit, size_of}; +use bytemuck::cast_slice;
🧹 Nitpick comments (13)
lib/collection/src/collection/mmr/lazy_matrix.rs (2)
5-5: LGTM: debug-only import scope is correct.Importing
Randomand the trait only in debug builds matches usage in the guarded block.Optional: if you decide to validate both access patterns in debug (see next comment), extend the import:
-#[cfg(debug_assertions)] -use segment::vector_storage::{Random, VectorStorage}; +#[cfg(debug_assertions)] +use segment::vector_storage::{Random, Sequential, VectorStorage};
36-36: Use PointOffsetType (and optionally validate both access patterns).
- Prefer casting to
PointOffsetTypefor resilience if the alias changes.- Optional: since the loop is sequential and this is debug-only, also fetch via
Sequentialto catch pattern-specific regressions.Apply:
- let stored_vector = storage.get_vector::<Random>(i as u32); + let stored_vector = storage.get_vector::<Random>(i as PointOffsetType); + // Optional: also verify the sequential path in debug + // let stored_vector_seq = storage.get_vector::<Sequential>(i as PointOffsetType); + // assert_eq!(stored_vector_seq.to_owned(), *vector);lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1)
22-24: Refineget_vector_datain chunked mmap storage.
- Casting
PointOffsetType (u32)toVectorOffsetType (usize)is non‐narrowing on all supported targets; atry_fromguard can be deferred as an optional future‐proofing measure.- Quantization code invokes
get_vector_datasequentially across vectors; if predictable linear scans dominate hot paths, consider a sequential readahead pattern instead ofRandom.- For readability, invoke the trait on the data directly:
self.data.get::<Random>(key).unwrap_or_default().lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
208-216: Consider using Sequential for obviously ordered scans on non-Linux.process_points_simple always uses Random. If ids are mostly ascending/contiguous, using Sequential may improve readahead. Optional: detect monotonic/non-decreasing sequences and pick P accordingly.
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
88-92: Nit: error message wording."mmap vector not found" may be misleading when S is an in-RAM variant. Consider a storage-agnostic message.
- .expect("mmap vector not found") + .expect("vector not found")
128-139: Prefer explicit From over Into (repo guideline).Replace slice.into() with Cow::from(slice) and the final into() with CowVector::from(...).
fn get_vector<P: AccessPattern>(&self, key: PointOffsetType) -> CowVector<'_> { self.vectors .get::<P>(key as VectorOffsetType) - .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into()))) + .map(|slice| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(slice)))) .expect("Vector not found") } fn get_vector_opt<P: AccessPattern>(&self, key: PointOffsetType) -> Option<CowVector<'_>> { self.vectors .get::<P>(key as VectorOffsetType) - .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into()))) + .map(|slice| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(slice)))) }lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
182-189: Prefer explicit From over Into (repo guideline).Make the conversions explicit for clarity.
fn get_vector<P: AccessPattern>(&self, key: PointOffsetType) -> CowVector<'_> { self.mmap_store .as_ref() .unwrap() .get_vector_opt::<P>(key) - .map(|vector| T::slice_to_float_cow(vector.into()).into()) + .map(|vector| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(vector)))) .expect("Vector not found") }
191-197: Prefer explicit From over Into (repo guideline).Mirror the change in get_vector for the optional path.
fn get_vector_opt<P: AccessPattern>(&self, key: PointOffsetType) -> Option<CowVector<'_>> { self.mmap_store .as_ref() .unwrap() .get_vector_opt::<P>(key) - .map(|vector| T::slice_to_float_cow(vector.into()).into()) + .map(|vector| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(vector)))) }lib/gridstore/src/gridstore.rs (3)
238-264: Public const-bool API: add inline and avoid “bool blindness”.Good consolidation. Two tweaks:
- Add #[inline] (hot path).
- Provide named consts (RANDOM/SEQUENTIAL) to improve call-site readability vs literals true/false.
Apply inline:
- pub fn get_value<const READ_SEQUENTIAL: bool>( + #[inline] + pub fn get_value<const READ_SEQUENTIAL: bool>(Optionally add named consts (outside this hunk, e.g. near CONFIG_FILENAME):
pub mod read_pattern { pub const RANDOM: bool = false; pub const SEQUENTIAL: bool = true; }Then call sites can use:
storage.get_value::<{ read_pattern::RANDOM }>(...)
259-263: Decompression panics on corrupt data; prefer non-panicking path.decompress_lz4(...).unwrap() can crash the process on malformed bytes. Consider returning Result from decompress()/decompress_lz4() and mapping to an error (or Option) at this layer.
Example (outside this hunk):
fn decompress_lz4(value: &[u8]) -> std::io::Result<Vec<u8>> { lz4_flex::decompress_size_prepended(value).map_err(|e| std::io::Error::other(e)) } fn decompress(&self, value: Vec<u8>) -> std::io::Result<Vec<u8>> { ... }And here:
let decompressed = self.decompress(raw).ok()?;
630-630: Use named consts for readability in tests.get_value::<false/true> is clear to us now, but named consts make intent obvious to future readers.
If you add:
use crate::read_pattern::{RANDOM, SEQUENTIAL};then replace literals with:
- get_value::<{RANDOM}> for random access
- get_value::<{SEQUENTIAL}> for sequential access
Also applies to: 649-649, 676-676, 736-736, 744-744, 769-769, 780-780, 808-808, 844-844, 926-926, 947-947, 982-982, 1000-1000, 1026-1026, 1038-1038, 1095-1097, 1099-1100, 1150-1150
lib/segment/src/payload_storage/mmap_payload_storage.rs (1)
109-126: Prefer named consts over raw booleans at call sites.To avoid boolean ambiguity, import and use named consts if added in Gridstore.
Example:
use gridstore::read_pattern::{RANDOM, SEQUENTIAL}; match guard.get_value::<{RANDOM}>(point_id, hw_counter) { ... } self.storage.read().get_value::<{SEQUENTIAL}>(point_id, hw_counter)Also applies to: 136-161, 168-172, 179-183, 192-208
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
46-50: Prefer explicit From over Into in tests (repo guideline).Replace
vec.into()withVectorRef::from(vec)for clarity and consistency with the codebase rules.Apply this diff:
@@ - storage - .insert_vector(i as PointOffsetType, vec.into(), &hw_counter) + use crate::data_types::vectors::VectorRef; + storage + .insert_vector(i as PointOffsetType, VectorRef::from(vec), &hw_counter) .unwrap(); @@ - .insert_vector(i as PointOffsetType, vec.into(), &hw_counter) + .insert_vector(i as PointOffsetType, VectorRef::from(vec), &hw_counter) .unwrap(); @@ - storage - .insert_vector(i as PointOffsetType, vec.into(), &hw_counter) + use crate::data_types::vectors::VectorRef; + storage + .insert_vector(i as PointOffsetType, VectorRef::from(vec), &hw_counter) .unwrap();Also applies to: 141-146, 226-231
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (49)
lib/collection/src/collection/mmr/lazy_matrix.rs(2 hunks)lib/gridstore/benches/bustle_bench/payload_storage.rs(1 hunks)lib/gridstore/benches/random_data_bench.rs(1 hunks)lib/gridstore/benches/real_data_bench.rs(1 hunks)lib/gridstore/src/gridstore.rs(16 hunks)lib/segment/benches/sparse_vector_storage.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs(7 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(2 hunks)lib/segment/src/index/hnsw_index/graph_layers_healer.rs(2 hunks)lib/segment/src/index/hnsw_index/point_scorer.rs(2 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/payload_storage/mmap_payload_storage.rs(5 hunks)lib/segment/src/segment/segment_ops.rs(3 hunks)lib/segment/src/segment_constructor/batched_reader.rs(2 hunks)lib/segment/src/segment_constructor/segment_constructor_base.rs(6 hunks)lib/segment/src/vector_storage/chunked_mmap_vectors.rs(3 hunks)lib/segment/src/vector_storage/chunked_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(11 hunks)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs(4 hunks)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/in_ram_persisted_vectors.rs(3 hunks)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs(2 hunks)lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs(2 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs(4 hunks)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(2 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/vector_storage_base.rs(7 hunks)lib/segment/tests/integration/sparse_vector_index_search_tests.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (38)
- lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
- lib/segment/src/index/sparse_index/sparse_vector_index.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
- lib/segment/src/index/hnsw_index/point_scorer.rs
- lib/segment/src/segment_constructor/batched_reader.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
- lib/gridstore/benches/random_data_bench.rs
- lib/segment/benches/sparse_vector_storage.rs
- lib/segment/src/segment/segment_ops.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
- lib/gridstore/benches/bustle_bench/payload_storage.rs
- lib/segment/src/index/hnsw_index/gpu/mod.rs
- lib/gridstore/benches/real_data_bench.rs
- lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
- lib/segment/src/segment_constructor/segment_constructor_base.rs
- lib/segment/tests/integration/sparse_vector_index_search_tests.rs
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs
- lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
- lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
- lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
- lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
- lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
- lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs
- lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
- lib/segment/src/vector_storage/tests/async_raw_scorer.rs
- lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
- lib/segment/src/vector_storage/chunked_mmap_vectors.rs
- lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
- lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
- lib/segment/src/index/hnsw_index/graph_layers_healer.rs
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
- lib/segment/src/vector_storage/chunked_vector_storage.rs
- lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
- lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
- lib/segment/src/vector_storage/vector_storage_base.rs
- lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
- lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/segment/src/payload_storage/mmap_payload_storage.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/gridstore/src/gridstore.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/segment/src/payload_storage/mmap_payload_storage.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/gridstore/src/gridstore.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
**/{tests,benches}/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged
Files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
🧠 Learnings (13)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/payload_storage/mmap_payload_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/segment/src/payload_storage/mmap_payload_storage.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/collection/src/collection/mmr/lazy_matrix.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
📚 Learning: 2025-08-21T13:45:05.899Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.
Applied to files:
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
📚 Learning: 2025-08-21T13:43:52.191Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.
Applied to files:
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
📚 Learning: 2025-08-21T13:46:10.866Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.
Applied to files:
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rslib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/collection/src/collection/mmr/lazy_matrix.rslib/gridstore/src/gridstore.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
📚 Learning: 2025-08-14T11:31:21.777Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.
Applied to files:
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
Applied to files:
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.
Applied to files:
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
📚 Learning: 2025-08-29T08:16:28.512Z
Learnt from: CR
PR: qdrant/qdrant#0
File: .github/review-rules.md:0-0
Timestamp: 2025-08-29T08:16:28.512Z
Learning: Applies to **/*.rs : Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Applied to files:
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
📚 Learning: 2025-04-22T23:19:51.232Z
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.
Applied to files:
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
🧬 Code graph analysis (10)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (2)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (2)
get(259-261)get(350-352)lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
get(20-22)
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (7)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(535-535)vector_storage(621-621)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)
vector_storage(87-87)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)
vector_storage(73-73)vector_storage(101-101)vector_storage(102-102)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)
vector_storage(97-97)vector_storage(134-134)vector_storage(135-135)lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (1)
vector_storage(61-63)lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (3)
vector_storage(55-57)vector_storage(73-75)vector_storage(82-84)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)
storage(408-408)get_vector_opt(135-139)is_deleted_vector(199-201)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (8)
storage(191-191)storage(193-193)storage(380-380)storage(415-415)storage(452-452)get_vector_opt(217-217)get_vector_opt(224-229)is_deleted_vector(306-308)lib/segment/src/vector_storage/vector_storage_base.rs (4)
get_vector_opt(90-90)get_vector_opt(876-920)is_deleted_vector(125-125)is_deleted_vector(1247-1291)
lib/collection/src/collection/mmr/lazy_matrix.rs (8)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)vector_storage(598-599)vector_storage(630-631)vector_storage(657-658)lib/segment/src/index/hnsw_index/graph_layers_healer.rs (1)
vector_storage(220-221)lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/segment/segment_ops.rs (2)
vector_storage(381-381)vector_storage(609-609)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(535-535)vector_storage(621-621)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)
vector_storage(87-87)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (1)
vector_storage(73-73)
lib/gridstore/src/gridstore.rs (4)
lib/gridstore/benches/bustle_bench/payload_storage.rs (1)
get_value(34-34)lib/gridstore/benches/random_data_bench.rs (1)
storage(29-29)lib/gridstore/benches/real_data_bench.rs (1)
storage(91-91)lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
storage(168-168)storage(179-179)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (3)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)
get_dense(88-92)get_vector(128-133)get_vector_opt(135-139)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)
get_dense(236-238)get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)
get_dense(83-85)get_vector(105-107)get_vector_opt(106-106)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_dense(152-152)get_dense(162-162)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (8)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)
get_sparse(150-159)get_sparse_opt(161-180)get_sparse_opt(210-210)get_sparse_opt(260-260)get_vector(200-204)get_vector_opt(202-202)get_vector_opt(209-214)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)
get_sparse(104-109)get_sparse_opt(105-106)get_sparse_opt(111-118)get_sparse_opt(148-148)get_sparse_opt(197-197)get_vector(138-141)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_sparse(172-172)get_sparse_opt(173-176)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)
storage(408-408)get_vector(128-133)get_vector_opt(135-139)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)
storage(377-377)get_vector(182-189)get_vector_opt(191-197)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)
storage(614-614)get_vector(187-189)get_vector_opt(188-188)get_vector_opt(191-197)lib/segment/src/vector_storage/sparse/stored_sparse_vectors.rs (1)
try_from(50-63)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1)
vector_storage(529-529)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_dense(152-152)get_dense(162-162)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (3)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)
get_sparse(180-183)get_sparse_opt(181-181)get_sparse_opt(185-196)get_sparse_opt(225-225)get_vector(216-219)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (5)
get_sparse(150-159)get_sparse_opt(161-180)get_sparse_opt(210-210)get_sparse_opt(260-260)get_vector(200-204)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_sparse(172-172)get_sparse_opt(173-176)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (3)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (5)
get_vector(128-133)get_vector_opt(135-139)vectors(89-90)vectors(129-130)vectors(136-137)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
get_vector(182-189)get_vector_opt(191-197)lib/segment/src/vector_storage/vector_storage_base.rs (4)
get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consistency
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: lint
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
🔇 Additional comments (23)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1)
7-7: Importing Random is appropriate and consistent with the new API.
No issues.lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
145-153: LGTM: generic get_vector/get_vector_opt over AccessPattern.The consolidation to a single generic path is clear and keeps behavior identical while removing duplication.
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
118-121: LGTM: make access pattern explicit (Random).Using get_multi::(idx) matches the rest of the PR and makes intent explicit.
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
147-153: LGTM: generic get_dense via inner get_vector_opt::.
Behavior preserved with clearer API.
lib/segment/src/payload_storage/mmap_payload_storage.rs (1)
109-126: LGTM: correct mapping to random/sequential read paths.Swapping to get_value:: for random and get_value:: for sequential matches the new Gridstore API.
Also applies to: 136-161, 168-172, 179-183, 192-208
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (5)
23-23: Imports look correct for the new generic API.Bringing
Randomand the traits into scope matches the PR-wide changes.
54-56: Good: explicit access pattern at call site.
get_vector::<Random>(…)is the intended usage.
151-156: Good: iterator uses the new typed accessor.
storage2.get_vector::<Random>(i)aligns with the generic API.
245-251: Correct: verifying deletion viaget_vector_opt::<Random>(…).This checks the new API semantics properly.
253-258: Good: reading with explicit pattern in persistence check.Consistent with the consolidated API.
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)
22-23: Trait imports match the new design.OK to expose both
SparseVectorStorageandVectorStoragewithAccessPattern.
180-183: OK:get_sparsedelegates to genericget_sparse_opt.Error text preserved; API surface matches other backends.
216-219: OK:get_vectornow generic and falls back to default sparse.Matches other storage backends’ behavior.
224-229: OK:get_vector_optcomposes overget_sparse_opt::<P>.Error-eliding semantics preserved as before.
347-348: Tests importRandomappropriately.Matches the new explicit call-site pattern.
415-423: Good: test asserts throughget_vector::<Random>.Covers the new API surface.
185-196: Ignore this suggestion; the code is correct.storage.get_valuereturnsOption<V>, so callingmap(SparseVector::try_from)produces anOption<Result<_, _>>which.transpose()cleanly converts toResult<Option<_>, _>as intended.Likely an incorrect or invalid review comment.
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)
17-19: Imports align with the consolidated API.OK to bring
AccessPatternandRandomhere.
104-109: OK:get_sparsedelegates to genericget_sparse_opt::<P>.Consistent with other backends.
111-118: OK: In-memory backend ignores access pattern at runtime.Signature is generic for uniformity; body remains O(1) lookups.
138-141: OK:get_vectoruses generic access pattern and default sparse fallback.Matches trait expectations.
146-152: OK:get_vector_optcomposes overget_sparse_opt::<P>.Comment accurately documents error-elision.
194-201: Intentional use ofRandomon delete path.Since
delete_vectorisn’t generic, defaulting toRandomfor the (rare) read-before-delete is reasonable.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
67-72: Missingsize_ofimport causes compile error
size_of::<TElement>()is used but not imported. Addsize_ofto the std::mem import.-use std::mem::MaybeUninit; +use std::mem::{size_of, MaybeUninit};
♻️ Duplicate comments (2)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
152-164: Batch dense reads should use Sequential.Keys are documented as consecutive; prefer
Sequentialto hint optimal prefetch/IO.Apply:
- let iter = keys.iter().map(|key| self.get_dense::<Random>(*key)); + let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));
182-195: Batch multi-dense reads should use Sequential.Same rationale as dense batch.
Apply:
- let iter = keys.iter().map(|key| self.get_multi::<Random>(*key)); + let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));
🧹 Nitpick comments (10)
lib/segment/src/vector_storage/vector_storage_base.rs (1)
40-56: AccessPattern API is minimal and effective.Marker types with a const flag keep costs at compile time. Consider adding short rustdoc comments on when to choose each pattern in public docs (optional).
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1)
29-30: Prefer Sequential for linear GPU uploads.All these paths iterate ids in ascending order; using
Sequentialcan improve MMAP prefetch and IO.Apply:
-use crate::vector_storage::{ - DenseVectorStorage, MultiVectorStorage, Random, VectorStorage, VectorStorageEnum, -}; +use crate::vector_storage::{ + DenseVectorStorage, MultiVectorStorage, Random, Sequential, VectorStorage, VectorStorageEnum, +}; @@ - VectorElementTypeHalf::slice_from_float_cow(Cow::Borrowed( - vector_storage.get_dense::<Random>(id as PointOffsetType), - )) + VectorElementTypeHalf::slice_from_float_cow(Cow::Borrowed( + vector_storage.get_dense::<Sequential>(id as PointOffsetType), + )) @@ - VectorElementTypeHalf::slice_to_float_cow(Cow::Borrowed( - vector_storage.get_dense::<Random>(id as PointOffsetType), - )) + VectorElementTypeHalf::slice_to_float_cow(Cow::Borrowed( + vector_storage.get_dense::<Sequential>(id as PointOffsetType), + )) @@ - (0..vector_storage.total_vector_count()) - .map(|id| Cow::Borrowed(vector_storage.get_dense::<Random>(id as PointOffsetType))), + (0..vector_storage.total_vector_count()).map(|id| { + Cow::Borrowed(vector_storage.get_dense::<Sequential>(id as PointOffsetType)) + }), @@ - vector_storage - .get_multi::<Random>(id as PointOffsetType) + vector_storage + .get_multi::<Sequential>(id as PointOffsetType) @@ - vector_storage - .get_multi::<Random>(id as PointOffsetType) + vector_storage + .get_multi::<Sequential>(id as PointOffsetType) @@ - vector_storage - .get_multi::<Random>(id as PointOffsetType) + vector_storage + .get_multi::<Sequential>(id as PointOffsetType)Also applies to: 528-531, 556-559, 578-580, 597-601, 629-633, 656-660
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
182-189: Prefer explicit From over Into for CowVector.Match project style and improve readability.
Apply:
- .get_vector_opt::<P>(key) - .map(|vector| T::slice_to_float_cow(vector.into()).into()) + .get_vector_opt::<P>(key) + .map(|vector| CowVector::from(T::slice_to_float_cow(vector.into()))) @@ - .get_vector_opt::<P>(key) - .map(|vector| T::slice_to_float_cow(vector.into()).into()) + .get_vector_opt::<P>(key) + .map(|vector| CowVector::from(T::slice_to_float_cow(vector.into())))Also applies to: 191-197
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
133-136: Batch read path isn’t pattern-parameterizedIf/when a generic batch API exists (e.g.,
get_batch_multi::<P>), threadPhere for consistency and potential sequential-IO optimization.lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2)
33-33: Prefer Sequential in linear storage scanThis loop walks ids in order; Sequential better reflects the access pattern and may improve readahead on mmap.
-use segment::vector_storage::{Random, VectorStorage}; +use segment::vector_storage::{Sequential, VectorStorage};
156-156: Use Sequential for ordered iteration- let vector = borrowed_vector_storage.get_vector::<Random>(id); + let vector = borrowed_vector_storage.get_vector::<Sequential>(id);lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
179-196: Minor: avoid duplicating disposable counter creationNot performance-critical, but slightly cleaner to allocate once.
- let storage = self.storage.read(); - let result = if P::IS_SEQUENTIAL { - storage.get_value::<true>(key, &HardwareCounterCell::disposable()) // Vector storage read IO not measured - } else { - storage.get_value::<false>(key, &HardwareCounterCell::disposable()) - }; + let storage = self.storage.read(); + let io_counter = HardwareCounterCell::disposable(); // Vector storage read IO not measured + let result = if P::IS_SEQUENTIAL { + storage.get_value::<true>(key, &io_counter) + } else { + storage.get_value::<false>(key, &io_counter) + };lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)
197-199: Deletion path uses generic read—fine for RAM; consider direct accessOptional: you could read
self.vectors.get(...)directly here to avoidResultplumbing, since it’s in-memory and you ignore errors anyway.lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)
115-132: Prefer Sequential for inner bulk read in get_multi_optOffsets lookup can follow caller’s pattern P, but inner vectors are read contiguously via get_many; hinting Sequential there improves readahead on mmap without constraining callers’ outer pattern.
- .and_then(|mmap_offset| { - let mmap_offset = mmap_offset.first().expect("mmap_offset must not be empty"); - self.vectors.get_many::<P>( + .and_then(|mmap_offset| { + let mmap_offset = mmap_offset.first().expect("mmap_offset must not be empty"); + self.vectors.get_many::<Sequential>( mmap_offset.offset as VectorOffsetType, mmap_offset.count as usize, ) })
138-145: Replace unwraps with expects for clearer diagnostics; keep Sequential hintCurrent unwraps can panic without context on corruption or partial writes. Expect messages aid debugging and log triage, especially in prod.
- .get::<Sequential>(key as VectorOffsetType) - .unwrap() + .get::<Sequential>(key as VectorOffsetType) + .expect("offsets: missing entry for key") .first() - .unwrap(); + .expect("offsets: empty slice for key"); @@ - .get::<Sequential>((mmap_offset.offset + i) as VectorOffsetType) - .unwrap() + .get::<Sequential>((mmap_offset.offset + i) as VectorOffsetType) + .expect("vectors: missing inner slice")Optional: you could batch per-key with get_many:: and then yield chunks(dim) to reduce repeated lookups, but that may complicate the Iterator + Clone bound; fine to keep as-is.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (49)
lib/collection/src/collection/mmr/lazy_matrix.rs(2 hunks)lib/gridstore/benches/bustle_bench/payload_storage.rs(1 hunks)lib/gridstore/benches/random_data_bench.rs(1 hunks)lib/gridstore/benches/real_data_bench.rs(1 hunks)lib/gridstore/src/gridstore.rs(16 hunks)lib/segment/benches/sparse_vector_storage.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs(7 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(2 hunks)lib/segment/src/index/hnsw_index/graph_layers_healer.rs(2 hunks)lib/segment/src/index/hnsw_index/point_scorer.rs(2 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/payload_storage/mmap_payload_storage.rs(5 hunks)lib/segment/src/segment/segment_ops.rs(3 hunks)lib/segment/src/segment_constructor/batched_reader.rs(2 hunks)lib/segment/src/segment_constructor/segment_constructor_base.rs(6 hunks)lib/segment/src/vector_storage/chunked_mmap_vectors.rs(3 hunks)lib/segment/src/vector_storage/chunked_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(11 hunks)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs(4 hunks)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/in_ram_persisted_vectors.rs(3 hunks)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs(3 hunks)lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs(2 hunks)lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs(2 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs(3 hunks)lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs(2 hunks)lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs(4 hunks)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(2 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(2 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(4 hunks)lib/segment/src/vector_storage/vector_storage_base.rs(7 hunks)lib/segment/tests/integration/sparse_vector_index_search_tests.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (38)
- lib/gridstore/benches/real_data_bench.rs
- lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs
- lib/segment/src/vector_storage/tests/async_raw_scorer.rs
- lib/segment/src/index/hnsw_index/gpu/mod.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
- lib/collection/src/collection/mmr/lazy_matrix.rs
- lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
- lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
- lib/segment/src/index/hnsw_index/graph_layers_healer.rs
- lib/segment/src/vector_storage/chunked_vector_storage.rs
- lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
- lib/gridstore/benches/bustle_bench/payload_storage.rs
- lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
- lib/segment/src/index/hnsw_index/point_scorer.rs
- lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
- lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
- lib/segment/src/segment/segment_ops.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
- lib/gridstore/benches/random_data_bench.rs
- lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
- lib/gridstore/src/gridstore.rs
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
- lib/segment/src/segment_constructor/segment_constructor_base.rs
- lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
- lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
- lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
- lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
- lib/segment/src/payload_storage/mmap_payload_storage.rs
- lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
- lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
- lib/segment/benches/sparse_vector_storage.rs
- lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
- lib/segment/src/segment_constructor/batched_reader.rs
- lib/segment/src/index/sparse_index/sparse_vector_index.rs
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs
- lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
- lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/segment/tests/integration/sparse_vector_index_search_tests.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rslib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
**/{tests,benches}/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged
Files:
lib/segment/tests/integration/sparse_vector_index_search_tests.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rslib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
🧠 Learnings (14)
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Applied to files:
lib/segment/tests/integration/sparse_vector_index_search_tests.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/segment/tests/integration/sparse_vector_index_search_tests.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rslib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
📚 Learning: 2025-07-11T11:35:21.549Z
Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.
Applied to files:
lib/segment/tests/integration/sparse_vector_index_search_tests.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Applied to files:
lib/segment/tests/integration/sparse_vector_index_search_tests.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
📚 Learning: 2025-08-21T13:45:05.899Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.
Applied to files:
lib/segment/tests/integration/sparse_vector_index_search_tests.rslib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
Applied to files:
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
📚 Learning: 2025-08-21T13:46:10.866Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.
Applied to files:
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rslib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rslib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rslib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rslib/segment/src/vector_storage/vector_storage_base.rs
📚 Learning: 2025-05-15T22:54:30.292Z
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
📚 Learning: 2025-05-15T22:54:30.292Z
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This was part of an API redesign in rand 0.9.0 released in February 2024.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
📚 Learning: 2025-05-15T22:54:30.292Z
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
📚 Learning: 2025-08-21T13:43:52.191Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.
Applied to files:
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/segment/src/vector_storage/chunked_mmap_vectors.rslib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.
Applied to files:
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
📚 Learning: 2025-08-14T11:31:21.777Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.
Applied to files:
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
🧬 Code graph analysis (10)
lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
vector_storage(68-70)vector_storage(633-634)lib/segment/src/segment/segment_ops.rs (2)
vector_storage(381-381)vector_storage(609-609)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (6)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)
get_sparse(150-159)get_sparse_opt(161-180)get_sparse_opt(210-210)get_sparse_opt(260-260)get_vector(200-204)get_vector_opt(202-202)get_vector_opt(209-214)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)
get_sparse(104-109)get_sparse_opt(105-106)get_sparse_opt(111-118)get_sparse_opt(148-148)get_sparse_opt(197-197)get_vector(138-141)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_sparse(172-172)get_sparse_opt(173-176)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/payload_storage/mmap_payload_storage.rs (2)
storage(168-168)storage(179-179)lib/segment/src/vector_storage/sparse/stored_sparse_vectors.rs (1)
try_from(50-63)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (3)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(535-535)vector_storage(621-621)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)
vector_storage(97-97)vector_storage(134-134)vector_storage(135-135)
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (6)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)vector_storage(598-599)vector_storage(630-631)vector_storage(657-658)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(535-535)vector_storage(621-621)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)
vector_storage(87-87)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)
vector_storage(73-73)vector_storage(101-101)vector_storage(102-102)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
vector_storage(120-120)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)
vector_storage(97-97)vector_storage(134-134)vector_storage(135-135)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (8)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)
vector_storage(535-535)vector_storage(621-621)lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)
vector_storage(87-87)lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)
vector_storage(73-73)vector_storage(101-101)vector_storage(102-102)lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
vector_storage(120-120)lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)
vector_storage(97-97)vector_storage(134-134)vector_storage(135-135)lib/segment/src/vector_storage/chunked_vector_storage.rs (2)
get(20-22)get_many(52-52)lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
get(44-46)get_many(95-97)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (3)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)
get_sparse(180-183)get_sparse_opt(181-181)get_sparse_opt(185-196)get_sparse_opt(225-225)get_vector(216-219)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (5)
get_sparse(150-159)get_sparse_opt(161-180)get_sparse_opt(210-210)get_sparse_opt(260-260)get_vector(200-204)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_sparse(172-172)get_sparse_opt(173-176)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (4)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
vector_storage(91-92)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (7)
get_multi(110-112)get_multi_opt(111-111)get_multi_opt(115-132)get_multi_opt(192-192)get_vector(187-189)get_vector_opt(188-188)get_vector_opt(191-197)lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (4)
get_multi(184-186)get_multi_opt(185-185)get_multi_opt(189-204)get_multi_opt(250-250)lib/segment/src/vector_storage/vector_storage_base.rs (7)
get_multi(181-181)get_multi(193-193)get_multi_opt(182-185)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (7)
get_multi(338-340)get_multi_opt(339-339)get_multi_opt(343-358)get_multi_opt(404-404)get_vector(399-401)get_vector_opt(400-400)get_vector_opt(403-409)lib/segment/src/vector_storage/vector_storage_base.rs (7)
get_multi(181-181)get_multi(193-193)get_multi_opt(182-185)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
get_vector(128-133)get_vector_opt(135-139)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (6)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (3)
vector_storage(529-529)vector_storage(557-557)vector_storage(579-579)lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)
get_dense(88-92)get_vector(128-133)get_vector_opt(135-139)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)
get_dense(236-238)get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)
get_dense(83-85)get_vector(105-107)get_vector_opt(106-106)lib/segment/src/vector_storage/vector_storage_base.rs (6)
get_dense(152-152)get_dense(162-162)get_vector(87-87)get_vector(830-874)get_vector_opt(90-90)get_vector_opt(876-920)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)
lib/segment/src/vector_storage/vector_storage_base.rs (12)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (7)
get_vector(128-133)get_vector_opt(135-139)get_dense(88-92)vectors(89-90)vectors(129-130)vectors(136-137)vector_dim(84-86)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (4)
get_vector(182-189)get_vector_opt(191-197)get_dense(147-153)vector_dim(143-145)lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)
get_vector(145-147)get_vector(163-163)get_vector(166-166)get_vector(214-214)get_vector_opt(146-146)get_vector_opt(150-153)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5)
get_vector(258-260)get_vector_opt(259-259)get_vector_opt(263-268)get_dense(236-238)vector_dim(232-234)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (5)
get_vector(105-107)get_vector_opt(106-106)get_vector_opt(110-115)get_dense(83-85)vector_dim(79-81)lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (10)
get_vector(187-189)get_vector_opt(188-188)get_vector_opt(191-197)vectors(123-123)vectors(143-144)vector_dim(105-107)get_multi(110-112)get_multi_opt(111-111)get_multi_opt(115-132)get_multi_opt(192-192)lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (8)
get_vector(399-401)get_vector_opt(400-400)get_vector_opt(403-409)vector_dim(333-335)get_multi(338-340)get_multi_opt(339-339)get_multi_opt(343-358)get_multi_opt(404-404)lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (6)
get_vector(245-247)vector_dim(179-181)get_multi(184-186)get_multi_opt(185-185)get_multi_opt(189-204)get_multi_opt(250-250)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)
get_vector(216-219)get_vector_opt(217-217)get_vector_opt(224-229)get_sparse(180-183)get_sparse_opt(181-181)get_sparse_opt(185-196)get_sparse_opt(225-225)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)
get_vector(200-204)get_vector_opt(202-202)get_vector_opt(209-214)get_sparse(150-159)get_sparse_opt(161-180)get_sparse_opt(210-210)get_sparse_opt(260-260)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)
get_vector(138-141)get_sparse(104-109)get_sparse_opt(105-106)get_sparse_opt(111-118)get_sparse_opt(148-148)get_sparse_opt(197-197)lib/common/common/src/maybe_uninit.rs (1)
maybe_uninit_fill_from(10-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: lint
- GitHub Check: integration-tests
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: storage-compat-test
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
🔇 Additional comments (35)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3)
21-21: AccessPattern integration looks correct.Importing
AccessPatternhere is appropriate for the generic dispatch below.
350-352: Generic get(): good use of compile-time pattern.Routing via
P::IS_SEQUENTIALcleanly selects the access path without duplicating methods.
400-402: Generic get_many(): LGTM; keeps heuristic fallback.For random reads you still upgrade to sequential for large blocks, which preserves the earlier optimization.
lib/segment/src/vector_storage/vector_storage_base.rs (4)
87-91: VectorStorage generics: interface consolidation looks good.The unified generic methods eliminate duplication without altering semantics.
172-177: Sparse access genericization: LGTM.Traits now expose both patterns consistently.
830-874: Enum dispatch for get_vector: clean and exhaustive.No catch-all arms; follows guidelines.
876-919: Enum dispatch for get_vector_opt: LGTM.Consistent with the
get_vectorchange.lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)
23-24: Importing AccessPattern is correct here.Required for the new generic methods below.
147-153: get_dense genericization: LGTM.Delegates to
MmapDenseVectors::get_vector_opt::<P>with the same error semantics.
366-372: Tests updated to use typed access: OK.Using
<Random>against volatile storage is fine; access pattern is ignored in-memory.Also applies to: 377-379, 394-400, 456-458, 574-580, 644-647, 718-723
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (1)
8-8: Sparse scorer switch to typed access: LGTM.Volatile sparse storage doesn’t optimize by pattern;
Randomis appropriate and keeps intent clear.Also applies to: 57-58, 75-77, 84-89
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (2)
18-18: Importing Random is correct for explicit access patternBrings the file in line with the generic AccessPattern API.
120-120: Confirm Random is the intended pattern herePer-ID scoring typically does random access; if upstream ever guarantees sequential walks, consider Sequential for potential I/O wins on mmapped backends.
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)
22-22: AccessPattern imported—API alignment looks goodMatches the trait signatures introduced in the refactor.
216-219: Genericget_vectormaps correctly toget_vector_optFalls back to default sparse when absent; consistent with existing trait semantics.
224-229: Genericget_vector_optdelegates throughget_sparse_opt::<P>Error-to-None mapping preserved as before; OK.
347-348: Test import updated for new APIRandom brings the generic call into scope; fine.
415-416: Test read updated toget_vector::<Random>Correct usage under the new API.
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (5)
17-19: AccessPattern/Random imports added—consistent with new trait signaturesNo behavior change for in-memory storage, just API alignment.
104-109: Genericget_sparsedelegates toget_sparse_opt::<P>Semantics preserved; clear error on missing vector.
111-118: In-memoryget_sparse_opt::<P>ignores pattern as expectedStraightforward and correct for RAM-backed storage.
138-141: Genericget_vector::<P>mirrors previous behaviorReturns default sparse when absent; OK.
146-152: Genericget_vector_opt::<P>correctly wraps sparse pathUses
CowVector::fromper code style; good.lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (7)
26-26: AccessPattern added to imports—API alignmentMatches trait changes; no functional impact.
337-340: Genericget_multi::<P>is a thin wrapper—OKPanic-on-missing kept consistent with previous non-generic API.
343-358: In-memoryget_multi_opt::<P>correctly ignores patternClean delegation to
ChunkedVectors; panic message is informative.
399-401: Genericget_vector::<P>routed throughget_vector_opt::<P>Matches trait expectations; OK.
403-409:get_vector_opt::<P>wraps multi-dense into CowVector correctlyUses explicit
fromconversion as per style guide.
483-483: Test imports Sequential appropriatelyReflects the intended sequential read in migration test.
572-574: Test updated toget_vector::<Sequential>—good match for contiguous readsKeeps test intent and aligns with the new API.
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (5)
22-24: LGTM: correct imports for the new generic access patternBrings in AccessPattern, Random, and Sequential as needed by this file.
109-112: LGTM: unified generic get_multiSignature and delegation to get_multi_opt::
match the new trait API.
187-189: LGTM: get_vector now properly genericMatches the VectorStorage trait and preserves “panics if not found” semantics.
191-197: LGTM: get_vector_opt delegates via multivector conversionUses CowVector::MultiDense with correct float conversion path.
220-221: LGTM: read existing offset with RandomRandom pattern is appropriate for single-key metadata reads in insert path.
This PR removes duplications of
get_something/get_something_sequentialby replacing them with a single generic method. Similar to #6750.Before this PR:
After this PR: