Generic AccessPattern by xzfc · Pull Request #7166 · qdrant/qdrant

xzfc · 2025-08-28T13:08:48Z

This PR removes duplications of get_something/get_something_sequential by replacing them with a single generic method. Similar to #6750.

Before this PR:

fn get_dense(&self, key: PointOffsetType) -> &[T] {
    self.mmap_store
        .as_ref()
        .unwrap()
        .get_vector_opt(key)
        .unwrap_or_else(|| panic!("vector not found: {key}"))
}

fn get_dense_sequential(&self, key: PointOffsetType) -> &[T] {
    self.mmap_store
        .as_ref()
        .unwrap()
        .get_vector_opt_sequential(key)
        .unwrap_or_else(|| panic!("vector not found: {key}"))
}

let _ = vector_storage.get_dense(i);
let _ = vector_storage.get_dense_sequential(i);

After this PR:

fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] {
    self.mmap_store
        .as_ref()
        .unwrap()
        .get_vector_opt::<P>(key)
        .unwrap_or_else(|| panic!("vector not found: {key}"))
}

let _ = vector_storage.get_dense::<Random>(i);
let _ = vector_storage.get_dense::<Sequential>(i);

coderabbitai · 2025-08-28T13:14:35Z

📝 Walkthrough

Walkthrough

The diff genericizes vector/value access around an AccessPattern selector: adds an AccessPattern trait with marker types Random and Sequential, converts pattern-specific methods (e.g., get_vector_sequential, get_dense_sequential, get_sparse_sequential, get_multi_opt_sequential) into generic versions get_*::<P: AccessPattern>, and updates many concrete implementations, benchmarks, tests, HNSW, scorers, quantized storage, migrations, and gridstore to call the new generic APIs (including const-generic get_value::<READ_SEQUENTIAL>).

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

[gridstore] use seq mmap in iter #6750 — Introduces/changes const-generic gridstore read API (get_value::<READ_SEQUENTIAL>) used by many call sites in this diff.
use sequential reads for building quantized vectors #6796 — Modifies vector storage access patterns and related trait/signature changes; closely overlaps the AccessPattern genericization here.
Chunked mmap as a quantization storage #7116 — Updates call sites to the new generic vector access API (e.g., get_vector::<Random> / ChunkedVectorStorage::get::<Random>), similar to many edits in this diff.

Suggested labels

next patch

Suggested reviewers

timvisee
coszio

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch access-pattern

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

xzfc · 2025-08-28T14:24:54Z

@CodeRabbit review

coderabbitai · 2025-08-28T14:25:01Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

lib/gridstore/src/gridstore.rs (1)
238-263: Update get_value call sites and alias boolean consts for clarity
Critical: In lib/gridstore/src/gridstore.rs, add
pub const READ_RANDOM: bool = false;
pub const READ_SEQUENTIAL: bool = true;
then update every existing .get_value() invocation flagged by the verification script (across lib/shard, lib/segment, lib/collection, etc.) to include the const generic—e.g.
storage.get_value::<{READ_RANDOM}>(point, &hw_counter)
or
storage.get_value::<{READ_SEQUENTIAL}>(point, &hw_counter)
to resolve compilation errors.
Optional refactor: Replace the bool const generic with a trait-bound API—e.g.
pub fn get_value<P: AccessPattern>(&self, …) -> Option<V> {
 let raw = self.read_from_pages::<{P::SEQUENTIAL}>(…);
 …
}
for improved readability and consistency.
Optional refactor: Prevent panics on LZ4 corruption by propagating decompression errors (e.g. return a Result or map failures to None with logging) instead of unwrapping inside decompress.
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (1)
3-3: Fix missing import for size_of.

This file calls size_of::() but only imports MaybeUninit; this won’t compile.

Apply:
-use std::mem::MaybeUninit;
+use std::mem::{size_of, MaybeUninit};

🧹 Nitpick comments (33)

lib/gridstore/src/gridstore.rs (2)
717-748: Add at least one test path that exercises sequential reads via the new API.

Most reads in tests use ::<false>. To cover the new public API’s sequential variant, add a small readback using ::<true> (or a named const) in an already sequential access loop (e.g., right after initial inserts in test_put_payload). This guards the ::<true> specialization and future refactors of Page::read_value::<true>.

238-263: Cheap safeguard against panic on corrupt payloads.

While broader error-plumbing is bigger in scope, you can non-invasively replace unwrap() in the LZ4 path with ok()? and return None from get_value on decompression failure, preserving the Option<V> API. Sketch:
- fn decompress_lz4(value: &[u8]) -> Vec<u8> {
- lz4_flex::decompress_size_prepended(value).unwrap()
- }
+ fn decompress_lz4(value: &[u8]) -> Option<Vec<u8>> {
+ lz4_flex::decompress_size_prepended(value).ok()
+ }
@@
- fn decompress(&self, value: Vec<u8>) -> Vec<u8> {
+ fn decompress(&self, value: Vec<u8>) -> Option<Vec<u8>> {
 match self.config.compression {
- Compression::None => value,
- Compression::LZ4 => decompress_lz4(&value),
+ Compression::None => Some(value),
+ Compression::LZ4 => decompress_lz4(&value),
 }
 }
@@
- let decompressed = self.decompress(raw);
- let value = V::from_bytes(&decompressed);
- Some(value)
+ let decompressed = self.decompress(raw)?;
+ Some(V::from_bytes(&decompressed))
Repeat the same Option handling in delete_value, iter, and for_each_unfiltered. This avoids crashes on rare corruption without widening the public API.
lib/segment/src/segment_constructor/batched_reader.rs (1)
83-91: Make Sequential access actually sequential within a segment

Currently points per segment aren’t ordered by internal_id, so IO may still be scattered. Sort per-segment to maximize locality while preserving buffer positions.
- for (segment_index, points) in self.seg_to_points_buffer.drain() {
+ for (segment_index, mut points) in self.seg_to_points_buffer.drain() {
+ points.sort_unstable_by_key(|(pd, _)| pd.internal_id);
 let source_vector_storage = &self.source_vector_storages[segment_index.get() as usize];
 for (point_data, offset_in_batch) in points {
 let vec = source_vector_storage.get_vector::<Sequential>(point_data.internal_id);
 let vector_deleted =
 source_vector_storage.is_deleted_vector(point_data.internal_id);
 self.buffer[offset_in_batch] = (vec, vector_deleted);
 }
 }
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (1)

76-86: Confirm batch path doesn’t need AccessPattern.

score_stored_batch uses get_dense_batch(..) without an AccessPattern. If batch reads can benefit from Sequential, consider a P: AccessPattern parameter or document that batching already optimizes for locality internally.

lib/segment/tests/integration/sparse_vector_index_search_tests.rs (1)

33-35: Redundant #[cfg(test)] on an integration test import.

This file only builds under tests; the guard is unnecessary and slightly obscures intent. Optional cleanup: drop #[cfg(test)] here.

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

87-87: Make get_dense_batch generic over the access pattern
Currently, get_dense_batch isn’t generic, so you can’t call get_dense_batch::<Random>. Add a <Pattern> type parameter to its signature in vector_storage_base.rs, appendable_dense_vector_storage.rs, and memmap_dense_vector_storage.rs, then update the call at custom_query_scorer.rs:87 to get_dense_batch::<Random>(…).

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (1)

97-97: Optional: replace the per‐index get_multi::<Random>(idx) calls with a single generic get_batch_multi::<Random>(&keys) invocation to leverage the batch API.
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
17-17: Use Sequential here for strictly linear iteration over IDs.

This loop walks 0..total_vector_count(); hinting Sequential lets storages optimize prefetching and amortize per-call overhead.

Apply:
- use crate::vector_storage::{MultiVectorStorage, Random};
+ use crate::vector_storage::{MultiVectorStorage, Sequential};
@@
- .get_multi::<Random>(id as PointOffsetType)
+ .get_multi::<Sequential>(id as PointOffsetType)
Also applies to: 92-94
lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
1054-1054: Deduplicate per-function use Sequential imports (optional).

Move use crate::vector_storage::Sequential; to module scope under the same #[cfg(feature = "rocksdb")] to avoid repetition.

Apply outside-range module import:
+#[cfg(feature = "rocksdb")]
+use crate::vector_storage::Sequential;
Then remove in-function imports:
- use crate::vector_storage::Sequential;
Also applies to: 1143-1143, 1281-1281
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1)
596-601: Use Sequential when only counting multivector lengths.

These loops walk ids monotonically to sum vectors_count(). Hinting Sequential lets MMAP backends leverage readahead with zero downside.

Apply:
- .get_multi::<Random>(id as PointOffsetType)
+ .get_multi::<Sequential>(id as PointOffsetType)
in:

new_multi_f32: lines around 596–603

new_multi_f16: lines around 629–633

new_multi: lines around 656–660

Also applies to: 629-633, 656-660
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
622-624: Prefer Sequential for multivector offsets scan.

Offsets are computed via 0..vectors_count; Sequential improves locality when reading metadata.
- .map(|idx| vector_storage.get_multi::<Random>(idx).vectors_count() as PointOffsetType)
+ .map(|idx| vector_storage.get_multi::<Sequential>(idx).vectors_count() as PointOffsetType)
lib/collection/src/collection/mmr/lazy_matrix.rs (1)
36-36: Avoid truncation: use fallible conversion for index.

Prevent silent usize→u32 truncation in debug runs.
- let stored_vector = storage.get_vector::<Random>(i as u32);
+ let stored_vector =
+ storage.get_vector::<Random>(u32::try_from(i).expect("index overflow"));
lib/segment/src/segment/segment_ops.rs (1)
381-388: Defensive read: prefer get_vector_opt to avoid panic on rare races/corruption.

You bounded-checked, but using _opt(...).expect(...) makes intent explicit and safer against unexpected backends.
- let vector = vector_storage.get_vector::<Random>(point_offset);
+ let vector = vector_storage
+ .get_vector_opt::<Random>(point_offset)
+ .expect("vector missing despite prior bounds check");
 if vector_storage.is_on_disk() {
 hw_counter
 .vector_io_read()
 .incr_delta(vector.estimate_size_in_bytes());
 }
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)
321-325: Use Sequential for linear scans in tests.

These loops read vectors in ascending order; Sequential hints can improve read-ahead on mmap backends.
-use crate::vector_storage::{
- DEFAULT_STOPPED, MultiVectorStorage, Random, VectorStorage, VectorStorageEnum,
-};
+use crate::vector_storage::{
+ DEFAULT_STOPPED, MultiVectorStorage, Random, Sequential, VectorStorage, VectorStorageEnum,
+};
@@
- assert!(
- storage
- .get_vector_opt::<Random>(id as PointOffsetType)
- .is_some()
- );
+ assert!(
+ storage
+ .get_vector_opt::<Sequential>(id as PointOffsetType)
+ .is_some()
+ );
@@
- assert!(
- storage
- .get_vector_opt::<Random>(id as PointOffsetType)
- .is_some()
- );
+ assert!(
+ storage
+ .get_vector_opt::<Sequential>(id as PointOffsetType)
+ .is_some()
+ );
@@
- assert!(
- storage
- .get_vector_opt::<Random>(id as PointOffsetType)
- .is_some()
- );
+ assert!(
+ storage
+ .get_vector_opt::<Sequential>(id as PointOffsetType)
+ .is_some()
+ );
@@
- assert!(
- storage
- .get_vector_opt::<Random>(id as PointOffsetType)
- .is_some()
- );
+ assert!(
+ storage
+ .get_vector_opt::<Sequential>(id as PointOffsetType)
+ .is_some()
+ );
Also applies to: 353-357, 400-404, 421-425
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)
88-92: Micro: add inline and unify expect message.

Tiny perf nit; message parity with get_vector.
- fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] {
+ #[inline(always)]
+ fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] {
 self.vectors
 .get::(key as VectorOffsetType)
- .expect("mmap vector not found")
+ .expect("vector not found")
 }
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)
45-47: Hot path: consider force-inlining.

These getters are called frequently.
- #[inline]
+ #[inline(always)]
 fn get<P: AccessPattern>(&self, key: VectorOffsetType) -> Option<&[T]> {
 self.mmap_storage.get::(key)
 }
96-98: Hot path: consider force-inlining.

Same rationale for batched contiguous reads.
- #[inline]
+ #[inline(always)]
 fn get_many<P: AccessPattern>(&self, key: VectorOffsetType, count: usize) -> Option<&[T]> {
 self.mmap_storage.get_many::(key, count)
 }
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
133-141: Add alignment guard around unsafe transmute.

This function assumes offset is aligned for T. Add a debug assertion to prevent UB if T ever changes (e.g., non-4-byte alignment).
 fn raw_vector_offset<P: AccessPattern>(&self, offset: usize) -> &[T] {
 let mmap = if P::IS_SEQUENTIAL {
 &self.mmap_sequential
 } else {
 &self.mmap
 };
- let byte_slice = &mmap[offset..(offset + self.raw_size())];
- let arr: &[T] = unsafe { transmute(byte_slice) };
+ let byte_slice = &mmap[offset..(offset + self.raw_size())];
+ debug_assert_eq!(offset % std::mem::align_of::<T>(), 0, "unaligned offset for T");
+ let arr: &[T] = unsafe { transmute(byte_slice) };
 &arr[0..self.dim]
 }
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (2)
83-85: get_dense
: P is unused; acceptable but may monomorphize twice.

Optional: delegate to a private non-generic _get_dense to avoid codegen duplication across Random/Sequential.
- fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] {
- self.vectors.get(key as VectorOffsetType)
- }
+ fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] {
+ self._get_dense(key)
+ }
+
+impl<T: PrimitiveVectorElement> VolatileDenseVectorStorage<T> {
+ #[inline]
+ fn _get_dense(&self, key: PointOffsetType) -> &[T] {
+ self.vectors.get(key as VectorOffsetType)
+ }
+}
105-115: get_vector
/get_vector_opt
: same P-unaware semantics; fine.

Note: identical codegen for Random/Sequential; consider the same private-delegate pattern if code size matters.
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (2)
236-239: Clarify that P is intentionally ignored for in-RAM reads.

Add a short doc to avoid confusion for future readers.
 impl<T: PrimitiveVectorElement> DenseVectorStorage<T> for SimpleDenseVectorStorage<T> {
 fn vector_dim(&self) -> usize {
 self.dim
 }
 
- fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] {
+ /// In-memory storage; AccessPattern `P` does not change behavior.
+ fn get_dense<P: AccessPattern>(&self, key: PointOffsetType) -> &[T] {
 self.vectors.get(key as VectorOffsetType)
 }
 }
420-422: Add a Random-path assertion for parity.

Since RAM storage ignores access pattern, validate both markers.
- assert_eq!(
- new_storage.get_vector::<Sequential>(internal_id),
- CowVector::from(point),
- );
+ let got_seq = new_storage.get_vector::<Sequential>(internal_id);
+ assert_eq!(got_seq, CowVector::from(point));
+ let got_rand = new_storage.get_vector::<crate::vector_storage::Random>(internal_id);
+ assert_eq!(got_rand, got_seq);
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (1)
189-204: No sequential optimizations in RAM: consider an explicit note.
- ) -> Option<TypedMultiDenseVectorRef<'_, T>> {
- // No sequential optimizations available for in memory storage.
+ ) -> Option<TypedMultiDenseVectorRef<'_, T>> {
+ // In-memory storage; AccessPattern `P` does not affect behavior.
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (2)
343-358: Clarify AccessPattern irrelevance for RAM.
- ) -> Option<TypedMultiDenseVectorRef<'_, T>> {
- // No sequential optimizations available for in memory storage.
+ ) -> Option<TypedMultiDenseVectorRef<'_, T>> {
+ // In-memory storage; AccessPattern `P` does not affect behavior.
572-574: Consider adding Random-path parity assertion.
- assert_eq!(
- new_storage.get_vector::<Sequential>(internal_id),
- CowVector::from(&multivec),
- );
+ let got_seq = new_storage.get_vector::<Sequential>(internal_id);
+ assert_eq!(got_seq, CowVector::from(&multivec));
+ let got_rand = new_storage.get_vector::<crate::vector_storage::Random>(internal_id);
+ assert_eq!(got_rand, got_seq);
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)
150-159: Doc nit: method isn’t “already in memory.”

This storage uses RocksDB; wording can confuse. Suggest neutral wording.
- fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> {
- // Already in memory, so no sequential optimizations available.
+ fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> {
+ // No access-pattern optimizations available for this backend.
357-359: Optional: assert Random parity for non-deleted cases.
- assert_eq!(
- new_storage.get_vector::<Sequential>(internal_id),
- CowVector::from(vector),
- );
+ let got_seq = new_storage.get_vector::<Sequential>(internal_id);
+ assert_eq!(got_seq, CowVector::from(vector));
+ let got_rand = new_storage.get_vector::<crate::vector_storage::Random>(internal_id);
+ assert_eq!(got_rand, got_seq);
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
415-416: Add Sequential-path test to cover both branches.
- let result_vector = storage.get_vector::<Random>(0);
+ let result_vector = storage.get_vector::<Random>(0);
+ let result_vector_seq = storage.get_vector::<crate::vector_storage::Sequential>(0);
+ assert_eq!(result_vector_seq, result_vector);
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
134-148: Add a debug assertion to catch offset corruption early.

While keys iterate over 0..len(), a lightweight bound check helps catch rare offset/count corruption before indexing vectors.
 let mmap_offset = self
 .offsets
 .get::<Sequential>(key as VectorOffsetType)
 .unwrap()
 .first()
 .unwrap();
+ debug_assert!(
+ (mmap_offset.offset as usize + mmap_offset.count as usize) <= self.vectors.len(),
+ "multivector offset range is out of bounds"
+ );
 (0..mmap_offset.count).map(|i| {
 self.vectors
 .get::<Sequential>((mmap_offset.offset + i) as VectorOffsetType)
 .unwrap()
 })
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)
194-201: Avoid unnecessary read in delete_vector().

update_stored() ignores the provided vector when deleted == true (elements_added guarded by !deleted). Skip the extra lookup/clone to reduce overhead.
- if is_deleted {
- let old_vector = self.get_sparse_opt::<Random>(key).ok().flatten();
- self.update_stored(key, true, old_vector.as_ref());
- }
+ if is_deleted {
+ self.update_stored(key, true, None);
+ }
lib/segment/src/vector_storage/vector_storage_base.rs (3)
40-56: Seal AccessPattern to prevent external impls.

Limiting implementers to Random/Sequential avoids misuse and future semver hazards while keeping call sites generic.
-pub trait AccessPattern: Copy {
+mod __sealed {
+ pub trait Sealed {}
+}
+
+pub trait AccessPattern: __sealed::Sealed + Copy {
 const IS_SEQUENTIAL: bool;
 }
 
 #[derive(Copy, Clone)]
 pub struct Random;
 
 #[derive(Copy, Clone)]
 pub struct Sequential;
 
-impl AccessPattern for Random {
+impl __sealed::Sealed for Random {}
+impl __sealed::Sealed for Sequential {}
+
+impl AccessPattern for Random {
 const IS_SEQUENTIAL: bool = false;
 }
 
-impl AccessPattern for Sequential {
+impl AccessPattern for Sequential {
 const IS_SEQUENTIAL: bool = true;
 }
158-164: Prefer Sequential in get_dense_batch() for consecutive keys.

The doc says keys are consecutive; sequential hint should yield better OS readahead on mmap-backed stores.
- let iter = keys.iter().map(|key| self.get_dense::<Random>(*key));
+ let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));
191-195: Prefer Sequential in get_batch_multi() for consecutive keys.

Same rationale as dense batch.
- let iter = keys.iter().map(|key| self.get_multi::<Random>(*key));
+ let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d78290b and 9f7ccb5.

📒 Files selected for processing (49)

lib/collection/src/collection/mmr/lazy_matrix.rs (2 hunks)
lib/gridstore/benches/bustle_bench/payload_storage.rs (1 hunks)
lib/gridstore/benches/random_data_bench.rs (1 hunks)
lib/gridstore/benches/real_data_bench.rs (1 hunks)
lib/gridstore/src/gridstore.rs (16 hunks)
lib/segment/benches/sparse_vector_storage.rs (3 hunks)
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (7 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/mod.rs (2 hunks)
lib/segment/src/index/hnsw_index/graph_layers_healer.rs (2 hunks)
lib/segment/src/index/hnsw_index/point_scorer.rs (2 hunks)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
lib/segment/src/payload_storage/mmap_payload_storage.rs (5 hunks)
lib/segment/src/segment/segment_ops.rs (3 hunks)
lib/segment/src/segment_constructor/batched_reader.rs (2 hunks)
lib/segment/src/segment_constructor/segment_constructor_base.rs (6 hunks)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3 hunks)
lib/segment/src/vector_storage/chunked_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (11 hunks)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (4 hunks)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (3 hunks)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1 hunks)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2 hunks)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (4 hunks)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/tests/async_raw_scorer.rs (2 hunks)
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/vector_storage_base.rs (7 hunks)
lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2 hunks)

🧰 Additional context used

🧠 Learnings (20)

📓 Common learnings

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/index/hnsw_index/graph_layers_healer.rs
lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/segment/benches/sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/vector_storage/tests/async_raw_scorer.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/segment/segment_ops.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/segment/segment_ops.rs

📚 Learning: 2025-08-06T09:56:59.311Z

Learnt from: xzfc
PR: qdrant/qdrant#6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/index/hnsw_index/graph_layers_healer.rs

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/segment/benches/sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/vector_storage/tests/async_raw_scorer.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/segment/segment_ops.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/index/hnsw_index/graph_layers_healer.rs
lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/point_scorer.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs

📚 Learning: 2025-08-18T10:56:43.707Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs

📚 Learning: 2025-05-15T22:54:30.292Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/segment/benches/sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/segment/segment_ops.rs

📚 Learning: 2025-05-15T22:54:30.292Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This was part of an API redesign in rand 0.9.0 released in February 2024.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/segment/segment_ops.rs

📚 Learning: 2025-05-15T22:54:30.292Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/segment/segment_ops.rs

📚 Learning: 2025-08-21T13:43:52.191Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs

📚 Learning: 2025-08-21T13:46:10.866Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/segment/segment_ops.rs

📚 Learning: 2025-08-14T11:31:21.777Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs

📚 Learning: 2025-08-11T12:30:47.220Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7009
File: lib/quantization/src/encoded_vectors_binary.rs:614-618
Timestamp: 2025-08-11T12:30:47.220Z
Learning: In `lib/quantization/src/encoded_vectors_binary.rs`, the `encode_two_bits_value` function has two paths: when `vector_stats` is available, it uses mean/stddev for accurate z-score based quantization boundaries; when stats are None (for appendable BQ), it falls back to a simple sign-based heuristic (positive -> (true, true), non-positive -> (false, false)). Having stats available provides higher accuracy for storage encoding.

Applied to files:

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

📚 Learning: 2025-07-02T17:10:13.847Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.

Applied to files:

lib/gridstore/benches/real_data_bench.rs
lib/segment/src/index/hnsw_index/graph_layers_healer.rs
lib/gridstore/benches/random_data_bench.rs

📚 Learning: 2025-05-30T15:26:14.488Z

Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

lib/segment/src/index/hnsw_index/graph_layers_healer.rs

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/segment/src/index/sparse_index/sparse_vector_index.rs

📚 Learning: 2025-04-22T23:19:51.232Z

Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.

Applied to files:

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

📚 Learning: 2025-05-19T14:40:20.068Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.

Applied to files:

lib/segment/src/vector_storage/chunked_vector_storage.rs

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs

lib/segment/src/index/hnsw_index/gpu/mod.rs

generall · 2025-08-28T14:51:32Z

Is there any advantage of this, apart from making access pattern explicit?

generall · 2025-08-28T14:52:15Z

ok, I presume less code as well

xzfc · 2025-08-28T15:07:16Z

Is there any advantage of this, apart from making access pattern explicit?
ok, I presume less code as well

Less code; and it's more convenient to base new methods¹ on this to get both options at once.

My next PR would include new method VectorStorage::get_vector_bytes_opt<P: AccessPattern>. ↩

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
533-539: Sequential read is good for full scan; also avoid usize→PointOffsetType overflow.

Iterate over 0..count and convert each index with try_into() to prevent truncation.
- let vectors = (0..count as PointOffsetType).map(|i| {
+ let vectors = (0..count).map(|i| {
+ let i: PointOffsetType = i.try_into().expect("vector_count exceeds PointOffsetType");
 PrimitiveVectorElement::quantization_preprocess(
 quantization_config,
 distance,
- vector_storage.get_dense::<Sequential>(i),
+ vector_storage.get_dense::<Sequential>(i),
 )
 });
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)
64-80: Bounds check uses vectors.len() instead of offsets.len().

set_deleted should validate keys against the number of multivectors (offsets), not the number of flattened inner vectors. Current check can incorrectly allow clearing deletion on a non-existent key.

Fix:
- if !deleted && self.vectors.len() <= key as usize {
+ if !deleted && self.offsets.len() <= key as usize {
 return false;
 }

🧹 Nitpick comments (9)

lib/collection/src/collection/mmr/lazy_matrix.rs (1)

36-36: Avoid lossy cast in debug assertion index.

Use a checked conversion to PointOffsetType to prevent silent truncation on large collections.

-                let stored_vector = storage.get_vector::<Random>(i as u32);
+                let idx: PointOffsetType = i.try_into().expect("index overflow");
+                let stored_vector = storage.get_vector::<Random>(idx);

lib/segment/src/vector_storage/chunked_vector_storage.rs (1)

20-22: Remove unnecessary Self: Sized bound for consistency.

The trait isn’t object-safe anyway due to generics; keep both methods consistent.
- fn get<P: AccessPattern>(&self, key: VectorOffsetType) -> Option<&[T]>
- where
- Self: Sized;
+ fn get<P: AccessPattern>(&self, key: VectorOffsetType) -> Option<&[T]>;

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)

128-133: Prefer explicit Cow::from over into().

Align with repo guideline to avoid x.into() in new code.

-            .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into())))
+            .map(|slice| CowVector::from(T::slice_to_float_cow(Cow::from(slice))))

135-139: Prefer explicit Cow::from over into().

-            .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into())))
+            .map(|slice| CowVector::from(T::slice_to_float_cow(Cow::from(slice))))

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)

182-189: Use Cow::from instead of into() chains.

Keeps conversions explicit per style guide.

-            .get_vector_opt::<P>(key)
-            .map(|vector| T::slice_to_float_cow(vector.into()).into())
+            .get_vector_opt::<P>(key)
+            .map(|vector| CowVector::from(T::slice_to_float_cow(Cow::from(vector))))

191-197: Same here: make conversions explicit.

-            .get_vector_opt::<P>(key)
-            .map(|vector| T::slice_to_float_cow(vector.into()).into())
+            .get_vector_opt::<P>(key)
+            .map(|vector| CowVector::from(T::slice_to_float_cow(Cow::from(vector))))

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)

263-268: Prefer explicit Cow::from over into().

-        self.vectors
-            .get_opt(key as VectorOffsetType)
-            .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into())))
+        self.vectors
+            .get_opt(key as VectorOffsetType)
+            .map(|slice| CowVector::from(T::slice_to_float_cow(Cow::from(slice))))

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (1)

110-115: Prefer explicit Cow::from over into().

-        self.vectors
-            .get_opt(key as VectorOffsetType)
-            .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into())))
+        self.vectors
+            .get_opt(key as VectorOffsetType)
+            .map(|slice| CowVector::from(T::slice_to_float_cow(Cow::from(slice))))

lib/segment/src/vector_storage/vector_storage_base.rs (1)

40-56: Seal AccessPattern to prevent unsound external impls.

Avoid third-party implementations that could misdeclare IS_SEQUENTIAL. Seal the trait:

+mod private {
+    pub trait Sealed {}
+}
-pub trait AccessPattern: Copy {
+pub trait AccessPattern: Copy + private::Sealed {
     const IS_SEQUENTIAL: bool;
 }
 
 #[derive(Copy, Clone)]
 pub struct Random;
 
 #[derive(Copy, Clone)]
 pub struct Sequential;
 
 impl AccessPattern for Random {
     const IS_SEQUENTIAL: bool = false;
 }
 
 impl AccessPattern for Sequential {
     const IS_SEQUENTIAL: bool = true;
 }
+
+impl private::Sealed for Random {}
+impl private::Sealed for Sequential {}

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9f7ccb5 and a7a23cd.

📒 Files selected for processing (49)

lib/collection/src/collection/mmr/lazy_matrix.rs (2 hunks)
lib/gridstore/benches/bustle_bench/payload_storage.rs (1 hunks)
lib/gridstore/benches/random_data_bench.rs (1 hunks)
lib/gridstore/benches/real_data_bench.rs (1 hunks)
lib/gridstore/src/gridstore.rs (16 hunks)
lib/segment/benches/sparse_vector_storage.rs (3 hunks)
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (7 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/mod.rs (2 hunks)
lib/segment/src/index/hnsw_index/graph_layers_healer.rs (2 hunks)
lib/segment/src/index/hnsw_index/point_scorer.rs (2 hunks)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
lib/segment/src/payload_storage/mmap_payload_storage.rs (5 hunks)
lib/segment/src/segment/segment_ops.rs (3 hunks)
lib/segment/src/segment_constructor/batched_reader.rs (2 hunks)
lib/segment/src/segment_constructor/segment_constructor_base.rs (6 hunks)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3 hunks)
lib/segment/src/vector_storage/chunked_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (11 hunks)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (4 hunks)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (3 hunks)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1 hunks)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2 hunks)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (4 hunks)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/tests/async_raw_scorer.rs (2 hunks)
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/vector_storage_base.rs (7 hunks)
lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (34)

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
lib/segment/benches/sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/point_scorer.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/vector_storage/tests/async_raw_scorer.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_layers_healer.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/gridstore/benches/bustle_bench/payload_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/segment/segment_ops.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

lib/gridstore/benches/real_data_bench.rs
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs
lib/gridstore/benches/random_data_bench.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

lib/gridstore/benches/real_data_bench.rs
lib/gridstore/benches/random_data_bench.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

🧠 Learnings (13)

📓 Common learnings

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

📚 Learning: 2025-07-02T17:10:13.847Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.

Applied to files:

lib/gridstore/benches/real_data_bench.rs
lib/gridstore/benches/random_data_bench.rs

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs

📚 Learning: 2025-08-18T10:56:43.707Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs

📚 Learning: 2025-08-21T13:46:10.866Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-21T13:43:52.191Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

📚 Learning: 2025-08-14T11:31:21.777Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs

📚 Learning: 2025-05-19T14:40:20.068Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.

Applied to files:

lib/segment/src/vector_storage/chunked_vector_storage.rs

📚 Learning: 2025-04-22T23:19:51.232Z

Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.

Applied to files:

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

🧬 Code graph analysis (11)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (8)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (537-537)

vector_storage (623-623)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (1)

vector_storage (61-63)

lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (3)

vector_storage (55-57)

vector_storage (73-75)

vector_storage (82-84)

lib/collection/src/collection/mmr/lazy_matrix.rs (5)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

vector_storage (598-599)

vector_storage (630-631)

vector_storage (657-658)

lib/segment/src/index/hnsw_index/graph_layers_healer.rs (1)

vector_storage (220-221)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/segment/segment_ops.rs (2)

vector_storage (381-381)

vector_storage (609-609)

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (11)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/segment/segment_ops.rs (2)

vector_storage (381-381)

vector_storage (609-609)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (537-537)

vector_storage (623-623)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (1)

vector_storage (61-63)

lib/segment/src/vector_storage/chunked_vector_storage.rs (2)

get (20-22)

get_many (52-52)

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

get (45-47)

get_many (96-98)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

get_dense (88-92)

get_vector (128-133)

get_vector_opt (135-139)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)

get_dense (236-238)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_dense (152-152)

get_dense (162-162)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (3)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

get_dense (88-92)

get_vector (128-133)

get_vector_opt (135-139)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)

get_dense (236-238)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

get_dense (83-85)

get_vector (105-107)

get_vector_opt (106-106)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_dense (152-152)

get_dense (162-162)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (4)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)

get_sparse (150-159)

get_sparse_opt (161-180)

get_sparse_opt (210-210)

get_sparse_opt (260-260)

get_vector (200-204)

get_vector_opt (202-202)

get_vector_opt (209-214)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)

get_sparse (104-109)

get_sparse_opt (105-106)

get_sparse_opt (111-118)

get_sparse_opt (148-148)

get_sparse_opt (197-197)

get_vector (138-141)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_sparse (172-172)

get_sparse_opt (173-176)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/payload_storage/mmap_payload_storage.rs (2)

storage (168-168)

storage (179-179)

lib/segment/src/vector_storage/chunked_vector_storage.rs (3)

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (4)

get (271-273)

get (362-364)

get_many (277-300)

get_many (412-414)

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

get (45-47)

get_many (96-98)

lib/segment/src/vector_storage/chunked_vectors.rs (2)

get (44-46)

get_many (61-72)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (7)

get_multi (338-340)

get_multi_opt (339-339)

get_multi_opt (343-358)

get_multi_opt (404-404)

get_vector (399-401)

get_vector_opt (400-400)

get_vector_opt (403-409)

lib/segment/src/vector_storage/vector_storage_base.rs (7)

get_multi (181-181)

get_multi (193-193)

get_multi_opt (182-185)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)

get_vector (128-133)

get_vector_opt (135-139)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

get_dense (88-92)

get_vector (128-133)

get_vector_opt (135-139)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

get_dense (147-153)

get_vector (182-189)

get_vector_opt (191-197)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (4)

get_dense (83-85)

get_vector (105-107)

get_vector_opt (106-106)

get_vector_opt (110-115)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_dense (152-152)

get_dense (162-162)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/vector_storage_base.rs (10)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (7)

get_vector (128-133)

get_vector_opt (135-139)

get_dense (88-92)

vectors (89-90)

vectors (129-130)

vectors (136-137)

vector_dim (84-86)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (4)

get_vector (182-189)

get_vector_opt (191-197)

get_dense (147-153)

vector_dim (143-145)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

get_dense (236-238)

vector_dim (232-234)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (5)

get_vector (105-107)

get_vector_opt (106-106)

get_vector_opt (110-115)

get_dense (83-85)

vector_dim (79-81)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (10)

get_vector (187-189)

get_vector_opt (188-188)

get_vector_opt (191-197)

vectors (123-123)

vectors (143-144)

vector_dim (105-107)

get_multi (110-112)

get_multi_opt (111-111)

get_multi_opt (115-132)

get_multi_opt (192-192)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (8)

get_vector (399-401)

get_vector_opt (400-400)

get_vector_opt (403-409)

vector_dim (333-335)

get_multi (338-340)

get_multi_opt (339-339)

get_multi_opt (343-358)

get_multi_opt (404-404)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)

get_vector (216-219)

get_vector_opt (217-217)

get_vector_opt (224-229)

get_sparse (180-183)

get_sparse_opt (181-181)

get_sparse_opt (185-196)

get_sparse_opt (225-225)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)

get_vector (200-204)

get_vector_opt (202-202)

get_vector_opt (209-214)

get_sparse (150-159)

get_sparse_opt (161-180)

get_sparse_opt (210-210)

get_sparse_opt (260-260)

lib/common/common/src/maybe_uninit.rs (1)

maybe_uninit_fill_from (10-33)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (4)

lib/common/common/src/maybe_uninit.rs (2)

transmute (30-30)

maybe_uninit_fill_from (10-33)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (5)

get_vector (128-133)

get_vector_opt (135-139)

vectors (89-90)

vectors (129-130)

vectors (136-137)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)

get_vector (182-189)

get_vector_opt (191-197)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: storage-compat-test
GitHub Check: test-consensus-compose
GitHub Check: test-consistency
GitHub Check: integration-tests
GitHub Check: integration-tests-consensus
GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)

🔇 Additional comments (25)

lib/collection/src/collection/mmr/lazy_matrix.rs (1)

5-5: Importing Random for debug-only check is correct.

Matches the new generic API and is scoped under debug assertions.

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (2)

15-15: Import of Random aligns this scorer with the new access-pattern API.

87-90: I’ve pulled the relevant snippet around the batch call; analyzing now.

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

41-43: Remove unused VectorStorage import
Drop VectorStorage from the use statement in lib/segment/src/vector_storage/quantized/quantized_vectors.rs (line 42).

lib/segment/src/vector_storage/chunked_vector_storage.rs (2)

8-8: AccessPattern import added — OK.

52-53: Generic get_many signature looks good and matches the new pattern.

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3)

21-21: AccessPattern import — OK.

362-364: Trait get correctly delegates to the inherent method with P::IS_SEQUENTIAL.

Prevents recursion and preserves existing logic.

412-414: Trait get_many delegation is correct and mirrors prior behavior.

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (2)

26-26: Good import consolidation for AccessPattern types.

145-153: API unification via P: AccessPattern looks correct.

The generic get_vector/get_vector_opt and the batched path correctly switch between Sequential/Random.

Also applies to: 163-168

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)

88-91: Dense getter genericization looks good.

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)

147-153: Good: get_dense now respects AccessPattern.

369-371: Tests updated to generic API look correct.

Also applies to: 377-378, 396-399, 457-458, 576-579, 646-649, 720-723

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

236-238: Generic get_dense
for in-RAM storage: OK.

258-260: OK to delegate to get_vector_opt::
.

349-349: Test adjustments to Sequential look good.

Also applies to: 420-422

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (2)

83-85: Generic get_dense
matches in-memory semantics.

105-107: OK to delegate to get_vector_opt::
.
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (4)
179-183: AccessPattern generalization: read-path selection looks correct.

Const-generic dispatch to Gridstore’s get_value::<true/false> based on P::IS_SEQUENTIAL is clean, and transpose() preserves error semantics for try_from. No issues.

Also applies to: 185-196

216-219: Vector accessors: API unification LGTM.

get_vector:: delegates to get_vector_opt:: with the same not-found semantics as before. Good.

Also applies to: 224-229

22-22: Imports and test updates consistent with new API.

Importing AccessPattern/Random and migrating tests to get_vector::<Random> is correct.

Also applies to: 347-348, 415-416

71-76: Add +1 to deleted.get_bitslice().last_one() before max()
storage.max_point_id() returns the next point offset (pointer_count), so only the last_one() index needs a +1:
- let next_point_offset = deleted
- .get_bitslice()
- .last_one()
- .max(Some(storage.max_point_id() as usize))
- .unwrap_or_default();
+ let next_point_offset = deleted
+ .get_bitslice()
+ .last_one()
+ .map(|i| i + 1)
+ .max(Some(storage.max_point_id() as usize))
+ .unwrap_or_default();
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)

22-24: AccessPattern adoption across multi-dense storage looks good.

Generic get_multi/get_multi_opt/get_vector/get_vector_opt are consistent with the trait changes.

Using Sequential in iterate_inner_vectors is the right call for scan-heavy paths.

Reading existing offset with get::<Random> in insert_vector is appropriate.

Also applies to: 109-112, 115-132, 138-145, 187-189, 191-197, 218-223

lib/segment/src/vector_storage/vector_storage_base.rs (1)

830-873: Enum dispatch updated comprehensively.

All variants forward get_vector/get_vector_opt with ; exhaustive and consistent with feature gates.

Also applies to: 876-919

coderabbitai · 2025-08-29T10:05:35Z

lib/gridstore/benches/real_data_bench.rs

        b.iter(|| {
            for i in 0..storage.max_point_id() {
-                let res = storage.get_value(i, &hw_counter).unwrap();
+                let res = storage.get_value::<false>(i, &hw_counter).unwrap();


💡 Verification agent

🧩 Analysis chain

“scan storage” should use the sequential read path

This is a linear scan; prefer the sequential specialization.

- let res = storage.get_value::<false>(i, &hw_counter).unwrap(); + let res = storage.get_value::<true>(i, &hw_counter).unwrap();

To verify other benches aren’t accidentally using the random path for linear scans, run:

Expected: linear scans use ::<true>, random-index benches (if any) use ::<false>.

🏁 Script executed:

#!/bin/bash rg -nC2 --type=rust 'get_value::<' lib/gridstore/benches | sed -n '1,120p'

Length of output: 1445

Enable sequential read path in real_data_bench
Change the call in lib/gridstore/benches/real_data_bench.rs line 91 from

let res = storage.get_value::<false>(i, &hw_counter).unwrap();

to

-let res = storage.get_value::<false>(i, &hw_counter).unwrap(); +let res = storage.get_value::<true>(i, &hw_counter).unwrap();

[random_data_bench remains correct with ::<false>, and bustle_bench’s use of ::<false> in its own context can be left as is]

🤖 Prompt for AI Agents

In lib/gridstore/benches/real_data_bench.rs around line 91, the call uses the random-access specialization get_value::<false> which disables the sequential read path; change the type parameter to true so it reads sequentially by replacing get_value::<false> with get_value::<true> at that location.

coderabbitai · 2025-08-29T10:05:35Z

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

+    fn raw_vector_offset<P: AccessPattern>(&self, offset: usize) -> &[T] {
+        let mmap = if P::IS_SEQUENTIAL {
+            &self.mmap_sequential
+        } else {
+            &self.mmap
+        };
+        let byte_slice = &mmap[offset..(offset + self.raw_size())];
        let arr: &[T] = unsafe { transmute(byte_slice) };
        &arr[0..self.dim]
    }


🛠️ Refactor suggestion

Replace unsafe transmute with bytemuck cast_slice.

Avoid raw transmute on &[u8]→&[T]; use bytemuck for soundness and readability.

- let byte_slice = &mmap[offset..(offset + self.raw_size())]; - let arr: &[T] = unsafe { transmute(byte_slice) }; - &arr[0..self.dim] + let byte_slice = &mmap[offset..offset + self.raw_size()]; + let arr: &[T] = bytemuck::cast_slice(byte_slice); + arr

Add import (outside selection):

use bytemuck::cast_slice;

🤖 Prompt for AI Agents

In lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs around lines 133 to 142, replace the unsafe transmute from &[u8] to &[T] with bytemuck::cast_slice for soundness and readability: add the import use bytemuck::cast_slice at the top of the file, then change the conversion to let arr: &[T] = cast_slice(byte_slice); and continue returning &arr[0..self.dim]; ensure T implements the required bytemuck traits (e.g., Pod) where needed.

coderabbitai · 2025-08-29T10:05:35Z

lib/segment/src/vector_storage/quantized/quantized_vectors.rs

        let offsets = (0..vectors_count as PointOffsetType)
-            .map(|idx| vector_storage.get_multi(idx).vectors_count() as PointOffsetType)
+            .map(|idx| vector_storage.get_multi::<Random>(idx).vectors_count() as PointOffsetType)
            .scan(0, |offset_acc, multi_vector_len| {


🛠️ Refactor suggestion

Prefer Sequential for offset scan and avoid lossy cast.

This loop walks all ids; Sequential improves locality. Also replace as PointOffsetType with try_into().

- let offsets = (0..vectors_count as PointOffsetType) - .map(|idx| vector_storage.get_multi::<Random>(idx).vectors_count() as PointOffsetType) + let offsets = (0..vectors_count) + .map(|i| { + let idx: PointOffsetType = i.try_into().expect("vector_count exceeds PointOffsetType"); + vector_storage + .get_multi::<Sequential>(idx) + .vectors_count() + .try_into() + .expect("inner vectors_count exceeds PointOffsetType") + })

🤖 Prompt for AI Agents

In lib/segment/src/vector_storage/quantized/quantized_vectors.rs around lines 622 to 624, the offset accumulation should use Sequential access for better locality and avoid a lossy cast; change get_multi::<Random> to get_multi::<Sequential> and replace the `as PointOffsetType` cast with a fallible conversion using try_into(), handling the Result (e.g., propagate or call expect with a clear message) so you don't silently truncate values.

coderabbitai · 2025-08-29T10:05:35Z

lib/segment/src/vector_storage/vector_storage_base.rs

+        let iter = keys.iter().map(|key| self.get_dense::<Random>(*key));
+        maybe_uninit_fill_from(vectors, iter).0
    }


🛠️ Refactor suggestion

Batch dense reads should prefer Sequential.

Keys are documented as consecutive; prefer Sequential to hint optimal prefetch/IO.

- let iter = keys.iter().map(|key| self.get_dense::<Random>(*key)); + let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let iter = keys.iter().map(|key| self.get_dense::<Random>(*key));

maybe_uninit_fill_from(vectors, iter).0

}

let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));

maybe_uninit_fill_from(vectors, iter).0

}

🤖 Prompt for AI Agents

In lib/segment/src/vector_storage/vector_storage_base.rs around lines 162 to 164, the batch dense read currently calls self.get_dense::<Random>(*key) for each key even though keys are documented as consecutive; change the generic type parameter from Random to Sequential (i.e., call self.get_dense::<Sequential>(*key)) so the underlying implementation can prefer sequential prefetch/IO for better performance when reading consecutive keys.

coderabbitai · 2025-08-29T10:05:35Z

lib/segment/src/vector_storage/vector_storage_base.rs

+        let iter = keys.iter().map(|key| self.get_multi::<Random>(*key));
+        maybe_uninit_fill_from(vectors, iter).0
    }


🛠️ Refactor suggestion

Batch multi-dense reads should prefer Sequential.

Same rationale as dense batch.

- let iter = keys.iter().map(|key| self.get_multi::<Random>(*key)); + let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

let iter = keys.iter().map(|key| self.get_multi::<Random>(*key));

maybe_uninit_fill_from(vectors, iter).0

}

let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));

maybe_uninit_fill_from(vectors, iter).0

}

🤖 Prompt for AI Agents

In lib/segment/src/vector_storage/vector_storage_base.rs around lines 193 to 195, the batch multi-dense read is calling self.get_multi::<Random> but should use the Sequential access pattern like other dense batch reads; change the generic parameter from Random to Sequential for get_multi so the iterator prefers sequential access, leaving the surrounding maybe_uninit_fill_from call unchanged.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

lib/gridstore/src/gridstore.rs (1)

244-263: Avoid panic on corrupt data: handle LZ4 decompression errors instead of unwrap()

If on-disk bytes are corrupted or truncated, decompress_size_prepended can fail. Currently this path panics via decompress_lz4(...).unwrap(), bringing the whole process down on a single bad entry. Return None instead to keep semantics consistent with “missing/unreadable value” and avoid a crash.

Apply this diff within get_value:
 let raw = self.read_from_pages::<READ_SEQUENTIAL>(page_id, block_offset, length);

 hw_counter.payload_io_read_counter().incr_delta(raw.len());

- let decompressed = self.decompress(raw);
- let value = V::from_bytes(&decompressed);
-
- Some(value)
+ let decompressed = match self.config.compression {
+ Compression::None => raw,
+ Compression::LZ4 => match lz4_flex::decompress_size_prepended(&raw) {
+ Ok(v) => v,
+ Err(_) => return None,
+ },
+ };
+ Some(V::from_bytes(&decompressed))
Note: This confines the change to the new API surface without refactoring the existing decompress() helper globally. If desired, we can later introduce a fallible decompress_result() and migrate all internal call sites.

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)

149-159: Bug: get_sparse ignores deleted flag; delegate through get_sparse_opt for consistency.

As implemented, get_sparse returns vectors even if the record is marked deleted, unlike other sparse storages (mmap/volatile) that route via get_sparse_opt and error on None. Unify semantics and avoid duplication.

Apply:

-    fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> {
-        // Already in memory, so no sequential optimizations available.
-        let bin_key = bincode::serialize(&key)
-            .map_err(|_| OperationError::service_error("Cannot serialize sparse vector key"))?;
-        let data = self.db_wrapper.get(bin_key)?;
-        let record: StoredSparseVector = bincode::deserialize(&data).map_err(|_| {
-            OperationError::service_error("Cannot deserialize sparse vector from db")
-        })?;
-        Ok(record.vector)
-    }
+    fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> {
+        self.get_sparse_opt::<P>(key)?
+            .ok_or_else(|| OperationError::service_error(format!("Key {key} not found")))
+    }

260-266: Don't swallow IO/deserialize errors during delete; propagate instead.

ok().flatten() hides failures, breaking size accounting on error. Return the error and keep totals correct.

Apply:

-            let old_vector = self.get_sparse_opt::<Random>(key).ok().flatten();
+            let old_vector = self.get_sparse_opt::<Random>(key)?;
             self.update_stored(
                 key,
                 true,
-                old_vector.as_ref(),
+                old_vector.as_ref(),
                 &HardwareCounterCell::disposable(), // We don't measure deletions
             )?;

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)

119-132: Avoid panic in get_multi_opt; eliminate expect and return None on malformed/empty offset.

This path returns Option; panicking on empty offset defeats that contract.

Apply:

-        self.offsets
-            .get::<P>(key as VectorOffsetType)
-            .and_then(|mmap_offset| {
-                let mmap_offset = mmap_offset.first().expect("mmap_offset must not be empty");
-                self.vectors.get_many::<P>(
-                    mmap_offset.offset as VectorOffsetType,
-                    mmap_offset.count as usize,
-                )
-            })
+        self.offsets
+            .get::<P>(key as VectorOffsetType)
+            .and_then(|arr| {
+                arr.first().and_then(|o| {
+                    self.vectors
+                        .get_many::<P>(o.offset as VectorOffsetType, o.count as usize)
+                })
+            })
             .map(|flattened_vectors| TypedMultiDenseVectorRef {
                 flattened_vectors,
                 dim: self.vectors.dim(),
             })

♻️ Duplicate comments (3)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)
96-103: Guard u32 casts and fix accumulator literal type

Keep the previous overflow concern addressed and ensure the accumulator is usize. Add a debug check (or a hard check) before casting.
- .scan(0, |acc, count| {
+ .scan(0usize, |acc, count| {
 let start = *acc;
 *acc += count;
+ debug_assert!(start <= u32::MAX as usize && count <= u32::MAX as usize);
 Some(GpuMultivectorOffset {
 start: start as u32,
 count: count as u32,
 })
 }),
If you prefer a hard check instead of debug_assert!, refactor to perform a bounded cast with u32::try_from(...) and return an OperationError on overflow.
lib/segment/src/vector_storage/vector_storage_base.rs (2)
162-164: Batch dense reads should prefer Sequential.

Apply:
- let iter = keys.iter().map(|key| self.get_dense::<Random>(*key));
+ let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));
193-195: Batch multi-dense reads should prefer Sequential.

Apply:
- let iter = keys.iter().map(|key| self.get_multi::<Random>(*key));
+ let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));

🧹 Nitpick comments (8)

lib/gridstore/src/gridstore.rs (1)

629-629: Minor ergonomics: consider small helpers to reduce turbofish noise in tests

Optional: add tiny wrappers in the test module for readability, e.g., fn get(storage: &Gridstore<_>, i, hw) -> _ { storage.get_value::<false>(i, hw) }.

Also applies to: 648-648, 676-676, 736-736, 745-745, 769-769, 779-779, 808-808, 926-926, 947-947, 982-982, 1000-1000, 1026-1026, 1038-1038, 1095-1100, 1150-1150

lib/gridstore/benches/random_data_bench.rs (1)

27-31: Harden the benchmark against optimizer effects

Wrap loop vars/results in black_box to avoid potential elision and to better isolate read path costs.

Apply this diff:

-        b.iter(|| {
+        b.iter(|| {
             for i in 0..PAYLOAD_COUNT {
-                let res = storage.get_value::<false>(i, &hw_counter);
+                let i = criterion::black_box(i);
+                let res = storage.get_value::<false>(i, &hw_counter);
+                criterion::black_box(&res);
                 assert!(res.is_some());
             }
         });

Also add the import:

use criterion::{black_box, BatchSize, Criterion, criterion_group, criterion_main};

lib/segment/src/segment_constructor/segment_constructor_base.rs (1)

1054-1055: Sequential access for migration loops — LGTM; consider hoisting import

Using get_vector::<Sequential> in linear migrations is correct. To avoid repeated local imports, consider importing Sequential at module scope.

Possible tidy-up:
- use crate::vector_storage::{VectorStorage, VectorStorageEnum};
+ use crate::vector_storage::{Sequential, VectorStorage, VectorStorageEnum};
Then drop the function-scoped use crate::vector_storage::Sequential; lines.

Also applies to: 1083-1086, 1143-1144, 1174-1176, 1281-1282, 1306-1308

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)

88-91: Improve panic context.
Include key in the panic to aid debugging.

-            .expect("mmap vector not found")
+            .unwrap_or_else(|| panic!("mmap vector not found: {key}"))

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)

185-196: Collapse branch to a single const-generic call.
Slightly cleaner and avoids duplicate branches.

-        let storage = self.storage.read();
-        let result = if P::IS_SEQUENTIAL {
-            storage.get_value::<true>(key, &HardwareCounterCell::disposable()) // Vector storage read IO not measured
-        } else {
-            storage.get_value::<false>(key, &HardwareCounterCell::disposable())
-        };
+        let storage = self.storage.read();
+        // Vector storage read IO not measured
+        let result = storage.get_value::<{ P::IS_SEQUENTIAL }>(
+            key,
+            &HardwareCounterCell::disposable(),
+        );

224-229: Optional: log suppressed errors at debug level.
Keeps “ignore errors” semantics while aiding diagnosis.

-        match self.get_sparse_opt::<P>(key) {
-            Ok(Some(vector)) => Some(CowVector::from(vector)),
-            _ => None,
-        }
+        match self.get_sparse_opt::<P>(key) {
+            Ok(Some(vector)) => Some(CowVector::from(vector)),
+            Err(err) => {
+                log::debug!("get_sparse_opt failed for key {key}: {err}");
+                None
+            }
+            _ => None,
+        }

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)

134-148: Optional: make iterate_inner_vectors robust to missing/empty offsets.

Unwraps can panic on sparse/deleted keys or partially flushed state. Iterating defensively is cheap here.

Apply:

-        (0..self.total_vector_count()).flat_map(|key| {
-            let mmap_offset = self
-                .offsets
-                .get::<Sequential>(key as VectorOffsetType)
-                .unwrap()
-                .first()
-                .unwrap();
-            (0..mmap_offset.count).map(|i| {
-                self.vectors
-                    .get::<Sequential>((mmap_offset.offset + i) as VectorOffsetType)
-                    .unwrap()
-            })
-        })
+        (0..self.total_vector_count()).flat_map(|key| {
+            self.offsets
+                .get::<Sequential>(key as VectorOffsetType)
+                .and_then(|arr| arr.first().copied())
+                .into_iter()
+                .flat_map(move |o| {
+                    (0..o.count).filter_map(move |i| {
+                        self.vectors
+                            .get::<Sequential>((o.offset + i) as VectorOffsetType)
+                    })
+                })
+        })

lib/segment/src/vector_storage/vector_storage_base.rs (1)

40-56: Seal AccessPattern to prevent external impls.

AccessPattern is part of a performance-sensitive API; sealing avoids third-party types altering invariants.

Apply:

+mod sealed {
+    pub trait Sealed {}
+}
-pub trait AccessPattern: Copy {
+pub trait AccessPattern: sealed::Sealed + Copy {
     const IS_SEQUENTIAL: bool;
 }
 
 #[derive(Copy, Clone)]
 pub struct Random;
 
 #[derive(Copy, Clone)]
 pub struct Sequential;
 
 impl AccessPattern for Random {
     const IS_SEQUENTIAL: bool = false;
 }
 
 impl AccessPattern for Sequential {
     const IS_SEQUENTIAL: bool = true;
 }

And add below (outside the range) to implement the seal:

impl sealed::Sealed for Random {}
impl sealed::Sealed for Sequential {}

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a7a23cd and c6fa936.

📒 Files selected for processing (49)

lib/collection/src/collection/mmr/lazy_matrix.rs (2 hunks)
lib/gridstore/benches/bustle_bench/payload_storage.rs (1 hunks)
lib/gridstore/benches/random_data_bench.rs (1 hunks)
lib/gridstore/benches/real_data_bench.rs (1 hunks)
lib/gridstore/src/gridstore.rs (16 hunks)
lib/segment/benches/sparse_vector_storage.rs (3 hunks)
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (7 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/mod.rs (2 hunks)
lib/segment/src/index/hnsw_index/graph_layers_healer.rs (2 hunks)
lib/segment/src/index/hnsw_index/point_scorer.rs (2 hunks)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
lib/segment/src/payload_storage/mmap_payload_storage.rs (5 hunks)
lib/segment/src/segment/segment_ops.rs (3 hunks)
lib/segment/src/segment_constructor/batched_reader.rs (2 hunks)
lib/segment/src/segment_constructor/segment_constructor_base.rs (6 hunks)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3 hunks)
lib/segment/src/vector_storage/chunked_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (11 hunks)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (4 hunks)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (3 hunks)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1 hunks)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2 hunks)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (4 hunks)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/tests/async_raw_scorer.rs (2 hunks)
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/vector_storage_base.rs (7 hunks)
lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (34)

lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs
lib/gridstore/benches/real_data_bench.rs
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/index/hnsw_index/point_scorer.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/tests/async_raw_scorer.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/benches/sparse_vector_storage.rs
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/segment/segment_ops.rs
lib/segment/src/index/hnsw_index/graph_layers_healer.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/gridstore/benches/bustle_bench/payload_storage.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/gridstore/benches/random_data_bench.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

lib/gridstore/benches/random_data_bench.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

🧠 Learnings (12)

📓 Common learnings

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs

📚 Learning: 2025-08-18T10:56:43.707Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

📚 Learning: 2025-08-21T13:46:10.866Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-21T13:43:52.191Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs

📚 Learning: 2025-07-02T17:10:13.847Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.

Applied to files:

lib/gridstore/benches/random_data_bench.rs

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

📚 Learning: 2025-08-11T12:30:47.220Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7009
File: lib/quantization/src/encoded_vectors_binary.rs:614-618
Timestamp: 2025-08-11T12:30:47.220Z
Learning: In `lib/quantization/src/encoded_vectors_binary.rs`, the `encode_two_bits_value` function has two paths: when `vector_stats` is available, it uses mean/stddev for accurate z-score based quantization boundaries; when stats are None (for appendable BQ), it falls back to a simple sign-based heuristic (positive -> (true, true), non-positive -> (false, false)). Having stats available provides higher accuracy for storage encoding.

Applied to files:

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

🧬 Code graph analysis (14)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (17)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1)

vector_storage (529-529)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/segment/segment_ops.rs (2)

vector_storage (381-381)

vector_storage (609-609)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (537-537)

vector_storage (623-623)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (1)

vector_storage (61-63)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

get_dense (147-153)

get_vector (182-189)

get_vector_opt (191-197)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)

get_dense (236-238)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

get_dense (83-85)

get_vector (105-107)

get_vector_opt (106-106)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_dense (152-152)

get_dense (162-162)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (3)

get_vector (187-189)

get_vector_opt (188-188)

get_vector_opt (191-197)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (3)

get_vector (399-401)

get_vector_opt (400-400)

get_vector_opt (403-409)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (3)

get_vector (216-219)

get_vector_opt (217-217)

get_vector_opt (224-229)

lib/segment/src/segment_constructor/batched_reader.rs (5)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

vector_storage (598-599)

vector_storage (630-631)

vector_storage (657-658)

lib/segment/src/index/hnsw_index/graph_layers_healer.rs (1)

vector_storage (220-221)

lib/segment/src/segment/segment_ops.rs (2)

vector_storage (381-381)

vector_storage (609-609)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (537-537)

vector_storage (623-623)

lib/segment/src/segment_constructor/segment_constructor_base.rs (6)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

vector_storage (598-599)

vector_storage (630-631)

vector_storage (657-658)

lib/segment/src/index/hnsw_index/graph_layers_healer.rs (1)

vector_storage (220-221)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/segment/segment_ops.rs (2)

vector_storage (381-381)

vector_storage (609-609)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (537-537)

vector_storage (623-623)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (4)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

vector_storage (598-599)

vector_storage (630-631)

vector_storage (657-658)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (537-537)

vector_storage (623-623)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (4)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)

get_sparse (150-159)

get_sparse_opt (161-180)

get_sparse_opt (210-210)

get_sparse_opt (260-260)

get_vector (200-204)

get_vector_opt (202-202)

get_vector_opt (209-214)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)

get_sparse (104-109)

get_sparse_opt (105-106)

get_sparse_opt (111-118)

get_sparse_opt (148-148)

get_sparse_opt (197-197)

get_vector (138-141)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_sparse (172-172)

get_sparse_opt (173-176)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/payload_storage/mmap_payload_storage.rs (2)

storage (168-168)

storage (179-179)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (2)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (4)

get_dense (83-85)

get_vector (105-107)

get_vector_opt (106-106)

get_vector_opt (110-115)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_dense (152-152)

get_dense (162-162)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (7)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

vector_storage (598-599)

vector_storage (630-631)

vector_storage (657-658)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (537-537)

vector_storage (623-623)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (1)

vector_storage (97-97)

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (4)

get (271-273)

get (362-364)

get_many (277-300)

get_many (412-414)

lib/segment/src/vector_storage/chunked_vector_storage.rs (2)

get (20-22)

get_many (52-52)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (2)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (4)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)

get_sparse (180-183)

get_sparse_opt (181-181)

get_sparse_opt (185-196)

get_sparse_opt (225-225)

get_vector (216-219)

get_vector_opt (217-217)

get_vector_opt (224-229)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)

get_sparse (104-109)

get_sparse_opt (105-106)

get_sparse_opt (111-118)

get_sparse_opt (148-148)

get_sparse_opt (197-197)

get_vector (138-141)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_sparse (172-172)

get_sparse_opt (173-176)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

lib/gridstore/src/gridstore.rs (5)

lib/gridstore/benches/bustle_bench/payload_storage.rs (1)

get_value (34-34)

lib/gridstore/benches/random_data_bench.rs (1)

storage (29-29)

lib/gridstore/benches/real_data_bench.rs (1)

storage (91-91)

lib/segment/src/payload_storage/mmap_payload_storage.rs (2)

storage (168-168)

storage (179-179)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)

storage (191-191)

storage (193-193)

storage (380-380)

storage (415-415)

storage (452-452)

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (537-537)

vector_storage (623-623)

lib/segment/src/vector_storage/chunked_vector_storage.rs (2)

get (20-22)

get_many (52-52)

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

get (45-47)

get_many (96-98)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (7)

get_multi (338-340)

get_multi_opt (339-339)

get_multi_opt (343-358)

get_multi_opt (404-404)

get_vector (399-401)

get_vector_opt (400-400)

get_vector_opt (403-409)

lib/segment/src/vector_storage/vector_storage_base.rs (7)

get_multi (181-181)

get_multi (193-193)

get_multi_opt (182-185)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)

get_vector (128-133)

get_vector_opt (135-139)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/vector_storage_base.rs (11)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (7)

get_vector (128-133)

get_vector_opt (135-139)

get_dense (88-92)

vectors (89-90)

vectors (129-130)

vectors (136-137)

vector_dim (84-86)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (4)

get_vector (182-189)

get_vector_opt (191-197)

get_dense (147-153)

vector_dim (143-145)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

get_dense (236-238)

vector_dim (232-234)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (5)

get_vector (105-107)

get_vector_opt (106-106)

get_vector_opt (110-115)

get_dense (83-85)

vector_dim (79-81)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (10)

get_vector (187-189)

get_vector_opt (188-188)

get_vector_opt (191-197)

vectors (123-123)

vectors (143-144)

vector_dim (105-107)

get_multi (110-112)

get_multi_opt (111-111)

get_multi_opt (115-132)

get_multi_opt (192-192)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (8)

get_vector (399-401)

get_vector_opt (400-400)

get_vector_opt (403-409)

vector_dim (333-335)

get_multi (338-340)

get_multi_opt (339-339)

get_multi_opt (343-358)

get_multi_opt (404-404)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)

get_vector (216-219)

get_vector_opt (217-217)

get_vector_opt (224-229)

get_sparse (180-183)

get_sparse_opt (181-181)

get_sparse_opt (185-196)

get_sparse_opt (225-225)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)

get_vector (200-204)

get_vector_opt (202-202)

get_vector_opt (209-214)

get_sparse (150-159)

get_sparse_opt (161-180)

get_sparse_opt (210-210)

get_sparse_opt (260-260)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)

get_vector (138-141)

get_sparse (104-109)

get_sparse_opt (105-106)

get_sparse_opt (111-118)

get_sparse_opt (148-148)

get_sparse_opt (197-197)

lib/common/common/src/maybe_uninit.rs (1)

maybe_uninit_fill_from (10-33)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consistency
GitHub Check: test-consensus-compose
GitHub Check: integration-tests-consensus
GitHub Check: integration-tests
GitHub Check: storage-compat-test
GitHub Check: lint

🔇 Additional comments (33)

lib/gridstore/src/gridstore.rs (2)

238-248: API consolidation to const-generic get_value is good; counter usage is correct

Single entry point with const READ_SEQUENTIAL: bool reduces duplication. Accounting storage read bytes via payload_io_read_counter().incr_delta(raw.len()) aligns with measuring IO, not compute.

629-629: Call site migration to the new const-generic API looks correct

All updated tests use get_value::<false> for non-sequential access where appropriate; sequential paths elsewhere still use <true>. This matches the intended semantics.

Also applies to: 648-648, 676-676, 736-736, 745-745, 769-769, 779-779, 808-808, 926-926, 947-947, 982-982, 1000-1000, 1026-1026, 1038-1038, 1095-1100, 1150-1150

lib/gridstore/benches/random_data_bench.rs (1)

29-29: Bench call updated to new API — good

Switch to get_value::<false> matches the consolidated API.

lib/segment/src/segment_constructor/batched_reader.rs (1)

11-11: Switch to generic Sequential access — LGTM

Importing Sequential and switching to get_vector::<Sequential> aligns with the new AccessPattern API and preserves intended locality.

Also applies to: 86-90

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (1)

17-18: Use Random for scorer reads — LGTM

Explicit get_multi::<Random> is appropriate for these random-access scoring paths and keeps counters correct.

Also applies to: 97-103, 134-141

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1)

29-30: Test updated to Random access — LGTM

Matches the new generic API; no further changes needed.

Also applies to: 146-149

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

17-18: GPU path switched to get_multi::<Random> — LGTM

Import consolidation and explicit access pattern are correct for GPU staging.

Also applies to: 91-94

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

9-9: LGTM: AccessPattern import hookup is correct.

45-47: LGTM: Generic forwarding to underlying storage.
Both get and get_many correctly delegate to ChunkedMmapVectors::
. No issues spotted.

Also applies to: 96-98

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

26-26: LGTM: Adopted AccessPattern in the prelude.

128-133: LGTM: get_vector now generic and zero-cost maps to CowVector.

135-139: LGTM: get_vector_opt generic path is consistent with other backends.

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3)

21-21: LGTM: AccessPattern import for const-based dispatch.

361-364: LGTM: Generic get uses P::IS_SEQUENTIAL to select slice type.

412-414: LGTM: Generic get_many mirrors get behavior.

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

26-26: LGTM: AccessPattern added to imports.

236-238: LGTM: Generic Dense/Vector retrieval for in-memory backend.
AccessPattern is intentionally ignored (RAM). API aligns with trait changes.

Also applies to: 258-260, 262-268

348-348: LGTM: Tests updated to generic API (Sequential).

Also applies to: 420-420

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (4)

22-22: LGTM: AccessPattern imported for sparse mmap backend.

180-183: LGTM: get_sparse now delegates via generic opt-variant.

216-219: LGTM: get_vector generic returns default sparse on miss.

347-347: LGTM: Tests aligned to generic API (Random).

Also applies to: 415-415

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (6)

23-25: LGTM: adopts AccessPattern generics where needed.

161-180: LGTM: get_sparse_opt honors deletion and maps decode errors.

200-204: LGTM: generic get_vector delegates to opt and falls back to default sparse.

209-214: LGTM: get_vector_opt correctly composes from get_sparse_opt.

294-294: LGTM: test imports Sequential for explicit access pattern.

357-359: LGTM: test uses get_vector:: after migration.

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)

22-24: LGTM: imports AccessPattern markers and traits.

110-112: LGTM: get_multi now generic and delegates to opt variant.

187-197: LGTM: vector getters wrap multi-dense into CowVector with AccessPattern.

218-223: LGTM: reading offsets with Random for metadata probe is appropriate.

lib/segment/src/vector_storage/vector_storage_base.rs (1)

830-874: LGTM: enum dispatch passes through AccessPattern generics across all variants.

Also applies to: 876-919

coderabbitai · 2025-08-29T10:19:58Z

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs

+    fn get_sparse<P: AccessPattern>(&self, key: PointOffsetType) -> OperationResult<SparseVector> {
+        self.get_sparse_opt::<P>(key)?
            .ok_or_else(|| OperationError::service_error(format!("Key {key} not found")))
    }

-    fn get_sparse_opt(&self, key: PointOffsetType) -> OperationResult<Option<SparseVector>> {
-        self.storage
-            .read()
-            .get_value(key, &HardwareCounterCell::disposable()) // Vector storage read IO not measured
-            .map(SparseVector::try_from)
-            .transpose()
-    }
-
-    fn get_sparse_sequential(&self, key: PointOffsetType) -> OperationResult<SparseVector> {
-        let stored_sparse = self
-            .storage
-            .read()
-            .get_value_sequential(key, &HardwareCounterCell::disposable())
-            .ok_or_else(|| OperationError::service_error(format!("Key {key} not found")))?;
-        SparseVector::try_from(stored_sparse)
+    fn get_sparse_opt<P: AccessPattern>(
+        &self,
+        key: PointOffsetType,
+    ) -> OperationResult<Option<SparseVector>> {
+        let storage = self.storage.read();
+        let result = if P::IS_SEQUENTIAL {
+            storage.get_value::<true>(key, &HardwareCounterCell::disposable()) // Vector storage read IO not measured
+        } else {
+            storage.get_value::<false>(key, &HardwareCounterCell::disposable())
+        };
+        result.map(SparseVector::try_from).transpose()
    }


💡 Verification agent

🧩 Analysis chain

Repo-wide check for removed sequential methods and generic call sites.
Run to ensure no stale usages and help catch missed updates.

Also applies to: 216-229

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Ensure removed methods aren't referenced rg -nP '(get_(vector|dense|many|sparse)_sequential)\s*$' -S || echo "OK: no *_sequential calls found" # 2) Spot potential non-generic get_vector/dense calls at call sites # Heuristic: a dot call without a::<...> following the method name. rg -nP '\.\s*(get_vector|get_dense|get_sparse)\s*\(' -g '!**/vector_storage_base.rs' -g '!**/*.md' -S || echo "OK: all calls appear genericized" # 3) For ChunkedMmapVectors, ensure no callers use the old bool parameter (should be internal only) rg -nP 'get\([^,]+,\s*(true|false)$' lib/segment/src -S || echo "OK: no external bool-based get/get_many calls"

Length of output: 1660

Add generic AccessPattern parameters to get_vector/get_dense/get_sparse calls
Found non-generic calls in:

src/common/inference/query_requests_rest.rs (165, 174, 183)

src/common/inference/query_requests_grpc.rs (345, 359, 372)

src/common/inference/update_requests.rs (430, 437, 444)

lib/segment/src/segment/snapshot.rs (146)

lib/segment/src/segment_constructor/batched_reader.rs (64)

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (135)

These invocations must be updated to use the generic form, e.g. get_vector::(…), with the appropriate AccessPattern.

🤖 Prompt for AI Agents

In lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs around lines 180 to 196, the public API methods expect an AccessPattern generic but many call sites use non-generic invocations; update callers to call the generic form (e.g., replace get_sparse(key) with get_sparse::(key) where P implements AccessPattern), importing or forwarding the concrete AccessPattern type (e.g., Sequential or Random) as needed. Apply the same change to all listed files/locations (src/common/inference/query_requests_rest.rs lines ~165/174/183, src/common/inference/query_requests_grpc.rs ~345/359/372, src/common/inference/update_requests.rs ~430/437/444, lib/segment/src/segment/snapshot.rs ~146, lib/segment/src/segment_constructor/batched_reader.rs ~64, lib/segment/src/index/hnsw_index/graph_links/serializer.rs ~135) so each get_vector/get_dense/get_sparse call uses the explicit generic parameter (or thread a generic P through the caller) and add any necessary use/imports for the concrete AccessPattern types.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/gridstore/src/gridstore.rs (1)
238-264: Replace all non-generic get_value calls with the const-generic API
Existing call-sites such as
.get_value(&JsonPath::new("k"))
must be updated to specify the READ_SEQUENTIAL const parameter (e.g. get_value::<false> or get_value::<true>). Tests and code under lib/collection, lib/shard, lib/segment, and lib/posting_list still use the old form and will fail to compile until updated.

♻️ Duplicate comments (1)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)
133-142: Replace unsafe transmute with bytemuck::cast_slice (soundness).

Casting &[u8] to &[T] with transmute is UB on misalignment and hides invariants. Use bytemuck::cast_slice; it checks size/alignment and requires T: Pod, which PrimitiveVectorElement types satisfy in this codebase. Also drop the redundant slicing.
- fn raw_vector_offset<P: AccessPattern>(&self, offset: usize) -> &[T] {
+ fn raw_vector_offset<P: AccessPattern>(&self, offset: usize) -> &[T] {
 let mmap = if P::IS_SEQUENTIAL {
 &self.mmap_sequential
 } else {
 &self.mmap
 };
- let byte_slice = &mmap[offset..(offset + self.raw_size())];
- let arr: &[T] = unsafe { transmute(byte_slice) };
- &arr[0..self.dim]
+ let byte_slice = &mmap[offset..offset + self.raw_size()];
+ let arr: &[T] = bytemuck::cast_slice(byte_slice);
+ arr
 }
Add import and remove transmute:
-use std::mem::{self, MaybeUninit, size_of, transmute};
+use std::mem::{self, MaybeUninit, size_of};
+use bytemuck::cast_slice;

🧹 Nitpick comments (13)

lib/collection/src/collection/mmr/lazy_matrix.rs (2)
5-5: LGTM: debug-only import scope is correct.

Importing Random and the trait only in debug builds matches usage in the guarded block.

Optional: if you decide to validate both access patterns in debug (see next comment), extend the import:
-#[cfg(debug_assertions)]
-use segment::vector_storage::{Random, VectorStorage};
+#[cfg(debug_assertions)]
+use segment::vector_storage::{Random, Sequential, VectorStorage};
36-36: Use PointOffsetType (and optionally validate both access patterns).

Prefer casting to PointOffsetType for resilience if the alias changes.

Optional: since the loop is sequential and this is debug-only, also fetch via Sequential to catch pattern-specific regressions.

Apply:
- let stored_vector = storage.get_vector::<Random>(i as u32);
+ let stored_vector = storage.get_vector::<Random>(i as PointOffsetType);
+ // Optional: also verify the sequential path in debug
+ // let stored_vector_seq = storage.get_vector::<Sequential>(i as PointOffsetType);
+ // assert_eq!(stored_vector_seq.to_owned(), *vector);
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1)

22-24: Refine get_vector_data in chunked mmap storage.

Casting PointOffsetType (u32) to VectorOffsetType (usize) is non‐narrowing on all supported targets; a try_from guard can be deferred as an optional future‐proofing measure.

Quantization code invokes get_vector_data sequentially across vectors; if predictable linear scans dominate hot paths, consider a sequential readahead pattern instead of Random.

For readability, invoke the trait on the data directly:
self.data.get::<Random>(key).unwrap_or_default().

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)

208-216: Consider using Sequential for obviously ordered scans on non-Linux.

process_points_simple always uses Random. If ids are mostly ascending/contiguous, using Sequential may improve readahead. Optional: detect monotonic/non-decreasing sequences and pick P accordingly.
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)
88-92: Nit: error message wording.

"mmap vector not found" may be misleading when S is an in-RAM variant. Consider a storage-agnostic message.
- .expect("mmap vector not found")
+ .expect("vector not found")
128-139: Prefer explicit From over Into (repo guideline).

Replace slice.into() with Cow::from(slice) and the final into() with CowVector::from(...).
 fn get_vector<P: AccessPattern>(&self, key: PointOffsetType) -> CowVector<'_> {
 self.vectors
 .get::(key as VectorOffsetType)
- .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into())))
+ .map(|slice| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(slice))))
 .expect("Vector not found")
 }

 fn get_vector_opt<P: AccessPattern>(&self, key: PointOffsetType) -> Option<CowVector<'_>> {
 self.vectors
 .get::(key as VectorOffsetType)
- .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into())))
+ .map(|slice| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(slice))))
 }
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
182-189: Prefer explicit From over Into (repo guideline).

Make the conversions explicit for clarity.
 fn get_vector<P: AccessPattern>(&self, key: PointOffsetType) -> CowVector<'_> {
 self.mmap_store
 .as_ref()
 .unwrap()
 .get_vector_opt::(key)
- .map(|vector| T::slice_to_float_cow(vector.into()).into())
+ .map(|vector| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(vector))))
 .expect("Vector not found")
 }
191-197: Prefer explicit From over Into (repo guideline).

Mirror the change in get_vector for the optional path.
 fn get_vector_opt<P: AccessPattern>(&self, key: PointOffsetType) -> Option<CowVector<'_>> {
 self.mmap_store
 .as_ref()
 .unwrap()
 .get_vector_opt::(key)
- .map(|vector| T::slice_to_float_cow(vector.into()).into())
+ .map(|vector| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(vector))))
 }
lib/gridstore/src/gridstore.rs (3)
238-264: Public const-bool API: add inline and avoid “bool blindness”.

Good consolidation. Two tweaks:

Add #[inline] (hot path).

Provide named consts (RANDOM/SEQUENTIAL) to improve call-site readability vs literals true/false.

Apply inline:
- pub fn get_value<const READ_SEQUENTIAL: bool>(
+ #[inline]
+ pub fn get_value<const READ_SEQUENTIAL: bool>(
Optionally add named consts (outside this hunk, e.g. near CONFIG_FILENAME):
pub mod read_pattern {
 pub const RANDOM: bool = false;
 pub const SEQUENTIAL: bool = true;
}
Then call sites can use:
storage.get_value::<{ read_pattern::RANDOM }>(...)

259-263: Decompression panics on corrupt data; prefer non-panicking path.

decompress_lz4(...).unwrap() can crash the process on malformed bytes. Consider returning Result from decompress()/decompress_lz4() and mapping to an error (or Option) at this layer.

Example (outside this hunk):
fn decompress_lz4(value: &[u8]) -> std::io::Result<Vec<u8>> {
 lz4_flex::decompress_size_prepended(value).map_err(|e| std::io::Error::other(e))
}
fn decompress(&self, value: Vec<u8>) -> std::io::Result<Vec<u8>> { ... }
And here:
let decompressed = self.decompress(raw).ok()?;
630-630: Use named consts for readability in tests.

get_value::<false/true> is clear to us now, but named consts make intent obvious to future readers.

If you add:
use crate::read_pattern::{RANDOM, SEQUENTIAL};
then replace literals with:

get_value::<{RANDOM}> for random access

get_value::<{SEQUENTIAL}> for sequential access

Also applies to: 649-649, 676-676, 736-736, 744-744, 769-769, 780-780, 808-808, 844-844, 926-926, 947-947, 982-982, 1000-1000, 1026-1026, 1038-1038, 1095-1097, 1099-1100, 1150-1150
lib/segment/src/payload_storage/mmap_payload_storage.rs (1)
109-126: Prefer named consts over raw booleans at call sites.

To avoid boolean ambiguity, import and use named consts if added in Gridstore.

Example:
use gridstore::read_pattern::{RANDOM, SEQUENTIAL};

match guard.get_value::<{RANDOM}>(point_id, hw_counter) { ... }

self.storage.read().get_value::<{SEQUENTIAL}>(point_id, hw_counter)
Also applies to: 136-161, 168-172, 179-183, 192-208
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
46-50: Prefer explicit From over Into in tests (repo guideline).

Replace vec.into() with VectorRef::from(vec) for clarity and consistency with the codebase rules.

Apply this diff:
@@
- storage
- .insert_vector(i as PointOffsetType, vec.into(), &hw_counter)
+ use crate::data_types::vectors::VectorRef;
+ storage
+ .insert_vector(i as PointOffsetType, VectorRef::from(vec), &hw_counter)
 .unwrap();
@@
- .insert_vector(i as PointOffsetType, vec.into(), &hw_counter)
+ .insert_vector(i as PointOffsetType, VectorRef::from(vec), &hw_counter)
 .unwrap();
@@
- storage
- .insert_vector(i as PointOffsetType, vec.into(), &hw_counter)
+ use crate::data_types::vectors::VectorRef;
+ storage
+ .insert_vector(i as PointOffsetType, VectorRef::from(vec), &hw_counter)
 .unwrap();
Also applies to: 141-146, 226-231

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c6fa936 and db557ce.

📒 Files selected for processing (49)

lib/collection/src/collection/mmr/lazy_matrix.rs (2 hunks)
lib/gridstore/benches/bustle_bench/payload_storage.rs (1 hunks)
lib/gridstore/benches/random_data_bench.rs (1 hunks)
lib/gridstore/benches/real_data_bench.rs (1 hunks)
lib/gridstore/src/gridstore.rs (16 hunks)
lib/segment/benches/sparse_vector_storage.rs (3 hunks)
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (7 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/mod.rs (2 hunks)
lib/segment/src/index/hnsw_index/graph_layers_healer.rs (2 hunks)
lib/segment/src/index/hnsw_index/point_scorer.rs (2 hunks)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
lib/segment/src/payload_storage/mmap_payload_storage.rs (5 hunks)
lib/segment/src/segment/segment_ops.rs (3 hunks)
lib/segment/src/segment_constructor/batched_reader.rs (2 hunks)
lib/segment/src/segment_constructor/segment_constructor_base.rs (6 hunks)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3 hunks)
lib/segment/src/vector_storage/chunked_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (11 hunks)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (4 hunks)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (3 hunks)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (2 hunks)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2 hunks)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (4 hunks)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/tests/async_raw_scorer.rs (2 hunks)
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/vector_storage_base.rs (7 hunks)
lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (38)

lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/index/hnsw_index/point_scorer.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/gridstore/benches/random_data_bench.rs
lib/segment/benches/sparse_vector_storage.rs
lib/segment/src/segment/segment_ops.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/gridstore/benches/bustle_bench/payload_storage.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/gridstore/benches/real_data_bench.rs
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/async_raw_scorer.rs
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_layers_healer.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs

🧠 Learnings (13)

📓 Common learnings

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs

📚 Learning: 2025-08-21T13:43:52.191Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.

Applied to files:

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs

📚 Learning: 2025-08-21T13:46:10.866Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs

📚 Learning: 2025-08-14T11:31:21.777Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs

📚 Learning: 2025-08-10T18:30:02.986Z

Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs

📚 Learning: 2025-08-18T10:56:43.707Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs

📚 Learning: 2025-08-29T08:16:28.512Z

Learnt from: CR
PR: qdrant/qdrant#0
File: .github/review-rules.md:0-0
Timestamp: 2025-08-29T08:16:28.512Z
Learning: Applies to **/*.rs : Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Applied to files:

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

📚 Learning: 2025-04-22T23:19:51.232Z

Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.

Applied to files:

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs

🧬 Code graph analysis (10)

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (2)

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (2)

get (259-261)

get (350-352)

lib/segment/src/vector_storage/chunked_vector_storage.rs (1)

get (20-22)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (7)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (535-535)

vector_storage (621-621)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (1)

vector_storage (61-63)

lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (3)

vector_storage (55-57)

vector_storage (73-75)

vector_storage (82-84)

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

storage (408-408)

get_vector_opt (135-139)

is_deleted_vector (199-201)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (8)

storage (191-191)

storage (193-193)

storage (380-380)

storage (415-415)

storage (452-452)

get_vector_opt (217-217)

get_vector_opt (224-229)

is_deleted_vector (306-308)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_opt (90-90)

get_vector_opt (876-920)

is_deleted_vector (125-125)

is_deleted_vector (1247-1291)

lib/collection/src/collection/mmr/lazy_matrix.rs (8)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

vector_storage (598-599)

vector_storage (630-631)

vector_storage (657-658)

lib/segment/src/index/hnsw_index/graph_layers_healer.rs (1)

vector_storage (220-221)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/segment/segment_ops.rs (2)

vector_storage (381-381)

vector_storage (609-609)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (535-535)

vector_storage (621-621)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (1)

vector_storage (73-73)

lib/gridstore/src/gridstore.rs (4)

lib/gridstore/benches/bustle_bench/payload_storage.rs (1)

get_value (34-34)

lib/gridstore/benches/random_data_bench.rs (1)

storage (29-29)

lib/gridstore/benches/real_data_bench.rs (1)

storage (91-91)

lib/segment/src/payload_storage/mmap_payload_storage.rs (2)

storage (168-168)

storage (179-179)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (3)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

get_dense (88-92)

get_vector (128-133)

get_vector_opt (135-139)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)

get_dense (236-238)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

get_dense (83-85)

get_vector (105-107)

get_vector_opt (106-106)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_dense (152-152)

get_dense (162-162)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (8)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)

get_sparse (150-159)

get_sparse_opt (161-180)

get_sparse_opt (210-210)

get_sparse_opt (260-260)

get_vector (200-204)

get_vector_opt (202-202)

get_vector_opt (209-214)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)

get_sparse (104-109)

get_sparse_opt (105-106)

get_sparse_opt (111-118)

get_sparse_opt (148-148)

get_sparse_opt (197-197)

get_vector (138-141)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_sparse (172-172)

get_sparse_opt (173-176)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

storage (408-408)

get_vector (128-133)

get_vector_opt (135-139)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

storage (377-377)

get_vector (182-189)

get_vector_opt (191-197)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)

storage (614-614)

get_vector (187-189)

get_vector_opt (188-188)

get_vector_opt (191-197)

lib/segment/src/vector_storage/sparse/stored_sparse_vectors.rs (1)

try_from (50-63)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1)

vector_storage (529-529)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_dense (152-152)

get_dense (162-162)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (3)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)

get_sparse (180-183)

get_sparse_opt (181-181)

get_sparse_opt (185-196)

get_sparse_opt (225-225)

get_vector (216-219)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (5)

get_sparse (150-159)

get_sparse_opt (161-180)

get_sparse_opt (210-210)

get_sparse_opt (260-260)

get_vector (200-204)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_sparse (172-172)

get_sparse_opt (173-176)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (3)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (5)

get_vector (128-133)

get_vector_opt (135-139)

vectors (89-90)

vectors (129-130)

vectors (136-137)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)

get_vector (182-189)

get_vector_opt (191-197)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: integration-tests
GitHub Check: integration-tests-consensus
GitHub Check: test-consistency
GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consensus-compose
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: lint
GitHub Check: storage-compat-test
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)

🔇 Additional comments (23)

lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1)

7-7: Importing Random is appropriate and consistent with the new API.
No issues.

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (1)

145-153: LGTM: generic get_vector/get_vector_opt over AccessPattern.

The consolidation to a single generic path is clear and keeps behavior identical while removing duplication.

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

118-121: LGTM: make access pattern explicit (Random).

Using get_multi::(idx) matches the rest of the PR and makes intent explicit.

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)

147-153: LGTM: generic get_dense via inner get_vector_opt::
.

Behavior preserved with clearer API.

lib/segment/src/payload_storage/mmap_payload_storage.rs (1)

109-126: LGTM: correct mapping to random/sequential read paths.

Swapping to get_value:: for random and get_value:: for sequential matches the new Gridstore API.

Also applies to: 136-161, 168-172, 179-183, 192-208

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (5)

23-23: Imports look correct for the new generic API.

Bringing Random and the traits into scope matches the PR-wide changes.

54-56: Good: explicit access pattern at call site.

get_vector::<Random>(…) is the intended usage.

151-156: Good: iterator uses the new typed accessor.

storage2.get_vector::<Random>(i) aligns with the generic API.

245-251: Correct: verifying deletion via get_vector_opt::<Random>(…).

This checks the new API semantics properly.

253-258: Good: reading with explicit pattern in persistence check.

Consistent with the consolidated API.

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)

22-23: Trait imports match the new design.

OK to expose both SparseVectorStorage and VectorStorage with AccessPattern.

180-183: OK: get_sparse delegates to generic get_sparse_opt.

Error text preserved; API surface matches other backends.

216-219: OK: get_vector now generic and falls back to default sparse.

Matches other storage backends’ behavior.

224-229: OK: get_vector_opt composes over get_sparse_opt::.

Error-eliding semantics preserved as before.

347-348: Tests import Random appropriately.

Matches the new explicit call-site pattern.

415-423: Good: test asserts through get_vector::<Random>.

Covers the new API surface.

185-196: Ignore this suggestion; the code is correct. storage.get_value returns Option<V>, so calling map(SparseVector::try_from) produces an Option<Result<_, _>> which .transpose() cleanly converts to Result<Option<_>, _> as intended.

Likely an incorrect or invalid review comment.

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)

17-19: Imports align with the consolidated API.

OK to bring AccessPattern and Random here.

104-109: OK: get_sparse delegates to generic get_sparse_opt::.

Consistent with other backends.

111-118: OK: In-memory backend ignores access pattern at runtime.

Signature is generic for uniformity; body remains O(1) lookups.

138-141: OK: get_vector uses generic access pattern and default sparse fallback.

Matches trait expectations.

146-152: OK: get_vector_opt composes over get_sparse_opt::.

Comment accurately documents error-elision.

194-201: Intentional use of Random on delete path.

Since delete_vector isn’t generic, defaulting to Random for the (rare) read-before-delete is reasonable.

generall

Pls fix linter

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)
67-72: Missing size_of import causes compile error

size_of::<TElement>() is used but not imported. Add size_of to the std::mem import.
-use std::mem::MaybeUninit;
+use std::mem::{size_of, MaybeUninit};

♻️ Duplicate comments (2)

lib/segment/src/vector_storage/vector_storage_base.rs (2)
152-164: Batch dense reads should use Sequential.

Keys are documented as consecutive; prefer Sequential to hint optimal prefetch/IO.

Apply:
- let iter = keys.iter().map(|key| self.get_dense::<Random>(*key));
+ let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));
182-195: Batch multi-dense reads should use Sequential.

Same rationale as dense batch.

Apply:
- let iter = keys.iter().map(|key| self.get_multi::<Random>(*key));
+ let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));

🧹 Nitpick comments (10)

lib/segment/src/vector_storage/vector_storage_base.rs (1)

40-56: AccessPattern API is minimal and effective.

Marker types with a const flag keep costs at compile time. Consider adding short rustdoc comments on when to choose each pattern in public docs (optional).

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (1)

29-30: Prefer Sequential for linear GPU uploads.

All these paths iterate ids in ascending order; using Sequential can improve MMAP prefetch and IO.

Apply:

-use crate::vector_storage::{
-    DenseVectorStorage, MultiVectorStorage, Random, VectorStorage, VectorStorageEnum,
-};
+use crate::vector_storage::{
+    DenseVectorStorage, MultiVectorStorage, Random, Sequential, VectorStorage, VectorStorageEnum,
+};
@@
-                    VectorElementTypeHalf::slice_from_float_cow(Cow::Borrowed(
-                        vector_storage.get_dense::<Random>(id as PointOffsetType),
-                    ))
+                    VectorElementTypeHalf::slice_from_float_cow(Cow::Borrowed(
+                        vector_storage.get_dense::<Sequential>(id as PointOffsetType),
+                    ))
@@
-                    VectorElementTypeHalf::slice_to_float_cow(Cow::Borrowed(
-                        vector_storage.get_dense::<Random>(id as PointOffsetType),
-                    ))
+                    VectorElementTypeHalf::slice_to_float_cow(Cow::Borrowed(
+                        vector_storage.get_dense::<Sequential>(id as PointOffsetType),
+                    ))
@@
-            (0..vector_storage.total_vector_count())
-                .map(|id| Cow::Borrowed(vector_storage.get_dense::<Random>(id as PointOffsetType))),
+            (0..vector_storage.total_vector_count()).map(|id| {
+                Cow::Borrowed(vector_storage.get_dense::<Sequential>(id as PointOffsetType))
+            }),
@@
-                        vector_storage
-                            .get_multi::<Random>(id as PointOffsetType)
+                        vector_storage
+                            .get_multi::<Sequential>(id as PointOffsetType)
@@
-                        vector_storage
-                            .get_multi::<Random>(id as PointOffsetType)
+                        vector_storage
+                            .get_multi::<Sequential>(id as PointOffsetType)
@@
-                    vector_storage
-                        .get_multi::<Random>(id as PointOffsetType)
+                    vector_storage
+                        .get_multi::<Sequential>(id as PointOffsetType)

Also applies to: 528-531, 556-559, 578-580, 597-601, 629-633, 656-660

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)

182-189: Prefer explicit From over Into for CowVector.

Match project style and improve readability.

Apply:

-            .get_vector_opt::<P>(key)
-            .map(|vector| T::slice_to_float_cow(vector.into()).into())
+            .get_vector_opt::<P>(key)
+            .map(|vector| CowVector::from(T::slice_to_float_cow(vector.into())))
@@
-            .get_vector_opt::<P>(key)
-            .map(|vector| T::slice_to_float_cow(vector.into()).into())
+            .get_vector_opt::<P>(key)
+            .map(|vector| CowVector::from(T::slice_to_float_cow(vector.into())))

Also applies to: 191-197

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

133-136: Batch read path isn’t pattern-parameterized

If/when a generic batch API exists (e.g., get_batch_multi::), thread P here for consistency and potential sequential-IO optimization.

lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2)

33-33: Prefer Sequential in linear storage scan

This loop walks ids in order; Sequential better reflects the access pattern and may improve readahead on mmap.
-use segment::vector_storage::{Random, VectorStorage};
+use segment::vector_storage::{Sequential, VectorStorage};
156-156: Use Sequential for ordered iteration
- let vector = borrowed_vector_storage.get_vector::<Random>(id);
+ let vector = borrowed_vector_storage.get_vector::<Sequential>(id);

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)

179-196: Minor: avoid duplicating disposable counter creation

Not performance-critical, but slightly cleaner to allocate once.

-        let storage = self.storage.read();
-        let result = if P::IS_SEQUENTIAL {
-            storage.get_value::<true>(key, &HardwareCounterCell::disposable()) // Vector storage read IO not measured
-        } else {
-            storage.get_value::<false>(key, &HardwareCounterCell::disposable())
-        };
+        let storage = self.storage.read();
+        let io_counter = HardwareCounterCell::disposable(); // Vector storage read IO not measured
+        let result = if P::IS_SEQUENTIAL {
+            storage.get_value::<true>(key, &io_counter)
+        } else {
+            storage.get_value::<false>(key, &io_counter)
+        };

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)

197-199: Deletion path uses generic read—fine for RAM; consider direct access

Optional: you could read self.vectors.get(...) directly here to avoid Result plumbing, since it’s in-memory and you ignore errors anyway.

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

115-132: Prefer Sequential for inner bulk read in get_multi_opt

Offsets lookup can follow caller’s pattern P, but inner vectors are read contiguously via get_many; hinting Sequential there improves readahead on mmap without constraining callers’ outer pattern.

-            .and_then(|mmap_offset| {
-                let mmap_offset = mmap_offset.first().expect("mmap_offset must not be empty");
-                self.vectors.get_many::<P>(
+            .and_then(|mmap_offset| {
+                let mmap_offset = mmap_offset.first().expect("mmap_offset must not be empty");
+                self.vectors.get_many::<Sequential>(
                     mmap_offset.offset as VectorOffsetType,
                     mmap_offset.count as usize,
                 )
             })

138-145: Replace unwraps with expects for clearer diagnostics; keep Sequential hint

Current unwraps can panic without context on corruption or partial writes. Expect messages aid debugging and log triage, especially in prod.

-                .get::<Sequential>(key as VectorOffsetType)
-                .unwrap()
+                .get::<Sequential>(key as VectorOffsetType)
+                .expect("offsets: missing entry for key")
                 .first()
-                .unwrap();
+                .expect("offsets: empty slice for key");
@@
-                    .get::<Sequential>((mmap_offset.offset + i) as VectorOffsetType)
-                    .unwrap()
+                    .get::<Sequential>((mmap_offset.offset + i) as VectorOffsetType)
+                    .expect("vectors: missing inner slice")

Optional: you could batch per-key with get_many:: and then yield chunks(dim) to reduce repeated lookups, but that may complicate the Iterator + Clone bound; fine to keep as-is.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between db557ce and 175be8a.

📒 Files selected for processing (49)

lib/collection/src/collection/mmr/lazy_matrix.rs (2 hunks)
lib/gridstore/benches/bustle_bench/payload_storage.rs (1 hunks)
lib/gridstore/benches/random_data_bench.rs (1 hunks)
lib/gridstore/benches/real_data_bench.rs (1 hunks)
lib/gridstore/src/gridstore.rs (16 hunks)
lib/segment/benches/sparse_vector_storage.rs (3 hunks)
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (7 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/mod.rs (2 hunks)
lib/segment/src/index/hnsw_index/graph_layers_healer.rs (2 hunks)
lib/segment/src/index/hnsw_index/point_scorer.rs (2 hunks)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
lib/segment/src/payload_storage/mmap_payload_storage.rs (5 hunks)
lib/segment/src/segment/segment_ops.rs (3 hunks)
lib/segment/src/segment_constructor/batched_reader.rs (2 hunks)
lib/segment/src/segment_constructor/segment_constructor_base.rs (6 hunks)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3 hunks)
lib/segment/src/vector_storage/chunked_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (11 hunks)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (4 hunks)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (3 hunks)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (2 hunks)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2 hunks)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (4 hunks)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/tests/async_raw_scorer.rs (2 hunks)
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/vector_storage_base.rs (7 hunks)
lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (38)

lib/gridstore/benches/real_data_bench.rs
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs
lib/segment/src/vector_storage/tests/async_raw_scorer.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
lib/segment/src/index/hnsw_index/graph_layers_healer.rs
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
lib/gridstore/benches/bustle_bench/payload_storage.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/index/hnsw_index/point_scorer.rs
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
lib/segment/src/segment/segment_ops.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
lib/gridstore/benches/random_data_bench.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/benches/sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

lib/segment/tests/integration/sparse_vector_index_search_tests.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

🧠 Learnings (14)

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/segment/tests/integration/sparse_vector_index_search_tests.rs

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs

📚 Learning: 2025-08-10T18:30:02.986Z

Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs

📚 Learning: 2025-08-21T13:46:10.866Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-05-15T22:54:30.292Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs

📚 Learning: 2025-05-15T22:54:30.292Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This was part of an API redesign in rand 0.9.0 released in February 2024.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs

📚 Learning: 2025-05-15T22:54:30.292Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs

📚 Learning: 2025-08-21T13:43:52.191Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.

Applied to files:

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs

📚 Learning: 2025-08-18T10:56:43.707Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs

📚 Learning: 2025-08-14T11:31:21.777Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs

🧬 Code graph analysis (10)

lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/segment/segment_ops.rs (2)

vector_storage (381-381)

vector_storage (609-609)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (6)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)

get_sparse (150-159)

get_sparse_opt (161-180)

get_sparse_opt (210-210)

get_sparse_opt (260-260)

get_vector (200-204)

get_vector_opt (202-202)

get_vector_opt (209-214)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)

get_sparse (104-109)

get_sparse_opt (105-106)

get_sparse_opt (111-118)

get_sparse_opt (148-148)

get_sparse_opt (197-197)

get_vector (138-141)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_sparse (172-172)

get_sparse_opt (173-176)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/payload_storage/mmap_payload_storage.rs (2)

storage (168-168)

storage (179-179)

lib/segment/src/vector_storage/sparse/stored_sparse_vectors.rs (1)

try_from (50-63)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (3)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (535-535)

vector_storage (621-621)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (6)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

vector_storage (598-599)

vector_storage (630-631)

vector_storage (657-658)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (535-535)

vector_storage (621-621)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (8)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (535-535)

vector_storage (621-621)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/chunked_vector_storage.rs (2)

get (20-22)

get_many (52-52)

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

get (44-46)

get_many (95-97)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (3)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)

get_sparse (180-183)

get_sparse_opt (181-181)

get_sparse_opt (185-196)

get_sparse_opt (225-225)

get_vector (216-219)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (5)

get_sparse (150-159)

get_sparse_opt (161-180)

get_sparse_opt (210-210)

get_sparse_opt (260-260)

get_vector (200-204)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_sparse (172-172)

get_sparse_opt (173-176)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (4)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (7)

get_multi (110-112)

get_multi_opt (111-111)

get_multi_opt (115-132)

get_multi_opt (192-192)

get_vector (187-189)

get_vector_opt (188-188)

get_vector_opt (191-197)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (4)

get_multi (184-186)

get_multi_opt (185-185)

get_multi_opt (189-204)

get_multi_opt (250-250)

lib/segment/src/vector_storage/vector_storage_base.rs (7)

get_multi (181-181)

get_multi (193-193)

get_multi_opt (182-185)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (7)

get_multi (338-340)

get_multi_opt (339-339)

get_multi_opt (343-358)

get_multi_opt (404-404)

get_vector (399-401)

get_vector_opt (400-400)

get_vector_opt (403-409)

lib/segment/src/vector_storage/vector_storage_base.rs (7)

get_multi (181-181)

get_multi (193-193)

get_multi_opt (182-185)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)

get_vector (128-133)

get_vector_opt (135-139)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (6)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (3)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

get_dense (88-92)

get_vector (128-133)

get_vector_opt (135-139)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)

get_dense (236-238)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

get_dense (83-85)

get_vector (105-107)

get_vector_opt (106-106)

lib/segment/src/vector_storage/vector_storage_base.rs (6)

get_dense (152-152)

get_dense (162-162)

get_vector (87-87)

get_vector (830-874)

get_vector_opt (90-90)

get_vector_opt (876-920)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/vector_storage_base.rs (12)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (7)

get_vector (128-133)

get_vector_opt (135-139)

get_dense (88-92)

vectors (89-90)

vectors (129-130)

vectors (136-137)

vector_dim (84-86)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (4)

get_vector (182-189)

get_vector_opt (191-197)

get_dense (147-153)

vector_dim (143-145)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (5)

get_vector (258-260)

get_vector_opt (259-259)

get_vector_opt (263-268)

get_dense (236-238)

vector_dim (232-234)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (5)

get_vector (105-107)

get_vector_opt (106-106)

get_vector_opt (110-115)

get_dense (83-85)

vector_dim (79-81)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (10)

get_vector (187-189)

get_vector_opt (188-188)

get_vector_opt (191-197)

vectors (123-123)

vectors (143-144)

vector_dim (105-107)

get_multi (110-112)

get_multi_opt (111-111)

get_multi_opt (115-132)

get_multi_opt (192-192)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (8)

get_vector (399-401)

get_vector_opt (400-400)

get_vector_opt (403-409)

vector_dim (333-335)

get_multi (338-340)

get_multi_opt (339-339)

get_multi_opt (343-358)

get_multi_opt (404-404)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (6)

get_vector (245-247)

vector_dim (179-181)

get_multi (184-186)

get_multi_opt (185-185)

get_multi_opt (189-204)

get_multi_opt (250-250)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (7)

get_vector (216-219)

get_vector_opt (217-217)

get_vector_opt (224-229)

get_sparse (180-183)

get_sparse_opt (181-181)

get_sparse_opt (185-196)

get_sparse_opt (225-225)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)

get_vector (200-204)

get_vector_opt (202-202)

get_vector_opt (209-214)

get_sparse (150-159)

get_sparse_opt (161-180)

get_sparse_opt (210-210)

get_sparse_opt (260-260)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (6)

get_vector (138-141)

get_sparse (104-109)

get_sparse_opt (105-106)

get_sparse_opt (111-118)

get_sparse_opt (148-148)

get_sparse_opt (197-197)

lib/common/common/src/maybe_uninit.rs (1)

maybe_uninit_fill_from (10-33)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: lint
GitHub Check: integration-tests
GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consistency
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: integration-tests-consensus
GitHub Check: storage-compat-test
GitHub Check: test-consensus-compose
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (macos-latest)

🔇 Additional comments (35)

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3)

21-21: AccessPattern integration looks correct.

Importing AccessPattern here is appropriate for the generic dispatch below.

350-352: Generic get(): good use of compile-time pattern.

Routing via P::IS_SEQUENTIAL cleanly selects the access path without duplicating methods.

400-402: Generic get_many(): LGTM; keeps heuristic fallback.

For random reads you still upgrade to sequential for large blocks, which preserves the earlier optimization.

lib/segment/src/vector_storage/vector_storage_base.rs (4)

87-91: VectorStorage generics: interface consolidation looks good.

The unified generic methods eliminate duplication without altering semantics.

172-177: Sparse access genericization: LGTM.

Traits now expose both patterns consistently.

830-874: Enum dispatch for get_vector: clean and exhaustive.

No catch-all arms; follows guidelines.

876-919: Enum dispatch for get_vector_opt: LGTM.

Consistent with the get_vector change.

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

23-24: Importing AccessPattern is correct here.

Required for the new generic methods below.

147-153: get_dense genericization: LGTM.

Delegates to MmapDenseVectors::get_vector_opt:: with the same error semantics.

366-372: Tests updated to use typed access: OK.

Using <Random> against volatile storage is fine; access pattern is ignored in-memory.

Also applies to: 377-379, 394-400, 456-458, 574-580, 644-647, 718-723

lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (1)

8-8: Sparse scorer switch to typed access: LGTM.

Volatile sparse storage doesn’t optimize by pattern; Random is appropriate and keeps intent clear.

Also applies to: 57-58, 75-77, 84-89

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (2)

18-18: Importing Random is correct for explicit access pattern

Brings the file in line with the generic AccessPattern API.

120-120: Confirm Random is the intended pattern here

Per-ID scoring typically does random access; if upstream ever guarantees sequential walks, consider Sequential for potential I/O wins on mmapped backends.

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)

22-22: AccessPattern imported—API alignment looks good

Matches the trait signatures introduced in the refactor.

216-219: Generic get_vector maps correctly to get_vector_opt

Falls back to default sparse when absent; consistent with existing trait semantics.

224-229: Generic get_vector_opt delegates through get_sparse_opt::

Error-to-None mapping preserved as before; OK.

347-348: Test import updated for new API

Random brings the generic call into scope; fine.

415-416: Test read updated to get_vector::<Random>

Correct usage under the new API.

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (5)

17-19: AccessPattern/Random imports added—consistent with new trait signatures

No behavior change for in-memory storage, just API alignment.

104-109: Generic get_sparse delegates to get_sparse_opt::

Semantics preserved; clear error on missing vector.

111-118: In-memory get_sparse_opt:: ignores pattern as expected

Straightforward and correct for RAM-backed storage.

138-141: Generic get_vector:: mirrors previous behavior

Returns default sparse when absent; OK.

146-152: Generic get_vector_opt:: correctly wraps sparse path

Uses CowVector::from per code style; good.

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (7)

26-26: AccessPattern added to imports—API alignment

Matches trait changes; no functional impact.

337-340: Generic get_multi:: is a thin wrapper—OK

Panic-on-missing kept consistent with previous non-generic API.

343-358: In-memory get_multi_opt:: correctly ignores pattern

Clean delegation to ChunkedVectors; panic message is informative.

399-401: Generic get_vector:: routed through get_vector_opt::

Matches trait expectations; OK.

403-409: get_vector_opt:: wraps multi-dense into CowVector correctly

Uses explicit from conversion as per style guide.

483-483: Test imports Sequential appropriately

Reflects the intended sequential read in migration test.

572-574: Test updated to get_vector::<Sequential>—good match for contiguous reads

Keeps test intent and aligns with the new API.

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (5)

22-24: LGTM: correct imports for the new generic access pattern

Brings in AccessPattern, Random, and Sequential as needed by this file.

109-112: LGTM: unified generic get_multi

Signature and delegation to get_multi_opt::
match the new trait API.

187-189: LGTM: get_vector now properly generic

Matches the VectorStorage trait and preserves “panics if not found” semantics.

191-197: LGTM: get_vector_opt delegates via multivector conversion

Uses CowVector::MultiDense with correct float conversion path.

220-221: LGTM: read existing offset with Random

Random pattern is appropriate for single-key metadata reads in insert path.

xzfc requested review from generall and timvisee August 28, 2025 13:08

xzfc force-pushed the access-pattern branch from d5fcabb to 07570b8 Compare August 28, 2025 13:11

xzfc force-pushed the access-pattern branch 3 times, most recently from 36fdae4 to 12a7ee8 Compare August 28, 2025 13:28

github-actions bot mentioned this pull request Aug 28, 2025

Flaky test save_on_disk::tests::test_wait_for_multiple_notifications #7156

Open

xzfc force-pushed the access-pattern branch from 12a7ee8 to 9f7ccb5 Compare August 28, 2025 13:37

coderabbitai bot reviewed Aug 28, 2025

View reviewed changes

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs Show resolved Hide resolved

lib/segment/src/index/hnsw_index/gpu/mod.rs Show resolved Hide resolved

coszio self-requested a review August 28, 2025 15:20

xzfc mentioned this pull request Aug 28, 2025

CompressedWithVectors: include base/full vectors #7171

Merged

xzfc force-pushed the access-pattern branch from 9f7ccb5 to a7a23cd Compare August 29, 2025 09:55

coderabbitai bot reviewed Aug 29, 2025

View reviewed changes

xzfc force-pushed the access-pattern branch from a7a23cd to c6fa936 Compare August 29, 2025 10:09

coderabbitai bot reviewed Aug 29, 2025

View reviewed changes

xzfc force-pushed the access-pattern branch from c6fa936 to db557ce Compare August 29, 2025 12:55

coderabbitai bot reviewed Aug 29, 2025

View reviewed changes

github-actions bot mentioned this pull request Aug 29, 2025

Flaky test save_on_disk::tests::test_wait_for_concurrent_readers #7181

Open

generall approved these changes Aug 29, 2025

View reviewed changes

Generic AccessPattern

175be8a

xzfc force-pushed the access-pattern branch from db557ce to 175be8a Compare August 30, 2025 11:55

coderabbitai bot reviewed Aug 30, 2025

View reviewed changes

xzfc merged commit febf72a into dev Sep 1, 2025
16 checks passed

xzfc deleted the access-pattern branch September 1, 2025 11:16

timvisee pushed a commit that referenced this pull request Sep 29, 2025

Generic AccessPattern (#7166)

4473d36

coderabbitai bot mentioned this pull request Jan 23, 2026

Remove ChunkedVectorStorage trait #7977

Merged

coderabbitai bot mentioned this pull request Feb 2, 2026

Callback-based DenseVectorStorage::for_each_in_dense_batch #8033

Merged

coderabbitai bot mentioned this pull request Mar 19, 2026

Touch-up UIO traits #8457

Merged

Conversation

xzfc commented Aug 28, 2025

Uh oh!

coderabbitai bot commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

xzfc commented Aug 28, 2025

Uh oh!

coderabbitai bot commented Aug 28, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

generall commented Aug 28, 2025

Uh oh!

generall commented Aug 28, 2025

Uh oh!

xzfc commented Aug 28, 2025

Footnotes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

generall left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Aug 28, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)