CompressedWithVectors: include base/full vectors by xzfc · Pull Request #7171 · qdrant/qdrant

xzfc · 2025-08-28T18:28:42Z

~~Depends on #7166. (ignore the first commit during review)~~
PR #6982 introduced new on-disk format that includes HNSW graph links and (quantized) link vectors. This PR extends this format to include also base vectors (aka full vectors).

Updated description of the format:

Compressed links with vectors.

Similar to [CompressionInfo::Compressed], but includes vectors.

Each node on level 0 has a fixed-size "base" vector.

Each link is accompanied by a fixed-size "link" vector.
[BBBB#ccccccc_LLLLLL_][BBBB#ccccccc_LLLLLL_][BBBB#ccccccc_LLLLLL_]
[neighbors for node 0][neighbors for node 1][neighbors for node 2]...
Where:

B is a base vector (i.e. Vec<u8> of fixed size).
Only present on level 0, omitted on higher levels.

# is a varint-encoded length.
This value == number of links == number of link vectors.

c are compressed links (i.e. a compressed form of Vec<u32>).

_ is a padding to make link vectors aligned.

L are encoded link vectors, one per link (i.e. Vec<Vec<u8>>).

_ is a padding to make the next base vector aligned.
Only present on level 0, omitted on higher levels.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)

243-245: Avoid transmute_to_u8_slice; use zerocopy AsBytes/IntoBytes.

Per guidelines, replace unsafe transmute helper with safe as_bytes().

-            let raw_bites = mmap_ops::transmute_to_u8_slice(vector.as_ref());
-            vectors_file.write_all(raw_bites)?;
+            let raw_bytes = vector.as_ref().as_bytes();
+            vectors_file.write_all(raw_bytes)?;

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)

110-133: Avoid panics on empty offset slice in get_multi_opt; return None instead.

first().expect(...) will panic if the offset record is unexpectedly empty. Prefer propagating None to the caller for robustness (corrupt/migrating storages).

-        self.offsets
-            .get::<P>(key as VectorOffsetType)
-            .and_then(|mmap_offset| {
-                let mmap_offset = mmap_offset.first().expect("mmap_offset must not be empty");
-                self.vectors.get_many::<P>(
-                    mmap_offset.offset as VectorOffsetType,
-                    mmap_offset.count as usize,
-                )
-            })
+        self.offsets
+            .get::<P>(key as VectorOffsetType)
+            .and_then(|offsets| offsets.first().copied())
+            .and_then(|mmap_offset| {
+                self.vectors.get_many::<P>(
+                    mmap_offset.offset as VectorOffsetType,
+                    mmap_offset.count as usize,
+                )
+            })

lib/segment/src/vector_storage/vector_storage_base.rs (2)

159-171: Preserve sequential IO advantage in get_dense_batch.

Docs say keys are consecutive; use Sequential to avoid regressions on mmap-backed stores.
-        let iter = keys.iter().map(|key| self.get_dense::<Random>(*key));
+        let iter = keys.iter().map(|key| self.get_dense::<Sequential>(*key));
188-202: Same for get_batch_multi: prefer Sequential for consecutive keys.

Leverage contiguous access for better prefetch/read-ahead.
-        let iter = keys.iter().map(|key| self.get_multi::<Random>(*key));
+        let iter = keys.iter().map(|key| self.get_multi::<Sequential>(*key));

♻️ Duplicate comments (1)

lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
51-53: Add Self: Sized to generic get_many to preserve object safety

Without the bound, ChunkedVectorStorage becomes non–object-safe when used via dyn. This mirrors the fix applied to get.
-    fn get_many<P: AccessPattern>(&self, key: VectorOffsetType, count: usize) -> Option<&[T]>;
+    fn get_many<P: AccessPattern>(&self, key: VectorOffsetType, count: usize) -> Option<&[T]>
+    where
+        Self: Sized;

🧹 Nitpick comments (12)

lib/segment/src/payload_storage/mmap_payload_storage.rs (1)

109-125: Optional: avoid “bool-as-flag” footguns

If you find the turbofish booleans cryptic, consider named consts (READ_RANDOM/READ_SEQUENTIAL) or small wrappers to improve readability and prevent accidental misuse across call sites.

Example constants to add near other module-level consts:
const READ_RANDOM: bool = false;
const READ_SEQUENTIAL: bool = true;
Then:
- match guard.get_value::<false>(point_id, hw_counter) {
+ match guard.get_value::<READ_RANDOM>(point_id, hw_counter) {
and
- match self.storage.read().get_value::<true>(point_id, hw_counter) {
+ match self.storage.read().get_value::<READ_SEQUENTIAL>(point_id, hw_counter) {
Also applies to: 137-160, 168-171, 179-182, 192-207

lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (1)

61-65: Avoid panicking on missing vectors in production scorer

Replace expect with an early return to prevent crashes if storage returns None.

-        let stored = self
-            .vector_storage
-            .get_sparse::<Random>(idx)
-            .expect("Failed to get sparse vector");
+        let Some(stored) = self
+            .vector_storage
+            .get_sparse::<Random>(idx)
+        else {
+            return 0.0;
+        };

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)

70-72: Optional: fetch multivectors directly to avoid wrappers

Since this test targets multi-dense storage, consider get_multi:: to skip CowVector wrapping/conversion. Not required, but reduces indirection.

Also applies to: 226-229

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (1)

190-205: Fix panic message grammar.

Tighten the panic text.

-                .get_many(metadata.start, metadata.inner_vectors_count)
-                .unwrap_or_else(|| panic!("Vectors does not contain data for {metadata:?}"));
+                .get_many(metadata.start, metadata.inner_vectors_count)
+                .unwrap_or_else(|| panic!("Vector storage does not contain data for {metadata:?}"));

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)

151-153: Fix misleading comment (RocksDB-backed, not in-memory).

Clarify that P is ignored here.

-        // Already in memory, so no sequential optimizations available.
+        // RocksDB-backed; AccessPattern is ignored (no sequential optimization here).

162-166: Same comment tweak for get_sparse_opt.

Keep docs consistent.

-        // Already in memory, so no sequential optimizations available.
+        // RocksDB-backed; AccessPattern is ignored (no sequential optimization here).

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)

184-191: Prefer explicit Cow::from(...) over .into().

Aligns with repo guideline.

-            .get_vector_opt::<P>(key)
-            .map(|vector| T::slice_to_float_cow(vector.into()).into())
+            .get_vector_opt::<P>(key)
+            .map(|vector| T::slice_to_float_cow(Cow::from(vector)).into())

-            .get_vector_opt::<P>(key)
-            .map(|vector| T::slice_to_float_cow(vector.into()).into())
+            .get_vector_opt::<P>(key)
+            .map(|vector| T::slice_to_float_cow(Cow::from(vector)).into())

Also applies to: 193-199

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)

265-270: Prefer Cow::from over .into in float-cow conversion.

Style consistency with guidelines.

-            .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into())))
+            .map(|slice| CowVector::from(T::slice_to_float_cow(Cow::from(slice))))

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (1)

139-149: Replace unwraps with expect carrying context in iterate_inner_vectors.

These unwraps can panic without a helpful key. Add context to speed up debugging if invariants are violated.

-                .get::<Sequential>(key as VectorOffsetType)
-                .unwrap()
-                .first()
-                .unwrap();
+                .get::<Sequential>(key as VectorOffsetType)
+                .expect("offset not found for key in iterate_inner_vectors")
+                .first()
+                .expect("empty offset slice in iterate_inner_vectors");

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (1)

91-96: Use u64 math to avoid truncating large positions

stream_position() returns u64. Casting to usize may truncate on non-64-bit targets. Minor, but easy to make robust.

-    if let Some(vector_layout) = vector_layout.as_ref() {
-        let pos = writer.stream_position()? as usize;
-        let alignment = vector_layout.base.align().max(vector_layout.link.align());
-        writer.write_zeros(pos.next_multiple_of(alignment) - pos)?;
-    }
+    if let Some(vector_layout) = vector_layout.as_ref() {
+        let pos = writer.stream_position()?;
+        let alignment = vector_layout.base.align().max(vector_layout.link.align()) as u64;
+        let padding = ((pos + alignment - 1) / alignment) * alignment - pos;
+        writer.write_zeros(padding as usize)?;
+    }

lib/segment/src/index/hnsw_index/graph_links.rs (2)

72-84: Clarify get_link_vector parameter semantics

The param is a neighbor id, not the source point id. Rename for clarity and reflect that in docs.

 pub trait GraphLinksVectors {
@@
-    /// Link vectors will be included for each link per point.
-    /// The layout of each vector must correspond to [`VectorLayout::link`].
-    fn get_link_vector(&self, point_id: PointOffsetType) -> OperationResult<&[u8]>;
+    /// Link vectors will be included for each neighbor link.
+    /// The `link_id` is the neighbor's point id.
+    /// The layout of each vector must correspond to [`VectorLayout::link`].
+    fn get_link_vector(&self, link_id: PointOffsetType) -> OperationResult<&[u8]>;

And update impls accordingly.

332-372: Add negative tests for serializer validation paths

You added size/align checks; include tests that:

fail when base/link size is not a multiple of alignment,
fail when alignment > 255.

@@
 mod tests {
@@
     #[rstest]
     #[case::plain(GraphLinksFormat::Plain, 8, 8)]
@@
     fn test_save_load(
@@
     }
+
+    #[test]
+    fn test_reject_large_alignment() {
+        let hnsw_m = HnswM::new2(8);
+        let vectors = TestGraphLinksVectors::new(10, 256, 8);
+        let links = vec![vec![vec![]]; 10];
+        let format = GraphLinksFormat::CompressedWithVectors;
+        let format_param = format.with_param_for_tests(Some(&vectors));
+        let mut buf = std::io::Cursor::new(Vec::<u8>::new());
+        let err = serializer::serialize_graph_links(links, format_param, hnsw_m, &mut buf)
+            .expect_err("should fail on align > 255");
+        assert!(format!("{err:?}").contains("alignment"));
+    }
+
+    #[test]
+    fn test_reject_size_not_multiple_of_align() {
+        // Build a vectors provider with mismatched size vs align (manually craft layouts).
+        struct BadVectors;
+        impl GraphLinksVectors for BadVectors {
+            fn get_base_vector(&self, _p: PointOffsetType) -> OperationResult<&[u8]> {
+                Ok(&[0u8; 7]) // size 7
+            }
+            fn get_link_vector(&self, _p: PointOffsetType) -> OperationResult<&[u8]> {
+                Ok(&[0u8; 5]) // size 5
+            }
+            fn vector_layout(&self) -> OperationResult<VectorLayout> {
+                Ok(VectorLayout {
+                    base: Layout::from_size_align(7, 8).unwrap(),
+                    link: Layout::from_size_align(5, 8).unwrap(),
+                })
+            }
+        }
+        let hnsw_m = HnswM::new2(8);
+        let links = vec![vec![vec![]]; 1];
+        let mut buf = std::io::Cursor::new(Vec::<u8>::new());
+        let err = serializer::serialize_graph_links(
+            links,
+            GraphLinksFormatParam::CompressedWithVectors(&BadVectors),
+            hnsw_m,
+            &mut buf,
+        )
+        .expect_err("should fail on size%align != 0");
+        assert!(format!("{err:?}").contains("multiple of its alignment"));
+    }

Also applies to: 438-468

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1f203c5 and 3f7b1af.

📒 Files selected for processing (59)

lib/collection/src/collection/mmr/lazy_matrix.rs (2 hunks)
lib/gridstore/benches/bustle_bench/payload_storage.rs (1 hunks)
lib/gridstore/benches/random_data_bench.rs (1 hunks)
lib/gridstore/benches/real_data_bench.rs (1 hunks)
lib/gridstore/src/gridstore.rs (16 hunks)
lib/segment/benches/sparse_vector_storage.rs (3 hunks)
lib/segment/src/fixtures/index_fixtures.rs (2 hunks)
lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (2 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (7 hunks)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
lib/segment/src/index/hnsw_index/gpu/mod.rs (2 hunks)
lib/segment/src/index/hnsw_index/graph_layers.rs (3 hunks)
lib/segment/src/index/hnsw_index/graph_layers_builder.rs (3 hunks)
lib/segment/src/index/hnsw_index/graph_layers_healer.rs (2 hunks)
lib/segment/src/index/hnsw_index/graph_links.rs (8 hunks)
lib/segment/src/index/hnsw_index/graph_links/header.rs (1 hunks)
lib/segment/src/index/hnsw_index/graph_links/serializer.rs (7 hunks)
lib/segment/src/index/hnsw_index/graph_links/view.rs (9 hunks)
lib/segment/src/index/hnsw_index/hnsw.rs (2 hunks)
lib/segment/src/index/hnsw_index/point_scorer.rs (2 hunks)
lib/segment/src/index/hnsw_index/tests/mod.rs (1 hunks)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (1 hunks)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
lib/segment/src/payload_storage/mmap_payload_storage.rs (5 hunks)
lib/segment/src/segment/segment_ops.rs (3 hunks)
lib/segment/src/segment_constructor/batched_reader.rs (2 hunks)
lib/segment/src/segment_constructor/segment_constructor_base.rs (6 hunks)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (3 hunks)
lib/segment/src/vector_storage/chunked_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (13 hunks)
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (4 hunks)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (3 hunks)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (6 hunks)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (6 hunks)
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (2 hunks)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2 hunks)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (2 hunks)
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (4 hunks)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (6 hunks)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (8 hunks)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (5 hunks)
lib/segment/src/vector_storage/tests/async_raw_scorer.rs (2 hunks)
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (7 hunks)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (4 hunks)
lib/segment/src/vector_storage/vector_storage_base.rs (8 hunks)
lib/segment/tests/integration/sparse_vector_index_search_tests.rs (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (43)

lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs
lib/gridstore/benches/random_data_bench.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
lib/segment/src/segment_constructor/batched_reader.rs
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
lib/segment/src/vector_storage/quantized/quantized_vectors.rs
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
lib/collection/src/collection/mmr/lazy_matrix.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_layers_healer.rs
lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs
lib/segment/src/index/hnsw_index/tests/mod.rs
lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs
lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs
lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
lib/gridstore/benches/real_data_bench.rs
lib/segment/src/vector_storage/in_ram_persisted_vectors.rs
lib/segment/tests/integration/sparse_vector_index_search_tests.rs
lib/segment/src/index/hnsw_index/point_scorer.rs
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
lib/segment/src/index/hnsw_index/gpu/mod.rs
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs
lib/gridstore/benches/bustle_bench/payload_storage.rs
lib/segment/benches/sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/vector_storage/tests/async_raw_scorer.rs
lib/segment/src/index/sparse_index/sparse_vector_index.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs
lib/gridstore/src/gridstore.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs
lib/segment/src/segment_constructor/segment_constructor_base.rs
lib/segment/src/segment/segment_ops.rs
lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/src/index/hnsw_index/graph_layers.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs

**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

🧠 Learnings (17)

📓 Common learnings

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/src/vector_storage/chunked_vector_storage.rs
lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs

📚 Learning: 2025-05-19T14:40:20.068Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.

Applied to files:

lib/segment/src/vector_storage/chunked_vector_storage.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/src/payload_storage/mmap_payload_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/chunked_mmap_vectors.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs

📚 Learning: 2025-08-21T13:46:10.866Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-08-22T13:49:39.261Z

Learnt from: xzfc
PR: qdrant/qdrant#7118
File: lib/segment/src/index/hnsw_index/graph_links/serializer.rs:50-54
Timestamp: 2025-08-22T13:49:39.261Z
Learning: In Qdrant's graph serialization code (lib/segment/src/index/hnsw_index/graph_links/serializer.rs), the team prefers to keep the code simple rather than adding defensive input validation for edge cases that don't occur in practice, relying on unit tests to catch any unexpected issues early.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-08-06T09:56:59.311Z

Learnt from: xzfc
PR: qdrant/qdrant#6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

📚 Learning: 2025-08-21T13:43:52.191Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.

Applied to files:

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

📚 Learning: 2025-08-11T12:30:47.220Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7009
File: lib/quantization/src/encoded_vectors_binary.rs:614-618
Timestamp: 2025-08-11T12:30:47.220Z
Learning: In `lib/quantization/src/encoded_vectors_binary.rs`, the `encode_two_bits_value` function has two paths: when `vector_stats` is available, it uses mean/stddev for accurate z-score based quantization boundaries; when stats are None (for appendable BQ), it falls back to a simple sign-based heuristic (positive -> (true, true), non-positive -> (false, false)). Having stats available provides higher accuracy for storage encoding.

Applied to files:

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs

📚 Learning: 2025-06-02T18:10:47.203Z

Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.

Applied to files:

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs

📚 Learning: 2025-08-14T11:31:21.777Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs

📚 Learning: 2025-05-30T15:26:14.488Z

Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-08-18T10:56:43.707Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs

🧬 Code graph analysis (15)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (3)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (535-535)

vector_storage (621-621)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/chunked_vector_storage.rs (4)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (4)

get (259-261)

get (350-352)

get_many (265-288)

get_many (400-402)

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

get (44-46)

get_many (95-97)

lib/segment/src/vector_storage/chunked_vectors.rs (2)

get (44-46)

get_many (61-72)

lib/segment/src/vector_storage/query_scorer/sparse_custom_query_scorer.rs (8)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/gpu_multivectors.rs (1)

vector_storage (91-92)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (535-535)

vector_storage (621-621)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/query_scorer/sparse_metric_query_scorer.rs (3)

vector_storage (55-57)

vector_storage (73-75)

vector_storage (82-84)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (6)

Layout (279-279)

get_vector (260-262)

get_vector_opt (261-261)

get_vector_opt (265-270)

get_vector_bytes_opt (272-276)

get_vector_layout (278-280)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (9)

get_multi (111-113)

get_multi_opt (112-112)

get_multi_opt (116-133)

get_multi_opt (193-193)

get_vector (188-190)

get_vector_opt (189-189)

get_vector_opt (192-198)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (7)

get_multi (185-187)

get_multi_opt (186-186)

get_multi_opt (190-205)

get_multi_opt (251-251)

get_vector (246-248)

get_vector_bytes_opt (258-260)

get_vector_layout (262-267)

lib/segment/src/vector_storage/vector_storage_base.rs (11)

get_multi (188-188)

get_multi (200-200)

get_multi_opt (189-192)

get_vector (88-88)

get_vector (837-881)

get_vector_opt (91-91)

get_vector_opt (883-927)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (6)

Layout (150-150)

get_dense (90-94)

get_vector (130-135)

get_vector_opt (137-141)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/vector_storage_base.rs (10)

get_dense (159-159)

get_dense (169-169)

get_vector (88-88)

get_vector (837-881)

get_vector_opt (91-91)

get_vector_opt (883-927)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (6)

Layout (210-210)

get_dense (149-155)

get_vector (184-191)

get_vector_opt (193-199)

get_vector_bytes_opt (201-207)

get_vector_layout (209-211)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (7)

Layout (126-126)

get_dense (85-87)

get_vector (107-109)

get_vector_opt (108-108)

get_vector_opt (112-117)

get_vector_bytes_opt (119-123)

get_vector_layout (125-127)

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (6)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)

storage (391-391)

get_vector_opt (193-199)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (3)

storage (626-626)

get_vector_opt (189-189)

get_vector_opt (192-198)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (2)

get_vector_opt (261-261)

get_vector_opt (265-270)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (2)

get_vector_opt (401-401)

get_vector_opt (404-410)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (2)

get_vector_opt (247-247)

get_vector_opt (250-256)

lib/segment/src/vector_storage/vector_storage_base.rs (2)

get_vector_opt (91-91)

get_vector_opt (883-927)

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (2)

lib/quantization/src/encoded_vectors_binary.rs (15)

std (264-264)

std (266-266)

std (268-268)

std (270-270)

vector (217-217)

vector (225-225)

vector (238-238)

vector (246-246)

vector (324-324)

vector (332-332)

vector (345-345)

vector (353-353)

vector (366-366)

vector (374-374)

layout (818-824)

lib/segment/src/index/hnsw_index/graph_links.rs (4)

vector_layout (83-83)

vector_layout (112-117)

vector_layout (366-371)

new (340-354)

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (7)

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6)

vector_storage (529-529)

vector_storage (557-557)

vector_storage (579-579)

vector_storage (598-599)

vector_storage (630-631)

vector_storage (657-658)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (535-535)

vector_storage (621-621)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (1)

vector_storage (97-97)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (5)

Layout (150-150)

get_vector (130-135)

get_vector_opt (137-141)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (6)

Layout (279-279)

get_vector (260-262)

get_vector_opt (261-261)

get_vector_opt (265-270)

get_vector_bytes_opt (272-276)

get_vector_layout (278-280)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (5)

get_vector (188-190)

get_vector_opt (189-189)

get_vector_opt (192-198)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (5)

get_vector (400-402)

get_vector_opt (401-401)

get_vector_opt (404-410)

get_vector_bytes_opt (412-414)

get_vector_layout (416-421)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (8)

get_sparse (105-110)

get_sparse_opt (106-107)

get_sparse_opt (112-119)

get_sparse_opt (149-149)

get_sparse_opt (209-209)

get_vector (139-142)

get_vector_bytes_opt (155-157)

get_vector_layout (159-164)

lib/segment/src/vector_storage/vector_storage_base.rs (10)

get_sparse (179-179)

get_sparse_opt (180-183)

get_vector (88-88)

get_vector (837-881)

get_vector_opt (91-91)

get_vector_opt (883-927)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (9)

get_sparse (181-184)

get_sparse_opt (182-182)

get_sparse_opt (186-197)

get_sparse_opt (226-226)

get_vector (217-220)

get_vector_opt (218-218)

get_vector_opt (225-230)

get_vector_bytes_opt (232-234)

get_vector_layout (236-241)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (6)

Layout (150-150)

get_dense (90-94)

get_vector (130-135)

get_vector_opt (137-141)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/vector_storage_base.rs (10)

get_dense (159-159)

get_dense (169-169)

get_vector (88-88)

get_vector (837-881)

get_vector_opt (91-91)

get_vector_opt (883-927)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (6)

Layout (126-126)

get_dense (85-87)

get_vector (107-109)

get_vector_opt (108-108)

get_vector_bytes_opt (119-123)

get_vector_layout (125-127)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (6)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

get_vector_opt (146-146)

get_vector_opt (150-153)

lib/segment/src/vector_storage/vector_storage_base.rs (7)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (12)

std (405-407)

Layout (150-150)

get_vector (130-135)

get_vector_opt (137-141)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

get_dense (90-94)

vectors (91-92)

vectors (131-132)

vectors (138-139)

vectors (144-145)

vector_dim (86-88)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (10)

std (388-390)

std (428-430)

Layout (279-279)

get_vector (260-262)

get_vector_opt (261-261)

get_vector_opt (265-270)

get_vector_bytes_opt (272-276)

get_vector_layout (278-280)

get_dense (238-240)

vector_dim (234-236)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (14)

std (157-157)

std (606-612)

get_vector (188-190)

get_vector_opt (189-189)

get_vector_opt (192-198)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

vectors (124-124)

vectors (144-145)

vector_dim (106-108)

get_multi (111-113)

get_multi_opt (112-112)

get_multi_opt (116-133)

get_multi_opt (193-193)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7)

Layout (210-210)

get_vector (184-191)

get_vector_opt (193-199)

get_vector_bytes_opt (201-207)

get_vector_layout (209-211)

get_dense (149-155)

vector_dim (145-147)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (9)

get_vector (217-220)

get_vector_opt (218-218)

get_vector_opt (225-230)

get_vector_bytes_opt (232-234)

get_vector_layout (236-241)

get_sparse (181-184)

get_sparse_opt (182-182)

get_sparse_opt (186-197)

get_sparse_opt (226-226)

lib/segment/src/vector_storage/dense/mmap_dense_vectors.rs (4)

get_vector (145-147)

get_vector (163-163)

get_vector (166-166)

get_vector (214-214)

lib/common/common/src/maybe_uninit.rs (1)

maybe_uninit_fill_from (10-33)

lib/segment/src/index/hnsw_index/graph_links.rs (2)

lib/segment/src/index/hnsw_index/graph_links/view.rs (1)

links (230-261)

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (1)

serialize_graph_links (23-237)

lib/segment/src/vector_storage/chunked_mmap_vectors.rs (10)

lib/segment/src/index/hnsw_index/graph_links.rs (1)

vector_storage (99-100)

lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)

vector_storage (68-70)

vector_storage (633-634)

lib/segment/src/segment/segment_ops.rs (2)

vector_storage (381-381)

vector_storage (609-609)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2)

vector_storage (535-535)

vector_storage (621-621)

lib/segment/src/vector_storage/query_scorer/custom_query_scorer.rs (1)

vector_storage (87-87)

lib/segment/src/vector_storage/query_scorer/metric_query_scorer.rs (3)

vector_storage (73-73)

vector_storage (101-101)

vector_storage (102-102)

lib/segment/src/vector_storage/query_scorer/multi_custom_query_scorer.rs (1)

vector_storage (120-120)

lib/segment/src/vector_storage/query_scorer/multi_metric_query_scorer.rs (3)

vector_storage (97-97)

vector_storage (134-134)

vector_storage (135-135)

lib/segment/src/vector_storage/chunked_vector_storage.rs (2)

get (20-22)

get_many (52-52)

lib/segment/src/vector_storage/in_ram_persisted_vectors.rs (2)

get (44-46)

get_many (95-97)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (6)

std (405-407)

Layout (150-150)

get_vector (130-135)

get_vector_opt (137-141)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (12)

std (374-374)

std (525-531)

std (575-581)

get_multi (339-341)

get_multi_opt (340-340)

get_multi_opt (344-359)

get_multi_opt (405-405)

get_vector (400-402)

get_vector_opt (401-401)

get_vector_opt (404-410)

get_vector_bytes_opt (412-414)

get_vector_layout (416-421)

lib/segment/src/vector_storage/vector_storage_base.rs (12)

std (174-174)

get_multi (188-188)

get_multi (200-200)

get_multi_opt (189-192)

get_vector (88-88)

get_vector (837-881)

get_vector_opt (91-91)

get_vector_opt (883-927)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (7)

get_multi (185-187)

get_multi_opt (186-186)

get_multi_opt (190-205)

get_multi_opt (251-251)

get_vector (246-248)

get_vector_bytes_opt (258-260)

get_vector_layout (262-267)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (5)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (6)

std (405-407)

Layout (150-150)

get_vector (130-135)

get_vector_opt (137-141)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (11)

std (157-157)

std (606-612)

get_multi (111-113)

get_multi_opt (112-112)

get_multi_opt (116-133)

get_multi_opt (193-193)

get_vector (188-190)

get_vector_opt (189-189)

get_vector_opt (192-198)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (12)

std (374-374)

std (525-531)

std (575-581)

get_multi (339-341)

get_multi_opt (340-340)

get_multi_opt (344-359)

get_multi_opt (405-405)

get_vector (400-402)

get_vector_opt (401-401)

get_vector_opt (404-410)

get_vector_bytes_opt (412-414)

get_vector_layout (416-421)

lib/segment/src/vector_storage/vector_storage_base.rs (12)

std (174-174)

get_multi (188-188)

get_multi (200-200)

get_multi_opt (189-192)

get_vector (88-88)

get_vector (837-881)

get_vector_opt (91-91)

get_vector_opt (883-927)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (6)

Layout (126-126)

get_vector (107-109)

get_vector_opt (108-108)

get_vector_opt (112-117)

get_vector_bytes_opt (119-123)

get_vector_layout (125-127)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: lint
GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consistency
GitHub Check: test-consensus-compose
GitHub Check: integration-tests-consensus
GitHub Check: integration-tests
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: storage-compat-test

coderabbitai · 2025-08-29T13:07:51Z

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs

+    fn get_vector<P: AccessPattern>(&self, key: PointOffsetType) -> CowVector<'_> {
+        self.get_vector_opt::<P>(key).expect("vector not found")
    }


⚠️ Potential issue

Fix datatype() to respect storage element type T (Float32/Byte/Half).

Current impl returns Float32 for all T, which is wrong for Byte/Half variants and breaks type reporting.

Apply in this impl block:

- fn datatype(&self) -> VectorStorageDatatype { - VectorStorageDatatype::Float32 - } + fn datatype(&self) -> VectorStorageDatatype { + T::datatype() + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fn get_vector<P: AccessPattern>(&self, key: PointOffsetType) -> CowVector<'_> {

self.get_vector_opt::<P>(key).expect("vector not found")

}

fn datatype(&self) -> VectorStorageDatatype {

T::datatype()

}

coderabbitai · 2025-09-01T11:27:18Z

📝 Walkthrough

Walkthrough

This PR refactors HNSW graph-links vector handling to split “base” and “link” vectors, propagating this through traits, storage wrappers, serialization headers, on-disk view, iterators, and tests. It introduces GraphLinksVectorsLayout and StorageGraphLinksVectors, updates GraphLinks::links_with_vectors to return a base-vector slice plus iterator, and replaces prior single-layout fields with packed base/link layouts in headers and serializer/view logic. VectorStorageEnum gains raw-bytes and layout accessors. Test fixtures and builders now supply graph-links vectors via the new wrapper, using base vectors from VectorStorageEnum and link vectors from QuantizedVectors.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Implement GraphLinksFormat::CompressedWithVectors #6982 — Also changes HNSW graph-links vectors, traits, headers, serializer, and tests; overlaps the same subsystem.
Graph links serializer: avoid large intermediate allocations #7118 — Touches graph_links serializer and graph-building paths; related serialization/format logic.
Add volatile dense vector storage, use in tests #6561 — Modifies VectorStorageEnum and dense storage APIs, intersecting with newly added bytes/layout accessors.

Suggested labels

next patch

Suggested reviewers

timvisee
generall
IvanPleshkov

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch hnsw-with-base-vectors

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (1)

37-47: Pre-validate alignment bounds and improve error messages.

Avoid a potential panic in pack_layout when align() > 255, and make size/align errors actionable.

-            if vector_layout.base.size() % vector_layout.base.align() != 0 {
-                return Err(OperationError::service_error(
-                    "Vector size must be a multiple of its alignment",
-                ));
-            }
-            if vector_layout.link.size() % vector_layout.link.align() != 0 {
-                return Err(OperationError::service_error(
-                    "Vector size must be a multiple of its alignment",
-                ));
-            }
+            if vector_layout.base.size() % vector_layout.base.align() != 0 {
+                return Err(OperationError::service_error(format!(
+                    "Base vector: size ({}) must be a multiple of alignment ({})",
+                    vector_layout.base.size(),
+                    vector_layout.base.align()
+                )));
+            }
+            if vector_layout.link.size() % vector_layout.link.align() != 0 {
+                return Err(OperationError::service_error(format!(
+                    "Link vector: size ({}) must be a multiple of alignment ({})",
+                    vector_layout.link.size(),
+                    vector_layout.link.align()
+                )));
+            }
+            // Header encodes alignment in u8; reject larger alignments to avoid panic later.
+            if vector_layout.base.align() > u8::MAX as usize
+                || vector_layout.link.align() > u8::MAX as usize
+            {
+                return Err(OperationError::service_error(format!(
+                    "Vector alignment too large for header (base: {}, link: {}, max: {})",
+                    vector_layout.base.align(),
+                    vector_layout.link.align(),
+                    u8::MAX
+                )));
+            }

lib/segment/src/index/hnsw_index/graph_links/view.rs (1)

175-181: Validate base-vector alignment before using it.

You validate link-vector alignment here, but you use base-vector alignment below (Line 191) to compute pre-neighbors padding before it’s validated. Validate or construct the base Layout first, then use its align().

Apply these diffs together (1/2: pre-validate and use align() when computing pre-neighbors padding):

@@
-        let (reindex, data) = get_slice::<PointOffsetType>(data, header.point_count.get())?;
-        let (_, data) = get_slice::<u8>(data, {
+        let (reindex, data) = get_slice::<PointOffsetType>(data, header.point_count.get())?;
+        // Validate base-vector layout early; we'll reuse it later.
+        let base_vector_layout = Layout::from_size_align(
+            header.base_vector_layout.size.get() as usize,
+            header.base_vector_layout.alignment as usize,
+        )
+        .map_err(|_| {
+            OperationError::service_error("Invalid base vector layout in GraphLinks file")
+        })?;
+        let (_, data) = get_slice::<u8>(data, {
             let pos = total_len - data.len();
-            let alignment = std::cmp::max(
-                header.link_vector_layout.alignment,
-                header.base_vector_layout.alignment,
-            );
-            (pos.next_multiple_of(alignment as usize) - pos) as u64
+            let alignment = std::cmp::max(
+                header.link_vector_layout.alignment as usize,
+                base_vector_layout.align(),
+            );
+            (pos.next_multiple_of(alignment) - pos) as u64
         })?;

And (2/2: reuse the validated layout below):

@@
-                base_vector_layout: Layout::from_size_align(
-                    header.base_vector_layout.size.get() as usize,
-                    header.base_vector_layout.alignment as usize,
-                )
-                .map_err(|_| {
-                    OperationError::service_error("Invalid base vector layout in GraphLinks file")
-                })?,
+                base_vector_layout,

♻️ Duplicate comments (4)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (1)
388-390: Use T::datatype() instead of hardcoded Float32

Mirrors other dense backends and fixes type reporting for Byte/Half variants.

Apply:
-    fn datatype(&self) -> VectorStorageDatatype {
-        VectorStorageDatatype::Float32
-    }
+    fn datatype(&self) -> VectorStorageDatatype {
+        T::datatype()
+    }
lib/segment/src/index/hnsw_index/graph_links/header.rs (1)
45-47: Freeze on-disk ABI: add const size/alignment checks for header and layout.

Lock the layout at compile-time to prevent silent drift across platforms and refactors. Also document the u8 alignment bound.
 use zerocopy::{FromBytes, Immutable, IntoBytes, KnownLayout};
+use core::mem::{align_of, size_of};

@@
 pub(super) struct HeaderCompressedWithVectors {
@@
     pub(super) m0: LittleU64,
     pub(super) base_vector_layout: PackedVectorLayout,
     pub(super) link_vector_layout: PackedVectorLayout,
     pub(super) zero_padding: [u8; 3],
 }
 
+// Compile-time layout sanity checks (on-disk format)
+const _: () = {
+    // Headers are 8-byte aligned and sized to a multiple of 8
+    const { assert!(align_of::<HeaderCompressedWithVectors>() == 8) };
+    const { assert!(size_of::<HeaderCompressedWithVectors>() % 8 == 0) };
+    // PackedVectorLayout must remain 16 bytes, 8-byte aligned
+    const { assert!(align_of::<PackedVectorLayout>() == 8) };
+    const { assert!(size_of::<PackedVectorLayout>() == 16) };
+};
@@
 /// Packed representation of [`std::alloc::Layout`].
 pub(super) struct PackedVectorLayout {
     pub(super) size: LittleU64,
     pub(super) alignment: u8,
 }
Additionally, consider adding a brief doc note that alignment must fit into u8 (≤255).

Also applies to: 53-59
lib/segment/src/index/hnsw_index/graph_links/view.rs (2)
317-320: OK to unwrap varint per perf policy.

unwrap() here matches the established, accepted pattern in this module for performance-critical paths.

336-342: Exclude overflow and ensure exact vector bytes with a bounds assert.

You already slice exact bytes; add a simple bound check to catch corrupt inputs before slicing. Also keeps parity with base-vector and links checks.
                 // 5. Link vectors (`L` in the doc).
                 let link_vector_bytes = (neighbors_count as usize) * link_vector_size.get();
-                let link_vectors = &neighbors[pos..pos + link_vector_bytes];
+                debug_assert!(
+                    end >= pos && end - pos >= link_vector_bytes,
+                    "Corrupt GraphLinks: vectors region too short"
+                );
+                let link_vectors = &neighbors[pos..pos + link_vector_bytes];
                 debug_assert!(link_vectors.as_ptr().addr() % link_vector_alignment as usize == 0);

🧹 Nitpick comments (19)

lib/segment/src/fixtures/index_fixtures.rs (1)
80-85: Minor readability tweak: use map() instead of Some(...)? for Option plumbing.

Both are correct; map() is a bit clearer about intent.

Apply:
-    pub fn graph_links_vectors(&self) -> Option<StorageGraphLinksVectors<'_>> {
-        Some(StorageGraphLinksVectors {
-            vector_storage: &self.storage,
-            quantized_vectors: self.quantized_vectors.as_ref()?,
-        })
-    }
+    pub fn graph_links_vectors(&self) -> Option<StorageGraphLinksVectors<'_>> {
+        self.quantized_vectors.as_ref().map(|qv| StorageGraphLinksVectors {
+            vector_storage: &self.storage,
+            quantized_vectors: qv,
+        })
+    }
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (2)
272-276: Expose raw bytes for dense vectors

as_bytes() on &[T] is correct and zero-copy.

Optional style: On Line 269, prefer explicit conversion per repo guideline:
-            .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into())))
+            .map(|slice| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(slice))))
278-280: Return fixed Layout for dense vectors

Matches other dense backends. Optional safety: assert extremely large dims in debug to catch overflow before unwrap.

You could add:
 fn get_vector_layout(&self) -> OperationResult<Layout> {
+    debug_assert!(self.dim <= isize::MAX as usize / std::mem::size_of::<T>());
     Ok(Layout::array::<T>(self.dim).unwrap())
 }
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)
15-15: Replace transmute_to_u8_slice with IntoBytes

Now that IntoBytes is imported, avoid transmute-based helpers for writing vectors.

Apply:
-            let raw_bites = mmap_ops::transmute_to_u8_slice(vector.as_ref());
-            vectors_file.write_all(raw_bites)?;
+            let raw_bytes = vector.as_bytes();
+            vectors_file.write_all(raw_bytes)?;
Also applies to: 243-245
lib/segment/src/vector_storage/vector_storage_base.rs (2)

93-98: New raw-bytes/layout API on VectorStorage: please document lifetime/endianness semantics.

The bytes are borrowed from internal storage; callers must not persist them after mutations. Also, clarify endianness expectations for f32/f16 to avoid cross-arch ambiguity.

If helpful, add a brief doc comment on both methods noting borrow lifetime and that layout describes a single vector’s in-memory representation.

979-1023: Enum delegation for get_vector_layout propagates backend errors as intended.

LGTM. Minor: several backends use Layout::array(...).unwrap(). If you ever expect large dims, consider mapping the error to OperationError instead of panicking; otherwise staying consistent is fine.
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (2)
125-127: Consider avoiding unwrap on Layout::array (optional).

Consistent with other backends, but you could map overflow to an OperationError to prevent panic on extreme dims.

Here’s a drop-in alternative:
- Ok(Layout::array::<T>(self.dim).unwrap())
+ Layout::array::<T>(self.dim)
+     .map_err(|_| OperationError::service_error("vector layout overflow"))
111-117: Style nit: prefer explicit Cow::from over into().

Guideline: use Cow::from(slice) instead of slice.into().
- .map(|slice| CowVector::from(T::slice_to_float_cow(slice.into())))
+ .map(|slice| CowVector::from(T::slice_to_float_cow(std::borrow::Cow::from(slice))))
lib/segment/src/index/hnsw_index/hnsw.rs (1)

616-624: Provider gating ties vectors-in-links to presence of quantized vectors — intended?

Currently, base/link vectors are embedded only if quantization exists. If the new on-disk format should include base vectors even without quantization, consider constructing StorageGraphLinksVectors unconditionally (with quantized_vectors: None-like handling inside), or confirm this gating is by design.
lib/segment/src/index/hnsw_index/graph_links/serializer.rs (3)
91-95: Use u64 math for stream position to avoid usize truncation on 32-bit targets.

Not critical for typical 64-bit builds, but cheap to harden. Safe since alignment ≤ 255 after the new check.
-    if let Some(vector_layout) = vector_layout.as_ref() {
-        let pos = writer.stream_position()? as usize;
-        let alignment = vector_layout.base.align().max(vector_layout.link.align());
-        writer.write_zeros(pos.next_multiple_of(alignment) - pos)?;
-    }
+    if let Some(vector_layout) = vector_layout.as_ref() {
+        let pos = writer.stream_position()?;
+        let alignment = vector_layout.base.align().max(vector_layout.link.align()) as u64;
+        let next = (pos + alignment - 1) / alignment * alignment;
+        let pad = (next - pos) as usize;
+        writer.write_zeros(pad)?;
+    }
160-166: Next-base alignment is correct but could be tightened.

You align to base.align() after link vectors. Given per-link alignment is handled earlier, this is fine. If you ever introduce per-node trailers, consider aligning to max(base.align(), link.align()) here to keep the next record start uniformly aligned.

239-244: Make pack_layout fallible or keep it infallible via earlier checks.

With the added pre-validation, the expect is effectively unreachable. Either:

Keep as-is (acceptable), or

Make it explicit and fallible to avoid accidental reordering issues later.
-fn pack_layout(layout: &Layout) -> PackedVectorLayout {
+#[inline]
+fn pack_layout(layout: &Layout) -> PackedVectorLayout {
     PackedVectorLayout {
         size: LittleU64::new(layout.size() as u64),
         alignment: u8::try_from(layout.align()).expect("Alignment must fit in u8"),
     }
 }
lib/segment/src/index/hnsw_index/graph_links.rs (2)
72-84: Trait docs: clarify semantics and lifetime of returned slices.

Explicitly state that:

point_id for get_link_vector is the neighbor’s point id.

Returned slices must remain valid until the call returns (serialization-time borrow), not owned.

This avoids ambiguity for external implementors.

438-447: Add a negative test for oversized alignment.

Now that alignment is capped to u8, add a test case where base_align or link_align > 255 to assert we return an OperationError.
 #[case::comp_vec_4_16(GraphLinksFormat::CompressedWithVectors, 4, 16)]
+#[case::comp_vec_align_too_large(GraphLinksFormat::CompressedWithVectors, 512, 8)]
 fn test_save_load(
lib/segment/src/index/hnsw_index/graph_links/view.rs (5)
213-225: Do not recreate Layout after using its fields earlier; reuse the validated one.

See proposed diffs in the earlier comment to construct once and reuse here.

258-259: Minor readability: avoid tuple indexing.

Binding the tuple improves clarity and avoids a magic .1.
-                // Not intended to be used outside of tests.
-                Either::Right(self.links_with_vectors(point_id, level).1)
+                // Not intended to be used outside of tests.
+                let (_base, links, _vecs) = self.links_with_vectors(point_id, level);
+                Either::Right(links)
309-314: Bounds assert before slicing base vector.

Add a debug bound check to detect corrupt files early; keeps consistency with later assertions.
-                if level == 0 {
-                    base_vector = &neighbors[pos..pos + base_vector_layout.size()];
-                    debug_assert!(base_vector.as_ptr().addr() % base_vector_layout.align() == 0);
-                    pos += base_vector_layout.size();
-                }
+                if level == 0 {
+                    debug_assert!(
+                        pos + base_vector_layout.size() <= end,
+                        "Corrupt GraphLinks: base vector out of bounds"
+                    );
+                    base_vector = &neighbors[pos..pos + base_vector_layout.size()];
+                    debug_assert_eq!(
+                        base_vector.as_ptr().addr() % base_vector_layout.align(),
+                        0
+                    );
+                    pos += base_vector_layout.size();
+                }
321-334: Add a bounds assert for compressed-links region.

A small debug assert helps catch malformed data before slicing.
                 let links_size = packed_links_size(
                     &neighbors[pos..end],
                     bits_per_unsorted,
                     hnsw_m.level_m(level),
                     neighbors_count as usize,
                 );
+                debug_assert!(
+                    pos + links_size <= end,
+                    "Corrupt GraphLinks: links region too short"
+                );
                 let links = iterate_packed_links(
                     &neighbors[pos..pos + links_size],
                     bits_per_unsorted,
                     hnsw_m.level_m(level),
                 );
397-399: Typo in user-facing error string.

“Unsufficent” → “Insufficient”.
-fn error_unsufficent_size() -> OperationError {
-    OperationError::service_error("Unsufficent file size for GraphLinks file")
+fn error_unsufficent_size() -> OperationError {
+    OperationError::service_error("Insufficient file size for GraphLinks file")
 }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between febf72a and cc5bccb.

📒 Files selected for processing (21)

lib/segment/src/fixtures/index_fixtures.rs (2 hunks)
lib/segment/src/index/hnsw_index/graph_layers.rs (3 hunks)
lib/segment/src/index/hnsw_index/graph_layers_builder.rs (3 hunks)
lib/segment/src/index/hnsw_index/graph_links.rs (8 hunks)
lib/segment/src/index/hnsw_index/graph_links/header.rs (1 hunks)
lib/segment/src/index/hnsw_index/graph_links/serializer.rs (7 hunks)
lib/segment/src/index/hnsw_index/graph_links/view.rs (9 hunks)
lib/segment/src/index/hnsw_index/hnsw.rs (2 hunks)
lib/segment/src/index/hnsw_index/tests/mod.rs (1 hunks)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (1 hunks)
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3 hunks)
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (2 hunks)
lib/segment/src/vector_storage/vector_storage_base.rs (3 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

lib/segment/src/index/hnsw_index/tests/mod.rs
lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_layers.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/src/vector_storage/vector_storage_base.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

lib/segment/src/index/hnsw_index/tests/mod.rs
lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_layers.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/src/vector_storage/vector_storage_base.rs

**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

lib/segment/src/index/hnsw_index/tests/mod.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs

🧠 Learnings (14)

📓 Common learnings

Learnt from: xzfc
PR: qdrant/qdrant#6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

📚 Learning: 2025-08-22T13:49:39.261Z

Learnt from: xzfc
PR: qdrant/qdrant#7118
File: lib/segment/src/index/hnsw_index/graph_links/serializer.rs:50-54
Timestamp: 2025-08-22T13:49:39.261Z
Learning: In Qdrant's graph serialization code (lib/segment/src/index/hnsw_index/graph_links/serializer.rs), the team prefers to keep the code simple rather than adding defensive input validation for edge cases that don't occur in practice, relying on unit tests to catch any unexpected issues early.

Applied to files:

lib/segment/src/index/hnsw_index/tests/mod.rs
lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-07-11T11:35:21.549Z

Learnt from: generall
PR: qdrant/qdrant#6854
File: lib/segment/src/index/query_estimator.rs:320-327
Timestamp: 2025-07-11T11:35:21.549Z
Learning: In test code for Qdrant's query estimator (lib/segment/src/index/query_estimator.rs), simplified ID resolution logic using `id.to_string().parse().unwrap()` is acceptable for testing purposes and doesn't need to match production code's `id_tracker.internal_id()` approach. Test code can use mock implementations that serve the testing goals.

Applied to files:

lib/segment/src/index/hnsw_index/tests/mod.rs

📚 Learning: 2025-08-06T09:56:59.311Z

Learnt from: xzfc
PR: qdrant/qdrant#6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

lib/segment/src/index/hnsw_index/tests/mod.rs
lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/index/hnsw_index/graph_layers.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-09-01T11:19:26.330Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.330Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

Applied to files:

lib/segment/src/index/hnsw_index/tests/mod.rs
lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs
lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-05-30T15:26:14.488Z

Learnt from: generall
PR: qdrant/qdrant#6479
File: lib/segment/src/index/hnsw_index/graph_layers_builder.rs:137-182
Timestamp: 2025-05-30T15:26:14.488Z
Learning: In the subgraph_connectivity method in graph_layers_builder.rs, the previous_visited_points vector and visited BitVec entry point marking are automatically re-initialized at the start of each retry loop iteration, so manual clearing is not needed.

Applied to files:

lib/segment/src/index/hnsw_index/graph_layers_builder.rs
lib/segment/src/index/hnsw_index/graph_layers.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/hnsw.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-06-02T18:10:47.203Z

Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.

Applied to files:

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs
lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs

📚 Learning: 2025-05-19T14:40:20.068Z

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.

Applied to files:

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs

📚 Learning: 2025-08-21T13:46:10.866Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/vector_storage/vector_storage_base.rs

🧬 Code graph analysis (18)

lib/segment/src/index/hnsw_index/tests/mod.rs (2)

lib/segment/src/index/hnsw_index/gpu/gpu_links.rs (1)

graph_layers_builder (232-234)

lib/segment/src/index/hnsw_index/graph_links.rs (1)

format (239-247)

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (2)

lib/segment/src/index/hnsw_index/gpu/gpu_links.rs (1)

graph_layers_builder (232-234)

lib/segment/src/index/hnsw_index/graph_links.rs (1)

format (239-247)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (5)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)

get_vector_bytes_opt (217-219)

get_vector_layout (221-226)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (2)

get_vector_bytes_opt (155-157)

get_vector_layout (159-164)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

Layout (210-210)

get_vector_bytes_opt (201-207)

get_vector_layout (209-211)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (3)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)

get_vector_bytes_opt (217-219)

get_vector_layout (221-226)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)

get_vector_bytes_opt (232-234)

get_vector_layout (236-241)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (412-414)

get_vector_layout (416-421)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

Layout (126-126)

get_vector_bytes_opt (119-123)

get_vector_layout (125-127)

lib/segment/src/index/hnsw_index/graph_layers.rs (2)

lib/segment/src/fixtures/index_fixtures.rs (1)

graph_links_vectors (80-85)

lib/segment/src/index/hnsw_index/graph_links.rs (1)

format (239-247)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (7)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

Layout (150-150)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (412-414)

get_vector_layout (416-421)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)

get_vector_bytes_opt (217-219)

get_vector_layout (221-226)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

Layout (210-210)

get_vector_bytes_opt (201-207)

get_vector_layout (209-211)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

Layout (126-126)

get_vector_bytes_opt (119-123)

get_vector_layout (125-127)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (412-414)

get_vector_layout (416-421)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

Layout (210-210)

get_vector_bytes_opt (201-207)

get_vector_layout (209-211)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (258-260)

get_vector_layout (262-267)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (258-260)

get_vector_layout (262-267)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/collection/src/operations/types.rs (1)

service_error (1022-1027)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

Layout (150-150)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

Layout (279-279)

get_vector_bytes_opt (272-276)

get_vector_layout (278-280)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

Layout (210-210)

get_vector_bytes_opt (201-207)

get_vector_layout (209-211)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

Layout (279-279)

get_vector_bytes_opt (272-276)

get_vector_layout (278-280)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

Layout (210-210)

get_vector_bytes_opt (201-207)

get_vector_layout (209-211)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

Layout (126-126)

get_vector_bytes_opt (119-123)

get_vector_layout (125-127)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (7)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

Layout (150-150)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

Layout (279-279)

get_vector_bytes_opt (272-276)

get_vector_layout (278-280)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (2)

get_vector_bytes_opt (155-157)

get_vector_layout (159-164)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

Layout (210-210)

get_vector_bytes_opt (201-207)

get_vector_layout (209-211)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)

get_vector_bytes_opt (232-234)

get_vector_layout (236-241)

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (2)

lib/segment/src/index/hnsw_index/graph_links.rs (4)

vector_layout (83-83)

vector_layout (112-117)

vector_layout (366-371)

new (340-354)

lib/common/common/src/bitpacking_links.rs (1)

pack_links (23-66)

lib/segment/src/index/hnsw_index/graph_links/view.rs (1)

lib/common/memory/src/madvise.rs (6)

madvise (93-95)

madvise (101-101)

madvise (107-113)

madvise (131-137)

will_need_multiple_pages (184-209)

will_need_multiple_pages (212-212)

lib/segment/src/index/hnsw_index/graph_links.rs (2)

lib/segment/src/index/hnsw_index/graph_links/view.rs (1)

links (230-261)

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (1)

serialize_graph_links (23-237)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (4)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

Layout (150-150)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

Layout (279-279)

get_vector_bytes_opt (272-276)

get_vector_layout (278-280)

lib/segment/src/vector_storage/vector_storage_base.rs (4)

get_vector_bytes_opt (94-94)

get_vector_bytes_opt (929-977)

get_vector_layout (97-97)

get_vector_layout (979-1023)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

Layout (126-126)

get_vector_bytes_opt (119-123)

get_vector_layout (125-127)

lib/segment/src/index/hnsw_index/hnsw.rs (3)

lib/segment/src/fixtures/index_fixtures.rs (1)

quantized_vectors (76-78)

lib/segment/src/vector_storage/vector_storage_base.rs (2)

is_on_disk (67-67)

is_on_disk (745-789)

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

is_on_disk (129-150)

lib/segment/src/vector_storage/vector_storage_base.rs (10)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (3)

Layout (150-150)

get_vector_bytes_opt (143-147)

get_vector_layout (149-151)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (3)

Layout (279-279)

get_vector_bytes_opt (272-276)

get_vector_layout (278-280)

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (200-202)

get_vector_layout (204-209)

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (412-414)

get_vector_layout (416-421)

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (2)

get_vector_bytes_opt (258-260)

get_vector_layout (262-267)

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)

get_vector_bytes_opt (217-219)

get_vector_layout (221-226)

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (2)

get_vector_bytes_opt (155-157)

get_vector_layout (159-164)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (3)

Layout (210-210)

get_vector_bytes_opt (201-207)

get_vector_layout (209-211)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)

Layout (126-126)

get_vector_bytes_opt (119-123)

get_vector_layout (125-127)

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)

get_vector_bytes_opt (232-234)

get_vector_layout (236-241)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: test-consistency
GitHub Check: lint
GitHub Check: e2e-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: integration-tests
GitHub Check: test-consensus-compose
GitHub Check: integration-tests-consensus
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: storage-compat-test
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (windows-latest)

🔇 Additional comments (37)

lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (1)

90-92: Switch to graph_links_vectors() in RAM build is correct.

Passing graph_links_vectors().as_ref() cleanly supplies vectors only for formats that need them and avoids copies.

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (3)

760-762: Good: tests construct RAM graph with graph-links vectors.

The parameterization matches format.is_with_vectors() and keeps lifetimes local to the build step.

866-868: Good: consistent parameter switch in add-points test.

Mirror change maintains parity with the fixture and other tests.

906-908: Good: properties test uses graph-links vectors path.

Keeps the format under test aligned with the data source.

lib/segment/src/index/hnsw_index/tests/mod.rs (1)

65-67: Fixture now feeds graph-links vectors — looks good.

This unifies test configuration across formats and remains a no-op for non-vector formats.

lib/segment/src/fixtures/index_fixtures.rs (1)

9-9: Import added for StorageGraphLinksVectors — OK.

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (2)

1-1: Imports updated for Layout — correct.

232-241: Trait extension implemented appropriately for sparse storage.

Returning None for bytes and an error for layout matches other sparse backends and the trait’s semantics.

lib/segment/src/index/hnsw_index/graph_layers.rs (3)

218-226: Correctly iterates only link vectors; base vector intentionally ignored

Using .1 focuses on (link, vector) pairs; aligns with the new API split between base and link vectors.

464-466: Tests now source vectors via graph_links_vectors()

Switch to graph_links_vectors().as_ref() matches the new test plumbing.

529-533: Load path test updated for vector-bearing formats

Passing graph_links_vectors to with_param_for_tests is consistent with CompressedWithVectors.

lib/segment/src/vector_storage/multi_dense/appendable_mmap_multi_dense_vector_storage.rs (2)

1-1: Import Layout for new VectorStorage API

Good addition to satisfy get_vector_layout signature.

200-209: Approve: Explicit unavailable vector bytes/layout for multi-dense
No callers unwrap or expect get_vector_bytes_opt or get_vector_layout; returning None/Err is consistent with other multi-dense variants and safe to merge.

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (2)

1-1: Import Layout for new VectorStorage API

Matches trait changes.

258-267: Approve multi-dense “no raw bytes / no fixed layout” behavior. Verified there are no .get_vector_layout().unwrap or .expect usages in call sites—error paths are handled consistently.

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (2)

1-1: Import Layout for new VectorStorage API

Required for get_vector_layout.

217-226: Sparse storage: no raw bytes and no fixed layout

Returns None and a clear error message, consistent with other sparse backends.

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (2)

1-1: Import Layout for new VectorStorage API

Looks good.

15-15: Use zerocopy::IntoBytes for safe byte views

Appropriate choice per guidelines (avoid transmutes).

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)

155-165: Correct: sparse storage should not expose fixed bytes/layout

Returning None for bytes and Err for Layout matches other sparse backends and avoids implying a fixed ABI.

lib/segment/src/vector_storage/multi_dense/simple_multi_dense_vector_storage.rs (2)

1-1: Import looks correct

Layout is required by the new trait methods below.

412-421: Correct: multidense has no fixed layout

None for bytes and Err for Layout is consistent with other multidense storages.

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)

201-212: Good: zero-copy byte view and explicit Layout

Using IntoBytes::as_bytes and Layout::array is the right, safe approach here.

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (2)

143-152: Good: exposes raw bytes and concrete Layout

Matches simple/volatile/memmap dense implementations; enables layout-aware consumers.

14-14: Import is appropriate

IntoBytes is needed for as_bytes() above.

lib/segment/src/vector_storage/vector_storage_base.rs (2)

1-1: Import of Layout is fine.

Needed for new API; no issues.

929-977: Enum delegation for get_vector_bytes_opt looks correct; ensure sparse/multidense None is handled upstream.

Dense variants expose bytes; sparse/multi return None. Please confirm all new graph-links code paths check for None and gracefully skip embedding where unsupported.
lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (3)
1-1: Layout import is appropriate.

Needed for get_vector_layout.

119-124: get_vector_bytes_opt implementation is correct after fixing the AsBytes import.

Returning a borrowed byte slice is ideal for zero-copy paths.

10-10: Wrong zerocopy trait imported; this will not compile.

slice.as_bytes() is provided by zerocopy::AsBytes, not IntoBytes. Replace the import.

Apply:
- use zerocopy::IntoBytes;
+ use zerocopy::AsBytes;
⛔ Skipped due to learnings
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: CR
PR: qdrant/qdrant#0
File: .github/review-rules.md:0-0
Timestamp: 2025-08-29T08:16:28.512Z
Learning: Applies to **/*.rs : Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
lib/segment/src/index/hnsw_index/hnsw.rs (2)

49-49: Import of StorageGraphLinksVectors is correct.

Matches the new parameterization of GraphLinksFormat.

625-627: Lifetimes: ensure with_param doesn’t outlive combined_provider.

format_param must not retain references past the call to into_graph_layers. If with_param stores by reference only for the duration of into_graph_layers, you’re good; otherwise, switch to passing the provider directly into into_graph_layers to avoid dangling refs.

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (1)

124-134: Base vector write path looks correct.

Level-0-only emission, length check against layout, and offset tracking are consistent with the documented layout.

lib/segment/src/index/hnsw_index/graph_links.rs (2)

97-106: Graceful error on missing base vectors.

Good use of get_vector_bytes_opt::<Sequential> and precise error message.

272-275: API change: returning base slice + zipped iterator is clean.

The tuple shape cleanly separates base vector from per-link vectors and keeps links() unchanged.

lib/segment/src/index/hnsw_index/graph_links/view.rs (2)

80-98: Docs: new on-disk layout description is clear and accurate.

The B/#/c//L/ diagram and per-level notes read well and match the serializer’s stated format.

102-106: Good: explicit vector layout fields.

Using Layout for base vectors and NonZero<usize> for per-link size removes ambiguity and guards zero-sized vectors.

coderabbitai · 2025-09-01T11:27:21Z

lib/segment/src/index/hnsw_index/graph_links.rs

+    fn get_link_vector(&self, point_id: PointOffsetType) -> OperationResult<&[u8]> {
+        Ok(self.quantized_vectors.get_quantized_vector(point_id))
    }


💡 Verification agent

🧩 Analysis chain

Potential panic path via QuantizedVectors::get_quantized_vector.

If this method panics on OOB, an invalid link id would abort serialization. Either validate ids upstream or wrap with a checked accessor if available.

🏁 Script executed:

#!/bin/bash # Inspect QuantizedVectors to confirm behavior of get_quantized_vector on OOB. rg -nP -C3 --type=rust '\bimpl\b.*QuantizedVectors\b' lib | sed -n '1,200p' rg -nP -C3 --type=rust '\bfn\s+get_quantized_vector\s*\(' lib

Length of output: 3005

Validate point_id before indexing quantized_vectors
QuantizedVectors::get_quantized_vector invokes an unchecked slice access and will panic if point_id ≥ number of stored vectors. In get_link_vector, add a bounds check (e.g. compare against self.quantized_vectors.config().vectors_count() or storage_impl.vectors_count()) and return an OperationError instead of letting a panic abort serialization.

🤖 Prompt for AI Agents

In lib/segment/src/index/hnsw_index/graph_links.rs around lines 108 to 110, get_link_vector currently calls self.quantized_vectors.get_quantized_vector(point_id) which can panic on out-of-bounds access; add an explicit bounds check before indexing (e.g. if point_id as usize >= self.quantized_vectors.config().vectors_count() or use storage_impl.vectors_count()), and if out of range return an appropriate OperationError (with a clear message) instead of calling the unchecked accessor so serialization cannot be aborted by a panic.

lib/segment/src/index/hnsw_index/graph_links/header.rs

lib/segment/src/index/hnsw_index/graph_links/view.rs

IvanPleshkov

I don't have any blocking requests, nits and proposals only

IvanPleshkov · 2025-09-02T15:44:07Z

lib/segment/src/index/hnsw_index/graph_links.rs

-    fn get_vector(&self, point_id: PointOffsetType) -> OperationResult<&[u8]> {
-        Ok(self.get_quantized_vector(point_id))
-    }
+pub struct VectorLayout {


I propose to rename this struct because I find non-obvious that VectorLayoutis not a parameter of vector, it's a pair of vector and quantization parameters. We can discuss proper name, I can propose at least LinksVectorsLayout to mark that this struct is related to the links data and not related to vector storage/user vector input etc

Renamed into GraphLinksVectorsLayout and added more comments to this part of code.

IvanPleshkov · 2025-09-02T15:50:30Z

lib/segment/src/index/hnsw_index/graph_links/header.rs

+#[derive(FromBytes, Immutable, IntoBytes, KnownLayout)]
+#[repr(C)]
+/// Packed representation of [`std::alloc::Layout`].
+pub(super) struct PackedVectorLayout {


The same as of VectorLayout: rename proposal because it's not obvious that PackedVectorLayout contains quantization params.

lib/segment/src/index/hnsw_index/graph_links.rs

IvanPleshkov · 2025-09-02T16:07:57Z

lib/segment/src/index/hnsw_index/graph_links/header.rs

-    pub(super) zero_padding: [u8; 4],
+    pub(super) base_vector_layout: PackedVectorLayout,
+    pub(super) link_vector_layout: PackedVectorLayout,
+    pub(super) zero_padding: [u8; 3],


Is it possible to avoid this field and have zeroing automatically in the serializer? Not in this PR, in the near future as a part of a milestone. Or at least can you please make a unit test that checks that all headers should have a fixed size +comments

It's for alignment (#[repr(C, align(8))]), and it's enforced by compiler and the zerocopy derive macro. E.g. try to replace 3 with other value: you'll get a compiler error.

Okay, it there any comment about it? Would you like to add it?

Added // for 8-byte alignment.

IvanPleshkov · 2025-09-02T16:13:03Z

lib/segment/src/index/hnsw_index/graph_links/view.rs

+
+        // Check alignment of link vectors. Alignment of base vectors is checked
+        // in `Layout::from_size_align` later.
+        if !header.link_vector_layout.alignment.is_power_of_two() {


Very unimportant nit. Can we do a separate function PackedVectorLayout::validate(&self) -> OperationResult<()> and move is_power_of_two with a comment?

Added PackedVectorLayout::try_into_layout. The validation happen in Layout::from_size_align().

IvanPleshkov · 2025-09-02T16:17:35Z

lib/segment/src/index/hnsw_index/graph_links/view.rs

        /// `NonZero` to avoid handling unlikely corner cases.
-        vector_size_bytes: NonZero<usize>,
-        vector_alignment: u8,
+        link_vector_size: NonZero<usize>,


Can we use for link_vector_size and link_vector_alignment the Layout type as for base_vector_layout? If yes, you can avoid header.link_vector_layout.alignment.is_power_of_two() check in load_compressed_with_vectors and use better validation Layout::from_size_align for both base and link vectors

It is expanded so NonZero<usize> can be used. <[T]>::chunks_exact(chunk_size) has an assert!(chunk_size != 0) inside. By using NonZero we'll get:

One less branch in a hot loop.

One less potential panic.

lib/segment/src/index/hnsw_index/graph_links/serializer.rs

IvanPleshkov · 2025-09-02T16:36:03Z

lib/segment/src/index/hnsw_index/graph_links/serializer.rs

+    if let Some(vector_layout) = vector_layout.as_ref() {
        let pos = writer.stream_position()? as usize;
-        writer.write_zeros(pos.next_multiple_of(vector_layout.align()) - pos)?;
+        let alignment = vector_layout.base.align().max(vector_layout.link.align());


You did max function between base and link alignments in multiple places. I propose to add VectorLayout::alignment function which calculates max and has a comment why do we need to calculate max

Aligning to both base and link layouts simplifies the GraphLinksView implementation. I'd prefer to keep this logic inlined to avoid ravioli code.

IvanPleshkov · 2025-09-02T16:45:28Z

lib/segment/src/vector_storage/vector_storage_base.rs

    fn get_vector_opt<P: AccessPattern>(&self, key: PointOffsetType) -> Option<CowVector<'_>>;

+    /// Get the raw bytes of the vector by the given key if it exists
+    fn get_vector_bytes_opt<P: AccessPattern>(&self, key: PointOffsetType) -> Option<&[u8]>;


Is it possible to move this function from this trait into DenseVectorStorage? And in DenseVectorStorage we can do an implementation of this func using get_dense.
To access it from VectorStorageEnum we can do that same as try_multi_vector_config.
My proposal is ugly but at least we don't touch all storages implementation and avoid an unreachable code.

Thanks, it made things simpler.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/segment/src/index/hnsw_index/graph_links/view.rs (1)

382-384: Fix user-facing typo: “Unsufficent” → “Insufficient”; optionally rename helper.

This error bubbles to users. Correct the message; consider renaming the helper for consistency.

Apply this diff:

-fn error_unsufficent_size() -> OperationError {
-    OperationError::service_error("Unsufficent file size for GraphLinks file")
+fn error_insufficient_size() -> OperationError {
+    OperationError::service_error("Insufficient file size for GraphLinks file")
 }

And update call sites:

-HeaderPlain::ref_from_prefix(data).map_err(|_| error_unsufficent_size())?;
+HeaderPlain::ref_from_prefix(data).map_err(|_| error_insufficient_size())?;

-HeaderCompressed::ref_from_prefix(data).map_err(|_| error_unsufficent_size())?;
+HeaderCompressed::ref_from_prefix(data).map_err(|_| error_insufficient_size())?;

-HeaderCompressedWithVectors::ref_from_prefix(data)
-    .map_err(|_| error_unsufficent_size())?;
+HeaderCompressedWithVectors::ref_from_prefix(data)
+    .map_err(|_| error_insufficient_size())?;

♻️ Duplicate comments (3)

lib/segment/src/index/hnsw_index/graph_links/header.rs (1)

37-52: Freeze ABI: add compile-time size/alignment checks for headers

We previously suggested const asserts to prevent silent ABI drift; still recommended here.

 use zerocopy::{FromBytes, Immutable, IntoBytes, KnownLayout};
+use core::mem::{align_of, size_of};

@@
 pub(super) struct HeaderCompressedWithVectors {
@@
     pub(super) zero_padding: [u8; 3], // for 8-byte alignment
 }
+
+// Compile-time layout sanity checks (fail build if layout changes)
+const _: () = {
+    // HeaderCompressedWithVectors must remain 8-byte aligned
+    assert!(align_of::<HeaderCompressedWithVectors>() == 8);
+    // Add size checks if you rely on a fixed header size in readers
+    // Example:
+    // assert!(size_of::<HeaderCompressedWithVectors>() == EXPECTED_SIZE);
+};

lib/segment/src/index/hnsw_index/graph_links.rs (1)

111-114: Bounds-check link-vector access to prevent panic on OOB.

get_quantized_vector may panic on invalid point_id. Validate the index and return OperationError instead.

Apply this diff:

-    fn get_link_vector(&self, point_id: PointOffsetType) -> OperationResult<&[u8]> {
-        Ok(self.quantized_vectors.get_quantized_vector(point_id))
-    }
+    fn get_link_vector(&self, point_id: PointOffsetType) -> OperationResult<&[u8]> {
+        let idx = point_id as usize;
+        // Prefer the most direct count accessor available in QuantizedVectors.
+        let count = self.quantized_vectors.vectors_count();
+        if idx >= count {
+            return Err(OperationError::service_error(format!(
+                "Point {point_id} out of range for link vectors (count = {count})"
+            )));
+        }
+        Ok(self.quantized_vectors.get_quantized_vector(point_id))
+    }

Run to confirm the exact accessor for total vectors (adjust the check above accordingly):

#!/bin/bash
# Locate the proper count accessor on QuantizedVectors
rg -nP --type=rust -C2 '\bimpl\b.*QuantizedVectors\b' lib
rg -nP --type=rust -C2 '\bfn\s+(vectors_count|len|len_vectors|config|storage_impl)\b' lib

lib/segment/src/index/hnsw_index/graph_links/view.rs (1)

186-188: Padding uses validated alignments; previous concern resolved.

Computing with max(validated_base.align(), validated_link.align()) is correct.

🧹 Nitpick comments (9)

lib/segment/src/vector_storage/vector_storage_base.rs (2)

157-166: Good addition: zero-copy bytes + safe Layout derivation

Default impls look correct and avoid allocations. Minor: consider adding #[inline] to both for hot paths.
-    fn get_dense_bytes_opt<P: AccessPattern>(&self, key: PointOffsetType) -> Option<&[u8]> {
+    #[inline]
+    fn get_dense_bytes_opt<P: AccessPattern>(&self, key: PointOffsetType) -> Option<&[u8]> {
@@
-    fn get_dense_vector_layout(&self) -> OperationResult<Layout> {
+    #[inline]
+    fn get_dense_vector_layout(&self) -> OperationResult<Layout> {
703-751: Layout accessor: early-returns are fine; consider unifying style

Works as-is. If desired, you can return inside the match for dense and fall through to a single error (current approach), or collect to a variable for a slightly cleaner flow. No functional issue.

lib/segment/src/index/hnsw_index/graph_links/header.rs (1)

57-70: Avoid possible usize truncation on 32-bit in try_into_layout

Cast from u64 to usize can truncate on 32-bit. Use try_from to be safe; message clarifies the failure mode.

 impl PackedVectorLayout {
     pub(super) fn try_into_layout(self) -> OperationResult<Layout> {
-        Layout::from_size_align(self.size.get() as usize, self.alignment as usize)
-            .map_err(|_| OperationError::service_error("Invalid vector layout"))
+        let size = usize::try_from(self.size.get())
+            .map_err(|_| OperationError::service_error("Vector layout size doesn't fit usize"))?;
+        Layout::from_size_align(size, self.alignment as usize)
+            .map_err(|_| OperationError::service_error("Invalid vector layout"))
     }
 }

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (3)

32-49: Layout validation is good; error strings could be more specific

Validations are correct. Consider including expected/actual in errors for faster debugging.

-            if vectors_layout.base.size() % vectors_layout.base.align() != 0 {
-                return Err(OperationError::service_error(
-                    "Base vector size must be a multiple of its alignment",
-                ));
-            }
+            if vectors_layout.base.size() % vectors_layout.base.align() != 0 {
+                return Err(OperationError::service_error(format!(
+                    "Base vector size must be a multiple of its alignment (size={}, align={})",
+                    vectors_layout.base.size(),
+                    vectors_layout.base.align()
+                )));
+            }
@@
-            if vectors_layout.link.size() % vectors_layout.link.align() != 0 {
-                return Err(OperationError::service_error(
-                    "Link vector size must be a multiple of its alignment",
-                ));
-            }
+            if vectors_layout.link.size() % vectors_layout.link.align() != 0 {
+                return Err(OperationError::service_error(format!(
+                    "Link vector size must be a multiple of its alignment (size={}, align={})",
+                    vectors_layout.link.size(),
+                    vectors_layout.link.align()
+                )));
+            }

122-166: Write order/details look correct; tighten mismatch errors

The B/#/c//L/ sequencing matches the spec. Improve mismatch errors to indicate base vs link and include sizes to aid triage.

-                        if vector.len() != vectors_layout.base.size() {
-                            return Err(OperationError::service_error("Vector size mismatch"));
-                        }
+                        if vector.len() != vectors_layout.base.size() {
+                            return Err(OperationError::service_error(format!(
+                                "Base vector size mismatch: expected {}, got {}",
+                                vectors_layout.base.size(),
+                                vector.len()
+                            )));
+                        }
@@
-                        if vector.len() != vectors_layout.link.size() {
-                            return Err(OperationError::service_error("Vector size mismatch"));
-                        }
+                        if vector.len() != vectors_layout.link.size() {
+                            return Err(OperationError::service_error(format!(
+                                "Link vector size mismatch: expected {}, got {}",
+                                vectors_layout.link.size(),
+                                vector.len()
+                            )));
+                        }

239-244: Consider moving pack_layout next to PackedVectorLayout

Minor cohesion tweak: implement From for PackedVectorLayout in header.rs and reuse here.

-fn pack_layout(layout: &Layout) -> PackedVectorLayout {
-    PackedVectorLayout {
-        size: LittleU64::new(layout.size() as u64),
-        alignment: u8::try_from(layout.align()).expect("Alignment must fit in u8"),
-    }
-}
+// In header.rs
+impl From<&Layout> for PackedVectorLayout {
+    fn from(layout: &Layout) -> Self {
+        PackedVectorLayout {
+            size: LittleU64::new(layout.size() as u64),
+            alignment: u8::try_from(layout.align()).expect("Alignment must fit in u8"),
+        }
+    }
+}
+// Here:
+// base_vector_layout: PackedVectorLayout::from(&vectors_layout.base),

lib/segment/src/index/hnsw_index/graph_links/view.rs (3)

291-299: Add a bounds debug-assert before slicing the base vector.

Helps catch corrupt files in debug without affecting release perf.

Apply this diff:

-                if level == 0 {
-                    base_vector = &neighbors[pos..pos + base_vector_layout.size()];
+                if level == 0 {
+                    debug_assert!(pos + base_vector_layout.size() <= end);
+                    base_vector = &neighbors[pos..pos + base_vector_layout.size()];
                     debug_assert!(base_vector.as_ptr().addr() % base_vector_layout.align() == 0);
                     pos += base_vector_layout.size();
                 }

320-327: Assert link-vectors region bounds; exclude overflow.

Defensive in debug: ensure pos + bytes doesn’t exceed end and that multiplication didn’t overflow.

Apply this diff:

-                let link_vector_bytes = (neighbors_count as usize) * link_vector_size.get();
-                let link_vectors = &neighbors[pos..pos + link_vector_bytes];
+                let link_vector_bytes = (neighbors_count as usize)
+                    .checked_mul(link_vector_size.get())
+                    .expect("link_vector_bytes overflow");
+                debug_assert!(pos + link_vector_bytes <= end, "Corrupt GraphLinks: vectors region out of bounds");
+                let link_vectors = &neighbors[pos..pos + link_vector_bytes];
                 debug_assert!(link_vectors.as_ptr().addr() % link_vector_alignment as usize == 0);

205-210: Prefer NonZeroUsize for alignment to avoid casts and zero-state.

Keeps types honest and removes the u8 cast at use sites.

Apply this diff and adjust uses at lines 321 and 326:

-                link_vector_size: NonZero::try_from(link_vector_layout.size()).map_err(|_| {
+                link_vector_size: NonZero::try_from(link_vector_layout.size()).map_err(|_| {
                     OperationError::service_error("Zero link vector size in GraphLinks file")
                 })?,
-                link_vector_alignment: link_vector_layout.align() as u8,
+                link_vector_alignment: std::num::NonZeroUsize::new(link_vector_layout.align())
+                    .expect("non-zero alignment guaranteed by Layout"),

And:

-                pos = pos.next_multiple_of(link_vector_alignment as usize);
+                pos = pos.next_multiple_of(link_vector_alignment.get());

-                debug_assert!(link_vectors.as_ptr().addr() % link_vector_alignment as usize == 0);
+                debug_assert!(link_vectors.as_ptr().addr() % link_vector_alignment.get() == 0);

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between cc5bccb and c5f4c62.

📒 Files selected for processing (11)

lib/segment/src/fixtures/index_fixtures.rs (2 hunks)
lib/segment/src/index/hnsw_index/graph_layers.rs (3 hunks)
lib/segment/src/index/hnsw_index/graph_layers_builder.rs (3 hunks)
lib/segment/src/index/hnsw_index/graph_links.rs (8 hunks)
lib/segment/src/index/hnsw_index/graph_links/header.rs (3 hunks)
lib/segment/src/index/hnsw_index/graph_links/serializer.rs (7 hunks)
lib/segment/src/index/hnsw_index/graph_links/view.rs (9 hunks)
lib/segment/src/index/hnsw_index/hnsw.rs (2 hunks)
lib/segment/src/index/hnsw_index/tests/mod.rs (1 hunks)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (1 hunks)
lib/segment/src/vector_storage/vector_storage_base.rs (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (6)

lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
lib/segment/src/index/hnsw_index/graph_layers.rs
lib/segment/src/index/hnsw_index/tests/mod.rs
lib/segment/src/fixtures/index_fixtures.rs
lib/segment/src/index/hnsw_index/hnsw.rs
lib/segment/src/index/hnsw_index/graph_layers_builder.rs

🧰 Additional context used

📓 Path-based instructions (2)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs

🧠 Learnings (16)

📓 Common learnings

Learnt from: xzfc
PR: qdrant/qdrant#6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

📚 Learning: 2025-09-01T11:19:26.330Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.330Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-11T12:30:47.220Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7009
File: lib/quantization/src/encoded_vectors_binary.rs:614-618
Timestamp: 2025-08-11T12:30:47.220Z
Learning: In `lib/quantization/src/encoded_vectors_binary.rs`, the `encode_two_bits_value` function has two paths: when `vector_stats` is available, it uses mean/stddev for accurate z-score based quantization boundaries; when stats are None (for appendable BQ), it falls back to a simple sign-based heuristic (positive -> (true, true), non-positive -> (false, false)). Having stats available provides higher accuracy for storage encoding.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/header.rs

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-06-02T18:10:47.203Z

Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/header.rs
lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-08-21T13:46:10.866Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.866Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

lib/segment/src/vector_storage/vector_storage_base.rs
lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-08-21T13:43:52.191Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.191Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.

Applied to files:

lib/segment/src/vector_storage/vector_storage_base.rs

📚 Learning: 2025-08-06T09:56:59.311Z

Learnt from: xzfc
PR: qdrant/qdrant#6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs
lib/segment/src/index/hnsw_index/graph_links/view.rs

📚 Learning: 2025-08-22T13:49:39.261Z

Learnt from: xzfc
PR: qdrant/qdrant#7118
File: lib/segment/src/index/hnsw_index/graph_links/serializer.rs:50-54
Timestamp: 2025-08-22T13:49:39.261Z
Learning: In Qdrant's graph serialization code (lib/segment/src/index/hnsw_index/graph_links/serializer.rs), the team prefers to keep the code simple rather than adding defensive input validation for edge cases that don't occur in practice, relying on unit tests to catch any unexpected issues early.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links.rs
lib/segment/src/index/hnsw_index/graph_links/serializer.rs

📚 Learning: 2025-08-10T18:31:56.855Z

Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/point_ops.rs:501-528
Timestamp: 2025-08-10T18:31:56.855Z
Learning: In Qdrant, batch operations validate that `ids`, `vectors`, and `payloads` (if present) have matching lengths at the REST API level in `lib/api/src/rest/validate.rs` through the `Validate` trait implementation for `Batch`. This validation happens before data is converted to internal structures like `PointInsertOperationsInternal`, so methods operating on these internal structures can safely assume the lengths match.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-08-10T18:30:02.986Z

Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links.rs

📚 Learning: 2025-08-15T11:24:38.974Z

Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:0-0
Timestamp: 2025-08-15T11:24:38.974Z
Learning: The `usize::is_multiple_of` method in Rust does not trigger a division-by-zero panic when called with 0. This method handles the zero case safely without panicking.

Applied to files:

lib/segment/src/index/hnsw_index/graph_links/view.rs

🧬 Code graph analysis (5)

lib/segment/src/index/hnsw_index/graph_links/header.rs (1)

lib/segment/src/vector_storage/vector_storage_base.rs (1)

Layout (164-164)

lib/segment/src/vector_storage/vector_storage_base.rs (5)

lib/segment/src/vector_storage/dense/appendable_dense_vector_storage.rs (1)

get_dense (88-92)

lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)

get_dense (236-238)

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (1)

get_dense (83-85)

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (1)

get_dense (147-153)

lib/segment/src/common/operation_error.rs (1)

service_error (84-89)

lib/segment/src/index/hnsw_index/graph_links.rs (3)

lib/collection/src/operations/types.rs (2)

service_error (1022-1027)

iter (826-828)

lib/segment/src/index/hnsw_index/graph_links/view.rs (1)

links (215-246)

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (1)

serialize_graph_links (23-237)

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (2)

lib/quantization/src/encoded_vectors_binary.rs (11)

vector (217-217)

vector (225-225)

vector (238-238)

vector (246-246)

vector (324-324)

vector (332-332)

vector (345-345)

vector (353-353)

vector (366-366)

vector (374-374)

layout (820-826)

lib/segment/src/index/hnsw_index/graph_links.rs (4)

vectors_layout (83-83)

vectors_layout (116-121)

vectors_layout (370-375)

new (344-358)

lib/segment/src/index/hnsw_index/graph_links/view.rs (2)

lib/common/memory/src/madvise.rs (6)

madvise (93-95)

madvise (101-101)

madvise (107-113)

madvise (131-137)

will_need_multiple_pages (184-209)

will_need_multiple_pages (212-212)

lib/segment/src/index/hnsw_index/graph_links.rs (1)

links (267-269)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: test-consensus-compose
GitHub Check: e2e-tests
GitHub Check: integration-tests-consensus
GitHub Check: integration-tests
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: test-consistency
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
GitHub Check: storage-compat-test
GitHub Check: lint

🔇 Additional comments (8)

lib/segment/src/vector_storage/vector_storage_base.rs (1)

656-701: Enum delegation: bytes accessor LGTM

Clean delegation across dense variants; returning None for non-dense is appropriate.

lib/segment/src/index/hnsw_index/graph_links/header.rs (1)

49-52: Header fields for base/link layouts: nice, explicit separation

Storing both layouts in the header matches the new on-disk format and keeps parsing straightforward.

lib/segment/src/index/hnsw_index/graph_links/serializer.rs (2)

90-95: Initial neighbors padding: alignment choice is correct

Using max(base.align, link.align) here ensures the first base and the first link vectors can both be aligned. Good.

218-233: Header population for PackedVectorLayout: matches layout model

Header fields are written consistently with computed layouts and metadata. Looks good.

lib/segment/src/index/hnsw_index/graph_links.rs (2)

98-109: Good call using Sequential for base vectors; solid error on miss.

Sequential read matches serializer traversal; the explicit OperationError helps locate missing points.

271-279: Wrapper return shape is correct.

Zipping links with vectors here matches the public API expectation; clean separation of base slice vs. per-link pairs.

lib/segment/src/index/hnsw_index/graph_links/view.rs (2)

80-98: Docs clearly describe the on-disk layout B/#/c//L/.

This matches serializer semantics and helps future maintainers.

171-177: Validated Layouts from header: good.

Using try_into_layout() ensures size/align are checked before use.

* Add VectorStorageEnum::{get_vector_bytes_opt, get_vector_layout} * Update CompressedWithVectors format to include base vectors * StorageGraphLinksVectors * Clarify code

xzfc added this to the Graph with vectors milestone Aug 28, 2025

xzfc requested review from generall and timvisee August 28, 2025 18:28

This comment was marked as resolved.

Sign in to view

xzfc force-pushed the hnsw-with-base-vectors branch from 26a77f6 to 1f203c5 Compare August 29, 2025 09:55

This comment was marked as resolved.

Sign in to view

xzfc force-pushed the hnsw-with-base-vectors branch from 1f203c5 to 2a1c27c Compare August 29, 2025 10:09

github-actions bot mentioned this pull request Aug 29, 2025

Flaky test save_on_disk::tests::test_wait_for_multiple_notifications #7156

Open

xzfc force-pushed the hnsw-with-base-vectors branch from 2a1c27c to 3f7b1af Compare August 29, 2025 12:55

coderabbitai bot reviewed Aug 29, 2025

View reviewed changes

xzfc force-pushed the hnsw-with-base-vectors branch from 3f7b1af to 3f3f463 Compare August 30, 2025 11:55

This comment was marked as resolved.

Sign in to view

qdrant deleted a comment from coderabbitai bot Sep 1, 2025

xzfc force-pushed the hnsw-with-base-vectors branch from 3f3f463 to cc5bccb Compare September 1, 2025 11:16

xzfc requested a review from IvanPleshkov September 1, 2025 11:17

coderabbitai bot reviewed Sep 1, 2025

View reviewed changes

xzfc mentioned this pull request Sep 1, 2025

HNSW: use new graph format during search #7195

Merged

IvanPleshkov approved these changes Sep 2, 2025

View reviewed changes

xzfc added 4 commits September 3, 2025 04:26

Add VectorStorageEnum::{get_vector_bytes_opt, get_vector_layout}

e1d6b9e

Update CompressedWithVectors format to include base vectors

dcfa18d

StorageGraphLinksVectors

70f1415

Clarify code

c5f4c62

xzfc force-pushed the hnsw-with-base-vectors branch from cc5bccb to c5f4c62 Compare September 3, 2025 06:49

coderabbitai bot reviewed Sep 3, 2025

View reviewed changes

xzfc merged commit 1e4a39a into dev Sep 3, 2025
16 checks passed

xzfc deleted the hnsw-with-base-vectors branch September 3, 2025 07:28

coderabbitai bot mentioned this pull request Sep 8, 2025

Integrate hnsw_with_vectors #7232

Merged

coderabbitai bot mentioned this pull request Feb 25, 2026

use Cow<'_, [T]> as return type in dense vector storages #8221

Merged

coderabbitai bot mentioned this pull request Feb 27, 2026

Deferred threshold integration #8246

Merged

Conversation

xzfc commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

Uh oh!

coderabbitai bot commented Sep 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

IvanPleshkov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

xzfc commented Aug 28, 2025 •

edited

Loading

coderabbitai bot commented Sep 1, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)