Skip to content

Chunked mmap as a quantization storage#7116

Merged
IvanPleshkov merged 1 commit intodevfrom
chunked-mmap-as-a-quantization-storage
Aug 25, 2025
Merged

Chunked mmap as a quantization storage#7116
IvanPleshkov merged 1 commit intodevfrom
chunked-mmap-as-a-quantization-storage

Conversation

@IvanPleshkov
Copy link
Contributor

This PR adds implementation of quantization storage for ChunkedMmap with persist upsert and flush implementation.

Also this PR adds ChunkedMmap variant as a quantization storage.

This PR does not have unit test because ChunkedMmap does not have any usage and QuantizedVectors does not have a create/load mechanism. This PR is mostly about compilation correctness; unit tests are the next step

@coderabbitai

This comment was marked as resolved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

176-180: Include BinaryChunkedMmap in default_rescoring.

default_rescoring() returns true for Binary RAM/MMAP but omits the new BinaryChunkedMmap single-vector variant. Unless intentionally excluded, this likely should be included to preserve behavior parity.

-        matches!(
-            self.storage_impl,
-            QuantizedVectorStorage::BinaryRam(_) | QuantizedVectorStorage::BinaryMmap(_)
-        )
+        matches!(
+            self.storage_impl,
+            QuantizedVectorStorage::BinaryRam(_)
+                | QuantizedVectorStorage::BinaryMmap(_)
+                | QuantizedVectorStorage::BinaryChunkedMmap(_)
+        )
🧹 Nitpick comments (3)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (1)

197-205: Flusher error mapping is OK, minor style simplification possible.

Logic mirrors other storages and properly maps domain errors into io::Error. Optional: use the same minimal form used elsewhere for consistency.

-        Box::new(move || {
-            flusher().map_err(|e| {
-                std::io::Error::other(format!("Failed to flush multivector offsets storage: {e}"))
-            })?;
-            Ok(())
-        })
+        Box::new(move || flusher().map_err(|e| {
+            std::io::Error::other(format!("Failed to flush multivector offsets storage: {e}"))
+        }))
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (2)

14-22: Upsert error conversion is fine; optional size sanity check.

Mapping domain errors to io::Error::other matches trait expectations. If you want extra safety, you could assert stored vector length consistency when replacing an existing entry (when id < self.len()), but given EncodedStorage lacks size metadata and encoders enforce sizes upstream, this is optional.


32-39: Flusher closure can be simplified (optional).

Same outcome with a slightly cleaner closure.

-        Box::new(move || {
-            Ok(flusher().map_err(|e| {
-                std::io::Error::other(format!("Failed to flush quantization storage: {e}"))
-            })?)
-        })
+        Box::new(move || flusher().map_err(|e| {
+            std::io::Error::other(format!("Failed to flush quantization storage: {e}"))
+        }))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5eb4011 and 1f961fd.

📒 Files selected for processing (5)
  • lib/segment/src/vector_storage/quantized/mod.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (12 hunks)
🧰 Additional context used
🧠 Learnings (12)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.139Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

  • lib/segment/src/vector_storage/quantized/mod.rs
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

  • lib/segment/src/vector_storage/quantized/mod.rs
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

  • lib/segment/src/vector_storage/quantized/mod.rs
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

  • lib/segment/src/vector_storage/quantized/mod.rs
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-21T13:43:52.139Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.139Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-21T13:45:05.888Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.888Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-14T11:31:21.777Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-08-21T13:46:10.824Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.824Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-04-07T23:31:22.515Z
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` correctly returns void instead of a `Result` because it directly implements the `Madviseable` trait which defines `populate(&self)` without a return type. Adding an unnecessary `Ok(())` return would trigger Clippy warnings about unnecessary Result wrapping.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-04-07T23:31:22.515Z
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` does not return a `Result` because the underlying `mmap.populate()` method doesn't return a Result either. Adding an unnecessary `Ok(())` would cause Clippy to complain.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
🧬 Code graph analysis (3)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (12)
  • get (270-272)
  • get (361-363)
  • len (178-180)
  • len (351-353)
  • flusher (320-332)
  • flusher (380-382)
  • insert (198-205)
  • insert (394-401)
  • files (334-342)
  • files (370-372)
  • immutable_files (344-346)
  • immutable_files (375-377)
lib/segment/src/vector_storage/chunked_vector_storage.rs (1)
  • insert (35-40)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (4)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (21)
  • std (533-533)
  • upsert_vector (425-468)
  • is_on_disk (371-373)
  • vectors_count (346-348)
  • vectors_count (470-472)
  • flusher (43-43)
  • flusher (107-109)
  • flusher (171-174)
  • flusher (197-205)
  • flusher (474-482)
  • new (252-264)
  • files (45-45)
  • files (111-113)
  • files (176-178)
  • files (217-219)
  • files (484-488)
  • immutable_files (47-47)
  • immutable_files (115-117)
  • immutable_files (180-182)
  • immutable_files (221-223)
  • immutable_files (490-494)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (5)
  • upsert_vector (1366-1428)
  • is_on_disk (131-152)
  • flusher (1342-1364)
  • files (359-382)
  • immutable_files (384-407)
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (1)
  • new (35-53)
lib/segment/src/vector_storage/chunked_vectors.rs (1)
  • quantization (172-223)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (9)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2)
  • is_on_disk (371-373)
  • storage (248-250)
lib/quantization/src/encoded_vectors.rs (1)
  • is_on_disk (40-40)
lib/quantization/src/encoded_vectors_pq.rs (3)
  • is_on_disk (495-497)
  • storage (50-52)
  • get_quantized_vector (462-464)
lib/quantization/src/encoded_vectors_binary.rs (13)
  • is_on_disk (854-856)
  • storage (403-405)
  • get_quantized_vector (802-804)
  • vector (217-217)
  • vector (225-225)
  • vector (238-238)
  • vector (246-246)
  • vector (324-324)
  • vector (332-332)
  • vector (345-345)
  • vector (353-353)
  • vector (366-366)
  • vector (374-374)
lib/quantization/src/encoded_vectors_u8.rs (3)
  • is_on_disk (373-375)
  • storage (42-44)
  • get_quantized_vector (315-317)
lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs (2)
  • is_on_disk (59-61)
  • build (99-116)
lib/segment/src/vector_storage/vector_storage_base.rs (3)
  • is_on_disk (48-48)
  • is_on_disk (722-766)
  • populate (532-577)
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (2)
  • build (106-114)
  • populate (19-21)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (1)
  • populate (447-452)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: e2e-tests
  • GitHub Check: integration-tests
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: storage-compat-test
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: lint
🔇 Additional comments (11)
lib/segment/src/vector_storage/quantized/mod.rs (1)

1-1: Module wiring looks correct.

Private module added and ordered consistently with the rest. No API surface changes.

lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (1)

100-147: New storage variants are correctly dispatched in scorer builder.

All six ChunkedMmap variants route to the same scorer constructors as their RAM/MMAP peers. This maintains parity and avoids code paths divergence.

lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (1)

225-229: Unreachable load is acceptable as a temporary guard.

Given the current architecture does not support loading ChunkedMmapVectors<MultivectorOffset> directly from a single file, keeping this unreachable! with a comment is fine until the migration in #7113 lands.

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (8)

104-108: Binary multivector uses u8 bitstore with ChunkedMmap — correct.

The BinaryChunkedMmapMulti alias keeps the intentional u8 bitstore for multivectors (vs u128 for single), consistent with prior design and perf rationale.


112-128: New ChunkedMmap variants are added consistently across the enum.

Variant naming and parameterization match existing RAM/MMAP patterns and downstream usage sites (files, flusher, populate, etc.).


1299-1330: Populate calls cover quantized data; offsets remain unpopulated (by design?).

For ChunkedMmap multivector storages, populate() is invoked for the quantized data via ChunkedVectorStorage::populate(...), but offsets (also chunked mmap) are not populated. The offsets trait doesn’t expose populate, so this mirrors existing MMAP behavior for offsets. If you expect sequential scans over offsets on startup, consider a follow-up to either:

  • add an (optional) populate() to MultivectorOffsetsStorage, or
  • explicitly populate offsets where the concrete storage type is known.

359-382: Files list inclusion looks complete for new variants.

All six ChunkedMmap variants are correctly delegated in files(). Same applies to immutable_files() below.


1346-1362: Flusher wiring includes all new variants.

Delegation is exhaustive and consistent with existing variants.


1379-1426: Upsert dispatch covers all new variants.

Dense and multivector upserts route appropriately; no gaps found.


212-231: Layout retrieval correctly errors for all multivector variants.

New multivector ChunkedMmap variants are included in the error branch; single-vector variants delegate to their storages’ layout() — good.


28-30: Imports for ChunkedMmap are scoped right.

Brings in both ChunkedMmapVectors and ChunkedVectorStorage for populate paths; matches usage below.

Comment on lines +10 to +12
fn get_vector_data(&self, index: PointOffsetType) -> &[u8] {
ChunkedVectorStorage::get(self, index as VectorOffsetType).unwrap_or_default()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Don’t silently return an empty slice on out-of-bounds vector access.

Returning [] hides logic errors and can cascade into incorrect scoring or layout checks. Since the trait can’t return Result, assert in debug and panic in release for out-of-bounds to match MMAP/RAM behaviors.

-    fn get_vector_data(&self, index: PointOffsetType) -> &[u8] {
-        ChunkedVectorStorage::get(self, index as VectorOffsetType).unwrap_or_default()
-    }
+    fn get_vector_data(&self, index: PointOffsetType) -> &[u8] {
+        let opt = ChunkedVectorStorage::get(self, index as VectorOffsetType);
+        debug_assert!(
+            opt.is_some() || (index as usize) >= self.len(),
+            "quantized vector index {} out of bounds (len = {})",
+            index,
+            self.len()
+        );
+        opt.unwrap_or_else(|| {
+            panic!(
+                "quantized vector index {} out of bounds (len = {})",
+                index,
+                self.len()
+            )
+        })
+    }
🤖 Prompt for AI Agents
In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
around lines 10 to 12, replace the current silent fallback (unwrap_or_default)
with a debug assertion and a release panic: add a debug_assert that the
requested index is within bounds, then call
ChunkedVectorStorage::get(...).expect(...) (or unwrap()) with a clear error
message so out-of-bounds access panics in release and asserts in debug rather
than returning an empty slice.

Comment on lines +186 to +193
fn get_offset(&self, idx: PointOffsetType) -> MultivectorOffset {
ChunkedVectorStorage::get(self, idx as VectorOffsetType)
.and_then(|offsets| offsets.first())
.cloned()
.unwrap_or_default()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid silently defaulting to zero offsets; align with RAM/MMAP semantics.

Current implementation returns MultivectorOffset::default() when an offset is missing. RAM/MMAP variants index directly and panic on out-of-bounds, which surfaces bugs early. Returning a default here can mask data corruption and yield incorrect scoring (effectively a zero-length inner set).

Recommend asserting presence and panicking if absent to match existing semantics.

-    fn get_offset(&self, idx: PointOffsetType) -> MultivectorOffset {
-        ChunkedVectorStorage::get(self, idx as VectorOffsetType)
-            .and_then(|offsets| offsets.first())
-            .cloned()
-            .unwrap_or_default()
-    }
+    fn get_offset(&self, idx: PointOffsetType) -> MultivectorOffset {
+        ChunkedVectorStorage::get(self, idx as VectorOffsetType)
+            .and_then(|offsets| offsets.first().cloned())
+            .unwrap_or_else(|| {
+                panic!(
+                    "Multivector offset for idx {} not found (len = {})",
+                    idx,
+                    ChunkedVectorStorage::len(self)
+                )
+            })
+    }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
around lines 186 to 191, the current get_offset silently returns
MultivectorOffset::default() when no offsets are found which can mask
out-of-bounds/data corruption; change this to fail fast to match RAM/MMAP
semantics by replacing the unwrap_or_default with a panic/assert (e.g. use
expect or unwrap with a clear message) that includes the requested idx (and/or
VectorOffsetType) so missing offsets cause an immediate panic and surface the
bug.

coderabbitai[bot]

This comment was marked as resolved.

@IvanPleshkov IvanPleshkov requested review from timvisee and xzfc August 21, 2025 15:26
@IvanPleshkov IvanPleshkov force-pushed the chunked-mmap-as-a-quantization-storage branch from c473a0f to c46759c Compare August 22, 2025 12:43
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

175-180: Include BinaryChunkedMmap in default rescoring decision.

Binary storage default-rescores for RAM/MMAP; excluding BinaryChunkedMmap changes behavior for the new backend only. Add it for parity.

     pub fn default_rescoring(&self) -> bool {
         matches!(
             self.storage_impl,
-            QuantizedVectorStorage::BinaryRam(_) | QuantizedVectorStorage::BinaryMmap(_)
+            QuantizedVectorStorage::BinaryRam(_)
+                | QuantizedVectorStorage::BinaryMmap(_)
+                | QuantizedVectorStorage::BinaryChunkedMmap(_)
         )
     }
♻️ Duplicate comments (1)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1)

10-12: Don’t return an empty slice on OOB; assert in debug and panic in release.

Returning [] hides logic errors and will cascade into bad scorer behavior. Match existing MMAP/RAM semantics: assert in debug, panic in release with a clear message.

Apply:

-    fn get_vector_data(&self, index: PointOffsetType) -> &[u8] {
-        ChunkedVectorStorage::get(self, index as VectorOffsetType).unwrap_or_default()
-    }
+    fn get_vector_data(&self, index: PointOffsetType) -> &[u8] {
+        let opt = ChunkedVectorStorage::get(self, index as VectorOffsetType);
+        debug_assert!(
+            opt.is_some() || (index as usize) >= self.len(),
+            "quantized vector index {} out of bounds (len = {})",
+            index,
+            self.len()
+        );
+        opt.unwrap_or_else(|| {
+            panic!(
+                "quantized vector index {} out of bounds (len = {})",
+                index,
+                self.len()
+            )
+        })
+    }
🧹 Nitpick comments (3)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (2)

32-39: Flusher closure: simplify and align style with multivector offsets storage.

Current Ok(flusher().map_err(...)?) is harder to read. Mirror the pattern used in multivector offsets flusher for consistency.

     fn flusher(&self) -> MmapFlusher {
         let flusher = self.flusher();
         Box::new(move || {
-            Ok(flusher().map_err(|e| {
-                std::io::Error::other(format!("Failed to flush quantization storage: {e}"))
-            })?)
+            flusher().map_err(|e| {
+                std::io::Error::other(format!("Failed to flush quantization storage: {e}"))
+            })?;
+            Ok(())
         })
     }

14-22: Include context in upsert errors (id, expected vs. actual lengths if available).

Mapping to io::Error::other is fine, but adding minimal context greatly helps diagnosing partial chunk flush issues during upserts.

-        ChunkedVectorStorage::insert(self, id as VectorOffsetType, vector, hw_counter)
-            .map_err(std::io::Error::other)
+        ChunkedVectorStorage::insert(self, id as VectorOffsetType, vector, hw_counter)
+            .map_err(|err| std::io::Error::other(format!("upsert id={id}: {err}")))

Note: if ChunkedVectorStorage exposes vector-length metadata, consider a debug_assert on vector.len() to catch mismatches early in debug builds.

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

754-1010: Note: load/create still target RAM/MMAP backends only (no chunked).

This matches the PR description (compile-time wiring first, load/create to follow). Consider adding a brief TODO in the builder paths to avoid future regressions when chunked loaders arrive.

Would you like a follow-up patch that adds TODOs near create_*/load_* noting that chunked mmap backends are only constructed by external builders today?

Also applies to: 1121-1395

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c473a0f and c46759c.

📒 Files selected for processing (6)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (6 hunks)
  • lib/segment/src/vector_storage/quantized/mod.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
  • lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs
  • lib/segment/src/vector_storage/quantized/mod.rs
🧰 Additional context used
🧠 Learnings (14)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.139Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.888Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
📚 Learning: 2025-08-15T11:41:10.629Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
📚 Learning: 2025-08-11T00:37:34.100Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
📚 Learning: 2025-08-21T13:45:05.888Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.888Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
📚 Learning: 2025-08-21T13:43:52.139Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:43:52.139Z
Learning: The `quantized_vector_size()` method is available on the `EncodedVectors` trait, not on the `EncodedStorage` trait. The `EncodedStorage` trait provides methods like `get_vector_data`, `vectors_count`, `is_on_disk`, `upsert_vector`, `flusher`, `files`, and `immutable_files`.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
📚 Learning: 2025-08-14T11:31:21.777Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7048
File: lib/quantization/src/encoded_storage.rs:61-79
Timestamp: 2025-08-14T11:31:21.777Z
Learning: In test storage implementations (like TestEncodedStorage in lib/quantization/src/encoded_storage.rs), IvanPleshkov prefers to keep the code simple rather than adding complex overflow protection, since test storage is not used in production and can be allowed to panic on edge cases.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
📚 Learning: 2025-08-21T13:46:10.824Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7113
File: lib/quantization/src/encoded_vectors_u8.rs:202-211
Timestamp: 2025-08-21T13:46:10.824Z
Learning: In `EncodedStorage` trait implementations, calling `get_vector_data(0)` for validation purposes is not safe because the storage may be empty (vectors_count() == 0), which would cause issues when trying to access the first vector.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
📚 Learning: 2025-08-18T10:56:43.707Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-04-07T23:31:22.515Z
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` correctly returns void instead of a `Result` because it directly implements the `Madviseable` trait which defines `populate(&self)` without a return type. Adding an unnecessary `Ok(())` return would trigger Clippy warnings about unnecessary Result wrapping.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-04-07T23:31:22.515Z
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` does not return a `Result` because the underlying `mmap.populate()` method doesn't return a Result either. Adding an unnecessary `Ok(())` would cause Clippy to complain.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
📚 Learning: 2025-03-29T17:43:28.174Z
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
📚 Learning: 2025-08-06T09:56:59.311Z
Learnt from: xzfc
PR: qdrant/qdrant#6982
File: lib/segment/src/index/hnsw_index/graph_links/view.rs:217-220
Timestamp: 2025-08-06T09:56:59.311Z
Learning: In Qdrant's GraphLinksView implementation (lib/segment/src/index/hnsw_index/graph_links/view.rs), methods like links() and links_with_vectors() intentionally use unwrap() and can panic for performance reasons, maintaining consistency across similar methods. Error handling improvements are considered secondary to feature completion and require benchmarking before implementation.

Applied to files:

  • lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs
🧬 Code graph analysis (2)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (6)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (2)
  • is_on_disk (340-342)
  • storage (245-247)
lib/quantization/src/encoded_vectors_u8.rs (3)
  • is_on_disk (367-369)
  • storage (42-44)
  • get_quantized_vector (326-328)
lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs (2)
  • is_on_disk (61-63)
  • build (101-118)
lib/segment/src/types.rs (6)
  • is_on_disk (555-560)
  • is_on_disk (1262-1271)
  • is_on_disk (1418-1423)
  • is_on_disk (1480-1488)
  • is_on_disk (1891-1902)
  • is_on_disk (2013-2018)
lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs (2)
  • build (108-116)
  • populate (19-21)
lib/segment/src/vector_storage/chunked_mmap_vectors.rs (1)
  • populate (447-452)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (2)
lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs (10)
  • std (502-502)
  • upsert_vector (394-437)
  • vectors_count (326-328)
  • vectors_count (439-441)
  • flusher (41-41)
  • flusher (107-109)
  • flusher (173-176)
  • flusher (199-207)
  • flusher (443-451)
  • new (249-261)
lib/segment/src/vector_storage/chunked_vectors.rs (1)
  • quantization (172-223)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: lint
  • GitHub Check: e2e-tests
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: storage-compat-test
🔇 Additional comments (15)
lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs (2)

24-31: LGTM: diskness and count delegation are correct.

is_on_disk() hardcoded to true and vectors_count() delegating to len() match the storage characteristics.


41-47: LGTM: file enumeration delegates to underlying storage.

Delegating files()/immutable_files() keeps the surface consistent with other storage backends.

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (13)

28-29: LGTM: imports are minimal and scoped.

Only the traits actually used for chunked mmap integration are imported.


74-77: LGTM: multivector alias wiring for ScalarChunkedMmapMulti.

Type parameters line up: inner encoded vectors on ChunkedMmapVectors<u8> and offsets on ChunkedMmapVectors<MultivectorOffset>.


89-92: LGTM: PQChunkedMmapMulti alias consistent with PQ single-vector types.

Uses EncodedVectorsPQ<ChunkedMmapVectors<u8>> for inner vectors and chunked offsets; matches RAM/MMAP patterns.


104-107: LGTM: BinaryChunkedMmapMulti uses u8 bits store (intentional).

Matches the established pattern where multi-binary uses u8 for byte-granular benefits; single-vector binary remains u128 for SIMD.


112-128: New storage variants: parity looks complete across enum.

All three single-vector and three multi-vector chunked mmap variants are added. Keep an eye on downstream exhaustiveness (scorers, populate, files), which appears handled below.


210-231: LGTM: layout retrieval covers all single-vector chunked variants.

Multivector variants correctly return an error.


237-255: LGTM: get_quantized_vector dispatch extended to new chunked variants.

Panic on multivector access remains intentional and unchanged.


300-343: LGTM: internal scorer dispatch covers all new variants (single and multi).

Uniformly reuses the helper build with EncodedVectors bound; no gaps.


363-383: LGTM: files() now includes chunked variants.

Downstream cache clearing and snapshot logic will pick these up automatically.


389-408: LGTM: immutable_files() parity for chunked variants.

Matches files() coverage; config path is appended once after the match.


1474-1506: Populate: correct use of ChunkedVectorStorage::populate with ?.

MMAP uses .populate() (no Result); chunked uses trait helper returning OperationResult. Good error propagation.


1519-1539: LGTM: flusher switch extended to new variants.

Consistent with existing pattern; closure mapping to OperationError remains unchanged.


1548-1601: LGTM: upsert paths wired for all new variants.

Dense vs. multi dispatch preserved; no accidental fall-throughs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants