Clarify multivector inline storage warning by agourlay · Pull Request #7523 · qdrant/qdrant

agourlay · 2025-11-12T13:54:21Z

Multivector supports quantization which creates confusing error messages.

e.g.

CollectionWarning {
    message: "The `hnsw_config.inline_storage` option for vector 'multi-dense-vector-float16-2' is not compatible with multivectors. This option will be ignored.",
},
CollectionWarning {
    message: "The `hnsw_config.inline_storage` option for vector 'multi-dense-vector-on-disk-1' requires quantization to be enabled. This option will be ignored.",
},

By checking first the multivector aspect, we can have an early exit and never pretend that quantization is required.

xzfc · 2025-11-12T14:04:05Z

lib/collection/src/config.rs

                    });
                }
-                if vector_config.multivector_config.is_some() {
+                if !vector_quantization {


Did you intend to use else if here?

Just moved the block around.

I like the else if idea 👍

I thought by "have an early exit" you mean something like this.

I misread the code, I hallucinated a return after push.

So the else if is what I actually wanted, good catch 👍

timvisee

Looks like we have tight testing which needs to be adjusted as well

coderabbitai · 2025-11-12T14:28:16Z

📝 Walkthrough

Walkthrough

This change reorders the conditional logic in the warning generation for vector configurations with inline_storage and HNSW indexing. The code now prioritizes checking for multivector configuration first; if present, it warns that inline_storage is incompatible with multivectors. Otherwise, it warns that quantization is required when inline_storage is enabled. Corresponding warning messages are swapped, and test expectations are updated to reflect that only one warning is generated when both inline_storage and multivectors are present.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The conditional reordering and message swapping require careful verification that the new branch logic remains functionally correct
Test assertion change (warnings count: 2 → 1) needs verification against the new warning generation flow
Impact is localized to warning generation logic, but the logic precedence change warrants attention

Possibly related PRs

Add CollectionInfo::warnings field #7293: Introduces CollectionWarning and the warnings field infrastructure; directly related to the warning generation and propagation code paths modified here
Rename HnswConfig::{copy_vectors -> inline_storage} #7389: Modifies the HnswConfig inline_storage flag and its associated warnings/tests; addresses the same configuration option under review

Suggested reviewers

generall
timvisee

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Clarify multivector inline storage warning' directly and clearly describes the main change: improving the clarity of warning messages for multivector inline storage by reordering condition checks.
Description check	✅ Passed	The description is directly related to the changeset, explaining the problem (confusing warning messages), providing concrete examples, and describing the solution (checking multivectors first to avoid misleading quantization messages).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch clarify-inline-storage-multivector-warning

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bc9050 and 5281d50.

📒 Files selected for processing (2)

lib/collection/src/config.rs (1 hunks)
tests/openapi/test_vector_config.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

lib/collection/src/config.rs

**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

lib/collection/src/config.rs

🧠 Learnings (9)

📓 Common learnings

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

📚 Learning: 2025-08-15T11:42:00.297Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

lib/collection/src/config.rs

📚 Learning: 2025-09-01T11:19:26.371Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7193
File: lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs:17-30
Timestamp: 2025-09-01T11:19:26.371Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_chunked_mmap_storage.rs, the ChunkedMmapVectors underlying implementation does not validate against zero-sized vectors, so adding such validation in QuantizedChunkedMmapStorage::new is unnecessary and would be redundant.

Applied to files:

lib/collection/src/config.rs

📚 Learning: 2025-08-11T07:57:01.399Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.

Applied to files:

lib/collection/src/config.rs

📚 Learning: 2025-08-21T13:45:05.899Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7113
File: lib/quantization/src/encoded_vectors_binary.rs:515-525
Timestamp: 2025-08-21T13:45:05.899Z
Learning: The `EncodedStorage` trait in `lib/quantization/src/encoded_storage.rs` does not have a `quantized_vector_size()` method. To validate vector sizes from storage, use `get_vector_data(index).len()` to get the actual size of stored vectors.

Applied to files:

lib/collection/src/config.rs

📚 Learning: 2025-08-11T00:37:34.100Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 6986
File: lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs:46-55
Timestamp: 2025-08-11T00:37:34.100Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_ram_storage.rs`, the `ChunkedVectors` used in `QuantizedRamStorage` is intentionally designed to be non-persistent during updates. The `push_vector` method only updates the in-memory vectors without flushing to disk, and this is expected behavior rather than a bug.

Applied to files:

lib/collection/src/config.rs

📚 Learning: 2025-08-15T11:41:10.629Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:857-857
Timestamp: 2025-08-15T11:41:10.629Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the multivector offset storage has two different patterns: for RAM storage, offsets are collected into Vec<MultivectorOffset> and used directly; for MMAP storage, offsets are consumed to create a file via create_offsets_file_from_iter, then the file is loaded back as MultivectorOffsetsStorageMmap. The direct consumption of offsets iterator in the MMAP case is intentional.

Applied to files:

lib/collection/src/config.rs

📚 Learning: 2025-08-18T10:56:43.707Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7043
File: lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs:340-347
Timestamp: 2025-08-18T10:56:43.707Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_multivector_storage.rs, the create_offsets_file_from_iter function intentionally requires paths to have a parent directory and returns an error if path.parent() is None. This was a conscious design decision to ensure proper path validation.

Applied to files:

lib/collection/src/config.rs

📚 Learning: 2025-08-11T12:30:47.220Z

Learnt from: IvanPleshkov
Repo: qdrant/qdrant PR: 7009
File: lib/quantization/src/encoded_vectors_binary.rs:614-618
Timestamp: 2025-08-11T12:30:47.220Z
Learning: In `lib/quantization/src/encoded_vectors_binary.rs`, the `encode_two_bits_value` function has two paths: when `vector_stats` is available, it uses mean/stddev for accurate z-score based quantization boundaries; when stats are None (for appendable BQ), it falls back to a simple sign-based heuristic (positive -> (true, true), non-positive -> (false, false)). Having stats available provides higher accuracy for storage encoding.

Applied to files:

lib/collection/src/config.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: lint
GitHub Check: test-shard-snapshot-api-s3-minio
GitHub Check: integration-tests
GitHub Check: e2e-tests
GitHub Check: test-consensus-compose
GitHub Check: integration-tests-consensus
GitHub Check: test-consistency
GitHub Check: rust-tests (macos-latest)
GitHub Check: rust-tests (windows-latest)
GitHub Check: rust-tests (ubuntu-latest)
GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)

🔇 Additional comments (2)

tests/openapi/test_vector_config.py (1)

247-247: LGTM! Test expectation correctly updated.

The reduction from 2 to 1 expected warning aligns with the updated logic in lib/collection/src/config.rs. With the reordered conditional checks, when both inline_storage and multivector_config are present, only the multivector incompatibility warning is generated (avoiding the misleading quantization requirement message).

lib/collection/src/config.rs (1)

291-307: LGTM! Improved warning clarity through reordered checks.

The conditional reordering successfully addresses the PR objective by checking for multivector configuration first. This prevents the misleading "quantization required" message from appearing for multivectors. The use of else if ensures only one warning is generated per vector configuration, which is the correct behavior.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

* Clarify multivector inline storage warning * do what I actually wanted to do in first place * fix test

Clarify multivector inline storage warning

449c855

agourlay marked this pull request as ready for review November 12, 2025 13:57

qdrant deleted a comment from coderabbitai bot Nov 12, 2025

xzfc approved these changes Nov 12, 2025

View reviewed changes

do what I actually wanted to do in first place

6abb191

qdrant deleted a comment from coderabbitai bot Nov 12, 2025

timvisee approved these changes Nov 12, 2025

View reviewed changes

fix test

5281d50

agourlay merged commit b7da071 into dev Nov 12, 2025
15 checks passed

agourlay deleted the clarify-inline-storage-multivector-warning branch November 12, 2025 14:52

timvisee pushed a commit that referenced this pull request Nov 14, 2025

Clarify multivector inline storage warning (#7523)

7a80e23

* Clarify multivector inline storage warning * do what I actually wanted to do in first place * fix test

timvisee mentioned this pull request Nov 14, 2025

Bump version to 1.16.0 #7535

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify multivector inline storage warning#7523

Clarify multivector inline storage warning#7523
agourlay merged 3 commits intodevfrom
clarify-inline-storage-multivector-warning

agourlay commented Nov 12, 2025 •

edited

Loading

Uh oh!

xzfc Nov 12, 2025

Uh oh!

agourlay Nov 12, 2025

Uh oh!

xzfc Nov 12, 2025

Uh oh!

agourlay Nov 12, 2025

Uh oh!

timvisee left a comment

Uh oh!

coderabbitai bot commented Nov 12, 2025

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

agourlay commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xzfc Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

agourlay Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

xzfc Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

agourlay Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

timvisee left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Nov 12, 2025

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

agourlay commented Nov 12, 2025 •

edited

Loading