Conversation
📝 WalkthroughWalkthroughThis change introduces support for asymmetric binary quantization query encoding throughout the API, gRPC definitions, internal configuration types, and vector storage logic. A new optional field, Possibly related PRs
Suggested labels
Suggested reviewers
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Clippy (1.86.0)error: failed to load source for dependency Caused by: Caused by: Caused by: ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
docs/redoc/master/openapi.json (1)
7033-7040: Clean and extensible enum implementation for query quantization options.The
QueryQuantizationConfigschema provides a concise set of options for configuring asymmetric quantization:
default: Likely uses the collection's default quantizationbinary: Uses binary quantization for queriesscalar: Uses scalar quantization for queries (particularly useful for BQ vectors with SQ queries as mentioned in the PR objectives)This enum-based approach allows for future expansion if additional quantization methods are developed.
Consider adding slightly more detailed descriptions for each enum value directly in the schema to make the documentation more self-contained. For example:
"QueryQuantizationConfig": { "type": "string", "enum": [ - "default", - "binary", - "scalar" + "default", // Uses the same quantization as stored vectors + "binary", // Uses binary quantization for queries + "scalar" // Uses scalar quantization for queries ] },Note: OpenAPI doesn't actually support comments in enum values, but this could be added as part of the enum description or in a separate documentation section.
lib/api/src/grpc/qdrant.rs (2)
465-466: Tag numbering seems inconsistentThe oneof field uses tag number 3, but it's the only field in this message. Consider using tag 1 instead for the first field in a message.
- #[prost(oneof = "query_quantization_config::Variant", tags = "3")] + #[prost(oneof = "query_quantization_config::Variant", tags = "1")]
510-516: Single variant in oneof seems unusualThe
Variantoneof only contains one option (Setting). This structure seems more complex than necessary for the current functionality, though it may be intentional for future extensibility.If extensibility is the goal, consider adding a comment explaining this design choice to help future maintainers understand the pattern.
#[derive(serde::Serialize)] #[allow(clippy::derive_partial_eq_without_eq)] #[derive(Clone, PartialEq, ::prost::Oneof)] +/// Structured as a oneof for future extensibility of query quantization options pub enum Variant { #[prost(enumeration = "Setting", tag = "3")] Setting(i32), }docs/grpc/docs.md (4)
60-60: Fix TOC Unordered List Indentation
Static analysis indicates that the list item at this line is indented with 4 spaces instead of the expected 2. Please adjust the indentation to comply with markdown guidelines (MD007).🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
60-60: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
105-105: Fix TOC Unordered List Indentation
The unordered list item at this line also uses 4 spaces for indentation while 2 spaces are expected. Correcting this will improve markdown consistency.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
105-105: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
1240-1255: Document the New QueryQuantizationConfig Section
A new section forQueryQuantizationConfighas been added which currently documents only one field:setting. At this point, the description for the field is blank. It would be beneficial to include at least a brief description outlining what theQueryQuantizationConfigis for and how thesettingfield influences asymmetric quantization behavior in the system.
1972-1981: Enhance Descriptions in QueryQuantizationConfig.Setting
The newly introduced section forQueryQuantizationConfig.Settinglists the available options (Default,Binary, andScalar), but it does not provide any explanatory details about each option. Providing more context about the implications of selecting each setting (for example, when a user might choose Binary vs. Scalar quantization) would improve the usefulness of the documentation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
docs/grpc/docs.md(5 hunks)docs/redoc/master/openapi.json(1 hunks)lib/api/src/grpc/conversions.rs(3 hunks)lib/api/src/grpc/proto/collections.proto(2 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/types.rs(2 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(1 hunks)lib/segment/tests/integration/byte_storage_quantization_test.rs(1 hunks)lib/segment/tests/integration/multivector_quantization_test.rs(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
60-60: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
105-105: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🔇 Additional comments (18)
docs/redoc/master/openapi.json (1)
7020-7030: Well-implemented new feature for asymmetric quantization.The addition of the
query_quantizationfield to theBinaryQuantizationConfigschema enables asymmetric quantization, allowing queries to use different quantization methods than stored vectors. This aligns perfectly with the PR objective of enhancing accuracy in scoring through different quantization algorithms.The implementation follows proper OpenAPI standards with:
- Clear description of the feature and its trade-offs
- Correct schema referencing
- Proper handling of nullable values
lib/api/src/grpc/qdrant.rs (3)
433-436: Well-documented field additionThe documentation clearly explains the purpose and tradeoff of asymmetric quantization. This helps users understand when to use this feature.
461-467: Good structure for the new QueryQuantizationConfigThe new message type follows the standard protobuf pattern and includes all necessary attributes for serialization.
483-487: Clear enum values for quantization settingsThe enum values are well-named and provide a good starting point for the available query quantization options.
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1)
160-176: Addition ofquery_quantizationfield aligns with new asymmetric quantization APIThe addition of the
query_quantization: Nonefield to theBinaryQuantizationConfigstructure is appropriate for supporting the new asymmetric quantization API while maintaining backward compatibility in testing.This change ensures that the scoring equivalency tests will continue to work correctly with the enhanced binary quantization configuration without altering existing test behavior.
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1)
132-137: Addition ofquery_quantizationfield properly integrates with GPU vector storage testsThe addition of the
query_quantization: Nonefield to theBinaryQuantizationConfigstructure in the GPU vector storage test is consistent with the changes in other files.This ensures that the GPU-accelerated binary quantization tests continue to function while supporting the new asymmetric quantization capability.
lib/segment/tests/integration/multivector_quantization_test.rs (1)
284-289: Addition ofquery_quantizationfield maintains compatibility with multivector HNSW testsThe addition of the
query_quantization: Nonefield to theBinaryQuantizationConfigin the multivector quantization tests properly accounts for the new asymmetric quantization feature.This change allows the existing test cases to continue validating the basic binary quantization functionality (with a default
Nonevalue), while leaving room for future tests to explicitly test asymmetric quantization behavior.lib/segment/src/types.rs (3)
661-664: Good addition of asymmetric quantization configurationThe new
query_quantizationfield inBinaryQuantizationConfigis well-documented, clearly explaining the trade-off between accuracy and performance. The use ofserde_skip_conditionto avoid serializing default values is a good practice for keeping the JSON output clean.
703-713: LGTM! Clean enum definition for query quantization optionsThe
QueryQuantizationConfigenum is well-designed with appropriate derives and consistent with the codebase style. The lowercase serialization ensures compatibility with JSON conventions.
715-719: Smart serialization condition implementationThe
serde_skip_conditionmethod is a good approach to avoid serializing default values, which helps maintain clean and minimal serialized output. This pattern ensures that only non-default configurations appear in the serialized data.lib/api/src/grpc/proto/collections.proto (2)
300-305: Implementation matches PR objectivesThe addition of the
query_quantizationfield toBinaryQuantizationmessage successfully enables asymmetric quantization as described in the PR objectives. The documentation comment effectively explains the purpose and tradeoffs.
316-326: Enum fields are well-structuredThe
QueryQuantizationConfigmessage with itsSettingenum follows the established pattern in the codebase (similar toMaxOptimizationThreads). The three options (Default,Binary,Scalar) provide a good foundation for the asymmetric quantization feature.lib/segment/tests/integration/byte_storage_quantization_test.rs (1)
319-322: Test updated correctly for new fieldThe test has been properly updated to include the new
query_quantizationfield in theBinaryQuantizationConfigstructure. Setting it toNoneis appropriate for maintaining backward compatibility with existing test cases.lib/api/src/grpc/conversions.rs (5)
48-50: Imports for new QueryQuantizationConfig look fine.
These additions are straightforward and align with the new feature.
1011-1018: Forward conversion for new field is clear and consistent.
Mappingquery_quantizationviamap(From::from)cleanly integrates the optional field without errors.
1026-1034: Reverse conversion logic handles errors gracefully.
Using.map(TryFrom::try_from).transpose()?is a neat approach to propagate any error in the optional field.
1082-1108: QueryQuantizationConfig's TryFrom implementation is robust.
The error handling for missing or invalid variants is correct, ensuring malformed configurations return an appropriateStatus.
1110-1130: Reverse conversion matches settings accurately.
Variants are properly mapped back toQueryQuantizationConfigwith no omissions.
|
If we want to merge this PR in dev without actual implementation of the functionality, we need to hide it from OpenAPI speck, so the we could still release a patch version |
cff1f5b to
00660ef
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (4)
check.sh (1)
16-18: Surface HTTP errors in curl.When using
-s, error messages are suppressed. Combine with-S(or--show-error) to ensure you still see server-side errors.- --fail -s | jq + --fail -sS | jqdocs/grpc/docs.md (3)
60-60: Fix list indentation to satisfy markdownlint.The new ToC entry uses 4 spaces indent (MD007 warns); adjust to 2 spaces to match other nested list items.
- - [QueryQuantizationConfig](#qdrant-QueryQuantizationConfig) + - [QueryQuantizationConfig](#qdrant-QueryQuantizationConfig)
107-107: Fix list indentation to satisfy markdownlint.Same issue for the enum ToC entry—use 2 spaces indent instead of 4.
- - [QueryQuantizationConfig.Setting](#qdrant-QueryQuantizationConfig-Setting) + - [QueryQuantizationConfig.Setting](#qdrant-QueryQuantizationConfig-Setting)
1247-1256: Enrich field table with label and description.The
settingfield currently has empty Label/Description columns. Mark it asoptionaland give a concise description.### QueryQuantizationConfig -| setting | [QueryQuantizationConfig.Setting](#qdrant-QueryQuantizationConfig-Setting) | | | +| setting | [QueryQuantizationConfig.Setting](#qdrant-QueryQuantizationConfig-Setting) | optional | Query quantization setting to override stored vector quantization for search queries. |
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
check.sh(1 hunks)docs/grpc/docs.md(5 hunks)docs/redoc/master/openapi.json(0 hunks)lib/api/src/grpc/conversions.rs(5 hunks)lib/api/src/grpc/proto/collections.proto(2 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/quantization/benches/binary.rs(2 hunks)lib/quantization/src/encoded_vectors_binary.rs(12 hunks)lib/quantization/tests/integration/empty_storage.rs(1 hunks)lib/quantization/tests/integration/test_binary.rs(13 hunks)lib/quantization/tests/integration/test_binary_encodings.rs(2 hunks)
💤 Files with no reviewable changes (1)
- docs/redoc/master/openapi.json
🚧 Files skipped from review as they are similar to previous changes (3)
- lib/api/src/grpc/proto/collections.proto
- lib/api/src/grpc/conversions.rs
- lib/api/src/grpc/qdrant.rs
🧰 Additional context used
🪛 Shellcheck (0.10.0)
check.sh
[warning] 13-13: Quote this to prevent word splitting.
(SC2046)
🪛 GitHub Actions: Formatter and linter
lib/quantization/src/encoded_vectors_binary.rs
[error] 604-604: cargo clippy: unused variable: sum. If intentional, prefix it with an underscore: _sum. Compilation failed due to this lint error.
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
60-60: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
107-107: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consistency
- GitHub Check: integration-tests
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: storage-compat-test
🔇 Additional comments (10)
check.sh (1)
1-3: Shebang and description look good.The use of
/usr/bin/env bashand the initial comment accurately describe the script’s purpose.lib/quantization/tests/integration/empty_storage.rs (1)
105-105: LGTM! Consistent API update for asymmetric quantization support.The addition of the
Noneparameter for the newquery_encodingoption correctly maintains backward compatibility in this empty data test case.lib/quantization/benches/binary.rs (1)
43-43: LGTM! Consistent benchmark updates for new API signature.Both
EncodedVectorsBin::encodecalls are correctly updated with the newquery_encodingparameter set toNone, maintaining existing benchmark behavior while supporting the new asymmetric quantization API.Also applies to: 84-84
lib/quantization/tests/integration/test_binary.rs (2)
53-53: LGTM! Consistent API updates.All
encodecalls have been correctly updated to include the new optionalquery_encodingparameter asNone, maintaining backward compatibility.Also applies to: 101-101, 149-149, 194-194, 238-238, 301-301, 364-364, 424-424, 484-484, 547-547, 610-610, 670-670
699-744: Well-structured test for scalar query encoding.The test correctly validates that
score_pointandscore_internalproduce identical results when using scalar query encoding, ensuring consistency in the implementation.lib/quantization/tests/integration/test_binary_encodings.rs (1)
124-227: Comprehensive test coverage for asymmetric query encoding.The tests effectively validate that scalar query encodings (4-bit and 8-bit) maintain or improve accuracy compared to binary encoding across various dimensions and encoding types.
lib/quantization/src/encoded_vectors_binary.rs (4)
38-44: Good enum design for query encoding options.The
QueryEncodingenum provides clear options with a sensible default (SameAsStorage) that maintains backward compatibility.
440-461: Clean implementation of query encoding dispatch.The method effectively handles different query encoding strategies while maintaining backward compatibility through the optional parameter.
463-526: Solid implementation of scalar query quantization.The scalar encoding implementation correctly:
- Handles different bit depths (4-bit and 8-bit)
- Uses symmetric quantization range
- Properly handles bit packing into storage types
730-759: Proper handling of different query encoding types in scoring.The updated
score_pointmethod correctly dispatches to the appropriate metric calculation based on the query encoding type.
3bc5977 to
800793b
Compare
2d10d71 to
8309829
Compare
800793b to
7023efc
Compare
fix tests Welford's Algorithm review remarks remove debug println in test try fix ci revert cargo-nextest async query 4 and 8 bits use quantized storage for query in hnsw build fix after rebase fix tests Welford's Algorithm review remarks remove debug println in test try fix ci revert cargo-nextest async query 4 and 8 bits use quantized storage for query in hnsw build fix after rebase are you happy fmt x64 simd for u128 fmt Asymmetric quantization api for BQ only fix ci fix error after rebase and 4bit support serde default quantization config without option
f7abf4e to
137ef3c
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
1016-1030: Code duplication: Same mapping logic as in create_binary function.This mapping logic is identical to the one in
create_binaryfunction (lines 939-953). Please refer to the previous suggestion to extract this into a helper function.
🧹 Nitpick comments (6)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
939-953: Consider extracting the mapping logic to reduce code duplication.The query encoding mapping logic is correct and handles all enum variants appropriately. However, this same logic is duplicated in the
create_binary_multifunction (lines 1016-1030).Consider extracting this mapping to a helper function:
+ fn map_query_encoding(query_encoding: Option<BinaryQuantizationQueryEncoding>) -> quantization::encoded_vectors_binary::QueryEncoding { + match query_encoding { + Some(BinaryQuantizationQueryEncoding::Scalar4Bits) => { + quantization::encoded_vectors_binary::QueryEncoding::Scalar4bits + } + Some(BinaryQuantizationQueryEncoding::Scalar8Bits) => { + quantization::encoded_vectors_binary::QueryEncoding::Scalar8bits + } + Some(BinaryQuantizationQueryEncoding::Binary) => { + quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage + } + Some(BinaryQuantizationQueryEncoding::Default) => { + quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage + } + None => quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage, + } + } - let query_encoding = match binary_config.query_encoding { - Some(BinaryQuantizationQueryEncoding::Scalar4Bits) => { - quantization::encoded_vectors_binary::QueryEncoding::Scalar4bits - } - Some(BinaryQuantizationQueryEncoding::Scalar8Bits) => { - quantization::encoded_vectors_binary::QueryEncoding::Scalar8bits - } - Some(BinaryQuantizationQueryEncoding::Binary) => { - quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage - } - Some(BinaryQuantizationQueryEncoding::Default) => { - quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage - } - None => quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage, - }; + let query_encoding = Self::map_query_encoding(binary_config.query_encoding);docs/grpc/docs.md (5)
10-12: Fix list indentation to satisfy markdownlint MD007Current bullet has 4-space indentation, expected 2.
Adjusting keeps the ToC consistent and silences the linter.- - [BinaryQuantizationQueryEncoding](#qdrant-BinaryQuantizationQueryEncoding) + - [BinaryQuantizationQueryEncoding](#qdrant-BinaryQuantizationQueryEncoding)
99-101: Indentation of sub-bullet breaks nested list renderingSame MD007 issue for the
BinaryQuantizationQueryEncoding.Settingentry.- - [BinaryQuantizationQueryEncoding.Setting](#qdrant-BinaryQuantizationQueryEncoding-Setting) + - [BinaryQuantizationQueryEncoding.Setting](#qdrant-BinaryQuantizationQueryEncoding-Setting)
404-405: Document default behaviour of the newquery_encodingfieldReaders need to know what happens when the field is omitted (current implementation falls back to
Default, i.e. use storage encoding). Add a short note to avoid ambiguity.-| query_encoding | [BinaryQuantizationQueryEncoding](#qdrant-BinaryQuantizationQueryEncoding) | optional | Asymmetric quantization configuration allows a query to have different quantization than stored vectors. It can increase the accuracy of search at the cost of performance. | +| query_encoding | [BinaryQuantizationQueryEncoding](#qdrant-BinaryQuantizationQueryEncoding) | optional | Asymmetric quantization configuration. If **unset** the query uses the same encoding as stored vectors (`Default`). Can improve accuracy at the cost of performance. |
417-420: Missing description forsettingfieldThe description column is blank, making the row look malformed.
-| setting | [BinaryQuantizationQueryEncoding.Setting](#qdrant-BinaryQuantizationQueryEncoding-Setting) | | | +| setting | [BinaryQuantizationQueryEncoding.Setting](#qdrant-BinaryQuantizationQueryEncoding-Setting) | | Choice of query-side encoding algorithm |
1923-1929: Provide human-readable explanations for enum variantsEmpty description cells reduce usefulness of generated docs and OpenAPI. A one-liner improves clarity without bloating the table.
-| Default | 0 | | -| Binary | 1 | | -| Scalar4Bits | 2 | | -| Scalar8Bits | 3 | | +| Default | 0 | Use the same encoding as stored vectors | +| Binary | 1 | 1-bit binary encoding | +| Scalar4Bits | 2 | 4-bit scalar quantization | +| Scalar8Bits | 3 | 8-bit scalar quantization |
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
docs/grpc/docs.md(4 hunks)docs/redoc/master/openapi.json(2 hunks)lib/api/src/grpc/conversions.rs(5 hunks)lib/api/src/grpc/proto/collections.proto(1 hunks)lib/api/src/grpc/qdrant.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/types.rs(2 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(7 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(1 hunks)lib/segment/tests/integration/byte_storage_quantization_test.rs(1 hunks)lib/segment/tests/integration/multivector_quantization_test.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
- lib/segment/tests/integration/byte_storage_quantization_test.rs
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
- lib/api/src/grpc/qdrant.rs
- lib/segment/tests/integration/multivector_quantization_test.rs
- lib/api/src/grpc/proto/collections.proto
- lib/segment/src/types.rs
- lib/api/src/grpc/conversions.rs
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3)
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
docs/redoc/master/openapi.json (1)
undefined
<retrieved_learning>
Learnt from: IvanPleshkov
PR: #6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
</retrieved_learning>
docs/grpc/docs.md (2)
undefined
<retrieved_learning>
Learnt from: IvanPleshkov
PR: #6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
</retrieved_learning>
<retrieved_learning>
Learnt from: IvanPleshkov
PR: #6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
</retrieved_learning>
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
11-11: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
100-100: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-low-resources
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
🔇 Additional comments (3)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3)
23-27: LGTM: Import changes are correct.The addition of
BinaryQuantizationQueryEncodingto the imports is properly structured and aligns with the new asymmetric quantization functionality.
969-969: LGTM: Query encoding parameter correctly passed to encode function.The
query_encodingparameter is properly passed to theEncodedVectorsBin::encodecall, enabling asymmetric quantization functionality.
1045-1045: LGTM: Query encoding consistently passed in multi-vector functions.The
query_encodingparameter is correctly passed to both RAM and mmap variants of the multi-vector encode functions, maintaining consistency with the single-vector implementation.Also applies to: 1068-1068
| "BinaryQuantizationQueryEncoding": { | ||
| "type": "string", | ||
| "enum": [ | ||
| "default", | ||
| "binary", | ||
| "scalar4bits", | ||
| "scalar8bits" | ||
| ] | ||
| }, |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Enum naming & default value missing
- The value
"default"insideBinaryQuantizationQueryEncodingis ambiguous: it is both an enum member and a generic English word. Client SDKs will end up with identifiers likeDefault_orDefaultEnum, which is confusing. Prefer a descriptive value such as"same_as_vector"or"none". - Add an explicit
"default": "default"(or the renamed value) if you want clients to omit the field while retaining current behaviour. Without it, generators cannot infer the default.
🤖 Prompt for AI Agents
In docs/redoc/master/openapi.json around lines 7051 to 7059, the enum value
"default" in BinaryQuantizationQueryEncoding is ambiguous and lacks an explicit
default declaration. Rename the "default" enum value to a more descriptive term
like "same_as_vector" or "none" to avoid confusion in client SDKs, and add a
"default" property with this renamed value to explicitly specify the default
behavior for code generators.
| }, | ||
| "query_encoding": { | ||
| "description": "Asymmetric quantization configuration allows a query to have different quantization than stored vectors. It can increase the accuracy of search at the cost of performance.", | ||
| "anyOf": [ | ||
| { | ||
| "$ref": "#/components/schemas/BinaryQuantizationQueryEncoding" | ||
| }, | ||
| { | ||
| "nullable": true | ||
| } | ||
| ] |
There was a problem hiding this comment.
query_encoding schema composition is non-standard and may break tooling
Wrapping the real schema and a naked { "nullable": true } inside anyOf produces a schema fragment with no type; several generators (e.g. openapi-generator, go-swag) reject or silently ignore such constructs.
The canonical pattern for an optional nullable enum is either of:
"query_encoding": {
"nullable": true,
"$ref": "#/components/schemas/BinaryQuantizationQueryEncoding"
}or
"query_encoding": {
"allOf": [
{ "$ref": "#/components/schemas/BinaryQuantizationQueryEncoding" }
],
"nullable": true
}Please switch to one of these to keep the spec OAS-3 compliant and generator-friendly.
If the feature is still experimental consider gating it behind x-internal: true.
🤖 Prompt for AI Agents
In docs/redoc/master/openapi.json around lines 7029 to 7039, the
"query_encoding" schema uses an anyOf composition with a schema reference and a
naked nullable true, which is non-standard and causes issues with some OpenAPI
generators. To fix this, replace the anyOf array with a single schema object
that includes "nullable": true alongside the "$ref" to
BinaryQuantizationQueryEncoding, or use an allOf array with the schema reference
and add "nullable": true at the same level. This will ensure the schema is OAS-3
compliant and compatible with tooling.
* bq encodings * update docs * fix after rebase * fix gpu build * fix build after rebase
This PR introduces an API for an upcoming feature: asymmetric quantization.
Motivation
Asymmetric quantization is a quantization where vector and query have different quantization algorithms. In this PR we introduce only one asymmetric option: BQ with query encoded as SQ.
Such methods allow for increased accuracy because a more accurate query produces a more accurate score. But scoring will be more complicated, as not each use case is suitable for asymmetric mechanisms.
An example of BQ quantization with SQ query is a RaBitQ quantization. We want to have something similar.
This PR turns on a query quantization strategy in collection parameters. It's possible in the future to add this parameter as a search parameter. But for now, we keep it at the collection level. We need to have it in the collection config because this option is also applied during HNSW construction.
REST API example:
Possible values here are
scalar8bits,scalar4bits,binary, anddefault.defaultis equal to the undefined config value.GRPC API example:
Grpc API looks more complicated with an additional
settingfield. But this API allows us to extend features and provide some parameters if needed in the future