Skip to content

Asymmetric quantization api for BQ only#6201

Merged
generall merged 5 commits intodevfrom
Asymmetric-quantization-api-for-BQ-only
Jul 3, 2025
Merged

Asymmetric quantization api for BQ only#6201
generall merged 5 commits intodevfrom
Asymmetric-quantization-api-for-BQ-only

Conversation

@IvanPleshkov
Copy link
Contributor

@IvanPleshkov IvanPleshkov commented Mar 19, 2025

This PR introduces an API for an upcoming feature: asymmetric quantization.

Motivation

Asymmetric quantization is a quantization where vector and query have different quantization algorithms. In this PR we introduce only one asymmetric option: BQ with query encoded as SQ.

Such methods allow for increased accuracy because a more accurate query produces a more accurate score. But scoring will be more complicated, as not each use case is suitable for asymmetric mechanisms.

An example of BQ quantization with SQ query is a RaBitQ quantization. We want to have something similar.

This PR turns on a query quantization strategy in collection parameters. It's possible in the future to add this parameter as a search parameter. But for now, we keep it at the collection level. We need to have it in the collection config because this option is also applied during HNSW construction.

REST API example:

curl -X PUT "http://$QDRANT_HOST/collections/test_collection" \
  --data-raw '{
      "vectors": {
        "size": 4,
        "distance": "Dot"
      },
      "optimizers_config": {
        "default_segment_number": 2
      },
      "quantization_config": {
        "binary": {
          "always_ram": true,
          "query_encoding": "scalar8bits"
        }
      },
      "replication_factor": 2
    }'

Possible values here are scalar8bits, scalar4bits, binary, and default. default is equal to the undefined config value.

GRPC API example:

"${docker_grpcurl[@]}" -d '{
  "collection_name": "test_collection",
  "vectors_config": {
    "params": {
      "size": 4,
      "distance": "Dot"
    }
  },
  "quantization_config": {
    "binary": {
      "always_ram": true,
      "query_encoding": {
        "setting": "Scalar8Bits"
      }
    }
  }
}' $QDRANT_HOST qdrant.Collections/Create

Grpc API looks more complicated with an additional setting field. But this API allows us to extend features and provide some parameters if needed in the future

@IvanPleshkov IvanPleshkov marked this pull request as ready for review March 20, 2025 10:58
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2025

📝 Walkthrough

Walkthrough

This change introduces support for asymmetric binary quantization query encoding throughout the API, gRPC definitions, internal configuration types, and vector storage logic. A new optional field, query_encoding, is added to the binary quantization configuration structures, allowing queries to use a different quantization encoding than stored vectors. The new enum BinaryQuantizationQueryEncoding (with values Default, Binary, Scalar4Bits, Scalar8Bits) is defined and propagated through OpenAPI, Protobuf, Rust types, and conversion implementations. Tests and documentation are updated to reflect the new field, with default usage set to None in existing test configurations.

Possibly related PRs

  • Asymmetric binary quantization #6728: Implements the internal asymmetric binary quantization encoding and scoring logic, including SIMD-accelerated XOR-popcount functions, which is directly related to this PR’s API and conversion changes for supporting asymmetric query encoding.

Suggested labels

chore

Suggested reviewers

  • generall

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Clippy (1.86.0)
Updating git repository `https://github.com/qdrant/tar-rs`

error: failed to load source for dependency tar

Caused by:
Unable to update https://github.com/qdrant/tar-rs?branch=main#856dbd09

Caused by:
failed to create directory /usr/local/git/db/tar-rs-f03560d008573c0e

Caused by:
Permission denied (os error 13)

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
docs/redoc/master/openapi.json (1)

7033-7040: Clean and extensible enum implementation for query quantization options.

The QueryQuantizationConfig schema provides a concise set of options for configuring asymmetric quantization:

  • default: Likely uses the collection's default quantization
  • binary: Uses binary quantization for queries
  • scalar: Uses scalar quantization for queries (particularly useful for BQ vectors with SQ queries as mentioned in the PR objectives)

This enum-based approach allows for future expansion if additional quantization methods are developed.

Consider adding slightly more detailed descriptions for each enum value directly in the schema to make the documentation more self-contained. For example:

"QueryQuantizationConfig": {
  "type": "string",
  "enum": [
-   "default",
-   "binary",
-   "scalar"
+   "default",   // Uses the same quantization as stored vectors
+   "binary",    // Uses binary quantization for queries
+   "scalar"     // Uses scalar quantization for queries
  ]
},

Note: OpenAPI doesn't actually support comments in enum values, but this could be added as part of the enum description or in a separate documentation section.

lib/api/src/grpc/qdrant.rs (2)

465-466: Tag numbering seems inconsistent

The oneof field uses tag number 3, but it's the only field in this message. Consider using tag 1 instead for the first field in a message.

-    #[prost(oneof = "query_quantization_config::Variant", tags = "3")]
+    #[prost(oneof = "query_quantization_config::Variant", tags = "1")]

510-516: Single variant in oneof seems unusual

The Variant oneof only contains one option (Setting). This structure seems more complex than necessary for the current functionality, though it may be intentional for future extensibility.

If extensibility is the goal, consider adding a comment explaining this design choice to help future maintainers understand the pattern.

#[derive(serde::Serialize)]
#[allow(clippy::derive_partial_eq_without_eq)]
#[derive(Clone, PartialEq, ::prost::Oneof)]
+/// Structured as a oneof for future extensibility of query quantization options
pub enum Variant {
    #[prost(enumeration = "Setting", tag = "3")]
    Setting(i32),
}
docs/grpc/docs.md (4)

60-60: Fix TOC Unordered List Indentation
Static analysis indicates that the list item at this line is indented with 4 spaces instead of the expected 2. Please adjust the indentation to comply with markdown guidelines (MD007).

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

60-60: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


105-105: Fix TOC Unordered List Indentation
The unordered list item at this line also uses 4 spaces for indentation while 2 spaces are expected. Correcting this will improve markdown consistency.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

105-105: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


1240-1255: Document the New QueryQuantizationConfig Section
A new section for QueryQuantizationConfig has been added which currently documents only one field: setting. At this point, the description for the field is blank. It would be beneficial to include at least a brief description outlining what the QueryQuantizationConfig is for and how the setting field influences asymmetric quantization behavior in the system.


1972-1981: Enhance Descriptions in QueryQuantizationConfig.Setting
The newly introduced section for QueryQuantizationConfig.Setting lists the available options (Default, Binary, and Scalar), but it does not provide any explanatory details about each option. Providing more context about the implications of selecting each setting (for example, when a user might choose Binary vs. Scalar quantization) would improve the usefulness of the documentation.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75eb0b3 and cff1f5b.

📒 Files selected for processing (10)
  • docs/grpc/docs.md (5 hunks)
  • docs/redoc/master/openapi.json (1 hunks)
  • lib/api/src/grpc/conversions.rs (3 hunks)
  • lib/api/src/grpc/proto/collections.proto (2 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
  • lib/segment/src/types.rs (2 hunks)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1 hunks)
  • lib/segment/tests/integration/byte_storage_quantization_test.rs (1 hunks)
  • lib/segment/tests/integration/multivector_quantization_test.rs (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md

60-60: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


105-105: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

🔇 Additional comments (18)
docs/redoc/master/openapi.json (1)

7020-7030: Well-implemented new feature for asymmetric quantization.

The addition of the query_quantization field to the BinaryQuantizationConfig schema enables asymmetric quantization, allowing queries to use different quantization methods than stored vectors. This aligns perfectly with the PR objective of enhancing accuracy in scoring through different quantization algorithms.

The implementation follows proper OpenAPI standards with:

  • Clear description of the feature and its trade-offs
  • Correct schema referencing
  • Proper handling of nullable values
lib/api/src/grpc/qdrant.rs (3)

433-436: Well-documented field addition

The documentation clearly explains the purpose and tradeoff of asymmetric quantization. This helps users understand when to use this feature.


461-467: Good structure for the new QueryQuantizationConfig

The new message type follows the standard protobuf pattern and includes all necessary attributes for serialization.


483-487: Clear enum values for quantization settings

The enum values are well-named and provide a good starting point for the available query quantization options.

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1)

160-176: Addition of query_quantization field aligns with new asymmetric quantization API

The addition of the query_quantization: None field to the BinaryQuantizationConfig structure is appropriate for supporting the new asymmetric quantization API while maintaining backward compatibility in testing.

This change ensures that the scoring equivalency tests will continue to work correctly with the enhanced binary quantization configuration without altering existing test behavior.

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1)

132-137: Addition of query_quantization field properly integrates with GPU vector storage tests

The addition of the query_quantization: None field to the BinaryQuantizationConfig structure in the GPU vector storage test is consistent with the changes in other files.

This ensures that the GPU-accelerated binary quantization tests continue to function while supporting the new asymmetric quantization capability.

lib/segment/tests/integration/multivector_quantization_test.rs (1)

284-289: Addition of query_quantization field maintains compatibility with multivector HNSW tests

The addition of the query_quantization: None field to the BinaryQuantizationConfig in the multivector quantization tests properly accounts for the new asymmetric quantization feature.

This change allows the existing test cases to continue validating the basic binary quantization functionality (with a default None value), while leaving room for future tests to explicitly test asymmetric quantization behavior.

lib/segment/src/types.rs (3)

661-664: Good addition of asymmetric quantization configuration

The new query_quantization field in BinaryQuantizationConfig is well-documented, clearly explaining the trade-off between accuracy and performance. The use of serde_skip_condition to avoid serializing default values is a good practice for keeping the JSON output clean.


703-713: LGTM! Clean enum definition for query quantization options

The QueryQuantizationConfig enum is well-designed with appropriate derives and consistent with the codebase style. The lowercase serialization ensures compatibility with JSON conventions.


715-719: Smart serialization condition implementation

The serde_skip_condition method is a good approach to avoid serializing default values, which helps maintain clean and minimal serialized output. This pattern ensures that only non-default configurations appear in the serialized data.

lib/api/src/grpc/proto/collections.proto (2)

300-305: Implementation matches PR objectives

The addition of the query_quantization field to BinaryQuantization message successfully enables asymmetric quantization as described in the PR objectives. The documentation comment effectively explains the purpose and tradeoffs.


316-326: Enum fields are well-structured

The QueryQuantizationConfig message with its Setting enum follows the established pattern in the codebase (similar to MaxOptimizationThreads). The three options (Default, Binary, Scalar) provide a good foundation for the asymmetric quantization feature.

lib/segment/tests/integration/byte_storage_quantization_test.rs (1)

319-322: Test updated correctly for new field

The test has been properly updated to include the new query_quantization field in the BinaryQuantizationConfig structure. Setting it to None is appropriate for maintaining backward compatibility with existing test cases.

lib/api/src/grpc/conversions.rs (5)

48-50: Imports for new QueryQuantizationConfig look fine.
These additions are straightforward and align with the new feature.


1011-1018: Forward conversion for new field is clear and consistent.
Mapping query_quantization via map(From::from) cleanly integrates the optional field without errors.


1026-1034: Reverse conversion logic handles errors gracefully.
Using .map(TryFrom::try_from).transpose()? is a neat approach to propagate any error in the optional field.


1082-1108: QueryQuantizationConfig's TryFrom implementation is robust.
The error handling for missing or invalid variants is correct, ensuring malformed configurations return an appropriate Status.


1110-1130: Reverse conversion matches settings accurately.
Variants are properly mapped back to QueryQuantizationConfig with no omissions.

@generall
Copy link
Member

If we want to merge this PR in dev without actual implementation of the functionality, we need to hide it from OpenAPI speck, so the we could still release a patch version

@IvanPleshkov IvanPleshkov force-pushed the Asymmetric-quantization-api-for-BQ-only branch from cff1f5b to 00660ef Compare June 23, 2025 18:40
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (4)
check.sh (1)

16-18: Surface HTTP errors in curl.

When using -s, error messages are suppressed. Combine with -S (or --show-error) to ensure you still see server-side errors.

-  --fail -s | jq
+  --fail -sS | jq
docs/grpc/docs.md (3)

60-60: Fix list indentation to satisfy markdownlint.

The new ToC entry uses 4 spaces indent (MD007 warns); adjust to 2 spaces to match other nested list items.

-    - [QueryQuantizationConfig](#qdrant-QueryQuantizationConfig)
+  - [QueryQuantizationConfig](#qdrant-QueryQuantizationConfig)

107-107: Fix list indentation to satisfy markdownlint.

Same issue for the enum ToC entry—use 2 spaces indent instead of 4.

-    - [QueryQuantizationConfig.Setting](#qdrant-QueryQuantizationConfig-Setting)
+  - [QueryQuantizationConfig.Setting](#qdrant-QueryQuantizationConfig-Setting)

1247-1256: Enrich field table with label and description.

The setting field currently has empty Label/Description columns. Mark it as optional and give a concise description.

 ### QueryQuantizationConfig

-| setting | [QueryQuantizationConfig.Setting](#qdrant-QueryQuantizationConfig-Setting) |  |  |
+| setting | [QueryQuantizationConfig.Setting](#qdrant-QueryQuantizationConfig-Setting) | optional | Query quantization setting to override stored vector quantization for search queries. |
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cff1f5b and 00660ef.

📒 Files selected for processing (11)
  • check.sh (1 hunks)
  • docs/grpc/docs.md (5 hunks)
  • docs/redoc/master/openapi.json (0 hunks)
  • lib/api/src/grpc/conversions.rs (5 hunks)
  • lib/api/src/grpc/proto/collections.proto (2 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/quantization/benches/binary.rs (2 hunks)
  • lib/quantization/src/encoded_vectors_binary.rs (12 hunks)
  • lib/quantization/tests/integration/empty_storage.rs (1 hunks)
  • lib/quantization/tests/integration/test_binary.rs (13 hunks)
  • lib/quantization/tests/integration/test_binary_encodings.rs (2 hunks)
💤 Files with no reviewable changes (1)
  • docs/redoc/master/openapi.json
🚧 Files skipped from review as they are similar to previous changes (3)
  • lib/api/src/grpc/proto/collections.proto
  • lib/api/src/grpc/conversions.rs
  • lib/api/src/grpc/qdrant.rs
🧰 Additional context used
🪛 Shellcheck (0.10.0)
check.sh

[warning] 13-13: Quote this to prevent word splitting.

(SC2046)

🪛 GitHub Actions: Formatter and linter
lib/quantization/src/encoded_vectors_binary.rs

[error] 604-604: cargo clippy: unused variable: sum. If intentional, prefix it with an underscore: _sum. Compilation failed due to this lint error.

🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md

60-60: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


107-107: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consistency
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: storage-compat-test
🔇 Additional comments (10)
check.sh (1)

1-3: Shebang and description look good.

The use of /usr/bin/env bash and the initial comment accurately describe the script’s purpose.

lib/quantization/tests/integration/empty_storage.rs (1)

105-105: LGTM! Consistent API update for asymmetric quantization support.

The addition of the None parameter for the new query_encoding option correctly maintains backward compatibility in this empty data test case.

lib/quantization/benches/binary.rs (1)

43-43: LGTM! Consistent benchmark updates for new API signature.

Both EncodedVectorsBin::encode calls are correctly updated with the new query_encoding parameter set to None, maintaining existing benchmark behavior while supporting the new asymmetric quantization API.

Also applies to: 84-84

lib/quantization/tests/integration/test_binary.rs (2)

53-53: LGTM! Consistent API updates.

All encode calls have been correctly updated to include the new optional query_encoding parameter as None, maintaining backward compatibility.

Also applies to: 101-101, 149-149, 194-194, 238-238, 301-301, 364-364, 424-424, 484-484, 547-547, 610-610, 670-670


699-744: Well-structured test for scalar query encoding.

The test correctly validates that score_point and score_internal produce identical results when using scalar query encoding, ensuring consistency in the implementation.

lib/quantization/tests/integration/test_binary_encodings.rs (1)

124-227: Comprehensive test coverage for asymmetric query encoding.

The tests effectively validate that scalar query encodings (4-bit and 8-bit) maintain or improve accuracy compared to binary encoding across various dimensions and encoding types.

lib/quantization/src/encoded_vectors_binary.rs (4)

38-44: Good enum design for query encoding options.

The QueryEncoding enum provides clear options with a sensible default (SameAsStorage) that maintains backward compatibility.


440-461: Clean implementation of query encoding dispatch.

The method effectively handles different query encoding strategies while maintaining backward compatibility through the optional parameter.


463-526: Solid implementation of scalar query quantization.

The scalar encoding implementation correctly:

  • Handles different bit depths (4-bit and 8-bit)
  • Uses symmetric quantization range
  • Properly handles bit packing into storage types

730-759: Proper handling of different query encoding types in scoring.

The updated score_point method correctly dispatches to the appropriate metric calculation based on the query encoding type.

@IvanPleshkov IvanPleshkov marked this pull request as draft June 23, 2025 19:07
@IvanPleshkov IvanPleshkov changed the base branch from dev to 8bit-bq-query June 23, 2025 19:07
@IvanPleshkov IvanPleshkov force-pushed the Asymmetric-quantization-api-for-BQ-only branch from 3bc5977 to 800793b Compare June 24, 2025 01:11
@IvanPleshkov IvanPleshkov force-pushed the Asymmetric-quantization-api-for-BQ-only branch from 800793b to 7023efc Compare July 1, 2025 00:33
@IvanPleshkov IvanPleshkov marked this pull request as ready for review July 1, 2025 09:41
Base automatically changed from 8bit-bq-query to dev July 3, 2025 12:09
fix tests

Welford's Algorithm

review remarks

remove debug println in test

try fix ci

revert cargo-nextest

async query 4 and 8 bits

use quantized storage for query in hnsw build

fix after rebase

fix tests

Welford's Algorithm

review remarks

remove debug println in test

try fix ci

revert cargo-nextest

async query 4 and 8 bits

use quantized storage for query in hnsw build

fix after rebase

are you happy fmt

x64 simd for u128

fmt

Asymmetric quantization api for BQ only

fix ci

fix error after rebase and 4bit support

serde default

quantization config without option
@IvanPleshkov IvanPleshkov force-pushed the Asymmetric-quantization-api-for-BQ-only branch from f7abf4e to 137ef3c Compare July 3, 2025 12:18
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

1016-1030: Code duplication: Same mapping logic as in create_binary function.

This mapping logic is identical to the one in create_binary function (lines 939-953). Please refer to the previous suggestion to extract this into a helper function.

🧹 Nitpick comments (6)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

939-953: Consider extracting the mapping logic to reduce code duplication.

The query encoding mapping logic is correct and handles all enum variants appropriately. However, this same logic is duplicated in the create_binary_multi function (lines 1016-1030).

Consider extracting this mapping to a helper function:

+    fn map_query_encoding(query_encoding: Option<BinaryQuantizationQueryEncoding>) -> quantization::encoded_vectors_binary::QueryEncoding {
+        match query_encoding {
+            Some(BinaryQuantizationQueryEncoding::Scalar4Bits) => {
+                quantization::encoded_vectors_binary::QueryEncoding::Scalar4bits
+            }
+            Some(BinaryQuantizationQueryEncoding::Scalar8Bits) => {
+                quantization::encoded_vectors_binary::QueryEncoding::Scalar8bits
+            }
+            Some(BinaryQuantizationQueryEncoding::Binary) => {
+                quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage
+            }
+            Some(BinaryQuantizationQueryEncoding::Default) => {
+                quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage
+            }
+            None => quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage,
+        }
+    }

-        let query_encoding = match binary_config.query_encoding {
-            Some(BinaryQuantizationQueryEncoding::Scalar4Bits) => {
-                quantization::encoded_vectors_binary::QueryEncoding::Scalar4bits
-            }
-            Some(BinaryQuantizationQueryEncoding::Scalar8Bits) => {
-                quantization::encoded_vectors_binary::QueryEncoding::Scalar8bits
-            }
-            Some(BinaryQuantizationQueryEncoding::Binary) => {
-                quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage
-            }
-            Some(BinaryQuantizationQueryEncoding::Default) => {
-                quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage
-            }
-            None => quantization::encoded_vectors_binary::QueryEncoding::SameAsStorage,
-        };
+        let query_encoding = Self::map_query_encoding(binary_config.query_encoding);
docs/grpc/docs.md (5)

10-12: Fix list indentation to satisfy markdownlint MD007

Current bullet has 4-space indentation, expected 2.
Adjusting keeps the ToC consistent and silences the linter.

-    - [BinaryQuantizationQueryEncoding](#qdrant-BinaryQuantizationQueryEncoding)
+  - [BinaryQuantizationQueryEncoding](#qdrant-BinaryQuantizationQueryEncoding)

99-101: Indentation of sub-bullet breaks nested list rendering

Same MD007 issue for the BinaryQuantizationQueryEncoding.Setting entry.

-    - [BinaryQuantizationQueryEncoding.Setting](#qdrant-BinaryQuantizationQueryEncoding-Setting)
+  - [BinaryQuantizationQueryEncoding.Setting](#qdrant-BinaryQuantizationQueryEncoding-Setting)

404-405: Document default behaviour of the new query_encoding field

Readers need to know what happens when the field is omitted (current implementation falls back to Default, i.e. use storage encoding). Add a short note to avoid ambiguity.

-| query_encoding | [BinaryQuantizationQueryEncoding](#qdrant-BinaryQuantizationQueryEncoding) | optional | Asymmetric quantization configuration allows a query to have different quantization than stored vectors. It can increase the accuracy of search at the cost of performance. |
+| query_encoding | [BinaryQuantizationQueryEncoding](#qdrant-BinaryQuantizationQueryEncoding) | optional | Asymmetric quantization configuration. If **unset** the query uses the same encoding as stored vectors (`Default`). Can improve accuracy at the cost of performance. |

417-420: Missing description for setting field

The description column is blank, making the row look malformed.

-| setting | [BinaryQuantizationQueryEncoding.Setting](#qdrant-BinaryQuantizationQueryEncoding-Setting) |  |  |
+| setting | [BinaryQuantizationQueryEncoding.Setting](#qdrant-BinaryQuantizationQueryEncoding-Setting) |  | Choice of query-side encoding algorithm |

1923-1929: Provide human-readable explanations for enum variants

Empty description cells reduce usefulness of generated docs and OpenAPI. A one-liner improves clarity without bloating the table.

-| Default | 0 |  |
-| Binary | 1 |  |
-| Scalar4Bits | 2 |  |
-| Scalar8Bits | 3 |  |
+| Default | 0 | Use the same encoding as stored vectors |
+| Binary | 1 | 1-bit binary encoding |
+| Scalar4Bits | 2 | 4-bit scalar quantization |
+| Scalar8Bits | 3 | 8-bit scalar quantization |
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00660ef and 137ef3c.

📒 Files selected for processing (11)
  • docs/grpc/docs.md (4 hunks)
  • docs/redoc/master/openapi.json (2 hunks)
  • lib/api/src/grpc/conversions.rs (5 hunks)
  • lib/api/src/grpc/proto/collections.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
  • lib/segment/src/types.rs (2 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (7 hunks)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1 hunks)
  • lib/segment/tests/integration/byte_storage_quantization_test.rs (1 hunks)
  • lib/segment/tests/integration/multivector_quantization_test.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
  • lib/segment/tests/integration/byte_storage_quantization_test.rs
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
  • lib/api/src/grpc/qdrant.rs
  • lib/segment/tests/integration/multivector_quantization_test.rs
  • lib/api/src/grpc/proto/collections.proto
  • lib/segment/src/types.rs
  • lib/api/src/grpc/conversions.rs
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3)
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
docs/redoc/master/openapi.json (1)

undefined

<retrieved_learning>
Learnt from: IvanPleshkov
PR: #6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
</retrieved_learning>

docs/grpc/docs.md (2)

undefined

<retrieved_learning>
Learnt from: IvanPleshkov
PR: #6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
</retrieved_learning>

<retrieved_learning>
Learnt from: IvanPleshkov
PR: #6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
</retrieved_learning>

🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md

11-11: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


100-100: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: test-low-resources
  • GitHub Check: integration-tests-consensus
  • GitHub Check: test-consensus-compose
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: integration-tests
  • GitHub Check: lint
  • GitHub Check: storage-compat-test
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
🔇 Additional comments (3)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (3)

23-27: LGTM: Import changes are correct.

The addition of BinaryQuantizationQueryEncoding to the imports is properly structured and aligns with the new asymmetric quantization functionality.


969-969: LGTM: Query encoding parameter correctly passed to encode function.

The query_encoding parameter is properly passed to the EncodedVectorsBin::encode call, enabling asymmetric quantization functionality.


1045-1045: LGTM: Query encoding consistently passed in multi-vector functions.

The query_encoding parameter is correctly passed to both RAM and mmap variants of the multi-vector encode functions, maintaining consistency with the single-vector implementation.

Also applies to: 1068-1068

Comment on lines +7051 to +7059
"BinaryQuantizationQueryEncoding": {
"type": "string",
"enum": [
"default",
"binary",
"scalar4bits",
"scalar8bits"
]
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enum naming & default value missing

  1. The value "default" inside BinaryQuantizationQueryEncoding is ambiguous: it is both an enum member and a generic English word. Client SDKs will end up with identifiers like Default_ or DefaultEnum, which is confusing. Prefer a descriptive value such as "same_as_vector" or "none".
  2. Add an explicit "default": "default" (or the renamed value) if you want clients to omit the field while retaining current behaviour. Without it, generators cannot infer the default.
🤖 Prompt for AI Agents
In docs/redoc/master/openapi.json around lines 7051 to 7059, the enum value
"default" in BinaryQuantizationQueryEncoding is ambiguous and lacks an explicit
default declaration. Rename the "default" enum value to a more descriptive term
like "same_as_vector" or "none" to avoid confusion in client SDKs, and add a
"default" property with this renamed value to explicitly specify the default
behavior for code generators.

Comment on lines +7029 to +7039
},
"query_encoding": {
"description": "Asymmetric quantization configuration allows a query to have different quantization than stored vectors. It can increase the accuracy of search at the cost of performance.",
"anyOf": [
{
"$ref": "#/components/schemas/BinaryQuantizationQueryEncoding"
},
{
"nullable": true
}
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

query_encoding schema composition is non-standard and may break tooling

Wrapping the real schema and a naked { "nullable": true } inside anyOf produces a schema fragment with no type; several generators (e.g. openapi-generator, go-swag) reject or silently ignore such constructs.
The canonical pattern for an optional nullable enum is either of:

"query_encoding": {
  "nullable": true,
  "$ref": "#/components/schemas/BinaryQuantizationQueryEncoding"
}

or

"query_encoding": {
  "allOf": [
    { "$ref": "#/components/schemas/BinaryQuantizationQueryEncoding" }
  ],
  "nullable": true
}

Please switch to one of these to keep the spec OAS-3 compliant and generator-friendly.
If the feature is still experimental consider gating it behind x-internal: true.

🤖 Prompt for AI Agents
In docs/redoc/master/openapi.json around lines 7029 to 7039, the
"query_encoding" schema uses an anyOf composition with a schema reference and a
naked nullable true, which is non-standard and causes issues with some OpenAPI
generators. To fix this, replace the anyOf array with a single schema object
that includes "nullable": true alongside the "$ref" to
BinaryQuantizationQueryEncoding, or use an allOf array with the schema reference
and add "nullable": true at the same level. This will ensure the schema is OAS-3
compliant and compatible with tooling.

@generall generall merged commit 9dc393e into dev Jul 3, 2025
18 checks passed
@generall generall deleted the Asymmetric-quantization-api-for-BQ-only branch July 3, 2025 12:44
@generall generall added this to the Asymmetric BQ milestone Jul 17, 2025
generall pushed a commit that referenced this pull request Jul 17, 2025
* bq encodings

* update docs

* fix after rebase

* fix gpu build

* fix build after rebase
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants