Asymmetric binary quantization#6728

Merged
generall merged 11 commits into dev from 8bit-bq-query on Jul 3, 2025

Conversation


@IvanPleshkov IvanPleshkov commented Jun 19, 2025

This PR introduces an internal implementation for an upcoming feature: asymmetric binary quantization.
API PR is here: #6201

Motivation
Asymmetric quantization is a scheme in which the stored vector and the query use different quantization algorithms. In this PR we introduce only one asymmetric option: BQ (binary quantization) storage with the query encoded as SQ (scalar quantization).

Such methods increase accuracy because a more precisely encoded query produces a more accurate score. Scoring becomes more complicated, however, and not every use case is suitable for asymmetric mechanisms.

RaBitQ is an example of BQ quantization with an SQ-encoded query; we want something similar.

We don't use the RaBitQ algorithm itself: the encoding is different and only the scoring is the same; we follow our own design instead. As a pro, we don't need to store additional precomputed constants for each vector, and our approach is compatible with the score_internal method. As a con, we require vectors to contain negative values for this to work like regular BQ.
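To make the scoring idea concrete, here is a minimal illustrative sketch in Rust (the function name, plane layout, and word size are assumptions for illustration, not the actual implementation): the stored vector stays 1 bit per dimension, the query is scalar-quantized to B bits and split into B bit planes packed the same way as the storage bits, and the score is the sum over planes of 2^k times the XOR-popcount of plane k against the storage bits.

```rust
// Illustrative sketch of asymmetric BQ scoring (not the actual Qdrant code).
// The storage vector is packed 1 bit per dimension; the query is
// scalar-quantized and split into bit planes with the same packed layout.
fn xor_popcnt_scalar(query_planes: &[Vec<u64>], vector: &[u64]) -> u64 {
    query_planes
        .iter()
        .enumerate()
        .map(|(k, plane)| {
            // XOR-popcount of query bit plane k against the storage bits.
            let mismatches: u64 = plane
                .iter()
                .zip(vector)
                .map(|(q, v)| u64::from((q ^ v).count_ones()))
                .sum();
            mismatches << k // weight plane k by 2^k
        })
        .sum()
}

fn main() {
    // 4 dimensions, storage bits 1010; query planes: plane0 = 0000,
    // plane1 = 1010 (i.e. quantized query values [2, 0, 2, 0]).
    let score = xor_popcnt_scalar(&[vec![0b0000], vec![0b1010]], &[0b1010]);
    assert_eq!(score, 2);
}
```

Because every plane reuses the packed layout of the storage bits, the same XOR-popcount kernel shape serves both the symmetric and the asymmetric path.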

We don't use asymmetric BQ during HNSW construction because experiments have shown poor HNSW accuracy with it. No GPU integration is required.

Results on the LAION 400M test dataset:

Regular BQ: Average precision: 62.37%
Asymmetric BQ 8bit query: Average precision: 70.46%

@IvanPleshkov IvanPleshkov changed the base branch from dev to dont-encode-quantization-query-while-hnsw-build June 20, 2025 10:38
Base automatically changed from dont-encode-quantization-query-while-hnsw-build to dev June 23, 2025 08:34
@IvanPleshkov IvanPleshkov force-pushed the 8bit-bq-query branch 2 times, most recently from 4492a3c to de741fa June 23, 2025 08:45
@timvisee timvisee changed the title from Asymetric binary quantization to Asymmetric binary quantization Jun 24, 2025
# This is the 1st commit message:

bq encodings

fix tests

Welford's Algorithm

review remarks

remove debug println in test

try fix ci

revert cargo-nextest

async query 4 and 8 bits

use quantized storage for query in hnsw build

fix after rebase

are you happy fmt

# This is the commit message #2:

quantization config without option

# This is the commit message #3:

x64 simd for u128

# This is the commit message #4:

fmt

# This is the commit message #5:

neon simd

# This is the commit message #6:

refactor

# This is the commit message #7:

fmt
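One commit above mentions Welford's algorithm, the standard single-pass, numerically stable recurrence for a running mean and variance; its exact role here (presumably computing statistics for scalar-quantization bounds) is an assumption. The textbook form:

```rust
// Welford's online algorithm for mean and variance (textbook form; its
// use for quantization bounds in this PR is an assumption).
struct Welford {
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the current mean
}

impl Welford {
    fn new() -> Self {
        Welford { count: 0, mean: 0.0, m2: 0.0 }
    }

    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean); // note: uses the updated mean
    }

    /// Population variance; 0.0 for fewer than two samples.
    fn variance(&self) -> f64 {
        if self.count > 1 { self.m2 / self.count as f64 } else { 0.0 }
    }
}

fn main() {
    let mut w = Welford::new();
    for x in [1.0, 2.0, 3.0, 4.0] {
        w.update(x);
    }
    assert!((w.mean - 2.5).abs() < 1e-12);
    assert!((w.variance() - 1.25).abs() < 1e-12);
}
```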
@IvanPleshkov IvanPleshkov marked this pull request as ready for review July 1, 2025 09:37

coderabbitai bot commented Jul 1, 2025

📝 Walkthrough


This change introduces support for multiple query encoding types in the binary quantization module. It adds a QueryEncoding enum with variants for storage-matching, 4-bit, and 8-bit scalar encodings, and updates the encoding and scoring logic to handle these new query representations. SIMD-accelerated XOR-popcount functions for scalar encodings are implemented for SSE, AVX, and NEON platforms. The metadata and encoding APIs are extended to accept and store the query encoding type. Tests are updated and expanded to validate the new encoding modes and their scoring accuracy, including new tests for asymmetric and scalar query encodings.
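Based on that description, the new enum presumably has roughly this shape (a sketch: SameAsStorage and Scalar4bits appear verbatim elsewhere in this PR, while the 8-bit variant name and the helper methods are guesses):

```rust
// Sketch of the query-encoding dispatch described in the walkthrough.
// Variant and method names are inferred, not copied from the source.
#[derive(Clone, Copy, Default, PartialEq, Eq, Debug)]
enum QueryEncoding {
    #[default]
    SameAsStorage, // query uses the same binary encoding as storage
    Scalar4bits,   // query scalar-quantized to 4 bits per dimension
    Scalar8bits,   // query scalar-quantized to 8 bits per dimension
}

impl QueryEncoding {
    fn is_same_as_storage(self) -> bool {
        self == QueryEncoding::SameAsStorage
    }

    /// Bits used per query dimension (1 assumed for plain BQ storage).
    fn bits_count(self) -> u32 {
        match self {
            QueryEncoding::SameAsStorage => 1,
            QueryEncoding::Scalar4bits => 4,
            QueryEncoding::Scalar8bits => 8,
        }
    }
}

fn main() {
    assert!(QueryEncoding::default().is_same_as_storage());
    assert_eq!(QueryEncoding::Scalar4bits.bits_count(), 4);
}
```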

Possibly related PRs

  • Don't encode quantization query while hnsw build #6729: Adds a new method encode_internal_vector to the EncodedVectors trait and its implementation for EncodedVectorsBin, related to encoding internal vectors for scoring optimization.
  • BQ encodings #6663: Introduces new binary quantization encoding algorithms (two-bits and one-and-a-half-bits) and their integration, directly modifying the same binary quantization encoding logic and metadata.

Suggested reviewers

  • generall

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Clippy (1.86.0)
Updating git repository `https://github.com/qdrant/tar-rs`

error: failed to load source for dependency tar

Caused by:
Unable to update https://github.com/qdrant/tar-rs?branch=main#856dbd09

Caused by:
failed to create directory /usr/local/git/db/tar-rs-f03560d008573c0e

Caused by:
Permission denied (os error 13)



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
lib/quantization/cpp/neon.c (1)

109-148: Consider simplifying the horizontal sum calculation.

The horizontal sum calculation repeats the same pattern 8 times. This could be simplified using a loop or helper function for better maintainability.

Example approach using arrays:

uint16x8_t results[8] = {result1, result2, result3, result4, result5, result6, result7, result8};
uint32_t sums[8];

for (int i = 0; i < 8; i++) {
    uint8x8_t low = vget_low_u8(vreinterpretq_u8_u16(results[i]));
    uint8x8_t high = vget_high_u8(vreinterpretq_u8_u16(results[i]));
    uint16x8_t sum = vaddl_u8(low, high);
    sums[i] = vaddvq_u16(sum);
}

return sums[0] + (sums[1] << 1) + (sums[2] << 2) + (sums[3] << 3) + 
       (sums[4] << 4) + (sums[5] << 5) + (sums[6] << 6) + (sums[7] << 7);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 488f13a and 4353ef8.

📒 Files selected for processing (9)
  • lib/quantization/benches/binary.rs (3 hunks)
  • lib/quantization/cpp/avx2.c (1 hunks)
  • lib/quantization/cpp/neon.c (1 hunks)
  • lib/quantization/cpp/sse.c (1 hunks)
  • lib/quantization/src/encoded_vectors_binary.rs (15 hunks)
  • lib/quantization/tests/integration/empty_storage.rs (2 hunks)
  • lib/quantization/tests/integration/test_binary.rs (14 hunks)
  • lib/quantization/tests/integration/test_binary_encodings.rs (2 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (5)
lib/quantization/benches/binary.rs (6)
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This was part of an API redesign in rand 0.9.0 released in February 2024.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.
Learnt from: coszio
PR: qdrant/qdrant#6446
File: lib/gridstore/benches/flush_bench.rs:18-18
Timestamp: 2025-04-29T16:48:34.967Z
Learning: The Rust rand crate version 0.9+ has introduced a top-level `rng()` function that replaces the now-deprecated `thread_rng()` function.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
lib/quantization/tests/integration/empty_storage.rs (3)
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` correctly returns void instead of a `Result` because it directly implements the `Madviseable` trait which defines `populate(&self)` without a return type. Adding an unnecessary `Ok(())` return would trigger Clippy warnings about unnecessary Result wrapping.
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` does not return a `Result` because the underlying `mmap.populate()` method doesn't return a Result either. Adding an unnecessary `Ok(())` would cause Clippy to complain.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
lib/quantization/tests/integration/test_binary.rs (3)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.
lib/quantization/tests/integration/test_binary_encodings.rs (1)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
lib/quantization/src/encoded_vectors_binary.rs (3)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
🧬 Code Graph Analysis (1)
lib/quantization/cpp/neon.c (3)
lib/quantization/cpp/avx2.c (2)
  • uint32_t (124-171)
  • uint32_t (173-205)
lib/quantization/cpp/sse.c (5)
  • uint32_t (49-70)
  • uint32_t (72-88)
  • uint32_t (90-106)
  • uint32_t (108-163)
  • uint32_t (165-210)
lib/quantization/src/encoded_vectors_binary.rs (2)
  • impl_xor_popcnt_scalar8_neon_uint128 (864-868)
  • impl_xor_popcnt_scalar4_neon_uint128 (870-874)
🔇 Additional comments (10)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)

953-953: LGTM! Consistent API update for query encoding support.

The addition of QueryEncoding::SameAsStorage parameter to all EncodedVectorsBin::encode calls is properly implemented across all binary quantization creation paths (RAM/MMAP and single/multi variants).

Also applies to: 970-970, 1014-1014, 1037-1037

lib/quantization/tests/integration/empty_storage.rs (1)

7-7: Test correctly updated for new API.

The test properly imports QueryEncoding and adds the required QueryEncoding::SameAsStorage parameter to match the updated encoding API.

Also applies to: 105-105

lib/quantization/benches/binary.rs (1)

7-7: Benchmarks properly updated for query encoding API.

Both benchmark encoding calls correctly specify QueryEncoding::SameAsStorage to maintain existing benchmark behavior with the new API.

Also applies to: 43-43, 84-84

lib/quantization/tests/integration/test_binary.rs (2)

7-9: Existing tests correctly updated with query encoding parameter.

All test functions properly specify QueryEncoding::SameAsStorage to maintain their original behavior with the updated API.

Also applies to: 55-55, 103-103, 151-151, 196-196, 240-240, 303-303, 366-366, 426-426, 486-486, 549-549, 612-612, 672-672


701-746: Well-structured test for scalar query encoding consistency.

The new test comprehensively validates that score_point and score_internal produce identical results when using QueryEncoding::Scalar4bits. Good coverage with multiple dimensions and all encoding types.

lib/quantization/cpp/avx2.c (2)

124-171: Verify pointer alignment for uint64_t access.

The function casts uint8_t* pointers to uint64_t* which requires 8-byte alignment. Misaligned access could cause crashes on some architectures or performance degradation.

Please ensure that the input pointers are always 8-byte aligned when calling these functions. Consider adding a debug assertion or documentation about alignment requirements.


173-205: Implementation looks correct for 4-bit scalar encoding.

The weighted popcount accumulation with factors [1,2,4,8] is appropriate for 4-bit encoding. Same alignment requirements apply as noted for the 8-bit function above.
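As a sanity check on why powers-of-two weights are the right ones (illustrative code, not taken from the PR): writing each B-bit query value in its binary expansion, the weighted sum of per-plane XOR mismatches equals Q + (2^B - 1)*P - 2*dot, where Q is the sum of the query values, P the storage popcount, and dot the integer dot product, so it ranks identically to the dot product once the per-query and per-vector constants are accounted for.

```rust
// Verifies the identity behind weighted XOR-popcount scoring:
//   sum_k 2^k * mismatches(plane_k, v) = Q + (2^B - 1) * P - 2 * dot
// where Q = sum of query values, P = popcount of storage bits,
// dot = integer dot product. Illustrative only.
fn weighted_xor_popcnt(q: &[u32], b: &[u32], bits: u32) -> u32 {
    (0..bits)
        .map(|k| {
            // mismatches between bit k of each query value and the storage bit
            let mismatches: u32 = q
                .iter()
                .zip(b)
                .map(|(qi, bi)| ((qi >> k) & 1) ^ bi)
                .sum();
            mismatches << k
        })
        .sum()
}

fn main() {
    let q = [13u32, 0, 7, 2]; // 4-bit query values (B = 4)
    let b = [1u32, 0, 1, 1]; // storage bits
    let q_sum: u32 = q.iter().sum();
    let popcnt: u32 = b.iter().sum();
    let dot: u32 = q.iter().zip(&b).map(|(qi, bi)| qi * bi).sum();
    // (2^4 - 1) = 15
    assert_eq!(weighted_xor_popcnt(&q, &b, 4), q_sum + 15 * popcnt - 2 * dot);
}
```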

lib/quantization/tests/integration/test_binary_encodings.rs (1)

7-225: Test implementation looks comprehensive.

The test changes properly validate the new asymmetric binary quantization functionality:

  • Tests different query encoding modes
  • Validates accuracy improvements
  • Covers both normal and inverted distance calculations
lib/quantization/src/encoded_vectors_binary.rs (2)

832-842: Verify parameter naming in external function declarations.

The parameter names in these external C function declarations might be misleading. Based on the implementation, it appears that for scalar encodings, the "query" data is expected to be larger (by a factor of bits_count) than the "vector" data.

Please verify that the parameter naming convention is intentional and document the expected data layout for these functions to prevent confusion.

Consider adding documentation like:

/// For scalar XOR-popcount functions:
/// - query_ptr: Points to the scalar-encoded query data (larger)
/// - vector_ptr: Points to the binary-encoded vector data (smaller)
/// - count: Number of 128-bit blocks in the vector data

Also applies to: 844-854, 864-874


229-233: Fix parameter order in SIMD function calls.

The parameter order is swapped when calling the C functions. The C functions expect query_ptr first and vector_ptr second, but the code passes v2 (query) as first and v1 (vector) as second parameter.

Apply this diff to fix the parameter order for NEON:

                     return impl_xor_popcnt_scalar8_neon_uint128(
-                        v2.as_ptr().cast::<u8>(),
-                        v1.as_ptr().cast::<u8>(),
+                        v1.as_ptr().cast::<u8>(),
+                        v2.as_ptr().cast::<u8>(),
                         v1.len() as u32,
                     ) as usize;

Apply similar fixes for all the scalar XOR-popcount function calls in this method (both AVX and SSE variants).

Also applies to: 250-254, 271-275

⛔ Skipped due to learnings
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.

@generall generall requested a review from xzfc July 2, 2025 09:16

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
lib/quantization/cpp/neon.c (2)

150-323: Consider optimizing the gather operation to avoid temporary arrays.

The function creates temporary arrays on the stack for each iteration to gather non-contiguous bytes. This approach works but may impact performance due to extra memory operations.

Consider using NEON table lookup instructions (vtbl) or other gather mechanisms to avoid the temporary arrays.


325-420: Consider the same optimization for scalar4 to avoid temporary arrays.

Similar to the scalar8 version, this function also creates temporary arrays on the stack. The same optimization advice applies here.

lib/quantization/src/encoded_vectors_binary.rs (1)

569-605: Add documentation for the complex bit packing logic.

The scalar quantization and bit packing implementation is correct but complex. Consider adding comments to explain:

  • The symmetric quantization approach
  • How bits are interleaved across storage elements
  • The purpose of the modulo operation
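For reference, a naive scalar version of such bit-plane packing might look like this (the layout and names are assumptions for illustration; the PR's actual interleaving may differ):

```rust
// Illustrative bit-plane packing (not the PR's actual encoding): each
// B-bit quantized value contributes one bit to each of B planes, and
// plane k packs bit k of every dimension into u64 words.
fn pack_bit_planes(quantized: &[u8], bits: u32) -> Vec<Vec<u64>> {
    let words = quantized.len().div_ceil(64);
    let mut planes = vec![vec![0u64; words]; bits as usize];
    for (i, &value) in quantized.iter().enumerate() {
        for k in 0..bits {
            if (value >> k) & 1 == 1 {
                // dimension i lands in word i / 64, bit position i % 64
                planes[k as usize][i / 64] |= 1u64 << (i % 64);
            }
        }
    }
    planes
}

fn main() {
    // Values [3, 1]: bit 0 is set for both dims, bit 1 only for dim 0.
    let planes = pack_bit_planes(&[3, 1], 2);
    assert_eq!(planes[0][0], 0b11);
    assert_eq!(planes[1][0], 0b01);
}
```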
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f52999 and a5007b7.

📒 Files selected for processing (2)
  • lib/quantization/cpp/neon.c (1 hunks)
  • lib/quantization/src/encoded_vectors_binary.rs (15 hunks)
🧰 Additional context used
🧠 Learnings (1)
lib/quantization/src/encoded_vectors_binary.rs (3)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
🧬 Code Graph Analysis (1)
lib/quantization/cpp/neon.c (3)
lib/quantization/cpp/avx2.c (2)
  • uint32_t (124-171)
  • uint32_t (173-205)
lib/quantization/cpp/sse.c (5)
  • uint32_t (49-70)
  • uint32_t (72-88)
  • uint32_t (90-106)
  • uint32_t (108-163)
  • uint32_t (165-201)
lib/quantization/src/encoded_vectors_binary.rs (4)
  • impl_xor_popcnt_scalar8_neon_uint128 (885-889)
  • impl_xor_popcnt_scalar4_neon_uint128 (891-895)
  • impl_xor_popcnt_scalar8_neon_u8 (897-901)
  • impl_xor_popcnt_scalar4_neon_u8 (903-907)
⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: test-low-resources
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-consistency
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: integration-tests
  • GitHub Check: lint
  • GitHub Check: storage-compat-test
  • GitHub Check: integration-tests-consensus
  • GitHub Check: Basic TLS/HTTPS tests
🔇 Additional comments (11)
lib/quantization/cpp/neon.c (2)

69-116: LGTM! Correct NEON implementation for scalar8 XOR-popcount.

The function correctly uses uint16x8_t for accumulation and properly calculates the weighted sum with powers of two.


118-148: LGTM! Correct NEON implementation for scalar4 XOR-popcount.

The function correctly handles 4 query vectors with proper accumulation and weighted sum calculation.

lib/quantization/src/encoded_vectors_binary.rs (9)

38-75: LGTM! Well-structured query encoding types.

The new enums and structs provide a clean abstraction for different query encoding modes with proper serialization support.


77-100: LGTM! Appropriate trait bounds for scalar operations.

The additional trait bounds support the bit manipulation operations needed for scalar encodings.


157-188: LGTM! Correct scalar XOR-popcount implementation for u8.

The function properly validates input sizes, dispatches to SIMD implementations when available, and provides a correct fallback implementation.


243-316: LGTM! Comprehensive SIMD support for u128 scalar operations.

The function provides good coverage of SIMD instruction sets (NEON, AVX2, SSE4.2) with appropriate feature detection and fallback.


524-541: LGTM! Clean query encoding dispatch.

The method properly routes to appropriate encoding functions based on the query encoding mode.


543-567: LGTM! Correct query extension for different storage encodings.

The method properly extends the query vector to match the storage encoding format, with appropriate handling for the OneAndHalfBits case using max pooling.


624-668: LGTM! Correct normalization for scalar encodings.

The metric calculation properly normalizes scalar encoding results to maintain consistent scoring across different bit widths.


777-806: LGTM! Proper handling of different query encoding types.

The scoring correctly dispatches to calculate_metric with the appropriate bits_count for each encoding variant.


853-908: LGTM! Complete SIMD function declarations.

The external function declarations properly cover all SIMD variants (SSE, AVX, NEON) for both 4-bit and 8-bit scalar encodings.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
lib/quantization/cpp/sse.c (4)

122-127: Fix argument order for _mm_set_epi32 intrinsic (duplicate issue).

The same argument order issue identified in previous reviews applies to these new _mm_set_epi32 calls. The intrinsic takes its arguments from the highest 32-bit element to the lowest (_mm_set_epi32(e3, e2, e1, e0)), but the current implementation provides them in low-to-high order.

Also applies to: 130-135, 138-143, 146-151


178-183: Fix argument order for _mm_set_epi32 intrinsic (duplicate issue).

The same argument order issue applies to these _mm_set_epi32 calls in the scalar4 function.

Also applies to: 186-191


303-308: Fix argument order for _mm_set_epi32 intrinsic (duplicate issue).

The argument order issue also affects these _mm_set_epi32 calls in the u8 scalar8 function.

Also applies to: 311-316


428-433: Fix argument order for _mm_set_epi32 intrinsic (duplicate issue).

The argument order issue also affects this _mm_set_epi32 call in the u8 scalar4 function.

🧹 Nitpick comments (1)
lib/quantization/cpp/sse.c (1)

213-292: Consider refactoring repetitive array initialization.

The manual array initialization pattern is repetitive and error-prone. Consider using a loop or macro to reduce code duplication and improve maintainability.

-        uint8_t values1[8] = {
-            *(q_ptr + 0),
-            *(q_ptr + 8),
-            *(q_ptr + 16),
-            // ... more manual entries
-        };
+        for (int i = 0; i < 8; i++) {
+            for (int bit = 0; bit < 8; bit++) {
+                values[bit][i] = *(q_ptr + bit + i * 8);
+            }
+        }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a5007b7 and 5548e19.

📒 Files selected for processing (2)
  • lib/quantization/cpp/sse.c (1 hunks)
  • lib/quantization/src/encoded_vectors_binary.rs (15 hunks)
🧰 Additional context used
🧠 Learnings (1)
lib/quantization/src/encoded_vectors_binary.rs (3)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: test-consistency
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-low-resources
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
  • GitHub Check: lint
  • GitHub Check: storage-compat-test
  • GitHub Check: integration-tests
🔇 Additional comments (10)
lib/quantization/src/encoded_vectors_binary.rs (10)

38-50: LGTM - Well-designed query encoding enum.

The QueryEncoding enum is well-structured with appropriate defaults and utility methods. The is_same_as_storage method provides clear intent checking.


52-56: LGTM - Appropriate enum design for encoded queries.

The EncodedQueryBQ enum appropriately models the different query encoding variants and leverages the existing vector types effectively.


72-74: LGTM - Proper serialization attributes.

The query_encoding field is correctly configured with serde attributes to maintain backward compatibility by using default values and conditional serialization.


83-96: LGTM - Appropriate trait bounds expansion.

The additional trait bounds are necessary for the scalar operations and follow Rust best practices. The new xor_popcnt_scalar method signature is well-designed.


330-336: Verify consistency with u8 implementation.

The scalar popcount logic for u128 should be consistent with the u8 implementation. Both use the same bit shifting pattern for weighting.


361-361: LGTM - API extension for query encoding.

The addition of the query_encoding parameter to the encode function properly extends the API to support the new functionality.


601-614: Verify scalar quantization range calculation.

The quantization logic calculates ranges as (1 << bits_count) - 1 and then uses quantized % (ranges + 1). This effectively creates 2^bits_count quantization levels, which seems correct for the intended bit width.
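For clarity, the range math can be sketched as follows. This is a hypothetical helper with our own names, not the actual Qdrant code; it only illustrates why `(1 << bits_count) - 1` levels plus the `% (ranges + 1)` step yield 2^bits_count distinct values.

```rust
/// Hypothetical sketch of the scalar-quantization range math (not the
/// actual Qdrant code): with `bits_count` bits, the maximum level is
/// (1 << bits_count) - 1, and `quantized % (max_level + 1)` keeps the
/// result inside the 2^bits_count levels.
fn quantize(value: f32, max_abs: f32, bits_count: usize) -> u32 {
    let max_level = (1u32 << bits_count) - 1;
    // Map [-max_abs, max_abs] onto [0.0, 1.0], then onto the integer levels.
    let normalized = ((value / max_abs) + 1.0) / 2.0;
    let quantized = (normalized * max_level as f32).round() as u32;
    quantized % (max_level + 1)
}
```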


668-673: LGTM - Proper metric normalization for scalar queries.

The normalization of scalar XOR results by ((1 << v2_bits_count) - 1) correctly scales the weighted popcount back to the [0,1] range for metric calculation.
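A minimal sketch of this scaling, assuming the weighting described above (names are ours, not the real API): the maximum per-bit weight of the scalar-encoded query is `(1 << bits) - 1`, so dividing by that factor makes the weighted popcount comparable to a plain symmetric XOR popcount.

```rust
/// Sketch (assumption, not the actual Qdrant code): scale a weighted
/// XOR popcount by the maximum per-bit weight of the scalar query.
fn normalize_scalar_xor(weighted_popcnt: usize, query_bits_count: usize) -> f32 {
    weighted_popcnt as f32 / ((1usize << query_bits_count) - 1) as f32
}
```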


578-584: No changes needed: OneAndHalfBits query‐extension logic is already covered by tests

The asymmetric “max of pairs” logic in extended_query.extend(query.chunks(2)… ) is exercised by existing integration tests, confirming it behaves as intended:

  • lib/quantization/tests/integration/test_binary_encodings.rs:132–134 (test_binary_dot_asymentric_impl::<…>(…, Encoding::OneAndHalfBits, false))
  • lib/quantization/tests/integration/test_binary.rs:707–708 (test_binary_scalar_internal_impl::<u8>(…, Encoding::OneAndHalfBits))

No further action required.
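For illustration, the "max of pairs" extension can be sketched like this. The shape is assumed from the snippet quoted above (`extended_query.extend(query.chunks(2)…)`), not copied from the PR: the original query is followed by one extra value per pair of components, the maximum of that pair.

```rust
/// Assumed sketch of the OneAndHalfBits query extension (not the exact
/// PR code): append the per-pair maximum after the original components.
fn extend_one_and_half_bits(query: &[f32]) -> Vec<f32> {
    let mut extended = Vec::with_capacity(query.len() + query.len().div_ceil(2));
    extended.extend_from_slice(query);
    extended.extend(
        query
            .chunks(2)
            .map(|pair| pair.iter().copied().fold(f32::MIN, f32::max)),
    );
    extended
}
```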


202-208: Verify weighted popcount shift
Please confirm that (b1 ^ b2).count_ones() << i implements the intended scalar-quantization weighting. I wasn’t able to find a matching pattern in other quantization modules—consider adding an inline comment or a unit test that references the weighting scheme from the spec for clarity.

• lib/quantization/src/encoded_vectors_binary.rs:202–208
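As a reading aid, here is a minimal Rust sketch of the weighting scheme the comment asks about (our own names and layout assumptions, not the actual function): the scalar-encoded query stores `bits` bit-planes per storage word, plane `i` carries bit `i` of each quantized query value, so the XOR-popcount of plane `i` is weighted by `2^i` via the shift.

```rust
/// Sketch of the weighted XOR popcount (assumed layout: `query_planes`
/// holds `bits` consecutive bit-planes per word of `vector`).
fn xor_popcnt_scalar_sketch(vector: &[u64], query_planes: &[u64], bits: usize) -> usize {
    let mut result = 0usize;
    for (chunk, v) in query_planes.chunks_exact(bits).zip(vector) {
        for (i, q) in chunk.iter().enumerate() {
            // Plane i contributes its mismatch count with weight 2^i.
            result += ((v ^ q).count_ones() as usize) << i;
        }
    }
    result
}
```

With `bits == 1` this degenerates to the plain symmetric XOR popcount, which is why the asymmetric path can share the metric code.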

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
lib/quantization/cpp/sse.c (1)

218-297: Consider refactoring repetitive array initialization.

The manual array initialization for 8 query sets is highly repetitive and error-prone. Consider using a loop-based approach or helper function to reduce code duplication and improve maintainability.

Example refactor using a helper approach:

+        // Helper to extract query values for a specific bit position
+        static inline void extract_query_values(const uint8_t* q_ptr, uint8_t bit_pos, uint8_t values[8]) {
+            for (int i = 0; i < 8; i++) {
+                values[i] = *(q_ptr + bit_pos + i * 8);
+            }
+        }
+
-        uint8_t values1[8] = { *(q_ptr + 0), *(q_ptr + 8), ... };
-        uint8_t values2[8] = { *(q_ptr + 1), *(q_ptr + 9), ... };
-        // ... repeat for values3-8
+        uint8_t values1[8], values2[8], values3[8], values4[8];
+        uint8_t values5[8], values6[8], values7[8], values8[8];
+        uint8_t* all_values[8] = { values1, values2, values3, values4,
+                                   values5, values6, values7, values8 };
+        for (int bit = 0; bit < 8; bit++) {
+            extract_query_values(q_ptr, bit, all_values[bit]);
+        }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5548e19 and 49116a8.

📒 Files selected for processing (1)
  • lib/quantization/cpp/sse.c (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.237Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
lib/quantization/cpp/sse.c (1)
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.237Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
  • GitHub Check: integration-tests-consensus
🔇 Additional comments (3)
lib/quantization/cpp/sse.c (3)

7-10: LGTM: MSVC compatibility handled correctly.

The conditional compilation for MSVC's popcount intrinsic is implemented properly and provides necessary cross-platform compatibility.


113-168: Function implementation looks correct.

The 8-bit scalar quantization with 128-bit vectors is implemented correctly. The weighted factor approach with powers of 2 (1,2,4,8 and 16,32,64,128) properly encodes the bit positions for asymmetric scoring.


170-206: Function implementation looks correct.

The 4-bit scalar quantization variant is implemented correctly with appropriate factor weighting (1,2,4,8) and proper pointer arithmetic.

Comment on lines +428 to +431
uint64_t values1_u64 = *((const uint64_t*)values1);
uint64_t values2_u64 = *((const uint64_t*)values2);
uint64_t values3_u64 = *((const uint64_t*)values3);
uint64_t values4_u64 = *((const uint64_t*)values4);
Contributor

@coderabbitai coderabbitai bot Jul 2, 2025

⚠️ Potential issue

Fix unsafe type punning that violates strict aliasing rules.

Same issue as in the scalar8 function - unsafe casting of uint8_t arrays to uint64_t*.

Apply the same fix:

-        uint64_t values1_u64 = *((const uint64_t*)values1);
-        uint64_t values2_u64 = *((const uint64_t*)values2);
-        uint64_t values3_u64 = *((const uint64_t*)values3);
-        uint64_t values4_u64 = *((const uint64_t*)values4);
+        uint64_t values1_u64, values2_u64, values3_u64, values4_u64;
+        memcpy(&values1_u64, values1, sizeof(uint64_t));
+        memcpy(&values2_u64, values2, sizeof(uint64_t));
+        memcpy(&values3_u64, values3, sizeof(uint64_t));
+        memcpy(&values4_u64, values4, sizeof(uint64_t));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uint64_t values1_u64 = *((const uint64_t*)values1);
uint64_t values2_u64 = *((const uint64_t*)values2);
uint64_t values3_u64 = *((const uint64_t*)values3);
uint64_t values4_u64 = *((const uint64_t*)values4);
uint64_t values1_u64, values2_u64, values3_u64, values4_u64;
memcpy(&values1_u64, values1, sizeof(uint64_t));
memcpy(&values2_u64, values2, sizeof(uint64_t));
memcpy(&values3_u64, values3, sizeof(uint64_t));
memcpy(&values4_u64, values4, sizeof(uint64_t));
🤖 Prompt for AI Agents
In lib/quantization/cpp/sse.c around lines 428 to 431, the code unsafely casts
uint8_t pointers to uint64_t pointers, violating strict aliasing rules. To fix
this, replace the direct pointer casts with a memcpy approach that copies the
bytes from the uint8_t arrays into uint64_t variables, ensuring safe type
punning without breaking aliasing rules.

Contributor Author

I don't want to use memcpy here because I hope the compiler can better optimize the plain `=` assignment

Contributor Author

Are you sure that using memcpy doesn't decrease performance?

Contributor

@IvanPleshkov I understand your reasoning. Modern compilers do optimize small fixed-size memcpy calls very well (often to a single load instruction), but I respect your preference to maintain the direct assignment for potential optimization benefits in this performance-critical SIMD code.

The current approach should work fine in practice, especially given that this code targets specific compiler toolchains where you can verify the generated assembly.

🐰✨
(acknowledging your optimization priorities)


✏️ Learnings added
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.819Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.237Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.

Contributor

@IvanPleshkov You raise a valid concern. Modern compilers are quite good at optimizing small, fixed-size memcpy calls - for sizeof(uint64_t), GCC and Clang typically optimize it to a single load instruction, equivalent to direct assignment. However, I understand your reasoning about potential optimization benefits in this performance-critical SIMD code.

In practice, for this specific use case where you're targeting known compiler toolchains and can verify the generated assembly, the direct pointer casting approach should work fine. Your preference to prioritize potential compiler optimization over strict aliasing compliance is reasonable given the performance-critical nature of this quantization code.

The compiler optimizations you're hoping for are likely to be preserved with the direct assignment approach.

🐰💭
(respecting your optimization expertise)


🧠 Learnings used
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.237Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.

Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.819Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.

Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
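For comparison, a small Rust sketch (our illustration, not code from this PR): on the Rust side, the same 8-byte reinterpretation can be expressed without any aliasing question via `u64::from_le_bytes`, which compilers typically lower to a single load.

```rust
/// Safe equivalent of the C pointer cast discussed above: copy 8 bytes
/// into a u64. No strict-aliasing concern; optimizes to one load on x86_64.
fn bytes_to_u64(values: &[u8; 8]) -> u64 {
    u64::from_le_bytes(*values)
}
```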

Comment on lines +299 to +306
uint64_t values1_u64 = *((const uint64_t*)values1);
uint64_t values2_u64 = *((const uint64_t*)values2);
uint64_t values3_u64 = *((const uint64_t*)values3);
uint64_t values4_u64 = *((const uint64_t*)values4);
uint64_t values5_u64 = *((const uint64_t*)values5);
uint64_t values6_u64 = *((const uint64_t*)values6);
uint64_t values7_u64 = *((const uint64_t*)values7);
uint64_t values8_u64 = *((const uint64_t*)values8);

This comment was marked as off-topic.

/// So it does not affect the resulting number of bits set to 1
fn xor_popcnt(v1: &[Self], v2: &[Self]) -> usize;

fn xor_popcnt_scalar(v1: &[Self], v2: &[Self], bits_count: usize) -> usize;
Member

Please provide a docstring with a description of what we expect to see in v1 and v2

Contributor Author

Done

}

pub struct EncodedScalarVector<TBitsStoreType: BitsStoreType> {
encoded_vector: Vec<TBitsStoreType>,
Member

Please describe the structure of encoded_vector, ideally with an ASCII diagram or similar

Contributor Author

Done

@xzfc
Member

xzfc commented Jul 2, 2025

Not related to this particular PR, but I've got the following errors when compiling this revision:

  = note: mold: error: undefined symbol: impl_xor_popcnt_scalar8_sse_uint128
          mold: error: undefined symbol: impl_xor_popcnt_scalar8_sse_u8
          mold: error: undefined symbol: impl_xor_popcnt_scalar8_avx_uint128
          mold: error: undefined symbol: impl_xor_popcnt_scalar4_sse_uint128
          mold: error: undefined symbol: impl_xor_popcnt_scalar4_sse_u8
          mold: error: undefined symbol: impl_xor_popcnt_scalar4_avx_uint128
          collect2: error: ld returned 1 exit status
error: could not compile `quantization` (test "integration") due to 1 previous error

Building from scratch by removing the target directory helps. It seems that something like this should be added to build.rs:

// Option 1: list each file individually
println!("cargo:rerun-if-changed=cpp/export_macro.h");
if target_arch == "x86_64" {
    println!("cargo:rerun-if-changed=cpp/sse.c");
    println!("cargo:rerun-if-changed=cpp/avx2.c");
    ...
} else if target_arch == "aarch64" {
    println!("cargo:rerun-if-changed=cpp/neon.c");
    ...
}


// Option 2: or just put the whole dir
println!("cargo:rerun-if-changed=cpp");

Comment on lines +163 to +170
let query_encodings = [
QueryEncoding::SameAsStorage,
QueryEncoding::Scalar4bits,
QueryEncoding::Scalar8bits,
];

let encoded: Vec<_> = query_encodings
.iter()
Member

Style nit: instead of hard-coding values, consider using strum::EnumIter.

Suggested change
let query_encodings = [
QueryEncoding::SameAsStorage,
QueryEncoding::Scalar4bits,
QueryEncoding::Scalar8bits,
];
let encoded: Vec<_> = query_encodings
.iter()
let encoded: Vec<_> = QueryEncoding::iter()

Also, the same suggestion applies to other tests like test_binary_dot_inverted_asymetric, test_binary_dot_asymetric, test_binary_scalar_internal, etc.

Contributor Author

Thanks for the suggestion, added EnumIter

result
}

fn xor_popcnt_scalar(v1: &[Self], v2: &[Self], bits_count: usize) -> usize {
Member

Performance: would it be beneficial to make bits_count checks compile-time rather than run-time?

A simpler (but probably not the cleanest) way is to turn it into a const generic.

fn xor_popcnt_scalar<const BITS_COUNT: usize>(v1: &[Self], v2: &[Self]) -> usize {
    const {
        assert!(BITS_COUNT == 4 || BITS_COUNT == 8);
    };
    
    ...
    
    if BITS_COUNT == 4 {
       ...
    }

This check will be compiled out even in debug builds.


Maybe with this approach you could merge xor_popcnt and xor_popcnt_scalar without sacrificing performance.

Contributor Author

Good suggestion! I may do it as a separate PR with a benchmark, if you don't mind

Comment on lines +711 to +714
let encoded_data = self.get_quantized_vector(point_id).to_vec();
EncodedQueryBQ::Binary(EncodedBinVector {
encoded_vector: transmute_from_u8_to_slice(&encoded_data).to_vec(),
})
Member

Please use zerocopy or bytemuck for the new code instead of transmute_from_u8_to_slice/transmute_to_u8_slice.

Also, the first .to_vec() could be avoided.

Contributor Author

Fixed, thanks!

invert: false,
},
Encoding::OneBit,
QueryEncoding::SameAsStorage,
Member

Maybe add benchmarks for the new variants?


Also, I wonder how good LLVM auto-vectorization is for this particular case compared to the hand-written intrinsics in C.

Related story: https://www.reddit.com/r/rust/comments/1ha7uyi/comment/m16ke06/

Contributor Author

Autovectorization works great; you're right. But in this case, I added hand-written intrinsics because they give a measurable boost. We may check whether Rust can autovectorize popcnt from SSE4.2.
I will add benchmarks in the next PR. This PR is too big already

Self::_encode_scalar_query_vector(&extended_query, bits_count)
}
Encoding::OneAndHalfBits => {
let mut extended_query = query.to_vec();
Member

Performance nit:

Suggested change
let mut extended_query = query.to_vec();
let mut extended_query = Vec::with_capacity(query.len() + query.len().div_ceil(2));
extended_query.extend(query.iter());

And the same for Encoding::TwoBits.

Maybe it's worth extracting extended_dim from get_quantized_vector_size_from_params into a separate function and using it here.

Contributor Author

Fixed, thanks!

let encoded_query_size = TBitsStoreType::get_storage_size(query.len().max(1)) * bits_count;
let mut encoded_query: Vec<TBitsStoreType> = vec![Default::default(); encoded_query_size];

let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, f32::max);
Member

In the code above you were using if v[0] > v[1] { v[0] } else { v[1] }, and here you are using f32::max.

Any particular reason to use both?

The former is faster, the latter does extra work to handle NaN values.

Contributor Author

Changed the if into f32::max; performance is not very critical in query encoding, since we do it once per search
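To make the NaN trade-off discussed in this thread concrete, here is a small example (our own, not code from the PR): `f32::max` does extra work so a single NaN operand never wins, while the manual comparison returns NaN when the first operand is NaN.

```rust
/// The manual-comparison variant discussed above: faster, but NaN-unsafe
/// when the first operand is NaN (the comparison is false, so `b` wins).
fn manual_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}
```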

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (5)
lib/quantization/src/encoded_vectors_binary.rs (5)

614-621: Performance: Optimize vector allocation.

The vector allocation can be optimized by pre-calculating capacity and avoiding extend_from_slice.

-                let mut extended_query = query.to_vec();
-                extended_query.extend_from_slice(query);
+                let mut extended_query = Vec::with_capacity(query.len() * 2);
+                extended_query.extend(query.iter());
+                extended_query.extend(query.iter());

624-624: Consistency: Use consistent max comparison.

The code uses f32::max here but manual if v[0] > v[1] { v[0] } else { v[1] } elsewhere. Consider using a consistent approach.

The f32::max handles NaN values correctly, while manual comparison is faster. Choose based on whether NaN handling is needed.


751-755: Performance: Avoid unnecessary vector allocation.

The to_vec() call can be avoided by using the slice directly.

-    pub fn encode_internal_query(&self, point_id: u32) -> EncodedQueryBQ<TBitsStoreType> {
-        let encoded_data = self.get_quantized_vector(point_id).to_vec();
-        EncodedQueryBQ::Binary(EncodedBinVector {
-            encoded_vector: transmute_from_u8_to_slice(&encoded_data).to_vec(),
-        })
-    }
+    pub fn encode_internal_query(&self, point_id: u32) -> EncodedQueryBQ<TBitsStoreType> {
+        let encoded_data = self.get_quantized_vector(point_id);
+        EncodedQueryBQ::Binary(EncodedBinVector {
+            encoded_vector: transmute_from_u8_to_slice(encoded_data).to_vec(),
+        })
+    }

851-851: Hardware counter usage is correct.

Based on previous clarification, the hardware counter correctly measures vector data access rather than computational cost. Using vector_data_1.len() is appropriate for tracking storage access patterns.


914-982: Consider using const generics for compile-time optimization.

The bit count checks could be moved to compile-time using const generics to eliminate runtime branches.

fn xor_popcnt_scalar<const BITS_COUNT: usize>(vector: &[Self], query: &[Self]) -> usize {
    const {
        assert!(BITS_COUNT == 4 || BITS_COUNT == 8);
    };
    // Implementation with compile-time BITS_COUNT
}
🧹 Nitpick comments (1)
lib/quantization/src/encoded_vectors_binary.rs (1)

638-638: Performance: Use more efficient max calculation.

Consider using fold with manual comparison for better performance since NaN handling isn't needed here.

-        let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, f32::max);
+        let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, |a, b| if a > b { a } else { b });
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a7353a and d798835.

📒 Files selected for processing (1)
  • lib/quantization/src/encoded_vectors_binary.rs (15 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.839Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
lib/quantization/src/encoded_vectors_binary.rs (13)
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.839Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/builder.rs:63-67
Timestamp: 2025-05-26T14:47:23.505Z
Learning: In the posting_list crate's PostingChunk struct, avoid adding extra fields like storing computed chunk_bits to prevent struct bloat and maintain mmap compatibility. The bit calculations from offsets are inexpensive compared to the memory and compatibility benefits of keeping the struct minimal.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.
Learnt from: coszio
PR: qdrant/qdrant#6768
File: lib/collection/src/collection/mmr.rs:90-95
Timestamp: 2025-07-01T17:12:27.967Z
Learning: In the Qdrant codebase, the user coszio mistakenly believed that triple question mark `???` syntax was valid Rust, when it is actually invalid syntax. The codebase uses standard Rust error propagation with `?` and `??` operators only.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5954
File: lib/segment/src/index/field_index/bool_index/simple_bool_index.rs:219-228
Timestamp: 2025-03-12T13:40:56.980Z
Learning: In Rust 1.80.0 and later, `std::mem::size_of` is included in the prelude and doesn't need to be explicitly imported.
🧬 Code Graph Analysis (1)
lib/quantization/src/encoded_vectors_binary.rs (3)
lib/quantization/src/encoded_vectors_u8.rs (13)
  • std (85-85)
  • std (122-122)
  • std (249-249)
  • std (255-255)
  • std (269-269)
  • query (346-346)
  • query (352-355)
  • encode_query (327-369)
  • score_point (371-428)
  • encoded_vector (104-104)
  • encoded_vector (109-112)
  • quantized_vector_size (507-509)
  • encode_internal_vector (511-534)
lib/common/memory/src/mmap_ops.rs (3)
  • v (94-94)
  • v (142-142)
  • transmute_from_u8_to_slice (101-119)
lib/quantization/src/encoded_vectors.rs (4)
  • encode_query (36-36)
  • score_point (38-43)
  • quantized_vector_size (47-47)
  • encode_internal_vector (51-51)
⏰ Context from checks skipped due to timeout of 90000ms (11)
  • GitHub Check: test-consistency
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: integration-tests-consensus
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
🔇 Additional comments (5)
lib/quantization/src/encoded_vectors_binary.rs (5)

38-50: LGTM! Clean enum design.

The QueryEncoding enum provides a clear API for specifying query encoding strategies with appropriate default and helper methods.


52-56: LGTM! Well-structured query representation.

The EncodedQueryBQ enum effectively encapsulates different query encoding types while maintaining type safety with the generic parameter.


62-100: Excellent documentation with clear technical explanation.

The comprehensive documentation explains the transposed scalar encoding optimization with concrete examples and performance justification. This addresses the previous request for ASCII diagrams and structural descriptions.
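To make the transposition concrete, here is a minimal, hypothetical Rust sketch of the idea (illustrative only, not the actual qdrant implementation): 8-bit query scalars are split into bit planes, and the weighted XOR-popcount over the planes reproduces a per-dimension XOR against the full scalars — the same `(b1 ^ b2).count_ones() << i` pattern used in the fallback scalar loop. The names `transpose` and `score_word`, and the 8-dimension word size, are assumptions for the example.

```rust
/// Transpose 8 query scalars into 8 bit planes: bit j of plane i
/// holds bit i of the j-th scalar. (Illustrative sketch.)
fn transpose(query: &[u8; 8]) -> [u8; 8] {
    let mut planes = [0u8; 8];
    for (j, &q) in query.iter().enumerate() {
        for i in 0..8 {
            planes[i] |= ((q >> i) & 1) << j;
        }
    }
    planes
}

/// Score one byte (8 binary dimensions) of a BQ vector against the
/// transposed query: sum of XOR popcounts, weighted by the bit index.
fn score_word(vector_bits: u8, planes: &[u8; 8]) -> u32 {
    planes
        .iter()
        .enumerate()
        .map(|(i, &plane)| (vector_bits ^ plane).count_ones() << i)
        .sum()
}

fn main() {
    let query = [0u8, 255, 10, 200, 1, 2, 3, 4];
    let planes = transpose(&query);
    let v: u8 = 0b1010_1010; // bit j = binary value of dimension j

    // Naive reference: XOR of a single bit with an 8-bit scalar q yields
    // q when the bit is 0 and 255 - q when the bit is 1.
    let naive: u32 = query
        .iter()
        .enumerate()
        .map(|(j, &q)| if (v >> j) & 1 == 0 { q as u32 } else { 255 - q as u32 })
        .sum();

    assert_eq!(score_word(v, &planes), naive);
    println!("score = {}", score_word(v, &planes));
}
```

The transposed layout is what lets the SIMD kernels process one bit plane per XOR-popcount pass instead of widening each binary dimension to 8 bits.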


686-730: LGTM! Robust metric calculation with proper normalization.

The metric calculation correctly handles both binary and scalar query encodings with appropriate normalization for scalar results.


853-867: LGTM! Clean query type dispatch.

The match statement cleanly dispatches scoring based on query encoding type with appropriate bit count parameters.

@IvanPleshkov
Contributor Author

Not related to this particular PR, but I've got the following errors when compiling this revision:

  = note: mold: error: undefined symbol: impl_xor_popcnt_scalar8_sse_uint128
          mold: error: undefined symbol: impl_xor_popcnt_scalar8_sse_u8
          mold: error: undefined symbol: impl_xor_popcnt_scalar8_avx_uint128
          mold: error: undefined symbol: impl_xor_popcnt_scalar4_sse_uint128
          mold: error: undefined symbol: impl_xor_popcnt_scalar4_sse_u8
          mold: error: undefined symbol: impl_xor_popcnt_scalar4_avx_uint128
          collect2: error: ld returned 1 exit status
error: could not compile `quantization` (test "integration") due to 1 previous error

Building from scratch by removing the target directory helps. It seems that something like this should be added to build.rs:

// Option 1: list each file individually
println!("cargo:rerun-if-changed=cpp/export_macro.h");
if target_arch == "x86_64" {
    println!("cargo:rerun-if-changed=cpp/sse.c");
    println!("cargo:rerun-if-changed=cpp/avx2.c");
    ...
} else if target_arch == "aarch64" {
    println!("cargo:rerun-if-changed=cpp/neon.c");
    ...
}


// Option 2: or just put the whole dir
println!("cargo:rerun-if-changed=cpp");

Thanks! I chose option 2, done!

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
lib/quantization/tests/integration/test_binary_encodings.rs (1)

164-164: Excellent use of EnumIter pattern.

This addresses the previous review comment about using strum::EnumIter instead of hard-coding values. The implementation correctly uses QueryEncoding::iter() to systematically test all encoding variants.

🧹 Nitpick comments (2)
lib/quantization/src/encoded_vectors_binary.rs (2)

134-139: Add parameter documentation for xor_popcnt_scalar.

While the method has good documentation explaining the algorithm, it would benefit from parameter documentation similar to the existing xor_popcnt method.

 /// Calculate score between BQ encoded vector and `EncodedScalarVector<Self>` query.
 ///
+/// # Parameters
+/// * `vector` - BQ encoded vector data
+/// * `query` - Scalar encoded query in transposed format
+/// * `query_bits_count` - Number of bits per scalar value (4 or 8)
+///
 /// It calculates sum of XOR popcount between each bit of the `vector` and the corresponding scalar value in the `query`.

644-644: Consider documenting the choice of f32::max over manual comparison.

Based on past review comments, there's inconsistency in using f32::max vs manual comparison. The current choice handles NaN values but may be slower.

-        let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, f32::max);
+        // Using f32::max to handle potential NaN values in query data
+        let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, f32::max);
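For context on the NaN point above, a tiny self-contained sketch (not qdrant code) showing why `fold(0.0, f32::max)` is the safe choice: `f32::max` returns the other operand when one side is NaN, so NaN entries in the query cannot poison the maximum.

```rust
fn main() {
    // A query with a NaN entry; the fold skips it because
    // f32::max returns the non-NaN operand.
    let query = [1.0f32, -3.0, f32::NAN, 2.0];
    let max_abs = query.iter().map(|x| x.abs()).fold(0.0f32, f32::max);
    assert_eq!(max_abs, 3.0);
    println!("max_abs = {max_abs}");
}
```

A manual `if x > acc` comparison also happens to skip NaN (any comparison with NaN is false), so the two variants differ mainly in generated code, not in the result — which is why a clarifying comment is useful either way.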
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0a4af6 and 127cfe1.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • lib/quantization/Cargo.toml (1 hunks)
  • lib/quantization/build.rs (1 hunks)
  • lib/quantization/src/encoded_vectors_binary.rs (17 hunks)
  • lib/quantization/tests/integration/test_binary_encodings.rs (2 hunks)
✅ Files skipped from review due to trivial changes (2)
  • lib/quantization/Cargo.toml
  • lib/quantization/build.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.839Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
lib/quantization/tests/integration/test_binary_encodings.rs (4)
Learnt from: timvisee
PR: qdrant/qdrant#6546
File: lib/segment/src/index/field_index/full_text_index/text_index.rs:497-518
Timestamp: 2025-05-19T09:25:54.285Z
Learning: In the Qdrant project, using `.expect()` and `.unwrap()` is considered acceptable in test code since test failures should be loud and obvious, and proper error propagation adds unnecessary complexity to test code.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
lib/quantization/src/encoded_vectors_binary.rs (14)
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.839Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/builder.rs:63-67
Timestamp: 2025-05-26T14:47:23.505Z
Learning: In the posting_list crate's PostingChunk struct, avoid adding extra fields like storing computed chunk_bits to prevent struct bloat and maintain mmap compatibility. The bit calculations from offsets are inexpensive compared to the memory and compatibility benefits of keeping the struct minimal.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.
Learnt from: coszio
PR: qdrant/qdrant#6768
File: lib/collection/src/collection/mmr.rs:90-95
Timestamp: 2025-07-01T17:12:27.967Z
Learning: In the Qdrant codebase, the user coszio mistakenly believed that triple question mark `???` syntax was valid Rust, when it is actually invalid syntax. The codebase uses standard Rust error propagation with `?` and `??` operators only.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5954
File: lib/segment/src/index/field_index/bool_index/simple_bool_index.rs:219-228
Timestamp: 2025-03-12T13:40:56.980Z
Learning: In Rust 1.80.0 and later, `std::mem::size_of` is included in the prelude and doesn't need to be explicitly imported.
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
⏰ Context from checks skipped due to timeout of 90000ms (14)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: test-consistency
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: lint
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-low-resources
  • GitHub Check: storage-compat-test
  • GitHub Check: integration-tests-consensus
  • GitHub Check: integration-tests
🔇 Additional comments (10)
lib/quantization/tests/integration/test_binary_encodings.rs (2)

7-11: LGTM! Good use of QueryEncoding imports and strum for enum iteration.

The addition of QueryEncoding to the imports and using strum::IntoEnumIterator is clean and follows good practices for working with enums systematically.


80-85: All EncodedVectorsBin::encode calls include the QueryEncoding parameter
Verified that every invocation across benches, segment code, and integration tests now passes a QueryEncoding variant: SameAsStorage for backward-compatible paths and specific variants (e.g. Scalar4bits) where intended. No further changes needed.

lib/quantization/src/encoded_vectors_binary.rs (8)

25-31: LGTM! Clean enum definition with derive macros.

The Encoding enum properly uses EnumIter derive and has good default and serialization support.


39-51: Well-designed QueryEncoding enum.

The enum is properly structured with appropriate derives and a sensible default. The helper method is_same_as_storage() provides a clean API for checking the default behavior.


63-101: Excellent documentation for EncodedScalarVector.

The comprehensive documentation clearly explains the transposition optimization strategy, provides concrete examples, and references the academic source. This level of detail is crucial for understanding the complex bit manipulation involved.


588-605: LGTM! Clean query encoding dispatch logic.

The method properly handles the different query encoding types and delegates to appropriate encoding methods. The structure is clear and maintainable.


715-720: Verify scalar quantization normalization factor.

The normalization by ((1 << query_bits_count) - 1) ensures consistent scaling between binary and scalar encodings. This looks correct for converting the weighted sum to a comparable scale.
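A minimal sketch of that scaling, under the assumption that each scalar weights the XOR-popcount sum by up to `2^bits - 1` (the function name `normalize` is illustrative, not qdrant's API):

```rust
/// Bring a weighted XOR-popcount sum back to the 1-bit binary scale
/// by dividing out the maximum per-dimension scalar weight.
fn normalize(weighted_sum: u32, query_bits_count: u32) -> f32 {
    weighted_sum as f32 / ((1u32 << query_bits_count) - 1) as f32
}

fn main() {
    // 8-bit scalars: maximum weight per dimension is 255.
    assert_eq!(normalize(510, 8), 2.0);
    // 4-bit scalars: maximum weight per dimension is 15.
    assert_eq!(normalize(30, 4), 2.0);
}
```

Both calls recover the same binary-scale score of 2.0, which is what makes the binary and scalar query paths comparable.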


757-765: Suggest using zerocopy for byte transmutation.

Based on past review comments, the codebase prefers zerocopy or bytemuck for safe byte-level operations. The current usage of bytemuck::cast_slice is appropriate and safe.


862-876: LGTM! Clean query type dispatch in scoring.

The match statement properly handles different query encoding types and passes the correct bit count to the calculation method. The implementation maintains consistency across encoding variants.


200-252: SIMD parameter order and fallback logic verified

The SIMD calls use the correct argument order and feature checks, and the fallback scalar loop’s (b1 ^ b2).count_ones() << i bit-shifting matches the SIMD implementations. No changes needed.

@generall generall merged commit b414a40 into dev Jul 3, 2025
15 checks passed
@generall generall deleted the 8bit-bq-query branch July 3, 2025 12:09
@coderabbitai coderabbitai bot mentioned this pull request Jul 14, 2025
@generall generall added this to the Asymmetric BQ milestone Jul 17, 2025
generall pushed a commit that referenced this pull request Jul 17, 2025
* # This is a combination of 7 commits.

* SameAsStorage default value

* fix coderabbit warnings

* neon for u8 bq

* sse for u8 bq

* fix windows build

* rename function

* add comments

* fmt

* fix arm build

* review remarks
This was referenced Jul 21, 2025
@coderabbitai coderabbitai bot mentioned this pull request Jul 29, 2025
8 tasks