Force-pushed from 4492a3c to de741fa. Commit messages:
# This is the 1st commit message: bq encodings; fix tests; Welford's Algorithm; review remarks; remove debug println in test; try fix ci; revert cargo-nextest; async query; 4 and 8 bits; use quantized storage for query in hnsw build; fix after rebase; are you happy; fmt
# This is the commit message #2: quantization config without option
# This is the commit message #3: x64 simd for u128
# This is the commit message #4: fmt
# This is the commit message #5: neon simd
# This is the commit message #6: refactor
# This is the commit message #7: fmt
Force-pushed from 2d10d71 to 8309829.
📝 Walkthrough

This change introduces support for multiple query encoding types in the binary quantization module.

Possibly related PRs
Suggested reviewers
Actionable comments posted: 4
🧹 Nitpick comments (1)
lib/quantization/cpp/neon.c (1)
109-148: Consider simplifying the horizontal sum calculation. The horizontal sum calculation repeats the same pattern 8 times. This could be simplified using a loop or helper function for better maintainability.
Example approach using arrays:
uint16x8_t results[8] = {result1, result2, result3, result4, result5, result6, result7, result8};
uint32_t sums[8];
for (int i = 0; i < 8; i++) {
    uint8x8_t low = vget_low_u8(vreinterpretq_u8_u16(results[i]));
    uint8x8_t high = vget_high_u8(vreinterpretq_u8_u16(results[i]));
    uint16x8_t sum = vaddl_u8(low, high);
    sums[i] = vaddvq_u16(sum);
}
return sums[0] + (sums[1] << 1) + (sums[2] << 2) + (sums[3] << 3) +
       (sums[4] << 4) + (sums[5] << 5) + (sums[6] << 6) + (sums[7] << 7);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- lib/quantization/benches/binary.rs (3 hunks)
- lib/quantization/cpp/avx2.c (1 hunks)
- lib/quantization/cpp/neon.c (1 hunks)
- lib/quantization/cpp/sse.c (1 hunks)
- lib/quantization/src/encoded_vectors_binary.rs (15 hunks)
- lib/quantization/tests/integration/empty_storage.rs (2 hunks)
- lib/quantization/tests/integration/test_binary.rs (14 hunks)
- lib/quantization/tests/integration/test_binary_encodings.rs (2 hunks)
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (5)
lib/quantization/benches/binary.rs (6)
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This was part of an API redesign in rand 0.9.0 released in February 2024.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.
Learnt from: coszio
PR: qdrant/qdrant#6446
File: lib/gridstore/benches/flush_bench.rs:18-18
Timestamp: 2025-04-29T16:48:34.967Z
Learning: The Rust rand crate version 0.9+ has introduced a top-level `rng()` function that replaces the now-deprecated `thread_rng()` function.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
lib/quantization/tests/integration/empty_storage.rs (3)
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` correctly returns void instead of a `Result` because it directly implements the `Madviseable` trait which defines `populate(&self)` without a return type. Adding an unnecessary `Ok(())` return would trigger Clippy warnings about unnecessary Result wrapping.
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` does not return a `Result` because the underlying `mmap.populate()` method doesn't return a Result either. Adding an unnecessary `Ok(())` would cause Clippy to complain.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
lib/quantization/tests/integration/test_binary.rs (3)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.
lib/quantization/tests/integration/test_binary_encodings.rs (1)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
lib/quantization/src/encoded_vectors_binary.rs (3)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
🧬 Code Graph Analysis (1)
lib/quantization/cpp/neon.c (3)
lib/quantization/cpp/avx2.c (2)
- uint32_t (124-171)
- uint32_t (173-205)
lib/quantization/cpp/sse.c (5)
- uint32_t (49-70)
- uint32_t (72-88)
- uint32_t (90-106)
- uint32_t (108-163)
- uint32_t (165-210)
lib/quantization/src/encoded_vectors_binary.rs (2)
- impl_xor_popcnt_scalar8_neon_uint128 (864-868)
- impl_xor_popcnt_scalar4_neon_uint128 (870-874)
🔇 Additional comments (10)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
953-953: LGTM! Consistent API update for query encoding support. The addition of the `QueryEncoding::SameAsStorage` parameter to all `EncodedVectorsBin::encode` calls is properly implemented across all binary quantization creation paths (RAM/MMAP and single/multi variants). Also applies to: 970-970, 1014-1014, 1037-1037
lib/quantization/tests/integration/empty_storage.rs (1)
7-7: Test correctly updated for new API. The test properly imports `QueryEncoding` and adds the required `QueryEncoding::SameAsStorage` parameter to match the updated encoding API. Also applies to: 105-105
lib/quantization/benches/binary.rs (1)
7-7: Benchmarks properly updated for query encoding API. Both benchmark encoding calls correctly specify `QueryEncoding::SameAsStorage` to maintain existing benchmark behavior with the new API. Also applies to: 43-43, 84-84
lib/quantization/tests/integration/test_binary.rs (2)
7-9: Existing tests correctly updated with query encoding parameter. All test functions properly specify `QueryEncoding::SameAsStorage` to maintain their original behavior with the updated API. Also applies to: 55-55, 103-103, 151-151, 196-196, 240-240, 303-303, 366-366, 426-426, 486-486, 549-549, 612-612, 672-672

701-746: Well-structured test for scalar query encoding consistency. The new test comprehensively validates that `score_point` and `score_internal` produce identical results when using `QueryEncoding::Scalar4bits`. Good coverage with multiple dimensions and all encoding types.

lib/quantization/cpp/avx2.c (2)
124-171: Verify pointer alignment for uint64_t access. The function casts `uint8_t*` pointers to `uint64_t*`, which requires 8-byte alignment. Misaligned access could cause crashes on some architectures or performance degradation. Please ensure that the input pointers are always 8-byte aligned when calling these functions. Consider adding a debug assertion or documentation about alignment requirements.
173-205: Implementation looks correct for 4-bit scalar encoding. The weighted popcount accumulation with factors [1,2,4,8] is appropriate for 4-bit encoding. Same alignment requirements apply as noted for the 8-bit function above.
lib/quantization/tests/integration/test_binary_encodings.rs (1)
7-225: Test implementation looks comprehensive. The test changes properly validate the new asymmetric binary quantization functionality:
- Tests different query encoding modes
- Validates accuracy improvements
- Covers both normal and inverted distance calculations
lib/quantization/src/encoded_vectors_binary.rs (2)
832-842: Verify parameter naming in external function declarations. The parameter names in these external C function declarations might be misleading. Based on the implementation, it appears that for scalar encodings, the "query" data is expected to be larger (by a factor of bits_count) than the "vector" data. Please verify that the parameter naming convention is intentional and document the expected data layout for these functions to prevent confusion.
Consider adding documentation like:
/// For scalar XOR-popcount functions:
/// - query_ptr: Points to the scalar-encoded query data (larger)
/// - vector_ptr: Points to the binary-encoded vector data (smaller)
/// - count: Number of 128-bit blocks in the vector data

Also applies to: 844-854, 864-874
229-233: Fix parameter order in SIMD function calls. The parameter order is swapped when calling the C functions. The C functions expect `query_ptr` first and `vector_ptr` second, but the code passes `v2` (query) as the first and `v1` (vector) as the second parameter. Apply this diff to fix the parameter order for NEON:

  return impl_xor_popcnt_scalar8_neon_uint128(
-     v2.as_ptr().cast::<u8>(),
-     v1.as_ptr().cast::<u8>(),
+     v1.as_ptr().cast::<u8>(),
+     v2.as_ptr().cast::<u8>(),
      v1.len() as u32,
  ) as usize;

Apply similar fixes for all the scalar XOR-popcount function calls in this method (both AVX and SSE variants).
Also applies to: 250-254, 271-275
⛔ Skipped due to learnings
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/quantization/cpp/neon.c (2)
150-323: Consider optimizing the gather operation to avoid temporary arrays. The function creates temporary arrays on the stack for each iteration to gather non-contiguous bytes. This approach works but may impact performance due to extra memory operations.
Consider using NEON table lookup instructions (vtbl) or other gather mechanisms to avoid the temporary arrays.
325-420: Consider the same optimization for scalar4 to avoid temporary arrays. Similar to the scalar8 version, this function also creates temporary arrays on the stack. The same optimization advice applies here.
lib/quantization/src/encoded_vectors_binary.rs (1)
569-605: Add documentation for the complex bit packing logic. The scalar quantization and bit packing implementation is correct but complex. Consider adding comments to explain:
- The symmetric quantization approach
- How bits are interleaved across storage elements
- The purpose of the modulo operation
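As a point of reference for such comments, bit-plane interleaving can be sketched in a few lines. This is an illustrative model only; the function name and exact layout are assumptions, not the crate's actual code:

```c
#include <stdint.h>

/* Illustrative bit-plane packing: bit b of each quantized value goes into
   storage plane b, so an XOR-popcount against plane b can later be
   weighted by 1 << b. Handles n <= 8 values per plane byte. */
static void pack_bit_planes(const uint8_t* quantized, int n,
                            uint8_t* planes, int bits_count) {
    for (int b = 0; b < bits_count; b++) {
        planes[b] = 0;
        for (int i = 0; i < n; i++) {
            planes[b] |= (uint8_t)(((quantized[i] >> b) & 1u) << i);
        }
    }
}
```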
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- lib/quantization/cpp/neon.c (1 hunks)
- lib/quantization/src/encoded_vectors_binary.rs (15 hunks)
🧰 Additional context used
🧠 Learnings (1)
lib/quantization/src/encoded_vectors_binary.rs (3)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
🧬 Code Graph Analysis (1)
lib/quantization/cpp/neon.c (3)
lib/quantization/cpp/avx2.c (2)
- uint32_t (124-171)
- uint32_t (173-205)
lib/quantization/cpp/sse.c (5)
- uint32_t (49-70)
- uint32_t (72-88)
- uint32_t (90-106)
- uint32_t (108-163)
- uint32_t (165-201)
lib/quantization/src/encoded_vectors_binary.rs (4)
- impl_xor_popcnt_scalar8_neon_uint128 (885-889)
- impl_xor_popcnt_scalar4_neon_uint128 (891-895)
- impl_xor_popcnt_scalar8_neon_u8 (897-901)
- impl_xor_popcnt_scalar4_neon_u8 (903-907)
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: test-low-resources
- GitHub Check: test-consensus-compose
- GitHub Check: test-consistency
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: storage-compat-test
- GitHub Check: integration-tests-consensus
- GitHub Check: Basic TLS/HTTPS tests
🔇 Additional comments (11)
lib/quantization/cpp/neon.c (2)
69-116: LGTM! Correct NEON implementation for scalar8 XOR-popcount. The function correctly uses `uint16x8_t` for accumulation and properly calculates the weighted sum with powers of two.

118-148: LGTM! Correct NEON implementation for scalar4 XOR-popcount. The function correctly handles 4 query vectors with proper accumulation and weighted sum calculation.
lib/quantization/src/encoded_vectors_binary.rs (9)
38-75: LGTM! Well-structured query encoding types. The new enums and structs provide a clean abstraction for different query encoding modes with proper serialization support.

77-100: LGTM! Appropriate trait bounds for scalar operations. The additional trait bounds support the bit manipulation operations needed for scalar encodings.

157-188: LGTM! Correct scalar XOR-popcount implementation for u8. The function properly validates input sizes, dispatches to SIMD implementations when available, and provides a correct fallback implementation.

243-316: LGTM! Comprehensive SIMD support for u128 scalar operations. The function provides good coverage of SIMD instruction sets (NEON, AVX2, SSE4.2) with appropriate feature detection and fallback.

524-541: LGTM! Clean query encoding dispatch. The method properly routes to appropriate encoding functions based on the query encoding mode.

543-567: LGTM! Correct query extension for different storage encodings. The method properly extends the query vector to match the storage encoding format, with appropriate handling for the OneAndHalfBits case using max pooling.

624-668: LGTM! Correct normalization for scalar encodings. The metric calculation properly normalizes scalar encoding results to maintain consistent scoring across different bit widths.

777-806: LGTM! Proper handling of different query encoding types. The scoring correctly dispatches to calculate_metric with the appropriate bits_count for each encoding variant.

853-908: LGTM! Complete SIMD function declarations. The external function declarations properly cover all SIMD variants (SSE, AVX, NEON) for both 4-bit and 8-bit scalar encodings.
Actionable comments posted: 1
♻️ Duplicate comments (4)
lib/quantization/cpp/sse.c (4)
122-127: Fix argument order for _mm_set_epi32 intrinsic (duplicate issue). The same argument order issue identified in previous reviews applies to these new `_mm_set_epi32` calls. The intrinsic expects arguments in reverse order (from high to low), but the current implementation provides them in low-to-high order. Also applies to: 130-135, 138-143, 146-151

178-183: Fix argument order for _mm_set_epi32 intrinsic (duplicate issue). The same argument order issue applies to these `_mm_set_epi32` calls in the scalar4 function. Also applies to: 186-191

303-308: Fix argument order for _mm_set_epi32 intrinsic (duplicate issue). The argument order issue also affects these `_mm_set_epi32` calls in the u8 scalar8 function. Also applies to: 311-316

428-433: Fix argument order for _mm_set_epi32 intrinsic (duplicate issue). The argument order issue also affects this `_mm_set_epi32` call in the u8 scalar4 function.
🧹 Nitpick comments (1)
lib/quantization/cpp/sse.c (1)
213-292: Consider refactoring repetitive array initialization. The manual array initialization pattern is repetitive and error-prone. Consider using a loop or macro to reduce code duplication and improve maintainability.

- uint8_t values1[8] = {
-     *(q_ptr + 0),
-     *(q_ptr + 8),
-     *(q_ptr + 16),
-     // ... more manual entries
- };
+ for (int i = 0; i < 8; i++) {
+     for (int bit = 0; bit < 8; bit++) {
+         values[bit][i] = *(q_ptr + bit + i * 8);
+     }
+ }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- lib/quantization/cpp/sse.c (1 hunks)
- lib/quantization/src/encoded_vectors_binary.rs (15 hunks)
🧰 Additional context used
🧠 Learnings (1)
lib/quantization/src/encoded_vectors_binary.rs (3)
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-low-resources
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: lint
- GitHub Check: storage-compat-test
- GitHub Check: integration-tests
🔇 Additional comments (10)
lib/quantization/src/encoded_vectors_binary.rs (10)
38-50: LGTM - Well-designed query encoding enum. The `QueryEncoding` enum is well-structured with appropriate defaults and utility methods. The `is_same_as_storage` method provides clear intent checking.

52-56: LGTM - Appropriate enum design for encoded queries. The `EncodedQueryBQ` enum appropriately models the different query encoding variants and leverages the existing vector types effectively.

72-74: LGTM - Proper serialization attributes. The `query_encoding` field is correctly configured with serde attributes to maintain backward compatibility by using default values and conditional serialization.

83-96: LGTM - Appropriate trait bounds expansion. The additional trait bounds are necessary for the scalar operations and follow Rust best practices. The new `xor_popcnt_scalar` method signature is well-designed.

330-336: Verify consistency with u8 implementation. The scalar popcount logic for u128 should be consistent with the u8 implementation. Both use the same bit-shifting pattern for weighting.

361-361: LGTM - API extension for query encoding. The addition of the `query_encoding` parameter to the `encode` function properly extends the API to support the new functionality.

601-614: Verify scalar quantization range calculation. The quantization logic calculates ranges as `(1 << bits_count) - 1` and then uses `quantized % (ranges + 1)`. This effectively creates `2^bits_count` quantization levels, which seems correct for the intended bit width.
668-673: LGTM - Proper metric normalization for scalar queries. The normalization of scalar XOR results by `((1 << v2_bits_count) - 1)` correctly scales the weighted popcount back to the [0,1] range for metric calculation.

578-584: No changes needed: the OneAndHalfBits query-extension logic is already covered by tests. The asymmetric "max of pairs" logic in `extended_query.extend(query.chunks(2)…)` is exercised by existing integration tests, confirming it behaves as intended:

- lib/quantization/tests/integration/test_binary_encodings.rs:132–134 (`test_binary_dot_asymentric_impl::<…>(…, Encoding::OneAndHalfBits, false)`)
- lib/quantization/tests/integration/test_binary.rs:707–708 (`test_binary_scalar_internal_impl::<u8>(…, Encoding::OneAndHalfBits)`)

No further action required.

202-208: Verify weighted popcount shift. Please confirm that `(b1 ^ b2).count_ones() << i` implements the intended scalar-quantization weighting. I wasn't able to find a matching pattern in other quantization modules; consider adding an inline comment or a unit test that references the weighting scheme from the spec for clarity.

- lib/quantization/src/encoded_vectors_binary.rs:202–208
Actionable comments posted: 2
🧹 Nitpick comments (1)
lib/quantization/cpp/sse.c (1)
218-297: Consider refactoring repetitive array initialization. The manual array initialization for 8 query sets is highly repetitive and error-prone. Consider using a loop-based approach or helper function to reduce code duplication and improve maintainability.

Example refactor using a helper approach:

+ // Helper to extract query values for a specific bit position
+ static inline void extract_query_values(const uint8_t* q_ptr, uint8_t bit_pos, uint8_t values[8]) {
+     for (int i = 0; i < 8; i++) {
+         values[i] = *(q_ptr + bit_pos + i * 8);
+     }
+ }

- uint8_t values1[8] = { *(q_ptr + 0), *(q_ptr + 8), ... };
- uint8_t values2[8] = { *(q_ptr + 1), *(q_ptr + 9), ... };
- // ... repeat for values3-8
+ uint8_t values[8][8];
+ for (int bit = 0; bit < 8; bit++) {
+     extract_query_values(q_ptr, bit, values[bit]);
+ }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- lib/quantization/cpp/sse.c (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.237Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
lib/quantization/cpp/sse.c (1)
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.237Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: integration-tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: storage-compat-test
- GitHub Check: lint
- GitHub Check: integration-tests-consensus
🔇 Additional comments (3)
lib/quantization/cpp/sse.c (3)
7-10: LGTM: MSVC compatibility handled correctly. The conditional compilation for MSVC's popcount intrinsic is implemented properly and provides necessary cross-platform compatibility.

113-168: Function implementation looks correct. The 8-bit scalar quantization with 128-bit vectors is implemented correctly. The weighted-factor approach with powers of 2 (1, 2, 4, 8 and 16, 32, 64, 128) properly encodes the bit positions for asymmetric scoring.

170-206: Function implementation looks correct. The 4-bit scalar quantization variant is implemented correctly with appropriate factor weighting (1, 2, 4, 8) and proper pointer arithmetic.
uint64_t values1_u64 = *((const uint64_t*)values1);
uint64_t values2_u64 = *((const uint64_t*)values2);
uint64_t values3_u64 = *((const uint64_t*)values3);
uint64_t values4_u64 = *((const uint64_t*)values4);
Fix unsafe type punning that violates strict aliasing rules.
Same issue as in the scalar8 function - unsafe casting of uint8_t arrays to uint64_t*.
Apply the same fix:
```diff
- uint64_t values1_u64 = *((const uint64_t*)values1);
- uint64_t values2_u64 = *((const uint64_t*)values2);
- uint64_t values3_u64 = *((const uint64_t*)values3);
- uint64_t values4_u64 = *((const uint64_t*)values4);
+ uint64_t values1_u64, values2_u64, values3_u64, values4_u64;
+ memcpy(&values1_u64, values1, sizeof(uint64_t));
+ memcpy(&values2_u64, values2, sizeof(uint64_t));
+ memcpy(&values3_u64, values3, sizeof(uint64_t));
+ memcpy(&values4_u64, values4, sizeof(uint64_t));
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```c
uint64_t values1_u64, values2_u64, values3_u64, values4_u64;
memcpy(&values1_u64, values1, sizeof(uint64_t));
memcpy(&values2_u64, values2, sizeof(uint64_t));
memcpy(&values3_u64, values3, sizeof(uint64_t));
memcpy(&values4_u64, values4, sizeof(uint64_t));
```
🤖 Prompt for AI Agents
In lib/quantization/cpp/sse.c around lines 428 to 431, the code unsafely casts
uint8_t pointers to uint64_t pointers, violating strict aliasing rules. To fix
this, replace the direct pointer casts with a memcpy approach that copies the
bytes from the uint8_t arrays into uint64_t variables, ensuring safe type
punning without breaking aliasing rules.
I don't want to use memcpy here because I hope that the compiler can better optimize the = statement
Are you sure that using memcpy doesn't decrease performance?
@IvanPleshkov I understand your reasoning. Modern compilers do optimize small fixed-size memcpy calls very well (often to a single load instruction), but I respect your preference to maintain the direct assignment for potential optimization benefits in this performance-critical SIMD code.
The current approach should work fine in practice, especially given that this code targets specific compiler toolchains where you can verify the generated assembly.
🐰✨
(acknowledging your optimization priorities)
✏️ Learnings added
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.819Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.237Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
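For context on the type-punning debate: on the Rust side of the crate the same byte-to-integer reinterpretation can be written without aliasing concerns at all. A minimal sketch (not code from this PR; the function name is hypothetical) — `u64::from_le_bytes` is the safe analogue of the C `memcpy` idiom and likewise compiles to a single load on x86_64:

```rust
// Sketch: reading a u64 out of a byte buffer without pointer casts.
// u64::from_le_bytes has no aliasing hazards and optimizes to one load.
fn load_u64_le(bytes: &[u8]) -> u64 {
    let mut arr = [0u8; 8];
    arr.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(arr)
}

fn main() {
    let buf: Vec<u8> = (1..=8).collect();
    // Little-endian: bytes[0] is the least significant byte.
    assert_eq!(load_u64_le(&buf), 0x0807_0605_0403_0201);
    println!("{:#x}", load_u64_le(&buf));
}
```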
@IvanPleshkov You raise a valid concern. Modern compilers are quite good at optimizing small, fixed-size memcpy calls - for sizeof(uint64_t), GCC and Clang typically optimize it to a single load instruction, equivalent to direct assignment. However, I understand your reasoning about potential optimization benefits in this performance-critical SIMD code.
In practice, for this specific use case where you're targeting known compiler toolchains and can verify the generated assembly, the direct pointer casting approach should work fine. Your preference to prioritize potential compiler optimization over strict aliasing compliance is reasonable given the performance-critical nature of this quantization code.
The compiler optimizations you're hoping for are likely to be preserved with the direct assignment approach.
🐰💭
(respecting your optimization expertise)
🧠 Learnings used
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.237Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.819Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
```rust
/// So it does not affect the resulting number of bits set to 1
fn xor_popcnt(v1: &[Self], v2: &[Self]) -> usize;

fn xor_popcnt_scalar(v1: &[Self], v2: &[Self], bits_count: usize) -> usize;
```
Please provide a docstring with description of what we expect to see in v1 and v2
```rust
}

pub struct EncodedScalarVector<TBitsStoreType: BitsStoreType> {
    encoded_vector: Vec<TBitsStoreType>,
```
Please describe the structure of encoded_vector. Ideally with ASCII diagram or similar
Not related to this particular PR, but I've got the following errors when compiling this revision. Building from scratch by removing the …

```rust
// Option 1: list each file individually
println!("cargo:rerun-if-changed=cpp/export_macro.h");
if target_arch == "x86_64" {
    println!("cargo:rerun-if-changed=cpp/sse.c");
    println!("cargo:rerun-if-changed=cpp/avx2.c");
    ...
} else if target_arch == "aarch64" {
    println!("cargo:rerun-if-changed=cpp/neon.c");
    ...
}

// Option 2: or just put the whole dir
println!("cargo:rerun-if-changed=cpp");
```
```rust
let query_encodings = [
    QueryEncoding::SameAsStorage,
    QueryEncoding::Scalar4bits,
    QueryEncoding::Scalar8bits,
];

let encoded: Vec<_> = query_encodings
    .iter()
```
Style nit: instead of hard-coding values, consider using strum::EnumIter.
```diff
-let query_encodings = [
-    QueryEncoding::SameAsStorage,
-    QueryEncoding::Scalar4bits,
-    QueryEncoding::Scalar8bits,
-];
-let encoded: Vec<_> = query_encodings
-    .iter()
+let encoded: Vec<_> = QueryEncoding::iter()
```

Also, the same suggestion applies to other tests like test_binary_dot_inverted_asymetric, test_binary_dot_asymetric, test_binary_scalar_internal, etc.
Thanks for the suggestion, added EnumIter
```rust
    result
}

fn xor_popcnt_scalar(v1: &[Self], v2: &[Self], bits_count: usize) -> usize {
```
Performance: would it be beneficial to make bits_count checks compile-time rather than run-time?
A simpler (but probably not the cleanest) way is to turn it into a const generic.

```rust
fn xor_popcnt_scalar<const BITS_COUNT: usize>(v1: &[Self], v2: &[Self]) -> usize {
    const {
        assert!(BITS_COUNT == 4 || BITS_COUNT == 8);
    };
    ...
    if BITS_COUNT == 4 {
        ...
    }
}
```

This check will be compiled out even in debug builds.
Maybe with this approach you could merge xor_popcnt and xor_popcnt_scalar without sacrificing the performance.
Good suggestion! May do it as a separate PR with benchmark if you don't mind
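A self-contained sketch of the const-generic idea, runnable on stable Rust (1.79+ for inline `const` blocks). The transposed bit-plane layout and weighting here are illustrative assumptions, not the actual Qdrant trait implementation:

```rust
// Sketch: bits_count as a const generic so the 4-vs-8 branch and the range
// check are resolved at compile time.
fn xor_popcnt_scalar<const BITS_COUNT: usize>(vector: &[u64], query: &[u64]) -> usize {
    // Compiled out entirely; invalid BITS_COUNT fails the build, not the run.
    const { assert!(BITS_COUNT == 4 || BITS_COUNT == 8) };
    let mut result = 0usize;
    // Assumed layout: query stores BITS_COUNT transposed bit-planes per
    // storage word; plane i contributes with weight 2^i.
    for (i, v) in vector.iter().enumerate() {
        for bit in 0..BITS_COUNT {
            let plane = query[i * BITS_COUNT + bit];
            result += (1usize << bit) * (v ^ plane).count_ones() as usize;
        }
    }
    result
}

fn main() {
    let vector = [0b1010u64];
    let query = [0u64, 0, 0, 0]; // four all-zero planes
    // Each XOR yields the vector itself (2 bits set), so the weighted
    // sum is 2 * (1 + 2 + 4 + 8) = 30.
    assert_eq!(xor_popcnt_scalar::<4>(&vector, &query), 30);
    println!("{}", xor_popcnt_scalar::<4>(&vector, &query));
}
```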
```rust
let encoded_data = self.get_quantized_vector(point_id).to_vec();
EncodedQueryBQ::Binary(EncodedBinVector {
    encoded_vector: transmute_from_u8_to_slice(&encoded_data).to_vec(),
})
```
Please use zerocopy or bytemuck for the new code instead of transmute_from_u8_to_slice/transmute_to_u8_slice.
Also, the first .to_vec() could be avoided.
```rust
    invert: false,
},
Encoding::OneBit,
QueryEncoding::SameAsStorage,
```
Maybe add benchmarks for new variants?
Also, I wonder, how good LLVM auto-vectorization is for this particular case, compared to hand-written intrinsics in C.
Related story: https://www.reddit.com/r/rust/comments/1ha7uyi/comment/m16ke06/
Autovectorization works great; you're right. But in this case, I added hand-written intrinsics because they give a measurable boost. We may check whether Rust can autovectorize popcnt from SSE4.2.
I will add benchmarks as the next PR. This PR is too big already
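As a reference point for the autovectorization question, the symmetric kernel in portable Rust is just the following (a sketch, not the crate's exact code) — LLVM lowers `count_ones()` to POPCNT when the target feature is enabled and can unroll the iterator loop:

```rust
// Portable xor + popcount kernel. With -C target-feature=+popcnt LLVM
// emits one POPCNT per word; whether it also vectorizes via VPOPCNTDQ
// depends on the target CPU.
fn xor_popcnt(v1: &[u64], v2: &[u64]) -> u32 {
    v1.iter().zip(v2).map(|(a, b)| (a ^ b).count_ones()).sum()
}

fn main() {
    let a = [0xFFu64, 0x0F];
    let b = [0x0Fu64, 0x0F];
    // 0xFF ^ 0x0F = 0xF0 (4 bits), 0x0F ^ 0x0F = 0 (0 bits)
    assert_eq!(xor_popcnt(&a, &b), 4);
    println!("{}", xor_popcnt(&a, &b));
}
```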
```rust
    Self::_encode_scalar_query_vector(&extended_query, bits_count)
}
Encoding::OneAndHalfBits => {
    let mut extended_query = query.to_vec();
```
Performance nit:

```diff
-let mut extended_query = query.to_vec();
+let mut extended_query = Vec::with_capacity(query.len() + query.len().div_ceil(2));
+extended_query.extend(query.iter());
```

And the same for Encoding::TwoBits.
Maybe it's worth extracting extended_dim from get_quantized_vector_size_from_params into a separate function and using it here.
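The pre-allocation pattern in isolation, as a standalone sketch (the `query.len().div_ceil(2)` tail matches the one-and-a-half-bit extension; function name hypothetical):

```rust
// Sketch: reserve the full extended length up front so extend() never
// reallocates while the tail is appended.
fn extend_query_one_and_half_bits(query: &[f32]) -> Vec<f32> {
    let mut extended_query = Vec::with_capacity(query.len() + query.len().div_ceil(2));
    extended_query.extend(query.iter());
    // ... the extra half-length tail would be appended here ...
    extended_query
}

fn main() {
    let q = [0.1f32, -0.2, 0.3];
    let e = extend_query_one_and_half_bits(&q);
    assert_eq!(e.len(), 3);
    // with_capacity guarantees at least the requested capacity: 3 + ceil(3/2)
    assert!(e.capacity() >= 5);
}
```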
```rust
let encoded_query_size = TBitsStoreType::get_storage_size(query.len().max(1)) * bits_count;
let mut encoded_query: Vec<TBitsStoreType> = vec![Default::default(); encoded_query_size];

let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, f32::max);
```
In the code above you were using `if v[0] > v[1] { v[0] } else { v[1] }`, and here you are using `f32::max`.
Any particular reason to use both?
The former is faster; the latter does extra work to handle NaN values.
Changed the if into f32::max; performance is not very critical in query encoding, since we do it once per search
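The two max idioms side by side, as a standalone sketch (function names are illustrative):

```rust
// f32::max returns the non-NaN operand when one argument is NaN; the
// comparison fold does not special-case NaN. For NaN-free input the two
// agree and differ only in the instructions generated.
fn max_abs_std(q: &[f32]) -> f32 {
    q.iter().map(|x| x.abs()).fold(0.0, f32::max)
}

fn max_abs_cmp(q: &[f32]) -> f32 {
    q.iter()
        .map(|x| x.abs())
        .fold(0.0, |a, b| if a > b { a } else { b })
}

fn main() {
    let q = [0.5f32, -2.0, 1.5];
    assert_eq!(max_abs_std(&q), 2.0);
    assert_eq!(max_abs_cmp(&q), 2.0);
    println!("{} {}", max_abs_std(&q), max_abs_cmp(&q));
}
```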
Actionable comments posted: 2
♻️ Duplicate comments (5)
lib/quantization/src/encoded_vectors_binary.rs (5)
614-621: Performance: Optimize vector allocation. The vector allocation can be optimized by pre-calculating capacity and avoiding extend_from_slice.

```diff
- let mut extended_query = query.to_vec();
- extended_query.extend_from_slice(query);
+ let mut extended_query = Vec::with_capacity(query.len() * 2);
+ extended_query.extend(query.iter());
+ extended_query.extend(query.iter());
```
624-624: Consistency: Use consistent max comparison. The code uses `f32::max` here but manual `if v[0] > v[1] { v[0] } else { v[1] }` elsewhere. Consider using a consistent approach. The `f32::max` handles NaN values correctly, while manual comparison is faster. Choose based on whether NaN handling is needed.
751-755: Performance: Avoid unnecessary vector allocation. The `to_vec()` call can be avoided by using the slice directly.

```diff
- pub fn encode_internal_query(&self, point_id: u32) -> EncodedQueryBQ<TBitsStoreType> {
-     let encoded_data = self.get_quantized_vector(point_id).to_vec();
-     EncodedQueryBQ::Binary(EncodedBinVector {
-         encoded_vector: transmute_from_u8_to_slice(&encoded_data).to_vec(),
-     })
- }
+ pub fn encode_internal_query(&self, point_id: u32) -> EncodedQueryBQ<TBitsStoreType> {
+     let encoded_data = self.get_quantized_vector(point_id);
+     EncodedQueryBQ::Binary(EncodedBinVector {
+         encoded_vector: transmute_from_u8_to_slice(encoded_data).to_vec(),
+     })
+ }
```
851-851: Hardware counter usage is correct. Based on previous clarification, the hardware counter correctly measures vector data access rather than computational cost. Using `vector_data_1.len()` is appropriate for tracking storage access patterns.
914-982: Consider using const generics for compile-time optimization. The bit count checks could be moved to compile-time using const generics to eliminate runtime branches.

```rust
fn xor_popcnt_scalar<const BITS_COUNT: usize>(vector: &[Self], query: &[Self]) -> usize {
    const { assert!(BITS_COUNT == 4 || BITS_COUNT == 8) };
    // Implementation with compile-time BITS_COUNT
}
```
🧹 Nitpick comments (1)
lib/quantization/src/encoded_vectors_binary.rs (1)
638-638: Performance: Use more efficient max calculation. Consider using fold with manual comparison for better performance since NaN handling isn't needed here.

```diff
- let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, f32::max);
+ let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, |a, b| if a > b { a } else { b });
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/quantization/src/encoded_vectors_binary.rs (15 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.839Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
lib/quantization/src/encoded_vectors_binary.rs (13)
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.839Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/builder.rs:63-67
Timestamp: 2025-05-26T14:47:23.505Z
Learning: In the posting_list crate's PostingChunk struct, avoid adding extra fields like storing computed chunk_bits to prevent struct bloat and maintain mmap compatibility. The bit calculations from offsets are inexpensive compared to the memory and compatibility benefits of keeping the struct minimal.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.
Learnt from: coszio
PR: qdrant/qdrant#6768
File: lib/collection/src/collection/mmr.rs:90-95
Timestamp: 2025-07-01T17:12:27.967Z
Learning: In the Qdrant codebase, the user coszio mistakenly believed that triple question mark `???` syntax was valid Rust, when it is actually invalid syntax. The codebase uses standard Rust error propagation with `?` and `??` operators only.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5954
File: lib/segment/src/index/field_index/bool_index/simple_bool_index.rs:219-228
Timestamp: 2025-03-12T13:40:56.980Z
Learning: In Rust 1.80.0 and later, `std::mem::size_of` is included in the prelude and doesn't need to be explicitly imported.
🧬 Code Graph Analysis (1)
lib/quantization/src/encoded_vectors_binary.rs (3)
- lib/quantization/src/encoded_vectors_u8.rs (13): std (85-85), std (122-122), std (249-249), std (255-255), std (269-269), query (346-346), query (352-355), encode_query (327-369), score_point (371-428), encoded_vector (104-104), encoded_vector (109-112), quantized_vector_size (507-509), encode_internal_vector (511-534)
- lib/common/memory/src/mmap_ops.rs (3): v (94-94), v (142-142), transmute_from_u8_to_slice (101-119)
- lib/quantization/src/encoded_vectors.rs (4): encode_query (36-36), score_point (38-43), quantized_vector_size (47-47), encode_internal_vector (51-51)
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: test-consistency
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: storage-compat-test
- GitHub Check: lint
🔇 Additional comments (5)
lib/quantization/src/encoded_vectors_binary.rs (5)
38-50: LGTM! Clean enum design. The `QueryEncoding` enum provides a clear API for specifying query encoding strategies with appropriate default and helper methods.

52-56: LGTM! Well-structured query representation. The `EncodedQueryBQ` enum effectively encapsulates different query encoding types while maintaining type safety with the generic parameter.

62-100: Excellent documentation with clear technical explanation. The comprehensive documentation explains the transposed scalar encoding optimization with concrete examples and performance justification. This addresses the previous request for ASCII diagrams and structural descriptions.

686-730: LGTM! Robust metric calculation with proper normalization. The metric calculation correctly handles both binary and scalar query encodings with appropriate normalization for scalar results.

853-867: LGTM! Clean query type dispatch. The match statement cleanly dispatches scoring based on query encoding type with appropriate bit count parameters.
Thanks! I chose option 2, done!
Actionable comments posted: 0
♻️ Duplicate comments (1)
lib/quantization/tests/integration/test_binary_encodings.rs (1)
164-164: Excellent use of EnumIter pattern. This addresses the previous review comment about using `strum::EnumIter` instead of hard-coding values. The implementation correctly uses `QueryEncoding::iter()` to systematically test all encoding variants.
🧹 Nitpick comments (2)
lib/quantization/src/encoded_vectors_binary.rs (2)
134-139: Add parameter documentation for `xor_popcnt_scalar`. While the method has good documentation explaining the algorithm, it would benefit from parameter documentation similar to the existing `xor_popcnt` method.

```diff
 /// Calculate score between BQ encoded vector and `EncodedScalarVector<Self>` query.
 ///
+/// # Parameters
+/// * `vector` - BQ encoded vector data
+/// * `query` - Scalar encoded query in transposed format
+/// * `query_bits_count` - Number of bits per scalar value (4 or 8)
+///
 /// It calculates sum of XOR popcount between each bit of the `vector` and the corresponding scalar value in the `query`.
```
644-644: Consider documenting the choice of f32::max over manual comparison. Based on past review comments, there's inconsistency in using `f32::max` vs manual comparison. The current choice handles NaN values but may be slower.

```diff
- let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, f32::max);
+ // Using f32::max to handle potential NaN values in query data
+ let max_abs_value = query.iter().map(|x| x.abs()).fold(0.0, f32::max);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
- lib/quantization/Cargo.toml (1 hunks)
- lib/quantization/build.rs (1 hunks)
- lib/quantization/src/encoded_vectors_binary.rs (17 hunks)
- lib/quantization/tests/integration/test_binary_encodings.rs (2 hunks)
✅ Files skipped from review due to trivial changes (2)
- lib/quantization/Cargo.toml
- lib/quantization/build.rs
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.839Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
lib/quantization/tests/integration/test_binary_encodings.rs (4)
Learnt from: timvisee
PR: qdrant/qdrant#6546
File: lib/segment/src/index/field_index/full_text_index/text_index.rs:497-518
Timestamp: 2025-05-19T09:25:54.285Z
Learning: In the Qdrant project, using `.expect()` and `.unwrap()` is considered acceptable in test code since test failures should be loud and obvious, and proper error propagation adds unnecessary complexity to test code.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
lib/quantization/src/encoded_vectors_binary.rs (14)
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:122-127
Timestamp: 2025-07-02T16:42:22.247Z
Learning: In lib/quantization/cpp/sse.c asymmetric binary quantization functions, the argument order in _mm_set_epi32 calls is intentionally designed to match query pointer offsets with corresponding bit shift factors. The "numbers" refer to logical indices that correspond to weighting factors (1, 2, 4, 8, etc. representing bit shifts), not SIMD register positions.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/src/encoded_vectors_binary.rs:810-810
Timestamp: 2025-07-02T17:10:13.847Z
Learning: In the Qdrant quantization codebase, hardware counters (hw_counter.cpu_counter()) are used to measure vector data access from storage, not computational cost. With asymmetric binary quantization where query vectors can be longer than storage vectors, the counter should still track vector_data.len() to maintain consistent measurement of storage access patterns, not query processing overhead.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6728
File: lib/quantization/cpp/sse.c:428-431
Timestamp: 2025-07-02T17:08:10.839Z
Learning: In lib/quantization/cpp/ SIMD functions, IvanPleshkov prefers to avoid memcpy in favor of direct pointer casts for type punning, prioritizing potential compiler optimization over strict aliasing rule compliance in performance-critical quantization code.
Learnt from: coszio
PR: qdrant/qdrant#6609
File: lib/gridstore/src/blob.rs:46-59
Timestamp: 2025-06-02T18:10:47.203Z
Learning: In the Qdrant codebase, zerocopy crate is extensively used for safe byte-level operations across GPU operations, HNSW indices, memory-mapped structures, and serialization. When implementing Blob trait for generic Vec<T>, using zerocopy's FromBytes and IntoBytes traits is preferred over size_of::<T>() because it guarantees memory layout equals byte representation, making serialization safe and correct.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/view.rs:118-118
Timestamp: 2025-05-19T14:40:20.068Z
Learning: In the bitpacking crate, the BitPacker::decompress_strictly_sorted function takes an Option<PointOffsetType> as its first parameter, which means using checked_sub(1) without unwrapping is intentional and correct.
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/builder.rs:63-67
Timestamp: 2025-05-26T14:47:23.505Z
Learning: In the posting_list crate's PostingChunk struct, avoid adding extra fields like storing computed chunk_bits to prevent struct bloat and maintain mmap compatibility. The bit calculations from offsets are inexpensive compared to the memory and compatibility benefits of keeping the struct minimal.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5951
File: lib/segment/src/index/field_index/map_index/mmap_map_index.rs:225-242
Timestamp: 2025-03-12T09:33:16.728Z
Learning: In Rust, `std::mem::size_of_val(slice)` correctly calculates the total size of a slice (equivalent to `slice.len() * std::mem::size_of::<T>()`). Using manual calculation with `slice.len() * std::mem::size_of::<T>()` should be avoided as it triggers the Clippy lint "manual_slice_size_calculation".
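That equivalence is easy to verify with a small standalone example:

```rust
fn main() {
    let v: Vec<u32> = vec![1, 2, 3, 4];
    let slice: &[u32] = &v;
    // size_of_val on a slice equals len * size_of::<T>(),
    // without triggering the manual_slice_size_calculation lint.
    assert_eq!(
        std::mem::size_of_val(slice),
        slice.len() * std::mem::size_of::<u32>()
    );
    println!("{}", std::mem::size_of_val(slice)); // prints 16
}
```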
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.
Learnt from: coszio
PR: qdrant/qdrant#6768
File: lib/collection/src/collection/mmr.rs:90-95
Timestamp: 2025-07-01T17:12:27.967Z
Learning: In the Qdrant codebase, the user coszio mistakenly believed that triple question mark `???` syntax was valid Rust, when it is actually invalid syntax. The codebase uses standard Rust error propagation with `?` and `??` operators only.
Learnt from: JojiiOfficial
PR: qdrant/qdrant#5954
File: lib/segment/src/index/field_index/bool_index/simple_bool_index.rs:219-228
Timestamp: 2025-03-12T13:40:56.980Z
Learning: In Rust 1.80.0 and later, `std::mem::size_of` is included in the prelude and doesn't need to be explicitly imported.
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: test-consistency
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: lint
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-low-resources
- GitHub Check: storage-compat-test
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
🔇 Additional comments (10)
lib/quantization/tests/integration/test_binary_encodings.rs (2)
7-11: LGTM! Good use of QueryEncoding imports and strum for enum iteration.
The addition of `QueryEncoding` to the imports and using `strum::IntoEnumIterator` is clean and follows good practices for working with enums systematically.
80-85: All `EncodedVectorsBin::encode` calls include the `QueryEncoding` parameter
Verified that every invocation across benches, segment code, and integration tests now passes a `QueryEncoding` variant: `SameAsStorage` for backward-compatible paths and specific variants (e.g. `Scalar4bits`) where intended. No further changes needed.
lib/quantization/src/encoded_vectors_binary.rs (8)
25-31: LGTM! Clean enum definition with derive macros.
The `Encoding` enum properly uses the `EnumIter` derive and has good default and serialization support.
39-51: Well-designed QueryEncoding enum.
The enum is properly structured with appropriate derives and a sensible default. The helper method `is_same_as_storage()` provides a clean API for checking the default behavior.
63-101: Excellent documentation for EncodedScalarVector.
The comprehensive documentation clearly explains the transposition optimization strategy, provides concrete examples, and references the academic source. This level of detail is crucial for understanding the complex bit manipulation involved.
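The transposition strategy referenced there can be sketched as follows. This is an illustrative reconstruction, not Qdrant's actual `EncodedScalarVector` layout: the `BITS` constant and the 64-dimension limit (one `u64` per plane) are assumptions for the sake of a small example.

```rust
const BITS: usize = 4; // assumed query precision: 4-bit scalar codes

/// Transpose per-dimension 4-bit codes into 4 bit-plane words, so that
/// plane `i` holds bit `i` of every dimension's code. Each plane can then
/// be compared against the 1-bit storage vector with XOR + popcount.
fn transpose(codes: &[u8]) -> [u64; BITS] {
    assert!(codes.len() <= 64, "sketch limited to 64 dimensions");
    let mut planes = [0u64; BITS];
    for (dim, &code) in codes.iter().enumerate() {
        for bit in 0..BITS {
            if (code >> bit) & 1 == 1 {
                planes[bit] |= 1u64 << dim;
            }
        }
    }
    planes
}

fn main() {
    // 4 dimensions with codes 0b1010, 0b0001, 0b1111, 0b0000.
    let planes = transpose(&[0b1010, 0b0001, 0b1111, 0b0000]);
    println!("{planes:?}"); // prints [6, 5, 4, 5]
}
```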
588-605: LGTM! Clean query encoding dispatch logic.
The method properly handles the different query encoding types and delegates to appropriate encoding methods. The structure is clear and maintainable.
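A minimal sketch of such a dispatch, assuming variant names like those mentioned elsewhere in this PR (`SameAsStorage`, `Scalar4bits`, `Scalar8bits`); the real enum and method signatures live in `encoded_vectors_binary.rs` and may differ:

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq)]
enum QueryEncoding {
    #[default]
    SameAsStorage, // query uses the same 1-bit encoding as storage
    Scalar4bits,   // query quantized to 4 bits per dimension
    Scalar8bits,   // query quantized to 8 bits per dimension
}

/// How many bits per dimension the encoded query carries.
fn query_bits(encoding: QueryEncoding) -> u32 {
    match encoding {
        QueryEncoding::SameAsStorage => 1,
        QueryEncoding::Scalar4bits => 4,
        QueryEncoding::Scalar8bits => 8,
    }
}

fn main() {
    assert_eq!(query_bits(QueryEncoding::default()), 1);
    println!("{}", query_bits(QueryEncoding::Scalar8bits)); // prints 8
}
```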
715-720: Verify scalar quantization normalization factor.
The normalization by `((1 << query_bits_count) - 1)` ensures consistent scaling between binary and scalar encodings. This looks correct for converting the weighted sum to a comparable scale.
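As a hedged illustration of that factor (the exact formula is in the source; the function below is a simplification): with a k-bit query, each dimension contributes in units of up to 2^k − 1, so dividing the weighted sum by `(1 << query_bits_count) - 1` maps it back onto the per-dimension scale of the 1-bit case.

```rust
/// Rescale a weighted bit-plane sum to the 1-bit-per-dimension scale.
fn normalize(weighted_sum: u32, query_bits: u32) -> f32 {
    weighted_sum as f32 / ((1u32 << query_bits) - 1) as f32
}

fn main() {
    // A 4-bit query: codes range over 0..=15, so the divisor is 15.
    println!("{}", normalize(30, 4)); // prints 2
}
```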
757-765: Suggest using zerocopy for byte transmutation.
Based on past review comments, the codebase prefers `zerocopy` or `bytemuck` for safe byte-level operations. The current usage of `bytemuck::cast_slice` is appropriate and safe.
862-876: LGTM! Clean query type dispatch in scoring.
The match statement properly handles different query encoding types and passes the correct bit count to the calculation method. The implementation maintains consistency across encoding variants.
200-252: SIMD parameter order and fallback logic verified
The SIMD calls use the correct argument order and feature checks, and the fallback scalar loop's `(b1 ^ b2).count_ones() << i` bit-shifting matches the SIMD implementations. No changes needed.
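That fallback loop can be illustrated in isolation. This is a simplified sketch over single `u64` planes, not the actual SIMD code or storage layout:

```rust
/// Weighted XOR-popcount over precomputed query bit planes: plane `i`
/// counts dimensions where the query's bit `i` disagrees with the 1-bit
/// storage vector, weighted by 2^i via `count_ones() << i`.
fn weighted_xor_score(storage_bits: u64, query_planes: &[u64]) -> u32 {
    query_planes
        .iter()
        .enumerate()
        .map(|(i, &plane)| (plane ^ storage_bits).count_ones() << i)
        .sum()
}

fn main() {
    // 4 bit planes over 4 dimensions; storage sign bits in a single word.
    let planes = [0b0110u64, 0b0101, 0b0100, 0b0101];
    println!("{}", weighted_xor_score(0b0101, &planes)); // prints 6
}
```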
# This is a combination of 7 commits:
* SameAsStorage default value
* fix coderabbit warnings
* neon for u8 bq
* sse for u8 bq
* fix windows build
* rename function
* add comments
* fmt
* fix arm build
* review remarks
This PR introduces an internal implementation for an upcoming feature: asymmetric binary quantization.
API PR is here: #6201
Motivation
Asymmetric quantization is a scheme in which the stored vector and the query are quantized with different algorithms. This PR introduces only one asymmetric option: binary quantization (BQ) for storage with the query encoded using scalar quantization (SQ).
Such methods increase accuracy because a more precisely encoded query produces a more accurate score. Scoring becomes more complicated, however, and not every use case is suitable for asymmetric mechanisms.
An example of BQ storage combined with an SQ-encoded query is RaBitQ quantization; we want something similar.
We don't use the RaBitQ algorithm itself: the encoding is different and only the scoring is the same; we follow our own approach instead. As a pro, we don't need to store additional precomputed constants for each vector, and our approach is compatible with the `score_internal` method. As a con, we require negative numbers in vectors for it to work like regular BQ.

We don't use asymmetric BQ in HNSW construction because experiments have shown that HNSW accuracy is poor there. No GPU integration is required.
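To illustrate the motivation only (this is not the PR's actual encoding; the quantization scheme and constants below are hypothetical): keeping the query at higher precision while storage stays 1-bit preserves per-dimension magnitudes that a fully binarized comparison discards, and the sign-based storage encoding is why vectors need negative values.

```rust
/// Plain dot product for reference.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// 1-bit (sign) encoding, as in regular BQ.
fn binarize(v: &[f32]) -> Vec<f32> {
    v.iter().map(|&x| if x >= 0.0 { 1.0 } else { -1.0 }).collect()
}

/// Hypothetical 4-bit scalar encoding: snap each value to a coarse grid.
fn quantize_4bit(v: &[f32]) -> Vec<f32> {
    v.iter()
        .map(|&x| (x.clamp(-1.0, 1.0) * 7.5).round() / 7.5)
        .collect()
}

fn main() {
    let storage = [0.9f32, -0.2, 0.4, -0.8];
    let query = [0.1f32, -0.9, 0.3, 0.2];
    // Symmetric: both sides lose magnitudes; asymmetric: the query keeps them.
    let symmetric = dot(&binarize(&storage), &binarize(&query));
    let asymmetric = dot(&binarize(&storage), &quantize_4bit(&query));
    println!(
        "exact {} symmetric {} asymmetric {}",
        dot(&storage, &query),
        symmetric,
        asymmetric
    );
}
```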
Results on the LAION-400M test dataset: