Conversation
44e87a8 to 9ef1ca7
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/quantization/src/encoded_vectors_binary.rs (2)
287-303: LGTM: AVX-512 path gating and early return are correct. The feature checks match the intrinsics used in the AVX-512 path, and placing this check before the SSE fallback is appropriate.
Optional: since AVX2 implies AVX on all practical CPUs and SSE2 is baseline on x86_64, you can drop those two runtime predicates to shorten the branch. If you adopt the AVX2-only lane-reduction below, you can also drop SSE4.1 from here.
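As a sketch of the gating pattern this comment describes (function and fallback names here are hypothetical, not taken from the PR): on x86_64, SSE2 is part of the baseline ABI and AVX2 implies AVX, so only the top-level features need a runtime check.

```rust
// Sketch of runtime feature gating with a scalar fallback.
// `score` and `score_scalar` are hypothetical names for illustration.
fn score(a: &[u64], b: &[u64]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512vpopcntdq")
            && is_x86_feature_detected!("avx512vl")
            && is_x86_feature_detected!("avx2")
        {
            // Dispatch to the AVX-512 path here (omitted in this sketch).
        }
    }
    score_scalar(a, b)
}

fn score_scalar(a: &[u64], b: &[u64]) -> u32 {
    // Hamming distance: popcount of XORed bit-words.
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn main() {
    // 0b1010 ^ 0b0110 = 0b1100, which has two set bits.
    assert_eq!(score(&[0b1010], &[0b0110]), 2);
    println!("ok");
}
```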
1043-1093: Simplify lane reduction to AVX2-only; drop the SSE4.1 dependency. Use AVX2 lane extracts to sum the four 64-bit counters, avoiding the SSE4.1-only `_mm_extract_epi64` after the 128-bit extract. This lets you remove the `sse4.1` target feature and runtime check without changing behavior. Intrinsics availability: `_mm_popcnt_epi64` requires avx512vpopcntdq+avx512vl; `_mm256_extract_epi64` requires avx2. (doc.rust-lang.org)
Apply these diffs:
- Replace the reduction (sum lanes) with AVX2 extracts
```diff
-    // Extract high and low 128-bit lanes
-    let low = _mm256_castsi256_si128(sum); // Elements 0,1
-    let high = _mm256_extracti128_si256(sum, 1); // Elements 2,3
-    let sum128 = _mm_add_epi64(low, high);
-
-    // Extract and add the two 64-bit elements
-    let low64 = _mm_extract_epi64(sum128, 0);
-    let high64 = _mm_extract_epi64(sum128, 1);
-
-    (low64 + high64) as u32
+    // AVX2-only horizontal sum via lane extracts
+    let s0 = _mm256_extract_epi64(sum, 0);
+    let s1 = _mm256_extract_epi64(sum, 1);
+    let s2 = _mm256_extract_epi64(sum, 2);
+    let s3 = _mm256_extract_epi64(sum, 3);
+    (s0 + s1 + s2 + s3) as u32
```
- Drop the SSE4.1 target feature on the function (no longer needed)
```diff
 #[target_feature(enable = "avx512vl")]
 #[target_feature(enable = "avx512vpopcntdq")]
 #[target_feature(enable = "avx2")]
 #[target_feature(enable = "avx")]
-#[target_feature(enable = "sse4.1")]
 #[target_feature(enable = "sse2")]
```
- Optionally streamline the runtime gating accordingly (the AVX, SSE4.1, and SSE2 checks become unnecessary)
```diff
-    if is_x86_feature_detected!("avx512vl")
-        && is_x86_feature_detected!("avx512vpopcntdq")
-        && is_x86_feature_detected!("avx2")
-        && is_x86_feature_detected!("avx")
-        && is_x86_feature_detected!("sse4.1")
-        && is_x86_feature_detected!("sse2")
+    if is_x86_feature_detected!("avx512vl")
+        && is_x86_feature_detected!("avx512vpopcntdq")
+        && is_x86_feature_detected!("avx2")
     {
```

Net result: same correctness, fewer required CPU features, and a slightly shorter hot-path branch.
Also applies to: 1083-1092, 1049-1051
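The suggested AVX2-only reduction can be sketched as a standalone, runnable function (a sketch of the reviewer's suggestion, not the file's actual code; note that in Rust's `std::arch` the lane index of `_mm256_extract_epi64` is a const generic):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Sum the four i64 lanes of a __m256i using only AVX2 extracts, so no
// SSE4.1 `_mm_extract_epi64` is needed after a 128-bit extract.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn hsum_epi64_avx2(sum: __m256i) -> u32 {
    let s0 = _mm256_extract_epi64::<0>(sum);
    let s1 = _mm256_extract_epi64::<1>(sum);
    let s2 = _mm256_extract_epi64::<2>(sum);
    let s3 = _mm256_extract_epi64::<3>(sum);
    (s0 + s1 + s2 + s3) as u32
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // `_mm256_set_epi64x` takes lanes in high-to-low order.
            let v = unsafe { _mm256_set_epi64x(4, 3, 2, 1) };
            assert_eq!(unsafe { hsum_epi64_avx2(v) }, 10);
        }
    }
    println!("ok");
}
```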
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/quantization/src/encoded_vectors_binary.rs (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/quantization/src/encoded_vectors_binary.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/quantization/src/encoded_vectors_binary.rs
🧠 Learnings (2)
📚 Learning: 2025-08-11T07:57:01.399Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6986
File: lib/segment/src/vector_storage/quantized/quantized_vectors.rs:81-84
Timestamp: 2025-08-11T07:57:01.399Z
Learning: In `lib/segment/src/vector_storage/quantized/quantized_vectors.rs`, the BitsStoreType parameter difference between single-vector and multi-vector Binary quantization is intentional: single-vector storage uses `EncodedVectorsBin<u128, ...>` to enable 128-bit SIMD/popcount optimizations for longer vectors, while multi-vector storage uses `EncodedVectorsBin<u8, ...>` because multivectors are typically shorter and benefit from byte-granular storage.
Applied to files:
lib/quantization/src/encoded_vectors_binary.rs
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.
Applied to files:
lib/quantization/src/encoded_vectors_binary.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: e2e-tests
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: test-consistency
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: lint
* avx512 for binary quantization fix build
* change checks order

This PR adds AVX512 support for BQ. It speeds up BQ scoring because of `_mm256_popcnt_epi64`, which calculates POPCNT for 4 values at once.

Criterion benches: dbpedia 1M, qdrant-bq-rps-m-64-ef-256, without rescoring.

Test docker image: https://github.com/qdrant/qdrant/actions/runs/16938730067
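For readers unfamiliar with the intrinsic, the per-lane semantics of `_mm256_popcnt_epi64` can be spelled out in scalar Rust (an illustration of what the instruction computes, not the PR's code):

```rust
// Scalar model of `_mm256_popcnt_epi64`: one POPCNT per 64-bit lane,
// four lanes per 256-bit vector.
fn popcnt_epi64_scalar(lanes: [u64; 4]) -> [u64; 4] {
    lanes.map(|x| u64::from(x.count_ones()))
}

fn main() {
    let out = popcnt_epi64_scalar([0, 1, 0b1011, u64::MAX]);
    assert_eq!(out, [0, 1, 3, 64]);
    println!("{out:?}"); // prints [0, 1, 3, 64]
}
```

This is the step the AVX-512 path vectorizes: BQ scoring reduces to popcounts over bit-words, so doing four 64-bit popcounts per instruction shortens the hot loop.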