Conversation
Caution: Review failed – the pull request is closed.

📝 Walkthrough

This change introduces support for multiple binary quantization encoding types throughout the codebase, documentation, and API schemas. A new enum with variants for one-bit, two-bit, and one-and-a-half-bit encodings is added.
Actionable comments posted: 2
🔭 Outside diff range comments (2)
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1)
100-104: ⚠️ Potential issue – casting the sampled `f32` to `u8` collapses the distribution to `{0, 1}`

`x as u8` saturates negatives to `0` and maps the `(0.0, 1.0]` range to `0‒1`, so the generated values are almost exclusively `0` or `1`. That defeats the purpose of providing a balanced `[-1.0, 1.0]` input for BQ tests and can mask real-world errors.

```diff
- Box::new(
-     rng.sample_iter(rand::distr::Uniform::new_inclusive(-1.0, 1.0).unwrap())
-         .map(|x| f32::from(x as u8)),
- )
+ Box::new(
+     rng.sample_iter(rand::distr::Uniform::new_inclusive(-1.0, 1.0).unwrap()),
+ )
```

lib/quantization/benches/binary.rs (1)
22-43: ⚠️ Potential issue – `VectorParameters.count` no longer matches the vector set; potential UB in benchmarks

You create 100 K vectors, then push another 100 K, but keep `count` at 100 K. `encode()` relies on that field for bounds checks and may panic or silently process the wrong slice.

```diff
- let vectors_count = 100_000;
+ let vectors_count = 200_000; // or remove the second push
@@
- &VectorParameters {
-     dim: vector_dim,
-     count: vectors_count,
+ &VectorParameters {
+     dim: vector_dim,
+     count: vectors.len() as u32,
```

Update the same block for the `u8` run below.
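The saturating-cast behavior flagged in the scorer-test comment above is easy to verify. A standalone sketch (illustration only, not project code):

```rust
// Demonstrates the saturating float-to-int cast: `x as u8` clamps negatives
// to 0 and truncates (0.0, 1.0) to 0, so a uniform [-1.0, 1.0] sample
// collapses to almost all zeros, with 1.0 the only input producing 1.
pub fn collapse(x: f32) -> f32 {
    f32::from(x as u8) // the questionable mapping from the test
}
```

Any negative or sub-1.0 positive input comes back as `0.0`, which is exactly why the sampled distribution degenerates.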
♻️ Duplicate comments (1)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
939-950: Same mapping block repeated – DRY violation

This second `match` block is identical to the one above. Once the conversion helper suggested earlier is in place, replace this entire block with the single-line conversion to avoid errors of omission when variants are extended.
🧹 Nitpick comments (17)
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1)
95-99: Consider exercising the new encodings in this test suite
`encoding: None` preserves the previous 1-bit default, but the PR adds `TwoBits` and `OneAndHalfBits`. Extending the test matrix to cover those variants would protect the new code paths and prevent silent regressions.

lib/segment/tests/integration/multivector_quantization_test.rs (1)
268-270: Parameterise the test over the new encoding enum

Hard-coding `encoding: None` means the multivector pipeline is never validated for `TwoBits` or `OneAndHalfBits`. Adding a `#[values(None, Some(OneBit), Some(TwoBits), …)]`-style matrix (or looping inside the test) will give end-to-end coverage at negligible runtime cost.

lib/api/src/grpc/proto/collections.proto (1)
298-302: Introduce `BinaryQuantizationEncoding` enum

The new enum defines all supported binary quantization schemes. Consider adding inline comments for each variant to improve readability of the proto.

docs/grpc/docs.md (2)
398-398: Approve encoding field addition
The `encoding` field for `BinaryQuantization` is documented with the correct type and anchor. Optionally, you may specify the default encoding value to aid users.
1835-1845: Approve BinaryQuantizationEncoding enum documentation
The new enum section aligns with the existing format. All anchors and markdown syntax are consistent. Consider adding brief descriptions for each enum member to enhance clarity.

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (2)
147-176: Consider adding a “legacy / default-encoding” case

All cases in this block pass an explicit `BinaryQuantizationEncoding` value. A quick sanity-check that `encoding: None` still behaves the same as the original one-bit path is missing. Adding a single case (e.g. `case::cosine_f32_default_encoding`) that passes `None` would guard against silent regressions in fallback handling.

This could look like:

```diff
 #[case::cosine_f32_default_encoding(
     Distance::Cosine,
     TestStorageType::Dense(TestElementType::Float32),
     273,
     2057,
-    BinaryQuantizationEncoding::OneBit
+    /* encoding = */ { /* special marker – see below */ }
 )]
```

You would then switch the parameter type of the test to `Option<BinaryQuantizationEncoding>` and feed it into the config unchanged:

```rust
encoding: Option<BinaryQuantizationEncoding>
…
encoding, // instead of Some(encoding)
```

This keeps coverage high while exercising the “no encoding specified” path.
223-228: Unnecessary `clone()` of `quantization_config`

`quantization_config` is moved only once into `test_gpu_vector_storage_impl`; cloning here allocates needlessly in every test run.

```diff
  test_gpu_vector_storage_impl(
      …
-     Some(quantization_config.clone()),
+     Some(quantization_config),
```

(A similar clone appears in the SQ/PQ helpers and can be removed in a follow-up sweep.)
lib/segment/tests/integration/byte_storage_quantization_test.rs (1)
293-297: Leverage the struct’s `Default` to keep the diff minimal

Now that `encoding` is an `Option`, the config can be expressed more concisely and keep future additions painless:

```diff
- QuantizationVariant::Binary => BinaryQuantizationConfig {
-     always_ram: None,
-     encoding: None,
- }
+ QuantizationVariant::Binary => BinaryQuantizationConfig {
+     encoding: None,
+     ..Default::default()
+ }
```

This avoids repeating fields once defaults evolve.
docs/redoc/master/openapi.json (1)
7032-7039: Enhance `BinaryQuantizationEncoding` schema with documentation.

Adding a `description` (and optionally a `default`) to the enum schema will improve generated API docs and clarify intended defaults:

```diff
 "BinaryQuantizationEncoding": {
     "type": "string",
     "enum": [
         "one_bit",
         "two_bits",
         "one_and_half_bits"
     ],
+    "description": "Binary quantization encoding options. Defaults to `one_bit` when unset.",
+    "default": "one_bit"
 }
```
663-677: Consider deriving `Copy` (and optionally `EnumIter`) for the new enum

`BinaryQuantizationEncoding` is a small, C-like enum with no payload. Deriving `Copy` lets the compiler move values by bit-copy instead of requiring an explicit clone and reduces boilerplate at call-sites. Everywhere else in the file we derive `Copy` for similar enums (`Distance`, `Order`, …), so this would keep the API consistent.

Optional quality-of-life: if the variants are ever iterated over (e.g. in tests or CLI validation), adding `EnumIter` (already in `Cargo.toml`) avoids handwritten arrays.

```diff
-#[derive(Debug, Deserialize, Serialize, JsonSchema, Clone, PartialEq, Eq, Hash, Default)]
+#[derive(Debug, Deserialize, Serialize, JsonSchema, Clone, Copy, PartialEq, Eq, Hash, Default, EnumIter)]
```

No behaviour changes – only compile-time conveniences.
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
879-890: Duplicate `BinaryQuantizationEncoding → Encoding` mapping – extract a helper or `From` impl

The same `match` block translating `BinaryQuantizationEncoding` to `quantization::encoded_vectors_binary::Encoding` appears several times in this file (see also 939-950). Duplicating this logic risks future divergence when new enum variants are added.

```diff
-let encoding = match binary_config.encoding {
-    Some(BinaryQuantizationEncoding::OneBit) => Encoding::OneBit,
-    Some(BinaryQuantizationEncoding::TwoBits) => Encoding::TwoBits,
-    Some(BinaryQuantizationEncoding::OneAndHalfBits) => Encoding::OneAndHalfBits,
-    None => Encoding::OneBit,
-};
+let encoding = binary_config
+    .encoding
+    .unwrap_or(BinaryQuantizationEncoding::OneBit)
+    .into(); // requires `impl From<BinaryQuantizationEncoding> for Encoding`
```

Implementing `From` (or `TryFrom` for potential fall-backs) in `encoded_vectors_binary.rs` removes the boilerplate here and elsewhere, improving maintainability.
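A self-contained sketch of the suggested conversion helper. Both enums are redeclared here so the snippet compiles on its own; in the codebase `BinaryQuantizationEncoding` lives in `segment` and `Encoding` in `quantization::encoded_vectors_binary`:

```rust
// Standalone sketch of the suggested `From` impl (enums redeclared locally;
// the real types live in separate crates).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BinaryQuantizationEncoding {
    OneBit,
    TwoBits,
    OneAndHalfBits,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Encoding {
    OneBit,
    TwoBits,
    OneAndHalfBits,
}

impl From<BinaryQuantizationEncoding> for Encoding {
    fn from(encoding: BinaryQuantizationEncoding) -> Self {
        match encoding {
            BinaryQuantizationEncoding::OneBit => Encoding::OneBit,
            BinaryQuantizationEncoding::TwoBits => Encoding::TwoBits,
            BinaryQuantizationEncoding::OneAndHalfBits => Encoding::OneAndHalfBits,
        }
    }
}

// The duplicated match blocks then shrink to a single expression:
pub fn resolve(encoding: Option<BinaryQuantizationEncoding>) -> Encoding {
    encoding.unwrap_or(BinaryQuantizationEncoding::OneBit).into()
}
```

With this in place, adding a new variant triggers a compile error in exactly one `match` instead of silently diverging copies.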
110-118: Test assertion may produce false negatives

The test assumes monotonically non-decreasing accuracy from `OneBit` → `1.5Bits` → `2Bits`. For some random seeds that assumption can be violated by statistical noise even when encodings are correct, causing flaky tests.

Consider comparing each encoding against a fixed threshold or running multiple seeds and averaging instead of asserting strict ordering.
lib/api/src/grpc/conversions.rs (1)
1056-1059: Consider providing more context in the error message.

While the error handling is correct, consider including the invalid encoding value in the error message for better debugging.

```diff
- .map_err(|_| Status::invalid_argument("Unknown binary quantization encoding"))?;
+ .map_err(|_| Status::invalid_argument(format!("Unknown binary quantization encoding: {:?}", encoding)))?;
```

lib/quantization/src/encoded_vectors_binary.rs (4)
269-271: Document the standard deviation fallback behavior.

The fallback to 1.0 for near-zero standard deviations is reasonable but should be documented for clarity. Add a comment explaining this behavior:

```diff
 sds.iter_mut()
-    .for_each(|sd| *sd = if *sd < f32::EPSILON { 1.0 } else { sd.sqrt() });
+    .for_each(|sd| {
+        // Use 1.0 as fallback for near-zero variance to avoid division by zero in normalization
+        *sd = if *sd < f32::EPSILON { 1.0 } else { sd.sqrt() }
+    });
```
330-333: Document the hardcoded normalization parameters.

The z-score normalization uses hardcoded values that should be documented for clarity. Consider extracting these as named constants or adding explanatory comments:

```diff
 let mean = means[i];
 let sd = standard_deviations[i];
-let ranges = 3;
-let v_z = (v - mean) / sd;
-let index = (v_z + 2.0) / (4.0 / ranges as f32);
+// Map z-scores from [-2, 2] range to [0, 3] for 3 quantization levels
+const RANGES: f32 = 3.0;
+const Z_OFFSET: f32 = 2.0; // Maps z=-2 to index=0
+const Z_RANGE: f32 = 4.0;  // Total z-score range covered
+let v_z = (v - mean) / sd;
+let index = (v_z + Z_OFFSET) / (Z_RANGE / RANGES);
```
345-371: Add documentation for the complex bit packing strategy.

The one-and-a-half bits encoding uses a non-trivial bit packing strategy that needs clear documentation. Add a comment explaining the bit layout:

```diff
 fn encode_one_and_half_bits_vector(
     vector: &[f32],
     encoded_vector: &mut [TBitsStoreType],
     standard_deviations: &[f32],
     means: &[f32],
 ) {
+    // Bit packing strategy for 1.5 bits per dimension:
+    // - First n bits: one bit per dimension (indicates if value > first threshold)
+    // - Next n/2 bits: one bit per two dimensions (indicates if value > second threshold)
+    // This gives us 3 possible states per dimension using 1.5 bits on average
     let bits_count = u8::BITS as usize * std::mem::size_of::<TBitsStoreType>();
```
138-138: Use `u8::BITS` consistently for bit calculations.

For consistency with line 181, consider using `u8::BITS` here as well. (This line already uses it; however, the logic above could be simplified:)

```diff
 fn get_storage_size(size: usize) -> usize {
-    let bytes_count = if size > 128 {
-        std::mem::size_of::<u128>()
-    } else if size > 64 {
-        std::mem::size_of::<u64>()
-    } else if size > 32 {
-        std::mem::size_of::<u32>()
-    } else {
-        std::mem::size_of::<u8>()
-    };
-
-    let bits_count = u8::BITS as usize * bytes_count;
+    let (bytes_count, bits_count) = if size > 128 {
+        (std::mem::size_of::<u128>(), u128::BITS as usize)
+    } else if size > 64 {
+        (std::mem::size_of::<u64>(), u64::BITS as usize)
+    } else if size > 32 {
+        (std::mem::size_of::<u32>(), u32::BITS as usize)
+    } else {
+        (std::mem::size_of::<u8>(), u8::BITS as usize)
+    };
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- docs/grpc/docs.md (3 hunks)
- docs/redoc/master/openapi.json (1 hunks)
- lib/api/src/grpc/conversions.rs (2 hunks)
- lib/api/src/grpc/proto/collections.proto (1 hunks)
- lib/api/src/grpc/qdrant.rs (2 hunks)
- lib/quantization/benches/binary.rs (3 hunks)
- lib/quantization/src/encoded_vectors_binary.rs (7 hunks)
- lib/quantization/tests/integration/empty_storage.rs (1 hunks)
- lib/quantization/tests/integration/main.rs (1 hunks)
- lib/quantization/tests/integration/test_binary.rs (13 hunks)
- lib/quantization/tests/integration/test_binary_encodings.rs (1 hunks)
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (3 hunks)
- lib/segment/src/types.rs (1 hunks)
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs (5 hunks)
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1 hunks)
- lib/segment/tests/integration/byte_storage_quantization_test.rs (1 hunks)
- lib/segment/tests/integration/multivector_quantization_test.rs (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
95-95: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: rust-gpu-tests (ubuntu-latest)
🔇 Additional comments (16)
lib/quantization/benches/binary.rs (1)
7-8: Import looks good
The explicit `Encoding` import clarifies intent and keeps the benchmark compiling with the new API.

lib/quantization/tests/integration/empty_storage.rs (1)
104-105: LGTM – explicit encoding keeps the test future-proof

lib/quantization/tests/integration/main.rs (1)
12-14: Good addition of the new test module

Including `test_binary_encodings` in the integration harness ensures the new variants are executed in CI.

lib/quantization/tests/integration/test_binary.rs (2)
7-7: Added new `Encoding` import

The import of `Encoding` alongside `BitsStoreType` and `EncodedVectorsBin` is correct and necessary to specify encoding schemes in the tests.
52-53: Specified `Encoding::OneBit` for all `.encode` calls

Each call to `EncodedVectorsBin::encode` now includes the new `encoding` parameter set to `OneBit`, preserving the original test behavior for one-bit quantization.

Also applies to: 99-100, 146-147, 190-191, 233-234, 295-296, 357-358, 417-418, 476-477, 538-539, 599-600, 658-659
lib/api/src/grpc/proto/collections.proto (1)
306-307: Add optional `encoding` to `BinaryQuantization`

The `encoding` field is correctly declared as optional. Ensure that server- and client-side code default to a sensible variant (e.g., `OneBit`) when this field is omitted.

docs/grpc/docs.md (1)
95-95: Approve Table of Contents update
The new `BinaryQuantizationEncoding` entry is correctly placed under `collections.proto` and the anchor matches the section below.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
95-95: Unordered list indentation
Expected: 2; Actual: 4 (MD007, ul-indent)
lib/segment/src/types.rs (1)
683-686: 👍 Back-compat preserved via `#[serde(default)]`

The new optional `encoding` field is gated by `#[serde(default)]` and `skip_serializing_if`, so existing persisted configs deserialize cleanly and responses remain unchanged when the field is not set. Good attention to compatibility.

lib/api/src/grpc/qdrant.rs (1)
433-435: Details

❌ Incorrect review comment

Consider using the enum type directly instead of `i32` for stronger type-safety

`prost` supports using the generated enum as the field type (`Option<BinaryQuantizationEncoding>`) instead of the raw `i32`. Storing it as an `i32` pushes the burden of validity checking onto every call-site and increases the chance of accidentally accepting an out-of-range value before it reaches the conversion layer.

```diff
-#[prost(enumeration = "BinaryQuantizationEncoding", optional, tag = "2")]
-pub encoding: ::core::option::Option<i32>,
+#[prost(enumeration = "BinaryQuantizationEncoding", optional, tag = "2")]
+pub encoding: ::core::option::Option<BinaryQuantizationEncoding>,
```

This keeps the wire format identical (still var-int field #2) while giving compile-time guarantees inside Rust code.
🏁 Script executed:
```shell
#!/bin/bash
# Display the first 30 lines of the file to check for generated code markers
head -n 30 lib/api/src/grpc/qdrant.rs
```

Length of output: 1467
Ignore manual edits in generated code – adjust the .proto instead
This file (`lib/api/src/grpc/qdrant.rs`) is fully generated by `prost-build` and will be overwritten on each build. You should not change the field’s Rust type here. If you want to expose `Option<BinaryQuantizationEncoding>` instead of `Option<i32>`, update the corresponding `.proto` definition or prost-build configuration so that the generated code uses the enum type directly.

Likely an incorrect or invalid review comment.
lib/quantization/tests/integration/test_binary_encodings.rs (1)
12-14: ⚠️ Potential issue – compilation fails: use `gen_range` instead of the non-existent `random_range`

`rand::Rng` provides `gen_range`, not `random_range`. The current code will not compile.

```diff
 fn generate_number(rng: &mut rand::rngs::StdRng) -> f32 {
-    rng.random_range(-1.0..1.0)
+    rng.gen_range(-1.0..1.0)
 }
```

⛔ Skipped due to learnings
Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and higher changed method names from `gen_*` to `random_*` (e.g., `gen_range()` became `random_range()`). Code using rand 0.9.x should use the `random_*` method names, while code using rand 0.8.x and earlier should use the `gen_*` method names.

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This was part of an API redesign in rand 0.9.0 released in February 2024.

Learnt from: coszio
PR: qdrant/qdrant#6528
File: lib/posting_list/src/tests.rs:44-47
Timestamp: 2025-05-15T22:54:30.292Z
Learning: The rand crate version 0.9.0 and newer uses method names with `random_*` prefix (like `random_range`), while versions 0.8.x and older use `gen_*` prefix (like `gen_range`). This naming change was introduced in rand 0.9.0-alpha.1.

Learnt from: coszio
PR: qdrant/qdrant#6446
File: lib/gridstore/benches/flush_bench.rs:18-18
Timestamp: 2025-04-29T16:48:34.967Z
Learning: The Rust rand crate version 0.9+ has introduced a top-level `rng()` function that replaces the now-deprecated `thread_rng()` function.

lib/api/src/grpc/conversions.rs (3)
1001-1015: LGTM! Clean bidirectional conversion implementation.

The conversion from internal to gRPC encoding types is well-structured with explicit matching of all enum variants.

1017-1031: LGTM! Symmetric conversion implementation.

The reverse conversion maintains consistency with the forward conversion.

1033-1046: LGTM! Properly handles optional encoding field.

The implementation correctly extracts and converts the encoding field when present, maintaining backward compatibility with the optional field.

lib/quantization/src/encoded_vectors_binary.rs (3)
lib/quantization/src/encoded_vectors_binary.rs (3)
22-35: LGTM! Well-structured enum with proper serialization support.

The `Encoding` enum is properly designed with a sensible default and a helper method for conditional serialization.

41-52: LGTM! Excellent backward compatibility design.

The metadata structure properly handles new fields with conditional serialization, ensuring backward compatibility when encoding is `OneBit` and statistics are empty.

377-377: LGTM! Correct use of `div_ceil` for 1.5 bits encoding.

The ceiling division correctly handles the fractional bits case for the one-and-a-half bits encoding.
Actionable comments posted: 0
🧹 Nitpick comments (5)
lib/quantization/src/encoded_vectors_binary.rs (5)
22-28: Add an explicit representation to avoid ABI / forward-compat pitfalls
`serde` guarantees deterministic (de)serialisation of enums, but the underlying discriminant is not stabilised across compilers/platforms. Declaring the enum as `#[repr(u8)]` locks the layout and makes FFI (and potential on-disk bincode / C-bindings) safer at virtually zero cost.

```diff
-#[derive(Clone, Copy, Eq, PartialEq, Debug, Serialize, Deserialize, Default)]
+#[repr(u8)]
+#[derive(Clone, Copy, Eq, PartialEq, Debug, Serialize, Deserialize, Default)]
 pub enum Encoding {
```
201-209: Avoid unnecessary mean/σ computations for `OneBit` encoding

Mean and standard-deviation vectors are never used by `encode_one_bit_vector`, yet they are computed and persisted for every collection. That is O(N·dim) extra work and extra disk I/O.

```diff
-let means = Self::means(orig_data.clone(), count);
-let standard_deviations = Self::standard_deviations(orig_data.clone(), &means, count);
+let (means, standard_deviations) = if encoding == Encoding::OneBit {
+    (Vec::new(), Vec::new())
+} else {
+    let m = Self::means(orig_data.clone(), count);
+    let sd = Self::standard_deviations(orig_data.clone(), &m, count);
+    (m, sd)
+};
```

You can then pass empty slices to `encode_vector` for the `OneBit` path. Reduces encode time and shrinks metadata considerably.
234-250: Three passes over the dataset – consider streaming mean/variance
`means` and `standard_deviations` walk the entire iterator twice after the initial `count()` pass (total = 3 × N). A single streaming Welford’s algorithm can compute both mean and variance in one pass, cutting memory traffic by ~66 %.

Not critical for small collections, but noticeable on multi-GB datasets.
Also applies to: 262-272
319-333: Hard-codedranges = 3deserves a named constant / commentTwo-bit encoding currently splits the Z-score interval
[-2,2]into three buckets, mapping to 0/1/2 set bits.
Readers unfamiliar with the academic paper will wonder why “3” and why “±2”. A short comment or aconst BINS: usize = 3would clarify intent and ease future tuning.
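The bucketing can be made concrete with a small sketch. This is an illustrative reconstruction from the excerpt (the mean/sd normalization and the `2.0` / `4.0` / `ranges = 3` constants), not the crate's actual `encode_two_bits_value`:

```rust
// Maps a component to its two-bit quantization level: z-scores in [-2, 2]
// are shifted/scaled onto [0, 3], and the floor (clamped to 2) gives the
// number of set bits (0, 1, or 2).
pub fn two_bit_level(v: f32, mean: f32, sd: f32) -> u32 {
    let z = (v - mean) / sd;             // standardize the component
    let index = (z + 2.0) / (4.0 / 3.0); // shift [-2, 2] to [0, 4], scale to [0, 3]
    if index >= 1.0 {
        (index.floor() as u32).min(2)    // bucket 1 or 2 -> that many set bits
    } else {
        0                                // below the first threshold: no bits set
    }
}
```

With mean 0 and sd 1, the bucket boundaries fall at z = -2/3 and z = 2/3, i.e. the three buckets cover roughly equal slices of the ±2σ range.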
180-187: Minor semantic inconsistency inget_storage_sizereturn unitsFor
u8the function returns bytes, foru128it returns elements (converted to bytes later).
The wrapper currently compensates, but aligning semantics (either always bytes or always elements) would make future refactors safer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/quantization/src/encoded_vectors_binary.rs (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-low-resources
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: lint
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
🔇 Additional comments (3)
lib/quantization/src/encoded_vectors_binary.rs (3)
43-52: Good use of `serde(default)` / conditional serialisation

Old metadata files will seamlessly deserialise with the default `OneBit` variant, and the `skip_serializing_if` keeps JSON compact for that case.
Looks correct and future-proof.
307-316: Implicit assumption on input range – surface it
`encode_one_bit_vector` relies on all components being already normalised to `[-1, 1]`. If the caller forgets to scale, every positive value becomes `1` and every non-positive `0`, destroying distance ordering.

Consider:

- documenting this loudly (`/// expects values in [-1,1]`)
- or guarding with `debug_assert!(v.abs() <= 1.0 + f32::EPSILON);`
347-373: Verify bit collision strategy in 1½-bit encoding
`j = vector.len() + i / 2;` means dimensions `(0, 1)` share the same “second” bit, as do `(2, 3)`, etc. This is presumably the intended 3-bits-per-2-dims packing, but it compresses information by OR-ing contributions from two dimensions into one bit position.

Please double-check that the downstream distance metric was derived with this collision in mind; otherwise precision might degrade unexpectedly.
```rust
let count = orig_data.clone().count();
let means = Self::means(orig_data.clone(), count);
let standard_deviations = Self::standard_deviations(orig_data.clone(), &means, count);
```
We read vectors 3 times to compute stats; this might be expensive, especially if vectors are on disk.
Let's do streaming stat computation from a single read, and ideally only if the method actually requires it.
ChatGPT have some examples for online stats computation https://chatgpt.com/share/6850a226-2898-8002-8bb1-84dc61382c5c
Done. Fun fact: CodeRabbit made the same suggestion in its nitpicks:
234-250: Three passes over the dataset – consider streaming mean/variance
means and standard_deviations walk the entire iterator twice after the initial count() pass (total = 3 × N).
A single streaming Welford’s algorithm can compute both mean and variance in one pass, cutting memory traffic by ~66 %.
Not critical for small collections, but noticeable on multi-GB datasets.
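The single-pass approach mentioned above can be sketched as a minimal Welford accumulator. Names here are assumed for illustration (the crate's actual helper ended up as `VectorStats`); it computes per-dimension mean and population standard deviation in one pass:

```rust
// Minimal single-pass Welford accumulator over fixed-dimension vectors.
pub struct Welford {
    count: usize,
    mean: Vec<f32>,
    m2: Vec<f32>, // running sum of squared deviations from the mean
}

impl Welford {
    pub fn new(dim: usize) -> Self {
        Self { count: 0, mean: vec![0.0; dim], m2: vec![0.0; dim] }
    }

    // Feed one vector; O(dim) work, no second pass over the data needed.
    pub fn update(&mut self, vector: &[f32]) {
        self.count += 1;
        let n = self.count as f32;
        for (i, &x) in vector.iter().enumerate() {
            let delta = x - self.mean[i];
            self.mean[i] += delta / n;
            self.m2[i] += delta * (x - self.mean[i]);
        }
    }

    pub fn means(&self) -> &[f32] {
        &self.mean
    }

    // Population standard deviation per dimension.
    pub fn std_devs(&self) -> Vec<f32> {
        self.m2
            .iter()
            .map(|m2| (m2 / self.count as f32).sqrt())
            .collect()
    }
}
```

Feeding each vector once while encoding replaces the `count()` pass plus the two stats passes with a single traversal.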
```rust
fn encode_vector(vector: &[f32]) -> EncodedBinVector<TBitsStoreType> {
    let mut encoded_vector =
        vec![Default::default(); TBitsStoreType::get_storage_size(vector.len())];
…
fn means<'a>(
```
Let's make a struct for streaming updates of the statistics and move it into a dedicated file.
Done, the struct is called `VectorStats`, in a separate file.
```rust
    orig_data: impl Iterator<Item = impl AsRef<[f32]> + 'a> + Clone,
    count: usize,
) -> Vec<f32> {
    let dimension = orig_data.clone().next().map_or(0, |v| v.as_ref().len());
```
I would prefer to read it from vector parameters.
Done, thanks! It avoids unnecessary reads
```rust
        }
    }
    sds.iter_mut()
        .for_each(|sd| *sd = if *sd < f32::EPSILON { 1.0 } else { sd.sqrt() });
```
If sd == 0, why is the result 1.0?
It was a zero-division check. I refactored this part; see `fn encode_two_bits_value(value: f32, mean: f32, sd: f32)`. I added an explicit sd value check and a fallback to 1-bit encoding if, for some reason, sd is zero.
```rust
let v_z = (v - mean) / sd;
let index = (v_z + 2.0) / (4.0 / ranges as f32);
```
This part requires a comment: where are 2.0 and 4.0 coming from?
I did a refactor; see `fn encode_two_bits_value(value: f32, mean: f32, sd: f32)`. It has comments for each step and explains all the magic numbers.
```rust
let v_z = (v - mean) / sd;
let index = (v_z + 2.0) / (4.0 / ranges as f32);

if index >= 1.0 {
    let count_ones = (index.floor() as usize).min(2);
    if count_ones > 1 {
        encoded_vector[i / bits_count] |= one << (i % bits_count);
    }
    if count_ones > 0 {
        let j = vector.len() + i / 2;
        encoded_vector[j / bits_count] |= one << (j % bits_count);
    }
}
```
Requires a comment with examples.
Done, I covered all the logic with comments; see `fn encode_two_bits_value(value: f32, mean: f32, sd: f32)`. For 1.5-bit quantization I added the motivation and examples of how two 2-bit values are united into 3 bits.
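For readers following along, here is a hedged sketch of one plausible packing consistent with the thread above (a reconstruction from the quoted snippets, not the crate's implementation): per-dimension levels 0..=2 are stored as bit `i` (set only for level 2) plus a "second" bit at `dim + i / 2` shared by dimensions `2k` and `2k + 1` (set for level ≥ 1), so two dimensions fit in 3 bits:

```rust
// Packs per-dimension levels (0..=2) into the 1.5-bit layout described above.
pub fn pack_one_and_half_bits(levels: &[u8]) -> Vec<u8> {
    fn set(packed: &mut [u8], bit: usize) {
        packed[bit / 8] |= 1 << (bit % 8);
    }
    let dim = levels.len();
    let total_bits = dim + dim.div_ceil(2); // 1.5 bits per dimension, rounded up
    let mut packed = vec![0u8; total_bits.div_ceil(8)];
    for (i, &level) in levels.iter().enumerate() {
        if level > 1 {
            set(&mut packed, i); // "strong" bit, unique per dimension
        }
        if level > 0 {
            set(&mut packed, dim + i / 2); // shared bit, OR of the pair
        }
    }
    packed
}
```

For example, levels `[2, 0, 1, 1]` set the strong bit of dimension 0 plus the shared bits of pairs (0,1) and (2,3).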
Actionable comments posted: 2
🔭 Outside diff range comments (1)
lib/quantization/src/encoded_vectors_binary.rs (1)
459-465: Errors from `create_dir_all` silently ignored

`parent().map(std::fs::create_dir_all);` discards both the `Option` and the `Result`, so failures (e.g. permission denied) propagate as “file not found” later, making debugging painful.

```diff
-meta_path.parent().map(std::fs::create_dir_all);
+if let Some(dir) = meta_path.parent() {
+    std::fs::create_dir_all(dir)?;
+}
```

Repeat for the data path and `vector_stats_path`.
🧹 Nitpick comments (1)
lib/quantization/src/vector_stats.rs (1)
18-27: Use `f32::INFINITY` / `NEG_INFINITY` for clearer intent

`f32::MAX` / `f32::MIN` are technically correct, but `INFINITY` / `NEG_INFINITY` convey intent more explicitly and avoid the cognitive overhead of remembering which of the two M-prefixed constants is the “large positive” one.

```diff
- min: f32::MAX,
- max: f32::MIN,
+ min: f32::INFINITY,
+ max: f32::NEG_INFINITY,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- lib/quantization/src/encoded_vectors_binary.rs (11 hunks)
- lib/quantization/src/lib.rs (1 hunks)
- lib/quantization/src/vector_stats.rs (1 hunks)
- lib/quantization/tests/integration/test_binary_encodings.rs (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- lib/quantization/src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/quantization/tests/integration/test_binary_encodings.rs
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: lint
- GitHub Check: storage-compat-test
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: test-consistency
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-low-resources
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
🔇 Additional comments (3)
lib/quantization/src/vector_stats.rs (1)
30-43: Mismatch risk between `vector_params.count` and iterator length

The online update uses `count = i + 1`, but the final variance divisor later relies on `vector_params.count`. If callers pass an incorrect `count`, standard deviation will be off (or divide-by-zero if it is 0/1 while the iterator contained many vectors).

Either:
- Assert the equality once the loop is finished.
- Pass the real counter down instead of relying on the config field.
```diff
 // after the loop
-debug_assert!(vector_params.count == (i + 1), "VectorParameters.count does not match actual data length");
+assert_eq!(
+    vector_params.count,
+    data_points_seen,
+    "VectorParameters.count={:?} but actually processed {:?} vectors",
+    vector_params.count,
+    data_points_seen,
+);
```

lib/quantization/src/encoded_vectors_binary.rs (2)
382-389: `usize::div_ceil` requires Rust 1.73 – confirm MSRV

`div_ceil` stabilized in 1.73. If the project’s MSRV is lower, this will break the build for downstream users. Consider a fallback helper:

```rust
fn ceil_div(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}
```

or bump MSRV explicitly.
298-337: Confirm packing logic for 1.5-bit encoding

The second bit is written to `j = dim + i/2`, effectively OR-ing the high bits of pairs `(0, 1)`, `(2, 3)`, … This matches the doc-comment, but means the last vector element in an odd-dimensional vector never shares its bit (good), yet the storage size uses `ceil(dim * 1.5)`, so there is still room.

No action required, just flagging to double-check that retrieval / scoring code applies the identical mapping; otherwise similarity scores will drift for odd dimensions.
Actionable comments posted: 1
♻️ Duplicate comments (1)
lib/quantization/src/encoded_vectors_binary.rs (1)
472-474: Misuse of `std::fs::exists` – check the toolchain

`std::fs::exists` returns `io::Result<bool>`, but it was only stabilized in Rust 1.81; on older toolchains the call does not exist and this pattern fails to compile, re-introducing the compile error called out in a prior review. Replace the check with the idiomatic `.exists()` call on the `Path`:

```diff
- } else if let Ok(true) = std::fs::exists(&vector_stats_path) {
+ } else if vector_stats_path.exists() {
      std::fs::remove_file(&vector_stats_path)?;
  }
```
🧹 Nitpick comments (1)
lib/quantization/src/encoded_vectors_binary.rs (1)
202-208: Avoid double pass over `orig_data` to build stats

`VectorStats::build` consumes a full clone of `orig_data`, and the subsequent encode loop consumes the original iterator – two complete reads. Streaming the statistics while encoding (or re-using the same iterator with peekable buffering) would cut I/O by half for large on-disk collections.

Consider refactoring `VectorStats::build` into an online updater that is fed inside the main loop.
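Such an online updater could follow Welford's algorithm, which the PR's commit log mentions adopting in `f64`. The sketch below uses illustrative names, not the crate's actual `VectorStats` API:

```rust
// Hypothetical streaming statistics accumulator (Welford's algorithm).
// Each call to update() folds one value into the running mean/variance,
// so stats can be built in the same pass that encodes the vectors.
struct OnlineStats {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl OnlineStats {
    fn new() -> Self {
        Self { count: 0, mean: 0.0, m2: 0.0 }
    }

    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    fn variance(&self) -> f64 {
        // Population variance; returns 0.0 for fewer than two samples.
        if self.count < 2 { 0.0 } else { self.m2 / self.count as f64 }
    }
}
```

Because each update touches only the running mean and `m2`, this removes the need to clone `orig_data` for a separate statistics pass.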
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- lib/quantization/src/encoded_vectors_binary.rs (11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: lint
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consensus-compose
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
🔇 Additional comments (1)
lib/quantization/src/encoded_vectors_binary.rs (1)
378-385: Confirm MSRV for `.div_ceil()`

`usize::div_ceil` is stable only from Rust 1.73. If the project’s MSRV is lower, this call will fail to compile. Please verify toolchain compatibility or add a polyfill.
Actionable comments posted: 0
♻️ Duplicate comments (1)
lib/quantization/src/encoded_vectors_binary.rs (1)
463-476: Prefer `Path::exists` over matching on `std::fs::exists`

The code on line 473 matches `Ok(true)` on `std::fs::exists`, which returns `io::Result<bool>`; this compiles, but silently drops the I/O error case. The plain `Path::exists` call reads more directly.

Apply this fix:

```diff
-    } else if let Ok(true) = std::fs::exists(&vector_stats_path) {
+    } else if vector_stats_path.exists() {
```
🧹 Nitpick comments (2)
docs/grpc/docs.md (2)
96-96: Validate TOC indentation
The new nested list item is indented by 4 spaces, which violates markdownlint MD007 (expected 2 spaces for nested lists).Apply this diff to normalize the indent:
- - [BinaryQuantizationEncoding](#qdrant-BinaryQuantizationEncoding) + - [BinaryQuantizationEncoding](#qdrant-BinaryQuantizationEncoding)
1853-1863: Enhance enum variant descriptions
The `BinaryQuantizationEncoding` table omits descriptions for each variant, reducing clarity. Consider adding concise descriptions for each entry. For example:

| Name | Number | Description |
| ----------------- | ------ | ------------------------------- |
| OneBit | 0 | Single-bit binary quantization |
| TwoBits | 1 | Two-bit binary quantization |
| OneAndHalfBits | 2 | One-and-a-half bit quantization |
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- docs/grpc/docs.md (3 hunks)
- docs/redoc/master/openapi.json (1 hunks)
- lib/api/src/grpc/conversions.rs (2 hunks)
- lib/api/src/grpc/proto/collections.proto (1 hunks)
- lib/api/src/grpc/qdrant.rs (2 hunks)
- lib/quantization/benches/binary.rs (3 hunks)
- lib/quantization/src/encoded_vectors_binary.rs (11 hunks)
- lib/quantization/src/lib.rs (1 hunks)
- lib/quantization/src/vector_stats.rs (1 hunks)
- lib/quantization/tests/integration/empty_storage.rs (1 hunks)
- lib/quantization/tests/integration/main.rs (1 hunks)
- lib/quantization/tests/integration/test_binary.rs (13 hunks)
- lib/quantization/tests/integration/test_binary_encodings.rs (1 hunks)
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (3 hunks)
- lib/segment/src/types.rs (1 hunks)
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs (5 hunks)
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (1 hunks)
- lib/segment/tests/integration/byte_storage_quantization_test.rs (1 hunks)
- lib/segment/tests/integration/multivector_quantization_test.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (17)
- lib/quantization/src/lib.rs
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
- lib/quantization/tests/integration/empty_storage.rs
- lib/quantization/tests/integration/main.rs
- lib/quantization/benches/binary.rs
- lib/segment/tests/integration/multivector_quantization_test.rs
- lib/quantization/tests/integration/test_binary.rs
- lib/segment/tests/integration/byte_storage_quantization_test.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
- lib/api/src/grpc/proto/collections.proto
- lib/api/src/grpc/qdrant.rs
- lib/quantization/tests/integration/test_binary_encodings.rs
- lib/segment/src/types.rs
- lib/api/src/grpc/conversions.rs
- lib/quantization/src/vector_stats.rs
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs
- docs/redoc/master/openapi.json
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
96-96: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: lint
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: storage-compat-test
🔇 Additional comments (14)
docs/grpc/docs.md (1)
398-400: Confirm `encoding` field docs consistency

The `encoding` line correctly references the new enum anchor and provides an appropriate description. Ensure it stays in sync with the protobuf definition.

lib/quantization/src/encoded_vectors_binary.rs (13)
2-2: LGTM! Clean integration of new dependencies and data structures.The Path/PathBuf imports and VectorStats integration are well-placed to support the multi-bit encoding functionality.
Also applies to: 11-11, 20-20
24-36: Well-designed enum with appropriate defaults.The Encoding enum cleanly represents the different quantization schemes with sensible defaults and helper methods.
45-47: Excellent use of serde attributes for backward compatibility.The conditional serialization ensures the encoding field is only stored when it differs from the default OneBit encoding.
177-177: Good improvement using named constant over magic number.

Using `u8::BITS` is more self-documenting than the hardcoded `8`.
197-197: Smart conditional computation of vector statistics.Computing statistics only for multi-bit encodings avoids unnecessary overhead for OneBit encoding.
Also applies to: 202-208
231-252: Clean dispatch pattern for different encoding schemes.The match-based dispatch to specialized encoding functions promotes good code organization and extensibility.
266-293: Robust implementation with defensive fallback.The fallback to one-bit encoding when vector stats are unavailable prevents runtime errors while maintaining functionality.
295-333: Excellent documentation with clear examples.The detailed comments and examples make the complex 1.5-bit encoding logic easy to understand and verify.
335-376: Mathematically sound quantization with excellent documentation.The z-score based approach is well-implemented with clear explanations of the bit encoding scheme and proper handling of edge cases.
378-393: Correct size calculations for different encoding schemes.

The dimension multipliers and rounding logic are appropriate for each encoding type, with proper use of `div_ceil` for fractional bit counts.
444-448: Sensible path construction for vector statistics file.Placing the vector stats file alongside the metadata file is logical and the Option return type handles path construction failures appropriately.
487-507: Well-structured conditional loading of vector statistics.The loading logic appropriately mirrors the save behavior and handles missing vector stats files correctly based on encoding type.
523-523: Critical consistency: Query encoding matches stored vector encoding.Using the stored encoding and vector stats ensures queries are encoded consistently with the indexed vectors.
* bq encodings
* are you happy clippy
* are you happy clippy
* are you happy clippy
* are you happy clippy
* gpu tests
* update models
* are you happy fmt
* move additional bits to the end
* fix tests
* Welford's Algorithm
* review remarks
* are you happy clippy
* remove debug println in test
* coderabit nitpicks
* remove unnecessary clone and partialeq
* Use f64 for Welford's Algorithm
* try fix ci
* revert cargo-nextest
* add debug assertions
This PR introduces new BQ encoding algorithms: two-bit quantization and one-and-a-half-bit quantization.
Both algorithms use similar scoring but spend 2 or 1.5 bits per element of the vector.
The idea of 2-bit BQ is to use one more bit to simulate a zero value. 1.5-bit BQ is an AND operation between two zero values.
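A hedged sketch of what a per-element two-bit code with a simulated zero level could look like — the threshold choice and exact bit patterns here are my assumptions, not necessarily what this PR implements:

```rust
// Hypothetical two-bit quantizer: elements near the mean (|z| <= t) map to a
// single set bit (the simulated "zero" level), while clear positives map to
// two set bits and clear negatives to none, so popcount-based scoring sees
// three levels: 0, 1, 2.
fn two_bit_quantize(value: f32, mean: f32, std_dev: f32, t: f32) -> u8 {
    let z = (value - mean) / std_dev; // z-score of the element
    if z > t {
        0b11 // clearly positive: both bits set
    } else if z < -t {
        0b00 // clearly negative: no bits set
    } else {
        0b01 // near zero: one bit set
    }
}
```

The 1.5-bit variant would then let element pairs share one of these bits, trading a little precision for storage.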
This PR was tested using vector-db-benchmark.
Script to change BQ to 2bit BQ:
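The script itself is not reproduced here. As a hedged illustration only, the change amounts to setting the new `encoding` field of the binary quantization config; the exact REST shape is assumed from the enum and field names this PR adds to the API docs:

```json
{
  "quantization_config": {
    "binary": {
      "always_ram": true,
      "encoding": "two_bits"
    }
  }
}
```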
Use `one_and_half_bits` instead of `two_bits` for 1.5-bit quantization.

Results for dbpedia 100K with oversampling=1:
Results for laion-small-clip 100K with oversampling=1:
PR already supports GPU, GPU CI: https://github.com/qdrant/qdrant/actions/runs/15581177437