Don't encode quantization query while hnsw build #6729
Merged
IvanPleshkov merged 2 commits into dev on Jun 23, 2025
Currently, we quantize the query every time we create a `RawScorer`. During HNSW construction this is not required, because the quantized storage already holds the encoded vectors. We want to reuse the already quantized data in HNSW construction and avoid unnecessary access to the original vector data, which can be on disk.

This PR fixes this behaviour by using the already constructed quantized storage when creating the raw scorer.
To achieve this, this PR introduces a new method in quantization storage: `fn encode_internal_vector(&self, id: u32) -> Option<Self::EncodedQuery>;`. It takes a point id and returns an encoded query, which is then used in the query scorer. The return value is optional because not every quantization type can work this way. With PQ we still want to encode the original vector, because using the already encoded one would cause accuracy loss in the LUT. SQ and BQ have no such problem.

We don't call `score_internal` directly and use this query getter instead, because quantized data can be stored on disk and we want the query vector to always be in RAM.

The quantized vector scorer has a new construction method that returns either a scorer or, as a fallback, gives back ownership of the hardware counter. The filtered scorer also has a new constructor.
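The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the `encode_internal_vector` signature comes from the description, while `QuantizedStorage`, `encode_query`, `ScalarStorage`, `ProductStorage`, `HardwareCounter`, `Scorer`, `scorer_from_internal` and `build_scorer` are hypothetical, simplified stand-ins.

```rust
// Hypothetical, simplified stand-ins; these are NOT the actual Qdrant types.

/// Stand-in for the hardware counter whose ownership is given back on fallback.
#[derive(Debug)]
struct HardwareCounter {
    label: &'static str,
}

trait QuantizedStorage {
    type EncodedQuery;

    /// Encode an original (unquantized) query vector. Assumed helper.
    fn encode_query(&self, raw: &[f32]) -> Self::EncodedQuery;

    /// Build a query from an already stored quantized vector.
    /// Returns `None` when the quantization type does not support this.
    fn encode_internal_vector(&self, id: u32) -> Option<Self::EncodedQuery>;
}

/// Toy scalar quantization: one byte per dimension, inputs assumed in [0, 1].
struct ScalarStorage {
    codes: Vec<Vec<u8>>,
}

impl QuantizedStorage for ScalarStorage {
    type EncodedQuery = Vec<u8>;

    fn encode_query(&self, raw: &[f32]) -> Vec<u8> {
        raw.iter()
            .map(|&x| (x.clamp(0.0, 1.0) * 255.0).round() as u8)
            .collect()
    }

    fn encode_internal_vector(&self, id: u32) -> Option<Vec<u8>> {
        // Copy the stored code so the query itself always lives in RAM.
        self.codes.get(id as usize).cloned()
    }
}

/// Toy product quantization: opts out, so callers must encode the original vector.
struct ProductStorage;

impl QuantizedStorage for ProductStorage {
    type EncodedQuery = Vec<f32>;

    fn encode_query(&self, raw: &[f32]) -> Vec<f32> {
        raw.to_vec() // placeholder for building a lookup table
    }

    fn encode_internal_vector(&self, _id: u32) -> Option<Vec<f32>> {
        None
    }
}

#[derive(Debug)]
struct Scorer<Q> {
    query: Q,
    counter: HardwareCounter,
}

/// Returns a scorer, or gives the hardware counter back so the caller can fall back.
fn scorer_from_internal<S: QuantizedStorage>(
    storage: &S,
    id: u32,
    counter: HardwareCounter,
) -> Result<Scorer<S::EncodedQuery>, HardwareCounter> {
    match storage.encode_internal_vector(id) {
        Some(query) => Ok(Scorer { query, counter }),
        None => Err(counter),
    }
}

/// Try the quantized path first; read the original vector only on fallback.
fn build_scorer<S: QuantizedStorage>(
    storage: &S,
    id: u32,
    load_raw: impl Fn(u32) -> Vec<f32>,
    counter: HardwareCounter,
) -> Scorer<S::EncodedQuery> {
    match scorer_from_internal(storage, id, counter) {
        Ok(scorer) => scorer,
        Err(counter) => Scorer {
            query: storage.encode_query(&load_raw(id)),
            counter,
        },
    }
}
```

Returning the counter in the `Err` branch lets the caller reuse the same counter for the fallback path instead of cloning it up front. In this sketch, `load_raw` stands in for the potentially on-disk read of the original vector, and it is only invoked when the storage opts out.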