Don't encode quantization query while hnsw build #6729

Merged
IvanPleshkov merged 2 commits into dev from dont-encode-quantization-query-while-hnsw-build
Jun 23, 2025

@IvanPleshkov IvanPleshkov commented Jun 19, 2025

Currently, we run quantization encoding every time we create a RawScorer. During HNSW construction this is unnecessary, because a quantized storage already exists. We want to reuse the already quantized data during HNSW construction and avoid unnecessary access to the original vector data, which may be on disk.

This PR fixes this behaviour by using the already constructed quantized storage when creating the raw scorer.

To achieve this, the PR introduces a new method in the quantization storage: `fn encode_internal_vector(&self, id: u32) -> Option<Self::EncodedQuery>;`. It takes a point id and returns an encoded query, which is then used in the query scorer. The return value is optional because not every quantization type can work this way. With PQ (product quantization) we still want to encode the original vector, because using the already encoded one would lose accuracy in the LUT (lookup table). For SQ and BQ (scalar and binary quantization), reusing the stored encoding should be fine.
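The optional return value can be illustrated with a minimal sketch. Only the `encode_internal_vector` signature comes from this description; the `EncodedStorage` trait, the `encode_query` method, the two toy storages, and `query_for_point` are hypothetical names made up for illustration and are not the actual code in this PR.

```rust
/// Hypothetical, simplified stand-in for a quantized storage.
trait EncodedStorage {
    type EncodedQuery;

    /// Quantize a raw query vector (assumed to be the pre-existing path).
    fn encode_query(&self, raw: &[f32]) -> Self::EncodedQuery;

    /// Build a query from an already stored point, if the scheme allows it.
    fn encode_internal_vector(&self, id: u32) -> Option<Self::EncodedQuery>;
}

/// Toy scalar-style storage: one u8 code per dimension.
struct ToyScalarStorage {
    codes: Vec<Vec<u8>>,
}

impl EncodedStorage for ToyScalarStorage {
    type EncodedQuery = Vec<u8>;

    fn encode_query(&self, raw: &[f32]) -> Vec<u8> {
        // Map [0.0, 1.0] onto 0..=255; values outside the range are clamped.
        raw.iter()
            .map(|&x| (x.clamp(0.0, 1.0) * 255.0).round() as u8)
            .collect()
    }

    fn encode_internal_vector(&self, id: u32) -> Option<Vec<u8>> {
        // Copy the stored codes so the query is an owned, in-memory value.
        self.codes.get(id as usize).cloned()
    }
}

/// Toy PQ-style storage: always declines, forcing the original-vector path.
struct ToyPqStorage;

impl EncodedStorage for ToyPqStorage {
    type EncodedQuery = Vec<f32>;

    fn encode_query(&self, raw: &[f32]) -> Vec<f32> {
        raw.to_vec()
    }

    fn encode_internal_vector(&self, _id: u32) -> Option<Vec<f32>> {
        None
    }
}

/// Prefer the stored encoding; load the original vector only on `None`.
fn query_for_point<S: EncodedStorage>(
    storage: &S,
    id: u32,
    load_original: impl FnOnce(u32) -> Vec<f32>,
) -> S::EncodedQuery {
    match storage.encode_internal_vector(id) {
        Some(query) => query,
        None => storage.encode_query(&load_original(id)),
    }
}
```

In this sketch the `load_original` closure stands in for the potentially on-disk read of the original vector, and it is only invoked when the storage returns `None`.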

We don't call score_internal directly. Instead, we use this query getter, because quantized data can be stored on disk and we want the query vector to always be in RAM.

The quantized vector scorer has a new construction method that either returns a scorer or gives back ownership of the hardware counter. The Filtered Scorer also has a new constructor.
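The "return a scorer or give back the hardware counter" pattern can be sketched with a `Result` whose error variant carries the counter. All names here (`HardwareCounter`, `InternalScorer`, `Scorer`, `try_new_internal`, `build_scorer`) are hypothetical and the types are heavily simplified; this only illustrates how ownership can be handed back, not the constructor added in this PR.

```rust
/// Hypothetical stand-in for a hardware usage counter. It is deliberately
/// not `Clone`, so exactly one scorer can own it.
struct HardwareCounter {
    cpu_ops: u64,
}

/// Scorer built from an already encoded internal vector.
struct InternalScorer {
    query: Vec<u8>,
    counter: HardwareCounter,
}

/// Either the quantized fast path or a fallback on the original vector.
enum Scorer {
    Quantized(InternalScorer),
    Raw { query: Vec<f32>, counter: HardwareCounter },
}

/// Returns the scorer, or hands the counter back if no stored encoding
/// is available, so the caller can reuse it for another scorer.
fn try_new_internal(
    stored: Option<Vec<u8>>,
    counter: HardwareCounter,
) -> Result<InternalScorer, HardwareCounter> {
    match stored {
        Some(query) => Ok(InternalScorer { query, counter }),
        None => Err(counter),
    }
}

/// Try the quantized path first; on failure, reuse the returned counter
/// for the fallback scorer built from the original vector.
fn build_scorer(
    stored: Option<Vec<u8>>,
    original: Vec<f32>,
    counter: HardwareCounter,
) -> Scorer {
    match try_new_internal(stored, counter) {
        Ok(scorer) => Scorer::Quantized(scorer),
        Err(counter) => Scorer::Raw { query: original, counter },
    }
}
```

Returning the counter in the error variant avoids cloning it or wrapping it in a shared pointer: the counter is moved into the constructor and moved back out only when construction does not happen.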
