Optimize sparse search by precomputing once the query info per posting by agourlay · Pull Request #3327 · qdrant/qdrant

agourlay · 2024-01-05T07:36:02Z

This PR is a low hanging fruit I noticed while working on a larger optimization.

We currently fetch the query weight corresponding to a posting list iterator every time we score a point.
This small cost adds up and actually shows in profiling.

The proposed fix is to precompute for each posting list their corresponding query information.

The result is a consistent 3% gain on the inverted index search:

sparse-vector-search-group/inverted-index-search
                        time:   [30.959 ms 31.025 ms 31.098 ms]
                        change: [-4.5531% -3.8959% -3.3699%] (p = 0.00 < 0.05)
                        Performance has improved.

sparse-vector-search-group/inverted-index-filtered-payload-index
                        time:   [34.407 ms 34.544 ms 34.693 ms]
                        change: [-3.3703% -2.7161% -2.1013%] (p = 0.00 < 0.05)
                        Performance has improved.

And 8% on the plain search:

sparse-vector-search-group/inverted-index-filtered-plain
                        time:   [268.72 ms 269.30 ms 269.89 ms]
                        change: [-8.7347% -8.4290% -8.1481%] (p = 0.00 < 0.05)
                        Performance has improved.

sparse-vector-search-group/plain-filtered-payload-index
                        time:   [260.24 ms 260.75 ms 261.30 ms]
                        change: [-8.0965% -7.8180% -7.5645%] (p = 0.00 < 0.05)
                        Performance has improved.

…g interator

agourlay · 2024-01-05T07:38:04Z

lib/sparse/src/index/search_context.rs

                // accumulate score for the current record id
                if element.record_id == min_record_id {
-                    score +=
-                        element.weight * self.query.values[posting_iterator.query_weight_offset];


this is the hottest part

…g interator (#3327)

…Lists (qdrant#3327) Summary: Pull Request resolved: facebookresearch/faiss#3327 **Context** 1. [Issue 2621](facebookresearch/faiss#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard. 2. [Issue 2876](facebookresearch/faiss#2876) provides usecase of shifting ids when merging invls from different shards. **In this diff**, 1. To address qdrant#1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class. why so? To continue to allow merge invl from one index to ondiskinvl from other index. 2. To address qdrant#2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards. Reviewed By: mdouze Differential Revision: D55482518 fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083

Optimize sparse search by precomputing once the query info per postin…

ce9cc30

…g interator

agourlay requested review from IvanPleshkov and generall January 5, 2024 07:36

agourlay commented Jan 5, 2024

View reviewed changes

IvanPleshkov approved these changes Jan 8, 2024

View reviewed changes

agourlay merged commit c49aa37 into dev Jan 8, 2024

agourlay deleted the optimize-sparse-search-postings-query-info branch January 8, 2024 09:27

generall pushed a commit that referenced this pull request Mar 5, 2024

Optimize sparse search by precomputing once the query info per postin…

da8eed2

…g interator (#3327)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize sparse search by precomputing once the query info per posting#3327

Optimize sparse search by precomputing once the query info per posting#3327
agourlay merged 1 commit intodevfrom
optimize-sparse-search-postings-query-info

agourlay commented Jan 5, 2024 •

edited

Loading

Uh oh!

agourlay Jan 5, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

agourlay commented Jan 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

agourlay Jan 5, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

agourlay commented Jan 5, 2024 •

edited

Loading