Skip to content

Optimize sparse search by precomputing once the query info per posting#3327

Merged
agourlay merged 1 commit intodevfrom
optimize-sparse-search-postings-query-info
Jan 8, 2024
Merged

Optimize sparse search by precomputing once the query info per posting#3327
agourlay merged 1 commit intodevfrom
optimize-sparse-search-postings-query-info

Conversation

@agourlay
Copy link
Member

@agourlay agourlay commented Jan 5, 2024

This PR is a low hanging fruit I noticed while working on a larger optimization.

We currently fetch the query weight corresponding to a posting list iterator every time we score a point.
This small cost adds up and actually shows in profiling.

The proposed fix is to precompute for each posting list their corresponding query information.

The result is a consistent 3% gain on the inverted index search:

sparse-vector-search-group/inverted-index-search
                        time:   [30.959 ms 31.025 ms 31.098 ms]
                        change: [-4.5531% -3.8959% -3.3699%] (p = 0.00 < 0.05)
                        Performance has improved.

sparse-vector-search-group/inverted-index-filtered-payload-index
                        time:   [34.407 ms 34.544 ms 34.693 ms]
                        change: [-3.3703% -2.7161% -2.1013%] (p = 0.00 < 0.05)
                        Performance has improved.

And 8% on the plain search:

sparse-vector-search-group/inverted-index-filtered-plain
                        time:   [268.72 ms 269.30 ms 269.89 ms]
                        change: [-8.7347% -8.4290% -8.1481%] (p = 0.00 < 0.05)
                        Performance has improved.

sparse-vector-search-group/plain-filtered-payload-index
                        time:   [260.24 ms 260.75 ms 261.30 ms]
                        change: [-8.0965% -7.8180% -7.5645%] (p = 0.00 < 0.05)
                        Performance has improved.

// accumulate score for the current record id
if element.record_id == min_record_id {
score +=
element.weight * self.query.values[posting_iterator.query_weight_offset];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the hottest part

@agourlay agourlay merged commit c49aa37 into dev Jan 8, 2024
@agourlay agourlay deleted the optimize-sparse-search-postings-query-info branch January 8, 2024 09:27
generall pushed a commit that referenced this pull request Mar 5, 2024
HighBestCoder pushed a commit to HighBestCoder/qdrant that referenced this pull request Dec 14, 2025
…Lists (qdrant#3327)

Summary:
Pull Request resolved: facebookresearch/faiss#3327

**Context**
1. [Issue 2621](facebookresearch/faiss#2621) discuss inconsistency between OnDiskInvertedList and InvertedList. OnDiskInvertedList is supposed to handle disk based multiple Index Shards. Thus, we should name it differently when merging invls from index shard.
2. [Issue 2876](facebookresearch/faiss#2876) provides usecase of shifting ids when merging invls from different shards.

**In this diff**,
1. To address qdrant#1 above, I renamed the merge_from function to merge_from_multiple without touching merge_from base class.
why so? To continue to allow merge invl from one index to ondiskinvl from other index.

2. To address qdrant#2 above, I have added support of shift_ids in merge_from_multiple to shift ids from different shards. This can be used when each shard has same set of ids but different data. This is not recommended if id is already unique across shards.

Reviewed By: mdouze

Differential Revision: D55482518

fbshipit-source-id: 95470c7449160488d2b45b024d134cbc037a2083
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants