Speed up conjunctive queries that need scores. #14690
Closed
jpountz wants to merge 6 commits into apache:main
This change helps speed up exhaustive evaluation of term queries, i.e. calling `DocIdSetIterator#nextDoc()` then `Scorer#score()` in a loop. It helps in two ways:
- Iteration over matching doc IDs gets a bit more efficient, especially when a block of postings is encoded as a bit set.
- Computation of scores now gets (auto-)vectorized.

While this change doesn't help much when dynamic pruning kicks in, I'm hopeful that we can improve this in the future.
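To illustrate why a bit-set-encoded block can be iterated cheaply, here is a minimal sketch in plain Java. It is not Lucene's implementation: the class name `BitSetBlockSketch`, the `decode` method and the `base` parameter are hypothetical and only show the general idea of turning set bits into a buffer of doc IDs in one tight loop.

```java
import java.util.Arrays;

public class BitSetBlockSketch {

  /**
   * Writes the doc ID of every set bit into {@code out} and returns how many were written.
   * Bit i of words[w] stands for doc ID base + 64 * w + i (an assumption of this sketch).
   */
  static int decode(long[] words, int base, int[] out) {
    int count = 0;
    for (int w = 0; w < words.length; w++) {
      long bits = words[w];
      while (bits != 0) {
        // Position of the lowest set bit in this word.
        int bit = Long.numberOfTrailingZeros(bits);
        out[count++] = base + (w << 6) + bit;
        // Clear the lowest set bit and continue with the rest of the word.
        bits &= bits - 1;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    long[] words = {0b1011L, 1L << 63};
    int[] out = new int[128]; // large enough for 2 words of 64 bits each
    int n = decode(words, 1000, out);
    System.out.println(Arrays.toString(Arrays.copyOf(out, n)));
    // prints [1000, 1001, 1003, 1127]
  }
}
```

Once doc IDs sit in a plain `int[]` buffer, later steps such as score computation can run over the buffer without calling back into the iterator for each document.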
Calls to `DocIdSetIterator#nextDoc`, `DocIdSetIterator#advance` and `SimScorer#score` are currently interleaved and include lots of conditionals. This builds on apache#14679 and refactors the code a bit to make it eligible for auto-vectorization and better pipelining. This effectively speeds up conjunctive queries (e.g. `AndHighHigh`) but also disjunctive queries that run as conjunctive queries in practice (e.g. `OrHighHigh`).
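The difference between interleaved and batched evaluation can be sketched in plain Java as follows. This is an illustration under stated assumptions rather than the code in this PR: postings are modeled as sorted `int[]` arrays, and `termScore` is a made-up saturation formula standing in for a real `SimScorer`. All names (`BatchedConjunctionSketch`, `scoreInterleaved`, `scoreBatched`) are hypothetical.

```java
import java.util.Arrays;

public class BatchedConjunctionSketch {

  // Hypothetical term score; a stand-in for a real similarity, not Lucene's BM25.
  static float termScore(float weight, int freq, float norm) {
    return weight * freq / (freq + norm);
  }

  /** Interleaved: intersection and scoring share one loop full of conditionals. */
  static float[] scoreInterleaved(int[] docsA, int[] freqsA, int[] docsB, int[] freqsB,
      float[] norms, float wA, float wB) {
    float[] scores = new float[Math.min(docsA.length, docsB.length)];
    int i = 0, j = 0, n = 0;
    while (i < docsA.length && j < docsB.length) {
      if (docsA[i] < docsB[j]) {
        i++;
      } else if (docsA[i] > docsB[j]) {
        j++;
      } else {
        float norm = norms[docsA[i]];
        scores[n++] = termScore(wA, freqsA[i], norm) + termScore(wB, freqsB[j], norm);
        i++;
        j++;
      }
    }
    return Arrays.copyOf(scores, n);
  }

  /** Batched: collect matches into buffers first, then score them in a branch-free loop. */
  static float[] scoreBatched(int[] docsA, int[] freqsA, int[] docsB, int[] freqsB,
      float[] norms, float wA, float wB) {
    int cap = Math.min(docsA.length, docsB.length);
    int[] fA = new int[cap];
    int[] fB = new int[cap];
    float[] nrm = new float[cap];
    int i = 0, j = 0, n = 0;
    // Phase 1: intersect the two sorted doc ID lists, buffering the scoring inputs.
    while (i < docsA.length && j < docsB.length) {
      if (docsA[i] < docsB[j]) {
        i++;
      } else if (docsA[i] > docsB[j]) {
        j++;
      } else {
        fA[n] = freqsA[i];
        fB[n] = freqsB[j];
        nrm[n] = norms[docsA[i]];
        n++;
        i++;
        j++;
      }
    }
    // Phase 2: a simple counted loop over arrays, which a JIT compiler may auto-vectorize.
    float[] scores = new float[n];
    for (int k = 0; k < n; k++) {
      scores[k] = termScore(wA, fA[k], nrm[k]) + termScore(wB, fB[k], nrm[k]);
    }
    return scores;
  }

  public static void main(String[] args) {
    int[] docsA = {1, 3, 5, 8};
    int[] freqsA = {2, 1, 4, 1};
    int[] docsB = {3, 4, 8, 9};
    int[] freqsB = {1, 2, 3, 1};
    float[] norms = new float[10];
    Arrays.fill(norms, 1.2f);
    float[] a = scoreInterleaved(docsA, freqsA, docsB, freqsB, norms, 2.0f, 1.5f);
    float[] b = scoreBatched(docsA, freqsA, docsB, freqsB, norms, 2.0f, 1.5f);
    System.out.println(Arrays.toString(b) + " equal=" + Arrays.equals(a, b));
  }
}
```

Both methods perform the same arithmetic per matching document, so they return identical scores; the batched variant only moves the scoring out of the branchy intersection loop. Whether the second loop is actually vectorized depends on the JVM and hardware.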
Contributor, Author
I'm superseding this change with a more general one for now, which doesn't introduce new public APIs: #14701. We can look into taking ideas from this PR as follow-ups.
Contributor
I created a new PR, #14968, based on this one. Currently the core logic is nearly identical to this PR, but I'm planning to dig deeper into this approach. Hope you don't mind. :)
Contributor, Author
I don't mind at all. :)
Note that this builds on #14679; only the last commit touches conjunctive queries. I will clean up this PR once #14679 is merged, but I wanted to show the benefits for conjunctive queries as well. Unlike #14679, this change helps when dynamic pruning kicks in.
In the luceneutil run below on wikibigall, the baseline is #14679 and the modified version is this PR: