Use the bulk SimScorer#score API to compute impact scores. #15151
Merged: jpountz merged 7 commits into apache:main on Sep 8, 2025
In #15039 we introduced a bulk `SimScorer#score` API and used it to compute scores of the leading clause of conjunctive queries and of the "essential" clauses of disjunctive queries. This PR also uses the bulk API when translating (term frequency, length normalization factor) pairs into the maximum score that a block of postings may produce. To do this properly, I had to change the impacts API so that it no longer returns a List of (term freq, norm) pairs, but instead two parallel arrays of term frequencies and norms that can be passed (almost) directly to the `SimScorer#score` bulk API. Unfortunately, this makes the change quite large, since many of the backward-compatibility formats had to be touched.
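A minimal Java sketch of the difference between the two impact representations, assuming a toy scorer. The names `BulkScorer`, `Impact`, `maxScoreFromList`, and `maxScoreFromArrays` are hypothetical stand-ins, not Lucene's actual `SimScorer` or impacts classes, and widening int frequencies to float is only one guess at what "(almost) directly" refers to.

```java
import java.util.List;

/**
 * Sketch: computing the maximum score of a block of postings from its impacts.
 * All names here are hypothetical stand-ins, not Lucene's actual classes.
 */
public class ImpactMaxScoreSketch {

  /** Hypothetical scorer with a per-pair method and a bulk method over parallel arrays. */
  interface BulkScorer {
    float score(float freq, long norm);

    /** Scores the first {@code size} entries of {@code freqs}/{@code norms} into {@code scores}. */
    default void score(int size, float[] freqs, long[] norms, float[] scores) {
      for (int i = 0; i < size; ++i) {
        scores[i] = score(freqs[i], norms[i]);
      }
    }
  }

  /** List-based representation: one object per (term freq, norm) pair. */
  record Impact(int freq, long norm) {}

  /** List-based approach: score each (freq, norm) pair with its own call. */
  static float maxScoreFromList(BulkScorer scorer, List<Impact> impacts) {
    float max = 0f;
    for (Impact impact : impacts) {
      max = Math.max(max, scorer.score(impact.freq(), impact.norm()));
    }
    return max;
  }

  /** Array-based approach: a single bulk call over parallel freq/norm arrays. */
  static float maxScoreFromArrays(BulkScorer scorer, int size, int[] freqs, long[] norms) {
    // Assumption: freqs are stored as ints, so they are widened to float before the bulk call.
    float[] floatFreqs = new float[size];
    for (int i = 0; i < size; ++i) {
      floatFreqs[i] = freqs[i];
    }
    float[] scores = new float[size];
    scorer.score(size, floatFreqs, norms, scores);
    float max = 0f;
    for (int i = 0; i < size; ++i) {
      max = Math.max(max, scores[i]);
    }
    return max;
  }

  public static void main(String[] args) {
    // Toy saturating scorer (not Lucene's BM25): higher freq helps, a larger norm hurts.
    BulkScorer scorer = (freq, norm) -> freq / (freq + 1.2f * norm);
    int[] freqs = {1, 3, 7};
    long[] norms = {1, 2, 5};
    List<Impact> impacts = List.of(new Impact(1, 1), new Impact(3, 2), new Impact(7, 5));
    System.out.println(maxScoreFromList(scorer, impacts));
    System.out.println(maxScoreFromArrays(scorer, freqs.length, freqs, norms));
  }
}
```

Both methods return the same maximum for the same impacts; the only difference is that the array form hands the whole block to the scorer in one call instead of one call per (freq, norm) pair.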
Contributor:
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
Author:
I ran wikibigall on my machine. The p-values are high due to quite high run-over-run variance, but the queries that we'd expect to get a speedup are at the bottom of the results, so this change may give a tiny speedup in practice.
gf2121 approved these changes on Sep 7, 2025:
This looks like the right direction to me, though the improvement does not seem very significant. Thank you!
Review threads (resolved, now outdated) on:
lucene/core/src/java/org/apache/lucene/search/SloppyPhraseMatcher.java
lucene/core/src/test/org/apache/lucene/search/TestPhraseQuery.java
lucene/core/src/test/org/apache/lucene/search/TestSynonymQuery.java
Three follow-up commits, each Co-authored-by: Guo Feng <52390227+gf2121@users.noreply.github.com>:
…her.java
…java
….java
jpountz added a commit that referenced this pull request on Sep 8, 2025, using the PR description above as its commit message, with the trailer Co-authored-by: Guo Feng <52390227+gf2121@users.noreply.github.com>.