Add more interesting tasks for top-k hits on boolean queries. #240
Merged
mikemccand merged 4 commits into mikemccand:master on Nov 1, 2023
Looking into the [Tantivy benchmark](https://tantivy-search.github.io/bench/) highlighted interesting cases that our nightly benchmarks do not cover. For instance, MAXSCORE splits clauses into essential and non-essential clauses, but our nightly benchmarks only really exercise the case where there is a single essential clause. `OrHighMed` hits this case because one clause's scores dominate. `OrHighHigh` hits it because so many documents contain both terms that `MAXSCORE` quickly realizes the clause with the maximum score is required for a hit to be competitive anyway. This PR adds a few more task categories whose performance would be interesting to track. I expect these queries to often react very differently to changes in query evaluation compared to our historic `OrHighHigh`, `OrHighMed`, `AndHighHigh` and `AndHighMed` queries.
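To make the essential/non-essential split concrete, here is a minimal Python sketch of the MAXSCORE partitioning idea. It is a simplification, not Lucene's implementation. It assumes one global upper bound per clause rather than per-window bounds. The function name `partition_clauses` and all score values are hypothetical and chosen only for illustration.

Clauses are sorted by ascending max score. The longest prefix whose summed max scores stays below the current minimum competitive score is non-essential, because a document that matches only those clauses cannot make it into the top-k.

```python
from typing import List, Tuple


def partition_clauses(
    max_scores: List[float], min_competitive_score: float
) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch: split clause indices into (non_essential, essential).

    Assumes each clause has a single global max-score upper bound; real
    MAXSCORE implementations typically recompute bounds per window of doc IDs.
    """
    # Visit clauses from the lowest to the highest max score.
    order = sorted(range(len(max_scores)), key=lambda i: max_scores[i])
    non_essential: List[int] = []
    running_sum = 0.0
    for i in order:
        # A clause stays non-essential only while the summed bounds of all
        # non-essential clauses still cannot reach a competitive score.
        if running_sum + max_scores[i] < min_competitive_score:
            running_sum += max_scores[i]
            non_essential.append(i)
        else:
            break
    # The non-essential clauses form a prefix of `order`; the rest are essential.
    essential = order[len(non_essential):]
    return non_essential, essential


if __name__ == "__main__":
    # OrHighMed-like: the rarer term's bound dominates, leaving one essential clause.
    print(partition_clauses([1.5, 4.0], 2.0))
    # Three clauses with similar bounds: more than one clause remains essential.
    print(partition_clauses([2.0, 2.1, 2.2], 3.0))
```

The second call illustrates the kind of case described above. When several clauses have comparable max scores, more than one of them stays essential, so the query exercises a different code path than `OrHighMed` or `OrHighHigh`.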
mikemccand approved these changes on Nov 1, 2023

mikemccand (Owner) left a comment:
Looks great, thanks @jpountz! We should also update `writeIndexHTML`, but I can take care of that once we start getting some data for this.
I'll merge, and let's see if tonight's run (which starts in ~1 hour) can pick these up! Oh, never mind: beast3 is still regolding, so hopefully some time tomorrow we'll see first light for these tasks.
mikemccand (Owner) commented:
Oh yeah, you already fixed the …
nitirajrathore pushed a commit to nitirajrathore/luceneutil that referenced this pull request on Nov 23, 2023:
…cand#240)
* Add more interesting tasks for top-k hits on boolean queries.
* Fix
* iter
* iter