Add more interesting tasks for top-k hits on boolean queries.#240

Merged
mikemccand merged 4 commits into mikemccand:master from jpountz:more_disjunction_tasks on Nov 1, 2023

@jpountz (Collaborator) commented Oct 30, 2023

Looking into the [Tantivy benchmark](https://tantivy-search.github.io/bench/)
highlighted that there are interesting cases that are not covered by our
nightly benchmarks. For instance, MAXSCORE splits clauses into essential and
non-essential clauses, but our nightly benchmarks only really benchmark the
case where there is a single essential clause (with `OrHighMed` because there is
one clause whose scores dominate, and with `OrHighHigh` because so many
documents contain both terms that MAXSCORE quickly realizes that the
clause with the maximum score is required for a hit to be competitive anyway).
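
To make the essential / non-essential split concrete, here is a minimal textbook-style sketch of MAXSCORE partitioning. It is not Lucene's implementation: `Clause`, `partition_clauses`, and the score values are hypothetical names and numbers chosen for illustration, and it assumes each clause exposes an upper bound on its score contribution and that a hit is competitive when its score is at least `min_competitive_score`.

```python
from typing import List, NamedTuple, Tuple


class Clause(NamedTuple):
    term: str
    max_score: float  # assumed upper bound on this clause's score contribution


def partition_clauses(
    clauses: List[Clause], min_competitive_score: float
) -> Tuple[List[Clause], List[Clause]]:
    """Return (non_essential, essential) for the given score threshold.

    Clauses are sorted by ascending max score. The longest prefix whose
    summed max scores stays below the threshold is non-essential: a
    document matching only those clauses cannot be competitive, so at
    least one of the remaining (essential) clauses must match.
    """
    ordered = sorted(clauses, key=lambda c: c.max_score)
    non_essential: List[Clause] = []
    running_sum = 0.0
    for clause in ordered:
        if running_sum + clause.max_score >= min_competitive_score:
            break
        running_sum += clause.max_score
        non_essential.append(clause)
    essential = ordered[len(non_essential):]
    return non_essential, essential


# Illustrative numbers only: one clause's max score dominates the other's.
clauses = [Clause("high", 1.0), Clause("med", 3.0)]
non_essential, essential = partition_clauses(clauses, min_competitive_score=2.0)
print([c.term for c in non_essential], [c.term for c in essential])
```

With a threshold of 0 every clause is essential; as the threshold rises during collection, clauses move into the non-essential set. In this sketch, a dominating clause leaves a single essential clause as soon as the threshold exceeds the smaller clause's max score, which is the situation described above for `OrHighMed`.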

This adds a few more categories whose performance would be interesting to
track. I'm expecting the performance of these queries to often react very
differently to changes in query evaluation compared to our historic
`OrHighHigh`, `OrHighMed`, `AndHighHigh` and `AndHighMed` queries.

@mikemccand (Owner) left a comment

Looks great, thanks @jpountz! We should also update the writeIndexHTML but I can take that after we start getting some data for this.

I'll merge and let's see if tonight's run (starts in ~ 1 hour) can pick these up! Oh never mind -- beast3 is still regolding -- hopefully some time tomorrow we'll see first light for these tasks.

@mikemccand (Owner) commented

Oh yeah you already fixed the writeIndexHTML! Thanks :)

@mikemccand mikemccand merged commit ed5a7ac into mikemccand:master Nov 1, 2023
nitirajrathore pushed a commit to nitirajrathore/luceneutil that referenced this pull request Nov 23, 2023
…cand#240)

* Add more interesting tasks for top-k hits on boolean queries.

* Fix

* iter

* iter