Add more interesting tasks for top-k hits on boolean queries. by jpountz · Pull Request #240 · mikemccand/luceneutil

jpountz · 2023-10-30T09:43:14Z

Looking into the [Tantivy benchmark](https://tantivy-search.github.io/bench/) highlighted interesting cases that our nightly benchmarks do not cover. For instance, MAXSCORE splits clauses into essential and non-essential clauses, but our nightly benchmarks only really exercise the case where there is a single essential clause. With `OrHighMed`, this is because one clause's scores dominate. With `OrHighHigh`, this is because so many documents contain both terms that MAXSCORE quickly realizes that the clause with the maximum score is required for a hit to be competitive anyway.

This adds a few more categories whose performance would be interesting to track. I expect these queries to often react very differently to changes in query evaluation than our historic `OrHighHigh`, `OrHighMed`, `AndHighHigh` and `AndHighMed` queries.
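For readers unfamiliar with the essential/non-essential split mentioned above, here is a minimal sketch of the textbook MAXSCORE partitioning step. It is not Lucene's actual implementation: the function name `partition_clauses`, the term names, and the score values are all hypothetical and chosen purely for illustration. The idea is that clauses are sorted by their maximum possible score, and the longest prefix whose summed maximum scores stays below the current minimum competitive score is non-essential, since a document matching only those clauses cannot make it into the top-k.

```python
from typing import Dict, List, Tuple


def partition_clauses(
    max_scores: Dict[str, float], min_competitive_score: float
) -> Tuple[List[str], List[str]]:
    """Split clauses into (non_essential, essential) as in textbook MAXSCORE.

    max_scores maps a (hypothetical) clause name to the maximum score that
    clause can contribute to any document.
    """
    # Consider clauses from the lowest to the highest maximum score.
    ordered = sorted(max_scores.items(), key=lambda kv: kv[1])
    non_essential: List[str] = []
    essential: List[str] = []
    running_sum = 0.0
    for name, max_score in ordered:
        # A clause stays non-essential while the sum of max scores of all
        # non-essential clauses remains below the competitive threshold.
        if not essential and running_sum + max_score < min_competitive_score:
            running_sum += max_score
            non_essential.append(name)
        else:
            essential.append(name)
    return non_essential, essential


# One dominating clause: a single essential clause (made-up scores).
print(partition_clauses({"high": 1.0, "med": 3.0}, 2.0))
# Clauses with similar max scores: more than one essential clause.
print(partition_clauses({"a": 2.0, "b": 2.1, "c": 2.2}, 3.0))
```

With these made-up numbers, the first call yields one essential clause (`med`), while the second yields two essential clauses (`b` and `c`), which is the kind of situation the description says the existing nightly tasks rarely exercise.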


mikemccand

Looks great, thanks @jpountz! We should also update `writeIndexHTML`, but I can take that once we start getting some data for this.

I'll merge and let's see if tonight's run (starts in ~1 hour) can pick these up! Oh, never mind: beast3 is still regolding. Hopefully some time tomorrow we'll see first light for these tasks.

mikemccand · 2023-11-01T21:01:54Z

Oh yeah you already fixed the writeIndexHTML! Thanks :)


jpountz added 4 commits October 30, 2023 09:14

Fix · 55dbc09

iter · 4a419a5

iter · e1ed43a

jpountz mentioned this pull request Nov 1, 2023

Speed up disjunctions by computing estimations of the score of the k-th top hit up-front. apache/lucene#12526

Closed

mikemccand approved these changes Nov 1, 2023


mikemccand merged commit ed5a7ac into mikemccand:master Nov 1, 2023

mikemccand merged 4 commits into mikemccand:master from jpountz:more_disjunction_tasks