Simplify max score for kNN vector queries by jtibshirani · Pull Request #12146 · apache/lucene

jtibshirani · 2023-02-13T22:08:36Z

The helper class DocAndScoreQuery implements advanceShallow to help skip
non-competitive documents. This method doesn't actually keep track of where it
has advanced, so it might do extra work on each call.

Overall the complexity here didn't seem worth it, given the low cost of
collecting matching kNN docs. This PR switches to a simple approach, which uses
a fixed upper bound on the max score. This is low overhead, while still
allowing for skipping in some cases.
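
To make the idea concrete, here is a minimal stand-alone sketch of the "fixed upper bound" approach. It is not the code from this PR and does not use Lucene's classes: `FixedMaxScoreSketch`, its fields, and the `NO_MORE_DOCS` constant are hypothetical names modeled loosely on a scorer over pre-collected kNN hits. It assumes scores are non-negative and that doc IDs are sorted in ascending order.

```java
/** Simplified model of a scorer over pre-collected kNN hits (hypothetical, not Lucene's class). */
final class FixedMaxScoreSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;     // matching doc IDs, sorted ascending
  private final float[] scores; // scores[i] is the score of docs[i]
  private final float maxScore; // computed once, never updated
  private int upTo = -1;        // index of the current doc, -1 before iteration

  FixedMaxScoreSketch(int[] docs, float[] scores) {
    if (docs.length != scores.length) {
      throw new IllegalArgumentException("docs and scores must have the same length");
    }
    this.docs = docs.clone();
    this.scores = scores.clone();
    float max = 0f; // assumption: scores are non-negative
    for (float s : scores) {
      max = Math.max(max, s);
    }
    this.maxScore = max;
  }

  int docID() {
    if (upTo < 0) {
      return -1;
    }
    return upTo < docs.length ? docs[upTo] : NO_MORE_DOCS;
  }

  int nextDoc() {
    if (upTo < docs.length) {
      upTo++;
    }
    return docID();
  }

  float score() {
    return scores[upTo];
  }

  /** No per-block tracking: the bound below is valid for every remaining doc. */
  int advanceShallow(int target) {
    return NO_MORE_DOCS;
  }

  /** The same upper bound regardless of the requested range. */
  float getMaxScore(int upToDoc) {
    return maxScore;
  }

  public static void main(String[] args) {
    FixedMaxScoreSketch scorer =
        new FixedMaxScoreSketch(new int[] {3, 10, 42}, new float[] {0.5f, 0.9f, 0.7f});
    System.out.println("bound up to doc 5: " + scorer.getMaxScore(5));
    System.out.println("first doc: " + scorer.nextDoc() + ", score: " + scorer.score());
  }
}
```

The bound is looser than a per-range maximum, but it needs no state beyond a single float, so `advanceShallow` has nothing to track and cannot repeat work.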


jtibshirani · 2023-02-13T22:32:51Z

The context: I've been testing out an AI code assistant, and I asked it if there were any bugs in AbstractKnnVectorQuery. It pointed out that "'advanceShallow' should move the 'upTo' field like 'nextDoc' does" ... which didn't seem right, but made me realize this logic looked funny.

I'm not an expert in the doc skipping code, and had trouble fully understanding the method contracts. So let me know if this is off!

jpountz

Your logic looks correct.

I wondered whether this change could deoptimize some important cases, but I think your intuition is right: since this query generally matches few documents, advance() would move far ahead, and the maximum scores produced by this query would get ignored anyway because they fall outside the window of doc IDs being considered (upTo in BlockMaxMaxscoreScorer).
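
A rough illustration of that argument, under stated assumptions: the helper below is hypothetical and only models the idea that a window-based consumer counts a clause's maximum score when the clause's current doc falls inside the window. It is not the actual BlockMaxMaxscoreScorer logic, and `WindowMaxScoreSketch` and `windowMaxScore` are made-up names.

```java
/** Hypothetical model of how a window-based consumer might use a fixed max score. */
final class WindowMaxScoreSketch {

  /**
   * Upper bound a clause can contribute to docs up to windowMax (inclusive).
   * If the clause is already positioned past the window, it cannot match
   * anything inside it, so its fixed bound is ignored.
   */
  static float windowMaxScore(int currentDoc, int windowMax, float fixedMaxScore) {
    return currentDoc <= windowMax ? fixedMaxScore : 0f;
  }

  public static void main(String[] args) {
    float fixedMaxScore = 0.9f;
    // A sparse kNN clause whose next match is far beyond the current window.
    System.out.println(windowMaxScore(5000, 1000, fixedMaxScore));
    // The same clause once the window has caught up with its current doc.
    System.out.println(windowMaxScore(5000, 6000, fixedMaxScore));
  }
}
```

With few matching docs, the first case is the common one, which is why a loose fixed bound should rarely hurt skipping.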

jtibshirani · 2023-02-16T20:04:09Z

Thanks for the review!


jtibshirani added 2 commits February 13, 2023 14:07

Fix spotless

31bd989

Simpler max score calculation

8d4c812

jtibshirani marked this pull request as ready for review February 13, 2023 23:32

jtibshirani requested review from jpountz and msokolov February 13, 2023 23:48

jtibshirani mentioned this pull request Feb 14, 2023

Improve DocAndScoreQuery#toString #12148

Merged

jpountz approved these changes Feb 16, 2023


Merge remote-tracking branch 'upstream/main' into jtibshirani/max-score

8f607fb

jtibshirani merged commit 8340b01 into main Feb 16, 2023

jtibshirani deleted the jtibshirani/max-score branch February 16, 2023 20:04

benwtrent mentioned this pull request Feb 16, 2023

Minor vector search matching doc optimizations #12152

Merged

benwtrent mentioned this pull request Jun 28, 2025

Ensure vector queries handle advanceShallow correctly #14858

Merged

benwtrent added a commit to benwtrent/lucene that referenced this pull request Jun 30, 2025

Adjusting to just be a backport of apache#12146

edc4271

benwtrent added a commit that referenced this pull request Jun 30, 2025

Ensure vector queries handle advanceShallow correctly (#14858)

16b9a87

* Ensure vector queries handle advanceShallow correctly
* adding changes
* Adjusting to just be a backport of #12146

jtibshirani merged 4 commits into main from jtibshirani/max-score