Simplify max score for kNN vector queries #12146
The helper class DocAndScoreQuery implements advanceShallow to help skip non-competitive documents. This method doesn't actually keep track of where it has advanced, which means it can do extra work. Overall the complexity here didn't seem worth it, given the low cost of collecting matching kNN docs. This PR switches to a simpler approach, which uses a fixed upper bound on the max score.
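For readers unfamiliar with the scorer contract, here is a minimal, self-contained sketch of the "fixed upper bound" idea. It is not the code from this PR and does not use Lucene's classes: `FixedMaxScoreSketch` and its fields are hypothetical stand-ins, and the method names only mirror the ones discussed above. It assumes the kNN hits are already collected into sorted doc IDs with non-negative scores.

```java
/** Hypothetical stand-in for a scorer over precomputed kNN hits; not Lucene's actual class. */
final class FixedMaxScoreSketch {
  /** Sentinel meaning "no more documents" (same value Lucene uses for NO_MORE_DOCS). */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;     // matching doc IDs, sorted ascending
  private final float[] scores; // scores[i] is the score of docs[i]
  private final float maxScore; // computed once, reused for every window
  private int index = -1;       // position of the current hit, -1 before iteration starts

  FixedMaxScoreSketch(int[] docs, float[] scores) {
    if (docs.length != scores.length) {
      throw new IllegalArgumentException("docs and scores must have the same length");
    }
    this.docs = docs.clone();
    this.scores = scores.clone();
    float max = 0f; // assumption: scores are non-negative
    for (float s : scores) {
      max = Math.max(max, s);
    }
    this.maxScore = max;
  }

  int docID() {
    if (index < 0) return -1;
    return index < docs.length ? docs[index] : NO_MORE_DOCS;
  }

  /** Moves to the first hit after the current one whose doc ID is >= target. */
  int advance(int target) {
    if (index < docs.length) {
      do {
        index++;
      } while (index < docs.length && docs[index] < target);
    }
    return docID();
  }

  float score() {
    return scores[index];
  }

  /** No per-block state is tracked: the bound below is valid for all documents. */
  int advanceShallow(int target) {
    return NO_MORE_DOCS;
  }

  /** Fixed upper bound: the global maximum, regardless of the requested range. */
  float getMaxScore(int upTo) {
    return maxScore;
  }
}
```

The bound is looser than a per-block maximum, but it needs no bookkeeping about where shallow advancing has reached, which is the trade-off the description argues for.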
The context: I've been testing out an AI code assistant, and I asked it if there were any bugs in I'm not an expert in the doc skipping code, and had trouble fully understanding the method contracts. So let me know if this is off!
jpountz
left a comment
Your logic looks correct.
I wondered whether this change could deoptimize some important cases, but I think your intuition is right: since this query generally matches few documents, advance() would move far ahead, and the maximum scores produced by this query would be ignored anyway because they fall outside the window of doc IDs being considered (upTo in BlockMaxMaxscoreScorer).
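To illustrate the argument, here is a simplified sketch of how a window-based scorer might combine per-clause max scores. It is an assumption-laden toy, not the logic of BlockMaxMaxscoreScorer: `WindowMaxScoreSketch`, `Clause`, and `windowMaxScore` are hypothetical names, and the rule "a clause positioned past the window contributes nothing" is a simplification of the behavior described above.

```java
import java.util.List;

/** Toy model of window-based max-score accounting; not Lucene's implementation. */
final class WindowMaxScoreSketch {

  /** Hypothetical clause state: where the clause is positioned and its fixed score bound. */
  static final class Clause {
    final int currentDoc;      // doc ID the clause's iterator is currently on
    final float fixedMaxScore; // upper bound on the clause's score for any document

    Clause(int currentDoc, float fixedMaxScore) {
      this.currentDoc = currentDoc;
      this.fixedMaxScore = fixedMaxScore;
    }
  }

  /**
   * Upper bound on the combined score of any document in the window ending at upTo.
   * A clause already positioned beyond upTo cannot match inside the window,
   * so its bound is left out of the sum.
   */
  static float windowMaxScore(List<Clause> clauses, int upTo) {
    float sum = 0f;
    for (Clause c : clauses) {
      if (c.currentDoc <= upTo) {
        sum += c.fixedMaxScore;
      }
    }
    return sum;
  }
}
```

In this model, a sparse kNN clause that has advanced to doc 50000 adds nothing to a window ending at doc 4096, however loose its fixed bound is. The bound only matters for windows that actually contain one of its hits.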
Thanks for the review!
* Ensure vector queries handle advanceShallow correctly
* adding changes
* Adjusting to just be a backport of #12146
The helper class DocAndScoreQuery implements advanceShallow to help skip
non-competitive documents. This method doesn't actually keep track of where it
has advanced, so it might do extra work on each call.
Overall the complexity here didn't seem worth it, given the low cost of
collecting matching kNN docs. This PR switches to a simple approach, which uses
a fixed upper bound on the max score. This is low overhead, while still
allowing for skipping in some cases.