Fix failing BaseVectorSimilarityQueryTestCase#testApproximate#12922
Fix failing BaseVectorSimilarityQueryTestCase#testApproximate#12922benwtrent merged 3 commits intoapache:mainfrom
Conversation
| // Advance the scorer | ||
| if (!scorer.advanceExact(doc)) { | ||
| return false; | ||
| } | ||
|
|
There was a problem hiding this comment.
This tells me that none of the filtered cases drop to using exact search? If they did they would match on docs that were actually filtered out.
Could you add a test exercising this path?
There was a problem hiding this comment.
This tells me that none of the filtered cases drop to using exact search?
We do have a test case that covers this: testRandomFilter randomly filters a sub-range of documents and expects all of them to be found -- which always falls back to exact search (because a traversalSimilarity of Float.NEGATIVE_INFINITY here will visit the entire graph and exhaust any limit)
While running this test, I put debug points here to check the scorer, and it was always unpositioned (docid = -1) but doesn't throw an error in many cases
The error is thrown when the underlying values are randomly chosen to SimpleTextFloatVectorValues - try ./gradlew test --tests TestFloatVectorSimilarityQuery.testRandomFilter -Dtests.seed=119135B1F0803918 for a failing case (as mentioned here, thanks @epotyom!)
In other cases (for example when values are DenseOffHeapVectorValues), the scorer does not fail even when unpositioned and returns some (garbage) value - but the count of results is still correct (and asserted)
I wonder if we can put an assert values.docID() != -1 before this and this line to ensure it is positioned before trying to compute scores? -- I tried this offline, and the issue was caught immediately
There was a problem hiding this comment.
I added a commit to demonstrate this^, please let me know if we can improve this / do it some other way?
There was a problem hiding this comment.
adding an assert is nice :). In reality it would throw trying to read garbage.
Discovered in #12921, and introduced in #12679 The first issue is that we weren't advancing the `VectorScorer` [here](https://github.com/apache/lucene/blob/cf13a9295052288b748ed8f279f05ee26f3bfd5f/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L257-L262) -- so it was still un-positioned while trying to compute the similarity score Earlier in the PR, the underlying delegate of the `FilteredDocIdSetIterator` was `scorer.iterator()` (see [here](https://github.com/apache/lucene/blob/cad565439be512ac6e95a698007b1fc971173f00/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L107)) -- so we didn't need to explicitly advance it Later, we decided to maintain parity to `AbstractKnnVectorQuery` and introduce filtering in `AbstractVectorSimilarityQuery` (see [this commit](5096790)) to determine the `visitLimit` of approximate search -- after which the underlying iterator changed to the accepted docs (see [here](https://github.com/apache/lucene/blob/5096790f281e477c529a7c8311aeb353ccdffdeb/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java#L255)) and I missed advancing the `VectorScorer` explicitly.. After doing so, we no longer get the original `java.lang.ArrayIndexOutOfBoundsException` -- but the `BaseVectorSimilarityQueryTestCase#testApproximate` starts failing because it falls back to exact search, as the limit of the prefilter is met during graph search Relaxed the parameters of the test to fix this (making the filter less restrictive, and trying to visit a fewer number of nodes so that approximate search completes without hitting its limit) Sorry for missing this earlier!
Discovered in #12921, and introduced in #12679
The first issue is that we weren't advancing the
VectorScorerhere -- so it was still un-positioned while trying to compute the similarity scoreEarlier in the PR, the underlying delegate of the
FilteredDocIdSetIteratorwasscorer.iterator()(see here) -- so we didn't need to explicitly advance itLater, we decided to maintain parity to
AbstractKnnVectorQueryand introduce filtering inAbstractVectorSimilarityQuery(see this commit) to determine thevisitLimitof approximate search -- after which the underlying iterator changed to the accepted docs (see here) and I missed advancing theVectorScorerexplicitly..After doing so, we no longer get the original
java.lang.ArrayIndexOutOfBoundsException-- but theBaseVectorSimilarityQueryTestCase#testApproximatestarts failing because it falls back to exact search, as the limit of the prefilter is met during graph searchRelaxed the parameters of the test to fix this (making the filter less restrictive, and trying to visit a fewer number of nodes so that approximate search completes without hitting its limit)
Sorry for missing this earlier!