Allow a slight difference in rescored docs#139931
Conversation
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
My concern is that the delta might be because either we didn't rescore at all (and thus it's the same as the quantized score), or we are actually returning the wrong scores for docs (e.g. score for doc 1 for doc 2). We should assert on doc order (if we don't already) and that the scores are different than the quantized scores. |
bbaa6e3 to
4e06af1
Compare
|
We already assert on doc order, I've added a check that rescoring changes scores compared to the original query |
| assertThat(rescoredDocs.scoreDocs.length, equalTo(k)); | ||
| assertThat(rescoredDocs.scoreDocs, arrayWithSize(k)); | ||
|
|
||
| if (innerQuery instanceof KnnFloatVectorQuery) { |
There was a problem hiding this comment.
The boolean query doesnt cause any differences - as it's not actually doing a kNN search over quantized data
| /* | ||
| * Original KNN scoring and rescoring can use slightly different calculation methods, | ||
| * so there may be a very slight difference in the scores after rescoring. | ||
| */ | ||
| private static final float DELTA = 1e-6f; |
There was a problem hiding this comment.
Is this difference due to script execution paths vs utilizing native/bulk scoring?
There was a problem hiding this comment.
If this is due to script vs. bulk rescoring. Could we adjust the calculation to use a flat index and a regular knn query instead of the script? I would assume the flat utilization would execute through the bulk off-heap scoring stuff vs. the script.
I am worried about these slight differences...
There was a problem hiding this comment.
#139769 has got similar changes to this, so I'm pretty sure these changes are due to that PR
|
Looking closer, this might actually be a result of #139769 - which uses panama in test classes, which could cause some FP differences |
Follow on from elastic#139769 to update some more tests for FP differences
* upstream/main: (191 commits) Overall Decision for Deciders prioritizes THROTTLE (elastic#140237) Apply group by all logic not only to top-level aggregates (elastic#140248) [ES|QL] Refactor MV_UNION and MV_INTERSECTION to use shared set operation helper (elastic#139982) Avoid reading entire bloom filter file on reader open (elastic#139374) Mark bloom filter files for random access (elastic#139375) Ensure that the buffer used for ES93BloomFilterStoredFieldsFormat is zeroed (elastic#139034) Add busy assertion to avoid race condition for testStalledShardMigrationProperlyDetected (elastic#140230) Remove line number check for testTransitiveFindsDeepCallChain (elastic#140228) Allow a slight difference in rescored docs (elastic#139931) Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480 Start exchange sink fetchers concurrently (elastic#140196) Allow allocation to replacement target node on vacate completion (elastic#140150) Ignore JNA cleaner threads in SecureHdfsRepositoryAnalysisRestIT (elastic#139925) DeterministicQueue refactor and enhancement (elastic#140151) Always error out if CCS expression shows up when CCS is not supported (elastic#139009) Use IllegalArgumentException over RepositoryException for readonly-repository checks (elastic#140200) Guard promql capabilities in AnalyzerTests (elastic#140232) [Inference API] Fix flaky AuthorizationTaskExecutorIT tests (elastic#139978) Cleaning up exitable vector value impls (elastic#140190) [Inference API] Fix auth exception listener not called bug (elastic#139966) ...
Follow on from elastic#139769 to update some more tests for FP differences
With #139769, and recent changes to new native scorers, the rescored values may be slightly different to the original scores. Update the tests to allow this
Fixes #139912
Fixes #139869
Fixes #139859
Fixes #140014
Fixes #140078