Skip to content

Allow a slight difference in rescored docs#139931

Merged
thecoop merged 4 commits intoelastic:mainfrom
thecoop:rescoring-delta-fix
Jan 7, 2026
Merged

Allow a slight difference in rescored docs#139931
thecoop merged 4 commits intoelastic:mainfrom
thecoop:rescoring-delta-fix

Conversation

@thecoop
Copy link
Copy Markdown
Member

@thecoop thecoop commented Dec 23, 2025

With #139769, and recent changes to new native scorers, the rescored values may be slightly different to the original scores. Update the tests to allow this

Fixes #139912
Fixes #139869
Fixes #139859
Fixes #140014
Fixes #140078

@thecoop thecoop added >test Issues or PRs that are addressing/adding tests :Search Relevance/Vectors Vector search labels Dec 23, 2025
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added v9.4.0 Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Dec 23, 2025
@benwtrent
Copy link
Copy Markdown
Member

My concern is that the delta might be because either we didn't rescore at all (and thus it's the same as the quantized score), or we are actually returning the wrong scores for docs (e.g. score for doc 1 for doc 2).

We should assert on doc order (if we don't already) and that the scores are different than the quantized scores.

@thecoop thecoop force-pushed the rescoring-delta-fix branch from bbaa6e3 to 4e06af1 Compare January 2, 2026 11:08
@thecoop
Copy link
Copy Markdown
Member Author

thecoop commented Jan 2, 2026

We already assert on doc order, I've added a check that rescoring changes scores compared to the original query

assertThat(rescoredDocs.scoreDocs.length, equalTo(k));
assertThat(rescoredDocs.scoreDocs, arrayWithSize(k));

if (innerQuery instanceof KnnFloatVectorQuery) {
Copy link
Copy Markdown
Member Author

@thecoop thecoop Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The boolean query doesnt cause any differences - as it's not actually doing a kNN search over quantized data

Comment on lines +72 to +76
/*
* Original KNN scoring and rescoring can use slightly different calculation methods,
* so there may be a very slight difference in the scores after rescoring.
*/
private static final float DELTA = 1e-6f;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this difference due to script execution paths vs utilizing native/bulk scoring?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is due to script vs. bulk rescoring. Could we adjust the calculation to use a flat index and a regular knn query instead of the script? I would assume the flat utilization would execute through the bulk off-heap scoring stuff vs. the script.

I am worried about these slight differences...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#139769 has got similar changes to this, so I'm pretty sure these changes are due to that PR

@thecoop
Copy link
Copy Markdown
Member Author

thecoop commented Jan 6, 2026

Looking closer, this might actually be a result of #139769 - which uses panama in test classes, which could cause some FP differences

@thecoop thecoop requested a review from ChrisHegarty January 6, 2026 14:39
@thecoop thecoop merged commit 49f163e into elastic:main Jan 7, 2026
36 checks passed
@thecoop thecoop deleted the rescoring-delta-fix branch January 7, 2026 09:10
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request Jan 7, 2026
Follow on from elastic#139769 to update some more tests for FP differences
szybia added a commit to szybia/elasticsearch that referenced this pull request Jan 7, 2026
* upstream/main: (191 commits)
  Overall Decision for Deciders prioritizes THROTTLE (elastic#140237)
  Apply group by all logic not only to top-level aggregates (elastic#140248)
  [ES|QL] Refactor MV_UNION and MV_INTERSECTION to use shared set operation helper (elastic#139982)
  Avoid reading entire bloom filter file on reader open (elastic#139374)
  Mark bloom filter files for random access (elastic#139375)
  Ensure that the buffer used for ES93BloomFilterStoredFieldsFormat is zeroed (elastic#139034)
  Add busy assertion to avoid race condition for testStalledShardMigrationProperlyDetected (elastic#140230)
  Remove line number check for testTransitiveFindsDeepCallChain (elastic#140228)
  Allow a slight difference in rescored docs (elastic#139931)
  Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480
  Start exchange sink fetchers concurrently (elastic#140196)
  Allow allocation to replacement target node on vacate completion (elastic#140150)
  Ignore JNA cleaner threads in SecureHdfsRepositoryAnalysisRestIT (elastic#139925)
  DeterministicQueue refactor and enhancement (elastic#140151)
  Always error out if CCS expression shows up when CCS is not supported (elastic#139009)
  Use IllegalArgumentException over RepositoryException for readonly-repository checks (elastic#140200)
  Guard promql capabilities in AnalyzerTests (elastic#140232)
  [Inference API] Fix flaky AuthorizationTaskExecutorIT tests (elastic#139978)
  Cleaning up exitable vector value impls (elastic#140190)
  [Inference API] Fix auth exception listener not called bug (elastic#139966)
  ...
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Jan 7, 2026
Follow on from elastic#139769 to update some more tests for FP differences
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch >test Issues or PRs that are addressing/adding tests v9.4.0

Projects

None yet

3 participants