[ES|QL] Rerank operator improvements #132318
Merged
afoucret merged 10 commits into elastic:main on Aug 1, 2025
Added 5 commits on July 31, 2025 20:36
Pinging @elastic/es-search-relevance (Team:Search Relevance)
afoucret commented on Aug 1, 2025:
"year": {
  "type": "integer"
},
"collection": {
ℹ️ Added a new collection column with sparse data to the dataset so that behavior on sparse input can be tested.
afoucret commented on Aug 1, 2025:
;

book_no:keyword | title:text | author:text | collection:text | rerank_score:double | _score:double
2714 | Return of the King Being the Third Part of The Lord of the Rings | J. R. R. Tolkien | The Lord of the Rings | 0.04761905 | 8.56
ℹ️ Tests that reranking returns a null score when the input field is null.
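A minimal sketch of this null-handling rule, written in plain Java rather than as the actual compute-engine operator: a row whose input is null gets a null score and is never sent to the model, because reranker scores can be negative and 0 would look like a real score. The class name, `rerankScores`, and the toy scorer are hypothetical stand-ins for the inference call, and scoring row by row (instead of in batches) is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class NullAwareRerank {

    /** Scores each input; null inputs get a null score and are never passed to the scorer. */
    static List<Double> rerankScores(List<String> inputs, ToDoubleFunction<String> scorer) {
        List<Double> scores = new ArrayList<>(inputs.size());
        for (String input : inputs) {
            // A null score (not 0.0) marks "no input": real reranker scores can be negative.
            scores.add(input == null ? null : scorer.applyAsDouble(input));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical toy scorer standing in for the reranker model; it can return negatives.
        ToDoubleFunction<String> toyScorer = s -> s.length() - 5.0;
        List<String> inputs = Arrays.asList("The Lord of the Rings", null, "Hobbit");
        System.out.println(rerankScores(inputs, toyScorer)); // prints [16.0, null, 1.0]
    }
}
```

Keeping the score null (rather than picking a sentinel value) means downstream sorting and filtering can treat "not scored" distinctly from "scored low".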
afoucret commented on Aug 1, 2025:
| KEEP book_no, title, ratings, _score
;

book_no:keyword | title:text | ratings:double | _score:double
ℹ️ Reranking on a numeric field is not meaningful, but at least it does not break.
afoucret commented on Aug 1, 2025:
| KEEP book_no, title, ratings, _score
;

book_no:keyword | title:text | ratings:double | _score:double
ℹ️ Combines text and non-text fields. Both are encoded into a YAML document that is passed to the reranker.
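To illustrate what such a document might look like, here is a hand-rolled sketch that renders one row as a flat YAML mapping. It is not the XContent YAML builder that XContentRowEncoder uses; the class and method names are hypothetical, and omitting individual null fields is an assumption of the sketch. Following the behavior the PR description gives for XContentRowEncoder, it returns null when every field is null and never emits leading whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class YamlRowSketch {

    /**
     * Renders one row as a flat YAML mapping with one "name: value" line per non-null field.
     * Returns null when every field is null, instead of an empty document.
     */
    static String encode(Map<String, Object> row) {
        StringBuilder yaml = new StringBuilder();
        for (Map.Entry<String, Object> field : row.entrySet()) {
            if (field.getValue() == null) {
                continue; // assumption of this sketch: null fields are simply omitted
            }
            if (yaml.length() > 0) {
                yaml.append('\n'); // separator only between fields, never before the first one
            }
            yaml.append(field.getKey()).append(": ").append(scalar(field.getValue()));
        }
        return yaml.length() == 0 ? null : yaml.toString();
    }

    /** Numbers stay unquoted; anything else becomes an escaped YAML double-quoted string. */
    static String scalar(Object value) {
        if (value instanceof Number) {
            return value.toString();
        }
        String escaped = value.toString()
            .replace("\\", "\\\\")
            .replace("\"", "\\\"")
            .replace("\n", "\\n");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>(); // keeps field order stable
        row.put("title", "Return of the King");
        row.put("ratings", 4.5);
        row.put("collection", null);
        System.out.println(encode(row));
        // prints:
        // title: "Return of the King"
        // ratings: 4.5
    }
}
```

Labeling each value with its field name is what lets a text-only reranker make some use of a numeric column such as ratings alongside the title.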
afoucret commented on Aug 1, 2025:
if (castRerankFieldsAsString
        && rerank.isValidRerankField(resolved)
        && DataType.isString(resolved.dataType()) == false) {
    resolved = resolved.replaceChild(new ToString(resolved.child().source(), resolved.child()));
ℹ️ Casts non-text input fields to string.
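The rule in the Analyzer snippet can be sketched without Elasticsearch types: a field is wrapped in TO_STRING only when casting is enabled, the field is a valid rerank input, and its type is not already a string. The Type enum, the validity check, and rerankInput are hypothetical simplifications of DataType, isValidRerankField, and the ToString rewrite. The flag presumably covers the single-field case, since multiple fields are YAML-encoded instead; that reading is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

public class RerankFieldCastSketch {

    // Hypothetical, simplified stand-in for ES|QL data types (not the real DataType enum).
    enum Type { KEYWORD, TEXT, INTEGER, DOUBLE, BOOLEAN, UNSUPPORTED }

    static final Set<Type> STRING_TYPES = EnumSet.of(Type.KEYWORD, Type.TEXT);

    // Illustrative stand-in for isValidRerankField: everything except UNSUPPORTED.
    static boolean isValidRerankField(Type type) {
        return type != Type.UNSUPPORTED;
    }

    /** Returns the reranker input expression, wrapping valid non-string fields in TO_STRING. */
    static String rerankInput(String field, Type type, boolean castFieldsAsString) {
        if (castFieldsAsString && isValidRerankField(type) && STRING_TYPES.contains(type) == false) {
            return "TO_STRING(" + field + ")";
        }
        // In this sketch, string fields and invalid types are returned unchanged.
        return field;
    }

    public static void main(String[] args) {
        System.out.println(rerankInput("ratings", Type.DOUBLE, true));  // prints TO_STRING(ratings)
        System.out.println(rerankInput("title", Type.TEXT, true));      // prints title
        System.out.println(rerankInput("ratings", Type.DOUBLE, false)); // prints ratings
    }
}
```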
tteofili approved these changes on Aug 1, 2025
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java (outdated, resolved)
Added 2 commits on August 1, 2025 13:26
carlosdelest approved these changes on Aug 1, 2025
Added 2 commits on August 1, 2025 14:14: …esql-inference-commands-input-validation
szybia added a commit to szybia/elasticsearch that referenced this pull request on Aug 1, 2025
…cking
* upstream/main: (166 commits)
  Reduce inactive sink interval in VectorSimilarityFunctionsIT (elastic#132288)
  ESQL: Allow agg tests to process many columns (elastic#132358)
  Update analysis-lowercase-tokenfilter.md (elastic#132359)
  Add Sparse Vector Index Options Settings to Semantic Text Field (elastic#131058)
  Collect node thread pool usage for shard balancing (elastic#131480)
  Add tasks to validate new style transport versions (elastic#131782)
  Mute org.elasticsearch.search.routing.SearchReplicaSelectionIT testNodeSelection elastic#132354
  Mute org.elasticsearch.xpack.esql.action.CrossClusterAsyncQueryIT testBadAsyncId elastic#132353
  Fixes DenseVectorFieldIndexTypeUpdateIT release tests (elastic#132346)
  Fix testCloseOrReallocateDuringPartialSnapshot (elastic#132049)
  (Doc) ILM Force Merge not on HDD and happens on hosting node not current phase tier (elastic#130280)
  Run GeoIp YAML tests in multi-project cluster and fix bug discovered by tests (elastic#131521)
  Unmutes elastic#132111, seems a transient, non reproducible issue (elastic#132253)
  Mute org.elasticsearch.search.suggest.phrase.PhraseSuggesterIT testPhraseSuggestionWithNgramOnlyAnalyzerThrowsException elastic#132347
  Add AI21 support to Inference Plugin (elastic#131238)
  OpenJDK EA builds should use https instead of http (elastic#132297)
  ESQL: Normalize timeseries aggs slightly (elastic#132284)
  Avoid internal server error on suggester ngram bad request (elastic#132321)
  [ES|QL] Rerank operator improvements (elastic#132318)
  Mute org.elasticsearch.xpack.logsdb.qa.LogsDbVersusReindexedLogsDbChallengeRestIT testTermsQuery elastic#132337
  ...
This PR introduces several enhancements to ES|QL's RERANK command.

RERANK Input Validation:
- Non-text input fields are cast to string. On multiple fields, the whole content is encoded in YAML, so the cast is not necessary.
- Added AnalyzerTests for supported / unsupported field types.

Sparse Data Handling:
- Updated the RERANK operator to correctly handle null or missing values in input fields: the score is now null (0 does not make sense in the context of a reranker model, since the minimum score can be < 0).
- Updated XContentRowEncoder (in charge of the YAML conversion when multiple fields are used) so that it returns null if all fields are null (it produced an empty YAML document before).

Bug Fixes & Testing:
- Fixed a bug in XContentRowEncoder that caused a leading space in the output.
- A new unit test class (XContentRowEncoderTests) has been added to cover the functionality of the XContentRowEncoder and prevent future regressions.
- Tests for RERANK and COMPLETION have been updated to use a new test helper for reading block data and to assert correct behavior with sparse inputs.