Skip to content

ES|QL: Configurable score, key and group by columns for FUSE#135079

Merged
ioanatia merged 5 commits intoelastic:mainfrom
ioanatia:fuse_configurable_columns
Sep 22, 2025
Merged

ES|QL: Configurable score, key and group by columns for FUSE#135079
ioanatia merged 5 commits intoelastic:mainfrom
ioanatia:fuse_configurable_columns

Conversation

@ioanatia
Copy link
Copy Markdown
Member

@ioanatia ioanatia commented Sep 19, 2025

tracked in #123389

FUSE uses:

  • a column for scoring (_score by default)
  • a column that designates the result group from which the row originates (_fork by default)
  • a list of columns to merge rows together (_id, _index by default)

We want to be able to configure these with non-default columns.
We extend the syntax to be able to support:

FUSE linear SCORE BY my_score GROUP BY my_fork KEY BY my_index,my_id WITH { "normalizer": "minmax" }

SCORE BY/GROUP BY/KEY BY/WITH can be used in any order.

@ioanatia ioanatia added >non-issue Team:Search - Relevance The Search organization Search Relevance team v9.2.0 :Search Relevance/ES|QL Search functionality in ES|QL labels Sep 19, 2025

@Override
public void postAnalysisVerification(Failures failures) {
if (score.dataType() != DataType.DOUBLE) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a note in #123389 that as a follow up, we might want to accept any numeric type for the score column.
I have not done this yet, since (1) the operator expects it to a double and (2) it would mean refactoring FuseScoreEval/FuseScoreEvalExec to be able to receive both an input score attribute and an output score attribute.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, users could convert it with TO_DOUBLE if needed

;


fuseWithRowAndRRF
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some tests at the end that are not using FORK, we did not have any before.
For FUSE we always wanted to support the case where it's not used in combination with FORK.

The other aspect is that I wanted some tests where the scores are more predictable and we are not using full text search.

@ioanatia ioanatia marked this pull request as ready for review September 19, 2025 13:01
@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search - Relevance The Search organization Search Relevance team labels Sep 19, 2025
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)


fuseCommand
: DEV_FUSE (fuseType=identifier)? fuseOptions=commandNamedParameters
: DEV_FUSE (fuseType=identifier)? (SCORE BY score=qualifiedName)? (KEY BY key=fields)? (GROUP BY group=qualifiedName)? fuseOptions=commandNamedParameters
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little bit skeptical with the syntax here.

My first concern is that it requires a strict ordering of the optional clauses which is a little bit trappy for the user.
One could not understand why the following is not working:

| FUSE  RRF KEY BY key column SCORE BY my_score WITH { ... custom options }

I wonder if those parameter should just be part of the commandNamedParameters:

| FUSE  RRF  WITH {"key_column": "my_key", "score_column": "my_score", ...  ... custom options }\

Also we were asked that all the column creation are using the assignment pattern, so it should probably be something like:

| FUSE  my_score=RRF  WITH {"key_column": "my_key",  ...  ... custom options }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @afoucret . We may want to have the options in any order, and just fail the parsing if we're repeating an option.

Copy link
Copy Markdown
Member Author

@ioanatia ioanatia Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the skepticism, but this is the syntax we agreed we will implement for FUSE with the larger group.
We discussed even the option of having the columns as part of MapExpression as you suggest.
In the end, we went with the syntax that you see implemented here.

We went through many iterations to get to this shape, this PR is simply implementing what we agreed with.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regards with the syntax being a bit too rigid that it expects a certain order, I think we can follow up on it so we make SCORE BY/KEY BY/GROUP BY interchangeable.

@ioanatia ioanatia requested a review from afoucret September 19, 2025 14:25
Copy link
Copy Markdown
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I think a follow up would be nice with grammar changes so KEY BY, GROUP BY can be used in any order.


fuseCommand
: DEV_FUSE (fuseType=identifier)? fuseOptions=commandNamedParameters
: DEV_FUSE (fuseType=identifier)? (SCORE BY score=qualifiedName)? (KEY BY key=fields)? (GROUP BY group=qualifiedName)? fuseOptions=commandNamedParameters
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @afoucret . We may want to have the options in any order, and just fail the parsing if we're repeating an option.


@Override
public void postAnalysisVerification(Failures failures) {
if (score.dataType() != DataType.DOUBLE) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, users could convert it with TO_DOUBLE if needed

failures.add(
fail(
discriminator,
"expected GROUP BY field [{}] to be KEYWORD or TEXT, not {}",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary that we group / key by text or keyword? Why not use any type? At least numeric types could make sense to me.

Of course we can do that in a follow up, just curious about your thinking

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a restriction we have on the operator where we expect the discriminator values to be strings.
but I think we can lift this later.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test to ensure the parameters are passed down to the Fuse plan?

@ioanatia ioanatia added the ES|QL-ui Impacts ES|QL UI label Sep 22, 2025
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@ioanatia
Copy link
Copy Markdown
Member Author

@carlosdelest I added statement parser tests and made sure that GROUP BY/KEY BY/SCORE BY are interchangeable

Copy link
Copy Markdown
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding support for unordered clauses! 💯

@ioanatia ioanatia merged commit fe4c41e into elastic:main Sep 22, 2025
34 checks passed
@ioanatia ioanatia deleted the fuse_configurable_columns branch September 22, 2025 15:35
gmjehovich pushed a commit to gmjehovich/elasticsearch that referenced this pull request Sep 22, 2025
DonalEvans pushed a commit to DonalEvans/elasticsearch that referenced this pull request Sep 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ES|QL-ui Impacts ES|QL UI >non-issue :Search Relevance/ES|QL Search functionality in ES|QL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants