ES|QL: Configurable score, key and group by columns for FUSE#135079
ES|QL: Configurable score, key and group by columns for FUSE#135079ioanatia merged 5 commits intoelastic:mainfrom
Conversation
|
|
||
| @Override | ||
| public void postAnalysisVerification(Failures failures) { | ||
| if (score.dataType() != DataType.DOUBLE) { |
There was a problem hiding this comment.
I made a note in #123389 that as a follow up, we might want to accept any numeric type for the score column.
I have not done this yet, since (1) the operator expects it to a double and (2) it would mean refactoring FuseScoreEval/FuseScoreEvalExec to be able to receive both an input score attribute and an output score attribute.
There was a problem hiding this comment.
I think this is fine, users could convert it with TO_DOUBLE if needed
| ; | ||
|
|
||
|
|
||
| fuseWithRowAndRRF |
There was a problem hiding this comment.
I added some tests at the end that are not using FORK, we did not have any before.
For FUSE we always wanted to support the case where it's not used in combination with FORK.
The other aspect is that I wanted some tests where the scores are more predictable and we are not using full text search.
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
|
||
| fuseCommand | ||
| : DEV_FUSE (fuseType=identifier)? fuseOptions=commandNamedParameters | ||
| : DEV_FUSE (fuseType=identifier)? (SCORE BY score=qualifiedName)? (KEY BY key=fields)? (GROUP BY group=qualifiedName)? fuseOptions=commandNamedParameters |
There was a problem hiding this comment.
I am a little bit skeptical with the syntax here.
My first concern is that it requires a strict ordering of the optional clauses which is a little bit trappy for the user.
One could not understand why the following is not working:
| FUSE RRF KEY BY key column SCORE BY my_score WITH { ... custom options }
I wonder if those parameter should just be part of the commandNamedParameters:
| FUSE RRF WITH {"key_column": "my_key", "score_column": "my_score", ... ... custom options }\
Also we were asked that all the column creation are using the assignment pattern, so it should probably be something like:
| FUSE my_score=RRF WITH {"key_column": "my_key", ... ... custom options }
There was a problem hiding this comment.
I agree with @afoucret . We may want to have the options in any order, and just fail the parsing if we're repeating an option.
There was a problem hiding this comment.
I get the skepticism, but this is the syntax we agreed we will implement for FUSE with the larger group.
We discussed even the option of having the columns as part of MapExpression as you suggest.
In the end, we went with the syntax that you see implemented here.
We went through many iterations to get to this shape, this PR is simply implementing what we agreed with.
There was a problem hiding this comment.
In regards with the syntax being a bit too rigid that it expects a certain order, I think we can follow up on it so we make SCORE BY/KEY BY/GROUP BY interchangeable.
carlosdelest
left a comment
There was a problem hiding this comment.
Looks good!
I think a follow up would be nice with grammar changes so KEY BY, GROUP BY can be used in any order.
|
|
||
| fuseCommand | ||
| : DEV_FUSE (fuseType=identifier)? fuseOptions=commandNamedParameters | ||
| : DEV_FUSE (fuseType=identifier)? (SCORE BY score=qualifiedName)? (KEY BY key=fields)? (GROUP BY group=qualifiedName)? fuseOptions=commandNamedParameters |
There was a problem hiding this comment.
I agree with @afoucret . We may want to have the options in any order, and just fail the parsing if we're repeating an option.
|
|
||
| @Override | ||
| public void postAnalysisVerification(Failures failures) { | ||
| if (score.dataType() != DataType.DOUBLE) { |
There was a problem hiding this comment.
I think this is fine, users could convert it with TO_DOUBLE if needed
| failures.add( | ||
| fail( | ||
| discriminator, | ||
| "expected GROUP BY field [{}] to be KEYWORD or TEXT, not {}", |
There was a problem hiding this comment.
Is it necessary that we group / key by text or keyword? Why not use any type? At least numeric types could make sense to me.
Of course we can do that in a follow up, just curious about your thinking
There was a problem hiding this comment.
I think this is a restriction we have on the operator where we expect the discriminator values to be strings.
but I think we can lift this later.
There was a problem hiding this comment.
Should we add a test to ensure the parameters are passed down to the Fuse plan?
|
Pinging @elastic/kibana-esql (ES|QL-ui) |
|
@carlosdelest I added statement parser tests and made sure that GROUP BY/KEY BY/SCORE BY are interchangeable |
carlosdelest
left a comment
There was a problem hiding this comment.
Thanks for adding support for unordered clauses! 💯
tracked in #123389
FUSE uses:
We want to be able to configure these with non-default columns.
We extend the syntax to be able to support:
SCORE BY/GROUP BY/KEY BY/WITHcan be used in any order.