ES|QL - vector similarity pushdown follow up#137564
ES|QL - vector similarity pushdown follow up#137564carlosdelest merged 22 commits intoelastic:mainfrom
Conversation
…sure no duplicate names
| similarityFunction.dataType(), | ||
| blockLoaderFunctionConfig | ||
| ); | ||
| var name = rawTemporaryName(fieldAttr.name(), blockLoaderFunctionConfig.name()); |
There was a problem hiding this comment.
Rely on blockLoaderFunctionConfig.name() to get a unique name for different functions
|
@elasticsearchmachine run elasticsearch-ci/bwc-snapshots-part3 |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Pinging @elastic/search-relevance (Team:Search - Relevance) |
| @Override | ||
| protected Expression rule(VectorSimilarityFunction vectorSimilarityFunction, LogicalOptimizerContext ctx) { | ||
| return vectorSimilarityFunction.canonical(); | ||
| } |
There was a problem hiding this comment.
I had assumed there was already a rule for this!
There was a problem hiding this comment.
There's not. Given that we don't canonicalize other expressions, I'm thinking on removing this rule and getting back to the previous code for checking field and literal - this is adding complexity and coupling between the two rules.
WDYT?
There was a problem hiding this comment.
We do have LiteralsOnTheRight rule. But it only works for BinaryOperator. It seems VectorSimilarityFunction is not BinaryOperator and might be hard to make it one.
Alternatively, we will swap left and right in the surrogate method for spacial functions. Then you don't need a new rule and the code is much simpler. See SpatialContains.surrogate() for an example how to do it.
There was a problem hiding this comment.
Aren't SurrogateExpressions used in the context of aggregations? Would it make sense to make the VectorSimilarityFunctions a SurrogateExpression?
I don't see any practical reason for doing that other than simplifying the check that is done in order to push down the vector similarity functions. I think it does not pay off - we're expecting a rule to act in order to be able to simplify an expression that should be able to understand when it should be pushable or not.
There was a problem hiding this comment.
I've removed this change in 7a2b420 - happy to discuss or address this later
…larity-pushdown-follow-up' into non-issue/esql-vector-similarity-pushdown-follow-up
…-similarity-pushdown-follow-up # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/vector/VectorSimilarityFunctionsIT.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/PushExpressionsToFieldLoad.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java
|
@nik9000 @julian-elastic this is ready for another round of reviews, after incorporating the work done in #137382 |
I opened #137892 |
nik9000
left a comment
There was a problem hiding this comment.
LGTM. @julian-elastic and @fang-xing-esql should also have a look too as they are experts here.
I'm surprised this doesn't help with the testLengthPushdownZoo - but I'll take a look at that after this one gets in.
There was a problem hiding this comment.
The two tests below in LocalLogicalPlanOptimizerTests should be able to run successfully with this PR now. Before this change duplicated entries are created in EsRelation, and cause duplicated attributes error.
testLengthPushdownZoo
testLengthInWhereAndEval
Ideally one entry can be created for multiple/equivalent length functions in EsRelation, however it seems like we still create multiple entries(different names and ids) for them. I'm wondering if the compute engine is smart enough to detect duplicated entries for the equivalent length function, and calculate them once?
… equals / hashCode
…larity-pushdown-follow-up' into non-issue/esql-vector-similarity-pushdown-follow-up
@fang-xing-esql this was due to I changed the equals / hashCode methods in e5a102e and fixed the rule and tests in 3cb01aa. The tests should pass now. There may be other issues that were not part of this work initially, that will be taken care of as part of #137679 |
fang-xing-esql
left a comment
There was a problem hiding this comment.
Thank you for addressing the duplicated generated fields @carlosdelest, LGTM! A couple of suggestions, but you can decide when to address them:
- The hash code in the generated field names look not quite necessary to me, it seems like the original field name + the function name can uniquely identify a generated field for a
BlockLoaderExpression. - This rule adds additional attributes into each
EsRelation, however nowadays, there could be multipleEsRelations in a plan, that has join, fork or subqueries etc. It will be great to have tests that covers multipleEsRelations withBlockLoaderExpressionin general to validate both the plan and the query results.
@fang-xing-esql it's necessary - otherwise We may only use the hashCode when the function has other parameters than the field itself, but for now it's the safer option.
Good point, I was not addressing this on this PR. I will add them to #137679, that contains further follow up work. |
Closes #137392
Addresses the following follow ups to #137002:
ResolveUnionTypeswork- Implement- - We're not invokingcanonicalizefor vector similarity functions, and addsCanonicalizeVectorSimilarityFunctionsrule to ensure canonicalizationcanonical()on functions. It would complicate things to create a rule specifically for that, and then depend on the function being canonicalized to avoid the literal / field check.