Skip to content

ES|QL - vector similarity pushdown follow up#137564

Merged
carlosdelest merged 22 commits intoelastic:mainfrom
carlosdelest:non-issue/esql-vector-similarity-pushdown-follow-up
Nov 14, 2025
Merged

ES|QL - vector similarity pushdown follow up#137564
carlosdelest merged 22 commits intoelastic:mainfrom
carlosdelest:non-issue/esql-vector-similarity-pushdown-follow-up

Conversation

@carlosdelest
Copy link
Copy Markdown
Member

@carlosdelest carlosdelest commented Nov 4, 2025

Closes #137392

Addresses the following follow ups to #137002:

  • Do a single walk through the plan on the pushdown rule, similar to how ResolveUnionTypes work
  • As a consequence of the above, deduplicate fields in multiple commands when replacing expressions by pushed down fields
    - Implement canonicalize for vector similarity functions, and adds CanonicalizeVectorSimilarityFunctions rule to ensure canonicalization - - We're not invoking canonical() on functions. It would complicate things to create a rule specifically for that, and then depend on the function being canonicalized to avoid the literal / field check.
  • Randomize tests for vector similarity functions pushdown

@carlosdelest carlosdelest added >non-issue :SearchOrg/Relevance Label for the Search (solution/org) Relevance team Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch :Search Relevance/ES|QL Search functionality in ES|QL v9.3.0 labels Nov 4, 2025
similarityFunction.dataType(),
blockLoaderFunctionConfig
);
var name = rawTemporaryName(fieldAttr.name(), blockLoaderFunctionConfig.name());
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rely on blockLoaderFunctionConfig.name() to get a unique name for different functions

@carlosdelest
Copy link
Copy Markdown
Member Author

@elasticsearchmachine run elasticsearch-ci/bwc-snapshots-part3

@carlosdelest carlosdelest marked this pull request as ready for review November 4, 2025 11:37
@elasticsearchmachine elasticsearchmachine added the Team:Search - Relevance The Search organization Search Relevance team label Nov 4, 2025
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@Override
protected Expression rule(VectorSimilarityFunction vectorSimilarityFunction, LogicalOptimizerContext ctx) {
return vectorSimilarityFunction.canonical();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had assumed there was already a rule for this!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's not. Given that we don't canonicalize other expressions, I'm thinking on removing this rule and getting back to the previous code for checking field and literal - this is adding complexity and coupling between the two rules.

WDYT?

Copy link
Copy Markdown
Contributor

@julian-elastic julian-elastic Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have LiteralsOnTheRight rule. But it only works for BinaryOperator. It seems VectorSimilarityFunction is not BinaryOperator and might be hard to make it one.

Alternatively, we will swap left and right in the surrogate method for spacial functions. Then you don't need a new rule and the code is much simpler. See SpatialContains.surrogate() for an example how to do it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't SurrogateExpressions used in the context of aggregations? Would it make sense to make the VectorSimilarityFunctions a SurrogateExpression?

I don't see any practical reason for doing that other than simplifying the check that is done in order to push down the vector similarity functions. I think it does not pay off - we're expecting a rule to act in order to be able to simplify an expression that should be able to understand when it should be pushable or not.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this change in 7a2b420 - happy to discuss or address this later

elasticsearchmachine added 4 commits November 11, 2025 11:33
…-similarity-pushdown-follow-up

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java
#	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/vector/VectorSimilarityFunctionsIT.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/PushExpressionsToFieldLoad.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java
@carlosdelest
Copy link
Copy Markdown
Member Author

@nik9000 @julian-elastic this is ready for another round of reviews, after incorporating the work done in #137382

@nik9000
Copy link
Copy Markdown
Member

nik9000 commented Nov 11, 2025

  • We're not invoking canonical() on functions. It would complicate things to create a rule specifically for that, and then depend on the function being canonicalized to avoid the literal / field check.

I opened #137892

Copy link
Copy Markdown
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @julian-elastic and @fang-xing-esql should also have a look too as they are experts here.

I'm surprised this doesn't help with the testLengthPushdownZoo - but I'll take a look at that after this one gets in.

Copy link
Copy Markdown
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two tests below in LocalLogicalPlanOptimizerTests should be able to run successfully with this PR now. Before this change duplicated entries are created in EsRelation, and cause duplicated attributes error.

testLengthPushdownZoo
testLengthInWhereAndEval

Ideally one entry can be created for multiple/equivalent length functions in EsRelation, however it seems like we still create multiple entries(different names and ids) for them. I'm wondering if the compute engine is smart enough to detect duplicated entries for the equivalent length function, and calculate them once?

elasticsearchmachine added 3 commits November 12, 2025 15:06
@carlosdelest
Copy link
Copy Markdown
Member Author

Ideally one entry can be created for multiple/equivalent length functions in EsRelation, however it seems like we still create multiple entries(different names and ids) for them. I'm wondering if the compute engine is smart enough to detect duplicated entries for the equivalent length function, and calculate them once?

@fang-xing-esql this was due to BlockLoaderFunctionConfig.JustWarnings including into equals / hashCode the Warnings, which include the Source used on each expression. This caused that expressions created with different Source were treated as not equals.

I changed the equals / hashCode methods in e5a102e and fixed the rule and tests in 3cb01aa.

The tests should pass now. There may be other issues that were not part of this work initially, that will be taken care of as part of #137679

Copy link
Copy Markdown
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing the duplicated generated fields @carlosdelest, LGTM! A couple of suggestions, but you can decide when to address them:

  • The hash code in the generated field names look not quite necessary to me, it seems like the original field name + the function name can uniquely identify a generated field for a BlockLoaderExpression.
  • This rule adds additional attributes into each EsRelation, however nowadays, there could be multiple EsRelations in a plan, that has join, fork or subqueries etc. It will be great to have tests that covers multiple EsRelations with BlockLoaderExpression in general to validate both the plan and the query results.

@carlosdelest
Copy link
Copy Markdown
Member Author

The hash code in the generated field names look not quite necessary to me, it seems like the original field name + the function name can uniquely identify a generated field for a BlockLoaderExpression.

@fang-xing-esql it's necessary - otherwise V_DOT_PRODUCT(my_field, [0, 1, 2]) and V_DOT_PRODUCT(my_field, [3, 4, 5]) will end up using the same attribute, even though the function parameters are different.

We may only use the hashCode when the function has other parameters than the field itself, but for now it's the safer option.

This rule adds additional attributes into each EsRelation, however nowadays, there could be multiple EsRelations in a plan, that has join, fork or subqueries etc. It will be great to have tests that covers multiple EsRelations with BlockLoaderExpression in general to validate both the plan and the query results.

Good point, I was not addressing this on this PR. I will add them to #137679, that contains further follow up work.

@carlosdelest carlosdelest enabled auto-merge (squash) November 13, 2025 16:12
@carlosdelest carlosdelest merged commit 38f3839 into elastic:main Nov 14, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Search Relevance/ES|QL Search functionality in ES|QL :SearchOrg/Relevance Label for the Search (solution/org) Relevance team Team:Search - Relevance The Search organization Search Relevance team Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ES|QL - Expression field loading follow ups

5 participants