ES|QL - vector similarity pushdown follow up by carlosdelest · Pull Request #137564 · elastic/elasticsearch

carlosdelest · 2025-11-04T10:30:43Z

Addresses the following follow ups to #137002:

- Do a single walk through the plan in the pushdown rule, similar to how ResolveUnionTypes works
- As a consequence of the above, deduplicate fields across multiple commands when replacing expressions with pushed-down fields
- ~~Implement canonicalize for vector similarity functions, and add a CanonicalizeVectorSimilarityFunctions rule to ensure canonicalization~~ We're not invoking canonical() on functions. It would complicate things to create a rule specifically for that, and then depend on the function being canonicalized to avoid the literal / field check.
- Randomize tests for vector similarity functions pushdown
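As a rough illustration of the first two items, the sketch below walks a toy plan once and reuses one generated field for every equal function call, so the same call in several commands maps to a single pushed-down field. All types and the naming scheme here (`PushableCall`, `SinglePassPushdownSketch`, the `$$...` names) are hypothetical stand-ins, not the classes or names used by the actual rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a pushable function call; not an ES|QL class.
record PushableCall(String function, String field, List<Double> literal) {}

final class SinglePassPushdownSketch {
    // Maps each distinct call to the one generated field name that replaces it.
    private final Map<PushableCall, String> pushedFields = new LinkedHashMap<>();

    // Returns the replacement name, creating it only the first time a call is seen.
    String replace(PushableCall call) {
        String name = pushedFields.get(call);
        if (name == null) {
            // Assumed naming scheme, for illustration only.
            name = "$$" + call.field() + "$" + call.function() + "$" + pushedFields.size();
            pushedFields.put(call, name);
        }
        return name;
    }

    // Walks every command exactly once, in order, and rewrites its calls.
    List<List<String>> rewrite(List<List<PushableCall>> commands) {
        List<List<String>> rewritten = new ArrayList<>();
        for (List<PushableCall> command : commands) {
            List<String> names = new ArrayList<>();
            for (PushableCall call : command) {
                names.add(replace(call));
            }
            rewritten.add(names);
        }
        return rewritten;
    }

    // The fields that would be added once to the relation being read.
    List<String> fieldsToLoad() {
        return new ArrayList<>(pushedFields.values());
    }
}
```

Because the record derives `equals` and `hashCode` from its components, two calls with the same function, field, and literal share one entry, while a call with a different literal gets its own.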

carlosdelest · 2025-11-04T10:33:32Z

...lasticsearch/xpack/esql/optimizer/rules/logical/local/PushDownVectorSimilarityFunctions.java

+                similarityFunction.dataType(),
+                blockLoaderFunctionConfig
+            );
+            var name = rawTemporaryName(fieldAttr.name(), blockLoaderFunctionConfig.name());


Rely on blockLoaderFunctionConfig.name() to get a unique name for different functions

carlosdelest · 2025-11-04T11:37:19Z

@elasticsearchmachine run elasticsearch-ci/bwc-snapshots-part3

elasticsearchmachine · 2025-11-04T11:38:05Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine · 2025-11-04T11:38:05Z

Pinging @elastic/search-relevance (Team:Search - Relevance)

nik9000 · 2025-11-04T17:07:46Z

.../elasticsearch/xpack/esql/optimizer/rules/logical/CanonicalizeVectorSimilarityFunctions.java

+    @Override
+    protected Expression rule(VectorSimilarityFunction vectorSimilarityFunction, LogicalOptimizerContext ctx) {
+        return vectorSimilarityFunction.canonical();
+    }


I had assumed there was already a rule for this!

carlosdelest · 2025-11-06

There isn't. Given that we don't canonicalize other expressions, I'm thinking of removing this rule and going back to the previous code for checking field and literal. This adds complexity and coupling between the two rules.

WDYT?

julian-elastic · 2025-11-06

We do have the LiteralsOnTheRight rule, but it only works for BinaryOperator. It seems VectorSimilarityFunction is not a BinaryOperator, and it might be hard to make it one.

Alternatively, we swap left and right in the surrogate method for spatial functions. Then you don't need a new rule and the code is much simpler. See SpatialContains.surrogate() for an example of how to do it.

carlosdelest · 2025-11-06

Aren't SurrogateExpressions used in the context of aggregations? Would it make sense to make the VectorSimilarityFunctions a SurrogateExpression?

I don't see any practical reason for doing that other than simplifying the check that is done in order to push down the vector similarity functions. I don't think it pays off: we'd be relying on a rule to run first just to simplify an expression that should itself be able to tell whether it is pushable.

carlosdelest · 2025-11-11

I've removed this change in 7a2b420. Happy to discuss or address this later.
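For readers following the thread, here is a minimal sketch of the kind of order-independent field / literal check that makes a separate canonicalization rule unnecessary. It assumes the similarity function gives the same result when its two arguments are swapped, and every type in it (`SimilarityArg`, `FieldArg`, `VectorLiteral`, `PushdownCandidate`) is a hypothetical stand-in rather than an ES|QL class.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified argument types; these are not ES|QL classes.
interface SimilarityArg {}

record FieldArg(String name) implements SimilarityArg {}

record VectorLiteral(List<Float> values) implements SimilarityArg {}

// What a rule would need in order to push the function down: one field and one constant vector.
record PushdownCandidate(FieldArg field, VectorLiteral vector) {}

final class FieldLiteralCheckSketch {
    // Accepts (field, literal) and (literal, field); rejects every other combination.
    static Optional<PushdownCandidate> match(SimilarityArg left, SimilarityArg right) {
        if (left instanceof FieldArg f && right instanceof VectorLiteral v) {
            return Optional.of(new PushdownCandidate(f, v));
        }
        if (left instanceof VectorLiteral v && right instanceof FieldArg f) {
            return Optional.of(new PushdownCandidate(f, v));
        }
        return Optional.empty();
    }
}
```

The trade-off discussed above is visible here: the check handles both argument orders itself, at the cost of one extra branch, instead of depending on another rule having run first.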

…-similarity-pushdown-follow-up # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/vector/VectorSimilarityFunctionsIT.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/PushExpressionsToFieldLoad.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java

carlosdelest · 2025-11-11T12:44:08Z

@nik9000 @julian-elastic this is ready for another round of reviews, after incorporating the work done in #137382

nik9000 · 2025-11-11T13:07:48Z

We're not invoking canonical() on functions. It would complicate things to create a rule specifically for that, and then depend on the function being canonicalized to avoid the literal / field check.

I opened #137892

nik9000

LGTM. @julian-elastic and @fang-xing-esql should have a look too, as they are the experts here.

I'm surprised this doesn't help with the testLengthPushdownZoo - but I'll take a look at that after this one gets in.

fang-xing-esql

The two tests below in LocalLogicalPlanOptimizerTests should now run successfully with this PR. Before this change, duplicate entries were created in EsRelation, causing a duplicate attributes error.

- testLengthPushdownZoo
- testLengthInWhereAndEval

Ideally, one entry would be created for multiple equivalent length functions in EsRelation; however, it seems we still create multiple entries (different names and ids) for them. I'm wondering whether the compute engine is smart enough to detect duplicate entries for equivalent length functions and calculate them once?

carlosdelest · 2025-11-12T18:06:37Z

Ideally, one entry would be created for multiple equivalent length functions in EsRelation; however, it seems we still create multiple entries (different names and ids) for them. I'm wondering whether the compute engine is smart enough to detect duplicate entries for equivalent length functions and calculate them once?

@fang-xing-esql this was because BlockLoaderFunctionConfig.JustWarnings included the Warnings in equals / hashCode, and the Warnings include the Source used by each expression. As a result, expressions created with a different Source were treated as not equal.

I changed the equals / hashCode methods in e5a102e and fixed the rule and tests in 3cb01aa.

The tests should pass now. There may be other issues that were not part of this work initially; those will be taken care of as part of #137679.
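The effect described above can be sketched as follows. This is an illustration under stated assumptions, not the real `BlockLoaderFunctionConfig.JustWarnings`: `SourceSketch`, `WarningsSketch`, and `JustWarningsSketch` are hypothetical stand-ins, and the only point shown is that leaving the source-carrying warnings out of `equals` / `hashCode` lets two configs for the same function collapse into one entry.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;

// Hypothetical stand-in for the query location an expression was parsed from.
record SourceSketch(int line, int column) {}

// Hypothetical stand-in for a warnings collector that remembers where its expression came from.
final class WarningsSketch {
    final SourceSketch source;

    WarningsSketch(SourceSketch source) {
        this.source = source;
    }
}

// Hypothetical config: equality depends on the function only, never on the warnings.
final class JustWarningsSketch {
    private final String function;
    private final WarningsSketch warnings;

    JustWarningsSketch(String function, WarningsSketch warnings) {
        this.function = function;
        this.warnings = warnings;
    }

    WarningsSketch warnings() {
        return warnings;
    }

    @Override
    public boolean equals(Object o) {
        // The warnings (and therefore the source) are deliberately ignored.
        return o instanceof JustWarningsSketch other && function.equals(other.function);
    }

    @Override
    public int hashCode() {
        return Objects.hash(function);
    }

    // Counts how many distinct configs remain after hash-based deduplication.
    static int distinctConfigs(JustWarningsSketch... configs) {
        return new HashSet<>(Arrays.asList(configs)).size();
    }
}
```

With this definition, two configs for the same function written at different positions in the query compare equal, so a map or set keyed on the config holds a single entry for them.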

fang-xing-esql

Thank you for addressing the duplicated generated fields @carlosdelest, LGTM! A couple of suggestions, but you can decide when to address them:

- The hash code in the generated field names does not look necessary to me; it seems the original field name plus the function name can uniquely identify a generated field for a BlockLoaderExpression.
- This rule adds additional attributes to each EsRelation. However, nowadays there can be multiple EsRelations in a plan that has a join, fork, subqueries, etc. It would be great to have tests that cover multiple EsRelations with BlockLoaderExpression in general, to validate both the plan and the query results.

carlosdelest · 2025-11-13T16:10:32Z

The hash code in the generated field names does not look necessary to me; it seems the original field name plus the function name can uniquely identify a generated field for a BlockLoaderExpression.

@fang-xing-esql it's necessary. Otherwise V_DOT_PRODUCT(my_field, [0, 1, 2]) and V_DOT_PRODUCT(my_field, [3, 4, 5]) will end up using the same attribute, even though the function parameters are different.

We could use the hashCode only when the function has parameters other than the field itself, but for now this is the safer option.
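A small sketch of the naming concern, assuming a made-up naming scheme: `GeneratedFieldNameSketch` and the `$$field$function$hash` format are hypothetical and only illustrate why field name plus function name alone cannot tell the two V_DOT_PRODUCT calls apart.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical naming helper; the real rule derives names differently.
final class GeneratedFieldNameSketch {
    // Name built from the field and the function only.
    static String withoutHash(String field, String function) {
        return "$$" + field + "$" + function;
    }

    // Name that also folds the remaining function parameters into a hash suffix.
    static String withHash(String field, String function, List<Integer> queryVector) {
        return withoutHash(field, function) + "$" + Objects.hash(function, queryVector);
    }
}
```

For `[0, 1, 2]` and `[3, 4, 5]` the names without a hash are identical, while the hashed names differ. A hash suffix can still collide in principle for other inputs, so this is a sketch of the idea rather than a uniqueness guarantee.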

This rule adds additional attributes to each EsRelation. However, nowadays there can be multiple EsRelations in a plan that has a join, fork, subqueries, etc. It would be great to have tests that cover multiple EsRelations with BlockLoaderExpression in general, to validate both the plan and the query results.

Good point, I was not addressing this in this PR. I will add these tests to #137679, which contains further follow-up work.

elasticsearchmachine added 6 commits November 3, 2025 20:21

Implement single pass, similar to ResolveUnionTypes

092d2a2

Add test for replacing duplicates in multiple commands

b5b1176

Calculate name as part of the BlockLoaderFunctionConfig, so it can ensure no duplicate names

516b4f9

Add javadoc

9466e31

Implement canonicalize() and CanonicalizeVectorSimilarityFunctions

207075a

Add randomized testing

d154931

carlosdelest added the >non-issue, :SearchOrg/Relevance, Team:Search Relevance, :Search Relevance/ES|QL, and v9.3.0 labels Nov 4, 2025

carlosdelest commented Nov 4, 2025

[CI] Auto commit changes from spotless

7e59a92

carlosdelest marked this pull request as ready for review November 4, 2025 11:37

carlosdelest requested review from alex-spies and nik9000 November 4, 2025 11:37

elasticsearchmachine added the Team:Search - Relevance label Nov 4, 2025

nik9000 reviewed Nov 4, 2025

carlosdelest and others added 3 commits November 5, 2025 11:02

Merge branch 'main' into non-issue/esql-vector-similarity-pushdown-follow-up

ba8bcb0

Fix test - dimensions checking must not use a null vector, or it won't fail

8ee21af

Merge remote-tracking branch 'carlosdelest/non-issue/esql-vector-similarity-pushdown-follow-up' into non-issue/esql-vector-similarity-pushdown-follow-up

9c16f79

carlosdelest mentioned this pull request Nov 6, 2025

ESQL: Leftovers from making field pushing generic #137679

Open

16 tasks

elasticsearchmachine added 4 commits November 11, 2025 11:33

Fix merge and tests

7866f4e

Remove unnecessary name() method

89a5a23

Remove CanonicalizeVectorSimilarityFunctions

7a2b420

nik9000 mentioned this pull request Nov 11, 2025

ESQL: Canonicalize all expressions #137892

Open

nik9000 approved these changes Nov 11, 2025

carlosdelest requested review from fang-xing-esql and julian-elastic November 11, 2025 16:17

Merge branch 'main' into non-issue/esql-vector-similarity-pushdown-follow-up

801542e

fang-xing-esql reviewed Nov 12, 2025

elasticsearchmachine added 3 commits November 12, 2025 15:06

BlockLoaderFunctionConfig.JustWarnings does not consider warnings for equals / hashCode

e5a102e

Fix logical plan tests

3cb01aa

Merge remote-tracking branch 'carlosdelest/non-issue/esql-vector-similarity-pushdown-follow-up' into non-issue/esql-vector-similarity-pushdown-follow-up

b4b3083

carlosdelest requested a review from fang-xing-esql November 12, 2025 18:06

carlosdelest and others added 4 commits November 13, 2025 09:42

Merge branch 'main' into non-issue/esql-vector-similarity-pushdown-follow-up

5f665a0

Fix merge

4750aa8

Fix hashCode for similarity functions

73c9ef5

Merge branch 'main' into non-issue/esql-vector-similarity-pushdown-follow-up

c8d6a88

fang-xing-esql approved these changes Nov 13, 2025

carlosdelest enabled auto-merge (squash) November 13, 2025 16:12

carlosdelest disabled auto-merge November 14, 2025 08:07

carlosdelest merged commit 38f3839 into elastic:main Nov 14, 2025
34 checks passed

carlosdelest mentioned this pull request Nov 14, 2025

ESQL: improve performance - Merge functions into loaders (sometimes) #103636

Closed

4 tasks
