[PoC] Push vector similarity functions using BlockDocValuesReader by carlosdelest · Pull Request #136739 · elastic/elasticsearch

carlosdelest · 2025-10-17T07:14:38Z

Pushes down vector similarity functions, and applies them when loading vectors via BlockDocValuesReader.

This avoids the overhead of loading vectors into blocks and then calculating the functions afterwards in the compute engine. Instead, the similarity functions are calculated while the field is loaded, and the results are written directly into a block.

This approach consists of:

A new FieldExtractPreference: FUNCTION. This signals that the field must be extracted using a function.
A BlockLoaderValueFunction interface, that is available on the BlockLoaderContext for fields that are extracted via FUNCTION. This interface defines both the Builder that will be used as a result of the transformation, and the transformation itself.
This function is applied via a DenseVectorValueBlockLoader, that is used when a FUNCTION is used to extract a dense vector. It applies the transformation and loads the results.
A FieldFunctionAttribute that is used to extract a FieldAttribute with a function.
A VectorSimilarityFunctionsPushdown local physical optimization rule, that replaces a vector similarity function applied to a field with a FieldFunctionAttribute that will be used to extract the function result at field loading.
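
As a rough illustration of the idea above, here is a minimal, self-contained sketch of applying a similarity function while values are read, so that only the results are collected. Every type in it is a simplified, hypothetical stand-in: `ValueFunction` is loosely modelled on the BlockLoaderValueFunction described in this PR, and `FloatResultBuilder` stands in for a block builder. None of this is the actual Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the PR's types; not the Elasticsearch API.
final class LoadTimeSimilaritySketch {

    /** Stand-in for a block builder that collects one float per document. */
    static final class FloatResultBuilder {
        private final List<Float> values = new ArrayList<>();

        void appendFloat(float value) {
            values.add(value);
        }

        float[] build() {
            float[] out = new float[values.size()];
            for (int i = 0; i < out.length; i++) {
                out[i] = values.get(i);
            }
            return out;
        }
    }

    /** Defines both the result builder and the transformation applied to each loaded value. */
    interface ValueFunction<T, B> {
        B builder(int expectedCount);

        void applyFunctionAndAdd(T value, B builder);
    }

    /** Dot product against a fixed query vector, applied as each stored vector is read. */
    static ValueFunction<float[], FloatResultBuilder> dotProductWith(float[] query) {
        return new ValueFunction<float[], FloatResultBuilder>() {
            @Override
            public FloatResultBuilder builder(int expectedCount) {
                return new FloatResultBuilder();
            }

            @Override
            public void applyFunctionAndAdd(float[] value, FloatResultBuilder builder) {
                float sum = 0f;
                for (int i = 0; i < query.length; i++) {
                    sum += query[i] * value[i];
                }
                builder.appendFloat(sum);
            }
        };
    }

    /** Simulates the reader loop: the vectors themselves are never copied into a result block. */
    static float[] load(float[][] storedVectors, ValueFunction<float[], FloatResultBuilder> function) {
        FloatResultBuilder builder = function.builder(storedVectors.length);
        for (float[] vector : storedVectors) {
            function.applyFunctionAndAdd(vector, builder);
        }
        return builder.build();
    }
}
```

In this sketch, calling `LoadTimeSimilaritySketch.load(vectors, LoadTimeSimilaritySketch.dotProductWith(query))` yields one score per stored vector without materializing the vectors in an intermediate structure.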

Work still pending:

VectorSimilarityFunctionsPushdown needs to take into account that the field may still need to be extracted, as it may be referenced elsewhere in the query.
Replace the current dense vector loading so it reuses the DenseVectorValueBlockLoader - vector loading could be done using a BlockLoaderValueFunction that just adds the vectors to the block. This will reduce code duplication and use a single code path for loading a field.
Testing

nik9000 · 2025-10-21T15:37:51Z

> A new FieldExtractPreference: FUNCTION. This signals that the field must be extracted using a function.

I'd prefer if we didn't modify the preference for loading. I'd prefer to just attach the function to the configuration and respect it if it is there. And, I suppose, we should assert that the loader thinks it is loading for the passed function.

nik9000 · 2025-10-21T15:39:17Z

server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java

+        float calculateSimilarity(float[] v1, float[] v2);
+
+        float calculateSimilarity(byte[] v1, byte[] v2);
+    }


Watch out for these! They'll be super fast if you only have one subclass. Or two. But as soon as you have three it'll start getting slower. Fine for now, but might be a big deal once you start really getting faster.

I see. We have more than 3 similarity functions so I don't think we'll be able to avoid going megamorphic here. Is there any alternative that you can think of?
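
One common JVM pattern for this situation, shown here only as a sketch and not as what this PR does, is to dispatch on the similarity type once per batch instead of once per vector, so that each hot loop contains a direct static call. The `Kind` enum and all method names below are hypothetical, and whether this actually helps here would need benchmarking.

```java
final class SimilarityDispatchSketch {

    // Hypothetical enum for illustration; not the Elasticsearch similarity type.
    enum Kind { DOT_PRODUCT, L1_NORM, L2_NORM }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static float l1(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    static float l2(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return (float) Math.sqrt(sum);
    }

    /** Dispatches once per batch; each loop body then calls a single static method. */
    static float[] scoreAll(Kind kind, float[][] vectors, float[] query) {
        float[] out = new float[vectors.length];
        switch (kind) {
            case DOT_PRODUCT:
                for (int i = 0; i < vectors.length; i++) {
                    out[i] = dot(vectors[i], query);
                }
                break;
            case L1_NORM:
                for (int i = 0; i < vectors.length; i++) {
                    out[i] = l1(vectors[i], query);
                }
                break;
            case L2_NORM:
                for (int i = 0; i < vectors.length; i++) {
                    out[i] = l2(vectors[i], query);
                }
                break;
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
        return out;
    }
}
```

The trade-off is duplicated loop code per similarity function in exchange for call sites that never see more than one target.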

nik9000 · 2025-10-21T15:47:17Z

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java

+
+        @Nullable
+        default BlockLoaderValueFunction<?, ?> blockLoaderValueFunction() {
+            throw new UnsupportedOperationException("blockLoaderValueFunction is not supported");


I think it'd be harder to make a mistake with this if we didn't default implement it. Though I'd be happy with a "for testing" version of this thing that throws from every method so we don't have to change every single test in follow-up PRs.

Also! Can this return null if there's no function to apply?

> I think it'd be harder to make a mistake with this if we didn't default implement it.

Yep, my thinking was to reduce the blast radius for this at least at PoC level. I can remove the default when switching to ready for review 👍

> Also! Can this return null if there's no function to apply?

I will do that. I was going to signal this via the FUNCTION extract preference, but I understand your concerns there.

nik9000 · 2025-10-21T15:49:53Z

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java

+        /**
+         * Applies the function to the value passed as parameter, and adds it to the builder that was returned by {@link #builder(BlockLoader.BlockFactory, int)}.
+         */
+        void applyFunctionAndAdd(T value, B builder) throws IOException;


I was thinking we wouldn't actually have any code in this. Or maybe just a:

String name(); Object config();

And the block loaders would detect it and specialize. I think the point of this is that all block loaders would have hand-optimized code for each case. That's what we're looking to do for the vector cases at least.

So you're thinking of directly providing the block loader at this level, one that handles both retrieving the values and transforming them. I see your point.

I was thinking that there will be repetitive stuff in terms of retrieving the original values, and that it would be better to add the transformation on top of that - but we can surely do that via subclassing a base class.

I'll try this approach, I was not happy with how this interface looked anyway. Thanks!
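
A minimal sketch of how the two ideas in this thread might fit together: the function is passed around as data only (a name plus a config object, as in the `String name(); Object config();` suggestion), and the loader side picks a hand-written implementation, with a base class holding the shared value-reading step. All class names, the `"dot_product"` and `"l1_norm"` strings, and the use of a `float[]` as config are assumptions for illustration, not the PR's design.

```java
final class FunctionDescriptorSketch {

    /** Data-only description of the requested function: a name plus its configuration. */
    static final class FunctionConfig {
        final String name;
        final Object config;

        FunctionConfig(String name, Object config) {
            this.name = name;
            this.config = config;
        }
    }

    /** Base class holds the shared "read the stored value" step; subclasses own their loops. */
    abstract static class VectorLoader {
        /** Stand-in for reading one vector from doc values. */
        protected final float[] read(float[][] stored, int doc) {
            return stored[doc];
        }

        abstract float[] load(float[][] stored);
    }

    static final class DotProductLoader extends VectorLoader {
        private final float[] query;

        DotProductLoader(float[] query) {
            this.query = query;
        }

        @Override
        float[] load(float[][] stored) {
            float[] out = new float[stored.length];
            for (int doc = 0; doc < stored.length; doc++) {
                float[] v = read(stored, doc);
                float sum = 0f;
                for (int i = 0; i < query.length; i++) {
                    sum += v[i] * query[i];
                }
                out[doc] = sum;
            }
            return out;
        }
    }

    static final class L1NormLoader extends VectorLoader {
        private final float[] query;

        L1NormLoader(float[] query) {
            this.query = query;
        }

        @Override
        float[] load(float[][] stored) {
            float[] out = new float[stored.length];
            for (int doc = 0; doc < stored.length; doc++) {
                float[] v = read(stored, doc);
                float sum = 0f;
                for (int i = 0; i < query.length; i++) {
                    sum += Math.abs(v[i] - query[i]);
                }
                out[doc] = sum;
            }
            return out;
        }
    }

    /** The loader side inspects the descriptor and picks a specialised implementation. */
    static VectorLoader loaderFor(FunctionConfig function) {
        if (!(function.config instanceof float[])) {
            throw new IllegalArgumentException("expected a float[] query vector");
        }
        float[] query = (float[]) function.config;
        switch (function.name) {
            case "dot_product":
                return new DotProductLoader(query);
            case "l1_norm":
                return new L1NormLoader(query);
            default:
                throw new IllegalArgumentException("no specialised loader for " + function.name);
        }
    }
}
```

Because each subclass owns its whole loop, the per-vector arithmetic is inlined code and not a virtual call shared across all functions.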

nik9000 · 2025-10-21T18:13:29Z

...-core/src/main/java/org/elasticsearch/xpack/esql/core/expression/FieldFunctionAttribute.java

+ * the value in a way that is more efficient than loading the raw field value
+ * and applying the function to it.
+ */
+public class FieldFunctionAttribute extends FieldAttribute {


I don't tend to make new ones of these so I'm never sure what's right. We'll need @alex-spies to comment on this one.

The idea is basically to tell how to extract fields when we want to transform them, and it seemed natural to use a new field that can provide a block loader.

I'm currently iterating on this idea. It will keep the FieldFunctionAttribute class, but will include a Function reference on it. The Function is the one that can implement the method that provides the block loader.

nik9000 · 2025-10-21T18:15:39Z

.../src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/SpatialExtent.java

-                case DOC_VALUES -> throw new EsqlIllegalArgumentException("Illegal field extract preference: " + fieldExtractPreference);
+                case DOC_VALUES, FUNCTION -> throw new EsqlIllegalArgumentException(
+                    "Illegal field extract preference: " + fieldExtractPreference
+                );


I was hoping we could get this "for free" by checking on the returned BlockLoader. I was thinking it could return the function that it's applying - which would be null for all existing functions and the new one could be your functions. And then we could assert in the test and the prod loader code that they were equal.

Because there's a bunch of field types. Lots of implementations - and doing it in the caller makes sure any failure to take into account the requested field to load comes back as a bug.

Yes, I see your point. We could also separate where we get the field data from how we transform it. They're orthogonal concepts, I will change this. Thanks for the feedback, makes sense!
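
The caller-side check suggested in this thread could look roughly like the sketch below: every loader reports the function it applies (null when it loads raw values), and the caller compares that with what it requested. The `Loader` interface, the `String`-typed function identifier, and the `"v_dot_product"` name used in the usage note are hypothetical simplifications.

```java
import java.util.Objects;

final class AppliedFunctionCheckSketch {

    interface Loader {
        /** Function applied while loading, or null when raw values are loaded. */
        String appliedFunction();
    }

    /** Models every existing loader: it knows nothing about functions. */
    static final class RawLoader implements Loader {
        @Override
        public String appliedFunction() {
            return null;
        }
    }

    /** Models a new loader that applies a function at load time. */
    static final class FunctionLoader implements Loader {
        private final String function;

        FunctionLoader(String function) {
            this.function = function;
        }

        @Override
        public String appliedFunction() {
            return function;
        }
    }

    /** Caller-side check: a loader that ignored the requested function fails loudly. */
    static Loader checked(String requestedFunction, Loader loader) {
        if (!Objects.equals(requestedFunction, loader.appliedFunction())) {
            throw new IllegalStateException(
                "requested [" + requestedFunction + "] but loader applies [" + loader.appliedFunction() + "]"
            );
        }
        return loader;
    }
}
```

With this shape, a field type that has not been taught about functions returns a raw loader, and `checked("v_dot_product", new RawLoader())` surfaces the mismatch as a bug instead of silently returning raw vectors.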

nik9000 · 2025-10-21T18:20:12Z

...asticsearch/xpack/esql/optimizer/rules/physical/local/VectorSimilarityFunctionsPushdown.java

+    private EvalExec replaceSimilarityFunctionsForFieldTransformations(EvalExec eval) {
+        Map<Attribute, Attribute> replacements = new HashMap<>();
+        EvalExec resultEval = (EvalExec) eval.transformExpressionsDown(VectorSimilarityFunction.class, similarityFunction -> {
+            if (similarityFunction.left() instanceof Literal ^ similarityFunction.right() instanceof Literal) {


I think we always rotate the literal to the right hand side - though maybe that's not done yet?

Will double check, thanks!
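
For reference, the `^` in the condition above is a boolean XOR: it is true only when exactly one side is a literal. The sketch below uses tiny hypothetical expression classes (not the ES|QL ones) to show that check, plus a normalization step that moves the literal to the right. The swap assumes the similarity function is symmetric in its arguments.

```java
final class LiteralSideSketch {

    // Hypothetical miniature expression types; not the ES|QL classes.
    interface Expr {}

    static final class Literal implements Expr {
        final float[] value;

        Literal(float[] value) {
            this.value = value;
        }
    }

    static final class Field implements Expr {
        final String name;

        Field(String name) {
            this.name = name;
        }
    }

    static final class Similarity implements Expr {
        final Expr left;
        final Expr right;

        Similarity(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }
    }

    /** True only when exactly one of the two arguments is a literal. */
    static boolean exactlyOneLiteral(Similarity s) {
        return (s.left instanceof Literal) ^ (s.right instanceof Literal);
    }

    /** Moves a lone literal to the right-hand side; assumes a symmetric function. */
    static Similarity literalOnRight(Similarity s) {
        if (s.left instanceof Literal && !(s.right instanceof Literal)) {
            return new Similarity(s.right, s.left);
        }
        return s;
    }
}
```

If literals are always rotated to the right before the rule runs, the XOR could be reduced to a check that the right side is a literal and the left side is not.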

nik9000 · 2025-10-21T18:21:11Z

...asticsearch/xpack/esql/optimizer/rules/physical/local/VectorSimilarityFunctionsPushdown.java

+    @SuppressWarnings("unchecked")
+    private EvalExec replaceSimilarityFunctionsForFieldTransformations(EvalExec eval) {
+        Map<Attribute, Attribute> replacements = new HashMap<>();
+        EvalExec resultEval = (EvalExec) eval.transformExpressionsDown(VectorSimilarityFunction.class, similarityFunction -> {


I'd love to make this generic over other functions. But I can grab that after you've landed this one. Or Alex or someone. This is a fine way to get it in, I think.

Yes, I'm trying to set the path for that. I'm currently thinking of making a specific interface for providing block loaders that can be implemented by functions, so I think that would be pretty straightforward.

carlosdelest · 2025-10-24T09:44:56Z

Closing in favor of #137002

elasticsearchmachine added 10 commits October 14, 2025 20:25

Add value transformations to BlockLoaderContext, apply them in DenseVectorFieldMapper and BlockDocValuesReader

5fd5f9a

Add value transformation to shard context

1ace6cd

Add transformations to vector similarity functions

0e12f0b

Add push down rule

63b0974

Introduce new interface BlockValueLoader for loading and transforming fields

c5a8f45

Additional method for a new class in Dense Vector

f5ec0b1

Refactor before bytes

94610c2

Added byte support

c8b67a0

Refactoring

59b9c37

Refactoring and comments

566926d

carlosdelest added the >non-issue, Team:Analytics, :Analytics/ES|QL, :Search Relevance/ES|QL, and v9.3.0 labels on Oct 17, 2025

[CI] Auto commit changes from spotless

ba779b6

nik9000 reviewed Oct 21, 2025

carlosdelest mentioned this pull request Oct 23, 2025

Push vector similarity functions using BlockDocValuesReader #137002

Merged

carlosdelest closed this Oct 24, 2025

phananh1010 mentioned this pull request Nov 6, 2025

Mirror upstream elastic/elasticsearch#137002 for AI review (snapshot of HEAD tree) phananh1010/elasticsearch#233

Closed
