Skip to content

[ES|QL] MMR Command - MMROperator and Execution#141324

Merged
markjhoy merged 65 commits intoelastic:mainfrom
markjhoy:markjhoy/esql_mmr_command_operator_execution
Feb 12, 2026
Merged

[ES|QL] MMR Command - MMROperator and Execution#141324
markjhoy merged 65 commits intoelastic:mainfrom
markjhoy:markjhoy/esql_mmr_command_operator_execution

Conversation

@markjhoy
Copy link
Copy Markdown
Contributor

@markjhoy markjhoy commented Jan 27, 2026

Adds the operator and execution flow for the MMR command as well as pertinent CSV tests

Also performs a light refactoring to add a CompleteInputCollectorOperator base class for operations that require the full set of input pages to be input / processed before any output

import static org.elasticsearch.xpack.esql.core.type.DataType.INTEGER;

public class MMR extends UnaryPlan implements TelemetryAware, ExecutesOn.Coordinator, PostAnalysisVerificationAware {
public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(LogicalPlan.class, "Mmr", MMR::new);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although serialization is not strictly necessary here as this should only run on the coordinator, certain parent plans (e.g. LIMIT) may try and serialize any child plans - and without this (and the writeTo(), etc. implementations) will throw. It theoretically should not hurt anything to allow this to de/serialize.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need serialization, if there's any serialization happening that is a separate bug we need to dig into.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue (and exception thrown) I've found is in the LIMIT's writeTo() method as it explicitly writes the child out... if the child is not a NamedWriteable than it will throw here. For most cases, I would think this is fine, but I've gotten many tests to fail for MMR without implementing this. :/

@markjhoy
Copy link
Copy Markdown
Contributor Author

markjhoy commented Feb 2, 2026

Note: seeing a weird issue with the output of the dense vectors where it's rounding the values. For example in one of the CSV tests, I'm getting a mismatch error due to the results:

Actual:
    keyword_field:keyword | text_body:text | text_vector:dense_vector
    test1                 | first text     | [0.4, 0.2, 0.4]
    test3                 | third text     | [0.4, 0.1, 0.3]
    test6                 | sixth text     | [0.099999994, 0.099999994, 0.099999994]
Expected:
    keyword_field:keyword | text_body:text | text_vector:dense_vector
    test1                 | first text     | [0.4, 0.2, 0.4]
    test3                 | third text     | [0.4, 0.1, 0.3]
    test6                 | sixth text     | [0.1, 0.1, 0.1]

... the input vectors appear to be rounding on input. Shouldn't affect the functionality of the operator here, but we need to look a bit further into this.

Copy link
Copy Markdown
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding the tests and hex string cases!

@markjhoy markjhoy merged commit 8c94606 into elastic:main Feb 12, 2026
34 of 35 checks passed
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Feb 13, 2026
* initial MMROperator and base diversification

* add _score block input

* working MMR output;CompleteInputCollectorOperator

* add status; add tests; light refactoring

* remove dependency on _doc vector input block

* fix bad merge

* realize queryVector/lambda as VectorData/Float

* fix MMRExec input params

* update grammar to ensure ON is specified

* add status tests

* start of MMR csv tests

* re-add writable for MMRExec

* revert csvtests change

* regen ANTLR grammar from last merge

* additional CSV tests

* light fix csv tests for stability

* drop text_vector column in output from CSV tests

* add KNN subquery and TEXT_EMBEDDING fxn query vec

* add temp capability for BWC tests

* add sort for tests

* [CI] Auto commit changes from spotless

* comment cleanup

* light cleanups and refactoring

* [CI] Auto commit changes from spotless

* remove need for score block as input

* remove score block from tests

* use implicit literal vector for queryVector

* add random null blocks to MMROperator tests

* better check for query vector data type

* fix broken test

* fix statement parser test

* support byte/bit/hex vector query vector

* [CI] Auto commit changes from spotless

* update page output filter (changed since last merge)

* refactored import

* fix bad byte vectors in test

* and cleanup tests again

* ensure all byte numbers are in the proper range

* cleanup verifier tests

* ensure output pages are not empty

* ensure query vector data type is not null

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Search Relevance/ES|QL Search functionality in ES|QL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants