[ES|QL] MMR Command - MMROperator and Execution#141324
[ES|QL] MMR Command - MMROperator and Execution#141324markjhoy merged 65 commits intoelastic:mainfrom
Conversation
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/MMROperator.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
| import static org.elasticsearch.xpack.esql.core.type.DataType.INTEGER; | ||
|
|
||
| public class MMR extends UnaryPlan implements TelemetryAware, ExecutesOn.Coordinator, PostAnalysisVerificationAware { | ||
| public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(LogicalPlan.class, "Mmr", MMR::new); |
There was a problem hiding this comment.
Although serialization is not strictly necessary here as this should only run on the coordinator, certain parent plans (e.g. LIMIT) may try and serialize any child plans - and without this (and the writeTo(), etc. implementations) will throw. It theoretically should not hurt anything to allow this to de/serialize.
There was a problem hiding this comment.
I don't think we need serialization, if there's any serialization happening that is a separate bug we need to dig into.
There was a problem hiding this comment.
The issue (and exception thrown) I've found is in the LIMIT's writeTo() method as it explicitly writes the child out... if the child is not a NamedWriteable than it will throw here. For most cases, I would think this is fine, but I've gotten many tests to fail for MMR without implementing this. :/
|
Note: seeing a weird issue with the output of the dense vectors where it's rounding the values. For example in one of the CSV tests, I'm getting a mismatch error due to the results: ... the input vectors appear to be rounding on input. Shouldn't affect the functionality of the operator here, but we need to look a bit further into this. |
carlosdelest
left a comment
There was a problem hiding this comment.
LGTM, thanks for adding the tests and hex string cases!
* initial MMROperator and base diversification * add _score block input * working MMR output;CompleteInputCollectorOperator * add status; add tests; light refactoring * remove dependency on _doc vector input block * fix bad merge * realize queryVector/lambda as VectorData/Float * fix MMRExec input params * update grammar to ensure ON is specified * add status tests * start of MMR csv tests * re-add writable for MMRExec * revert csvtests change * regen ANTLR grammar from last merge * additional CSV tests * light fix csv tests for stability * drop text_vector column in output from CSV tests * add KNN subquery and TEXT_EMBEDDING fxn query vec * add temp capability for BWC tests * add sort for tests * [CI] Auto commit changes from spotless * comment cleanup * light cleanups and refactoring * [CI] Auto commit changes from spotless * remove need for score block as input * remove score block from tests * use implicit literal vector for queryVector * add random null blocks to MMROperator tests * better check for query vector data type * fix broken test * fix statement parser test * support byte/bit/hex vector query vector * [CI] Auto commit changes from spotless * update page output filter (changed since last merge) * refactored import * fix bad byte vectors in test * and cleanup tests again * ensure all byte numbers are in the proper range * cleanup verifier tests * ensure output pages are not empty * ensure query vector data type is not null --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Adds the operator and execution flow for the MMR command as well as pertinent CSV tests
Also performs a light refactoring to add a CompleteInputCollectorOperator base class for operations that require the full set of input pages to be input / processed before any output