Replace remote Enrich hack with proper handling of remote Enrich#134967
Replace remote Enrich hack with proper handling of remote Enrich#134967smalyshev merged 96 commits intoelastic:mainfrom
Conversation
alex-spies
left a comment
There was a problem hiding this comment.
I had a first look at the new optimizer rules and left some comments.
...rc/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichTopN.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichTopN.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichTopN.java
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichTopN.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichTopN.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichTopN.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/TopN.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichLimit.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichLimit.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichLimit.java
Outdated
Show resolved
Hide resolved
alex-spies
left a comment
There was a problem hiding this comment.
Ok, I think there's one (final) bugfix require in HoistRemoteEnrichTopN, but it's easily fixable and the hoisting looks right to me other than that - see below.
| // Replace the missing attributes in order expressions with their aliases | ||
| for (Order expr : topN.order()) { | ||
| rewrittenExpressions.add(expr.transformUp(Attribute.class, attr -> { | ||
| if (missingNames.contains(attr.name())) { |
There was a problem hiding this comment.
nit: comparing by name isn't wrong here, but usually we use an AttributeSet, where .contains works by NameId. That'd avoid the dance where we start with a Set<Attribute> (not the same!) and then create a Set<String> from it.
...rc/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichTopN.java
Outdated
Show resolved
Hide resolved
| case Project pr -> { | ||
| List<NamedExpression> allFields = new LinkedList<>(pr.projections()); | ||
| allFields.addAll(eval.fields()); | ||
| yield pr.withProjections(allFields); |
There was a problem hiding this comment.
I think this remark is still valid. In fact, I think this is a bug.
Because we use transformDown, we won't just traverse until we hit the TopN; we will traverse the whole plan and update every project we find.
And even if we only traverse until we hit the TopN, there may be projections on the way that don't have access to the renamed attributes (because another plan dropped them earlier).
I think we can do this instead, in two passes:
- pass: update the topn (and place the eval over it)
- pass: transformUp the plan and update only those projections whose inputs contain the renamed attributes, so these projections don't drop them.
alex-spies
left a comment
There was a problem hiding this comment.
This is great, thank you @smalyshev ! Finally, remote enrich will be working correctly and we don't have to work around it!
That was a seriously tricky one, and I'm really happy you made it work!
I have only some last, minor comments - let's
!
| package org.elasticsearch.xpack.esql.plan.logical; | ||
|
|
||
| /** | ||
| * This interface marks a command which does not add or remove rows. |
There was a problem hiding this comment.
Please, let's update the precise definition of this interface. I gave a suggestion above. This interface is implicitly "any command that we can hoist topn and limits past", but currently it says "can be commutated with limits" - which is both too much (LIMIT X | MY_COMMAND has to be the same as local LIMIT X | MY_COMMAND | LIMIT X) and too little (doesn't say anything about topns).
| case Project pr -> { | ||
| List<NamedExpression> allFields = new LinkedList<>(pr.projections()); | ||
| allFields.addAll(eval.fields()); | ||
| yield pr.withProjections(allFields); |
There was a problem hiding this comment.
Rather than testing the whole optimizer chain, you can construct a specific logical plan and test the application of a single rule pass.
Alternatively, you can invent a test-only logical plan node that implements CardinalityPreserving and place a projection under it. The projection won't travel upwards because there's no pushdown rule for the new plan node.
| case Project pr -> { | ||
| List<NamedExpression> allFields = new LinkedList<>(pr.projections()); | ||
| allFields.addAll(eval.fields()); | ||
| yield pr.withProjections(allFields); |
There was a problem hiding this comment.
That would be a failure since if somebody drops our attributes we can no longer carry them to topN copy. Unless there is a possibility of nodes dropping attributes and then magically un-dropping them, the verification step below should catch it I think?
The verification step will fail later downstream with a rather generic server error in this case ("optimized incorrectly due to missing attributes"). It'd be a tad nicer to check or assert that we're not creating a broken plan to begin with.
Consider an invented plan node NewProjection, which is silly because it does the same as a Project without being one. (Less silly would be plans that always output the same, fixed attributes; we don't have such plans, yet, but we could!)
If you have TopN BY field1, field2 | NewProjection | Project | remote Enrich, then placing the EVAL $$renamed$field1 = field1, $$renamed$field2 = field2 after TopN will not be sufficient, because the renamed attributes get dropped by NewProjection. If we now go and just stuff the $$renamed$fieldX attributes into the output of the Project - the remote Enrich will think "yay! I got the attributes I needed to pass to the new TopN!" but, in fact, our plan will just be deeply broken. And plan verification will catch this, but it'd be nicer to not create a broken plan to begin with.
This can be prevented by checking (or at least, asserting) that the InputSet of the Project still contains $$renamed$fieldX. For this to be correct, we need to transformUp (not down!) to propagate the renamed attributes through all the projects that may still be in the plan.
This is quite defensive, but it'd also tackle the problem that we didn't stop updating projects even below the original topn.
But your solution of just stopping early works, too, of course! So, please consider this a nit as of now.
...in/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java
Outdated
Show resolved
Hide resolved
...java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineLimitsTests.java
Show resolved
Hide resolved
...st/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/HoistRemoteEnrichTopNTests.java
Show resolved
Hide resolved
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…stic#134967) - The remote Enrich hack and the verification workaround that it necessitates are removed - Limit with remote Enrich is handled by duplicating the limits around Enrich - TopN with remote Enrich is handled by duplicating around Enrich and adjusting projections - Introduced `CardinalityPreserving` interface to mark nodes that preserve cardinality - Introduced the capability to run optimizations only on coordinator plan (`CoordinatorOnly` interface) - Added post-optimizer check that ensures remote Enrich is not moved to the coordinator plan (cherry picked from commit fd6e962) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalVerifier.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/PhysicalVerifier.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/PostOptimizationPhasePlanVerifier.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineLimits.java
#134967) (#136901) * Replace remote Enrich hack with proper handling of remote Enrich (#134967) - The remote Enrich hack and the verification workaround that it necessitates are removed - Limit with remote Enrich is handled by duplicating the limits around Enrich - TopN with remote Enrich is handled by duplicating around Enrich and adjusting projections - Introduced `CardinalityPreserving` interface to mark nodes that preserve cardinality - Introduced the capability to run optimizations only on coordinator plan (`CoordinatorOnly` interface) - Added post-optimizer check that ensures remote Enrich is not moved to the coordinator plan (cherry picked from commit fd6e962) # Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalVerifier.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/PhysicalVerifier.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/PostOptimizationPhasePlanVerifier.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PushDownAndCombineLimits.java * fix merge
…stic#134967) - The remote Enrich hack and the verification workaround that it necessitates are removed - Limit with remote Enrich is handled by duplicating the limits around Enrich - TopN with remote Enrich is handled by duplicating around Enrich and adjusting projections - Introduced `CardinalityPreserving` interface to mark nodes that preserve cardinality - Introduced the capability to run optimizations only on coordinator plan (`CoordinatorOnly` interface) - Added post-optimizer check that ensures remote Enrich is not moved to the coordinator plan
CardinalityPreservinginterface to mark nodes that preserve cardinalityCoordinatorOnlyinterface)TODO: this does not fix the issue of local (not remote) Lookup Join always being forced on the coordinator by Limits, we may address this in the followup (#136227), this one is meaty enough as it is.
Closes #115897
Fixes #118531