ESQL: Prevent circular alias references in DeduplicateAggs#139175
ESQL: Prevent circular alias references in DeduplicateAggs#139175elasticsearchmachine merged 19 commits intoelastic:mainfrom
DeduplicateAggs#139175Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
bpintea
left a comment
There was a problem hiding this comment.
Thanks @kanoshiou for the fix!
Didn't look too deep at it yet, but we'd need unit tests too. This should also make it easier to understand how the defect occurs and how the fix patches it. 🙏
| ; | ||
|
|
||
|
|
||
| circularRefComplexEnrich |
There was a problem hiding this comment.
Could this test be "minimised" i.e. reduced to a minimal reproducer? This is not a must, however, if the other tests do actually that; but it would be, if there are multiple "minimum" scenarios that trigger the issue.
Also, could we make it "human" friendly. Those random identifiers are a pain to track.
Finally, would it be OK to capitalise the language keywords, to stay consistent? I'm not sure if that's always the case with these csv-spec tests, but it makes it again easier to parse the query when reviewing.
There was a problem hiding this comment.
In another PR, I was asked to include the original generated query in the CSV tests, so I have consistently followed that requirement. I assumed it was mandatory when addressing generative CSV test failures. I was also asked to include the original issue in the comment above the new capability.
| /** | ||
| * Fix for circular reference in alias chains during PushDownEnrich and aggregate deduplication. | ||
| * Prevents "Potential cycle detected" errors when aliases reference each other. | ||
| * https://github.com/elastic/elasticsearch/issues/XXXXX |
There was a problem hiding this comment.
Did you intend to add the PR number here? Or really an issue?
In any case, the GH link is optional, git will contain the history (but up to you).
There was a problem hiding this comment.
Sorry, this was AI generated and I didn’t review it carefully. This situation will not happen again.
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
|
Thank you for your review, @bpintea! Could you please advise on the types of tests that should be added? Would |
|
Ah, sorry, that wasn't clear.
Yes, the patch updates a logical plan optimiser rule, so some tests in the LogicalPlanOptimizerTests suite should do. |
|
buildkite test this please |
| assertThat(agg2.name(), is("b")); | ||
| var count2 = as(agg2.child(), Count.class); | ||
| var field2 = as(count2.field(), ReferenceAttribute.class); | ||
| assertThat(field2.name(), is("$$language_name$temp_name$23")); |
There was a problem hiding this comment.
This check will fail. If you execute the full suite you see a different id than if you only execute the particular test.
It's best to just check the name starts with "$$language_name$temp_name$"
There was a problem hiding this comment.
Thanks for your review! I have updated this test.
ncordon
left a comment
There was a problem hiding this comment.
The PR looks good from my viewpoint, thanks a lot for addressing this @kanoshiou! The only think I would maybe tweak would be what we are testing exactly.
@bpintea opinions?
| FROM firewall_logs*,languages_lookup | ||
| | RENAME language_name AS message | ||
| | LOOKUP JOIN message_types_lookup ON message | ||
| | DISSECT message "%{a}" |
There was a problem hiding this comment.
Why do we need tests with DISECT?
I think we should just have one test with RENAME after the ENRICH and with RENAME before the ENRICH instead
There was a problem hiding this comment.
This actually comes from #138346 (comment), and I didn't look into it in depth. I can remove this test if you’d like, since I believe the query below is the minimal reproducer.
FROM languages_lookup
| RENAME language_name AS message
| ENRICH languages_policy ON language_code
| STATS a = count(to_lower(language_name)), b = count(language_name)
There was a problem hiding this comment.
I'm inclined to leave only that query if you don't mind 😄
There was a problem hiding this comment.
I will remove circularRefComplexEnrich and circularRefEnrichDissect once the CI finishes 😄
There was a problem hiding this comment.
Actually I think it might be good to test all possible combinations of aliasing ENRICH can introduce:
FROM languages_lookup
| RENAME language_name AS message
| ENRICH languages_policy ON language_code
| STATS a = count(to_lower(language_name)), b = count(message)FROM languages_lookup
| ENRICH languages_policy ON language_code
| RENAME language_name AS message
| STATS a = count(to_lower(message)), b = count(message)FROM languages_lookup
| ENRICH languages_policy ON language_code WITH message=language_name
| RENAME language_name AS message
| STATS a = count(to_lower(message)), b = count(message)FROM languages_lookup
| ENRICH languages_policy ON language_code WITH message=language_name
| STATS a = count(to_lower(language_name)), b = count(message)Do you mind adding those to the pr? After that I'm happy to approve and merge
There was a problem hiding this comment.
I'm glad to add these tests. Thanks for providing them! 😊
|
buildkite test this please |
# Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
|
Hi @alex-spies @bpintea, I have updated the branch. Regarding the issue, I see two possible solutions here:
Which one do you think is better? I'd like to know your preference. |
alex-spies
left a comment
There was a problem hiding this comment.
Update the nameId of ReferenceAttribute when modifying the generated ENRICH names. This is a safe change that fundamentally resolves potential circular references in the future. The only downside is that the new nameIds are often semantically meaningless.
I think this is the best change we can make.
It's a little unfortunate that Enrich doesn't just always have aliases for its output fields. ... | EVAL 2*x will still generate an alias behind the scenes, whereas ... | ENRICH policy ON field WITH field, foo = otherfield will only create an alias for otherfield.
However, Enrich's internal fields that say which physical fields we should fetch from the enrich policy don't matter to the plan - downstream nodes can only reference the output fields. Arguably, the enrich policy fields could be represented without the use of attributes or aliases altogether.
That is to say, the name ids of the original attributes are always semantically meaningless IMHO - they're only needed in the case without aliases. But ENRICH WITH field is equivalent to ENRICH WITH field = field; and turning an un-aliased enrich field into an aliased one with a new name id for the physical enrich policy field - that's essentially the same as re-writing ENRICH WITH field into ENRICH WITH field = field (+ renaming the left hand side of the alias to deal with the name conflict).
| UnaryPlan finalGeneratingPlan = generatingPlanWithRenamedAttributes; | ||
| if (finalGeneratingPlan instanceof Enrich enrich && renameGeneratedAttributeTo.isEmpty() == false) { | ||
| finalGeneratingPlan = (UnaryPlan) enrich.transformExpressionsOnly(Alias.class, alias -> { | ||
| if (alias.child() instanceof ReferenceAttribute ra && renameGeneratedAttributeTo.containsKey(ra.name())) { | ||
| return alias.replaceChild(ra.withId(new NameId())); | ||
| } | ||
| return alias; | ||
| }); | ||
| } |
There was a problem hiding this comment.
I think we could contain the change more inside Enrich.java, by assigning a new name id when calling withGeneratedNames here.
If that still passes tests, we should probably also add a comment saying "it's safe to generate new name ids when assigning new names - the original attribute representing the field in the enrich index cannot be directly referenced in downstream plans, anyway".
|
Thank you for the review, @alex-spies! I’ve updated the branch, and it should be ready for testing now. |
|
buildkite test this please |
bpintea
left a comment
There was a problem hiding this comment.
Thanks for this contribution, @kanoshiou!
There was a problem hiding this comment.
It'd have been great to have one test with ENRICH + SORT (besides RENAME), but this can come in a next PR (you may add one yourself, @kanoshiou, or I'll make a mental note to do it myself in some other PR).
There was a problem hiding this comment.
Thanks for your review @bpintea! I will add this test in my next PR ✍️
💔 Backport failed
You can use sqren/backport to manually backport by running |
|
Hi @bpintea, If you don’t have the time, I’m happy to take care of the manual backport myself. |
…#139175) Fixes circular reference detection in alias chains during `DeduplicateAggs` optimizations. The `PushDownEnrich` optimization can create circular references when pushing `Enrich` past `Project` operations, causing `ql_illegal_argument_exception: Potential cycle detected` errors when `DeduplicateAggs`. Example query: ```esql FROM languages_lookup | RENAME language_name AS message | ENRICH languages_policy ON language_code | STATS a = count(to_lower(language_name)), b = count(language_name) ``` Skip aliases that would create circular references by checking if `alias.toAttribute()` equals the resolved `alias.child()`. Closes elastic#138346 Closes elastic#139541 (cherry picked from commit 2289e17)
…#139175) ## Summary Fixes circular reference detection in alias chains during `DeduplicateAggs` optimizations. ## Problem The `PushDownEnrich` optimization can create circular references when pushing `Enrich` past `Project` operations, causing `ql_illegal_argument_exception: Potential cycle detected` errors when `DeduplicateAggs`. Example query: ```esql FROM languages_lookup | RENAME language_name AS message | ENRICH languages_policy ON language_code | STATS a = count(to_lower(language_name)), b = count(language_name) ``` ## Changes Skip aliases that would create circular references by checking if `alias.toAttribute()` equals the resolved `alias.child()`. Closes elastic#138346 Closes elastic#139541 (cherry picked from commit 2289e17) # Conflicts: # x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…#139175) ## Summary Fixes circular reference detection in alias chains during `DeduplicateAggs` optimizations. ## Problem The `PushDownEnrich` optimization can create circular references when pushing `Enrich` past `Project` operations, causing `ql_illegal_argument_exception: Potential cycle detected` errors when `DeduplicateAggs`. Example query: ```esql FROM languages_lookup | RENAME language_name AS message | ENRICH languages_policy ON language_code | STATS a = count(to_lower(language_name)), b = count(language_name) ``` ## Changes Skip aliases that would create circular references by checking if `alias.toAttribute()` equals the resolved `alias.child()`. Closes elastic#138346 Closes elastic#139541 (cherry picked from commit 2289e17) # Conflicts: # x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java # x-pack/plugin/esql/qa/testFixtures/src/main/resources/enrich.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
#139615) ## Summary Fixes circular reference detection in alias chains during `DeduplicateAggs` optimizations. ## Problem The `PushDownEnrich` optimization can create circular references when pushing `Enrich` past `Project` operations, causing `ql_illegal_argument_exception: Potential cycle detected` errors when `DeduplicateAggs`. Example query: ```esql FROM languages_lookup | RENAME language_name AS message | ENRICH languages_policy ON language_code | STATS a = count(to_lower(language_name)), b = count(language_name) ``` ## Changes Skip aliases that would create circular references by checking if `alias.toAttribute()` equals the resolved `alias.child()`. Closes #138346 Closes #139541 (cherry picked from commit 2289e17) # Conflicts: # x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
…139175) (#139617) * ESQL: Prevent circular alias references in `DeduplicateAggs` (#139175) ## Summary Fixes circular reference detection in alias chains during `DeduplicateAggs` optimizations. ## Problem The `PushDownEnrich` optimization can create circular references when pushing `Enrich` past `Project` operations, causing `ql_illegal_argument_exception: Potential cycle detected` errors when `DeduplicateAggs`. Example query: ```esql FROM languages_lookup | RENAME language_name AS message | ENRICH languages_policy ON language_code | STATS a = count(to_lower(language_name)), b = count(language_name) ``` ## Changes Skip aliases that would create circular references by checking if `alias.toAttribute()` equals the resolved `alias.child()`. Closes #138346 Closes #139541 (cherry picked from commit 2289e17) # Conflicts: # x-pack/plugin/esql/qa/server/src/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/GenerativeRestTest.java # x-pack/plugin/esql/qa/testFixtures/src/main/resources/enrich.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java * unmute GenerativeIT * Fix tests Lookup indices are not loaded in multi-cluster tests in 9.1. Instead of languages_lookup we can also just use languages: same index, just without the lookup mode set. * Revert "Fix tests" This reverts commit b0696b5. Turns out, we cannot use the languages index because this one is also used for enrich, and the multi cluster tests index this two times, on the remote AND on the local cluster. * Use index that isn't duplicated in multi-cluster * Fix more tests... --------- Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
Summary
Fixes circular reference detection in alias chains during
DeduplicateAggsoptimizations.Problem
The
PushDownEnrichoptimization can create circular references when pushingEnrichpastProjectoperations, causingql_illegal_argument_exception: Potential cycle detectederrors whenDeduplicateAggs.Example query:
Changes
Skip aliases that would create circular references by checking if
alias.toAttribute()equals the resolvedalias.child().Closes #138346
Closes #139541