Conversation
To recap, Attributes form the properties of a derived table an each LogicalPlan has Attributes as output since each one can be part of a query and its result sent to the user. This change essentially removes the name id comparison so any changes applied to existing functions should work as long as the functions are semantically equivalent. This change enforces the hashCode and equals which has the side-effect of using hashCode as identifiers for each expression. By removing any property from an Attribute, the various components need to look the original source for comparison which, while annoying, should prevent nodes from getting out of sync due to optimizations. Remove the usage of NamedExpression as basis for all Expressions. Instead, restrict their use only for named context, such as projections by using Aliasing instead. Remove different types of Attributes and allow only FieldAttribute, UnresolvedAttribute and ReferenceAttribute. To avoid issues with rewrites, resolve the references inside the QueryContainer so the information always stays on the source. Rename ExpressionId to NameId Side-effect, simplify the rules as the state for InnerAggs doesn't have to be contained anymore. The first commit milestone from refactoring of NamedExpressions. Essentially there are only 3 types of NamedExpressions: Alias - user define (implicit or explicit) name FieldAttribute - field from Elasticsearch ReferenceAttribute - a reference to another source acting as an Attribute.
|
Pinging @elastic/es-search (:Search/SQL) |
|
Currently only one type of test is failing, namely: The fix would have to check that once |
astefan
left a comment
There was a problem hiding this comment.
Impressive amount of changes. Left some minor comments.
| countAll | ||
| schema::all_names:l|c:l | ||
| SELECT COUNT(ALL first_name) all_names, COUNT(*) c FROM test_emp; | ||
| countDistinctAndLiteral |
There was a problem hiding this comment.
Why did you replace this test? Couldn't you just keep both?
There was a problem hiding this comment.
Likely a mistake inside from merging.
|
|
||
| initial | first_name |ASCII(LEFT(first_name,1)) | ||
| ---------------+---------------+------------------------- | ||
| initial | first_name |ASCII(LEFT(first_name, 1)) |
There was a problem hiding this comment.
That's weird, I would have thought the whitespace should already have been there (before this PR).
| if (plan instanceof Aggregate) { | ||
| Aggregate a = (Aggregate) plan; | ||
| // aliases inside GROUP BY are irellevant so remove all of them | ||
| // aliases inside GROUP BY are irelevant so remove all of them |
There was a problem hiding this comment.
I think both versions are wrong, it should be irrelevant.
|
|
||
| return onlyExact.get(); | ||
| } | ||
| } |
There was a problem hiding this comment.
Formatting here seems a bit off.
| localFailures.add(fail(c, "No functions allowed (yet); encountered [{}]", c.sourceText())); | ||
| onlyExact.set(Boolean.FALSE); | ||
| } | ||
| } |
| Expression field = p.field(); | ||
| Set<Expression> percentiles = ranksPerField.get(field); | ||
| protected LogicalPlan rule(Aggregate agg) { | ||
| List<Expression> groupings = agg.groupings(); |
There was a problem hiding this comment.
I think you can make this one final.
There was a problem hiding this comment.
sure but what would the code gain by it? A lot of inner variables can be made final but we don't unless the compiler requests that.
| && Expressions.anyMatch(e.children(), Expressions::isNull)) { | ||
| return Literal.of(e, null); | ||
| } | ||
| return Literal.of(e, null); |
| // eq matches the boundary but should not be included | ||
| // eq outside the upper boundary | ||
| compare < 0 || | ||
| // eq matches the boundary but should not be included |
|
|
||
| changed = true; | ||
| } | ||
| changed = true; |
| if (dthf.calendarInterval() != null) { | ||
| key = new GroupByDateHistogram(aggId, QueryTranslator.nameOf(exp), dthf.calendarInterval(), dthf.zoneId()); | ||
| } else { | ||
| key = new GroupByDateHistogram(aggId, QueryTranslator.nameOf(exp), dthf.fixedInterval(), dthf.zoneId()); | ||
| } |
There was a problem hiding this comment.
You could write this block as key = new GroupByDateHistogram(aggId, QueryTranslator.nameOf(exp), dthf.calendarInterval() != null ? dthf.calendarInterval() : dthf.fixedInterval(), dthf.zoneId());
There was a problem hiding this comment.
No, because each method returns a different type - the first a String, the second a long so the common type would be Object at which point is unclear what constructor to call...
There was a problem hiding this comment.
You're right. Sorry, I missed that.
| @@ -1,5 +1,5 @@ | |||
| /* | |||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | |||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | |||
| public final class Verifier { | ||
| private final Metrics metrics; | ||
|
|
||
| }); | ||
| } | ||
|
|
||
| } | ||
|
|
||
| // The grouping can not be an aggregate function or an inexact field (e.g. text without a keyword) | ||
| // The grouping can not be an aggregate function or an inexact field (e.g. text without a keyword) |
There was a problem hiding this comment.
I think indentation here is wrong.
| return true; | ||
| } | ||
|
|
||
| return false; | ||
| } | ||
|
|
||
| assertEquals(column, in.value()); | ||
| assertEquals(Arrays.asList(L(1), L(2)), in.list()); | ||
| } | ||
| } |
| assertTrue(and.left() instanceof GreaterThan); | ||
| gt = (GreaterThan) and.left(); | ||
| assertEquals(a, gt.left()); | ||
|
|
| // use scripting for functions | ||
| else if (field instanceof Function) { | ||
| ScriptTemplate script = ((Function) field).asScript(); | ||
| if (dthf.calendarInterval() != null) { |
There was a problem hiding this comment.
same as above, could be simplified with dthf.calendarInterval() != null ? dthf.calendarInterval() : dthf.fixedInterval()
| String lookup = Expressions.id(orderExpression); | ||
| GroupByKey group = qContainer.findGroupForAgg(lookup); | ||
|
|
||
| // TODO: handle score |
There was a problem hiding this comment.
Should be fixed before merging the PR?
There was a problem hiding this comment.
It's already addressed later in the file (2nd else). Removed the line.
| } | ||
|
|
||
| public void testSimplifyCaseConditionsFoldCompletely_FoldableElse() { | ||
| public void testSimplifyCaseConditionsFoldCompletely() { |
There was a problem hiding this comment.
Why did you make these changes?
They were added because of a bug when the conditions were folded but the results could not be folded.
|
Thanks for the feedback which I've incorporated in the latest commit - it's unfortunate the merging resulted in these formatting mistakes but then again, the changes were significant. |
|
Thx @costin, all comments seemed addressed by now. |
non-singular expression tree against the same expression used up the tree.
The `testReplaceChildren()` has been fixed for Pivot as part of elastic#49693. Reverting: elastic#49045
To recap, Attributes form the properties of a derived table. Each LogicalPlan has Attributes as output since each one can be part of a query and as such its result are sent to its consumer. This change essentially removes the name id comparison so any changes applied to existing expressions should work as long as the said expressions are semantically equivalent. This change enforces the hashCode and equals which has the side-effect of using hashCode as identifiers for each expression. By removing any property from an Attribute, the various components need to look the original source for comparison which, while annoying, should prevent a reference from getting out of sync with its source due to optimizations. Essentially going forward there are only 3 types of NamedExpressions: Alias - user define (implicit or explicit) name FieldAttribute - field backed by Elasticsearch ReferenceAttribute - a reference to another source acting as an Attribute. Typically the Attribute of an Alias. * Remove the usage of NamedExpression as basis for all Expressions. Instead, restrict their use only for named context, such as projections by using Aliasing instead. * Remove different types of Attributes and allow only FieldAttribute, UnresolvedAttribute and ReferenceAttribute. To avoid issues with rewrites, resolve the references inside the QueryContainer so the information always stays on the source. * Side-effect, simplify the rules as the state for InnerAggs doesn't have to be contained anymore. * Improve ResolveMissingRef rule to handle references to named non-singular expression tree against the same expression used up the tree. #49693 backport to 7.x (cherry picked from commit 5d095e2)
To recap, Attributes form the properties of a derived table. Each LogicalPlan has Attributes as output since each one can be part of a query and as such its result are sent to its consumer. This change essentially removes the name id comparison so any changes applied to existing expressions should work as long as the said expressions are semantically equivalent. This change enforces the hashCode and equals which has the side-effect of using hashCode as identifiers for each expression. By removing any property from an Attribute, the various components need to look the original source for comparison which, while annoying, should prevent a reference from getting out of sync with its source due to optimizations. Essentially going forward there are only 3 types of NamedExpressions: Alias - user define (implicit or explicit) name FieldAttribute - field backed by Elasticsearch ReferenceAttribute - a reference to another source acting as an Attribute. Typically the Attribute of an Alias. * Remove the usage of NamedExpression as basis for all Expressions. Instead, restrict their use only for named context, such as projections by using Aliasing instead. * Remove different types of Attributes and allow only FieldAttribute, UnresolvedAttribute and ReferenceAttribute. To avoid issues with rewrites, resolve the references inside the QueryContainer so the information always stays on the source. * Side-effect, simplify the rules as the state for InnerAggs doesn't have to be contained anymore. * Improve ResolveMissingRef rule to handle references to named non-singular expression tree against the same expression used up the tree.
The `testReplaceChildren()` has been fixed for Pivot as part of elastic#49693. Reverting: elastic#49045
* Remove the limitation of not being able to use `InnerAggregate` inside PIVOTs (aggregations using extended and matrix stats) * The limitation was introduced as part of the original `PIVOT` implementation in #46489, but after #49693 it could be lifted. * Test that the `PIVOT` results in the same query as the `GROUP BY`. This should hold across all the `AggregateFunction`s we have.
* Remove the limitation of not being able to use `InnerAggregate` inside PIVOTs (aggregations using extended and matrix stats) * The limitation was introduced as part of the original `PIVOT` implementation in elastic#46489, but after elastic#49693 it could be lifted. * Test that the `PIVOT` results in the same query as the `GROUP BY`. This should hold across all the `AggregateFunction`s we have. (cherry-pick 67704b0)
* Remove the limitation of not being able to use `InnerAggregate` inside PIVOTs (aggregations using extended and matrix stats) * The limitation was introduced as part of the original `PIVOT` implementation in #46489, but after #49693 it could be lifted. * Test that the `PIVOT` results in the same query as the `GROUP BY`. This should hold across all the `AggregateFunction`s we have. (cherry-picked from 67704b0)
To recap, Attributes form the properties of a derived table an each
LogicalPlan has Attributes as output since each one can be part of a
query and its result sent to the user.
This change essentially removes the name id comparison so any changes
applied to existing functions should work as long as the functions are
semantically equivalent.
This change enforces the hashCode and equals which has the side-effect
of using hashCode as identifiers for each expression.
By removing any property from an Attribute, the various components need
to look the original source for comparison which, while annoying, should
prevent nodes from getting out of sync due to optimizations.
Remove the usage of NamedExpression as basis for all Expressions.
Instead, restrict their use only for named context, such as projections
by using Aliasing instead.
Remove different types of Attributes and allow only FieldAttribute,
UnresolvedAttribute and ReferenceAttribute. To avoid issues with
rewrites, resolve the references inside the QueryContainer so the
information always stays on the source.
Rename ExpressionId to NameId
Side-effect, simplify the rules as the state for InnerAggs doesn't have
to be contained anymore.
The first commit milestone from refactoring of NamedExpressions.
Essentially there are only 3 types of NamedExpressions:
Alias - user define (implicit or explicit) name
FieldAttribute - field from Elasticsearch
ReferenceAttribute - a reference to another source acting as an
Attribute.
Relates to #46954
Superseeds #49570