Skip to content

Refactor named expression 3#49693

Merged
costin merged 5 commits intoelastic:masterfrom
costin:refactor_named_expression_3
Dec 6, 2019
Merged

Refactor named expression 3#49693
costin merged 5 commits intoelastic:masterfrom
costin:refactor_named_expression_3

Conversation

@costin
Copy link
Copy Markdown
Member

@costin costin commented Nov 28, 2019

To recap, Attributes form the properties of a derived table an each
LogicalPlan has Attributes as output since each one can be part of a
query and its result sent to the user.

This change essentially removes the name id comparison so any changes
applied to existing functions should work as long as the functions are
semantically equivalent.
This change enforces the hashCode and equals which has the side-effect
of using hashCode as identifiers for each expression.

By removing any property from an Attribute, the various components need
to look the original source for comparison which, while annoying, should
prevent nodes from getting out of sync due to optimizations.

Remove the usage of NamedExpression as basis for all Expressions.
Instead, restrict their use only for named context, such as projections
by using Aliasing instead.

Remove different types of Attributes and allow only FieldAttribute,
UnresolvedAttribute and ReferenceAttribute. To avoid issues with
rewrites, resolve the references inside the QueryContainer so the
information always stays on the source.

Rename ExpressionId to NameId

Side-effect, simplify the rules as the state for InnerAggs doesn't have
to be contained anymore.

The first commit milestone from refactoring of NamedExpressions.
Essentially there are only 3 types of NamedExpressions:

Alias - user define (implicit or explicit) name
FieldAttribute - field from Elasticsearch
ReferenceAttribute - a reference to another source acting as an
Attribute.

Relates to #46954
Superseeds #49570

To recap, Attributes form the properties of a derived table an each
LogicalPlan has Attributes as output since each one can be part of a
query and its result sent to the user.

This change essentially removes the name id comparison so any changes
applied to existing functions should work as long as the functions are
semantically equivalent.
This change enforces the hashCode and equals which has the side-effect
of using hashCode as identifiers for each expression.

By removing any property from an Attribute, the various components need
to look the original source for comparison which, while annoying, should
prevent nodes from getting out of sync due to optimizations.

Remove the usage of NamedExpression as basis for all Expressions.
Instead, restrict their use only for named context, such as projections
by using Aliasing instead.

Remove different types of Attributes and allow only FieldAttribute,
UnresolvedAttribute and ReferenceAttribute. To avoid issues with
rewrites, resolve the references inside the QueryContainer so the
information always stays on the source.

Rename ExpressionId to NameId

Side-effect, simplify the rules as the state for InnerAggs doesn't have
to be contained anymore.

The first commit milestone from refactoring of NamedExpressions.
Essentially there are only 3 types of NamedExpressions:

Alias - user define (implicit or explicit) name
FieldAttribute - field from Elasticsearch
ReferenceAttribute - a reference to another source acting as an
Attribute.
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search (:Search/SQL)

@costin
Copy link
Copy Markdown
Member Author

costin commented Nov 28, 2019

Currently only one type of test is failing, namely:
SELECT PI() + f FROM t GROUP BY PI() + f ORDER BY PI() + f
The reason behind it is that Aggregate returns an attribute reference for its output PI() + f however OrderBy wants to resolve PI() and f individually.
To do so it would need to push f down however since the grouping is only on PI() + f it fails.

The fix would have to check that once f is resolved whether PI() + f matches the output of children and if so, skip the propagation of the newly resolved field.

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive amount of changes. Left some minor comments.

countAll
schema::all_names:l|c:l
SELECT COUNT(ALL first_name) all_names, COUNT(*) c FROM test_emp;
countDistinctAndLiteral
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you replace this test? Couldn't you just keep both?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely a mistake inside from merging.


initial | first_name |ASCII(LEFT(first_name,1))
---------------+---------------+-------------------------
initial | first_name |ASCII(LEFT(first_name, 1))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's weird, I would have thought the whitespace should already have been there (before this PR).

if (plan instanceof Aggregate) {
Aggregate a = (Aggregate) plan;
// aliases inside GROUP BY are irellevant so remove all of them
// aliases inside GROUP BY are irelevant so remove all of them
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both versions are wrong, it should be irrelevant.


return onlyExact.get();
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting here seems a bit off.

localFailures.add(fail(c, "No functions allowed (yet); encountered [{}]", c.sourceText()));
onlyExact.set(Boolean.FALSE);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, formatting is off.

Expression field = p.field();
Set<Expression> percentiles = ranksPerField.get(field);
protected LogicalPlan rule(Aggregate agg) {
List<Expression> groupings = agg.groupings();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can make this one final.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure but what would the code gain by it? A lot of inner variables can be made final but we don't unless the compiler requests that.

&& Expressions.anyMatch(e.children(), Expressions::isNull)) {
return Literal.of(e, null);
}
return Literal.of(e, null);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation issue.

// eq matches the boundary but should not be included
// eq outside the upper boundary
compare < 0 ||
// eq matches the boundary but should not be included
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again indentation issues.


changed = true;
}
changed = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indendation.

Comment on lines +293 to +297
if (dthf.calendarInterval() != null) {
key = new GroupByDateHistogram(aggId, QueryTranslator.nameOf(exp), dthf.calendarInterval(), dthf.zoneId());
} else {
key = new GroupByDateHistogram(aggId, QueryTranslator.nameOf(exp), dthf.fixedInterval(), dthf.zoneId());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could write this block as key = new GroupByDateHistogram(aggId, QueryTranslator.nameOf(exp), dthf.calendarInterval() != null ? dthf.calendarInterval() : dthf.fixedInterval(), dthf.zoneId());

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because each method returns a different type - the first a String, the second a long so the common type would be Object at which point is unclear what constructor to call...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Sorry, I missed that.

@@ -1,5 +1,5 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation issue.

Copy link
Copy Markdown
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costin Thanx a lot for this effort and the massive changes involved to make the code more clear and our lives easier in the future.
I left a series of comments but most of them are minor and code formatting related.

public final class Verifier {
private final Metrics metrics;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert.

});
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert.

}

// The grouping can not be an aggregate function or an inexact field (e.g. text without a keyword)
// The grouping can not be an aggregate function or an inexact field (e.g. text without a keyword)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think indentation here is wrong.

return true;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert.

return false;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

assertEquals(column, in.value());
assertEquals(Arrays.asList(L(1), L(2)), in.list());
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong indentation

assertTrue(and.left() instanceof GreaterThan);
gt = (GreaterThan) and.left();
assertEquals(a, gt.left());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra empty line.

// use scripting for functions
else if (field instanceof Function) {
ScriptTemplate script = ((Function) field).asScript();
if (dthf.calendarInterval() != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, could be simplified with dthf.calendarInterval() != null ? dthf.calendarInterval() : dthf.fixedInterval()

String lookup = Expressions.id(orderExpression);
GroupByKey group = qContainer.findGroupForAgg(lookup);

// TODO: handle score
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed before merging the PR?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already addressed later in the file (2nd else). Removed the line.

}

public void testSimplifyCaseConditionsFoldCompletely_FoldableElse() {
public void testSimplifyCaseConditionsFoldCompletely() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you make these changes?
They were added because of a bug when the conditions were folded but the results could not be folded.

@matriv
Copy link
Copy Markdown
Contributor

matriv commented Nov 30, 2019

@costin can this: #48997 be unmuted now?

@costin
Copy link
Copy Markdown
Member Author

costin commented Dec 5, 2019

Thanks for the feedback which I've incorporated in the latest commit - it's unfortunate the merging resulted in these formatting mistakes but then again, the changes were significant.
I'm now trying to fix the resolution issue.

@matriv
Copy link
Copy Markdown
Contributor

matriv commented Dec 5, 2019

Thx @costin, all comments seemed addressed by now.

Copy link
Copy Markdown
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

matriv added a commit to matriv/elasticsearch that referenced this pull request Dec 6, 2019
The `testReplaceChildren()` has been fixed for Pivot as part
of elastic#49693.

Reverting: elastic#49045
costin added a commit that referenced this pull request Dec 7, 2019
To recap, Attributes form the properties of a derived table.
Each LogicalPlan has Attributes as output since each one can be part of
a query and as such its result are sent to its consumer.
This change essentially removes the name id comparison so any changes
applied to existing expressions should work as long as the said
expressions are semantically equivalent.
This change enforces the hashCode and equals which has the side-effect
of using hashCode as identifiers for each expression.
By removing any property from an Attribute, the various components need
to look the original source for comparison which, while annoying, should
prevent a reference from getting out of sync with its source due to
optimizations.

Essentially going forward there are only 3 types of NamedExpressions:

Alias - user define (implicit or explicit) name
FieldAttribute - field backed by Elasticsearch
ReferenceAttribute - a reference to another source acting as an
Attribute. Typically the Attribute of an Alias.

* Remove the usage of NamedExpression as basis for all Expressions.
Instead, restrict their use only for named context, such as projections
by using Aliasing instead.
* Remove different types of Attributes and allow only FieldAttribute,
UnresolvedAttribute and ReferenceAttribute. To avoid issues with
rewrites, resolve the references inside the QueryContainer so the
information always stays on the source.
* Side-effect, simplify the rules as the state for InnerAggs doesn't
have to be contained anymore.
* Improve ResolveMissingRef rule to handle references to named
non-singular expression tree against the same expression used up the
tree.

#49693 backport to 7.x

(cherry picked from commit 5d095e2)
matriv added a commit that referenced this pull request Dec 9, 2019
The `testReplaceChildren()` has been fixed for Pivot as part
of #49693.

Reverting: #49045
matriv added a commit that referenced this pull request Dec 9, 2019
The `testReplaceChildren()` has been fixed for Pivot as part
of #49693.

Reverting: #49045
(cherry picked from commit 4b9b9ed)
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
To recap, Attributes form the properties of a derived table.
Each LogicalPlan has Attributes as output since each one can be part of a
query and as such its result are sent to its consumer.
This change essentially removes the name id comparison so any changes
applied to existing expressions should work as long as the said expressions
are semantically equivalent.
This change enforces the hashCode and equals which has the side-effect
of using hashCode as identifiers for each expression.
By removing any property from an Attribute, the various components need
to look the original source for comparison which, while annoying, should
prevent a reference from getting out of sync with its source due to 
optimizations.

Essentially going forward there are only 3 types of NamedExpressions:

Alias - user define (implicit or explicit) name
FieldAttribute - field backed by Elasticsearch
ReferenceAttribute - a reference to another source acting as an
Attribute. Typically the Attribute of an Alias.

* Remove the usage of NamedExpression as basis for all Expressions.
Instead, restrict their use only for named context, such as projections
by using Aliasing instead.
* Remove different types of Attributes and allow only FieldAttribute,
UnresolvedAttribute and ReferenceAttribute. To avoid issues with
rewrites, resolve the references inside the QueryContainer so the
information always stays on the source.
* Side-effect, simplify the rules as the state for InnerAggs doesn't have
to be contained anymore.
* Improve ResolveMissingRef rule to handle references to named
non-singular expression tree against the same expression used up the
tree.
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
The `testReplaceChildren()` has been fixed for Pivot as part
of elastic#49693.

Reverting: elastic#49045
palesz pushed a commit that referenced this pull request Dec 7, 2020
* Remove the limitation of not being able to use `InnerAggregate`
inside PIVOTs (aggregations using extended and matrix stats)
* The limitation was introduced as part of the original `PIVOT` 
implementation in #46489, but after #49693 it could be lifted.
* Test that the `PIVOT` results in the same query as the 
`GROUP BY`. This should hold across all the 
`AggregateFunction`s we have.
palesz pushed a commit to palesz/elasticsearch that referenced this pull request Dec 7, 2020
* Remove the limitation of not being able to use `InnerAggregate`
inside PIVOTs (aggregations using extended and matrix stats)
* The limitation was introduced as part of the original `PIVOT`
implementation in elastic#46489, but after elastic#49693 it could be lifted.
* Test that the `PIVOT` results in the same query as the
`GROUP BY`. This should hold across all the
`AggregateFunction`s we have.
(cherry-pick 67704b0)
palesz pushed a commit that referenced this pull request Dec 7, 2020
* Remove the limitation of not being able to use `InnerAggregate`
inside PIVOTs (aggregations using extended and matrix stats)
* The limitation was introduced as part of the original `PIVOT`
implementation in #46489, but after #49693 it could be lifted.
* Test that the `PIVOT` results in the same query as the
`GROUP BY`. This should hold across all the
`AggregateFunction`s we have.

(cherry-picked from  67704b0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants