Refactor named expression 3 by costin · Pull Request #49693 · elastic/elasticsearch

costin · 2019-11-28T17:55:56Z

To recap, Attributes form the properties of a derived table, and each
LogicalPlan has Attributes as output since each one can be part of a
query and have its result sent to the user.

This change essentially removes the name id comparison so any changes
applied to existing functions should work as long as the functions are
semantically equivalent.
This change enforces hashCode and equals, which has the side effect
of using the hashCode as the identifier for each expression.
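
As a rough illustration of what comparing expressions semantically (rather than by name id) can look like, here is a minimal sketch. The classes `Expr`, `Field` and `Add` are hypothetical stand-ins invented for this example, not the actual classes touched by the PR.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins for expression classes (not the real SQL plugin types).
abstract class Expr {
}

final class Field extends Expr {
    final String name;

    Field(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        // Two fields are equal when they point at the same field name; no id is involved.
        return o instanceof Field && name.equals(((Field) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

final class Add extends Expr {
    final Expr left;
    final Expr right;

    Add(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Add)) {
            return false;
        }
        Add other = (Add) o;
        // Equality is structural: same operator, equal operands.
        return left.equals(other.left) && right.equals(other.right);
    }

    @Override
    public int hashCode() {
        return Objects.hash(left, right);
    }
}

final class SemanticEqualityDemo {
    // Builds the same expression twice and checks that the two distinct instances compare equal.
    static boolean sameExpression() {
        Expr first = new Add(new Field("a"), new Field("b"));
        Expr second = new Add(new Field("a"), new Field("b"));
        return first != second && first.equals(second) && first.hashCode() == second.hashCode();
    }
}
```

With this kind of equality, a rule that rebuilds `a + b` into a new instance still produces something that matches the original wherever it is looked up.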

By removing any property from an Attribute, the various components need
to look up the original source for comparison which, while annoying, should
prevent nodes from getting out of sync due to optimizations.

Remove the usage of NamedExpression as the basis for all Expressions.
Instead, restrict its use to named contexts, such as projections,
by using aliasing.

Remove different types of Attributes and allow only FieldAttribute,
UnresolvedAttribute and ReferenceAttribute. To avoid issues with
rewrites, resolve the references inside the QueryContainer so the
information always stays on the source.

Rename ExpressionId to NameId

Side-effect, simplify the rules as the state for InnerAggs doesn't have
to be contained anymore.

The first commit milestone from refactoring of NamedExpressions.
Essentially there are only 3 types of NamedExpressions:

Alias - user-defined (implicit or explicit) name
FieldAttribute - field from Elasticsearch
ReferenceAttribute - a reference to another source acting as an
Attribute.
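
The three kinds listed above could be pictured roughly as the hierarchy below. This is only a sketch under simplifying assumptions: the constructors, the `toAttribute()` helper and the lack of any source, data type or id information are choices made for the example and do not mirror the real class signatures.

```java
// Hypothetical, heavily simplified stand-ins; the real classes carry far more state.
abstract class Expression {
}

abstract class NamedExpression extends Expression {
    private final String name;

    NamedExpression(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

// A user-defined (implicit or explicit) name wrapped around an arbitrary child expression.
final class Alias extends NamedExpression {
    private final Expression child;

    Alias(String name, Expression child) {
        super(name);
        this.child = child;
    }

    Expression child() {
        return child;
    }

    // Assumption for illustration: other plan nodes refer to the alias through a reference.
    ReferenceAttribute toAttribute() {
        return new ReferenceAttribute(name());
    }
}

abstract class Attribute extends NamedExpression {
    Attribute(String name) {
        super(name);
    }
}

// A field coming from Elasticsearch.
final class FieldAttribute extends Attribute {
    FieldAttribute(String name) {
        super(name);
    }
}

// A reference to another source (for example an Alias) acting as an Attribute.
final class ReferenceAttribute extends Attribute {
    ReferenceAttribute(String name) {
        super(name);
    }
}
```

In this sketch, `new Alias("total", new FieldAttribute("salary")).toAttribute()` yields a `ReferenceAttribute` named `total`, while the information about what `total` actually computes stays on the alias itself.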

Relates to #46954
Supersedes #49570

elasticmachine · 2019-11-28T17:55:58Z

Pinging @elastic/es-search (:Search/SQL)

costin · 2019-11-28T17:59:59Z

Currently only one type of test is failing, namely:
SELECT PI() + f FROM t GROUP BY PI() + f ORDER BY PI() + f
The reason behind it is that Aggregate returns an attribute reference for its output PI() + f; however, OrderBy wants to resolve PI() and f individually.
To do so it would need to push f down; however, since the grouping is only on PI() + f, it fails.

The fix would have to check, once f is resolved, whether PI() + f matches the output of the children and, if so, skip the propagation of the newly resolved field.
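
A minimal sketch of the kind of check described here, assuming expressions compare structurally. `Node` and `MissingRefCheck` are hypothetical names made up for the example; the actual rule in the analyzer works on the real plan and expression classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified expression node: an operator or leaf name plus its children.
final class Node {
    final String name;
    final List<Node> children;

    Node(String name, Node... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Node)) {
            return false;
        }
        Node other = (Node) o;
        return name.equals(other.name) && children.equals(other.children);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, children);
    }
}

final class MissingRefCheck {
    // True when the ORDER BY expression as a whole is already produced by the child plan,
    // in which case its leaves (such as f) do not need to be pushed down.
    static boolean coveredByChildOutput(Node orderExpression, List<Node> childOutput) {
        return childOutput.contains(orderExpression);
    }
}
```

For the failing query, the Aggregate output would contain `PI() + f`, so the check succeeds for the whole ORDER BY expression but not for `f` on its own, which is the case where propagation should be skipped.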

astefan

Impressive amount of changes. Left some minor comments.

astefan · 2019-11-29T07:45:51Z

x-pack/plugin/sql/qa/src/main/resources/agg.csv-spec

-countAll
-schema::all_names:l|c:l
-SELECT COUNT(ALL first_name) all_names, COUNT(*) c FROM test_emp;
+countDistinctAndLiteral


Why did you replace this test? Couldn't you just keep both?

Likely a mistake from merging.

astefan · 2019-11-29T07:47:29Z

x-pack/plugin/sql/qa/src/main/resources/math.csv-spec


-    initial    |  first_name   |ASCII(LEFT(first_name,1))
---------------+---------------+-------------------------
+    initial    |  first_name   |ASCII(LEFT(first_name, 1))


That's weird, I would have thought the whitespace should already have been there (before this PR).

astefan · 2019-11-29T07:57:02Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/analysis/analyzer/Analyzer.java

            if (plan instanceof Aggregate) {
                Aggregate a = (Aggregate) plan;
-                // aliases inside GROUP BY are irellevant so remove all of them
+                // aliases inside GROUP BY are irelevant so remove all of them


I think both versions are wrong, it should be irrelevant.

astefan · 2019-11-29T09:54:29Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/analysis/analyzer/Verifier.java


        return onlyExact.get();
-    }
+        }


Formatting here seems a bit off.

astefan · 2019-11-29T10:29:22Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/analysis/analyzer/Verifier.java

                localFailures.add(fail(c, "No functions allowed (yet); encountered [{}]", c.sourceText()));
                onlyExact.set(Boolean.FALSE);
-            }
+    }


Again, formatting is off.

astefan · 2019-11-29T13:17:24Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java

-                Expression field = p.field();
-                Set<Expression> percentiles = ranksPerField.get(field);
+        protected LogicalPlan rule(Aggregate agg) {
+            List<Expression> groupings = agg.groupings();


I think you can make this one final.

Sure, but what would the code gain by it? A lot of inner variables can be made final but we don't unless the compiler requests that.

astefan · 2019-11-29T13:23:55Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java

                    && Expressions.anyMatch(e.children(), Expressions::isNull)) {
-                return Literal.of(e, null);
-            }
+                        return Literal.of(e, null);


Indentation issue.

astefan · 2019-11-29T13:25:49Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java

-                                 // eq matches the boundary but should not be included
+                            // eq outside the upper boundary
+                            compare < 0 ||
+                            // eq matches the boundary but should not be included


Again indentation issues.

astefan · 2019-11-29T13:26:04Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java


-                            changed = true;
-                        }
+                                    changed = true;


Indentation.

astefan · 2019-11-29T13:48:52Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/planner/QueryFolder.java

+                            if (dthf.calendarInterval() != null) {
+                                key = new GroupByDateHistogram(aggId, QueryTranslator.nameOf(exp), dthf.calendarInterval(), dthf.zoneId());
+                            } else {
+                                key = new GroupByDateHistogram(aggId, QueryTranslator.nameOf(exp), dthf.fixedInterval(), dthf.zoneId());
+                            }


You could write this block as key = new GroupByDateHistogram(aggId, QueryTranslator.nameOf(exp), dthf.calendarInterval() != null ? dthf.calendarInterval() : dthf.fixedInterval(), dthf.zoneId());

No, because each method returns a different type: the first a String, the second a long, so the common type would be Object, at which point it is unclear which constructor to call.

You're right. Sorry, I missed that.
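
To make the point concrete, here is a small sketch with a made-up class `GroupKey` that has one constructor per interval type, loosely modeled on the exchange above. The names and the `description` field are assumptions for illustration only.

```java
// Hypothetical stand-in with two overloaded constructors, one per interval type.
final class GroupKey {
    final String description;

    GroupKey(String calendarInterval) {
        this.description = "calendar:" + calendarInterval;
    }

    GroupKey(long fixedInterval) {
        this.description = "fixed:" + fixedInterval;
    }
}

final class IntervalChoice {
    static GroupKey keyFor(String calendarInterval, long fixedInterval) {
        // The one-liner
        //     new GroupKey(calendarInterval != null ? calendarInterval : fixedInterval)
        // does not compile: the two branches have different types (String and long),
        // so neither constructor is applicable to the conditional expression.
        if (calendarInterval != null) {
            return new GroupKey(calendarInterval);
        }
        return new GroupKey(fixedInterval);
    }
}
```

Keeping the explicit `if`/`else` lets the compiler pick the overload from the static type of each branch.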

matriv · 2019-11-30T20:00:58Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/analysis/analyzer/Verifier.java

@@ -1,5 +1,5 @@
 /*
- * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one


indentation issue.

matriv

@costin Thanx a lot for this effort and the massive changes involved to make the code more clear and our lives easier in the future.
I left a series of comments but most of them are minor and code formatting related.

matriv · 2019-11-30T20:01:16Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/analysis/analyzer/Verifier.java

 public final class Verifier {
    private final Metrics metrics;
-
+    


Please revert.

matriv · 2019-11-30T20:01:46Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/analysis/analyzer/Verifier.java

Please revert.

matriv · 2019-11-30T20:02:51Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/analysis/analyzer/Verifier.java

    }

-    // The grouping can not be an aggregate function or an inexact field (e.g. text without a keyword)
+            // The grouping can not be an aggregate function or an inexact field (e.g. text without a keyword)


I think indentation here is wrong.

matriv · 2019-11-30T20:03:26Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/analysis/analyzer/Verifier.java

            return true;
        }
-
+        


Please revert.

matriv · 2019-11-30T20:03:35Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/analysis/analyzer/Verifier.java

        return false;
    }
-
+    


matriv · 2019-11-30T21:01:24Z

x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java

        assertEquals(column, in.value());
        assertEquals(Arrays.asList(L(1), L(2)), in.list());
-    }
+}


wrong indentation

matriv · 2019-11-30T21:01:41Z

x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java

+        assertTrue(and.left() instanceof GreaterThan);
+        gt = (GreaterThan) and.left();
+        assertEquals(a, gt.left());
+


extra empty line.

matriv · 2019-11-30T21:09:19Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/planner/QueryFolder.java

+                        // use scripting for functions
+                        else if (field instanceof Function) {
+                            ScriptTemplate script = ((Function) field).asScript();
+                            if (dthf.calendarInterval() != null) {


same as above, could be simplified with dthf.calendarInterval() != null ? dthf.calendarInterval() : dthf.fixedInterval()

matriv · 2019-11-30T21:11:49Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/planner/QueryFolder.java

+                    String lookup = Expressions.id(orderExpression);
+                    GroupByKey group = qContainer.findGroupForAgg(lookup);

+                    // TODO: handle score


Should be fixed before merging the PR?

It's already addressed later in the file (2nd else). Removed the line.

matriv · 2019-11-30T21:15:40Z

x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java

    }

-    public void testSimplifyCaseConditionsFoldCompletely_FoldableElse() {
+    public void testSimplifyCaseConditionsFoldCompletely() {


Why did you make these changes?
They were added because of a bug when the conditions were folded but the results could not be folded.

matriv · 2019-11-30T21:19:14Z

@costin can this: #48997 be unmuted now?

costin · 2019-12-05T12:07:51Z

Thanks for the feedback which I've incorporated in the latest commit - it's unfortunate the merging resulted in these formatting mistakes but then again, the changes were significant.
I'm now trying to fix the resolution issue.

matriv · 2019-12-05T19:50:12Z

Thx @costin, all comments seemed addressed by now.

astefan

LGTM

The `testReplaceChildren()` has been fixed for Pivot as part of elastic#49693. Reverting: elastic#49045

To recap, Attributes form the properties of a derived table. Each LogicalPlan has Attributes as output since each one can be part of a query and as such its results are sent to its consumer.

This change essentially removes the name id comparison so any changes applied to existing expressions should work as long as the said expressions are semantically equivalent. This change enforces hashCode and equals, which has the side effect of using the hashCode as the identifier for each expression.

By removing any property from an Attribute, the various components need to look up the original source for comparison which, while annoying, should prevent a reference from getting out of sync with its source due to optimizations.

Essentially going forward there are only 3 types of NamedExpressions:

Alias - user-defined (implicit or explicit) name
FieldAttribute - field backed by Elasticsearch
ReferenceAttribute - a reference to another source acting as an Attribute. Typically the Attribute of an Alias.

* Remove the usage of NamedExpression as the basis for all Expressions. Instead, restrict its use to named contexts, such as projections, by using aliasing.
* Remove different types of Attributes and allow only FieldAttribute, UnresolvedAttribute and ReferenceAttribute. To avoid issues with rewrites, resolve the references inside the QueryContainer so the information always stays on the source.
* Side-effect, simplify the rules as the state for InnerAggs doesn't have to be contained anymore.
* Improve ResolveMissingRef rule to handle references to named non-singular expression tree against the same expression used up the tree.

#49693 backport to 7.x (cherry picked from commit 5d095e2)

* Remove the limitation of not being able to use `InnerAggregate` inside PIVOTs (aggregations using extended and matrix stats)
* The limitation was introduced as part of the original `PIVOT` implementation in #46489, but after #49693 it could be lifted.
* Test that the `PIVOT` results in the same query as the `GROUP BY`. This should hold across all the `AggregateFunction`s we have.

costin added 2 commits November 28, 2019 10:38

Tweak rebase on top of master

c197210

costin added >refactoring :Analytics/SQL SQL querying v8.0.0 labels Nov 28, 2019

costin requested review from astefan, bpintea and matriv November 28, 2019 17:55

costin mentioned this pull request Nov 28, 2019

SQL: Refactor usage of NamedExpression #49570

Closed

astefan reviewed Nov 29, 2019

costin mentioned this pull request Nov 29, 2019

SQL: Refactor named expression #46954

Closed

matriv reviewed Nov 30, 2019

costin mentioned this pull request Dec 2, 2019

Extract common/reusable components from SQL for EQL #49773

Closed

6 tasks

Address feedback

de6d165

astefan approved these changes Dec 6, 2019

costin added 2 commits December 6, 2019 14:30

Improve ResolveMissingRef rule to handle references to named

f027fbe

non-singular expression tree against the same expression used up the tree.

Merge branch 'master' into refactor_named_expression_2

c134d20

costin added the high hanging fruit label Dec 6, 2019

costin merged commit 5d095e2 into elastic:master Dec 6, 2019

costin deleted the refactor_named_expression_3 branch December 6, 2019 16:10

This was referenced Dec 6, 2019

SQL: [Tests] Fix replaceChildren of Pivot #49004

Closed

[CI] NodeSubclassTests#testReplaceChildren fails for Pivot #48900

Closed

matriv mentioned this pull request Dec 6, 2019

SQL: [Tests] Unmute Pivot in NodeSublassTests #49925

Merged

matriv added a commit to matriv/elasticsearch that referenced this pull request Dec 6, 2019

SQL: [Tests] Unmute Pivot from NodeSublassTests

0ca68c0

The `testReplaceChildren()` has been fixed for Pivot as part of elastic#49693. Reverting: elastic#49045

costin mentioned this pull request Dec 7, 2019

SQL: Refactor usage of NamedExpression (#49693) #49963

Merged

matriv added a commit that referenced this pull request Dec 9, 2019

SQL: [Tests] Unmute Pivot from NodeSublassTests (#49925)

4b9b9ed

The `testReplaceChildren()` has been fixed for Pivot as part of #49693. Reverting: #49045

matriv added a commit that referenced this pull request Dec 9, 2019

SQL: [Tests] Unmute Pivot from NodeSublassTests (#49925)

48e7420

The `testReplaceChildren()` has been fixed for Pivot as part of #49693. Reverting: #49045 (cherry picked from commit 4b9b9ed)

SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020

SQL: [Tests] Unmute Pivot from NodeSublassTests (elastic#49925)

79befab

The `testReplaceChildren()` has been fixed for Pivot as part of elastic#49693. Reverting: elastic#49045

matriv mentioned this pull request Apr 6, 2020

SQL: NPE when using the same alias for a projection and an aggregate and GROUPed BY #46396

Closed

verkhovin mentioned this pull request May 9, 2020

#46396 fix npe on ambiguous group by #56489

Closed

palesz mentioned this pull request Dec 3, 2020

SQL: Enable the InnerAggregates inside PIVOT #65792

Merged

palesz mentioned this pull request Dec 7, 2020

SQL: Enable the InnerAggregates inside PIVOT (#65792) #65987

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
