[fix](nereids) Fix the same expr id being used for different exprs when an agg table uses random distribution #52993
```diff
 }
-Alias alias = new Alias(exprId, ImmutableList.of(function), col.getName(),
-        olapScan.qualified(), true);
+Alias alias = new Alias(StatementScopeIdGenerator.newExprId(), ImmutableList.of(function),
```
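The change above swaps a reused `exprId` for one drawn from `StatementScopeIdGenerator.newExprId()`. The sketch below is a minimal illustration of why that matters, assuming only that the generator hands out a new, never-before-used id on every call within a statement. `ExprIdDemo` and `IdGenerator` are hypothetical stand-ins, not Doris classes, and plain `int` values stand in for Doris's `ExprId` type.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not Doris's StatementScopeIdGenerator.
public class ExprIdDemo {

    // Hands out a fresh id on every call, like a statement-scoped generator.
    static final class IdGenerator {
        private final AtomicInteger next = new AtomicInteger();

        int newExprId() {
            return next.getAndIncrement();
        }
    }

    public static void main(String[] args) {
        IdGenerator gen = new IdGenerator();
        int a = gen.newExprId(); // id for scan slot a
        int b = gen.newExprId(); // id for scan slot b

        // Old pattern: the alias for sum(b) reuses the child slot's id.
        int buggyAliasId = b;
        // New pattern: the alias gets its own id from the generator.
        int fixedAliasId = gen.newExprId();

        System.out.println("a#" + a + ", b#" + b);
        System.out.println("buggy alias collides with child: " + (buggyAliasId == b));
        System.out.println("fixed alias collides with child: " + (fixedAliasId == b));
    }
}
```

In the real plan the fresh id is `#4` rather than the next integer, because other expressions have already consumed ids by the time the alias is built.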
Add a new constructor for `Alias` that accepts four arguments: children, name, qualifier, and nameFromChild.
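A rough sketch of what the suggested four-argument constructor could look like: it delegates to the full constructor and always allocates a fresh expr id, so call sites cannot accidentally pass the child's id. This `Alias` is a simplified, hypothetical model; its field types (`int`, `List<String>`) are placeholders and do not match Doris's actual `Alias`, `ExprId`, or `Expression` classes.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical model of an alias; not Doris's Alias class.
public class AliasSketch {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    // Stand-in for a statement-scoped expr id generator.
    static int newExprId() {
        return NEXT_ID.getAndIncrement();
    }

    static final class Alias {
        final int exprId;
        final List<String> children;
        final String name;
        final List<String> qualifier;
        final boolean nameFromChild;

        // Full constructor: the caller supplies the expr id explicitly.
        Alias(int exprId, List<String> children, String name,
              List<String> qualifier, boolean nameFromChild) {
            this.exprId = exprId;
            this.children = List.copyOf(children);
            this.name = name;
            this.qualifier = List.copyOf(qualifier);
            this.nameFromChild = nameFromChild;
        }

        // Four-argument convenience constructor: always allocates a new id.
        Alias(List<String> children, String name,
              List<String> qualifier, boolean nameFromChild) {
            this(newExprId(), children, name, qualifier, nameFromChild);
        }
    }

    public static void main(String[] args) {
        Alias alias = new Alias(List.of("sum(b#1)"), "b",
                List.of("internal", "db1", "tagg"), true);
        System.out.println(alias.name + "#" + alias.exprId);
    }
}
```

Keeping the explicit-id constructor alongside the overload leaves room for callers that legitimately need to preserve an existing id, while making "new id" the default path.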
TPC-H: Total hot run time: 33745 ms
TPC-DS: Total hot run time: 186196 ms
ClickBench: Total hot run time: 29.67 s
FE UT Coverage Report: Increment line coverage
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…able with random distribute (#52993)

If an aggregate-key table uses random distribution, an aggregate node is added on top of the scan, and the alias of the aggregate function was given the same expr id as the slot it aggregates. For example, for the query `select * from db1.tagg`, the plan is shown below. In ``sum(b#1) AS `b`#1``, the alias and its child slot both use expr id 1, which can cause hidden problems:

```sql
LogicalResultSink[30] ( outputExprs=[a#0, b#1] )
+--LogicalAggregate[29] ( groupByExpr=[a#0], outputExpr=[a#0, sum(b#1) AS `b`#1], hasRepeat=false, stats=1 )
   +--LogicalOlapScan ( qualified=internal.db1.tagg, indexName=<index_not_selected>, selectedIndexId=1752042452160, preAgg=ON, operativeCol=[a#0, b#1], stats=1 )
```

This PR fixes the problem, and the expression becomes ``sum(b#1) AS `b`#4``:

```sql
LogicalResultSink[30] ( outputExprs=[a#0, b#4] )
+--LogicalAggregate[29] ( groupByExpr=[a#0], outputExpr=[a#0, sum(b#1) AS `b`#4], hasRepeat=false, stats=1 )
   +--LogicalOlapScan ( qualified=internal.db1.tagg, indexName=<index_not_selected>, selectedIndexId=1752042065062, preAgg=ON, operativeCol=[a#0, b#1], stats=1 )
```
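One plausible reading of the "hidden problems": once ``sum(b#1) AS `b`#1`` is in the plan, a parent operator that refers to `b#1` cannot tell whether it means the scanned column or the aggregated value. The hypothetical check below (not part of Doris) captures the invariant the fix restores: an alias must not reuse the expr id of any slot its child expression reads. `Slot` and `Alias` are simplified records invented for illustration.

```java
import java.util.List;

// Hypothetical plan check; Slot and Alias are simplified stand-ins.
public class AliasIdCheck {
    record Slot(String name, int exprId) {}

    record Alias(String name, int exprId, List<Slot> childInputs) {}

    // True if the alias reuses the expr id of one of its child's input slots.
    static boolean reusesChildId(Alias alias) {
        for (Slot s : alias.childInputs()) {
            if (s.exprId() == alias.exprId()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Slot b = new Slot("b", 1);
        Alias before = new Alias("b", 1, List.of(b)); // sum(b#1) AS `b`#1
        Alias after = new Alias("b", 4, List.of(b));  // sum(b#1) AS `b`#4
        System.out.println(reusesChildId(before)); // true
        System.out.println(reusesChildId(after));  // false
    }
}
```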
What problem does this PR solve?
If an aggregate-key table uses random distribution, an aggregate node is added on top of the scan, and the alias of the aggregate function was given the same expr id as the slot it aggregates.
For example, for the query `select * from db1.tagg`, the plan contains ``sum(b#1) AS `b`#1``: the alias and its child slot both use expr id 1, which can cause hidden problems.
This PR fixes the problem, and the expression becomes ``sum(b#1) AS `b`#4``.
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)