[refactor](Nereids) refactor column pruning #17579
@qzsee PTAL
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Proposed changes
This PR refactors column pruning to use a visitor. The benefits:

1. It is easy to add column pruning for a new plan: implement the `OutputPrunable` interface if the plan contains an output field, or do nothing if it does not. There is no need to add a new rule like `PruneXxxChildColumns`. Only a few scenarios need to override the visit function with special logic, such as pruning LogicalSetOperation and Aggregate.
2. It supports shrinking the output field in some plans, which can skip some useless operations and so improve performance.

Example:

```sql
select id
from (
  select id, sum(age)
  from student
  group by id
)a
```

We should prune the useless `sum(age)` in the aggregate.

Before refactor:

```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0, sum(age#2) AS `sum(age)`#4], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0, age#2], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```

After refactor:

```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```
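To illustrate the idea, here is a minimal, self-contained sketch of output pruning driven by a single top-down traversal. It is not the code in this PR: the `Plan`, `Expr`, `OutputNode`, `keepOutputs`, and `alwaysRequired` names and signatures are hypothetical, `OutputPrunable` here is only a simplified stand-in for the interface of the same name, and the `name` column of `student` is an assumed placeholder. Nodes with an output field implement the interface; nodes without one (the alias) implement nothing and simply pass the required columns through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ColumnPruningSketch {
    /** An output expression: its name plus the child columns it reads. */
    static final class Expr {
        final String name;
        final Set<String> inputs;
        Expr(String name, String... inputs) {
            this.name = name;
            this.inputs = new LinkedHashSet<>(Arrays.asList(inputs));
        }
    }

    /** Hypothetical plan node with at most one child; not the Doris Plan class. */
    static class Plan {
        final Plan child;
        Plan(Plan child) { this.child = child; }
    }

    /** Simplified stand-in for the OutputPrunable idea; signatures are assumptions. */
    interface OutputPrunable {
        List<Expr> outputs();
        void keepOutputs(List<Expr> kept);
        /** Child columns needed regardless of the parent, e.g. group-by keys. */
        default Set<String> alwaysRequired() { return Collections.emptySet(); }
    }

    /** Shared base for nodes that carry an output list. */
    static class OutputNode extends Plan implements OutputPrunable {
        private List<Expr> outputs;
        OutputNode(Plan child, Expr... outputs) {
            super(child);
            this.outputs = new ArrayList<>(Arrays.asList(outputs));
        }
        public List<Expr> outputs() { return outputs; }
        public void keepOutputs(List<Expr> kept) { outputs = kept; }
    }

    static final class Scan extends OutputNode {
        Scan(Expr... outputs) { super(null, outputs); }
    }

    static final class Project extends OutputNode {
        Project(Plan child, Expr... outputs) { super(child, outputs); }
    }

    static final class Aggregate extends OutputNode {
        final Set<String> groupBy;
        Aggregate(Plan child, Set<String> groupBy, Expr... outputs) {
            super(child, outputs);
            this.groupBy = groupBy;
        }
        @Override public Set<String> alwaysRequired() { return groupBy; }
    }

    /** No output field of its own, so it needs no pruning logic at all. */
    static final class SubQueryAlias extends Plan {
        SubQueryAlias(Plan child) { super(child); }
    }

    /** Top-down traversal: shrink outputs, then tell the child what is still needed. */
    static void prune(Plan plan, Set<String> required) {
        Set<String> childRequired = required;
        if (plan instanceof OutputPrunable) {
            OutputPrunable prunable = (OutputPrunable) plan;
            List<Expr> kept = new ArrayList<>();
            childRequired = new LinkedHashSet<>(prunable.alwaysRequired());
            for (Expr e : prunable.outputs()) {
                if (required.contains(e.name)) {
                    kept.add(e);
                    childRequired.addAll(e.inputs);
                }
            }
            prunable.keepOutputs(kept);
        }
        if (plan.child != null) {
            prune(plan.child, childRequired);
        }
    }

    static List<String> names(OutputPrunable p) {
        List<String> result = new ArrayList<>();
        for (Expr e : p.outputs()) {
            result.add(e.name);
        }
        return result;
    }

    public static void main(String[] args) {
        // select id from (select id, sum(age) from student group by id) a
        Scan scan = new Scan(new Expr("id"), new Expr("name"), new Expr("age"));
        Project inner = new Project(scan, new Expr("id", "id"), new Expr("age", "age"));
        Aggregate agg = new Aggregate(inner, Collections.singleton("id"),
                new Expr("id", "id"), new Expr("sum(age)", "age"));
        Project top = new Project(new SubQueryAlias(agg), new Expr("id", "id"));
        prune(top, new LinkedHashSet<>(names(top)));
        System.out.println("aggregate outputs: " + names(agg));
        System.out.println("inner project outputs: " + names(inner));
    }
}
```

In this sketch the aggregate keeps its group-by key as a requirement on its child even when the parent does not ask for it, which is the kind of special case the description mentions for Aggregate. A real implementation would work on expression IDs such as `id#0` rather than plain names.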
Checklist(Required)