planner: Column prune improvement for MPP Join and TableScan+Filter operators#52143
ti-chi-bot[bot] merged 15 commits into pingcap:master from yibin87:column_prune_improve
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## master #52143 +/- ##
================================================
+ Coverage 70.7031% 70.8464% +0.1432%
================================================
Files 1487 1546 +59
Lines 439607 465030 +25423
================================================
+ Hits 310816 329457 +18641
- Misses 109317 114910 +5593
- Partials 19474 20663 +1189
/test mysql-test
/hold
Signed-off-by: yibin <huyibin@pingcap.com>
/test unit-test
/unhold
if !addOneHandle && ds.schema.Len() > len(parentUsedCols) && ds.SCtx().GetSessionVars().IsMPPEnforced() {
	proj := LogicalProjection{
		Exprs: expression.Column2Exprs(parentUsedCols),
	}.Init(ds.SCtx(), ds.QueryBlockOffset())
	proj.SetStats(ds.StatsInfo())
	proj.SetSchema(expression.NewSchema(parentUsedCols...))
	proj.SetChildren(ds)
	return proj, nil
}
It's not a very good idea to add it here. Let me take a deeper look.
Please let me know if you have any suggestions.
It's not a very good idea to add it here. Let me take a deeper look.
- I think this enhancement is not MPP-only; it also suits TiKV tasks.
- Should we change the output schema of the DataSource rather than add a projection operator here?
Yeah. For point 1, TiKV tasks will benefit from this once projection pushdown works, and that is the next thing I'll focus on (more detail will be in the issue). For now, it is limited to MPP tasks only.
For point 2, as far as I know, we can't change the output schema here. Please correct me if I'm wrong @winoros
}
defaultSchema := BuildPhysicalJoinSchema(p.JoinType, p)
if p.schema.Len() < defaultSchema.Len() {
	if p.schema.Len() > 0 {
We'd better do column pruning in the logical rule phase rather than in attachToTask.
I think that, since this work takes effect only for MPP tasks (non-MPP tasks can simply prune the join operator's output schema to achieve this), it seems reasonable to add it here.
Signed-off-by: yibin <huyibin@pingcap.com>
/test pull-br-integration-test
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: elsa0520, winoros
What problem does this PR solve?
Issue Number: ref #52133
Problem Summary:
When the column pruning optimization runs, the filter is still part of the DataSource operator. As a result, the DataSource operator's output schema currently contains both the columns needed by its parent and the columns used by the filter. Columns that are used only by the filter can be pruned.
During the process of building MPP tasks, the physical join operator's schema is reset to its full semantic output:
tidb/pkg/planner/core/task.go
Line 521 in b96f081
We'd better preserve the column pruning result, so that the MPP join can improve performance further (see tiflash#8296, "Join only construct joined columns that is needed by its parent operator").
What changed and how does it work?
For problem 1, add a new logical projection above the DataSource operator during column pruning in MPP mode.
For problem 2, add a new physical projection above the PhysicalHashJoin to prune unneeded columns during task construction.
Check List
Tests
Side effects
Documentation
TPCH 100 Benchmark