planner: Add HashJoin<-Receiver specific physicalPlan column pruner#38536
planner: Add HashJoin<-Receiver specific physicalPlan column pruner#38536ti-chi-bot merged 23 commits intopingcap:masterfrom yibin87:physical_column_prune
Conversation
Signed-off-by: yibin <huyibin@pingcap.com>
|
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. DetailsReviewer can indicate their review by submitting an approval review. |
|
/cc @windtalker @winoros |
|
/cc @chrysan |
|
/rebuild |
|
/re-build |
winoros
left a comment
There was a problem hiding this comment.
Also, you need to change the struct from
type PhysicalExchangeSender struct {
basePhysicalPlan
...
}
to
type PhysicalExchangeSender struct {
physicalSchemaProducer
...
}
The first one means that it doesn't hold its own schema(output columns), just uses the child's.
|
And the possible place that we would meet the same issue is the place where the parent doesn't need the full schema of the child. A clear way may be to pass the parent col to the physical property. And check the parent col when you create a sender. The above solution is based on the fact that we don't project the operator's columns which are not needed by their parent(A inline projection inside the operator). If we support the inline project inside the operator. We don't need these things. |
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
planner/core/optimizer_test.go
Outdated
| "github.com/pingcap/tidb/expression" | ||
| "github.com/pingcap/tidb/parser/ast" |
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
|
/cc @Yisaer |
|
/run-unit-tests |
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
|
/run-unit-tests |
Signed-off-by: yibin <huyibin@pingcap.com>
|
/run-unit-test |
xuyifangreeneyes
left a comment
There was a problem hiding this comment.
It is possible to add a FT for the case?(check the result of explain select ...) rest LGTM
planner/core/optimizer.go
Outdated
| for i, mppCol := range sender.HashCols { | ||
| exprCols[i] = mppCol.Col | ||
| } | ||
| exprUsed := expression.GetUsedList(exprCols, sender.Schema()) |
There was a problem hiding this comment.
Maybe hashCols and hashUsed are more clear than exprCols and exprUsed.
There was a problem hiding this comment.
Nice suggestion, done.
planner/core/optimizer.go
Outdated
| ch := sender.children[0] | ||
| proj := PhysicalProjection{ | ||
| Exprs: usedExprs, | ||
| }.Init(sctx, ch.statsInfo(), 0) |
There was a problem hiding this comment.
Do we need to set selectBlockOffset here?
There was a problem hiding this comment.
Set to child's selectBlockOffset now.
Signed-off-by: yibin <huyibin@pingcap.com>
|
/run-mysql-test |
|
/merge |
|
This pull request has been accepted and is ready to merge. DetailsCommit hash: 0e036e4 |
TiDB MergeCI notify🔴 Bad News! New failing [2] after this pr merged.
|
Signed-off-by: yibin huyibin@pingcap.com
What problem does this PR solve?
Issue Number: close #38511
Problem Summary:
In some situations, current ColumnPrune optimizer for logical plan can't eliminate useless columns in exchange sender/receivers whose parent node's HashJoin. And such useless columns consume significant CPU utilities and network bandwith.
For excample, a HashJoin node may has following plan, the ExchangeSender/Receiver has useless cols: col1, col2:
// HashJoin[col0, col3; col0==col3] <- ExchangeReceiver[col0, col1, col2] <- ExchangeSender[col0, col1, col2] <- Selection[col0, col1, col2; col1 < col2] <- TableScan[col0, col1, col2]//<- ExchangeReceiver1[col3] <- ExchangeSender1[col3] <- TableScan1[col3]Although it is hard to figure out the exact root cause of these cases, it is easy to add a simple physical column pruner to eliminate useless columns.
In this PR, choose to add a physical projection under PhysicalExchangeSender to do the column pruning. It is considered safer than directly pruning the useless columns. And it costs a little, because the projection will be executed in memory.
And it can be considered a double check even if the root cause is completed fixed in future.
What is changed and how it works?
Check List
Tests
TPCH_100, before and after this PR, query results match.
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.