
planner: Column prune improvement for MPP Join and TableScan+Filter operators #52143

Merged
ti-chi-bot[bot] merged 15 commits into pingcap:master from yibin87:column_prune_improve on Apr 10, 2024
Conversation


@yibin87 yibin87 commented Mar 27, 2024

What problem does this PR solve?

Issue Number: ref #52133

Problem Summary:

  1. Columns used only in filter expressions are not pruned
    When the column pruning optimization runs, the filter is still contained in the DataSource operator, so the DataSource operator's output schema currently has to contain both the columns needed by its parent and the columns used by the filter. Columns that are used only by the filter can be pruned from the output.
  2. The MPP physical join operator discards the column pruning results
    While MPP tasks are being built, the physical join operator's schema is reset to its full semantic output:
    p.schema = BuildPhysicalJoinSchema(p.JoinType, p)

    It is better to keep the column pruning results so that the MPP join can improve performance further (see tiflash#8296, "Join only construct joined columns that is needed by its parent operator").

What changed and how does it work?

For problem 1, add a new logical projection above the DataSource operator during column pruning in MPP mode.
For problem 2, add a new physical projection above the PhysicalHashJoin to prune unneeded columns while the task is being constructed.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

TPCH 100 Benchmark

image

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/invalid-title release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-tests-checked size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 27, 2024
@tiprow
Copy link

tiprow bot commented Mar 27, 2024

Hi @yibin87. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yibin87 yibin87 changed the title Column prune improvement planner: Column prune improvement Mar 27, 2024
@yibin87 yibin87 changed the title planner: Column prune improvement planner: Column prune improvement for MPP tasks Mar 27, 2024
@codecov
Copy link

codecov bot commented Mar 27, 2024

Codecov Report

Merging #52143 (f12bcef) into master (e925628) will increase coverage by 0.1432%.
Report is 110 commits behind head on master.
The diff coverage is 100.0000%.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #52143        +/-   ##
================================================
+ Coverage   70.7031%   70.8464%   +0.1432%     
================================================
  Files          1487       1546        +59     
  Lines        439607     465030     +25423     
================================================
+ Hits         310816     329457     +18641     
- Misses       109317     114910      +5593     
- Partials      19474      20663      +1189     
Flag Coverage Δ
integration 50.3545% <4.8780%> (?)
unit 70.9565% <100.0000%> (+0.4355%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 53.9957% <ø> (ø)
parser ∅ <ø> (∅)
br 34.5816% <ø> (-11.2364%) ⬇️

@yibin87
Copy link
Contributor Author

yibin87 commented Mar 27, 2024

/test mysql-test

@tiprow
Copy link

tiprow bot commented Mar 27, 2024

@yibin87: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/test mysql-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yibin87
Copy link
Contributor Author

yibin87 commented Mar 27, 2024

/cc @elsa0520 @winoros

@ti-chi-bot ti-chi-bot bot requested review from elsa0520 and winoros March 27, 2024 08:41
@yibin87
Copy link
Contributor Author

yibin87 commented Mar 27, 2024

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 27, 2024
yibin87 added 12 commits March 27, 2024 17:08
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 28, 2024
@yibin87
Copy link
Contributor Author

yibin87 commented Mar 28, 2024

/test unit-test

@tiprow
Copy link

tiprow bot commented Mar 28, 2024

@yibin87: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/test unit-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yibin87
Copy link
Contributor Author

yibin87 commented Mar 28, 2024

/unhold

@ti-chi-bot ti-chi-bot bot removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/needs-tests-checked labels Mar 28, 2024
Comment on lines +419 to +427
if !addOneHandle && ds.schema.Len() > len(parentUsedCols) && ds.SCtx().GetSessionVars().IsMPPEnforced() {
    proj := LogicalProjection{
        Exprs: expression.Column2Exprs(parentUsedCols),
    }.Init(ds.SCtx(), ds.QueryBlockOffset())
    proj.SetStats(ds.StatsInfo())
    proj.SetSchema(expression.NewSchema(parentUsedCols...))
    proj.SetChildren(ds)
    return proj, nil
}
Member

It's not a very good idea to add it here. Let me take a deeper look.

Contributor Author

Please let me know if you have any suggestions.

Contributor

It's not a very good idea to add it here. Let me take a deeper look.

  1. I think this enhancement is not only for MPP tasks; it also suits TiKV tasks.
  2. Should we change the output schema of the DataSource rather than adding a projection operator here?

Contributor Author

Yes. For point 1, TiKV tasks will benefit from this if projection push-down works; that is the next thing I'll focus on (more details will be in the issue). For now, this is limited to MPP tasks only.
For point 2, as far as I know, we can't change the output schema here. Please correct me if I'm wrong, @winoros.

@yibin87 yibin87 requested a review from winoros April 1, 2024 08:08
@yibin87 yibin87 changed the title planner: Column prune improvement for MPP tasks planner: Column prune improvement for MPP Join and TableScan+Filter operators Apr 1, 2024
}
defaultSchema := BuildPhysicalJoinSchema(p.JoinType, p)
if p.schema.Len() < defaultSchema.Len() {
    if p.schema.Len() > 0 {
Contributor

We'd better do column pruning in the logical rule phase rather than in attachToTask.

Contributor Author

@yibin87 yibin87 Apr 2, 2024

I think that, since this work takes effect only for MPP tasks (non-MPP tasks can simply prune the join operator's output schema to achieve this), it seems reasonable to add it here.
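
To make the discussion of problem 2 concrete, here is a minimal Go sketch of the approach under discussion: the join keeps its full output schema, and a projection carrying the pruned schema is placed above it. This is a toy model under stated assumptions, not the PR's implementation: `plan`, `fullJoinSchema`, and `attachJoin` are hypothetical names, columns are plain strings, and the full schema is modelled as left columns followed by right columns. The hunk quoted above does not show what happens when the pruned schema is empty, so the sketch simply leaves the join unchanged in that case; that is a simplification of this example, not a statement about the PR.

```go
package main

import "fmt"

// plan is a toy physical operator (hypothetical, not a TiDB type).
type plan struct {
	name   string
	schema []string
	child  *plan
}

// fullJoinSchema models the join's full output as the left columns followed
// by the right columns (an assumption made for this sketch).
func fullJoinSchema(left, right []string) []string {
	out := make([]string, 0, len(left)+len(right))
	out = append(out, left...)
	return append(out, right...)
}

// attachJoin builds the join with its full schema. If column pruning left a
// smaller, non-empty schema, a projection with that pruned schema is placed
// above the join so the pruning result is not lost.
func attachJoin(left, right, pruned []string) *plan {
	full := fullJoinSchema(left, right)
	join := &plan{name: "HashJoin", schema: full}
	if len(pruned) > 0 && len(pruned) < len(full) {
		return &plan{
			name:   "Projection",
			schema: append([]string(nil), pruned...),
			child:  join,
		}
	}
	// Empty or already-full pruned schema: not modelled, keep the join as is.
	return join
}

func main() {
	root := attachJoin(
		[]string{"t1.a", "t1.b"},
		[]string{"t2.a", "t2.c"},
		[]string{"t1.b", "t2.c"},
	)
	for p := root; p != nil; p = p.child {
		fmt.Println(p.name, p.schema)
	}
}
```

Running the sketch prints the projection with two columns above a join that still carries all four, which is the plan shape the PR description describes for MPP tasks.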

@yibin87 yibin87 requested a review from elsa0520 April 2, 2024 01:40
Signed-off-by: yibin <huyibin@pingcap.com>
Member

@winoros winoros left a comment

LGTM for now

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 7, 2024
@yibin87
Copy link
Contributor Author

yibin87 commented Apr 10, 2024

/test pull-br-integration-test

@tiprow
Copy link

tiprow bot commented Apr 10, 2024

@yibin87: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@elsa0520 elsa0520 left a comment

LGTM

@ti-chi-bot
Copy link

ti-chi-bot bot commented Apr 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elsa0520, winoros

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 10, 2024
@ti-chi-bot
Copy link

ti-chi-bot bot commented Apr 10, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-04-07 14:43:10.062636216 +0000 UTC m=+800651.590176764: ☑️ agreed by winoros.
  • 2024-04-10 06:13:33.403849954 +0000 UTC m=+1029274.931390500: ☑️ agreed by elsa0520.

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

3 participants