Skip to content

Columns: optimize ColumnString filter when selectivity is high (#9987)#10036

Merged
ti-chi-bot[bot] merged 6 commits intopingcap:release-9.0-beta.1from
ti-chi-bot:cherry-pick-9987-to-release-9.0-beta.1
Mar 26, 2025
Merged

Columns: optimize ColumnString filter when selectivity is high (#9987)#10036
ti-chi-bot[bot] merged 6 commits intopingcap:release-9.0-beta.1from
ti-chi-bot:cherry-pick-9987-to-release-9.0-beta.1

Conversation

@ti-chi-bot
Copy link
Member

@ti-chi-bot ti-chi-bot commented Mar 26, 2025

This is an automated cherry-pick of #9987

What problem does this PR solve?

Issue Number: ref #9699, close #10029

Problem Summary:

What is changed and how it works?

following optimization of #9670

optimize the performance of ColumnString filter when the selectivity of filter is high:

For example, when filter is `0111111111111111011111111111111101111111111111110111111111111111`, 
the mask will be `11111111111111110111111111111111101111111111111111011111111111111110`, 
since it does not be `[0]*[1]+` or `[1]+[0]*`, we need to copy each selected row one by one.

Now, we can copy 15 rows at once.

The total elapsed time of TPC-H 50 reduce from 42.9s to 41.1s.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Optimize the performance of ColumnString filter when the selectivity of filter is high. The total elapsed time of TPC-H 50 reduce 4%.

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. type/cherry-pick-for-release-9.0-beta.1 labels Mar 26, 2025
ref pingcap#6092

Fix related regression caused by pingcap#9661

Before, one query reads pack [start, end) from disk, and add it them to cache, 
meanwhile another query also requests to read pack [start, end), then it need to copy each pack data to a new column.

Now, return the cached column directly.

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>

Co-authored-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 26, 2025
@JaySon-Huang
Copy link
Contributor

Also cherry-pick #9994 together to fix the performance regression

@JaySon-Huang
Copy link
Contributor

JaySon-Huang commented Mar 26, 2025

In my local testing cluster, the tiflash tablescan meets performance regression without these two fixes. But there is no performance regression overall the query.

And the performance regression mainly fixed by #9994. With the commit fixed in 9994, tiflash table scan can shorten 200ms.

image

Waiting for the benchmark environment result.

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 26, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JinheLin, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [JinheLin,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 26, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 26, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-03-26 08:03:22.952706947 +0000 UTC m=+1033896.636943041: ☑️ agreed by JinheLin.
  • 2025-03-26 08:15:17.569984289 +0000 UTC m=+1034611.254220386: ☑️ agreed by Lloyd-Pottiger.

@JaySon-Huang
Copy link
Contributor

JaySon-Huang commented Mar 26, 2025

Verified that with this PR, the workload on benchmark env takes ~263s to run all queries. Without this PR, it takes ~326s.

@ti-chi-bot ti-chi-bot bot added cherry-pick-approved Cherry pick PR approved by release team. and removed do-not-merge/cherry-pick-not-approved labels Mar 26, 2025
@ti-chi-bot ti-chi-bot bot merged commit fdbdde5 into pingcap:release-9.0-beta.1 Mar 26, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved cherry-pick-approved Cherry pick PR approved by release team. lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. type/cherry-pick-for-release-9.0-beta.1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants