Columns: optimize ColumnString filter when selectivity is high (#9987) by ti-chi-bot · Pull Request #10036 · pingcap/tiflash

ti-chi-bot · 2025-03-26T02:52:43Z

This is an automated cherry-pick of #9987

What problem does this PR solve?

Issue Number: ref #9699, close #10029

Problem Summary:

What is changed and how it works?

This is a follow-up to the optimization in #9670.

Optimize the performance of the ColumnString filter when the selectivity of the filter is high:

For example, when the filter is `0111111111111111011111111111111101111111111111110111111111111111`, 
the mask will be `11111111111111110111111111111111101111111111111111011111111111111110`. 
Since it does not match `[0]*[1]+` or `[1]+[0]*`, we previously had to copy each selected row one by one.

Now, we can copy 15 rows at once.

The total elapsed time of TPC-H 50 is reduced from 42.9s to 41.1s.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Optimize the performance of the ColumnString filter when the selectivity of the filter is high. The total elapsed time of TPC-H 50 is reduced by 4%.

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

ref pingcap#6092

Fix a related regression caused by pingcap#9661.

Before: one query reads packs [start, end) from disk and adds them to the cache. If another query then also requests packs [start, end), it needs to copy each pack's data into a new column.

Now: the cached column is returned directly.

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: JaySon-Huang <tshent@qq.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
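
The "return the cached column directly" behavior described in this commit message can be illustrated with a small sketch. This is not the TiFlash implementation: `Column`, `ColumnPtr`, `PackRange`, and `PackRangeCache` are hypothetical stand-ins, and the sketch omits locking and eviction. It only shows that a second reader of the same pack range can share an immutable column through a `std::shared_ptr` instead of copying the data.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a column and a half-open pack range [start, end).
using Column = std::vector<std::string>;
using ColumnPtr = std::shared_ptr<const Column>;
using PackRange = std::pair<size_t, size_t>;

// Minimal cache keyed by pack range. Not thread-safe; a real cache shared
// between queries would need synchronization and an eviction policy.
class PackRangeCache
{
public:
    // Returns the cached column for exactly this range, or nullptr on a miss.
    ColumnPtr tryGet(const PackRange & range) const
    {
        auto it = cache.find(range);
        return it == cache.end() ? nullptr : it->second;
    }

    void put(const PackRange & range, ColumnPtr column) { cache[range] = std::move(column); }

private:
    std::map<PackRange, ColumnPtr> cache;
};
```

Because the cached column is `const` and reference-counted, a hit hands back the same object to every reader, so no per-pack copy is needed.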

JaySon-Huang · 2025-03-26T04:37:25Z

Also cherry-pick #9994 together to fix the performance regression

JaySon-Huang · 2025-03-26T04:44:57Z

In my local testing cluster, the TiFlash table scan shows a performance regression without these two fixes, but there is no performance regression in the query overall.

The performance regression is mainly fixed by #9994. With the commit from #9994, the TiFlash table scan is about 200ms shorter.

Waiting for the benchmark environment result.

ti-chi-bot · 2025-03-26T08:15:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JinheLin, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [JinheLin,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2025-03-26T08:15:18Z

[LGTM Timeline notifier]

Timeline:

2025-03-26 08:03:22.952706947 +0000 UTC m=+1033896.636943041: ☑️ agreed by JinheLin.
2025-03-26 08:15:17.569984289 +0000 UTC m=+1034611.254220386: ☑️ agreed by Lloyd-Pottiger.

JaySon-Huang · 2025-03-26T08:15:24Z

Verified that with this PR, the workload on benchmark env takes ~263s to run all queries. Without this PR, it takes ~326s.

Lloyd-Pottiger added 5 commits March 26, 2025 02:52

Columns: optimize filter when selectivity is high

9f55eef

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

refine

ced1553

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

fix ub

fc1daf2

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

address comments

183bad1

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

add microbenchmark

1a56ddb

Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>

ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. type/cherry-pick-for-release-9.0-beta.1 labels Mar 26, 2025

ti-chi-bot bot added the do-not-merge/cherry-pick-not-approved label Mar 26, 2025

ti-chi-bot assigned JaySon-Huang Mar 26, 2025

ti-chi-bot mentioned this pull request Mar 26, 2025

Columns: optimize ColumnString filter when selectivity is high #9987

Merged

12 tasks

ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 26, 2025

JinheLin approved these changes Mar 26, 2025

ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 26, 2025

Lloyd-Pottiger approved these changes Mar 26, 2025

ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 26, 2025

JaySon-Huang mentioned this pull request Mar 26, 2025

Storages: enhance data sharing column cache (#9994) #10037

Closed

12 tasks

ti-chi-bot bot added cherry-pick-approved Cherry pick PR approved by release team. and removed do-not-merge/cherry-pick-not-approved labels Mar 26, 2025

ti-chi-bot bot merged commit fdbdde5 into pingcap:release-9.0-beta.1 Mar 26, 2025
4 checks passed

JaySon-Huang mentioned this pull request May 20, 2025

Compared with v8.5.1, v9.0.0-beta1 olap has a 9.1% performance regression in ossinxxx-x86 workload #10029

Closed

ti-chi-bot[bot] merged 6 commits into pingcap:release-9.0-beta.1 from ti-chi-bot:cherry-pick-9987-to-release-9.0-beta.1
