Reduce redundant pack reads during building bitmap filter for delta merge case by gengliqi · Pull Request #9876 · pingcap/tiflash

gengliqi · 2025-02-14T15:00:57Z

What problem does this PR solve?

Issue Number: close #9875

Problem Summary:
See #9875
This optimization can provide a significant performance boost when the data being read is not updated frequently.

What is changed and how it works?

Reduce redundant pack reads during bitmap filter building for delta merge case

Run CHBenchmark 1500: Overall 26.5% Improvement

mysql> select sum(DELTA_MEMTABLE_ROWS+DELTA_PERSISTED_ROWS) delta_rows, TIDB_TABLE from INFORMATION_SCHEMA.TIFLASH_SEGMENTS where TIDB_DATABASE='chbench_delta' group by TIDB_TABLE order by delta_rows;
+------------+------------+
| delta_rows | TIDB_TABLE |
+------------+------------+
|          0 | item       |
|         15 | region     |
|         75 | nation     |
|       4704 | warehouse  |
|      20000 | supplier   |
|      49790 | district   |
|     183924 | new_order  |
|    1022080 | history    |
|    1193288 | orders     |
|    1996224 | customer   |
|    7977425 | stock      |
|   41277141 | order_line |
+------------+------------+
12 rows in set (0.09 sec)

mysql> select sum(DELTA_MEMTABLE_ROWS+DELTA_PERSISTED_ROWS) delta_rows, sum(stable_rows) stable_rows, 100 * sum(DELTA_MEMTABLE_ROWS+DELTA_PERSISTED_ROWS) / sum(stable_rows) delta_percent from INFORMATION_SCHEMA.TIFLASH_SEGMENTS where TIDB_DATABASE='chbench_delta';
+------------+-------------+---------------+
| delta_rows | stable_rows | delta_percent |
+------------+-------------+---------------+
|   53724666 |  2283695006 |        2.3525 |
+------------+-------------+---------------+
1 row in set (0.09 sec)

The above table shows the delta rows for each table in chbench_delta database.

Query	Original	Optimization	Improvement
Q1	3.30	1.17	182.05%
Q2	0.90	0.88	2.27%
Q3	4.34	1.86	133.33%
Q4	5.11	2.88	77.43%
Q5	14.07	11.56	21.71%
Q6	2.98	1.00	198.00%
Q7	8.01	5.62	42.53%
Q8	5.05	2.83	78.45%
Q9	46.22	44.01	5.02%
Q10	27.34	25.09	8.97%
Q11	0.56	0.46	21.74%
Q12	5.34	3.22	65.84%
Q13	0.75	0.39	92.31%
Q14	4.05	2.04	98.53%
Q15	13.11	9.36	40.06%
Q16	10.24	10.26	-0.19%
Q17	6.88	2.49	176.31%
Q18	35.46	32.14	10.33%
Q19	0.17	0.04	325.00%
Q20	3.72	1.59	133.96%
Q21	11.53	6.66	73.12%
Q22	0.44	0.10	340.00%
Sum	209.57	165.65	26.51%

All Stable Performance Comparison

The following table presents the performance improvements for All Stable compared to both the original version (with Delta) and the optimized version (with Delta):

After compacting the data, All Stable only shows a 5.86% improvement compared to the current optimized version with delta data.
However, compared to the original version with delta data, All Stable shows a significant 33.93% improvement.

The chbenchmark data was inserted into another database. Then the compact table command is executed. As a result, the data might differ. So there are some differences in the query plans. For example, in Q20, the query plan uses hash partition join and includes late materialization for order_line table in the All Stable version, while original version uses broadcast join and no late materialization for order_line table. This causes Q20 to run slower in All Stable compared to the original version due to exchange overhead and low selectivity.

Query	All Stable	All Stable Improvement over Original	All Stable Improvement over Optimization
Q1	1.00	230.00%	17.00%
Q2	0.87	3.45%	1.15%
Q3	1.39	212.23%	33.81%
Q4	2.71	88.56%	6.27%
Q5	11.10	26.76%	4.14%
Q6	0.76	292.11%	31.58%
Q7	5.47	46.44%	2.74%
Q8	2.39	111.30%	18.41%
Q9	39.85	15.98%	10.44%
Q10	24.50	11.59%	2.41%
Q11	0.51	9.80%	-9.80%
Q12	2.99	78.60%	7.69%
Q13	0.38	97.37%	2.63%
Q14	1.86	117.74%	9.68%
Q15	9.02	45.34%	3.77%
Q16	10.32	-0.78%	-0.58%
Q17	1.55	343.87%	60.65%
Q18	32.01	10.78%	0.41%
Q19	0.04	325.00%	0.00%
Q20	1.81	105.52%	-12.15%
Q21	5.88	96.09%	13.27%
Q22	0.07	528.57%	42.86%
Sum	156.48	33.93%	5.86%

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Improve the performance of `TableScan` by skipping unnecessary row reads

Signed-off-by: gengliqi <gengliqiii@gmail.com>

…p-pack

Signed-off-by: gengliqi <gengliqiii@gmail.com>

dbms/src/Storages/DeltaMerge/File/DMFilePackFilter.cpp

Signed-off-by: gengliqi <gengliqiii@gmail.com>

dbms/src/Storages/DeltaMerge/Segment.cpp

Co-authored-by: Lloyd-Pottiger <60744015+Lloyd-Pottiger@users.noreply.github.com>

Lloyd-Pottiger

lgtm

Signed-off-by: gengliqi <gengliqiii@gmail.com>

JinheLin · 2025-02-27T05:55:55Z

If there is a delete mark in the column file, it seems this situation cannot be directly skipped.

JaySon-Huang · 2025-02-28T03:24:57Z

If there is a delete mark in the column file, it seems this situation cannot be directly skipped.

tiflash/dbms/src/Storages/DeltaMerge/File/DMFilePackFilter.cpp

Lines 342 to 349 in 82932c3

    
           if (handle_res[pack_id] == RSResult::Some || pack_stat.not_clean > 0 
        
               || pack_filter->getMaxVersion(dmfile, pack_id, file_provider, dm_context.scan_context) > start_ts) 
        
           { 
        
               // `not_clean > 0` means there are more than one version for some rowkeys in this pack 
        
               // `pack.max_version > start_ts` means some rows will be filtered by MVCC reading 
        
               // We need to read this pack to do RowKey or MVCC filter. 
        
               continue; 
        
           }

If there is any delete mark in the dmfile, then pack_stat.not_clean > 0 and the pack will be read from disk.

JaySon-Huang

LGTM

ti-chi-bot · 2025-02-28T03:25:21Z

[LGTM Timeline notifier]

Timeline:

2025-02-20 05:25:25.882352166 +0000 UTC m=+1111768.278574229: ☑️ agreed by Lloyd-Pottiger.
2025-02-28 03:25:20.906518774 +0000 UTC m=+585468.859677034: ☑️ agreed by JaySon-Huang.

JaySon-Huang

LGTM

ti-chi-bot · 2025-02-28T03:25:34Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [JaySon-Huang,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gengliqi · 2025-02-28T04:22:12Z

/retest

ti-chi-bot · 2025-09-25T14:26:05Z

In response to a cherrypick label: new pull request created to branch release-8.5: #10458.
But this PR has conflicts, please resolve them!

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

skip pack for bitmap filter normal

7816a93

Signed-off-by: gengliqi <gengliqiii@gmail.com>

ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 14, 2025

gengliqi changed the title ~~Reduce redundant pack reads during bitmap filter building for delta merge case~~ Reduce redundant pack reads during building bitmap filter for delta merge case Feb 14, 2025

gengliqi added 2 commits February 15, 2025 13:07

fix test

08738b4

Signed-off-by: gengliqi <gengliqiii@gmail.com>

use binary serach instead of iteration to optimize performance

9e0e2bd

Signed-off-by: gengliqi <gengliqiii@gmail.com>

ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 17, 2025

gengliqi added 2 commits February 17, 2025 16:16

Merge remote-tracking branch 'upstream/master' into mvcc-optimize-ski…

a58f18f

…p-pack

tiny fix

747edc2

Signed-off-by: gengliqi <gengliqiii@gmail.com>

JaySon-Huang reviewed Feb 17, 2025

View reviewed changes

dbms/src/Storages/DeltaMerge/File/DMFilePackFilter.cpp Show resolved Hide resolved

fix

65e7946

Signed-off-by: gengliqi <gengliqiii@gmail.com>

ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 19, 2025

update tests

3658763

Signed-off-by: gengliqi <gengliqiii@gmail.com>

gengliqi force-pushed the mvcc-optimize-skip-pack branch from 5cefac1 to 3658763 Compare February 19, 2025 10:42

gengliqi requested review from JaySon-Huang, JinheLin and Lloyd-Pottiger February 19, 2025 10:44

gengliqi added 3 commits February 19, 2025 23:50

remove config

cf38776

Signed-off-by: gengliqi <gengliqiii@gmail.com>

Merge branch 'master' into mvcc-optimize-skip-pack

895eb3e

fix

0f3b3d8

Signed-off-by: gengliqi <gengliqiii@gmail.com>

Lloyd-Pottiger reviewed Feb 20, 2025

View reviewed changes

dbms/src/Storages/DeltaMerge/Segment.cpp Outdated Show resolved Hide resolved

Update dbms/src/Storages/DeltaMerge/Segment.cpp

db233d1

Co-authored-by: Lloyd-Pottiger <60744015+Lloyd-Pottiger@users.noreply.github.com>

gengliqi requested a review from Lloyd-Pottiger February 20, 2025 03:23

Lloyd-Pottiger approved these changes Feb 20, 2025

View reviewed changes

ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Feb 20, 2025

gengliqi force-pushed the mvcc-optimize-skip-pack branch from 112e089 to e36245d Compare February 20, 2025 07:15

update

b00df5f

Signed-off-by: gengliqi <gengliqiii@gmail.com>

gengliqi force-pushed the mvcc-optimize-skip-pack branch from e36245d to b00df5f Compare February 20, 2025 07:19

ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Feb 27, 2025

Merge branch 'master' into mvcc-optimize-skip-pack

82932c3

JaySon-Huang approved these changes Feb 28, 2025

View reviewed changes

ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 28, 2025

JaySon-Huang approved these changes Feb 28, 2025

View reviewed changes

Merge branch 'master' into mvcc-optimize-skip-pack

3c20dbd

ti-chi-bot bot merged commit d6de212 into pingcap:master Feb 28, 2025
5 checks passed

JaySon-Huang mentioned this pull request Jul 9, 2025

Reduce redundant pack reads during building bitmap filter for delta merge case (release-8.5) #10293

Merged

12 tasks

ti-chi-bot bot added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Sep 25, 2025

ti-chi-bot mentioned this pull request Sep 25, 2025

Reduce redundant pack reads during building bitmap filter for delta merge case (#9876) #10458

Closed

12 tasks

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Sep 25, 2025

This is an automated cherry-pick of pingcap#9876

984716f

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce redundant pack reads during building bitmap filter for delta merge case#9876

Reduce redundant pack reads during building bitmap filter for delta merge case#9876
ti-chi-bot[bot] merged 14 commits intopingcap:masterfrom
gengliqi:mvcc-optimize-skip-pack

gengliqi commented Feb 14, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Lloyd-Pottiger left a comment

Uh oh!

JinheLin commented Feb 27, 2025

Uh oh!

JaySon-Huang commented Feb 28, 2025

Uh oh!

JaySon-Huang left a comment

Uh oh!

ti-chi-bot bot commented Feb 28, 2025

Uh oh!

JaySon-Huang left a comment

Uh oh!

ti-chi-bot bot commented Feb 28, 2025

Uh oh!

gengliqi commented Feb 28, 2025

Uh oh!

Uh oh!

ti-chi-bot commented Sep 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

gengliqi commented Feb 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What is changed and how it works?

Run CHBenchmark 1500: Overall 26.5% Improvement

All Stable Performance Comparison

Check List

Release note

Uh oh!

Uh oh!

Uh oh!

Lloyd-Pottiger left a comment

Choose a reason for hiding this comment

Uh oh!

JinheLin commented Feb 27, 2025

Uh oh!

JaySon-Huang commented Feb 28, 2025

Uh oh!

JaySon-Huang left a comment

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot bot commented Feb 28, 2025

[LGTM Timeline notifier]

Uh oh!

JaySon-Huang left a comment

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot bot commented Feb 28, 2025

Uh oh!

gengliqi commented Feb 28, 2025

Uh oh!

Uh oh!

ti-chi-bot commented Sep 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

gengliqi commented Feb 14, 2025 •

edited

Loading