Skip to content

Reduce redundant pack reads during building bitmap filter for delta merge case#9876

Merged
ti-chi-bot[bot] merged 14 commits intopingcap:masterfrom
gengliqi:mvcc-optimize-skip-pack
Feb 28, 2025
Merged

Reduce redundant pack reads during building bitmap filter for delta merge case#9876
ti-chi-bot[bot] merged 14 commits intopingcap:masterfrom
gengliqi:mvcc-optimize-skip-pack

Conversation

@gengliqi
Copy link
Contributor

@gengliqi gengliqi commented Feb 14, 2025

What problem does this PR solve?

Issue Number: close #9875

Problem Summary:
See #9875
This optimization can provide a significant performance boost when the data being read is not updated frequently.

What is changed and how it works?

Reduce redundant pack reads during bitmap filter building for delta merge case

Run CHBenchmark 1500: Overall 26.5% Improvement

mysql> select sum(DELTA_MEMTABLE_ROWS+DELTA_PERSISTED_ROWS) delta_rows, TIDB_TABLE from INFORMATION_SCHEMA.TIFLASH_SEGMENTS where TIDB_DATABASE='chbench_delta' group by TIDB_TABLE order by delta_rows;
+------------+------------+
| delta_rows | TIDB_TABLE |
+------------+------------+
|          0 | item       |
|         15 | region     |
|         75 | nation     |
|       4704 | warehouse  |
|      20000 | supplier   |
|      49790 | district   |
|     183924 | new_order  |
|    1022080 | history    |
|    1193288 | orders     |
|    1996224 | customer   |
|    7977425 | stock      |
|   41277141 | order_line |
+------------+------------+
12 rows in set (0.09 sec)

mysql> select sum(DELTA_MEMTABLE_ROWS+DELTA_PERSISTED_ROWS) delta_rows, sum(stable_rows) stable_rows, 100 * sum(DELTA_MEMTABLE_ROWS+DELTA_PERSISTED_ROWS) / sum(stable_rows) delta_percent from INFORMATION_SCHEMA.TIFLASH_SEGMENTS where TIDB_DATABASE='chbench_delta';
+------------+-------------+---------------+
| delta_rows | stable_rows | delta_percent |
+------------+-------------+---------------+
|   53724666 |  2283695006 |        2.3525 |
+------------+-------------+---------------+
1 row in set (0.09 sec)

The above table shows the delta rows for each table in chbench_delta database.

Query Original Optimization Improvement
Q1 3.30 1.17 182.05%
Q2 0.90 0.88 2.27%
Q3 4.34 1.86 133.33%
Q4 5.11 2.88 77.43%
Q5 14.07 11.56 21.71%
Q6 2.98 1.00 198.00%
Q7 8.01 5.62 42.53%
Q8 5.05 2.83 78.45%
Q9 46.22 44.01 5.02%
Q10 27.34 25.09 8.97%
Q11 0.56 0.46 21.74%
Q12 5.34 3.22 65.84%
Q13 0.75 0.39 92.31%
Q14 4.05 2.04 98.53%
Q15 13.11 9.36 40.06%
Q16 10.24 10.26 -0.19%
Q17 6.88 2.49 176.31%
Q18 35.46 32.14 10.33%
Q19 0.17 0.04 325.00%
Q20 3.72 1.59 133.96%
Q21 11.53 6.66 73.12%
Q22 0.44 0.10 340.00%
Sum 209.57 165.65 26.51%

All Stable Performance Comparison

The following table presents the performance improvements for All Stable compared to both the original version (with Delta) and the optimized version (with Delta):

  • After compacting the data, All Stable only shows a 5.86% improvement compared to the current optimized version with delta data.
  • However, compared to the original version with delta data, All Stable shows a significant 33.93% improvement.

The chbenchmark data was inserted into another database. Then the compact table command is executed. As a result, the data might differ. So there are some differences in the query plans. For example, in Q20, the query plan uses hash partition join and includes late materialization for order_line table in the All Stable version, while original version uses broadcast join and no late materialization for order_line table. This causes Q20 to run slower in All Stable compared to the original version due to exchange overhead and low selectivity.

Query All Stable All Stable Improvement over Original All Stable Improvement over Optimization
Q1 1.00 230.00% 17.00%
Q2 0.87 3.45% 1.15%
Q3 1.39 212.23% 33.81%
Q4 2.71 88.56% 6.27%
Q5 11.10 26.76% 4.14%
Q6 0.76 292.11% 31.58%
Q7 5.47 46.44% 2.74%
Q8 2.39 111.30% 18.41%
Q9 39.85 15.98% 10.44%
Q10 24.50 11.59% 2.41%
Q11 0.51 9.80% -9.80%
Q12 2.99 78.60% 7.69%
Q13 0.38 97.37% 2.63%
Q14 1.86 117.74% 9.68%
Q15 9.02 45.34% 3.77%
Q16 10.32 -0.78% -0.58%
Q17 1.55 343.87% 60.65%
Q18 32.01 10.78% 0.41%
Q19 0.04 325.00% 0.00%
Q20 1.81 105.52% -12.15%
Q21 5.88 96.09% 13.27%
Q22 0.07 528.57% 42.86%
Sum 156.48 33.93% 5.86%

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Improve the performance of `TableScan` by skipping unnecessary row reads

Signed-off-by: gengliqi <gengliqiii@gmail.com>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 14, 2025
@gengliqi gengliqi changed the title Reduce redundant pack reads during bitmap filter building for delta merge case Reduce redundant pack reads during building bitmap filter for delta merge case Feb 14, 2025
Signed-off-by: gengliqi <gengliqiii@gmail.com>
Signed-off-by: gengliqi <gengliqiii@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 17, 2025
Signed-off-by: gengliqi <gengliqiii@gmail.com>
Signed-off-by: gengliqi <gengliqiii@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 19, 2025
Signed-off-by: gengliqi <gengliqiii@gmail.com>
Signed-off-by: gengliqi <gengliqiii@gmail.com>
Signed-off-by: gengliqi <gengliqiii@gmail.com>
Co-authored-by: Lloyd-Pottiger <60744015+Lloyd-Pottiger@users.noreply.github.com>
Copy link
Contributor

@Lloyd-Pottiger Lloyd-Pottiger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Feb 20, 2025
@gengliqi gengliqi force-pushed the mvcc-optimize-skip-pack branch from 112e089 to e36245d Compare February 20, 2025 07:15
Signed-off-by: gengliqi <gengliqiii@gmail.com>
@gengliqi gengliqi force-pushed the mvcc-optimize-skip-pack branch from e36245d to b00df5f Compare February 20, 2025 07:19
@JinheLin
Copy link
Contributor

If there is a delete mark in the column file, it seems this situation cannot be directly skipped.

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Feb 27, 2025
@JaySon-Huang
Copy link
Contributor

If there is a delete mark in the column file, it seems this situation cannot be directly skipped.

if (handle_res[pack_id] == RSResult::Some || pack_stat.not_clean > 0
|| pack_filter->getMaxVersion(dmfile, pack_id, file_provider, dm_context.scan_context) > start_ts)
{
// `not_clean > 0` means there are more than one version for some rowkeys in this pack
// `pack.max_version > start_ts` means some rows will be filtered by MVCC reading
// We need to read this pack to do RowKey or MVCC filter.
continue;
}

If there is any delete mark in the dmfile, then pack_stat.not_clean > 0 and the pack will be read from disk.

Copy link
Contributor

@JaySon-Huang JaySon-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 28, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 28, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-02-20 05:25:25.882352166 +0000 UTC m=+1111768.278574229: ☑️ agreed by Lloyd-Pottiger.
  • 2025-02-28 03:25:20.906518774 +0000 UTC m=+585468.859677034: ☑️ agreed by JaySon-Huang.

Copy link
Contributor

@JaySon-Huang JaySon-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gengliqi
Copy link
Contributor Author

/retest

@ti-chi-bot ti-chi-bot bot merged commit d6de212 into pingcap:master Feb 28, 2025
5 checks passed
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Sep 25, 2025
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #10458.
But this PR has conflicts, please resolve them!

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Sep 25, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reduce redundant pack reads during building bitmap filter for delta merge case

5 participants