Skip to content

PageStorage: Fix tiflash wn oom issue by introducing DeltaTreeOnlySnapshot (#10410)#10448

Merged
ti-chi-bot[bot] merged 5 commits intopingcap:release-8.5from
ti-chi-bot:cherry-pick-10410-to-release-8.5
Sep 22, 2025
Merged

PageStorage: Fix tiflash wn oom issue by introducing DeltaTreeOnlySnapshot (#10410)#10448
ti-chi-bot[bot] merged 5 commits intopingcap:release-8.5from
ti-chi-bot:cherry-pick-10410-to-release-8.5

Conversation

@ti-chi-bot
Copy link
Member

This is an automated cherry-pick of #10410

What problem does this PR solve?

Issue Number: close #10382, close #10278

Problem Summary:

Slow query requests prevent the release of PageStorage snapshots, causing persistent issues where multi-versioned data in PageStorage memory and multi-versioned columnar storage data on disk cannot be compacted. However, under the storage-compute separation architecture, TiFlash aims to support the capability of restoring new Region replicas from S3 , which motivates storing high-frequency write-and-delete data such as RaftLog into PageStorage. As a result, with Raft messages sustained at 12K QPS, a 10-minute-long query causes the number of Pages in PageStorage to increase by 7.2 million and the number of multi-versions to rise by 17.8 million. Due to the design of PageStorage, this leads to an additional 6 GB of memory consumption, and may also be accompanied by degraded read and write performance in PageStorage.

What is changed and how it works?

* UniversalPageStorage Provide two kinds of snapshot
  * GeneralSnapshot: protect all data, used for general-purpose reading on UniversalPageStorage
  * DeltaTreeOnlySnapshot: protect only data in DeltaTree engine, used for handling reading requests
* UniversalPageStorage GC
  * The page_ids from raft layer is not protected by `DeltaTreeOnlySnapshot`
  * If the page_id is from raft layer and there is no general snapshot, clean up entries more aggressively to reduce memory usage.
* WriteNode snapshot GC
  * The write node snapshot timeout is defined by compute node `EstablishDisaggTaskRequest`
  * The default snapshot timeout increase from 5 minutes to 30 minutes

Manual Testing result

# use br to restore a dataset with chbenchmark 1500
tiup br:v8.5.2 restore db --db=chbenchmark -s s3://benchmark/ch-1500 --check-requirements=false \
 --pd 10.2.12.81:6580 --keyspace-name ks1
# start tp workload with 32 tp threads, last for 90 minutes
tiup bench ch --host 10.2.12.81 -P 8081 --warehouses 1500 run -D chbenchmark -T 32 -t 0 --time 90m
# hack the code to mock that a reading request last for 15 minutes without releasing snapshot on write node, and `dt_snapshot_delta_tree_only = false`
https://github.com/pingcap/tiflash/pull/10410/commits/13e79da504d720506ea52ba52851b4128a022e14
# checkout the memory usage of tiflash write node
# update `dt_snapshot_delta_tree_only = false` and mock another reading request last for 15 minutes without releasing snapshot on write node

According to the test result

  • 21:09~21:25, acquire a GeneralSnapshot as before this PR does.
    • The memory usage peak reach 17.4GB. Increase more than 12.7GB for the frequent insert and delete raft pages
    • The number of pages stored in UniPageStorage reach 19.6 million, and versioned entries reach 49.5 million
    • The disk usage of UniPageStorage increase from 6GB to 24GB
  • 21:34~21:50, acquire a DeltaTreeOnlySnapshot
    • The memory usage peak remain less than 5.8GB. Increase only 1.7GB from the lower watermark
    • The number of pages stored in UniPageStorage is only 3.4 million, and versioned entries is 5.7 million
    • The disk usage of UniPageStorage increase from 6GB to 16GB
image

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    described as above
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix a OOM issue when there is slow query under compute and storage disagg arch

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR. labels Sep 19, 2025
@ti-chi-bot
Copy link
Member Author

@JaySon-Huang This PR has conflicts, I have hold it.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Sep 19, 2025

@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Signed-off-by: JaySon-Huang <tshent@qq.com>
@JaySon-Huang JaySon-Huang force-pushed the cherry-pick-10410-to-release-8.5 branch from d78f7f1 to 141c7c6 Compare September 19, 2025 07:01
@ti-chi-bot ti-chi-bot bot added cherry-pick-approved Cherry pick PR approved by release team. and removed do-not-merge/cherry-pick-not-approved labels Sep 19, 2025
Signed-off-by: JaySon-Huang <tshent@qq.com>
ref pingcap#6233

Storage: Cleanup StoragePool interface
* Remove useless newLogReader interface from StoragePool
* Remove useless always true `snapshot_read` param from StorageSnapshot
* Refine unit tests of PageStorageMixedTest

Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 22, 2025
Copy link
Member

@CalvinNeo CalvinNeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Sep 22, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CalvinNeo, JaySon-Huang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [CalvinNeo,JaySon-Huang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 22, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Sep 22, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-09-22 02:20:42.395978919 +0000 UTC m=+236852.466472592: ☑️ agreed by JaySon-Huang.
  • 2025-09-22 02:58:16.885009942 +0000 UTC m=+239106.955503648: ☑️ agreed by CalvinNeo.

@JaySon-Huang
Copy link
Contributor

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 22, 2025
@ti-chi-bot ti-chi-bot bot merged commit df69a6f into pingcap:release-8.5 Sep 22, 2025
4 checks passed
@ti-chi-bot ti-chi-bot bot deleted the cherry-pick-10410-to-release-8.5 branch September 22, 2025 03:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved cherry-pick-approved Cherry pick PR approved by release team. lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants