Fix PageStorage GC with high valid rate PageFile #2436

Merged
ti-srebot merged 15 commits into pingcap:master from JaySon-Huang:fix_ps_gc_with_high_valid_rate
Aug 3, 2021

@JaySon-Huang
Contributor

@JaySon-Huang JaySon-Huang commented Jul 19, 2021

What problem does this PR solve?

Issue Number: close #2382

Problem Summary:
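A single large PageFile may consist almost entirely of valid bytes, yet we still need to rewrite it, together with other PageFiles, into a newer PageFile (so that they get a higher compact_seq) for GC to move forward. In the log below, PageFile_880_3 has a valid rate of 0.99 and is the only candidate, so both GC rounds exit without compaction: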

2021-07-14 13:27:32.667 <Information> root: Running GC, [round=1] [num_gc=10]
2021-07-14 13:27:32.675 <Trace> PageStorage: PageCtl Before gc, 241941 puts and 20550 refs and 254566 deletes and 17750 upserts
2021-07-14 13:27:33.125 <Information> PageStorage: PageCtl restore 0 puts and 3427 refs and 0 deletes and 3418 upserts from checkpoint PageFile_878_0 sequence: 107469
2021-07-14 13:27:33.126 <Debug> PageStorage: PageCtl collectPageFilesToCompact stop on PageFile_881_0, type: Formal, sequence: 107470 last sequence: 107469
2021-07-14 13:27:33.189 <Debug> PageStorage: PageCtl LegacyCompactor::tryCompact exit without compaction, candidates size: 0, compact_legacy_min_num: 3
2021-07-14 13:27:33.207 <Trace> PageStorage: PageCtl PageFile_880_3, type: Formal [valid rate=0.99] [file size=2749336402]
2021-07-14 13:27:33.207 <Debug> PageStorage: PageCtl DataCompactor::tryMigrate exit without compaction, [candidates size=1] [total byte size=2723058873], Config{ PageStorage::Config {gc_min_files:3, gc_min_bytes:67108864, gc_max_valid_rate:1.000, gc_min_legacy_num:3, gc_max_expect_legacy: 100, gc_max_valid_rate_bound: 1.000, prob_do_gc_when_write_is_low:10, open_file_max_idle_time:15} }
2021-07-14 13:27:33.207 <Debug> PageStorage: PageCtl gcApply remove 2 invalid snapshots, 1 snapshots left, longest lifetime 0.000 seconds, created from thread_id 0
2021-07-14 13:27:33.220 <Information> PageStorage: PageCtl GC exit within 0.54 sec. PageFiles from [878,0,Checkpoint] to [2368,0,Formal], min writing [2368,0,Formal], num files: 1499, num legacy:367, compact legacy archive files: 0, remove data files: 0, gc apply: 0 puts and 0 refs and 0 deletes and 0 upserts
2021-07-14 13:27:33.220 <Information> root: Run GC done, [round=1] [num_gc=10]
2021-07-14 13:27:33.220 <Information> root: Running GC, [round=2] [num_gc=10]
2021-07-14 13:27:33.227 <Trace> PageStorage: PageCtl Before gc, 241941 puts and 20550 refs and 254566 deletes and 17750 upserts
2021-07-14 13:27:33.650 <Information> PageStorage: PageCtl restore 0 puts and 3427 refs and 0 deletes and 3418 upserts from checkpoint PageFile_878_0 sequence: 107469
2021-07-14 13:27:33.653 <Debug> PageStorage: PageCtl collectPageFilesToCompact stop on PageFile_881_0, type: Formal, sequence: 107470 last sequence: 107469
2021-07-14 13:27:33.730 <Debug> PageStorage: PageCtl LegacyCompactor::tryCompact exit without compaction, candidates size: 0, compact_legacy_min_num: 3
2021-07-14 13:27:33.754 <Trace> PageStorage: PageCtl PageFile_880_3, type: Formal [valid rate=0.99] [file size=2749336402]
2021-07-14 13:27:33.754 <Debug> PageStorage: PageCtl DataCompactor::tryMigrate exit without compaction, [candidates size=1] [total byte size=2723058873], Config{ PageStorage::Config {gc_min_files:3, gc_min_bytes:67108864, gc_max_valid_rate:1.000, gc_min_legacy_num:3, gc_max_expect_legacy: 100, gc_max_valid_rate_bound: 1.000, prob_do_gc_when_write_is_low:10, open_file_max_idle_time:15} }
2021-07-14 13:27:33.755 <Debug> PageStorage: PageCtl gcApply remove 1 invalid snapshots, 1 snapshots left, longest lifetime 0.000 seconds, created from thread_id 0
2021-07-14 13:27:33.762 <Information> PageStorage: PageCtl GC exit within 0.53 sec. PageFiles from [878,0,Checkpoint] to [2368,0,Formal], min writing [2368,0,Formal], num files: 1499, num legacy:367, compact legacy archive files: 0, remove data files: 0, gc apply: 0 puts and 0 refs and 0 deletes and 0 upserts
2021-07-14 13:27:33.762 <Information> root: Run GC done, [round=2] [num_gc=10]
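
The numbers in the log above can be checked with a little arithmetic. The sketch below assumes that the reported "total byte size" is the number of valid bytes in the single candidate PageFile_880_3; the log does not state that explicitly. The helper names `validRate` and `reclaimableBytes` are hypothetical and are not TiFlash functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: fraction of a PageFile that is still valid.
// Returns 0.0 for an empty file to avoid dividing by zero.
double validRate(uint64_t valid_bytes, uint64_t file_size)
{
    if (file_size == 0)
        return 0.0;
    return static_cast<double>(valid_bytes) / static_cast<double>(file_size);
}

// Hypothetical helper: bytes that rewriting the file could free.
// Assumes valid_bytes never exceeds file_size.
uint64_t reclaimableBytes(uint64_t valid_bytes, uint64_t file_size)
{
    return file_size - valid_bytes;
}

// Values copied from the log lines for PageFile_880_3.
const uint64_t kFileSize = 2749336402ULL;   // [file size=2749336402]
const uint64_t kValidBytes = 2723058873ULL; // [total byte size=2723058873], assumed to be valid bytes
```

Under that assumption, `validRate(kValidBytes, kFileSize)` is roughly 0.99, matching the trace line, and `reclaimableBytes(kValidBytes, kFileSize)` is 26277529 bytes, about 25 MiB. Rewriting a file of about 2.7 GB frees very little space, but it is still needed here to raise the compact sequence.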

What is changed and how it works?

  • Depends on #2435 (PageStorage skip non continuous sequence safely)
  • Refine the process of DataCompactor::selectCandidateFiles:
    • If all candidates for DataCompactor have a low valid rate, we stop collecting candidates once they can fully fill a new PageFile.
    • If there are candidates with a high valid rate, we compact as many of the following lower-valid-rate PageFiles as possible along with them, to move GC forward and reduce write amplification.
    • Do not collect PageFiles without valid pages into candidates: rewriting such a file cannot set the compact sequence to a bigger one or move GC forward; it only causes higher write amplification by rewriting one PageFile to a new one.
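
The three selection rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual DataCompactor::selectCandidateFiles code: the `FileInfo` and `Selection` types, the `selectCandidates` function, and the `high_rate_threshold` and `file_max_size` parameters are all hypothetical names. Files are assumed to be ordered from oldest to newest, and any upper bound on the number of candidates is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a PageFile (not the TiFlash type).
struct FileInfo
{
    uint64_t file_id;
    uint64_t file_size;   // bytes on disk
    uint64_t valid_bytes; // bytes still referenced by live pages

    double validRate() const
    {
        if (file_size == 0)
            return 0.0;
        return static_cast<double>(valid_bytes) / static_cast<double>(file_size);
    }
};

struct Selection
{
    std::vector<uint64_t> candidates; // ids of the selected files
    uint64_t total_valid_bytes = 0;   // bytes that would be rewritten
    bool has_high_valid_rate = false; // a high valid rate file was selected
};

// Assumption: `files` is ordered from oldest to newest.
Selection selectCandidates(const std::vector<FileInfo> & files,
                           double high_rate_threshold,
                           uint64_t file_max_size)
{
    Selection sel;
    for (const auto & f : files)
    {
        // Files without valid pages are skipped: rewriting them
        // would not help GC move forward.
        if (f.valid_bytes == 0)
            continue;

        if (f.validRate() > high_rate_threshold)
            sel.has_high_valid_rate = true;

        sel.candidates.push_back(f.file_id);
        sel.total_valid_bytes += f.valid_bytes;

        // Only low valid rate files so far: stop once they can fill a new file.
        // With a high valid rate file selected, keep collecting the files after it.
        if (!sel.has_high_valid_rate && sel.total_valid_bytes >= file_max_size)
            break;
    }
    return sel;
}
```

With a threshold of 0.8 and a `file_max_size` of 25, four 100-byte files holding 10, 0, 20 and 30 valid bytes yield the first and third files only: the second has no valid pages, and collection stops once 30 valid bytes are gathered. If the first file instead holds 99 valid bytes, every file with valid pages is selected, so the files after the high valid rate one are rewritten in the same pass.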

Related changes

  • PR to update pingcap/docs / pingcap/docs-cn:
  • Need to cherry-pick to the release branch:

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Side effects

Release note

  • No release note

@JaySon-Huang JaySon-Huang force-pushed the fix_ps_gc_with_high_valid_rate branch from 09ed29d to 366f81b Compare July 20, 2021 12:51
@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang
Contributor Author

/rebuild

@JaySon-Huang
Contributor Author

/run-all-tests

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
… valid rate/bytes

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>
@JaySon-Huang JaySon-Huang changed the title from "[DNM] Fix PageStorage GC with high valid rate PageFile" to "Fix PageStorage GC with high valid rate PageFile" Jul 28, 2021
JaySon-Huang and others added 2 commits July 28, 2021 20:03
@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang JaySon-Huang added the type/bugfix This PR fixes a bug. label Aug 2, 2021
@jiaqizho
Contributor

jiaqizho commented Aug 3, 2021

LGTM

@ti-srebot
Collaborator

@jiaqizho, Thanks for your review. The bot only counts LGTMs from Reviewers and higher roles, but you're still welcome to leave your comments. See the corresponding SIG page for more information. Related SIG: tiflash(slack).

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 3, 2021
@JaySon-Huang
Contributor Author

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Aug 3, 2021
@ti-srebot
Collaborator

/run-all-tests

@ti-srebot ti-srebot merged commit 543e252 into pingcap:master Aug 3, 2021
@JaySon-Huang JaySon-Huang deleted the fix_ps_gc_with_high_valid_rate branch August 3, 2021 07:47
JaySon-Huang added a commit to JaySon-Huang/tiflash that referenced this pull request Aug 4, 2021
JaySon-Huang added a commit to JaySon-Huang/tiflash that referenced this pull request Aug 4, 2021
flowbehappy pushed a commit that referenced this pull request Aug 4, 2021
* Ignore sequence hole among PageFile meta (#2312)

* Fix bug for GC may skip unexpected WriteBatches (#2356)

* Add length check while running PageStorage GC (#2394)

* PageStorage skip non continuous sequence safely (#2435)

* Fix PageStorage GC with high valid rate PageFile (#2436)

* More debug info for DeltaTree (query_id, snapshot lifetime) (#2431)

* Fix deadlock on `removeExpiredSnapshots` (#2461)

* Add grafana panels for write throughput per instance (#2524)
JaySon-Huang pushed a commit that referenced this pull request Aug 5, 2021
* Ignore sequence hole among PageFile meta (#2312)
* Fix bug for GC may skip unexpected WriteBatches (#2356)
* Add length check while running PageStorage GC (#2394)
* PageStorage skip non continuous sequence safely (#2435)
* Fix PageStorage GC with high valid rate PageFile (#2436)

Signed-off-by: JaySon-Huang <jayson.hjs@gmail.com>

Labels

  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT1: Indicates that a PR has LGTM 1.
  • type/bugfix: This PR fixes a bug.

Development

Successfully merging this pull request may close these issues.

PageStorage GC may not able to cleanup as expected under heavy update workload

4 participants