Storage: Fix max-id being mis-reused cause data corruption after changing tiflash replica number#8698

Merged
ti-chi-bot[bot] merged 6 commits into pingcap:master from JaySon-Huang:fix_recreate_storage_ins_2 on Jan 25, 2024

Conversation

@JaySon-Huang
Contributor

@JaySon-Huang JaySon-Huang commented Jan 17, 2024

What problem does this PR solve?

Issue Number: close #8695

Problem Summary: As described in the issue: #8695 (comment)

What is changed and how it works?

Introduce a GlobalPageIdAllocator so that a page_id is never reused within one physical_table_id scope. Specifically:

  • A GlobalPageIdAllocator now allocates the page_id for segment_id, delta-layer, and stable-layer across all IStorage instances, instead of each StoragePool allocating page_ids independently for its own IStorage instance.
  • When an IStorage instance is restored from disk, it first tries to raise the lower bound of the GlobalPageIdAllocator. After that, a table cannot reuse an already-allocated id.
  • If an IStorage instance is physically removed (caused by setting the tiflash replica to 0) and then re-created, it still allocates its page_ids from the GlobalPageIdAllocator, so no page_id is reused in that case either.
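The idea in the list above can be illustrated with a minimal sketch. It is not the code from this PR: the class name `GlobalPageIdAllocatorSketch`, the methods `raiseLowerBound` and `newPageId`, and the single shared counter are assumptions made for illustration only. The sketch shows a process-wide allocator whose bound only ever moves upward, so ids handed out before a storage instance is dropped are not handed out again after it is re-created.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: names and signatures are illustrative assumptions,
// not the actual TiFlash API introduced by this PR.
using PageId = std::uint64_t;

class GlobalPageIdAllocatorSketch
{
public:
    // Called when a storage instance is restored from disk. Ensures that the
    // allocator never hands out an id <= the largest id already persisted.
    void raiseLowerBound(PageId persisted_max_id)
    {
        PageId current = max_id.load();
        // CAS loop: the bound only moves upward, even under contention.
        // On failure, `current` is refreshed and the condition is re-checked.
        while (current < persisted_max_id
               && !max_id.compare_exchange_weak(current, persisted_max_id))
        {
        }
    }

    // Returns an id strictly greater than every id returned or registered before.
    PageId newPageId() { return max_id.fetch_add(1) + 1; }

private:
    std::atomic<PageId> max_id{0};
};

// Usage: a table is restored, then dropped and re-created. The allocator
// outlives the storage instance, so the re-created table gets fresh ids.
inline bool recreatedTableGetsFreshIds()
{
    GlobalPageIdAllocatorSketch allocator;
    allocator.raiseLowerBound(100); // restore: largest persisted id is 100
    const PageId before_drop = allocator.newPageId();
    // ... replica set to 0, storage instance removed, then re-created ...
    const PageId after_recreate = allocator.newPageId();
    assert(after_recreate > before_drop);
    return after_recreate > before_drop;
}
```

Because the counter lives outside any single storage instance in this sketch, removing and re-creating an instance does not reset it, which is the property the PR description relies on.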

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    1. Load tpc-ds 50
    2. Set tiflash replica 1 for database tpcds50 and wait for all tables progress == 1
    3. Set tiflash replica 0 and wait for gc
    4. Set tiflash replica 1 and wait for all tables progress == 1
    5. Run queries on tpcds 50 through tiflash
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix the issue that TiFlash replica data may be corrupted after setting the TiFlash replica to 0 and adding it back later

@ti-chi-bot ti-chi-bot bot added needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 17, 2024
@JaySon-Huang JaySon-Huang force-pushed the fix_recreate_storage_ins_2 branch from bfd78e6 to ec9fa09 Compare January 17, 2024 17:15
@JaySon-Huang
Contributor Author

/run-all-tests
@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang JaySon-Huang force-pushed the fix_recreate_storage_ins_2 branch from 75ea74a to c640eba Compare January 18, 2024 07:36
@JaySon-Huang JaySon-Huang force-pushed the fix_recreate_storage_ins_2 branch from c640eba to 3682ca2 Compare January 18, 2024 10:46
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 18, 2024
@JaySon-Huang JaySon-Huang force-pushed the fix_recreate_storage_ins_2 branch 2 times, most recently from 4bdeac1 to be7916a Compare January 19, 2024 07:53
@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang
Contributor Author

/run-unit-test

@JaySon-Huang JaySon-Huang force-pushed the fix_recreate_storage_ins_2 branch from be7916a to 11d7f4c Compare January 24, 2024 03:02
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 24, 2024
@JaySon-Huang JaySon-Huang force-pushed the fix_recreate_storage_ins_2 branch from 261cc22 to 87c5577 Compare January 24, 2024 08:11
@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang JaySon-Huang requested a review from JinheLin January 24, 2024 08:18
@JaySon-Huang JaySon-Huang changed the title [WIP] Storage: Fix max-id being mis-reused cause data corruption after changing tiflash replica number Storage: Fix max-id being mis-reused cause data corruption after changing tiflash replica number Jan 24, 2024
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 24, 2024
@JaySon-Huang
Contributor Author

/run-all-tests

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jan 24, 2024
@JaySon-Huang
Contributor Author

/rebuild

@JaySon-Huang
Contributor Author

/run-all-tests

Contributor

@Lloyd-Pottiger Lloyd-Pottiger left a comment

LGTM~

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 24, 2024
@JaySon-Huang JaySon-Huang self-assigned this Jan 24, 2024
@ti-chi-bot
Contributor

ti-chi-bot bot commented Jan 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JinheLin, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [JinheLin,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jan 25, 2024
@ti-chi-bot
Contributor

ti-chi-bot bot commented Jan 25, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-01-24 11:52:06.205968271 +0000 UTC m=+961567.770265969: ☑️ agreed by Lloyd-Pottiger.
  • 2024-01-25 03:46:21.562028767 +0000 UTC m=+1018823.126326472: ☑️ agreed by JinheLin.

@ti-chi-bot ti-chi-bot bot merged commit a20f385 into pingcap:master Jan 25, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Jan 25, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #8733.

@JaySon-Huang JaySon-Huang deleted the fix_recreate_storage_ins_2 branch January 25, 2024 05:29
JaySon-Huang added a commit to ti-chi-bot/tiflash that referenced this pull request Jan 25, 2024
ti-chi-bot bot pushed a commit that referenced this pull request Jan 26, 2024
Labels

approved lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Mis-reuse StoragePool::max_data_page_id cause data corruption after changing tiflash replica number

4 participants