ddl: recover to the correct partition from checkpoint (#44024) #44050
[REVIEW NOTIFICATION] This pull request has been approved by: …
To complete the pull request process, please ask the reviewers in the list to review by filling … The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
The branch was updated from 254a40d to 8c88b4c.
/merge

This pull request has been accepted and is ready to merge. Commit hash: 8c88b4c

/merge

This pull request has been accepted and is ready to merge. Commit hash: bb80c4f

/merge

This pull request has been accepted and is ready to merge. Commit hash: e495ad0
This is an automated cherry-pick of #44024.
What problem does this PR solve?
Issue Number: close #43997
Problem Summary:
The basic idea of the checkpoint is to recover the progress of adding an index after a TiDB instance crashes.
Note that we can only begin with partition 2, because the local checkpoint is lost when TiDB 1 crashes.
To record which partition we should begin with, we use the reorg meta. The reorg meta contains a tuple: (partition ID or physical table ID, start key, end key). Every time TiDB restarts in the middle of adding an index, it tries to reset the reorg meta to the state exactly before the last global checkpoint.
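A minimal Go sketch of the reorg meta tuple and of how a restarted TiDB could pick its starting partition from the last global checkpoint. The type `ReorgMeta`, the function `remainingPartitions`, and the key values are hypothetical simplifications for illustration, not TiDB's actual definitions.

```go
package main

import "fmt"

// ReorgMeta is a hypothetical, simplified version of the reorg meta tuple.
type ReorgMeta struct {
	PhysicalID int64  // partition ID, or physical table ID for a non-partitioned table
	StartKey   []byte // first key not yet covered by the last global checkpoint
	EndKey     []byte // end key of the range for this physical table
}

// remainingPartitions returns the partitions a restarted TiDB still has to
// process: everything from the partition recorded in the global reorg meta
// onward. ok is false if that partition is not in the list.
func remainingPartitions(all []int64, global ReorgMeta) (rest []int64, ok bool) {
	for i, id := range all {
		if id == global.PhysicalID {
			return all[i:], true
		}
	}
	return nil, false
}

func main() {
	partitions := []int64{1, 2, 3}
	// Assume the last global checkpoint stopped inside partition 2.
	// The keys are placeholders, not real TiDB key encodings.
	global := ReorgMeta{PhysicalID: 2, StartKey: []byte("k200"), EndKey: []byte("k299")}
	rest, ok := remainingPartitions(partitions, global)
	fmt.Println(rest, ok, string(global.StartKey)) // prints "[2 3] true k200"
}
```

Resuming from the global meta, rather than from whatever this instance last did locally, is what lets a new TiDB redo exactly the work that was not yet durable.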
Previously, we stored the reorg meta in the checkpoint manager. However, we did not distinguish the "local" reorg meta from the "global" reorg meta. When a partition was completed, the reorg meta was updated immediately, even though that partition's progress might only be covered by the local checkpoint. A newly started TiDB then reset to the wrong partition, and the index data from some of the partitions was lost.
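The difference between keeping one reorg meta and keeping separate local and global reorg meta can be sketched as follows. `CheckpointManager`, `FinishPartition`, `Flush`, `finishPartitionOld`, and `newManager` are hypothetical names; the sketch assumes partitions are processed in order and treats `Flush` as the moment local index data becomes durable (a global checkpoint). It is not TiDB's checkpoint manager.

```go
package main

import "fmt"

// ReorgMeta is a hypothetical, simplified reorg meta that only records the
// partition to resume from.
type ReorgMeta struct {
	PhysicalID int64
}

// CheckpointManager is a hypothetical sketch that keeps local and global
// progress apart.
type CheckpointManager struct {
	local  ReorgMeta // progress of this TiDB only; lost if it crashes
	global ReorgMeta // progress backed by a global checkpoint; survives a crash
}

// FinishPartition marks the current partition as done locally and moves on
// to next. Only the local meta changes; nothing is durable yet.
func (m *CheckpointManager) FinishPartition(next int64) {
	m.local = ReorgMeta{PhysicalID: next}
}

// Flush stands in for a global checkpoint: only now does the persisted
// meta catch up with local progress.
func (m *CheckpointManager) Flush() {
	m.global = m.local
}

// finishPartitionOld models the bug: the persisted meta advances as soon as
// a partition completes locally, before its data is durable.
func (m *CheckpointManager) finishPartitionOld(next int64) {
	m.local = ReorgMeta{PhysicalID: next}
	m.global = m.local
}

// Persisted returns the meta a restarted TiDB would reset to.
func (m *CheckpointManager) Persisted() ReorgMeta {
	return m.global
}

func newManager(start int64) *CheckpointManager {
	meta := ReorgMeta{PhysicalID: start}
	return &CheckpointManager{local: meta, global: meta}
}

func main() {
	// Partition 1 finishes locally, then TiDB crashes before any flush.
	fixed := newManager(1)
	fixed.FinishPartition(2)
	fmt.Println("separate local/global meta resumes from partition", fixed.Persisted().PhysicalID)

	old := newManager(1)
	old.finishPartitionOld(2)
	fmt.Println("single meta resumes from partition", old.Persisted().PhysicalID)
}
```

With a single meta, the restarted TiDB skips partition 1 even though its index data was never made durable, which is the kind of data loss described above. Keeping the local meta private until a flush makes the restart redo that partition instead.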
What is changed and how it works?
… mysql.ddl_reorg_meta is initialized, we also initialize the checkpoint.
… mysql.ddl_reorg_meta).
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.