raftstore: Execute recovery plan via raft #12022
Removing peers one by one may be problematic; consider the following case:
Using joint consensus to atomically remove multiple peers with one command (one log entry) may avoid the issue, and would also be more efficient.
Reasonable, joint consensus is safer. You can propose leaving the joint state just as this PR does. @v01dstar
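To make the suggestion above concrete, here is a minimal, self-contained sketch of the joint consensus idea: several peers are removed by a single entry, and while the group is in the joint state a decision needs a majority of both the outgoing and the incoming voter sets. The names `Change`, `JointConfig`, `enter`, `has_quorum`, and `leave` are hypothetical and exist only for illustration; they are not the raft-rs or TiKV types, and the sketch ignores learners, logs, and persistence.

```rust
use std::collections::BTreeSet;

/// Hypothetical membership change; a stand-in, not the raft-rs type.
#[derive(Clone, Copy, Debug)]
enum Change {
    AddVoter(u64),
    RemoveVoter(u64),
}

/// True if `acks` contains a strict majority of `voters`.
fn has_majority(voters: &BTreeSet<u64>, acks: &BTreeSet<u64>) -> bool {
    let agreed = voters.intersection(acks).count();
    agreed >= voters.len() / 2 + 1
}

/// Hypothetical joint configuration: old (outgoing) and new (incoming) voters.
#[derive(Debug)]
struct JointConfig {
    outgoing: BTreeSet<u64>,
    incoming: BTreeSet<u64>,
}

impl JointConfig {
    /// Apply every change in one step, as a single log entry would.
    fn enter(current: &BTreeSet<u64>, changes: &[Change]) -> Self {
        let mut incoming = current.clone();
        for change in changes {
            match *change {
                Change::AddVoter(id) => {
                    incoming.insert(id);
                }
                Change::RemoveVoter(id) => {
                    incoming.remove(&id);
                }
            }
        }
        JointConfig {
            outgoing: current.clone(),
            incoming,
        }
    }

    /// In the joint state, both voter sets must reach a majority.
    fn has_quorum(&self, acks: &BTreeSet<u64>) -> bool {
        has_majority(&self.outgoing, acks) && has_majority(&self.incoming, acks)
    }

    /// Leaving the joint state keeps only the incoming voters.
    fn leave(self) -> BTreeSet<u64> {
        self.incoming
    }
}
```

For example, entering the joint state from voters 1 to 5 with `RemoveVoter(4)` and `RemoveVoter(5)` yields incoming voters 1, 2, 3 in one step; acknowledgements from 1 and 2 alone are not a quorum until the joint state is left, because the outgoing set of five still needs three.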
@NingLin-P BTW, can we use |
lgtm, previously we don't use |
I don't think this gonna work after investigated the To conclude, the current design of Raftstore does not support For now, I suggest us using ConfChangeV2 + |
Okay, Let's use ConfChangeV2 + |
* raftstore: Introduce force leader state (tikv#11932)
  close tikv#6107, ref tikv#10483
  Signed-off-by: Connor1996 <zbk602423539@gmail.com>
* raftstore: Wait ticks for hibernated peer when doing force leader (tikv#12364)
  ref tikv#10483
  Force leader is rejected on a peer who is already a leader. For the hibernated leader, it doesn't step down to follower when quorum is missing due to not ticking election. So wait ticks in force leader process for hibernated peers to make sure election ticking is performed.
  Signed-off-by: Connor1996 <zbk602423539@gmail.com>
  Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
* raftstore: Make unsafe recovery wait apply cover snapshot apply cases ref tikv#10483 (tikv#12308)
  ref tikv#10483
  Signed-off-by: v01dstar <yang.zhang@pingcap.com>
* raftstore: Execute recovery plan via raft (tikv#12022)
  Signed-off-by: Connor1996 <zbk602423539@gmail.com>
  Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
  Co-authored-by: Yang Zhang <yang.zhang@pingcap.com>
Signed-off-by: v01dstar <yang.zhang@pingcap.com>
This pull request has been accepted and is ready to merge. Commit hash: 91b17c0
…afe recovery state (#12657) ref #12022, close #12644 Cleaning up the unsafe recovery state after exiting previous joint state before proposing the recovery demotion which may return early if any error happens and leave the state untouched. Signed-off-by: v01dstar <yang.zhang@pingcap.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
…afe recovery state (#12657) (#12675) ref #12022, close #12644, ref #12657 Cleaning up the unsafe recovery state after exiting previous joint state before proposing the recovery demotion which may return early if any error happens and leave the state untouched. Signed-off-by: ti-srebot <ti-srebot@pingcap.com> Co-authored-by: Yang Zhang <yang.zhang@pingcap.com>
…afe recovery state (tikv#12657) (tikv#12675) ref tikv#12022, close tikv#12644, ref tikv#12657 Cleaning up the unsafe recovery state after exiting previous joint state before proposing the recovery demotion which may return early if any error happens and leave the state untouched. Signed-off-by: ti-srebot <ti-srebot@pingcap.com> Co-authored-by: Yang Zhang <yang.zhang@pingcap.com> Signed-off-by: joccau <zak.zhao@pingcap.com>
    DiskFullOpt::AllowedOnAlmostFull,
    );

    if !*failed.lock().unwrap() {
Should we abort this plan if the proposal fails? At that point the region is still in the joint state and the syncer has been dropped.
The plan should be aborted only when another plan is already being executed. When something goes wrong, such as the peer not being a force leader or the proposal failing, just print the error and let the syncer continue to trigger the store report. The PD side will then see that the state has not changed and retry in the next round.
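The error handling described in this reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `PlanSyncer`, `ReportOnDrop`, `StepOutcome`, and `execute_step` are hypothetical names, and the "store report" is modelled as a string pushed to a shared list when the last clone of the syncer is dropped. The point is only that a failed step logs the error and returns, and dropping the syncer is what lets the report go out so the scheduler can retry.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical inner state; its `Drop` stands in for "trigger a store report".
struct ReportOnDrop {
    reports: Arc<Mutex<Vec<String>>>,
}

impl Drop for ReportOnDrop {
    fn drop(&mut self) {
        self.reports.lock().unwrap().push("store report".to_owned());
    }
}

/// Hypothetical syncer: clones share one inner state, so the report
/// fires exactly once, when the last clone goes away.
#[derive(Clone)]
struct PlanSyncer(Arc<ReportOnDrop>);

impl PlanSyncer {
    fn new(reports: Arc<Mutex<Vec<String>>>) -> Self {
        PlanSyncer(Arc::new(ReportOnDrop { reports }))
    }
}

#[derive(Debug, PartialEq)]
enum StepOutcome {
    Proposed,
    Skipped(String),
}

/// Try one step of the plan. On any failure, log and return without
/// aborting; the dropped syncer still triggers the report.
fn execute_step<F>(syncer: PlanSyncer, is_force_leader: bool, propose: F) -> StepOutcome
where
    F: FnOnce() -> Result<(), String>,
{
    let outcome = if !is_force_leader {
        StepOutcome::Skipped("not a force leader".to_owned())
    } else {
        match propose() {
            Ok(()) => StepOutcome::Proposed,
            Err(e) => StepOutcome::Skipped(e),
        }
    };
    if let StepOutcome::Skipped(reason) = &outcome {
        eprintln!("unsafe recovery step skipped: {}", reason);
    }
    // Simplification: the sketch drops the syncer on every path. A real
    // implementation would presumably hold it until the proposal is applied.
    drop(syncer);
    outcome
}
```

Tying the report to `Drop` means no error path can forget to send it, which is why printing the error and returning is enough for the retry in the next round.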
Signed-off-by: v01dstar yang.zhang@pingcap.com
What is changed and how it works?
Issue Number: ref #10483
What's Changed:
This PR makes TiKV execute the unsafe recovery plan through Raft.
Related changes
N/A
Check List
Tests
Side effects
N/A
Release note