raftstore: Execute recovery plan via raft#12022

Merged
ti-chi-bot merged 59 commits into tikv:master from v01dstar:recover_via_raft
May 19, 2022

Conversation

@v01dstar
Member

@v01dstar v01dstar commented Feb 25, 2022

Signed-off-by: v01dstar yang.zhang@pingcap.com

What is changed and how it works?

Issue Number: ref #10483

What's Changed:

This PR makes TiKV execute the unsafe recovery plan through Raft.

Related changes

N/A

Check List

Tests

  • Integration Test

Side effects

N/A

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Feb 25, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • 5kbpers
  • Connor1996

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. contribution This PR is from a community contributor. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 25, 2022
@Connor1996 Connor1996 changed the title Recover via raft raftstore: Execute recovery plan via raft Feb 28, 2022
@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 28, 2022
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 1, 2022
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2022
@NingLin-P
Member

NingLin-P commented Mar 4, 2022

Removing peers one by one may be problematic. Consider the following case:

  1. a region has the peer list (1, 2, 3, 4, 5, 6, 7), where 4, 5, 6 and 7 are on down stores, so we try to remove them one by one
  2. after removing 6 and 7, the three remaining alive peers 1, 2, 3 can form a quorum of the intermediate configuration (1, 2, 3, 4, 5), so they may (and can) elect a new leader
  3. there may then be two leaders. Although they have different terms, the force_leader can commit logs without replicating to a majority of peers, which may produce conflicting committed logs and cause data corruption.

Using joint consensus to atomically remove multiple peers with one command (one log entry) would avoid this issue, and would also be more efficient.
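The quorum arithmetic behind this hazard can be sketched in plain Rust (a simplified illustration; `quorum` and `can_elect` are hypothetical helpers, not TiKV code):

```rust
// Majority quorum over a voter list.
fn quorum(voters: &[u64]) -> usize {
    voters.len() / 2 + 1
}

// Can the alive peers elect a leader under the given voter configuration?
fn can_elect(voters: &[u64], alive: &[u64]) -> bool {
    let alive_voters = voters.iter().filter(|&&v| alive.contains(&v)).count();
    alive_voters >= quorum(voters)
}

fn main() {
    let alive = [1, 2, 3]; // stores holding peers 4..=7 are down

    // Original configuration (1..=7): 3 alive < quorum 4, so no second leader.
    assert!(!can_elect(&[1, 2, 3, 4, 5, 6, 7], &alive));

    // Intermediate configuration after removing 6 and 7 one by one:
    // 3 alive >= quorum 3, so peers 1, 2, 3 can elect a new leader while the
    // force leader is still committing logs without majority replication.
    assert!(can_elect(&[1, 2, 3, 4, 5], &alive));

    // An atomic joint-consensus removal jumps straight from (1..=7) to
    // (1, 2, 3) and never exposes this unsafe intermediate configuration.
}
```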

@Connor1996
Member

3. there may then be two leaders. Although they have different terms, the force_leader can commit logs without replicating to a majority of peers, which may produce conflicting committed logs and cause data corruption.

Reasonable; joint consensus is safer. You can propose the leave-joint command just as this PR proposes other commands. @v01dstar

@Connor1996
Member

@NingLin-P BTW, can we use auto_leave == true for this scenario, so we don't need to propose leave joint explicitly?

@NingLin-P
Member

NingLin-P commented Mar 7, 2022

@NingLin-P BTW, can we use auto_leave == true for this scenario, so we don't need to propose leave joint explicitly?

lgtm. Previously we didn't use auto_leave because we still needed a retry mechanism for the LeaveJoint command anyway, so auto_leave didn't help much. But in the unsafe recovery use case, the LeaveJoint command won't fail, because there won't be any new leader until the majority of alive voters have left the joint state.

@v01dstar
Member Author

v01dstar commented Mar 8, 2022

auto_leave

I don't think this is going to work. After investigating the auto-leave API (ConfChangeTransition::Implicit, the external switch for the auto-leave functionality of raft joint consensus), I found that auto leave works by having Raft append an empty v2 conf change once the enter-joint entry is applied and auto leave is turned on. The entry appended by Raft itself does not have its context field filled in, so the request generated in the apply FSM lacks necessary metadata such as the epoch, and is therefore blocked here. I don't think we should manually fill in that metadata either, since it may bring more problems.

To conclude, the current design of Raftstore does not support auto leave (ConfChangeTransition::Implicit). To make it work, I guess we would need to change both the Raftstore and Raft code.

For now, I suggest using ConfChangeV2 + ConfChangeTransition::Explicit.
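As a rough sketch of this suggestion (using simplified stand-in types, not the actual raft-rs eraftpb protobufs), the plan would batch all removals into one ConfChangeV2 entry with an explicit transition, then propose an empty ConfChangeV2 later to leave the joint state:

```rust
// Stand-in types mirroring the shape of the raft-rs v2 conf-change protos.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConfChangeType {
    RemoveNode,
    // The actual recovery plan may demote voters to learners instead.
    AddLearnerNode,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ConfChangeTransition {
    Auto,
    Implicit, // auto leave: unsupported by Raftstore, per the comment above
    Explicit, // caller proposes the leave-joint entry itself
}

struct ConfChangeSingle {
    change_type: ConfChangeType,
    node_id: u64,
}

struct ConfChangeV2 {
    transition: ConfChangeTransition,
    changes: Vec<ConfChangeSingle>,
}

// Build one atomic command covering every peer on a failed store.
fn enter_joint(failed_peers: &[u64]) -> ConfChangeV2 {
    ConfChangeV2 {
        transition: ConfChangeTransition::Explicit,
        changes: failed_peers
            .iter()
            .map(|&id| ConfChangeSingle {
                change_type: ConfChangeType::RemoveNode,
                node_id: id,
            })
            .collect(),
    }
}

// In raft's v2 conf-change semantics, an empty ConfChangeV2 is LeaveJoint.
fn leave_joint() -> ConfChangeV2 {
    ConfChangeV2 {
        transition: ConfChangeTransition::Auto,
        changes: Vec::new(),
    }
}

fn main() {
    let cc = enter_joint(&[4, 5, 6, 7]);
    assert_eq!(cc.changes.len(), 4); // one atomic entry, not four
    assert!(leave_joint().changes.is_empty());
}
```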

@Connor1996
Member

Connor1996 commented Mar 9, 2022

auto_leave

For now, I suggest using ConfChangeV2 + ConfChangeTransition::Explicit.

Okay, let's use ConfChangeV2 + ConfChangeTransition::Explicit.

@ti-chi-bot ti-chi-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 16, 2022
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label May 18, 2022
CalvinNeo pushed a commit to pingcap/tidb-engine-ext that referenced this pull request May 18, 2022
* raftstore: Introduce force leader state (tikv#11932)

close tikv#6107, ref tikv#10483

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

* raftstore: Wait ticks for hibernated peer when doing force leader (tikv#12364)

ref tikv#10483

Force leader is rejected on a peer who is already a leader. For the hibernated leader,
it doesn't step down to follower when quorum is missing due to not ticking election. 
So wait ticks in force leader process for hibernated peers to make sure election ticking
is performed.

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>

* raftstore: Make unsafe recovery wait apply cover snapshot apply cases ref tikv#10483 (tikv#12308)

ref tikv#10483

Signed-off-by: v01dstar <yang.zhang@pingcap.com>

* raftstore: Execute recovery plan via raft (tikv#12022)

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Co-authored-by: Yang Zhang <yang.zhang@pingcap.com>
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 18, 2022
v01dstar and others added 9 commits May 18, 2022 18:38
Signed-off-by: v01dstar <yang.zhang@pingcap.com>
Signed-off-by: v01dstar <yang.zhang@pingcap.com>
Signed-off-by: v01dstar <yang.zhang@pingcap.com>
Signed-off-by: v01dstar <yang.zhang@pingcap.com>
Signed-off-by: v01dstar <yang.zhang@pingcap.com>
Signed-off-by: v01dstar <yang.zhang@pingcap.com>
Signed-off-by: v01dstar <yang.zhang@pingcap.com>
Signed-off-by: v01dstar <yang.zhang@pingcap.com>
@Connor1996
Member

/test

@Connor1996
Member

/merge

@ti-chi-bot
Member

@Connor1996: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Details

Commit hash: 91b17c0

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label May 19, 2022
@ti-chi-bot
Member

@v01dstar: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot merged commit 5dd92a5 into tikv:master May 19, 2022
@v01dstar v01dstar deleted the recover_via_raft branch May 24, 2022 04:21
ti-chi-bot added a commit that referenced this pull request May 27, 2022
…afe recovery state (#12657)

ref #12022, close #12644

Cleaning up the unsafe recovery state after exiting previous joint state before proposing the recovery demotion which may return early if any error happens and leave the state untouched.

Signed-off-by: v01dstar <yang.zhang@pingcap.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit that referenced this pull request May 27, 2022
…afe recovery state (#12657) (#12675)

ref #12022, close #12644, ref #12657

Cleaning up the unsafe recovery state after exiting previous joint state before proposing the recovery demotion which may return early if any error happens and leave the state untouched.

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: Yang Zhang <yang.zhang@pingcap.com>
joccau pushed a commit to joccau/tikv that referenced this pull request Jun 23, 2022
…afe recovery state (tikv#12657) (tikv#12675)

ref tikv#12022, close tikv#12644, ref tikv#12657

Cleaning up the unsafe recovery state after exiting previous joint state before proposing the recovery demotion which may return early if any error happens and leave the state untouched.

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>

Co-authored-by: Yang Zhang <yang.zhang@pingcap.com>
Signed-off-by: joccau <zak.zhao@pingcap.com>
DiskFullOpt::AllowedOnAlmostFull,
);

if !*failed.lock().unwrap() {
Member

Should we abort this plan if the proposal fails? Because it's still in the joint state and the syncer has been dropped.

Member

@Connor1996 Connor1996 Jul 25, 2023

The plan should be aborted only when there is already a plan being executed. When something goes wrong, like the peer not being in force-leader state or a proposal failing, just print the error and let the syncer continue to trigger store reports. The PD side will then see that the state hasn't changed and retry in the next round.
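That policy can be summarized with a small decision function (a hypothetical sketch; the names are illustrative, not TiKV's actual code):

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Execute,          // proceed with the recovery plan
    Abort,            // reject: a plan is already being executed
    LogAndRetryLater, // print the error; PD retries in its next round
}

// Decide how to handle an incoming unsafe recovery plan.
fn handle_plan(already_executing: bool, is_force_leader: bool, proposal_ok: bool) -> Outcome {
    if already_executing {
        // Abort only when another plan is in flight.
        return Outcome::Abort;
    }
    if !is_force_leader || !proposal_ok {
        // Transient failure: keep the syncer reporting so PD sees the
        // unchanged state and retries in the next round.
        return Outcome::LogAndRetryLater;
    }
    Outcome::Execute
}

fn main() {
    assert_eq!(handle_plan(true, true, true), Outcome::Abort);
    assert_eq!(handle_plan(false, false, true), Outcome::LogAndRetryLater);
    assert_eq!(handle_plan(false, true, false), Outcome::LogAndRetryLater);
    assert_eq!(handle_plan(false, true, true), Outcome::Execute);
}
```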

Labels

contribution This PR is from a community contributor. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants