
raftstore: Make unsafe recovery wait apply cover snapshot apply cases ref #10483 (#12308)

Merged
ti-chi-bot merged 16 commits into tikv:master from v01dstar:wait_snapshot_apply
Apr 15, 2022

Conversation

@v01dstar
Member

@v01dstar v01dstar commented Mar 30, 2022

Signed-off-by: v01dstar <yang.zhang@pingcap.com>

What is changed and how it works?

Issue Number: ref #10483

What's Changed:

Raft snapshot apply updates the apply index through a different code path than other types of Raft entries, which update it in raftstore/store/fsm/peer.rs:on_apply_res(). The initial implementation of "unsafe recovery wait apply" did not cover the snapshot path; this PR fixes that.
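
To illustrate the gap described above, here is a minimal sketch in Rust. It assumes a hypothetical `WaitApplyState` tracker; none of the names below are taken from the TiKV code base, and the real change in this PR may be structured quite differently. The point is only that both the ordinary apply-result path and the snapshot-apply path have to feed the same "has the target index been applied yet?" check.

```rust
// Hypothetical sketch: these types and method names are illustrative
// assumptions and are NOT the raftstore implementation.

/// Tracks a pending "wait apply" request for a single peer.
#[derive(Debug, Default)]
pub struct WaitApplyState {
    /// Index that must be applied before recovery may proceed, if any.
    target_index: Option<u64>,
    /// Highest index known to be applied so far.
    applied_index: u64,
}

impl WaitApplyState {
    /// Starts waiting for `target_index`. Returns true if the wait is
    /// already satisfied by the current applied index.
    pub fn start_wait(&mut self, target_index: u64) -> bool {
        self.target_index = Some(target_index);
        self.check_finished()
    }

    /// Path 1: ordinary entries were applied (the role played by an
    /// apply-result handler such as `on_apply_res`).
    pub fn on_entries_applied(&mut self, applied_index: u64) -> bool {
        self.advance(applied_index)
    }

    /// Path 2: a snapshot finished applying. This also moves the applied
    /// index forward, but without passing through the ordinary path.
    pub fn on_snapshot_applied(&mut self, snapshot_index: u64) -> bool {
        self.advance(snapshot_index)
    }

    /// Records progress (never moving backwards) and re-runs the check.
    fn advance(&mut self, index: u64) -> bool {
        self.applied_index = self.applied_index.max(index);
        self.check_finished()
    }

    /// Returns true once per wait, at the moment the target is reached,
    /// and clears the pending target so later calls return false.
    fn check_finished(&mut self) -> bool {
        match self.target_index {
            Some(target) if self.applied_index >= target => {
                self.target_index = None;
                true
            }
            _ => false,
        }
    }
}
```

In this sketch, a caller that only wired up `on_entries_applied` would never observe completion for a peer that catches up via snapshot, which is the kind of omission the description refers to.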

Related changes

PR that adds the initial implementation of unsafe recovery wait apply: #11716

Check List

Tests

  • Integration test

Side effects

N/A

Release note

None

@ti-chi-bot
Member

ti-chi-bot commented Mar 30, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • BusyJay
  • Connor1996

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. contribution This PR is from a community contributor. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 30, 2022
@v01dstar
Member Author

/assign @Connor1996

@Connor1996 Connor1996 removed their assignment Mar 31, 2022
@Connor1996 Connor1996 self-requested a review March 31, 2022 07:20
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 1, 2022
@v01dstar
Member Author

v01dstar commented Apr 7, 2022

/CC @BusyJay

@ti-chi-bot ti-chi-bot requested a review from BusyJay April 7, 2022 01:42
Member

@Connor1996 Connor1996 left a comment

rest LGTM

@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 8, 2022
@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 12, 2022
@v01dstar v01dstar force-pushed the wait_snapshot_apply branch from ec99f00 to a60f19a Compare April 12, 2022 01:56
@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 12, 2022
@Connor1996
Member

/test

@Connor1996
Member

/merge

@ti-chi-bot
Member

@Connor1996: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 0f1c747

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 14, 2022
@Connor1996
Member

/test

7 similar comments

@BusyJay
Member

BusyJay commented Apr 14, 2022

test_unsafe_recover_wait_for_snapshot_apply: retrying repeatedly will just bring more unstable cases to master.

@Connor1996
Member

> test_unsafe_recover_wait_for_snapshot_apply: retrying repeatedly will just bring more unstable cases to master.

The previous failure wasn't caused by test_unsafe_recover_wait_for_snapshot_apply.

Signed-off-by: v01dstar <yang.zhang@pingcap.com>
@ti-chi-bot ti-chi-bot removed the status/can-merge Indicates a PR has been approved by a committer. label Apr 14, 2022
@v01dstar
Member Author

> > test_unsafe_recover_wait_for_snapshot_apply: retrying repeatedly will just bring more unstable cases to master.
>
> The previous failure wasn't caused by test_unsafe_recover_wait_for_snapshot_apply.

I suspect the test failed because it did not wait long enough for Raft log GC, so the snapshot path was never triggered. I added code to coordinate the data insertion with Raft log GC; hopefully this makes the test a bit more stable.
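
For readers unfamiliar with this kind of flakiness, the coordination mentioned above can be sketched as polling for a condition instead of sleeping for a fixed time. The helper below is a hypothetical illustration in Rust, not the code added in this PR; `wait_until` and its parameters are assumed names, and the condition a real test would poll (for example, "the Raft log has been compacted past the inserted entries") is likewise an assumption.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical test helper (not part of TiKV's test utilities):
/// polls `condition` every `interval` until it returns true or
/// `timeout` elapses. Returns whether the condition was met.
pub fn wait_until<F>(mut condition: F, timeout: Duration, interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Check first so an already-true condition returns immediately.
        if condition() {
            return true;
        }
        // Give up once the deadline has passed.
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

A test written this way would insert data, call something like `wait_until(|| log_was_compacted(), ...)` (where `log_was_compacted` is a placeholder for whatever check the test harness offers), and only then restart the lagging peer, so that catching up is more likely to go through the snapshot path.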

@Connor1996
Member

/merge

@ti-chi-bot
Member

@Connor1996: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: fedc098

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Apr 15, 2022
@ti-chi-bot ti-chi-bot merged commit 45fcb08 into tikv:master Apr 15, 2022
@v01dstar v01dstar deleted the wait_snapshot_apply branch April 15, 2022 02:44
Connor1996 pushed a commit to Connor1996/tikv that referenced this pull request May 18, 2022
Connor1996 pushed a commit to Connor1996/tikv that referenced this pull request May 18, 2022
CalvinNeo pushed a commit to pingcap/tidb-engine-ext that referenced this pull request May 18, 2022
* raftstore: Introduce force leader state (tikv#11932)

close tikv#6107, ref tikv#10483

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

* raftstore: Wait ticks for hibernated peer when doing force leader (tikv#12364)

ref tikv#10483

Force leader is rejected on a peer that is already a leader. A hibernated leader
does not step down to follower when quorum is lost, because it is not ticking elections.
So the force leader process waits some ticks for hibernated peers to make sure election
ticking is performed.

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>

* raftstore: Make unsafe recovery wait apply cover snapshot apply cases ref tikv#10483 (tikv#12308)

ref tikv#10483

Signed-off-by: v01dstar <yang.zhang@pingcap.com>

* raftstore: Execute recovery plan via raft (tikv#12022)

Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Co-authored-by: Yang Zhang <yang.zhang@pingcap.com>

Labels

contribution This PR is from a community contributor. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

4 participants