Skip to content

Unsafe Recovery: Fix TiFlash blocks commit idx forwarding in unsafe recovery#18211

Merged
ti-chi-bot[bot] merged 5 commits intotikv:masterfrom
v01dstar:fix-unsafe-recovery
Mar 5, 2025
Merged

Unsafe Recovery: Fix TiFlash blocks commit idx forwarding in unsafe recovery#18211
ti-chi-bot[bot] merged 5 commits intotikv:masterfrom
v01dstar:fix-unsafe-recovery

Conversation

@v01dstar
Copy link
Member

@v01dstar v01dstar commented Feb 12, 2025

What is changed and how it works?

Issue Number: Close #18197

What's Changed:

Fix TiFlash blocks commit idx forwarding in unsafe recovery

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

Fix TiFlash blocks commit idx forwarding in unsafe recovery

Signed-off-by: Yang Zhang <yang.zhang@pingcap.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has signed the dco. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 12, 2025
@v01dstar v01dstar requested a review from Connor1996 February 12, 2025 23:00
@v01dstar
Copy link
Member Author

/test pull-unit-test

Signed-off-by: Yang Zhang <yang.zhang@pingcap.com>
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 14, 2025
Signed-off-by: Yang Zhang <yang.zhang@pingcap.com>
// Learners like TiFlash are ignored, because they may be tombstoned in earlier
// stage of the unsafe recovery process, so that forced leader
// cannot append logs to them.
if failed_stores.contains(&peer.get_store_id()) || peer.get_role() == PeerRole::Learner
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't simply pass learner, considering the case where all left peers are learners. We should check whether it is on tiflash by checking the label. I think you can get the store from StoreGroup

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this case. This change still works in this case, doesn't it? After all, Raft itself only update commit idx based on voters' progresses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in the case of unsafe recovery, learner itself could be the forced leader

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, only ignoring TiFlash nodes.
But, just FYI, using StoreGroup has a potential bug. In non-dr-auto-sync deployment, any newly provisioned TiFlash node after the regular TiKV nodes started won't be ignored, since labels are only updated when switching replication status.
I still think learner is better, even if the learner itself could be the forced leader, it is still OK, since let mut replicated_idx = self.raft_group.raft.raft_log.persisted; replicated_idx is initialized with its own log index.

Copy link
Member

@Connor1996 Connor1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 5, 2025
@v01dstar v01dstar force-pushed the fix-unsafe-recovery branch from 51a1396 to 4eea002 Compare March 5, 2025 06:27
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 5, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 5, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-03-05 06:11:57.489905675 +0000 UTC m=+422030.618825420: ☑️ agreed by Connor1996.
  • 2025-03-05 07:57:54.113433498 +0000 UTC m=+428387.242353241: ☑️ agreed by glorv.

@glorv
Copy link
Contributor

glorv commented Mar 5, 2025

/retest

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 5, 2025

@v01dstar: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@Connor1996
Copy link
Member

/approve

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Connor1996, glorv

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot merged commit a34740f into tikv:master Mar 5, 2025
7 of 8 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Mar 5, 2025
@v01dstar v01dstar deleted the fix-unsafe-recovery branch March 5, 2025 23:37
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Apr 23, 2025
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #18412.

ti-chi-bot bot pushed a commit that referenced this pull request Apr 24, 2025
…ecovery (#18211) (#18412)

close #18197

Fix TiFlash blocks commit idx forwarding in unsafe recovery

Signed-off-by: Yang Zhang <yang.zhang@pingcap.com>

Co-authored-by: Yang Zhang <yang.zhang@pingcap.com>
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Aug 5, 2025
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #18793.

@ti-chi-bot ti-chi-bot bot removed the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Aug 5, 2025
ti-chi-bot bot pushed a commit that referenced this pull request Aug 8, 2025
…ecovery (#18211) (#18793)

close #18197

Fix TiFlash blocks commit idx forwarding in unsafe recovery

Signed-off-by: Yang Zhang <yang.zhang@pingcap.com>

Co-authored-by: Yang Zhang <yang.zhang@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

unsafe recovery failed with tiflash

4 participants