Skip to content

raftstore: fix missing pending peers in region heartbeat after split#18032

Merged
ti-chi-bot[bot] merged 7 commits intotikv:masterfrom
hbisheng:fix-split-merge
Mar 7, 2025
Merged

raftstore: fix missing pending peers in region heartbeat after split#18032
ti-chi-bot[bot] merged 7 commits intotikv:masterfrom
hbisheng:fix-split-merge

Conversation

@hbisheng
Copy link
Member

@hbisheng hbisheng commented Dec 19, 2024

What is changed and how it works?

Issue Number: Close #17992

Since #7672, TiKV has relied on the guarantee that all merge target peers must exist before PD initiates a merge. However, this guarantee was broken in the scenario in #17992, ultimately leading to a panic.

PD uses pending_peers to verify whether all target peers exist before merging[1][2]. Investigations pointed to an issue in TiKV where it incorrectly reported missing pending peers after a split. More specifically, right after a split, TiKV triggers a new peer to upload a heartbeat[3], but at this point, the peer may not have been elected as the leader yet. Before this PR, a non-leader would always set the pending_peers to empty, causing PD to receive an incorrect region state.

Since a non-leader has no knowledge of the progress of other peers, it should consevatively mark all other peers as pending. This fix ensures that PD receives correct pending_peers info, which should prevent the immature merge and avoiding the potential panic.

[1] components/raftstore/src/store/fsm/peer.rs#L4686
[2] schedule/checker/merge_checker.go#L219
[3] schedule/filter/healthy.go#L24

What's Changed:

Fixes an issue where TiKV incorrectly reported missing `pending_peers` after a 
split, causing PD to receive an incorrect region state. This could lead to 
premature merge attempts, violating the guarantee that all target peers must 
exist before merging.

Note: This issue is unlikely to occur in production since PD enforces a split-merge-interval (default: 1h) between split and merge operations. The issue only happens if a region is merged immediately after a split.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

None

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 19, 2024
@hbisheng hbisheng marked this pull request as draft December 19, 2024 05:35
@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 19, 2024
@hbisheng hbisheng changed the title raftstore: add test to reproduce panic caused by merge-after-split raftstore: fix panic from snapshot racing with split in merge-after-split scenario Dec 19, 2024
@hbisheng hbisheng changed the title raftstore: fix panic from snapshot racing with split in merge-after-split scenario raftstore: fix panic caused by snapshot racing with split in merge-after-split scenario Dec 19, 2024
@hbisheng hbisheng changed the title raftstore: fix panic caused by snapshot racing with split in merge-after-split scenario raftstore: fix panic caused by snapshot racing in merge-after-split Dec 19, 2024
Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 26, 2024
@hbisheng
Copy link
Member Author

hbisheng commented Feb 6, 2025

Possible fixes:

  1. Enhance pending_create_peer (introduced in raftstore: handle the race between creating new peer and splitting correctly #8084)
    Using the info in pending_create_peer, we can block the snapshot if a conflicting split is about to happen, or block the split if a conflicting snapshot passes the snapshot check.
  2. Leader ensures all merge target peers exist before proposing a merge:
    This might be achieved using an ExtraMessage mechanism. It would allow us to rely on the assumption in raftstore: rely on the all-target-peer-exist guarantee during merging #7672 once again.

I gave up on option 1 because blocking a split after it's committed is non-intuitive and prone to more race conditions. I'm now looking into option 2 to prevent the merge from happening in the first place.

@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 13, 2025
Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 13, 2025
@hbisheng hbisheng marked this pull request as ready for review February 13, 2025 08:47
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 13, 2025
@hbisheng hbisheng changed the title raftstore: fix panic caused by snapshot racing in merge-after-split raftstore: fix missing pending peers in region heartbeat after split Feb 13, 2025
}

if status.progress.is_none() {
// If `progress` is None, the current node is not the leader. Since
Copy link
Contributor

@LykxSassinator LykxSassinator Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: it's better to add extra annotations here to specify that the calling of this part will only be executed by a new split peer with already campaigned, who will most likely be elected as the new leader of the region.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comment as suggested

}

if status.progress.is_none() {
// If `progress` is None, the current node is not the leader.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding an assert here

Copy link
Member Author

@hbisheng hbisheng Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meant asserting that the peer is not a leader? That should always be the case according to the logic in raft-rs. So yeah I can add an assert here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might seem a bit unnecessary though because status() was just called a few lines before (on line 2102)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the assertion as suggested

Copy link
Member

@overvenus overvenus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also reject PrepareMerge when there is a pending peer to further strengthen the region merge process?

@hbisheng
Copy link
Member Author

Should we also reject PrepareMerge when there is a pending peer to further strengthen the region merge process?

I hoped to do that too but it seems more complex than expected. The PrepareMerge proposal is sent to the merge source, but we actually need to check the pending peer on the merge target.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
Signed-off-by: Bisheng Huang <hbisheng@gmail.com>
@hbisheng
Copy link
Member Author

@Connor1996 @overvenus @LykxSassinator PTAL again, thanks!

Copy link
Contributor

@LykxSassinator LykxSassinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved and removed do-not-merge/needs-triage-completed labels Mar 6, 2025
Copy link
Member

@Connor1996 Connor1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added the lgtm label Mar 7, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Connor1996, LykxSassinator

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [Connor1996,LykxSassinator]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 7, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 7, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-03-06 11:03:28.450698702 +0000 UTC m=+525921.579618438: ☑️ agreed by LykxSassinator.
  • 2025-03-07 07:45:00.245707534 +0000 UTC m=+600413.374627272: ☑️ agreed by Connor1996.

@ti-chi-bot ti-chi-bot bot merged commit dbae009 into tikv:master Mar 7, 2025
8 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Mar 7, 2025
@hbisheng hbisheng deleted the fix-split-merge branch July 14, 2025 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TiKV panics: [region 15822] 15824 status Some((15824, true)) is not expected

4 participants