Skip to content

raftstore: optimize hibernate condition#19082

Merged
ti-chi-bot[bot] merged 7 commits intotikv:masterfrom
jiadebin:fix-hib
Dec 7, 2025
Merged

raftstore: optimize hibernate condition#19082
ti-chi-bot[bot] merged 7 commits intotikv:masterfrom
jiadebin:fix-hib

Conversation

@jiadebin
Copy link
Contributor

@jiadebin jiadebin commented Nov 1, 2025

What is changed and how it works?

Issue Number: Close #19070

What's Changed:

1. Hibernate condition optimization: leader no need wait for all peers' hibernate votes if any non-voter peer is down and votes reach majority.  
2. If non hibernate vote peers are all unreachable, leader can skip broadcasting hibernate request. This can reduce the number of messages before down peer is detected (10min by default).  

As noted in #19070, when a TiKV peer becomes unavailable, region leaders on other nodes are prevented from entering hibernation due to the current condition logic—which requires the leader to receive votes from all peers before hibernating. This requirement is overly strict. In practice, the leader only needs to confirm that it has received votes from all alive peers(referencing the down_peers field) and that a majority has been reached for raft voters.

Before Modification
image

After Modification
image

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

Hibernate condition optimization: leader no need wait for all peers' hibernate votes if any non-voter peer is down and votes reach majority.    

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. contribution This PR is from a community contributor. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels Nov 1, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Nov 1, 2025

Hi @jiadebin. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot added needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. dco-signoff: no Indicates the PR's author has not signed dco. labels Nov 1, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Nov 1, 2025

Welcome @jiadebin!

It looks like this is your first PR to tikv/tikv 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to tikv/tikv. 😃

@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 1, 2025
@jiadebin jiadebin changed the title [WIP] kv: fix hibenate condition #19070 [WIP] kv: fix hibenate condition Nov 1, 2025
@jiadebin jiadebin marked this pull request as draft November 3, 2025 03:12
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. and removed dco-signoff: no Indicates the PR's author has not signed dco. labels Nov 3, 2025
@jiadebin jiadebin force-pushed the fix-hib branch 2 times, most recently from ddd61bc to 080e0b2 Compare November 4, 2025 08:20
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 4, 2025
@jiadebin jiadebin force-pushed the fix-hib branch 2 times, most recently from f0cc9af to 2e8188b Compare November 6, 2025 08:43
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 6, 2025
@jiadebin jiadebin force-pushed the fix-hib branch 2 times, most recently from f0511ba to 4f9b1af Compare November 6, 2025 08:53
@jiadebin jiadebin changed the title [WIP] kv: fix hibenate condition [WIP] raftstore: optimize hibenate condition Nov 6, 2025
@jiadebin jiadebin marked this pull request as ready for review November 6, 2025 08:58
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 6, 2025
@jiadebin jiadebin changed the title [WIP] raftstore: optimize hibenate condition raftstore: optimize hibenate condition Nov 6, 2025
pub fn maybe_hibernate(&mut self) -> bool {
self.hibernate_state
.maybe_hibernate(self.peer.peer_id(), self.peer.region())
pub fn maybe_hibernate(&mut self, down_peer_ids: &Vec<u64>) -> (bool, Vec<u64>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comments about when it should hibernate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// broadcasting hibernate request. Because non hibernate vote peers may
// be down, and they have not been detected as down peers. Down peers
// are expected to encounter heartbeat timeout and be in probe state.
let heartbeat_timeout_duration = self.ctx.cfg.raft_heartbeat_interval() * 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "3" mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code comment added: Using 3 times of raft_heartbeat_interval as heartbeat_timeout_duration to determine whether a peer is unreachable.

cluster.must_put(b"k2", b"v2");
}

/// Tests whether leader can hibernate when some voter is down.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Tests whether leader can hibernate when some voter is down.
/// Tests whether leader can hibernate when some voters are down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

thread::sleep(cluster.cfg.raft_store.raft_heartbeat_interval() * 2);
// Leader can go to sleep as voters can reach majority.
assert!(!awakened.load(Ordering::SeqCst));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before starting the down voter, write some data and then make sure alive peers can hibernate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}
if !self.raft_group.raft.prs().has_quorum(&matched_peer_ids) {
res.reason = "not enough matched peers";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this covered by tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet before, I've add a new integration test function test_hibernate_not_enough_matched_peers() to test it.

@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 3, 2025
Signed-off-by: Debin <debin.jia@pingcap.com>

fn all_agree_to_hibernate(&mut self) -> bool {
if self.fsm.maybe_hibernate() {
fn agree_to_hibernate(&mut self, down_peer_ids: &[u64]) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: quorum_agree_to_hibernate might be a better name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

where
F: Fn(&HashSet<u64>) -> bool,
{
let mut hibernate_vote_peer_ids = HashSet::default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do learners participate in the hibernation "vote" as well? The term vote can be a bit confusing, since Raft learners are not voters. I know the existing code uses this terminology, just calling out the potential ambiguity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, learners also participate. So I use hibernate_vote_xxx to distinguish it from "election vote".

/// Check if all non hibernate vote peers are unreachable.
/// If one peer is unreachable, it must be in probe state and encounter
/// heartbeat timeout.
pub fn all_non_hibernate_vote_peers_unreachable(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming suggestion: all_nonvote_peers_unreachable

I feel like it's not obvious what "non" refers to in the original name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I also want to use hibernate_vote_xxx to distinguish it from "election vote".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, if we keep the function name maybe we can improve the comment to "Checks if all peers that have not sent a hibernate vote are unreachable."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good advice.

{
return false;
}
// If non hibernate vote peers are all unreachable, leader can skip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to learn: what happens if the leader has already hibernated and an unreachable follower later comes back online? Will that follower resume ticking and potentially trigger an election?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I've read the relevant code, and here's my understanding:

  • If an unreachable follower recovers from hibernated state, the missing_ticks mechanism can prevent it from blindly initiating an election. Instead, the leader will be woken up by this follower and resume broadcasting heartbeats, eventually bringing the entire cluster back to the hibernated idle state.
  • If an unreachable follower restarts and recovers, and the leader has not been woken up before the follower reaches its election timeout, the follower will attempt to start an election. However, during the pre-vote phase, it can wake up the leader, discover that a valid leader still exists, and consequently abort the election.
  • read_index() request on follower can also wake up leader.

@hbisheng
Copy link
Member

hbisheng commented Dec 4, 2025

/ok-to-test

@ti-chi-bot ti-chi-bot bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Dec 4, 2025
Signed-off-by: Debin <debin.jia@pingcap.com>
@jiadebin
Copy link
Contributor Author

jiadebin commented Dec 5, 2025

/ok-to-test

Signed-off-by: Debin <debin.jia@pingcap.com>
@jiadebin
Copy link
Contributor Author

jiadebin commented Dec 5, 2025

/ok-to-test

1 similar comment
@jiadebin
Copy link
Contributor Author

jiadebin commented Dec 5, 2025

/ok-to-test

@jiadebin
Copy link
Contributor Author

jiadebin commented Dec 5, 2025

/retest-required

Signed-off-by: Debin <debin.jia@pingcap.com>
@jiadebin
Copy link
Contributor Author

jiadebin commented Dec 5, 2025

/ok-to-test

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Dec 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Connor1996, overvenus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [Connor1996,overvenus]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Dec 7, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Dec 7, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-12-01 07:16:38.591660198 +0000 UTC m=+247743.405437770: ☑️ agreed by Connor1996.
  • 2025-12-07 07:14:20.295023977 +0000 UTC m=+766005.108801549: ☑️ agreed by overvenus.

@ti-chi-bot ti-chi-bot bot merged commit 666b9ea into tikv:master Dec 7, 2025
7 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Dec 7, 2025
@jiadebin
Copy link
Contributor Author

jiadebin commented Dec 8, 2025

/cherrypick release-8.5

ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this pull request Dec 8, 2025
close tikv#19070

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

@jiadebin: new pull request created to branch release-8.5: #19183.
But this PR has conflicts, please resolve them!

Details

In response to this:

/cherrypick release-8.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot bot pushed a commit that referenced this pull request Dec 12, 2025
close #19070

1. Hibernate condition optimization: leader no need wait for all peers' hibernate votes if any non-voter peer is down and votes reach majority.  
2. If non hibernate vote peers are all unreachable, leader can skip broadcasting hibernate request. This can reduce the number of messages before down peer is detected (10min by default).

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: Debin <debin.jia@pingcap.com>

Co-authored-by: Debin <debin.jia@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. lgtm ok-to-test Indicates a PR is ready to be tested. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Raftstore CPU Exhaustion: leaders prevented from hibernating after down one tikv instance

6 participants