raft: fix election deadlock when nodes have election_mode off #11981
sergepetrenko
left a comment
Hi, Philippe!
Sorry for the long delay in review and thank you for your patch!
Your approach looks good to me, I have only a couple of comments regarding the changelog wording and commit style.
It's good that you've found and fixed this issue. Could you tell me how you stumbled upon it?
Serpentian
left a comment
Thank you for finding and fixing such a critical bug! This could have caused cluster downtime if it had been hit in production. The solution is nice and elegant, and I have no significant comments regarding it.
Hi @sergepetrenko thanks for reviewing. To answer your question:
We were testing a setup with one replicaset spread in two datacenters with one datacenter being active and the other being passive. With a Having a replicaset Say |
sergepetrenko
left a comment
Philippe, thanks for the fixes!
LGTM.
@philippeboyd, thanks for the answer, got it. Just be aware that a replication conflict might still happen with such a setup (although it's rather unlikely).
While you won't get 2 leaders in the same term (obviously only 3 nodes participate in elections, 2 votes out of 3 give you a single leader), it's possible that the elected leader won't have all the committed transactions of the previous leader, because the nodes with So, imagine That's unlikely, because all the candidates are in the same datacenter, so replication between them should be much faster than to the nodes of the other DC. But still possible.
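The "2 votes out of 3 give you a single leader" point can be checked with a small brute-force sketch. This is a toy model, not Tarantool code: the function names are hypothetical, and it assumes only that each voter casts at most one vote per term and that a candidate needs a strict majority of the voters.

```python
from itertools import product


def majority(voters: int) -> int:
    """Smallest number of votes that is a strict majority of `voters`."""
    return voters // 2 + 1


def max_leaders_per_term(voters: int, candidates: int) -> int:
    """Largest number of candidates that can reach a majority in one term.

    Hypothetical helper: enumerates every ballot in which each voter
    either abstains (None) or votes for exactly one candidate.
    """
    quorum = majority(voters)
    best = 0
    for ballot in product([None, *range(candidates)], repeat=voters):
        winners = sum(1 for c in range(candidates) if ballot.count(c) >= quorum)
        best = max(best, winners)
    return best
```

With 3 voters the quorum is 2, and no ballot lets two candidates reach it at once, which is why the term has at most one leader. The sketch says nothing about which transactions that leader holds, which is the separate concern raised in the comment.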
Forcing nodes with `is_enabled=false` to always broadcast `is_leader_seen=false`. This allows candidate nodes to immediately clear witness map bits for non-participating nodes, enabling elections to proceed with only active participants.
Closes tarantool#12018
NO_DOC=bugfix
Successfully created backport PR for |
Closes #12018
When instances with `election_mode=off` exist in a replicaset, they continue to broadcast `is_leader_seen=true` even after the leader dies (their death detection timers never start since RAFT is disabled for them). This causes the `leader_witness_map` bits for these hosts to remain set indefinitely on candidate nodes, blocking elections since the pre-vote protection check requires `leader_witness_map==0`.

The root cause is that `election_mode=off` nodes cannot be distinguished from active voters in RAFT messages. Both report state `follower` with `is_leader_seen` based on local state, but `election_mode=off` nodes never update their view since heartbeat processing exits early when raft is disabled.

This fix forces nodes with `election_mode=off` to always broadcast `is_leader_seen=false`. This allows candidate nodes to immediately clear witness map bits for non-participating nodes, enabling elections to proceed with only active participants.

Is this the right approach or have I missed anything?
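To make the deadlock concrete, here is a minimal Python sketch of the mechanism described above. It is a toy model under stated assumptions, not the actual Tarantool implementation: the `Peer` class, the `fixed` flag, and all function names are hypothetical, and only `is_leader_seen`, `leader_witness_map`, and `election_mode=off` come from the description.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    node_id: int
    election_enabled: bool  # False models election_mode=off
    leader_seen: bool       # the peer's local (possibly stale) view


def broadcast_is_leader_seen(peer: Peer, fixed: bool) -> bool:
    """Value of is_leader_seen the peer sends to the candidates."""
    if fixed and not peer.election_enabled:
        # Behaviour after the fix: non-participants always report False.
        return False
    return peer.leader_seen


def leader_witness_map(peers: list[Peer], fixed: bool) -> int:
    """Bitmap kept by a candidate: bit i is set if peer i claims to see a leader."""
    bits = 0
    for peer in peers:
        if broadcast_is_leader_seen(peer, fixed):
            bits |= 1 << peer.node_id
    return bits


def can_start_election(peers: list[Peer], fixed: bool) -> bool:
    """Pre-vote check: an election may start only when no peer sees a leader."""
    return leader_witness_map(peers, fixed) == 0


# State after the leader dies: voters 1 and 2 noticed it, while peers 3 and 4
# (election_mode=off) never update their view and still report a leader.
PEERS_AFTER_LEADER_DEATH = [
    Peer(1, election_enabled=True, leader_seen=False),
    Peer(2, election_enabled=True, leader_seen=False),
    Peer(3, election_enabled=False, leader_seen=True),
    Peer(4, election_enabled=False, leader_seen=True),
]
```

Calling `can_start_election(PEERS_AFTER_LEADER_DEATH, fixed=False)` models the deadlock, because bits 3 and 4 stay set forever. With `fixed=True` the map is zero and the candidates can proceed.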