
raft: fix election deadlock when nodes have election_mode off#11981

Merged
sergepetrenko merged 1 commit into tarantool:master from
philippeboyd:bugfix/raft-disabled-nodes-should-not-report-leader-seen
Nov 7, 2025

Conversation

@philippeboyd
Contributor

@philippeboyd philippeboyd commented Oct 24, 2025

Closes #12018

When instances with election_mode=off exist in a replicaset, they continue to broadcast is_leader_seen=true even after the leader dies (their death-detection timers never start, since Raft is disabled for them). This causes the leader_witness_map bits for these hosts to remain set indefinitely on candidate nodes, blocking elections, since the pre-vote protection check requires leader_witness_map == 0.

The root cause is that election_mode=off nodes cannot be distinguished from active voters in Raft messages. Both report the follower state with is_leader_seen based on local state, but election_mode=off nodes never update their view, since heartbeat processing exits early when Raft is disabled.

This fix forces nodes with election_mode=off to always broadcast is_leader_seen=false. This allows candidate nodes to immediately clear witness map bits for non-participating nodes, enabling elections to proceed with only active participants.
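The behavior change can be sketched as a tiny model (this is not the actual Tarantool C code; the `Node` class and method names here are hypothetical illustrations of the idea):

```python
# Hypothetical model of the fix: a node with election_mode='off' must
# broadcast is_leader_seen=False regardless of its stale local view.

class Node:
    def __init__(self, election_mode, leader_seen_locally):
        self.election_mode = election_mode
        # Potentially stale flag: off-nodes never run the leader
        # death-detection timer, so this can stay True forever
        # after the leader actually dies.
        self.leader_seen_locally = leader_seen_locally

    def broadcast_is_leader_seen(self):
        # The fix: Raft-disabled nodes never claim to see a leader.
        if self.election_mode == "off":
            return False
        return self.leader_seen_locally

# An off-node whose local view is stale after the leader died:
stale_off_node = Node("off", leader_seen_locally=True)
print(stale_off_node.broadcast_is_leader_seen())  # False after the fix
```

With this, candidates clear the corresponding witness-map bit as soon as they receive a message from an off-node, instead of waiting forever.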

Is this the right approach or have I missed anything?

@philippeboyd philippeboyd force-pushed the bugfix/raft-disabled-nodes-should-not-report-leader-seen branch from 2a96245 to 6843f0e on October 24, 2025 17:01
@philippeboyd philippeboyd requested a review from a team as a code owner October 24, 2025 17:01
@sergepetrenko sergepetrenko requested review from Gerold103, Serpentian and sergepetrenko and removed request for Gerold103 October 28, 2025 06:18
@coveralls

coveralls commented Oct 28, 2025

Coverage Status

coverage: 87.678% (+0.03%) from 87.649% when pulling a9e7820 on philippeboyd:bugfix/raft-disabled-nodes-should-not-report-leader-seen into ec05cb1 on tarantool:master.

@philippeboyd philippeboyd force-pushed the bugfix/raft-disabled-nodes-should-not-report-leader-seen branch 2 times, most recently from 943747e to eaa4dbd, on October 31, 2025 19:38
Collaborator

@sergepetrenko sergepetrenko left a comment

Hi, Philippe!

Sorry for the long delay in review and thank you for your patch!

Your approach looks good to me, I have only a couple of comments regarding the changelog wording and commit style.

It's good that you've found and fixed this issue. Could you tell me how you stumbled upon it?

Contributor

@Serpentian Serpentian left a comment

Thank you for finding and fixing such a critical bug! This could have caused a cluster downtime if it had been found in production. The solution is nice and elegant, I have no significant comments regarding it

@philippeboyd
Contributor Author

Hi @sergepetrenko thanks for reviewing. To answer your question:

Could you tell me how you stumbled upon it?

We were testing a setup with one replicaset spread in two datacenters with one datacenter being active and the other being passive.

We used synchro_quorum: 2 to keep writes fast in the active datacenter while still replicating data to the passive datacenter, protecting us from a split-brain situation.

Having a storage replicaset with instances:

dc1-storage-1 (raft candidate)
dc1-storage-2 (raft candidate)
dc1-storage-3 (raft candidate)
dc2-storage-1 (raft off)
dc2-storage-2 (raft off)
dc2-storage-3 (raft off)

Say dc1-storage-1 was the leader and it died: no election was triggered.
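The deadlock in this topology can be sketched from a candidate's point of view (a hypothetical simplification, not Tarantool's actual pre-vote code: here the witness map is just the set of peers whose last broadcast said they still see a leader):

```python
# Hypothetical sketch of the pre-vote deadlock: a candidate may start
# elections only when no peer still reports is_leader_seen=True,
# i.e. when leader_witness_map is empty.

def may_start_election(is_leader_seen_reports):
    """is_leader_seen_reports: node name -> last broadcast flag."""
    witness_map = {n for n, seen in is_leader_seen_reports.items() if seen}
    return len(witness_map) == 0

# After dc1-storage-1 (the leader) dies, without the fix the
# election_mode='off' nodes keep broadcasting their stale True flag,
# so the surviving candidates never start an election:
reports = {
    "dc1-storage-2": False,  # candidate, noticed the leader's death
    "dc1-storage-3": False,  # candidate, noticed the leader's death
    "dc2-storage-1": True,   # election_mode=off, stale view
    "dc2-storage-2": True,   # election_mode=off, stale view
    "dc2-storage-3": True,   # election_mode=off, stale view
}
print(may_start_election(reports))  # False -> deadlock
```

Once the off-nodes are forced to broadcast False, the witness map drains and the pre-vote check passes.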

@philippeboyd philippeboyd force-pushed the bugfix/raft-disabled-nodes-should-not-report-leader-seen branch from eaa4dbd to 60ae034 on November 6, 2025 13:56
@philippeboyd philippeboyd changed the title from "fix(raft): fix election deadlock when nodes have election_mode off" to "raft: fix election deadlock when nodes have election_mode off" Nov 6, 2025
Collaborator

@sergepetrenko sergepetrenko left a comment

Philippe, thanks for the fixes!
LGTM.

@sergepetrenko sergepetrenko added backport/3.2 Automatically create a 3.2 backport PR backport/3.3 Automatically create a 3.3 backport PR backport/3.4 Automatically create a 3.4 backport PR backport/3.5 Automatically create a 3.5 backport PR labels Nov 7, 2025
@sergepetrenko
Collaborator

sergepetrenko commented Nov 7, 2025

@philippeboyd, thanks for the answer, got it.

Just be aware that a replication conflict might still happen with such a setup (although it's rather unlikely).

With a synchro_quorum: 2 to keep the writes fast in the active datacenter while still having data replication in the passive datacenter and protecting us from a split-brain situation.

Having a storage replicaset with instances:

dc1-storage-1 (raft candidate)
dc1-storage-2 (raft candidate)
dc1-storage-3 (raft candidate)
dc2-storage-1 (raft off)
dc2-storage-2 (raft off)
dc2-storage-3 (raft off)

While you won't get 2 leaders in the same term (obviously only 3 nodes participate in elections, and 2 votes out of 3 give you a single leader), it's possible that the elected leader won't have all the committed transactions of the previous leader, because the nodes with election_mode = 'off' are still counted in the quorum for synchronous transaction commits.

So, imagine dc1-storage-1 is the leader: it writes some transaction A and replicates it only to dc2-storage-1. The leader commits A, as it has gathered a quorum. Then dc1-storage-1 dies before replicating A to anyone else, and elections are triggered. Neither dc1-storage-2 nor dc1-storage-3 has the transaction, but one of them will be elected the next leader, which will cause a replication conflict once the already-committed transaction A reaches it (in Raft, only a node having all the previously committed transactions may be elected leader).

That's unlikely, because all the candidates are in the same datacenter, so replication between them should be much faster than to the nodes of the other DC. But still possible.
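The scenario above can be checked with a small worked example (a hypothetical sketch of the quorum arithmetic, not Tarantool's commit logic): with synchro_quorum = 2, the leader plus one election_mode='off' node satisfies the quorum, so a committed transaction can be absent from every surviving candidate.

```python
# Hypothetical sketch: with synchro_quorum=2, a transaction is committed
# once any 2 nodes (the leader included) have it, even if the second
# copy lives only on a dc2 node with election_mode='off'.

SYNCHRO_QUORUM = 2
CANDIDATES = {"dc1-storage-1", "dc1-storage-2", "dc1-storage-3"}

def is_committed(replicated_to):
    # replicated_to is the set of nodes holding the transaction,
    # including the leader itself.
    return len(replicated_to) >= SYNCHRO_QUORUM

# Leader dc1-storage-1 writes transaction A and replicates it only to
# dc2-storage-1 before dying:
has_txn_a = {"dc1-storage-1", "dc2-storage-1"}
print(is_committed(has_txn_a))  # True: A is committed

# Yet no surviving candidate holds A, so the next leader is elected
# without it -- a replication conflict once A reaches it later:
surviving_candidates = CANDIDATES - {"dc1-storage-1"}
print(surviving_candidates & has_txn_a)  # set(): no candidate has A
```

This is exactly why the conflict is possible despite single-leader-per-term safety: election quorum and synchro quorum are drawn from different node sets here.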

Force nodes with `is_enabled=false` to always broadcast
`is_leader_seen=false`. This allows candidate nodes to immediately clear
witness map bits for non-participating nodes, enabling elections to
proceed with only active participants.

Closes tarantool#12018

NO_DOC=bugfix
@sergepetrenko sergepetrenko force-pushed the bugfix/raft-disabled-nodes-should-not-report-leader-seen branch from 60ae034 to a9e7820 on November 7, 2025 11:48
@sergepetrenko sergepetrenko added full-ci Enables all tests for a pull request and removed full-ci Enables all tests for a pull request labels Nov 7, 2025
@sergepetrenko sergepetrenko merged commit 214b54c into tarantool:master Nov 7, 2025
59 checks passed
@TarantoolBot
Collaborator

Successfully created backport PR for release/3.2:

@TarantoolBot
Collaborator

Successfully created backport PR for release/3.3:

@TarantoolBot
Collaborator

Successfully created backport PR for release/3.4:

@TarantoolBot
Collaborator

Successfully created backport PR for release/3.5:

@TarantoolBot
Collaborator

Backport summary

@philippeboyd philippeboyd deleted the bugfix/raft-disabled-nodes-should-not-report-leader-seen branch November 24, 2025 21:31


Development

Successfully merging this pull request may close these issues:

Leader elections never start after a leader is lost if there is a member with election_mode = 'off' in the replica set

7 participants