Skip to content

[backport 3.3] box: forbid concurrent invocation of box_raft_try_promote#12044

Merged
sergepetrenko merged 1 commit intorelease/3.3from
backport/release/3.3/11882
Nov 24, 2025
Merged

[backport 3.3] box: forbid concurrent invocation of box_raft_try_promote#12044
sergepetrenko merged 1 commit intorelease/3.3from
backport/release/3.3/11882

Conversation

@TarantoolBot
Copy link
Collaborator

@TarantoolBot TarantoolBot commented Nov 20, 2025

(This PR is a backport of #11882 to release/3.3 to a future 3.3.4 release.)


This patch fixes two bugs related to concurrent invocation of box_raft_try_promote.

Closes #11703
Closes #11708

@TarantoolBot TarantoolBot requested a review from a team as a code owner November 20, 2025 07:46
@TarantoolBot TarantoolBot changed the title [Backport release/3.3] box: forbid concurrent invocation of box_raft_try_promote [backport 3.3] box: forbid concurrent invocation of box_raft_try_promote Nov 20, 2025
@sergepetrenko
Copy link
Collaborator

@CuriousGeorgiy could you please rewrite the test so that it doesn't rely on make_bootstrap_leader{graceful} here and in #12043 ?

Copy link

@AArdeev AArdeev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good

@CuriousGeorgiy CuriousGeorgiy force-pushed the backport/release/3.3/11882 branch from f4add23 to fb4f71c Compare November 23, 2025 17:15
Currently, we allow concurrent invocation of `box_raft_try_promote`, since
we either disable the `is_in_promote` guard in `box_promote` or omit it in
`box_cfg_xc`, while `box_raft_try_promote` yields to write the raft state.

This leads to a race over the `diag` of the fiber executing the
`box_raft_try_promote_f` trigger, and can lead to more bugs.

Let's forbid concurrent execution of `box_raft_try_promote` by enabling the
`is_in_promote` guard while `box_raft_try_promote` is called.

One potential caveat is the concurrent execution of `box_promote_qsync`
which is also guarded by the `is_in_promote`. However, it is executed in
the raft worker fiber and can be retried until `box_raft_try_promote`
execution finishes.

To be on the safer side, let's:
1. Call `raft_restore` before setting the fiber's `diag` to prevent any
    potential tampering with it from `raft->on_update` triggers.
2. Set `is_box_configured` right before calling `box_raft_try_promote` to
   maintain the invariant that it is always called by `box.cfg` first. Also
   move `box_broadcast_ballot` for consistency.

The `raft_leader_promote` test from the gh-6033 test group should now test
that a concurrent promote fails rather than succeeds.

Closes #11703
Closes #11708

NO_DOC=<bugfix>

(cherry picked from commit b592845)
@CuriousGeorgiy CuriousGeorgiy force-pushed the backport/release/3.3/11882 branch from fb4f71c to 1b576cc Compare November 23, 2025 17:38
@CuriousGeorgiy
Copy link
Member

@sergepetrenko replaced it with bootstrap_mode = 'config' here and in #12043.

@CuriousGeorgiy CuriousGeorgiy removed their assignment Nov 23, 2025
@coveralls
Copy link

Coverage Status

coverage: 87.56% (-0.01%) from 87.572%
when pulling 1b576cc on backport/release/3.3/11882
into 983f792
on release/3.3
.

@sergepetrenko sergepetrenko merged commit 036c8c1 into release/3.3 Nov 24, 2025
25 checks passed
@sergepetrenko sergepetrenko deleted the backport/release/3.3/11882 branch November 24, 2025 10:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants