box: forbid concurrent invocation of box_raft_try_promote#11882
box: forbid concurrent invocation of box_raft_try_promote#11882sergepetrenko merged 1 commit intotarantool:masterfrom
box_raft_try_promote#11882Conversation
9157d89 to
37dbb0f
Compare
box_raft_try_promotebox_raft_try_promote
37dbb0f to
a807569
Compare
Gerold103
left a comment
There was a problem hiding this comment.
Thanks for the patch! Hm, tricky bug.
test/replication-luatest/gh_11708_manual_leader_election_concurrent_promote_crashes_test.lua
Show resolved
Hide resolved
changelogs/unreleased/manual-leader-election-concurrent-promote-crashes.md
Outdated
Show resolved
Hide resolved
b89bdf3 to
6727c59
Compare
6727c59 to
9519932
Compare
I'd like to note, that we should somehow cover that place, so that we don't introduce new triggers tomorrow, that'll clear the diag. Assertions in the |
9519932 to
334118e
Compare
Serpentian
left a comment
There was a problem hiding this comment.
Thank you for the patch! Looks like we're safe here now
sergepetrenko
left a comment
There was a problem hiding this comment.
Thanks for the patch!
Only one question from me.
Currently, we allow concurrent invocation of `box_raft_try_promote`, since
we either disable the `is_in_promote` guard in `box_promote` or omit it in
`box_cfg_xc`, while `box_raft_try_promote` yields to write the raft state.
This leads to a race over the `diag` of the fiber executing the
`box_raft_try_promote_f` trigger, and can lead to more bugs.
Let's forbid concurrent execution of `box_raft_try_promote` by enabling the
`is_in_promote` guard while `box_raft_try_promote` is called.
One potential caveat is the concurrent execution of `box_promote_qsync`
which is also guarded by the `is_in_promote`. However, it is executed in
the raft worker fiber and can be retried until `box_raft_try_promote`
execution finishes.
To be on the safer side, let's:
1. Call `raft_restore` before setting the fiber's `diag` to prevent any
potential tampering with it from `raft->on_update` triggers.
2. Set `is_box_configured` right before calling `box_raft_try_promote` to
maintain the invariant that it is always called by `box.cfg` first. Also
move `box_broadcast_ballot` for consistency.
The `raft_leader_promote` test from the tarantoolgh-6033 test group should now test
that a concurrent promote fails rather than succeeds.
Closes tarantool#11703
Closes tarantool#11708
NO_DOC=<bugfix>
334118e to
6f3eefa
Compare
|
Successfully created backport PR for |
|
Successfully created backport PR for |
|
Successfully created backport PR for |
|
Successfully created backport PR for |
Backport summary
|
This patch fixes two bugs related to concurrent invocation of
box_raft_try_promote.Closes #11703
Closes #11708