[backport 3.4] limbo: do not limit its size on replicas #12027

Merged
Gerold103 merged 1 commit into release/3.4 from backport/release/3.4/12017 on Nov 12, 2025

TarantoolBot commented Nov 11, 2025

(This PR is a backport of #12017 to release/3.4 to a future 3.4.2 release.)


There was a possible deadlock when a replica had box.cfg.replication_synchro_queue_max_size smaller than the master's.

The scenario was that the replica would receive some transactions, and they would all enter the limbo and wait for CONFIRM in the "submitted" state.

But the master sends more transactions instead of CONFIRM. Those transactions block the applier fiber in txn_commit_submit(), because the fiber can't exceed the limbo max size and is waiting for free space.

The free space, however, will never appear, because those "submitted" transactions aren't going anywhere until the CONFIRM is received. That in turn will never happen, because the applier fiber is blocked waiting for limbo space.

The only way out is to let the replica apply these transactions, bypassing the limbo max size limit. It makes no sense to block them, because otherwise their CONFIRM can't be received.
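The scenario above can be sketched with a toy Python model. This is not Tarantool code: `apply_stream`, `enforce_limit`, and the message tuples are hypothetical names made up for illustration, the exact "full" condition is an assumption, and CONFIRM is simplified to release everything queued so far. The key property it models is that the applier is the only reader of the master's stream, so once it waits for space, the CONFIRM behind the blocking transaction is never read.

```python
def apply_stream(stream, max_size, enforce_limit):
    """Toy model of a replica's applier feeding the limbo.

    stream: messages in the order the master sent them, each either
            ("txn", size) or ("confirm",).
    Returns (status, messages_applied), where status is "drained"
    or "deadlock".
    """
    limbo = []  # sizes of "submitted" transactions waiting for CONFIRM
    used = 0    # total size currently held by the limbo
    for applied, msg in enumerate(stream):
        if msg[0] == "txn":
            size = msg[1]
            # Assumption: the limbo counts as full when the new
            # transaction would push it past max_size.
            if enforce_limit and used + size > max_size:
                # The applier would wait for free space here. Nothing
                # else reads the stream, so the CONFIRM that would free
                # the space is never reached.
                return "deadlock", applied
            limbo.append(size)
            used += size
        else:
            # Simplification: CONFIRM commits everything queued so far.
            limbo.clear()
            used = 0
    return "drained", len(stream)


# The master has a larger limit, so it sends three transactions
# before their CONFIRM.
stream = [("txn", 4), ("txn", 4), ("txn", 4), ("confirm",)]
print(apply_stream(stream, max_size=8, enforce_limit=True))   # ('deadlock', 2)
print(apply_stream(stream, max_size=8, enforce_limit=False))  # ('drained', 4)
```

With the limit enforced, the model stops at the third transaction and never reaches CONFIRM. With the limit bypassed, as this patch does on replicas, the whole stream is applied.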

This probably worked until commit 20aad15 ("limbo: handle spurious wakeups on space waiting") (leaving aside that before that commit it was broken in many other ways), but it seems it wasn't covered by the tests.

Closes #11836

NO_DOC=bugfix

(cherry picked from commit cc77a6e)
TarantoolBot requested a review from a team as a code owner November 11, 2025 21:57
TarantoolBot changed the title [Backport release/3.4] limbo: do not limit its size on replicas [backport 3.4] limbo: do not limit its size on replicas Nov 11, 2025
@coveralls

Coverage Status

coverage: 87.542% (-0.04%) from 87.579% when pulling a40a0c3 on backport/release/3.4/12017 into ff5a300 on release/3.4.

Gerold103 removed the request for review from a team November 11, 2025 22:58
Gerold103 merged commit bd9df2b into release/3.4 Nov 12, 2025
25 checks passed
Gerold103 deleted the backport/release/3.4/12017 branch November 12, 2025 00:11
3 participants