
limbo: fix a crash in linearization point waiting #11845

Merged
Gerold103 merged 1 commit into master from gerold103/gh-11807-linearization-point-crash on Oct 1, 2025


@Gerold103
Collaborator

The DB state synchronization for a linearizable transaction waits to receive all potentially confirmed synchro txns from a remote master, and then waits for their confirmation locally. This guarantees that if any transaction was committed on the master before this point, it is now visible on the current replica too.

The wait for the synchro txns' confirmation assumed that if the limbo isn't empty, then it definitely contains a synchro txn.

But that is not always so. Sometimes the limbo contains a volatile async txn that isn't written to WAL yet. Or it might even contain dummy entries created by the limbo flush operation (for a snapshot, for a new replica join). The linearization sync must not care about these entries and should treat the limbo as if it were empty.
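A minimal sketch of the distinction, in case it helps reviewers. The types and function names below are hypothetical, simplified stand-ins and are not the actual txn_limbo structures or API; the real fix may look different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry kinds; NOT Tarantool's real limbo entries. */
enum entry_kind {
	ENTRY_SYNC,           /* synchro txn awaiting confirmation */
	ENTRY_ASYNC_VOLATILE, /* async txn not yet written to WAL */
	ENTRY_FLUSH_DUMMY,    /* placeholder left by a flush operation */
};

struct entry {
	enum entry_kind kind;
};

/* Naive emptiness check: any queued entry counts, whatever its kind. */
static bool
queue_is_empty(const struct entry *q, size_t n)
{
	(void)q;
	return n == 0;
}

/* Scan from the tail; return the last synchro entry or NULL if none. */
static const struct entry *
queue_last_sync_entry(const struct entry *q, size_t n)
{
	for (size_t i = n; i > 0; i--) {
		if (q[i - 1].kind == ENTRY_SYNC)
			return &q[i - 1];
	}
	return NULL;
}

/*
 * The waiter should decide based on the presence of a synchro entry,
 * not on emptiness: a queue holding only volatile async txns or flush
 * dummies has nothing to wait for.
 */
static bool
queue_needs_confirm_wait(const struct entry *q, size_t n)
{
	return queue_last_sync_entry(q, n) != NULL;
}
```

With only a volatile async txn and a flush dummy queued, `queue_is_empty()` returns false while `queue_last_sync_entry()` returns NULL, so a caller that dereferences the result after checking only for emptiness would crash.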

Note that there might be more places where txn_limbo_is_empty() is not 100% safe to use, but no reproducer could be designed for any of them. The other places might actually be safe, and are treated as such until proven otherwise.

Closes #11807

NO_DOC=bugfix

@Gerold103 Gerold103 self-assigned this Sep 16, 2025
@coveralls

Coverage Status

coverage: 87.621% (-0.01%) from 87.632% when pulling 557c407 on gerold103/gh-11807-linearization-point-crash into 4a329ce on master.

@Gerold103 Gerold103 marked this pull request as ready for review September 16, 2025 22:11
@Gerold103 Gerold103 requested a review from a team as a code owner September 16, 2025 22:11
@Serpentian Serpentian self-assigned this Sep 19, 2025
@Gerold103
Collaborator Author

@Serpentian, ping 🥹.

Contributor

@Serpentian Serpentian left a comment


The patch is great, but I'm afraid there are two more places where the same type of crash may happen. Let's either make sure it's not possible there, or fix them or create tickets.

@Serpentian Serpentian removed their assignment Sep 27, 2025
Contributor

@Serpentian Serpentian left a comment


I don't have any more comments, thank you!

Collaborator

@sergepetrenko sergepetrenko left a comment


Thanks for the fix!
LGTM.

@sergepetrenko sergepetrenko added the backport/3.3, backport/3.4, and backport/3.5 labels Sep 30, 2025
@sergepetrenko sergepetrenko added the full-ci label Sep 30, 2025
@Gerold103 Gerold103 merged commit 4db3d1e into master Oct 1, 2025
90 of 91 checks passed
@Gerold103 Gerold103 deleted the gerold103/gh-11807-linearization-point-crash branch October 1, 2025 17:51
@TarantoolBot
Collaborator

Successfully created backport PR for release/3.3:

@TarantoolBot
Collaborator

Successfully created backport PR for release/3.4:

@TarantoolBot
Collaborator

Successfully created backport PR for release/3.5:

@TarantoolBot
Collaborator

Backport summary


Labels

backport/3.3: Automatically create a 3.3 backport PR
backport/3.4: Automatically create a 3.4 backport PR
backport/3.5: Automatically create a 3.5 backport PR
full-ci: Enables all tests for a pull request


Development

Successfully merging this pull request may close these issues.

qsync: txn_limbo_wait_last_txn segfaults

6 participants