
Snapshot might contain outdated limbo+raft checkpoints #11754

@Gerold103

At its very beginning, the snapshot creation process saves the limbo+raft checkpoints (see checkpoint_new()). Then it creates a read-view, waits for the currently running txns to get committed (so that the read-view only contains committed data), and writes the snapshot.

The problem is that if there are non-committed synchro txns in the limbo, the checkpoint will be outdated by the time those txns get confirmed and committed.

At the very least, the confirmed vclock and the confirmed LSN are going to be too old: older than the confirmed data written into the snapshot.
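To make the ordering problem concrete, here is a toy model in Python. It is not Tarantool code: `ToyLimbo`, `toy_snapshot()`, and the LSN values are hypothetical names and numbers made up for illustration, and the "wait for commit" step is reduced to confirming everything that is pending. It only shows that saving the limbo state before the wait yields a snapshot whose data is newer than its stored confirmed LSN.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLimbo:
    """Hypothetical stand-in for the limbo: not the real Tarantool structure."""
    confirmed_lsn: int = 0
    pending: list = field(default_factory=list)  # LSNs of unconfirmed synchro txns

    def confirm_all(self):
        # Model a CONFIRM covering every pending synchro txn.
        if self.pending:
            self.confirmed_lsn = max(self.pending)
            self.pending.clear()


def toy_snapshot(limbo, committed_lsns):
    # Step 1: the limbo checkpoint is saved first (as in the description above).
    saved_confirmed_lsn = limbo.confirmed_lsn
    # Step 2: wait until the in-flight txns get confirmed and committed.
    in_flight = list(limbo.pending)
    limbo.confirm_all()
    data_lsns = list(committed_lsns) + in_flight
    # Step 3: write the data together with the limbo state saved in step 1.
    return {"limbo_confirmed_lsn": saved_confirmed_lsn, "data_lsns": data_lsns}


limbo = ToyLimbo(confirmed_lsn=10, pending=[11, 12])
snap = toy_snapshot(limbo, committed_lsns=[9, 10])
print(snap)
print("stale by", max(snap["data_lsns"]) - snap["limbo_confirmed_lsn"], "LSNs")
```

In this model the snapshot contains data up to LSN 12, while the limbo state stored in it still claims a confirmed LSN of 10. Whether the real fix is to take the limbo+raft checkpoint after the wait, or to patch it up afterwards, is a design question this sketch does not answer.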

It is unknown what consequences this could lead to. But at the very least it might be a problem if the snapshot is taken somewhere without the following xlogs that contain the CONFIRM of the txns in this snap. Then we can get an instance with new data but an old limbo state.

Still, I am not sure which bugs this could produce. Almost certainly, something is broken.

Labels: 3.2, bug, qsync, replication, recovery
