
Fix WAL handling on consensus snapshot #7577

Merged
timvisee merged 12 commits into dev from consensus-snapshot-wal-fixes on Nov 24, 2025
Conversation


@timvisee timvisee commented Nov 21, 2025

Fix how we handle WAL clearing when we apply a consensus snapshot.

The main problem this tries to solve is the following: in the snapshot-apply function, we cleared the WAL before persisting the new consensus state from the snapshot. If we crashed in between, we were left with a cleared WAL but the old consensus state, which causes a crash loop on startup.

To fix the problem we flip the order: we now clear the WAL after persisting the new consensus state. On startup, we truncate any leftover entries from the WAL if we failed to clear it.

Needs some more testing before we apply this change.

This depends on fixes in the WAL crate: qdrant/wal#99

Applying qdrant/wal#99 also fixes WAL CRC issues that show up as log lines like:

2025-11-21T15:23:22.205425Z  WARN wal::segment: CRC mismatch at offset 40: 414236364 != 838619149

Testing

Testing this change is not easy, and I don't see how I can write a nice automated test for it. I tested this PR including qdrant/wal#99.

What I did is this:

  1. set QDRANT__CLUSTER__CONSENSUS__COMPACT_WAL_ENTRIES=5 to enable aggressive consensus WAL compaction
  2. start a cluster of 3 nodes
  3. kill the last node
  4. send DELETE /cluster/peer/1 at least 5 times to trigger WAL compaction
  5. a. without this PR: add a panic between these two statements to simulate a crash at the right time:

     self.wal.lock().clear()?;
     self.persistent.write().update_from_snapshot(
         meta,
         address_by_id,
         metadata_by_id,
         cluster_metadata,
     )?;

     b. or with this PR: add a panic between these two statements to simulate a crash at the right time:

     self.persistent.write().update_from_snapshot(
         meta,
         address_by_id,
         metadata_by_id,
         cluster_metadata,
     )?;
     // Clear now obsolete WAL entries after persisting new Raft state
     // This way we prevent a crash due to an empty WAL if we crash right after clearing it,
     // without bumping the Raft state. If we now crash after persisting the new state but
     // before clearing the WAL, we will clear the WAL on next startup by truncating all entries
     // above our commit.
     self.wal.lock().clear()?;
  6. (recompile and) start the 3rd node again
  7. the node now applies the consensus snapshot and crashes due to the panic inserted in step 5
  8. remove the panic added in step 5
  9. (recompile and) start the 3rd node again
  10. a. before this PR: crash loop!
    b. or after this PR: the node starts and continues fine, with a log line like this:
    2025-11-21T15:31:06.595041Z  WARN storage::content_manager::consensus_manager: Consensus WAL has 34 unapplied entries, truncating from index 0 onwards
    

Tasks

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@timvisee timvisee force-pushed the consensus-snapshot-wal-fixes branch from 0dd1629 to 1cf5b47 Compare November 21, 2025 14:45
@timvisee timvisee marked this pull request as ready for review November 21, 2025 15:36
@timvisee timvisee force-pushed the consensus-snapshot-wal-fixes branch from 16403e4 to 0f274c5 Compare November 21, 2025 15:57
@KShivendu KShivendu left a comment

Before/after repro worked as intended. Very cool!

Reviewing the code properly now 👀

@timvisee timvisee requested a review from ffuugoo November 24, 2025 12:23
@timvisee timvisee merged commit 549f13f into dev Nov 24, 2025
16 checks passed
@timvisee timvisee deleted the consensus-snapshot-wal-fixes branch November 24, 2025 14:49
timvisee added a commit that referenced this pull request Nov 25, 2025
* After clearing WAL, flush segment

* Add debug log when WAL is cleared

* Clear WAL on consensus snapshot after writing state, truncate on start

* Apply consensus snapshot offset

* Fix off by one error

* Tweak debug assertion message

* Change WAL reconciliation condition, and fully clear WAL in this case

* Add debug assertion to prove Raft index and snapshot index are equal

* Add documentation to resolve bot nit

* Return error on WAL clear failure

* Fix typo

* Remove unused truncate functions
@timvisee timvisee mentioned this pull request Nov 25, 2025