
Fix WAL handling on consensus snapshot #7577

Merged
timvisee merged 12 commits into dev from consensus-snapshot-wal-fixes on Nov 24, 2025
Conversation


@timvisee timvisee commented Nov 21, 2025

Fix how we handle WAL clearing when we apply a consensus snapshot.

The main problem this tries to solve is the following: in the snapshot-apply function, we cleared the WAL before persisting the new consensus state from the snapshot. If we crashed in between, we were left with a cleared WAL but the old consensus state, which causes a crash loop on startup.

To fix the problem we flip the order: we now clear the WAL after persisting the new consensus state. On startup, we truncate any leftover entries from the WAL if we failed to clear it.

Needs some more testing before we apply this change.

This depends on fixes in the WAL crate: qdrant/wal#99

Applying qdrant/wal#99 also fixes WAL CRC issues that show up as log lines like:

2025-11-21T15:23:22.205425Z  WARN wal::segment: CRC mismatch at offset 40: 414236364 != 838619149

Testing

Testing this change is not easy, and I don't see how I can write a nice automated test for it. I tested this PR including qdrant/wal#99.

What I did is this:

  1. set QDRANT__CLUSTER__CONSENSUS__COMPACT_WAL_ENTRIES=5 to enable aggressive consensus WAL compaction
  2. start a cluster of 3 nodes
  3. kill the last node
  4. send DELETE /cluster/peer/1 at least 5 times to trigger WAL compaction
  5. a. without this PR: add a panic between these two statements to simulate a crash at the right time:

     self.wal.lock().clear()?;
     self.persistent.write().update_from_snapshot(
         meta,
         address_by_id,
         metadata_by_id,
         cluster_metadata,
     )?;

     b. or with this PR: add a panic between these two statements to simulate a crash at the right time:

     self.persistent.write().update_from_snapshot(
         meta,
         address_by_id,
         metadata_by_id,
         cluster_metadata,
     )?;
     // Clear now obsolete WAL entries after persisting new Raft state
     // This way we prevent a crash due to an empty WAL if we crash right after clearing it,
     // without bumping the Raft state. If we now crash after persisting the new state but
     // before clearing the WAL, we will clear the WAL on next startup by truncating all entries
     // above our commit.
     self.wal.lock().clear()?;
  6. (recompile and) start the 3rd node again
  7. the node now applies the consensus snapshot and crashes due to the panic inserted in step 5
  8. remove the panic added in step 5
  9. (recompile and) start the 3rd node again
  10. a. before this PR: crash loop!
    b. or after this PR: the node starts and continues fine, with a log line like this:
    2025-11-21T15:31:06.595041Z  WARN storage::content_manager::consensus_manager: Consensus WAL has 34 unapplied entries, truncating from index 0 onwards
    

Tasks

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@timvisee timvisee force-pushed the consensus-snapshot-wal-fixes branch from 0dd1629 to 1cf5b47 Compare November 21, 2025 14:45
@timvisee timvisee marked this pull request as ready for review November 21, 2025 15:36
@timvisee timvisee force-pushed the consensus-snapshot-wal-fixes branch from 16403e4 to 0f274c5 Compare November 21, 2025 15:57
@KShivendu KShivendu left a comment

Before/after repro worked as intended. Very cool!

Reviewing the code properly now 👀

@timvisee timvisee requested a review from ffuugoo November 24, 2025 12:23
@timvisee timvisee merged commit 549f13f into dev Nov 24, 2025
16 checks passed
@timvisee timvisee deleted the consensus-snapshot-wal-fixes branch November 24, 2025 14:49
timvisee added a commit that referenced this pull request Nov 25, 2025
* After clearing WAL, flush segment

* Add debug log when WAL is cleared

* Clear WAL on consensus snapshot after writing state, truncate on start

* Apply consensus snapshot offset

* Fix off by one error

* Tweak debug assertion message

* Change WAL reconciliation condition, and fully clear WAL in this case

* Add debug assertion to prove Raft index and snapshot index are equal

* Add documentation to resolve bot nit

* Return error on WAL clear failure

* Fix typo

* Remove unused truncate functions
@timvisee timvisee mentioned this pull request Nov 25, 2025