fix: propagate WAL CRC chain across snapshot recovery #901

Merged
mattisonchao merged 1 commit into main from fix/wal-crc-divergence-after-snapshot
Feb 24, 2026

Member

@mattisonchao mattisonchao commented Feb 24, 2026

Motivation

When a follower installs a snapshot, wal.Clear() resets the CRC chain to 0. Because each entry's CRC is chained from its predecessor, CRC(n) = CRC32(CRC(n-1) + payload(n)), every WAL CRC the follower computes after the snapshot diverges from the leader's, even though the DB data is identical (fixes #898).
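
The divergence can be reproduced in a few lines. This is a minimal sketch, not the project's code: it assumes the chain is realized with Go's crc32.Update and the Castagnoli polynomial (the description only says "CRC32"), and the helper names nextCrc and chainFrom are hypothetical.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// Assumption: Castagnoli table. The PR text only says "CRC32".
var table = crc32.MakeTable(crc32.Castagnoli)

// nextCrc extends the chain: CRC(n) is derived from CRC(n-1) and payload(n).
func nextCrc(previousCrc uint32, payload []byte) uint32 {
	return crc32.Update(previousCrc, table, payload)
}

// chainFrom returns the CRC of every entry, starting the chain at seed.
func chainFrom(seed uint32, payloads [][]byte) []uint32 {
	crcs := make([]uint32, 0, len(payloads))
	prev := seed
	for _, p := range payloads {
		prev = nextCrc(prev, p)
		crcs = append(crcs, prev)
	}
	return crcs
}

func main() {
	entries := [][]byte{[]byte("put a=1"), []byte("put b=2"), []byte("put c=3")}

	// The leader chains all three entries from the start of its WAL.
	leader := chainFrom(0, entries)

	// The follower received entries 0 and 1 through a snapshot, cleared its
	// WAL (chain reset to 0), and then appended entry 2.
	follower := chainFrom(0, entries[2:])

	fmt.Printf("leader   CRC(2) = %08x\n", leader[2])
	fmt.Printf("follower CRC(2) = %08x\n", follower[0])
	fmt.Println("same payload, same CRC?", leader[2] == follower[0])
}
```

The payload of entry 2 is identical on both sides; only the seed differs, and that is enough to change every CRC from that point on.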

Modification

  • Add previous_entry_crc field to the Append protobuf message for CRC propagation during replication.
  • Expose previousCrc through the WAL read chain (ReadRecordWithValidation → segment.Read → wal.readAtIndex → Reader.ReadNext) so the leader can include it in every Append message.
  • Add AppendAsyncWithPreviousCrc(entry, previousCrc) to the Wal interface so the follower can seed the CRC chain when appending after snapshot install.
  • Preserve the caller's CRC seed in readWriteSegment when RecoverIndex returns an empty segment.
  • Add U32Zero constant to common/constant for typed zero CRC values.
  • Add TestReadNextReturnsCrc and TestAppendAsyncWithPreviousCrc WAL tests.
  • Update TestFollowerCursor_SendSnapshot to verify non-zero PreviousEntryCrc after snapshot.
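
The WAL-side changes in the list above can be sketched with a toy in-memory log. Everything here is illustrative: toyWal, record, ReadAt, and AppendWithPreviousCrc are hypothetical stand-ins rather than the real Wal interface or its AppendAsyncWithPreviousCrc method, and the Castagnoli table is an assumption.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// Assumption: Castagnoli table. The PR text only says "CRC32".
var table = crc32.MakeTable(crc32.Castagnoli)

// record is a hypothetical stored WAL record.
type record struct {
	payload     []byte
	previousCrc uint32 // CRC of the entry before this one
	crc         uint32 // CRC of this entry, chained from previousCrc
}

// toyWal is an in-memory stand-in for a WAL, for illustration only.
type toyWal struct {
	records []record
	lastCrc uint32
}

// Append chains the new entry onto the WAL's own last CRC.
func (w *toyWal) Append(payload []byte) {
	w.AppendWithPreviousCrc(payload, w.lastCrc)
}

// AppendWithPreviousCrc lets the caller supply the seed for the chain.
func (w *toyWal) AppendWithPreviousCrc(payload []byte, previousCrc uint32) {
	crc := crc32.Update(previousCrc, table, payload)
	w.records = append(w.records, record{payload: payload, previousCrc: previousCrc, crc: crc})
	w.lastCrc = crc
}

// Clear drops all records and resets the chain to 0, as on snapshot install.
func (w *toyWal) Clear() {
	w.records = nil
	w.lastCrc = 0
}

// ReadAt returns an entry's payload together with the previous entry's CRC.
func (w *toyWal) ReadAt(i int) ([]byte, uint32) {
	r := w.records[i]
	return r.payload, r.previousCrc
}

func main() {
	payloads := [][]byte{[]byte("put a=1"), []byte("put b=2"), []byte("put c=3")}

	leader := &toyWal{}
	for _, p := range payloads {
		leader.Append(p)
	}

	follower := &toyWal{}
	follower.Append(payloads[0])
	follower.Clear() // a snapshot covering entries 0 and 1 was installed

	// The leader reads entry 2 along with the CRC that preceded it,
	// and the follower uses that CRC as its seed.
	payload, previousCrc := leader.ReadAt(2)
	follower.AppendWithPreviousCrc(payload, previousCrc)

	fmt.Println("chains match:", follower.lastCrc == leader.lastCrc)
}
```

Routing the plain Append through the seeded variant keeps a single code path for computing the CRC; whether the real implementation is structured that way is not stated in this PR.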

@mattisonchao mattisonchao force-pushed the fix/wal-crc-divergence-after-snapshot branch 3 times, most recently from dfaca6d to 5f32eaf Compare February 24, 2026 08:16
When a follower installs a snapshot, wal.Clear() resets the CRC chain
to 0. Since CRC(n) = CRC32(CRC(n-1) + payload(n)), all subsequent WAL
CRCs diverge from the leader even though the DB data is identical.

Fix this by propagating the previous entry's CRC through the replication
protocol. The leader includes previousEntryCrc in every Append message
(read from the WAL record), and the follower uses it to seed the CRC
chain via AppendAsyncWithPreviousCrc when the WAL is empty after
snapshot install.

Closes #898

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
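
The replication flow described in the commit message can be sketched end to end as follows. This is a simplified model under stated assumptions: appendMsg stands in for the Append protobuf message, buildAppends and handleAppend are hypothetical helpers, and the Castagnoli table is assumed since the source only says "CRC32".

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// Assumption: Castagnoli table. The PR text only says "CRC32".
var table = crc32.MakeTable(crc32.Castagnoli)

// appendMsg is a hypothetical stand-in for the Append protobuf message.
type appendMsg struct {
	Payload          []byte
	PreviousEntryCrc uint32
}

// buildAppends plays the leader: it walks its log and, for every entry at
// index >= from, emits a message carrying the CRC of the preceding entry.
// It also returns the CRC of the leader's last entry.
func buildAppends(log [][]byte, from int) (msgs []appendMsg, lastCrc uint32) {
	var crc uint32
	for i, p := range log {
		if i >= from {
			msgs = append(msgs, appendMsg{Payload: p, PreviousEntryCrc: crc})
		}
		crc = crc32.Update(crc, table, p)
	}
	return msgs, crc
}

// follower tracks only what the sketch needs: entry count and chain tail.
type follower struct {
	entries int
	lastCrc uint32
}

// handleAppend seeds the chain from the message only when the WAL is empty
// (for example right after a snapshot install); otherwise it continues the
// local chain.
func (f *follower) handleAppend(m appendMsg) {
	prev := f.lastCrc
	if f.entries == 0 {
		prev = m.PreviousEntryCrc
	}
	f.lastCrc = crc32.Update(prev, table, m.Payload)
	f.entries++
}

func main() {
	log := [][]byte{[]byte("e0"), []byte("e1"), []byte("e2"), []byte("e3")}

	// The snapshot covered entries 0 and 1, so replication resumes at index 2.
	msgs, leaderCrc := buildAppends(log, 2)

	f := &follower{} // empty WAL after snapshot install
	for _, m := range msgs {
		f.handleAppend(m)
	}
	fmt.Println("follower matches leader:", f.lastCrc == leaderCrc)
}
```

Only the first append after the snapshot consumes the propagated CRC; later appends chain from the follower's own tail, which by then agrees with the leader's.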
@mattisonchao mattisonchao force-pushed the fix/wal-crc-divergence-after-snapshot branch from 5f32eaf to 23aa1c3 Compare February 24, 2026 08:23
@mattisonchao mattisonchao self-assigned this Feb 24, 2026
@mattisonchao mattisonchao merged commit 471cb49 into main Feb 24, 2026
17 of 21 checks passed
@mattisonchao mattisonchao deleted the fix/wal-crc-divergence-after-snapshot branch February 24, 2026 13:35

Development

Successfully merging this pull request may close these issues.

WAL checksum divergence detected across replicas under chaos testing
