The indexes file is intentionally not fsynced, so corruption after a crash is expected. Previously this caused a badmatch crash in find_snapshots during init. Now ra_snapshot carries the machine config and can recover live indexes by reading the snapshot and calling ra_machine:live_indexes/2, the same approach used in complete_accept.
When a crash occurs after the segment writer flushes a WAL file to
segments but before the WAL file is deleted, recovery replays the
same WAL, creating segments that overlap with those from the first
flush. compact_segrefs correctly handles this by truncating the
range of partially overlapping segment refs. However, the deletion
logic used the -- operator, which compares full {Filename, Range}
tuples. A segment whose range was truncated (but not removed) no
longer matched its original ref, so it appeared in the diff and
was deleted even though the reader still referenced it. The
subsequent fold during state machine recovery then crashed with
ra_log_failed_to_open_segment enoent.
Compare by filename only when deciding which segments to delete,
so that segments still referenced by the reader (even with a
truncated range) are preserved.
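The difference between the two comparisons can be sketched with a small Python model. This is not Ra's Erlang code: the helper names, file names, and ranges below are hypothetical, and `list_subtract` only imitates how Erlang's `--` matches whole terms.

```python
# Illustrative model only; Ra itself is written in Erlang.
# All function names and sample data here are hypothetical.

def list_subtract(old, new):
    """Imitate Erlang's `Old -- New`: remove one equal element per entry in new."""
    remaining = list(old)
    for item in new:
        if item in remaining:
            remaining.remove(item)
    return remaining

def to_delete_by_tuple(old_refs, compacted_refs):
    """Old behaviour: a ref counts as kept only if filename AND range both match."""
    return [name for name, _range in list_subtract(old_refs, compacted_refs)]

def to_delete_by_filename(old_refs, compacted_refs):
    """New behaviour: a ref counts as kept if its filename is still referenced."""
    kept = {name for name, _range in compacted_refs}
    return [name for name, _range in old_refs if name not in kept]

# Two overlapping segment refs, as might exist after a replayed WAL flush.
old_refs = [("0001.segment", (1, 100)), ("0002.segment", (90, 200))]
# Compaction truncates the first range instead of dropping the segment.
compacted = [("0001.segment", (1, 89)), ("0002.segment", (90, 200))]

print(to_delete_by_tuple(old_refs, compacted))     # truncated segment is listed
print(to_delete_by_filename(old_refs, compacted))  # nothing is listed
```

In this model the tuple comparison marks the truncated segment for deletion because its range changed, while the filename comparison leaves it alone.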
When a leader receives a pre_vote_rpc from a follower with a stale term, make_all_rpcs now includes peers in snapshot_backoff status alongside normal peers. This ensures the lagging follower that triggered the pre-vote gets its snapshot promptly rather than waiting for the backoff timer to fire. The pending backoff timer is cancelled via a new cancel_snapshot_retry_timer effect before the RPC is sent.
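A rough Python model of the peer selection described above, assuming a simple peer map. The function name `select_rpc_targets`, the peer dictionary shape, and the `"other"` status are hypothetical and do not reflect Ra's actual data structures; only the `normal` and `snapshot_backoff` statuses and the `cancel_snapshot_retry_timer` effect name come from the description.

```python
# Illustrative model only; not Ra's make_all_rpcs implementation.

def select_rpc_targets(peers, include_backoff):
    """Return (effects, targets) for peers that should receive an RPC.

    peers maps a peer id to a dict with a "status" key (hypothetical shape).
    """
    effects, targets = [], []
    for peer_id in sorted(peers):
        status = peers[peer_id]["status"]
        if status == "normal":
            targets.append(peer_id)
        elif status == "snapshot_backoff" and include_backoff:
            # Cancel the pending retry timer before contacting the peer.
            effects.append(("cancel_snapshot_retry_timer", peer_id))
            targets.append(peer_id)
    return effects, targets

peers = {
    "a": {"status": "normal"},
    "b": {"status": "snapshot_backoff"},
    "c": {"status": "other"},  # hypothetical status that is never selected
}
```

With `include_backoff=False` only peer `a` is selected; with `include_backoff=True` peer `b` is also selected and a timer cancellation effect is emitted for it.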
…concurrently
During multi-file WAL recovery after a power-off, the segment writer
processes mem tables from earlier WAL files asynchronously. When servers
have no Pid (normal during recovery), the segment writer deletes entries
directly from the mem table ETS. If this deletion races with recovery of
the next WAL file, recover_entry calls mem_table_please, which re-scans
the (now partially depleted) ETS table. The resulting ra_mt state has a
LastSeq that no longer matches the PrevIdx tracked in the writers map,
causing ra_mt:insert_sparse to return {error, gap_detected}. That is an
unhandled case_clause in recover_entry, which crashes the node at boot.
Fix by carrying the Tables map across WAL files in the recovery fold,
alongside the already-carried Writers map. This way recover_entry reuses
the ra_mt state it built during earlier file recovery rather than
re-scanning a potentially mutated ETS table.
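The race and the fix can be illustrated with a deliberately simplified Python model. It is not Ra's code: `recover`, `scan_last`, `demo`, and the data shapes are hypothetical, and the way the mismatch arises here (a racing delete removing the newest entry) is an assumption chosen to keep the example short.

```python
# Illustrative model only; not Ra's WAL recovery code.

class GapDetected(Exception):
    """Raised when the table's last index differs from the writer's previous index."""

def scan_last(shared, writer):
    """Derive the last index by scanning the shared (mutable) table."""
    seen = shared.get(writer, set())
    return max(seen) if seen else None

def recover(wal_files, shared, carry_tables, between_files=lambda: None):
    writers = {}  # writer -> previous index, always carried across files
    tables = {}   # writer -> last index in recovery's own table state
    for entries in wal_files:
        if not carry_tables:
            tables = {}  # old behaviour: every file re-scans the shared table
        for writer, idx in entries:
            if writer not in tables:
                tables[writer] = scan_last(shared, writer)
            prev = writers.get(writer)
            if tables[writer] != prev:
                raise GapDetected(f"{writer}: last={tables[writer]} prev={prev}")
            shared.setdefault(writer, set()).add(idx)
            tables[writer] = idx
            writers[writer] = idx
        between_files()  # stands in for the asynchronous segment writer
    return writers

def demo(carry_tables):
    shared = {}
    files = [[("a", 1), ("a", 2), ("a", 3)], [("a", 4), ("a", 5)]]
    racing_delete = lambda: shared.get("a", set()).discard(3)
    return recover(files, shared, carry_tables, racing_delete)
```

In this model `demo(False)` raises `GapDetected` on the second file because the re-scan sees a depleted table, while `demo(True)` completes because recovery keeps using the state it built itself.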
Made-with: Cursor
michaelklishin
approved these changes
Mar 11, 2026
Otherwise it may fail to boot. Ignore for Windows.
New servers should register _after_ log initialisation to ensure the config file is fully written, as it is required for successful recovery.
d84c687 to
3ec78e4
mkuratczyk
approved these changes
Mar 11, 2026
Summary
Fixes several issues that can cause Ra nodes to crash during boot after an unclean shutdown (e.g. power loss).
Changes
Recover corrupt snapshot indexes from machine state. The indexes file is not fsynced, so corruption after a crash is expected. Instead of crashing with a badmatch in find_snapshots, ra_snapshot now recovers live indexes by reading the snapshot and calling ra_machine:live_indexes/2.
Fix WAL recovery crash from concurrent segment writer deletes. During multi-file WAL recovery the segment writer can delete mem table ETS entries that the next WAL file's recovery depends on, causing an unhandled gap_detected error. Fixed by carrying the Tables map across WAL files so recovery reuses its own ra_mt state instead of re-scanning a mutated ETS table.
Fix segment deletion after dual WAL flush. After a crash between segment flush and WAL deletion, compact_segrefs truncates overlapping segment ranges. The deletion logic, based on the -- operator, compared full tuples, so truncated-but-still-referenced segments were incorrectly deleted. It now compares by filename only.
Send RPCs to snapshot_backoff peers when leader enforces leadership. make_all_rpcs now includes snapshot_backoff peers so a lagging follower that triggered a pre_vote_rpc gets its snapshot immediately rather than waiting for the backoff timer.
Register new servers after log init. Ensures the config file is fully written before registration, as it is required for recovery.
Sync parent directory after creating config file. Ensures the directory entry is durable after a crash.
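For the parent directory sync, the general technique looks roughly like the following Python sketch. It is not taken from the Ra change: `write_file_durably` is a hypothetical helper, and skipping the directory sync on Windows is an assumption based on the "Ignore for Windows" note in the commit message.

```python
# Illustrative sketch of the general technique; not Ra's implementation.
import os

def write_file_durably(path, data):
    """Write data to path, then fsync the file and its parent directory."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make the file contents durable
    if os.name == "nt":
        return  # assumption: directory sync is skipped on Windows
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the new directory entry durable
    finally:
        os.close(dir_fd)
```

Syncing only the file makes its contents durable, but the directory entry that names a newly created file can still be lost in a crash unless the directory itself is synced as well.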