Add ra_log_sync to batch and serialize snapshot fsyncs #582
Merged
Introduce a per-system gen_batch_server sync pool that serializes fsync calls across all Ra servers, preventing N parallel fsyncs from saturating the disk during concurrent snapshot writes.
Switch zpad_hex/1 from io_lib:format to integer_to_binary/2, returning a binary instead of a char list. This is up to 3x faster and avoids list allocations in make_snapshot_dir/3. Add missing sync_dir calls after snapshot writes and checkpoint promotions to ensure directory entries are durable. The dir syncs are included in the ra_log_sync fun so they are batched alongside the file fsync rather than running on the caller.
When the segment writer flushed mem tables with live indexes present, it used the last live index as the boundary for normal log entries. This meant all entries above the last live index were written to segments, even those already covered by the snapshot.
mkuratczyk
approved these changes
Mar 5, 2026
Summary
Introduce ra_log_sync, a per-system pool of gen_batch_server workers that serialize and batch fsync calls across all Ra servers. Snapshot and checkpoint-promotion fsyncs are routed through this pool instead of running as N parallel fsyncs that can saturate the disk. The pool size scales with the scheduler count. Workers process each batch in reverse arrival order, so the most recently written file is synced first; that first sync flushes the filesystem journal, which makes the remaining syncs in the batch near-instant.
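The reverse-order batching described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the Erlang code from this PR: `sync_batch` and `fsync_path` are hypothetical names, and the real pool runs inside gen_batch_server workers rather than a plain loop.

```python
import os


def fsync_path(path):
    """fsync a single file by path (hypothetical helper for this sketch)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def sync_batch(paths, sync=fsync_path):
    """Sync one batch serially, newest request first.

    `paths` is given in arrival order. The list returned is the order in
    which the files were actually synced, i.e. the reverse of arrival.
    """
    synced = []
    for path in reversed(paths):
        sync(path)  # one fsync at a time, never in parallel
        synced.append(path)
    return synced
```

Because a single worker drains the whole batch, callers never issue overlapping fsyncs against the same disk; they only wait for their own entry in the batch to complete.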
Avoid flushing entries below the snapshot index to segments. The segment writer previously used the last live index as the lower bound when writing mem table entries, which could cause entries already covered by the snapshot to be written to segments unnecessarily. The boundary is now derived from the snapshot index instead.
Open a new segment when the current one contains only stale entries. When all entries in the active segment are below the smallest live index and the segment exceeds a (lowered) 1 KB data-size threshold, the segment writer rolls over to a fresh segment. This avoids appending new entries into segments dominated by dead data.
Add WAL fill ratio tracking. New counters (current_file_size, max_file_size) are maintained in the WAL and exposed via ra_log_wal:fill_ratio/1 and ra_aux:wal_fill_ratio/1, allowing callers to observe how full the current WAL file is.
Optimise ra_lib:make_dir/1. Replace the is_dir + ensure_dir + make_dir sequence with a single prim_file:make_dir/1 call that handles eexist and falls back to ensure_dir only on enoent, avoiding redundant syscalls on the hot path.
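The "try first, fall back only on a missing parent" pattern can be sketched as follows. This is an illustrative Python analogue, assuming `FileExistsError` and `FileNotFoundError` stand in for `eexist` and `enoent`; it is not the code in ra_lib.

```python
import os


def make_dir(path):
    """Create `path`, treating "already exists" as success."""
    try:
        os.mkdir(path)  # common case: a single syscall
    except FileExistsError:
        # eexist: nothing to do. The is-a-directory check is an extra
        # safeguard added for this sketch, not something the PR describes.
        if not os.path.isdir(path):
            raise
    except FileNotFoundError:
        # enoent: a parent is missing, so take the slower path that
        # creates the whole chain of directories.
        os.makedirs(path, exist_ok=True)
```

The saving comes from the common case: when the parent already exists, one call either creates the directory or reports that it is already there, with no prior existence checks.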
Optimise ra_lib:zpad_hex/1. Switch from io_lib:format/2 to integer_to_binary/2, returning a binary instead of a char list. This is up to 3x faster and eliminates list allocations in make_snapshot_dir/3, which now concatenates binaries directly.
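As a rough illustration of the idea (format the integer straight into a binary, then build the directory name by concatenating binaries), here is a Python sketch. The 16-character width, the uppercase digits, and the `snapshot_dir_name` layout are assumptions made for the example; the PR text does not state them.

```python
def zpad_hex(n, width=16):
    """Return `n` as zero-padded uppercase hex, as bytes.

    The width of 16 is an assumption for this sketch.
    """
    if n < 0:
        raise ValueError("expected a non-negative integer")
    return format(n, "X").rjust(width, "0").encode("ascii")


def snapshot_dir_name(base, index, term):
    """Join bytes directly, with no intermediate character lists.

    The "<term>_<index>" layout is hypothetical.
    """
    return base + b"/" + zpad_hex(term) + b"_" + zpad_hex(index)
```

Values that already need more than `width` hex digits are returned unpadded rather than truncated.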
Add directory fsync calls for snapshot durability. sync_dir calls are added after snapshot writes and checkpoint promotions to ensure directory entries are durable. These are batched through ra_log_sync alongside the file fsync.
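A minimal sketch of why the directory sync matters and how it can travel with the file sync: syncing a file makes its contents durable, but the directory entry that names the file is only durable once the directory itself has been synced. The Python below is illustrative only; `make_sync_fun` is a hypothetical name for the closure that would be handed to the sync pool, and syncing a directory file descriptor this way is assumed to be supported by the platform (it is on Linux).

```python
import os


def fsync_path(path):
    """fsync a file or directory by path (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def make_sync_fun(file_path):
    """Build one unit of sync work: the file first, then its directory."""
    dir_path = os.path.dirname(os.path.abspath(file_path))

    def sync_fun():
        fsync_path(file_path)  # file contents
        fsync_path(dir_path)   # directory entry pointing at the file
        return (file_path, dir_path)

    return sync_fun
```

Bundling both syncs in one closure means the caller does no blocking I/O itself, and the directory sync is serialized along with everything else in the batch.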
Move make_snapshot_dir into the background work fun so the directory path is computed on the worker process rather than the caller.