db/state: add optional throttle to MergeLoop to reduce disk I/O pressure #20486
I'm not sure that is better, because a merge step can take 24 hours (on very large .kv files).
Good point! The throttle wouldn't help during a single merge of large historical files (initial sync?). That's a different problem entirely. What we're addressing is the chain-tip case: multiple smaller merge steps (e.g. step ranges 2048-2176, 2176-2192, 2192-2194, a real scenario btw) running back-to-back, each taking a few minutes. Without a pause between them, block execution is starved of I/O continuously across the full sequence. The customizable gap lets block execution process pending blocks between steps.
On our fleet at chain tip, individual merge steps take 1-5 minutes, not hours. The stalls come from running 3-5 of them consecutively with no break. Without the throttle, those merge steps run immediately one after another and block execution stays at 1-5 Mgas/s for the entire duration. With the throttle, each 2-second gap allows a burst of block processing at closer to normal throughput.
If you'd like to verify, we could come up with a few easy approaches:
We are willing to add more information! We are very open in this regard.
The real source of high I/O during .kv merge is #14809, but I'm not sure when we will be able to release a full fix for that issue. So I would just merge your PR, because it adds some flexibility. And if we have time, we can replace these env variables with some sort of randomized time.
The MergeLoop background goroutine performs continuous heavy disk I/O when merging state files. This saturates disk (90%+ utilization), blocking block execution which competes for the same storage.

Symptoms observed:
- Block drift increases during merge operations
- Synchronized stalls across nodes at similar block heights
- RPC timeouts and degraded service

Add ERIGON_MERGE_THROTTLE_MS environment variable to insert a pause between merge operations. This reduces disk contention and allows operators to desynchronize merge timing across a node fleet.

Usage:
  ERIGON_MERGE_THROTTLE_MS=500 ./erigon ...

Default behavior (no throttle) is preserved when unset.

Signed-off-by: Peter Lemenkov <lemenkov@gmail.com>
Assisted-by: Claude (Anthropic) <https://claude.ai>
c93cd42 to b777f31
Yes. We rely on a proxy-server on top of it and it'll handle the dispatching.
Problem
When Erigon is running at chain tip, MergeLoop executes merge steps back-to-back with no pause between iterations. Each merge step involves heavy disk I/O (reading, compressing, and writing state files). Running these steps consecutively saturates the disk, starving block execution of I/O bandwidth.
The result is periodic block processing stalls: the node's reported block number freezes for minutes at a time while background merges consume all available I/O, then bursts forward when a merge step completes. During these stalls the node falls behind the chain tip and is marked unhealthy by load balancers.
Observed behavior
On a production fleet running Erigon v3.3.x on AWS Graviton instances (64GB RAM, EBS gp3 volumes), we observed the following pattern during MergeLoop activity on individual nodes:
The node always recovers eventually, but the stalls cause the node to be removed from load balancer rotation, reducing fleet capacity.
Solution
Add a configurable delay between MergeLoop iterations via the MERGE_THROTTLE_MS environment variable (default 0, preserving current behavior). The delay is inserted after each successful mergeLoopStep, giving block execution a window to access the disk before the next merge step begins.
Production results
We have been running this patch on a 3-node production fleet since December 2025. Results:
AGGREGATION_DELAY_MS (PR "Reduce impact of synchronized aggregation across fleet nodes" #20391) address together
Recommended values based on our testing:
Notes
- COMPRESS_WORKERS (PR "Reduce impact of background merge/compress to ChainTip" #18995) which reduces I/O pressure within each merge step by limiting worker parallelism. This PR addresses I/O pressure between merge steps.
- AGGREGATION_DELAY_MS (PR "Reduce impact of synchronized aggregation across fleet nodes" #20391, merged) which staggers the start time of aggregation across fleet nodes.