db/state: add optional throttle to MergeLoop to reduce disk I/O pressure#20486

Merged
AskAlexSharov merged 1 commit into
erigontech:mainfrom
lemenkov:merge_throttle
Apr 11, 2026

@lemenkov

Contributor

Problem

When Erigon is running at chain tip, MergeLoop executes merge steps back-to-back with no pause between iterations. Each merge step involves heavy disk I/O (reading, compressing, and writing state files). Running these steps consecutively saturates the disk, starving block execution of I/O bandwidth.

The result is periodic block processing stalls: the node's reported block number freezes for minutes at a time while background merges consume all available I/O, then bursts forward when a merge step completes. During these stalls the node falls behind the chain tip and is marked unhealthy by load balancers.

Observed behavior

On a production fleet running Erigon v3.3.x on AWS Graviton instances (64GB RAM, EBS gp3 volumes), we observed the following pattern during MergeLoop activity on individual nodes:

  • Block execution throughput drops from ~20 Mgas/s to 1-5 Mgas/s
  • Node block number freezes for 8-16 minutes per merge step
  • Page cache eviction of 16GB+ as merge I/O displaces cached state data
  • Lag accumulates at ~5 blocks/minute during each stall
  • Worst observed: 164 blocks behind over a 188-minute period of continuous merge activity

The node always recovers eventually, but the stalls cause the node to be removed from load balancer rotation, reducing fleet capacity.

Solution

Add a configurable delay between MergeLoop iterations via the ERIGON_MERGE_THROTTLE_MS environment variable (default 0, which preserves current behavior). The delay is inserted after each successful mergeLoopStep, giving block execution a window to access the disk before the next merge step begins.

Before (current):
  mergeLoopStep()  → heavy I/O
  mergeLoopStep()  → immediately, more heavy I/O
  mergeLoopStep()  → immediately, more heavy I/O
 
After (with ERIGON_MERGE_THROTTLE_MS=2000):
  mergeLoopStep()  → heavy I/O
  sleep(2s)        → block execution catches up
  mergeLoopStep()  → heavy I/O
  sleep(2s)        → block execution catches up
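
A minimal Go sketch of the loop shape described above, not the actual diff from this PR. The names `parseThrottle`, `mergeLoop`, and `runDemo` are hypothetical, and the fake `step` closure stands in for Erigon's `mergeLoopStep`. The sketch assumes the step reports whether it did any work and that the pause applies only after a step that did.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"strconv"
	"time"
)

// parseThrottle converts the raw value of ERIGON_MERGE_THROTTLE_MS into a
// duration. Empty, malformed, or negative input means "no throttle".
func parseThrottle(raw string) time.Duration {
	ms, err := strconv.Atoi(raw)
	if err != nil || ms < 0 {
		return 0
	}
	return time.Duration(ms) * time.Millisecond
}

// mergeLoop (hypothetical) runs step until it reports no work left, pausing
// for throttle after each step that did work. It returns how many steps
// did work.
func mergeLoop(ctx context.Context, step func() (bool, error), throttle time.Duration) (int, error) {
	done := 0
	for {
		didWork, err := step()
		if err != nil {
			return done, err
		}
		if !didWork {
			return done, nil
		}
		done++
		if throttle <= 0 {
			continue // default: the next step starts immediately
		}
		select {
		case <-ctx.Done(): // shut down promptly instead of finishing the sleep
			return done, ctx.Err()
		case <-time.After(throttle): // window for block execution
		}
	}
}

// runDemo drives mergeLoop with a fake step that has `pending` units of work.
func runDemo(pending int, throttle time.Duration) int {
	step := func() (bool, error) {
		if pending == 0 {
			return false, nil
		}
		pending--
		return true, nil
	}
	n, _ := mergeLoop(context.Background(), step, throttle)
	return n
}

func main() {
	throttle := parseThrottle(os.Getenv("ERIGON_MERGE_THROTTLE_MS"))
	fmt.Println("throttle:", throttle, "steps:", runDemo(3, throttle))
}
```

Waiting in a `select` rather than a bare `time.Sleep` is a design choice of this sketch: it lets a cancelled context interrupt the pause, so a throttle of several seconds does not delay shutdown.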

Production results

We have been running this patch on a 3-node production fleet since December 2025. Results:

  • Individual node availability during merge-heavy periods improved from ~90% to >99%
  • Block execution stalls reduced from 8-16 minutes to under 5 minutes
  • Nodes maintain chain tip proximity during merge activity
  • No meaningful impact on merge completion (merges still finish, just spread over a slightly longer window)
  • Fleet-wide availability (via load-balanced proxy) is near 99.99%, with the remaining downtime caused by synchronized stalls that this patch and AGGREGATION_DELAY_MS (PR #20391, "Reduce impact of synchronized aggregation across fleet nodes") address together

Recommended values based on our testing:

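| Use case              | Value (ms) | Effect                                             |
|-----------------------|------------|----------------------------------------------------|
| Default (no throttle) | 0          | Current behavior, no change                        |
| Light throttle        | 500        | Slight breathing room between merges               |
| Production RPC nodes  | 2000       | Good balance of merge progress and block execution |
| Heavy RPC workload    | 5000       | Prioritize block execution over merge speed        |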
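
To put these values in perspective, the total pause they add can be estimated as steps × throttle. The Go sketch below does that arithmetic for a run of 5 consecutive merge steps, the upper end of the 3-5 steps (1-5 minutes each) reported for chain-tip nodes in this thread. `addedDelay` is a hypothetical helper, and it assumes one pause after every step.

```go
package main

import (
	"fmt"
	"time"
)

// addedDelay returns the total pause a throttle adds to a run of
// consecutive merge steps, assuming one pause after every step.
func addedDelay(steps int, throttleMs int) time.Duration {
	return time.Duration(steps*throttleMs) * time.Millisecond
}

func main() {
	const steps = 5 // assumption: upper end of the 3-5 steps seen at chain tip
	for _, ms := range []int{0, 500, 2000, 5000} {
		fmt.Printf("throttle=%dms added=%v\n", ms, addedDelay(steps, ms))
	}
}
```

Even at 5000 ms the pauses add up to 25 seconds across the run, which is small next to merge steps that take minutes each.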

Notes

@AskAlexSharov

Collaborator

I'm not sure that

After (with ERIGON_MERGE_THROTTLE_MS=2000):
  mergeLoopStep()  → heavy I/O
  sleep(2s)        → block execution catches up
  mergeLoopStep()  → heavy I/O
  sleep(2s)        → block execution catches up

is better, because a single merge step can take 24 hours (on very large .kv files).

@lemenkov

Contributor Author

> I'm not sure that
>
> After (with ERIGON_MERGE_THROTTLE_MS=2000):
>   mergeLoopStep()  → heavy I/O
>   sleep(2s)        → block execution catches up
>   mergeLoopStep()  → heavy I/O
>   sleep(2s)        → block execution catches up
>
> is better, because a single merge step can take 24 hours (on very large .kv files).

Good point! The throttle wouldn't help during a single merge of large historical files (initial sync?). That's a different problem entirely (and NO_DEEP_MERGE_HISTORY=true is probably the right workaround for that).

What we're addressing is the chain-tip case: multiple smaller merge steps (e.g. step ranges 2048-2176, 2176-2192, 2192-2194 - real scenario btw) running back-to-back, each taking a few minutes. Without a pause between them, block execution is starved of I/O continuously across the full sequence. The customizable gap lets block execution process pending blocks between each step.

On our fleet at chain tip, individual merge steps take 1-5 minutes, not hours. The stalls come from running 3-5 of them consecutively with no break.

09:59:34  BuildFilesInBackground step=2193
10:02:09  aggregated step=2193 took=2m30s
10:02:09  MergeLoop throttle enabled delay_ms=2000
10:02:41  serial executed blk=24826479 gas/s=4.97M   ← catching up during merge
10:04:24  Execution DONE in=2m20s block=24826482
10:06:02  Execution DONE in=1m16s block=24826488
10:07:16  Execution DONE in=1m9s block=24826494

Without the throttle, those merge steps run immediately one after another and block execution stays at 1-5 Mgas/s for the entire duration. With the throttle, each 2-second gap allows a burst of block processing at closer to normal throughput.

If you'd like to verify this, here are a few easy approaches:

  1. Side-by-side comparison: if you have two nodes at chain tip, set ERIGON_MERGE_THROTTLE_MS=2000 on one and leave the other at default. Compare mgas/s and block age during the next merge cycle.

  2. Single node, restart at chain tip: let a node sync normally, then restart it with ERIGON_MERGE_THROTTLE_MS=2000. No re-sync needed, btw. The next MergeLoop cycle will use the throttle.

  3. Just check existing logs: look at any node at chain tip during MergeLoop, and if you see consecutive merge steps completing with sustained mgas/s drops between them, that's exactly where the 2-second gap helps.
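
For the log-based checks above, a small Go sketch can count how many execution log lines fall at or below a throughput limit. It assumes the abbreviated `gas/s=4.97M` field format quoted in this thread; real Erigon log lines may be formatted differently, and `parseGasPerSec` and `countSlow` are hypothetical helpers, not part of Erigon.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseGasPerSec extracts the gas/s field from a log line and returns it in
// Mgas/s. It assumes the format quoted in this thread ("gas/s=4.97M") and
// treats a value without the M suffix as plain gas/s. ok is false when the
// field is missing or malformed.
func parseGasPerSec(line string) (mgas float64, ok bool) {
	for _, field := range strings.Fields(line) {
		if !strings.HasPrefix(field, "gas/s=") {
			continue
		}
		raw := strings.TrimPrefix(field, "gas/s=")
		scale := 1e-6 // plain number: gas/s -> Mgas/s
		if strings.HasSuffix(raw, "M") {
			raw, scale = strings.TrimSuffix(raw, "M"), 1
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return 0, false
		}
		return v * scale, true
	}
	return 0, false
}

// countSlow reports how many lines carry a gas/s value (total) and how many
// of those are at or below limitMgas (slow).
func countSlow(lines []string, limitMgas float64) (total, slow int) {
	for _, line := range lines {
		v, ok := parseGasPerSec(line)
		if !ok {
			continue
		}
		total++
		if v <= limitMgas {
			slow++
		}
	}
	return total, slow
}

func main() {
	// Reads log lines from stdin.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	// 5 Mgas/s is the top of the 1-5 Mgas/s stall range described in this PR.
	total, slow := countSlow(lines, 5)
	fmt.Printf("%d of %d samples at or below 5 Mgas/s\n", slow, total)
}
```

Running it over the same merge window on a throttled and an unthrottled node gives a rough side-by-side number for the first approach.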

We're happy to provide more information; we're very open in this regard.

@AskAlexSharov

Collaborator

"individual merge steps take 1-5 minutes": in your case, is it acceptable to have 5 minutes at 1-5 Mgas/s?

@AskAlexSharov

Collaborator

NO_DEEP_MERGE_HISTORY=true is a good workaround, but it only applies to history files (.ef/.v), not to domain files (.kv).

The real source of high I/O during .kv merges is #14809, but I'm not sure when we will be able to release a full fix for that issue.

So I would just merge your PR, because it adds some flexibility. And if we have time, we can replace this env variable with some sort of randomized timing.
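
One way the randomized timing mentioned here might look is jitter around a base delay, so that nodes in a fleet drift apart instead of pausing in lockstep. The Go sketch below is purely illustrative: `jitter` is a hypothetical helper, and the ±50% spread is an arbitrary assumption, not something proposed in this PR.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitter maps frac in [0, 1) onto a delay in [base/2, 3*base/2), so the
// average delay stays at base while individual pauses vary.
func jitter(base time.Duration, frac float64) time.Duration {
	return base/2 + time.Duration(frac*float64(base))
}

func main() {
	base := 2000 * time.Millisecond
	for i := 0; i < 3; i++ {
		// rand.Float64 returns a value in [0.0, 1.0).
		fmt.Println(jitter(base, rand.Float64()))
	}
}
```

Passing the random fraction in as an argument keeps `jitter` deterministic and easy to test; the caller decides where the randomness comes from.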

The MergeLoop background goroutine performs continuous heavy disk I/O
when merging state files. This saturates disk (90%+ utilization),
blocking block execution which competes for the same storage.

Symptoms observed:
- Block drift increases during merge operations
- Synchronized stalls across nodes at similar block heights
- RPC timeouts and degraded service

Add ERIGON_MERGE_THROTTLE_MS environment variable to insert a pause
between merge operations. This reduces disk contention and allows
operators to desynchronize merge timing across a node fleet.

Usage:
  ERIGON_MERGE_THROTTLE_MS=500 ./erigon ...

Default behavior (no throttle) is preserved when unset.

Signed-off-by: Peter Lemenkov <lemenkov@gmail.com>
Assisted-by: Claude (Anthropic) <https://claude.ai>
@lemenkov

lemenkov commented Apr 11, 2026

Contributor Author

> "individual merge steps take 1-5 minutes": in your case, is it acceptable to have 5 minutes at 1-5 Mgas/s?

Yes. We run a proxy server in front of the nodes, and it handles the dispatching.

@AskAlexSharov AskAlexSharov enabled auto-merge April 11, 2026 03:30
@AskAlexSharov AskAlexSharov added this pull request to the merge queue Apr 11, 2026
Merged via the queue into erigontech:main with commit 302e128 Apr 11, 2026
33 checks passed