execution/execmodule: prune chaindata in CommitCycle (+ release roTx, bloatnet collation gate)#21192

Merged
AskAlexSharov merged 3 commits into performance from mdbx_bloat_prune_fix
May 15, 2026

@sudeepdino008

Member

Summary

Three fixes to keep chaindata size bounded during long FCU catch-ups on bloatnet:

  • Release roTx before CommitCycle flush. Lets MDBX reclaim retired pages while the brief RwTx is writing. Flush/ClearRam don't read through roTx, so closing it early is safe.
  • Prune chaindata in CommitCycle. The in-loop PruneExecutionStage runs on the SharedDomains' block overlay (MemoryMutation backed by an RO tx). MemoryMutation.PruneSmallBatches silently no-ops when its backing tx isn't a TemporalRwTx, so prune doesn't fire for the entire duration of a hasMore loop — aggregated steps accumulate in chaindata domain tables for hours. After commit, invoke runForkchoicePrune on a fresh real RW tx so prune actually drains.
  • Skip state-collation block-snapshot gate on bloatnet. Aggregator's readyForCollation capTxNum is gated on block snapshots catching up. On a paused chain that gate blocks state aggregation → blocks prune → CommitmentVals grows unboundedly. For mainnet/sepolia/etc the gate stays.
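The silent no-op described in the second fix can be illustrated with a minimal Go sketch. Every type here (`Tx`, `RwPruner`, `roTx`, `rwTx`, `overlay`) is a hypothetical stand-in invented for illustration, not Erigon's actual `MemoryMutation` or `TemporalRwTx`. It only shows the general pattern: a type assertion on the backing transaction fails, and the method returns without an error.

```go
package main

import "fmt"

// Tx is a hypothetical stand-in for a transaction handle.
type Tx interface{ ReadOnly() bool }

// RwPruner is a hypothetical stand-in for a writable transaction that can
// prune (the role TemporalRwTx plays in the description above).
type RwPruner interface {
	Tx
	Prune(limit int) int
}

// roTx is a read-only transaction: it has no Prune method.
type roTx struct{}

func (roTx) ReadOnly() bool { return true }

// rwTx is a writable transaction over a backlog of prunable entries.
type rwTx struct{ backlog *int }

func (rwTx) ReadOnly() bool { return false }

// Prune removes up to limit entries from the backlog and reports how many.
func (t rwTx) Prune(limit int) int {
	n := limit
	if *t.backlog < n {
		n = *t.backlog
	}
	*t.backlog -= n
	return n
}

// overlay mimics an in-memory mutation layered over a backing transaction.
type overlay struct{ backing Tx }

// PruneSmallBatches returns (0, nil) when the backing tx cannot write:
// no error and no work done, so the caller never notices.
func (o overlay) PruneSmallBatches(limit int) (int, error) {
	rw, ok := o.backing.(RwPruner)
	if !ok {
		return 0, nil
	}
	return rw.Prune(limit), nil
}

func main() {
	backlog := 5000

	// In-loop prune: the overlay sits on a read-only tx, so nothing drains.
	n, err := overlay{backing: roTx{}}.PruneSmallBatches(1000)
	fmt.Println(n, err, backlog) // 0 <nil> 5000

	// After commit: prune on a real writable tx actually drains the backlog.
	n = rwTx{backlog: &backlog}.Prune(1000)
	fmt.Println(n, backlog) // 1000 4000
}
```

Because the no-op path returns a nil error, only the growing backlog reveals the problem, which matches the hours-long accumulation described above.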

Observed on bloatnet

In a 100m batchSize catch-up:

  • chaindata grows ~2 GB per cycle, then drains ~9 GB at each step boundary
  • file size oscillates instead of growing unboundedly (~35 GB ceiling vs >100 GB without the fix)

Test plan

  • CI on performance passes
  • Mainnet sync still healthy (the only path-sensitive change is the CommitCycle prune call; same code as runForkchoicePrune used elsewhere)

Holding the old RO snapshot across the flush prevents MDBX from
reclaiming retired pages while the RwTx is writing. Flush/ClearRam
are write-only into commitRwTx and do not read through roTx, so
closing roTx earlier is safe.

The in-loop PruneExecutionStage runs on the SharedDomains' block
overlay (MemoryMutation backed by an RO tx).
MemoryMutation.PruneSmallBatches silently returns nil when its backing
tx is not a TemporalRwTx, so prune is a no-op for the whole duration
of one RunLoop — aggregated steps pile up in chaindata domain tables
for hours.

After commitRwTx.Commit, invoke runForkchoicePrune on a fresh real RW
tx so prune actually drains the backlog.

Aggregator.readyForCollation gates state aggregation on block
snapshots catching up. When the chain is paused (CL not sending
blocks) the gate blocks state aggregation → blocks PruneSmallBatches
→ CommitmentVals keeps growing.

On bloatnet the recovery guarantee from the gate isn't critical
(devnet/debug workload; SetFrozenBlocksProvider docstring notes
recovery may need 'erigon seg rm-state --latest'). Mainnet, sepolia,
etc keep the gate.
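The effect of the gate on a paused chain can be sketched as below. This is a simplified model under stated assumptions, not the real `Aggregator.readyForCollation` logic: `collationCap`, its parameters, and the "minimum of the two positions" rule are hypothetical names and behavior chosen only to show why state aggregation stalls when block snapshots lag and why skipping the gate lets it proceed.

```go
package main

import "fmt"

// collationCap is a hypothetical model of the cap on state collation.
// executedTxNum is how far execution has progressed; frozenBlocksTxNum is
// how far block snapshots have caught up. With the gate enabled, state may
// only be collated up to the smaller of the two.
func collationCap(executedTxNum, frozenBlocksTxNum uint64, skipBlockGate bool) uint64 {
	if skipBlockGate {
		return executedTxNum
	}
	if frozenBlocksTxNum < executedTxNum {
		return frozenBlocksTxNum
	}
	return executedTxNum
}

func main() {
	// Paused chain: block snapshots are stuck far behind executed state.
	executed, frozen := uint64(1_000_000), uint64(200_000)

	fmt.Println(collationCap(executed, frozen, false)) // 200000: aggregation stalls
	fmt.Println(collationCap(executed, frozen, true))  // 1000000: aggregation proceeds
}
```

In this model, everything between the two positions stays in chaindata until the cap moves, which is the unbounded growth the third commit addresses on bloatnet.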
sudeepdino008 force-pushed the mdbx_bloat_prune_fix branch from 4832202 to be37298 May 14, 2026 14:12
AskAlexSharov merged commit e574a75 into performance May 15, 2026
37 checks passed
AskAlexSharov deleted the mdbx_bloat_prune_fix branch May 15, 2026 00:59
sudeepdino008 added a commit that referenced this pull request May 15, 2026
## Summary

On heavy-state chains (bloatnet), `ChangeSets3` was the dominant
chaindata growth source post-catch-up — file grew unboundedly because
prune couldn't keep up with the per-block changeset write rate.

**Root cause:** the `pruneDiffsLimitOnChainTip = 1000` cap in
`PruneExecutionStage` (active when `initialCycle=false`). On bloatnet:
- per-block changeset entries: ~1000–1500 (each a ~5 KB serialized diff
chunk)
- per commit-cycle: ~40 blocks executed → ~40k–60k entries written
- per commit-cycle: ChangeSets3 prune drains at most 1000 (or until 2s
timeout) → drain rate is **roughly 2% of write rate**
- net: ChangeSets3 grows ~1–2 GB per minute under heavy load, pushing
chaindata file size up by tens of GB per hour
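The drain-versus-write ratio follows directly from the figures in the list above. The short Go sketch below only redoes that arithmetic; `drainPercent` is a hypothetical helper, and the inputs are the approximate numbers quoted here (1000-entry cap, ~40 blocks per cycle, ~1000–1500 entries per block).

```go
package main

import "fmt"

// drainPercent returns the share of the entries written in one commit
// cycle that a single capped prune call can remove, as a percentage.
func drainPercent(pruneCap, blocksPerCycle, entriesPerBlock int) float64 {
	written := blocksPerCycle * entriesPerBlock
	return 100 * float64(pruneCap) / float64(written)
}

func main() {
	for _, perBlock := range []int{1000, 1500} {
		fmt.Printf("%d entries/block: %.1f%%\n", perBlock, drainPercent(1000, 40, perBlock))
	}
	// Prints:
	// 1000 entries/block: 2.5%
	// 1500 entries/block: 1.7%
}
```

With the cap at 1000, prune removes only about 1.7–2.5% of what each cycle writes, so the table can only grow.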

Observed on a 12-hour bloatnet run: ChangeSets3 stayed at 0 B during
catch-up (`initialCycle=true` overrides the cap to `math.MaxInt`), then
ballooned from 0 → 40 GB in the ~3 hours after the chain caught up. File
size grew 38 GB → 181 GB over the same window, with ~80% of the new
space attributable to ChangeSets3 + write amplification from a too-small
reclaim pool.

## Changes

1. **execution/stagedsync: bump ChangeSets3 chain-tip prune limit 1000 →
200000.**
The 2s timeout still bounds wall time; the cap raise removes the
artificial ceiling on how many entries one call drains. With 200k cap × 2s
timeout, a single PruneExecutionStage invocation can drain up to ~1 GB
of changesets (200k entries × ~5 KB each), well above the per-cycle write rate.

2. **db/rawdb: PruneTable: fold logEvery + ctx + timeout into one
mod-1000 check.**
Per-iteration `select`-on-`logEvery.C` was a syscall on every row. Moved
into the same mod-stride as ctx-done + timeout, and bumped stride 100 →
1000. For 200k-row prunes this shaves the per-iter overhead noticeably
without affecting timeout responsiveness (1000 iters at ~microseconds
each = under 10 ms granularity).

## Notes

- Catch-up path (`initialCycle=true`) is unaffected — the override there
already uses `math.MaxInt` / 1h.
- Mainnet's per-block changeset rate is much lower than bloatnet's, so
the old 1000 cap was rarely binding. The new 200k cap is just as benign
there (the 2s timeout caps actual work).
- The bump pairs with the prune-in-CommitCycle change (#21192) — that
gave us a second prune call per FCU iteration, but both paths shared the
1000 cap. Doubling calls doesn't help if each is throttled.

## Test plan

- [ ] CI on `performance`
- [ ] Mainnet sync still healthy (cap raise + stride change are
non-functional w.r.t. correctness; only affect drain throughput)
domiwei pushed a commit to domiwei/erigon that referenced this pull request May 15, 2026
Cherry-pick of erigontech#21192, restricted to the
`execution/execmodule/forkchoice.go` changes (release roTx + prune in
CommitCycle).

The `node/eth/backend.go` bloatnet block-snapshot collation gate from
the original PR is intentionally omitted.

Co-authored-by: moskud <sudeep.kumar@erigon.tech>