execution/execmodule: prune chaindata in CommitCycle (+ release roTx, bloatnet collation gate) #21192
Merged
Holding the old RO snapshot across the flush prevents MDBX from reclaiming retired pages while the RwTx is writing. Flush/ClearRam are write-only into commitRwTx and do not read through roTx, so closing roTx earlier is safe.
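The reclaim argument above can be illustrated with a toy MVCC page model. This is only a sketch under stated assumptions: `pageStore`, `beginRo`, `closeRo`, `retire`, and `reclaimable` are hypothetical names, and the rule used here (a page retired by txn T stays pinned while any reader holds a snapshot older than T) is a simplification, not MDBX's actual garbage-collection code.

```go
package main

import "fmt"

// pageStore is a toy model of an MVCC page allocator (hypothetical, not MDBX).
type pageStore struct {
	readers map[int]uint64 // reader id -> snapshot txn id it is pinned to
	retired map[uint64]int // txn id that retired pages -> number of pages
	nextID  int
}

func newPageStore() *pageStore {
	return &pageStore{readers: map[int]uint64{}, retired: map[uint64]int{}}
}

// beginRo registers a reader pinned to the given snapshot and returns its id.
func (s *pageStore) beginRo(snapshot uint64) int {
	s.nextID++
	s.readers[s.nextID] = snapshot
	return s.nextID
}

// closeRo releases the reader, which unpins its snapshot.
func (s *pageStore) closeRo(id int) { delete(s.readers, id) }

// retire records that the writer with the given txn id retired some pages.
func (s *pageStore) retire(txn uint64, pages int) { s.retired[txn] += pages }

// reclaimable counts retired pages that no open reader could still need.
func (s *pageStore) reclaimable() int {
	total := 0
	for txn, pages := range s.retired {
		pinned := false
		for _, snap := range s.readers {
			if snap < txn { // reader predates the retiring txn
				pinned = true
				break
			}
		}
		if !pinned {
			total += pages
		}
	}
	return total
}

func main() {
	s := newPageStore()
	ro := s.beginRo(10) // old RO snapshot held across the flush
	s.retire(11, 500)   // the RwTx retires pages while writing
	s.retire(12, 300)
	fmt.Println(s.reclaimable()) // 0: everything is pinned by the reader

	s.closeRo(ro) // closing roTx earlier unpins the snapshot
	fmt.Println(s.reclaimable()) // 800
}
```

In this model the write path never consults the reader, which mirrors the claim that Flush/ClearRam do not read through roTx; the only effect of keeping the reader open is the pinned pages.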
The in-loop PruneExecutionStage runs on the SharedDomains' block overlay (MemoryMutation backed by an RO tx). MemoryMutation.PruneSmallBatches silently returns nil when its backing tx is not a TemporalRwTx, so prune is a no-op for the whole duration of one RunLoop — aggregated steps pile up in chaindata domain tables for hours. After commitRwTx.Commit, invoke runForkchoicePrune on a fresh real RW tx so prune actually drains the backlog.
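The silent no-op described above comes down to a type assertion that fails quietly. The following is a minimal sketch of that pattern, not the Erigon code: `Tx`, `RwTx`, `roTx`, `rwTx`, and `overlay` are hypothetical stand-ins for the real transaction and `MemoryMutation` types.

```go
package main

import "fmt"

// Tx is a hypothetical read-only transaction interface.
type Tx interface{ ReadOnly() bool }

// RwTx is a hypothetical writable transaction that can prune.
type RwTx interface {
	Tx
	Prune(limit int) int
}

type roTx struct{}

func (roTx) ReadOnly() bool { return true }

type rwTx struct{ backlog int }

func (*rwTx) ReadOnly() bool { return false }

// Prune removes up to limit entries from the backlog and reports how many.
func (t *rwTx) Prune(limit int) int {
	n := limit
	if t.backlog < n {
		n = t.backlog
	}
	t.backlog -= n
	return n
}

// overlay mimics an in-memory layer on top of a backing transaction.
type overlay struct{ backing Tx }

// PruneSmallBatches returns (0, nil) when the backing tx is not writable.
// The caller sees success even though nothing was pruned.
func (o overlay) PruneSmallBatches(limit int) (int, error) {
	rw, ok := o.backing.(RwTx)
	if !ok {
		return 0, nil // silent no-op
	}
	return rw.Prune(limit), nil
}

func main() {
	// In-loop prune on an overlay backed by an RO tx: nothing happens.
	n, err := overlay{backing: roTx{}}.PruneSmallBatches(100)
	fmt.Println(n, err) // 0 <nil>

	// Prune after commit on a real RW tx: the backlog drains.
	rw := &rwTx{backlog: 250}
	n, err = overlay{backing: rw}.PruneSmallBatches(100)
	fmt.Println(n, err, rw.backlog) // 100 <nil> 150
}
```

Because the no-op path returns a nil error, a caller cannot tell it apart from a prune with nothing to do, which is why running the prune again on a real RW tx after commit is what drains the backlog.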
Aggregator.readyForCollation gates state aggregation on block snapshots catching up. When the chain is paused (CL not sending blocks) the gate blocks state aggregation → blocks PruneSmallBatches → CommitmentVals keeps growing. On bloatnet the recovery guarantee from the gate isn't critical (devnet/debug workload; SetFrozenBlocksProvider docstring notes recovery may need 'erigon seg rm-state --latest'). Mainnet, sepolia, etc keep the gate.
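As a rough illustration of the gate and its bloatnet bypass, here is a sketch under assumptions: the `collationGate` struct, its fields, and the comparison used are hypothetical simplifications, not the signature or logic of the real `Aggregator.readyForCollation`.

```go
package main

import "fmt"

// collationGate is a hypothetical simplification of the collation gate.
type collationGate struct {
	frozenBlocksTxNum uint64 // highest txNum covered by block snapshots (assumed)
	skipBlockGate     bool   // assumed switch, set only for bloatnet
}

// readyForCollation reports whether a state step ending at stepEndTxNum
// may be aggregated.
func (g collationGate) readyForCollation(stepEndTxNum uint64) bool {
	if g.skipBlockGate {
		return true // bloatnet: do not wait for block snapshots
	}
	return stepEndTxNum <= g.frozenBlocksTxNum
}

func main() {
	// Paused chain: block snapshots stop advancing, so the gate stays shut,
	// aggregation stalls, and prune has nothing it is allowed to remove.
	paused := collationGate{frozenBlocksTxNum: 1_000}
	fmt.Println(paused.readyForCollation(5_000)) // false

	// Same situation with the gate bypassed.
	bloatnet := collationGate{frozenBlocksTxNum: 1_000, skipBlockGate: true}
	fmt.Println(bloatnet.readyForCollation(5_000)) // true
}
```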
4832202 to
be37298
JkLondon
approved these changes
May 14, 2026
AskAlexSharov
approved these changes
May 15, 2026
This was referenced May 15, 2026
sudeepdino008
added a commit
that referenced
this pull request
May 15, 2026
## Summary

On heavy-state chains (bloatnet), `ChangeSets3` was the dominant source of chaindata growth after catch-up: the file grew without bound because prune couldn't keep up with the per-block changeset write rate.

**Root cause:** the `pruneDiffsLimitOnChainTip = 1000` cap in `PruneExecutionStage` (active when `initialCycle=false`). On bloatnet:

- per-block changeset entries: ~1000–1500 (each ~5 KB serialized diff chunks)
- per commit-cycle: ~40 blocks executed → ~40k–60k entries written
- per commit-cycle: ChangeSets3 prune drains at most 1000 (or until 2s timeout) → drain rate is **roughly 1–2% of write rate**
- net: ChangeSets3 grows ~1–2 GB per minute under heavy load, pushing chaindata file size up by tens of GB per hour

Observed on a 12-hour bloatnet run: ChangeSets3 stayed at 0 B during catch-up (`initialCycle=true` overrides the cap to `math.MaxInt`), then ballooned from 0 → 40 GB in the ~3 hours after the chain caught up. File size grew 38 GB → 181 GB over the same window, with ~80% of the new space attributable to ChangeSets3 + write amplification from a too-small reclaim pool.

## Changes

1. **execution/stagedsync: bump ChangeSets3 chain-tip prune limit 1000 → 200000.** The 2s timeout still bounds wall time; raising the cap removes the artificial ceiling on how many entries one call drains. With 200k cap × 2s timeout, a single PruneExecutionStage invocation can drain up to ~1 GB of changesets, well above the per-cycle write rate.
2. **db/rawdb: PruneTable: fold logEvery + ctx + timeout into one mod-1000 check.** Per-iteration `select`-on-`logEvery.C` was a syscall on every row. Moved into the same mod-stride as ctx-done + timeout, and bumped stride 100 → 1000. For 200k-row prunes this shaves the per-iter overhead noticeably without affecting timeout responsiveness (1000 iters at ~microseconds each = under 10 ms granularity).

## Notes

- Catch-up path (`initialCycle=true`) is unaffected: the override there already uses `math.MaxInt` / 1h.
- Mainnet's per-block changeset rate is much lower than bloatnet's, so the old 1000 cap was rarely binding. The new 200k cap is just as benign there (the 2s timeout caps actual work).
- The bump pairs with the prune-in-CommitCycle change (#21192). That change gave us a second prune call per FCU iteration, but both paths shared the 1000 cap. Doubling calls doesn't help if each is throttled.

## Test plan

- [ ] CI on `performance`
- [ ] Mainnet sync still healthy (cap raise + stride change are non-functional w.r.t. correctness; only affect drain throughput)
domiwei
pushed a commit
to domiwei/erigon
that referenced
this pull request
May 15, 2026
Cherry-pick of erigontech#21192, restricted to the `execution/execmodule/forkchoice.go` changes (release roTx + prune in CommitCycle). The `node/eth/backend.go` bloatnet block-snapshot collation gate from the original PR is intentionally omitted. Co-authored-by: moskud <sudeep.kumar@erigon.tech>
Summary
Three fixes to keep chaindata size bounded during long FCU catch-ups on bloatnet:

`MemoryMutation.PruneSmallBatches` silently no-ops when its backing tx isn't a `TemporalRwTx`, so prune doesn't fire for the entire duration of a hasMore loop, and aggregated steps accumulate in chaindata domain tables for hours. After commit, invoke `runForkchoicePrune` on a fresh real RW tx so prune actually drains.

`readyForCollation` capTxNum is gated on block snapshots catching up. On a paused chain that gate blocks state aggregation → blocks prune → CommitmentVals grows unboundedly. For mainnet/sepolia/etc the gate stays.

Observed on bloatnet
In a 100m batchSize catch-up:
Test plan
`performance` passes
`runForkchoicePrune` used elsewhere)