execution/execmodule: apply runloop refactor with furious-prune fix #21245
Merged
e89da5a to
9218b51
Compare
sudeepdino008
added a commit
that referenced
this pull request
May 18, 2026
Applies the intent of PR #21245 onto rev_up_check2 (off performance branch, so a direct cherry-pick wasn't clean).

RunLoop refactor:
- New PruneFn callback alongside CommitCycleFn
- CommitCycleFn now takes hasMore so impl can skip BeginTemporalRw on the final iter (PFB no longer needs the post-loop flush+commit)
- Dropped BeforeIteration, PruneTimeout, FirstCycle from RunLoopConfig
- RunLoop always invokes CommitCycle; caller returns (nil,nil) to skip

ProcessFrozenBlocks: PruneFn wraps pe.sync.RunPrune; CommitCycle kicks agg.BuildFilesInBackground after each commit so seg-build progresses alongside PFB.

updateForkChoice:
- initialCycle = !isSynced (was limitedBigJump): prune budget tracks actual sync state, not just LoopBlockLimit chunking
- Tip case (!initialCycle): PruneFn and CommitCycle both no-op; the post-RunLoop runForkchoiceFlushCommit + runForkchoicePrune handle the single block
- Catchup case: PruneFn forces initialCycle=false to runForkchoicePrune so it always uses furious budget regardless of in-loop initialCycle

Skipped from upstream PR: aggregator.go's MaxCollationTxNum getter + its optional cap in BuildFilesInBackground. The underlying SetMaxCollationTxNum / maxCollationTxNum atomic field aren't on performance, so BuildFilesInBackground is called uncapped. Functionally neutral; just slightly less collation throttle control.
3b4c52e to
4d9b7e7
Compare
4d9b7e7 to
8f94c9f
Compare
sudeepdino008
added a commit
that referenced
this pull request
May 19, 2026
4 tasks
taratorio
approved these changes
May 19, 2026
taratorio
reviewed
May 19, 2026
…p threshold to 16
…ts3 unwind window
taratorio
reviewed
May 19, 2026
Contributor
Pull request overview
This PR refactors PipelineExecutor.RunLoop to separate pruning from the commit/refresh cycle, and adjusts forkchoice execution to better distinguish “at tip” vs “catchup burst” behavior (to avoid prune-budget bloat during short bursts). It also updates ProcessFrozenBlocks to use the new callback-based RunLoop and to trigger state-file building inline, plus adds an Aggregator getter to support consistent collation caps.
Changes:
- Split RunLoop’s “prune + commit” logic into `PruneFn` and `CommitCycle`, and update `ProcessFrozenBlocks`/forkchoice to use the new structure.
- Change forkchoice’s `initialCycle` computation to trigger catchup behavior for moderate bursts (threshold = 16 blocks) instead of relying on large loop limits.
- Add `Aggregator.MaxCollationTxNum()` so callers can cap background file building consistently with collation/prune logic.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| execution/execmodule/forkchoice.go | Updates forkchoice RunLoop to use new PruneFn + CommitCycle split and changes initialCycle detection logic. |
| execution/execmodule/executor.go | Refactors RunLoop API to callback-based prune/commit, updates ProcessFrozenBlocks accordingly, and adjusts tx rollback handling. |
| db/state/aggregator.go | Adds MaxCollationTxNum() getter for collation cap visibility. |
…dle short chains too
taratorio
approved these changes
May 19, 2026
AskAlexSharov
pushed a commit
that referenced
this pull request
May 20, 2026
…21270)

Manual reapply of #21245 (runloop refactor: split prune from commit-cycle) onto the `performance` branch, plus follow-up fixes for the post-frozen-blocks prune-budget regression observed on bloatnet.

## Changes

- **Split `CommitCycleFn` from `PruneFn`** in `RunLoopConfig`; drop `BeforeIteration`/`PruneTimeout`/`FirstCycle`.
- **`ProcessFrozenBlocks`**: kicks `agg.BuildFilesInBackground` from the new `CommitCycle` callback so snapshot files advance during PFB (previously they only progressed after normal sync resumed). On the last iter (`!hasMore`) returns `(nil, nil)` to skip a wasted `BeginTemporalRw`. Outer `defer tx.Rollback()` is closure-form so it follows the closure's `tx` reassignments across iterations.
- **`updateForkChoice`**: `PruneFn` is a no-op at tip (post-RunLoop path handles flush+commit+prune); in catchup it drains pipeline prune via `runForkchoicePrune(true)`, with `initialCycle=true` so `PruneExecutionStage` gets the catchup budget instead of the 2 s slot budget that can't keep up with 40-block bursts.
- **`initialCycle` predicate**: block-count delta against `finishProgressBefore`:

  ```go
  const smallBlockJumpThreshold = 16
  headNum := fcuHeader.Number.Uint64()
  initialCycle := headNum > finishProgressBefore &&
      headNum-finishProgressBefore > smallBlockJumpThreshold
  ```

  The `headNum >` guard avoids `uint64` underflow when an FCU goes back (e.g. ePBS on Glamsterdam). Threshold 16 engages catchup mode for bloatnet's ~40-block bursts and stays out of the way for steady-state 1-block tip FCUs and test fixtures.
- **`runForkchoicePrune`**: short-chain skip-gate removed (`maxTxNum < (stepSize*5)/4`). The gate left `ChangeSets3` un-pruned on disk for short chains, which broke `MaxReorgDepth` enforcement in tests (`TestFcuReturnsReorgTooDeepCode38006` on the upstream `main` branch). Skip removal is also consistent with the `exec3/storage-component` direction (`c380b438e7`).
- **FCU `CommitCycle` safety**: `defer commitRwTx.Rollback()` immediately after `BeginTemporalRw` so the Commit-error path doesn't leak the RW tx. Rollback after a successful Commit is idempotent (per Copilot review on #21245).

## Why the predicate change matters: bloatnet

bloatnet stabilises at ~38 GB on tip only when `initialCycle=false` is held until execution actually reaches tip:

- A 40-block batch (what `--batchSize=100mb` settles on) writes ~1.4 GB into MDBX.
- At-tip prune budget (`SecondsPerSlot/3` ≈ 2 s) removes less than one such batch.
- Flipping too early, while bursts are still arriving, lets writes outrun prune and MDBX runs away.

The earlier `!isSynced` and `wall-clock head-age` variants either flipped too eagerly (Caplin keeps headers/finish aligned during bursts) or never (test fixtures use ancient timestamps, regressing several rpc/jsonrpc + engineapi tests). Block-count delta with threshold 16 is a clean proxy and works for tests, mainnet tip, and bloatnet bursts.

| Cycle | Agg | Prune | initialCycle | db_size after |
|---|---|---|---|---|
| 1 | 2m12s | 39.3s | true | 36.29 GB |
| 2 | 2m23s | 34.6s | true | 36.57 GB |
| 3 | 2m24s | 38.6s | true | 36.76 GB |

Plateau holds across cycles vs. the prior failure mode where the file extended +6 GB/cycle once the predicate flipped early.

## Remaining differences vs #21245 (intentional)

- `Aggregator.MaxCollationTxNum()` getter and the `BuildFilesInBackground` cap pattern from #21245 are NOT ported here: the underlying `maxCollationTxNum` field doesn't exist on the `performance` branch lineage. Followup, requires also adding the field on this branch.
- `runForkchoicePrune` body uses `e.db.UpdateTemporal(...)` directly (matches this branch's storage design, where `agg.CollateAndPruneIfNeeded` ownership is being moved out of the FCU path on the `exec3/storage-component` track), whereas #21245 still calls `CollateAndPruneIfNeeded` via this function.
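The closure-form `defer tx.Rollback()` mentioned for `ProcessFrozenBlocks` relies on standard Go semantics: a plain `defer tx.Rollback()` binds the receiver when the `defer` statement runs, while a deferred closure reads `tx` when the function exits. The sketch below illustrates only that language behaviour; `fakeTx`, `directDefer` and `closureDefer` are hypothetical names, not the executor code, and the nil check is an assumption about what a loop that can end without a live tx would need.

```go
package main

import "fmt"

// fakeTx is a hypothetical stand-in for a database transaction.
type fakeTx struct {
	id         int
	rolledBack bool
}

// Rollback records that this particular transaction was rolled back.
func (t *fakeTx) Rollback() { t.rolledBack = true }

// directDefer uses `defer tx.Rollback()`. The receiver is evaluated when
// the defer statement executes, so the later reassignment of tx is ignored.
func directDefer() (first, last *fakeTx) {
	tx := &fakeTx{id: 1}
	first = tx
	defer tx.Rollback()
	tx = &fakeTx{id: 2} // simulates a loop iteration re-opening the tx
	last = tx
	return first, last
}

// closureDefer wraps the call in a closure. The closure reads tx when the
// function exits, so it rolls back whichever tx is current at that point.
func closureDefer() (first, last *fakeTx) {
	tx := &fakeTx{id: 1}
	first = tx
	defer func() {
		if tx != nil { // assumed guard: the loop may finish without a live tx
			tx.Rollback()
		}
	}()
	tx = &fakeTx{id: 2}
	last = tx
	return first, last
}

func main() {
	f, l := directDefer()
	fmt.Printf("direct:  tx%d rolledBack=%v, tx%d rolledBack=%v\n", f.id, f.rolledBack, l.id, l.rolledBack)
	f, l = closureDefer()
	fmt.Printf("closure: tx%d rolledBack=%v, tx%d rolledBack=%v\n", f.id, f.rolledBack, l.id, l.rolledBack)
}
```

With the direct form only the first transaction is rolled back. With the closure form the rollback reaches the transaction that is current when the function returns, which is the one that would otherwise leak.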
## Test plan

- [ ] CI green on rerun (race-tests / tests-mac-linux all OSes; sonar).
- [x] Bloatnet → chain-tip; db_size stays bounded.
- [ ] Once at tip, confirm block-count delta predicate flips `initialCycle=false` for steady-state 1-block FCUs.
- [ ] Sanity-check mainnet behaviour (FCU bursts ≤16 blocks at tip).
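The predicate cases in the test plan can be checked in isolation. The sketch below wraps the quoted predicate in a hypothetical helper, `isInitialCycle`, and adds a deliberately unguarded variant, `unguardedInitialCycle`, to show why the `headNum >` guard matters. Both helper names and the sample block numbers are illustrative assumptions; only the constant and the expression come from the description above.

```go
package main

import "fmt"

// smallBlockJumpThreshold mirrors the constant quoted in the PR description.
const smallBlockJumpThreshold = 16

// isInitialCycle is a hypothetical wrapper around the quoted predicate.
func isInitialCycle(headNum, finishProgressBefore uint64) bool {
	return headNum > finishProgressBefore &&
		headNum-finishProgressBefore > smallBlockJumpThreshold
}

// unguardedInitialCycle drops the headNum > guard on purpose, to show the
// unsigned wraparound that the guard prevents.
func unguardedInitialCycle(headNum, finishProgressBefore uint64) bool {
	return headNum-finishProgressBefore > smallBlockJumpThreshold
}

func main() {
	// Illustrative block numbers only.
	cases := []struct {
		name         string
		head, finish uint64
	}{
		{"steady-state 1-block FCU", 101, 100},
		{"burst of exactly 16 blocks", 116, 100},
		{"40-block burst", 140, 100},
		{"FCU goes back by one block", 99, 100},
	}
	for _, c := range cases {
		fmt.Printf("%-28s guarded=%v unguarded=%v\n",
			c.name, isInitialCycle(c.head, c.finish), unguardedInitialCycle(c.head, c.finish))
	}
}
```

The comparison is strict, so a burst of exactly 16 blocks stays in at-tip mode. For the backwards FCU, `99 - 100` wraps around to a very large `uint64`, so the unguarded variant wrongly reports catchup mode while the guarded one does not.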
## Delayed flip of `initialCycle`

Before: `initialCycle := limitedBigJump`, gated on `LoopBlockLimit` (default 5000). Small catchups and steady-state at-tip both produced `initialCycle=false`, so the system couldn't distinguish them. That's the bloat trigger on chains where block batch exec can still write more than a 2s-prune-per-block can clear.

Now: `initialCycle := headNum > finishProgressBefore && headNum - finishProgressBefore > 16` (the `headNum >` guard avoids `uint64` underflow when FCU goes back, e.g. ePBS on Glamsterdam). Block-count threshold of 16 trips on bloatnet's ~40-block bursts (with margin for heavier blocks) and stays out of the way for steady-state 1-block tip FCUs, small organic catchups, and 1-block backwards FCUs.

A `TODO` in the code marks the eventual rename `initialCycle` → `atTip` (inverted polarity) across stage/prune APIs.

## Bloatnet: why this matters

bloatnet stabilises at ~38 GB on tip only when `initialCycle=false` is held until execution actually reaches tip:

- A 40-block batch (what `--batchSize=100mb` settles on) writes ~1.4 GB into MDBX.
- At-tip prune budget (`SecondsPerSlot/3` ≈ 2 s) removes less than one such batch.

The previous threshold (`LoopBlockLimit=5000`) never tripped on bloatnet's 40-block bursts. The new threshold of 16 engages the catchup prune budget for any burst the prune budget can't clear in one slot.

## `ProcessFrozenBlocks`: callback-driven, with inline file building

- Uses the new `RunLoopConfig` (`PruneFn` + `CommitCycle`).
- `PruneFn`: runs `pe.sync.RunPrune` on the loop tx.
- `CommitCycle`: flush + `ClearRam` + commit, then kicks `agg.BuildFilesInBackground`. Behaviour change: PFB previously didn't trigger file building inline, so files only advanced after normal sync resumed. Cap follows `CollateAndPrune` (via new `Aggregator.MaxCollationTxNum()` getter). On the last iter (`!hasMore`) returns `(nil, nil)` to skip a wasted `BeginTemporalRw`.
- `defer tx.Rollback()` converted to closure form to track `tx` reassignments across iterations (fixes a latent leak on ShouldBreak / mid-loop-error paths).

## `updateForkChoice` RunLoop: prune ↔ commit split

- Splits `CommitCycle` into `PruneFn` + `CommitCycle`. The split makes the prune tx and the flush tx independently owned, opening the door for a future "same RwTx for prune + flush" optimisation (one commit per cycle instead of two). Not taken in this PR: we keep `CollateAndPruneIfNeeded` (which owns its own RW tx); the structure is just ready for it.
- `PruneFn`: closes `roTx`, then runs `runForkchoicePrune` with `initialCycle=true` hardcoded so the catchup prune budget runs against the in-flight bursts. Post-RunLoop prune still uses the real `initialCycle`.
- `CommitCycle`: opens `commitRwTx`, flushes + `ClearRam`, commits, re-opens `roTx` + overlay.
- Both callbacks no-op when `!initialCycle` (at tip → post-RunLoop path handles flush+commit+prune). In catchup, every iter, last one included, runs through both callbacks.

## Plumbing

- `RunLoopConfig.PruneFn` replaces the in-loop `pe.sync.RunPrune` call.
- Drops `PruneTimeout` and `BeforeIteration` (both unreferenced after the split).
- `db/state`: adds `Aggregator.MaxCollationTxNum()` getter so callers can apply the same cap pattern as `CollateAndPrune`.
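To make the callback split concrete, here is a minimal, self-contained sketch of a loop driven by a `PruneFn` and a `CommitCycle` callback that receives `hasMore`. Only those two names and the `(nil, nil)` convention come from this PR. The `Tx` type, the callback signatures, the prune-then-commit ordering and the rule that a nil return keeps the current tx are assumptions made for illustration, not the actual `PipelineExecutor.RunLoop`.

```go
package main

import "fmt"

// Tx is a hypothetical placeholder for a read-write transaction.
type Tx struct{ id int }

// RunLoopConfig sketches the callback pair; the signatures are assumed.
type RunLoopConfig struct {
	PruneFn     func(tx *Tx) error
	CommitCycle func(tx *Tx, hasMore bool) (*Tx, error)
}

// runLoop calls both callbacks on every iteration. A nil tx returned by
// CommitCycle means "no replacement tx was opened" (assumed convention).
func runLoop(cfg RunLoopConfig, tx *Tx, iterations int) error {
	for i := 0; i < iterations; i++ {
		hasMore := i < iterations-1
		// Block execution for this iteration would happen here (elided).
		if err := cfg.PruneFn(tx); err != nil {
			return err
		}
		next, err := cfg.CommitCycle(tx, hasMore)
		if err != nil {
			return err
		}
		if next != nil {
			tx = next
		}
	}
	return nil
}

// simulate wires counting callbacks into runLoop and reports how often
// each step ran and how many replacement transactions were opened.
func simulate(iterations int) (pruneCalls, commits, txOpened int, err error) {
	cfg := RunLoopConfig{
		PruneFn: func(tx *Tx) error {
			pruneCalls++
			return nil
		},
		CommitCycle: func(tx *Tx, hasMore bool) (*Tx, error) {
			commits++ // flush + commit of the current tx would go here
			if !hasMore {
				return nil, nil // final iteration: skip opening a new tx
			}
			txOpened++
			return &Tx{id: tx.id + 1}, nil
		},
	}
	err = runLoop(cfg, &Tx{id: 1}, iterations)
	return pruneCalls, commits, txOpened, err
}

func main() {
	p, c, o, err := simulate(3)
	fmt.Println("prunes:", p, "commits:", c, "new txs:", o, "err:", err)
}
```

For three iterations the sketch prunes and commits three times but opens only two replacement transactions, which is the saving that the `!hasMore` return path is meant to provide.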