
execution/execmodule: apply runloop refactor with furious-prune fix #21270

Merged
AskAlexSharov merged 3 commits into performance from rev_up_check2
May 20, 2026

@sudeepdino008 sudeepdino008 commented May 19, 2026

Manual reapply of #21245 (runloop refactor: split prune from commit-cycle) onto the performance branch, plus follow-up fixes for the post-frozen-blocks prune-budget regression observed on bloatnet.

Changes

  • Split CommitCycleFn from PruneFn in RunLoopConfig; drop BeforeIteration/PruneTimeout/FirstCycle.
  • ProcessFrozenBlocks (PFB): the new CommitCycle callback kicks agg.BuildFilesInBackground so snapshot files advance during PFB (previously they progressed only after normal sync resumed). On the last iteration (!hasMore) it returns (nil, nil) to skip a wasted BeginTemporalRw. The outer defer tx.Rollback() is in closure form so it follows the closure's tx reassignments across iterations.
  • updateForkChoice: PruneFn is a no-op at tip (the post-RunLoop path handles flush+commit+prune); in catchup it drains pipeline prune via runForkchoicePrune(true), i.e. with initialCycle=true, so PruneExecutionStage gets the catchup budget instead of the 2 s slot budget, which can't keep up with 40-block bursts.
  • initialCycle predicate — block-count delta against finishProgressBefore:
    const smallBlockJumpThreshold = 16
    headNum := fcuHeader.Number.Uint64()
    initialCycle := headNum > finishProgressBefore && headNum-finishProgressBefore > smallBlockJumpThreshold
    The headNum > finishProgressBefore guard avoids uint64 underflow when an FCU moves the head backwards (e.g. ePBS on Glamsterdam). Threshold 16 engages catchup mode for bloatnet's ~40-block bursts and stays out of the way for steady-state 1-block tip FCUs and test fixtures.
  • runForkchoicePrune: the short-chain skip-gate (maxTxNum < (stepSize*5)/4) is removed. The gate left ChangeSets3 un-pruned on disk for short chains, which broke MaxReorgDepth enforcement in tests (TestFcuReturnsReorgTooDeepCode38006 on the upstream main branch). Removing the skip is also consistent with the exec3/storage-component direction (c380b438e7).
  • FCU CommitCycle safety: defer commitRwTx.Rollback() immediately after BeginTemporalRw so the Commit-error path doesn't leak the RW tx. Rollback after a successful Commit is idempotent (per Copilot review on execution/execmodule: apply runloop refactor with furious-prune fix #21245).
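
A minimal, runnable sketch of the predicate above, useful for checking its boundary cases. The helper name isInitialCycle and the sample block numbers are assumptions for illustration; the PR computes the value inline in updateForkChoice.

```go
package main

import "fmt"

// smallBlockJumpThreshold mirrors the constant quoted in the PR description.
const smallBlockJumpThreshold = 16

// isInitialCycle is a hypothetical helper name wrapping the inline predicate.
// The first comparison guards the subtraction: without it, a head behind
// finishProgressBefore would wrap around to a huge uint64 delta.
func isInitialCycle(headNum, finishProgressBefore uint64) bool {
	return headNum > finishProgressBefore &&
		headNum-finishProgressBefore > smallBlockJumpThreshold
}

func main() {
	const finish = uint64(1000) // illustrative progress value
	for _, head := range []uint64{990, 1000, 1001, 1016, 1017, 1040} {
		fmt.Printf("head=%d finish=%d initialCycle=%v\n",
			head, finish, isInitialCycle(head, finish))
	}
}
```

Because the comparison is strict, a jump of exactly 16 blocks still counts as tip; catchup mode starts at 17.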

Why the predicate change matters — bloatnet

bloatnet stabilises at ~38 GB on tip only when the flip to initialCycle=false is held back until execution actually reaches tip:

  • A 40-block batch (what --batchSize=100mb settles on) writes ~1.4 GB into MDBX (lots of commitment trie changes).
  • At-tip prune budget (SecondsPerSlot/3 ≈ 2 s) removes less than one such batch.
  • Flipping too early — while bursts are still arriving — lets writes outrun prune and MDBX runs away.

The earlier !isSynced and wall-clock head-age variants either flipped too eagerly (Caplin keeps headers/finish aligned during bursts) or never (test fixtures use ancient timestamps, regressing several rpc/jsonrpc + engineapi tests). Block-count delta with threshold 16 is a clean proxy and works for tests, mainnet tip, and bloatnet bursts.

| Cycle | Agg | Prune | initialCycle | db_size after |
|---|---|---|---|---|
| 1 | 2m12s | 39.3s | true | 36.29 GB |
| 2 | 2m23s | 34.6s | true | 36.57 GB |
| 3 | 2m24s | 38.6s | true | 36.76 GB |

The plateau holds across cycles, unlike the prior failure mode where the MDBX file grew by 6 GB per cycle once the predicate flipped early.

Remaining differences vs #21245 (intentional)

Test plan

  • CI green on rerun (race-tests / tests-mac-linux all OSes; sonar).
  • Bloatnet → chain-tip; db_size stays bounded.
  • Once at tip, confirm block-count delta predicate flips initialCycle=false for steady-state 1-block FCUs.
  • Sanity-check mainnet behaviour (FCU bursts ≤16 blocks at tip).

Applies the intent of PR #21245 onto rev_up_check2 (off performance
branch, so a direct cherry-pick wasn't clean).

RunLoop refactor:
- New PruneFn callback alongside CommitCycleFn
- CommitCycleFn now takes hasMore so impl can skip BeginTemporalRw on
  the final iter (PFB no longer needs the post-loop flush+commit)
- Dropped BeforeIteration, PruneTimeout, FirstCycle from RunLoopConfig
- RunLoop always invokes CommitCycle; caller returns (nil,nil) to skip
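
A rough sketch of the callback contract described in this list, under simplifying assumptions: fakeTx, runLoop, processFrozenBlocks and the failAt knob are hypothetical stand-ins, not the real erigon types or signatures. It shows the loop always invoking CommitCycle, the callback returning (nil, nil) on the final iteration, and a closure-form deferred rollback that follows tx reassignments.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeTx is a hypothetical stand-in for a temporal RW transaction.
type fakeTx struct {
	id         int
	committed  bool
	rolledBack bool
}

func (t *fakeTx) Commit() error { t.committed = true; return nil }

// Rollback does nothing after a successful Commit.
func (t *fakeTx) Rollback() {
	if !t.committed {
		t.rolledBack = true
	}
}

// commitCycleFn returns the tx for the next iteration, or (nil, nil) to skip.
type commitCycleFn func(hasMore bool) (*fakeTx, error)

// runLoop always invokes commitCycle; skipping is the callback's decision.
func runLoop(iters int, commitCycle commitCycleFn) error {
	for i := 0; i < iters; i++ {
		if _, err := commitCycle(i < iters-1); err != nil {
			return err
		}
	}
	return nil
}

// processFrozenBlocks returns every tx it opened so callers can inspect them.
// failAt injects a failure at that cycle index (-1 disables it).
func processFrozenBlocks(iters, failAt int) ([]*fakeTx, error) {
	var all []*fakeTx
	begin := func() *fakeTx {
		t := &fakeTx{id: len(all) + 1}
		all = append(all, t)
		return t
	}

	tx := begin()
	// Closure form: tx is read when the function exits, so the rollback hits
	// whichever tx the callback assigned last. A plain `defer tx.Rollback()`
	// would bind the first tx only.
	defer func() {
		if tx != nil {
			tx.Rollback()
		}
	}()

	cycle := 0
	err := runLoop(iters, func(hasMore bool) (*fakeTx, error) {
		defer func() { cycle++ }()
		if cycle == failAt {
			return nil, errors.New("simulated failure before commit")
		}
		if err := tx.Commit(); err != nil {
			return nil, err
		}
		if !hasMore {
			tx = nil // final iteration: no new tx is opened
			return nil, nil
		}
		tx = begin()
		return tx, nil
	})
	return all, err
}

func main() {
	for _, failAt := range []int{-1, 1} {
		txs, err := processFrozenBlocks(3, failAt)
		fmt.Printf("failAt=%d err=%v opened=%d\n", failAt, err, len(txs))
		for _, t := range txs {
			fmt.Printf("  tx%d committed=%v rolledBack=%v\n", t.id, t.committed, t.rolledBack)
		}
	}
}
```

In this sketch three iterations open three transactions rather than four, and when the second cycle fails it is the second transaction that gets rolled back, not the already committed first one.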

ProcessFrozenBlocks: PruneFn wraps pe.sync.RunPrune; CommitCycle kicks
agg.BuildFilesInBackground after each commit so seg-build progresses
alongside PFB.

updateForkChoice:
- initialCycle = !isSynced (was limitedBigJump) — prune budget tracks
  actual sync state, not just LoopBlockLimit chunking
- Tip case (!initialCycle): PruneFn and CommitCycle both no-op; the
  post-RunLoop runForkchoiceFlushCommit + runForkchoicePrune handle
  the single block
- Catchup case: PruneFn forces initialCycle=false to runForkchoicePrune
  so it always uses furious budget regardless of in-loop initialCycle
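
The tip/catchup split for PruneFn can be pictured with the small sketch below. selectPruneFn and the pruneFn type are hypothetical names; drain stands in for a closure that calls runForkchoicePrune with whatever fixed argument the caller chooses.

```go
package main

import "fmt"

// pruneFn is an assumed shape for the prune callback.
type pruneFn func() error

// selectPruneFn is a hypothetical helper. At tip it returns a no-op, because
// the post-RunLoop flush+commit+prune path handles the single block. In
// catchup it returns drain unchanged.
func selectPruneFn(initialCycle bool, drain pruneFn) pruneFn {
	if !initialCycle {
		return func() error { return nil }
	}
	return drain
}

func main() {
	calls := 0
	drain := func() error { calls++; return nil }

	for _, initialCycle := range []bool{false, true} {
		if err := selectPruneFn(initialCycle, drain)(); err != nil {
			fmt.Println("prune failed:", err)
			return
		}
		fmt.Printf("initialCycle=%v drain calls so far=%d\n", initialCycle, calls)
	}
}
```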

Skipped from upstream PR: aggregator.go's MaxCollationTxNum getter +
its optional cap in BuildFilesInBackground — the underlying
SetMaxCollationTxNum / maxCollationTxNum atomic field aren't on
performance, so BuildFilesInBackground is called uncapped. Functionally
neutral; just slightly less collation throttle control.
@sudeepdino008 sudeepdino008 marked this pull request as ready for review May 20, 2026 05:21
@AskAlexSharov AskAlexSharov enabled auto-merge (squash) May 20, 2026 05:24
@AskAlexSharov AskAlexSharov merged commit b1528ef into performance May 20, 2026
64 checks passed
@AskAlexSharov AskAlexSharov deleted the rev_up_check2 branch May 20, 2026 05:57