step accumulation during snapshot-to-chaintip catchup #20911

@sudeepdino008

Problem

During snapshot-to-chaintip catchup on mainnet, executed state accumulates in the DB faster than it can be collated into state files.

  1. Ethereum mainnet blocks now carry ~400 txs each, so a 5000-block cycle is ~2M txs. The step size is currently 390625, so one cycle is roughly 5 steps.
  2. The block-snapshots cap on state collation (enforce block-snapshots cap inside aggregator collation, #20852 / #20900) further limits how far state collation can advance.

As a result, ~5-7 steps are at times held in the DB, which defeats the point of the step size reduction.
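
The density estimate above is plain arithmetic and can be checked with a short sketch. The constants are the rough figures quoted in this issue; the function name is only illustrative and is not part of Erigon.

```python
STEP_SIZE = 390_625        # txs per step, as quoted in this issue
TXS_PER_BLOCK = 400        # rough mainnet average, as quoted in this issue
BLOCKS_PER_CYCLE = 5_000   # cycle width used in this issue


def steps_per_cycle(blocks=BLOCKS_PER_CYCLE,
                    txs_per_block=TXS_PER_BLOCK,
                    step_size=STEP_SIZE):
    """Approximate number of steps one sync cycle spans (illustrative helper)."""
    return blocks * txs_per_block / step_size


if __name__ == "__main__":
    print(f"{steps_per_cycle():.2f} steps per cycle")
```

With real blocks running somewhat denser than 400 txs, the same formula lands in the ~5-7 step range reported here.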

Symptoms

Mainnet archive node, ~block 24.98M, step_size=390625, steps_in_frozen_file=256. Two consecutive cycles:

Cycle ending at 24,978,999 → next cycle 24,979,000 → 24,983,999 (5000 blocks, block-limit hit):

INFO [4/6 Execution] DONE                            in=1m40.618s block=24976697
...
INFO [BlockCollector] Inserting blocks               from=24984000 to=24984999
INFO [sync] limited big jump                         from=24978999 to=24983999 amount=5000 padding=2
INFO [3/6 Senders] Started                           from=24978999 to=24983999
INFO [3/6 Senders] prune done                        in=33.703s
INFO Timings: Background Prune                       prune=34s initialCycle=true alloc=27.5GB sys=29.2GB
INFO BuildFilesInBackground                          step=8903 lastInDB=8905
INFO [snapshots] holding state collation at block snapshot boundary
                                                     step=8903
                                                     stepEndTxNum="3478125000 (step 8904)"
                                                     blockSnapshotsTxNum="3477899736 (step 8903)"
INFO [4/6 Execution] serial starting                 from=24979000 to=24983999
                                                     initialTxNum=3478848825
                                                     lastFrozenStep=8902
                                                     initialCycle=true

In-cycle progress shows the in-memory buffer climbing monotonically toward the 512MB cap — no mid-cycle drain to DB:

serial executed blk=24976840 blks=143 ... buf=14.7MB/512.0MB
serial executed blk=24977004 blks=164 ... buf=30.1MB/512.0MB
serial executed blk=24977171 blks=167 ... buf=45.2MB/512.0MB
serial executed blk=24977346 blks=175 ... buf=61.4MB/512.0MB
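
A rough extrapolation from these four log lines, assuming the buffer keeps growing about linearly with block height (an assumption, not something the logs guarantee), suggests how many blocks it would take to reach the 512MB cap. The helper names are hypothetical.

```python
# (block number, buffer size in MB) pairs copied from the log lines above.
SAMPLES = [
    (24_976_840, 14.7),
    (24_977_004, 30.1),
    (24_977_171, 45.2),
    (24_977_346, 61.4),
]
CAP_MB = 512.0


def mb_per_block(samples):
    """Average buffer growth per block between the first and last sample."""
    (b0, m0), (b1, m1) = samples[0], samples[-1]
    return (m1 - m0) / (b1 - b0)


def blocks_to_fill(samples, cap_mb=CAP_MB):
    """Blocks needed to fill an empty buffer, under the linear-growth assumption."""
    return cap_mb / mb_per_block(samples)


if __name__ == "__main__":
    print(f"{mb_per_block(SAMPLES):.4f} MB/block")
    print(f"~{blocks_to_fill(SAMPLES):.0f} blocks to reach {CAP_MB:.0f} MB")
```

If that linear rate held, the buffer would fill only after somewhat more than 5000 blocks, which would be consistent with the block limit tripping before the batch is full.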

How wide is one cycle, in steps?

erigon seg txnum for the cycle's boundary blocks:

block                       min_txnum       step
24,979,000 (cycle start)    3,478,848,826   8905
24,982,975                  3,480,974,294   8911
24,983,999 (cycle end)      3,481,512,634   8912

The cycle spans steps 8905 → 8912 ≈ 7 steps ((3,481,512,634 − 3,478,848,826) / 390,625 ≈ 6.82).
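
The step numbers and the cycle width can be re-derived from the txnums. This sketch assumes a step is simply txnum divided by the step size, rounded down, which matches the figures in this issue but is inferred from them rather than taken from the aggregator code.

```python
STEP_SIZE = 390_625  # txs per step, as quoted in this issue


def step_of(txnum, step_size=STEP_SIZE):
    """Step containing txnum, assuming step = floor(txnum / step_size)."""
    return txnum // step_size


def width_in_steps(start_txnum, end_txnum, step_size=STEP_SIZE):
    """Fractional number of steps between two txnums."""
    return (end_txnum - start_txnum) / step_size


# (block, min_txnum) rows copied from the table above.
ROWS = [
    (24_979_000, 3_478_848_826),
    (24_982_975, 3_480_974_294),
    (24_983_999, 3_481_512_634),
]

if __name__ == "__main__":
    for block, txnum in ROWS:
        print(block, txnum, step_of(txnum))
    print(f"{width_in_steps(ROWS[0][1], ROWS[-1][1]):.2f} steps")
```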

State collation is parked at step 8903. Block snapshots cover up to txnum 3,477,899,736, which is ~225k txns short of the end of step 8903 (the step-8904 boundary at 3,478,125,000), so even step 8903, the next one in line after lastFrozenStep=8902, can't be collated yet.
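
The hold reported in the log can be reproduced with a simplified model of the cap: a step may be collated only once block snapshots cover its end txnum. This is a sketch of the arithmetic only, not the actual aggregator.readyForCollation logic, and the function names are hypothetical.

```python
STEP_SIZE = 390_625  # txs per step, as quoted in this issue


def step_end_txnum(step, step_size=STEP_SIZE):
    """First txnum after `step`, i.e. the boundary of the following step."""
    return (step + 1) * step_size


def shortfall(step, block_snapshots_txnum, step_size=STEP_SIZE):
    """Txs block snapshots still lack before `step` could be collated (0 if covered)."""
    return max(0, step_end_txnum(step, step_size) - block_snapshots_txnum)


if __name__ == "__main__":
    snapshots_txnum = 3_477_899_736  # blockSnapshotsTxNum from the log
    for step in (8902, 8903):
        print(step, step_end_txnum(step), shortfall(step, snapshots_txnum))
```

Under this model step 8902 is fully covered while step 8903 is 225,264 txs short, in line with lastFrozenStep=8902 and the hold at step=8903 in the log.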

Why the cap made this worse

Before the cap, state files could outrun block files — unsafe (recovery required erigon seg rm-state --latest) but it kept the DB drained. The cap fixed the safety issue and exposed a cadence problem that was always there:

  1. Block retirement runs only in the Prune phase. RetireBlocksInBackground is called from SnapshotsPrune (execution/stagedsync/stage_snapshots.go:442), after Execution finishes.
  2. The cap stays frozen during ExecV3. aggregator.readyForCollation re-reads FrozenBlocks() per step, but block snapshots don't grow during ExecV3.
  3. Serial ExecV3 doesn't commit to MDBX mid-cycle. State accumulates in SharedDomains (buf=X/512MB log above); rwTx commits only when isBatchFull (exec3_serial.go:227) or blockLimit (exec3.go:688) trips. Steps committed in cycle N only become visible to the aggregator goroutine in cycle N+1's BuildFilesInBackground.

Net effect:

Cycle N: 5000 blocks / ~5–7 steps committed at cycle end
Prune:   RetireBlocksInBackground → ~1 chunk → cap advances ~1 step
→ +5–7 steps committed, ≤1 step drained per cycle. Backlog grows.
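
The imbalance above can be illustrated with a toy model: each cycle commits a fixed number of steps and then drains at most a fixed number. The rates are the rough figures from this issue, and the model ignores everything else (cycle timing, snapshot chunk sizes, reaching chain tip).

```python
def backlog_history(cycles, committed_per_cycle, drained_per_cycle, start=0.0):
    """Backlog (in steps) held in the DB after each cycle, under constant rates."""
    backlog = start
    history = []
    for _ in range(cycles):
        backlog += committed_per_cycle                    # Execution commits at cycle end
        backlog = max(0.0, backlog - drained_per_cycle)   # Prune lets the cap advance
        history.append(backlog)
    return history


if __name__ == "__main__":
    # ~5 steps committed and ~1 step drained per cycle, per the estimate above.
    print(backlog_history(cycles=3, committed_per_cycle=5, drained_per_cycle=1))
```

As long as the committed rate exceeds the drained rate, the backlog in this model grows by the difference every cycle.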

Possible directions (suggestions, not a fix-list)

  • Eager block retirement driven off block progress rather than stage cadence, so the cap advances promptly as new blocks land instead of once-per-prune.
  • Revisit --sync.loop.block.limit (5000) and --batchSize (512M) defaults. With current mainnet density (~2M txs / 5000 blocks ≈ 5+ steps), one cycle straddles many step boundaries and the cap-induced backlog scales with cycle width. A batch sized to roughly one step might naturally keep the drain in pace.
  • Yield-on-collation in ExecV3. Mid-cycle BuildFiles is a non-starter because the rwTx isn't committed. Instead, ExecV3 could end the cycle early when it detects that the next step is collateable but blocked by uncommitted in-flight data — the outer loop commits, the next cycle's BuildFilesInBackground picks up the freshly-visible step. Trades cycle-start overhead for tighter drain cadence.
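
For the second direction, a back-of-the-envelope sketch of how many blocks correspond to a target number of steps at a given tx density. The function is hypothetical and the 400 txs/block figure is the rough average from this issue; it is meant to size the discussion, not to propose a specific default.

```python
import math

STEP_SIZE = 390_625   # txs per step, as quoted in this issue
TXS_PER_BLOCK = 400   # rough mainnet average, as quoted in this issue


def block_limit_for(target_steps, txs_per_block=TXS_PER_BLOCK, step_size=STEP_SIZE):
    """Largest block count whose txs fit within `target_steps` steps (rounded down)."""
    return math.floor(target_steps * step_size / txs_per_block)


if __name__ == "__main__":
    for steps in (1, 2, 5):
        print(steps, "step(s) ->", block_limit_for(steps), "blocks")
```

At this density one step is a little under a thousand blocks, so a 5000-block cycle necessarily straddles several step boundaries.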

Refs
