cp: cap state collation at block snapshots progress #20701

Merged: AskAlexSharov merged 2 commits into release/3.4 from cp/20680-to-3.4 on Apr 21, 2026

@sudeepdino008

Member

No description provided.

@AskAlexSharov merged commit ce3b92d into release/3.4 on Apr 21, 2026
21 checks passed
@AskAlexSharov deleted the cp/20680-to-3.4 branch on April 21, 2026 04:45
Sahil-4555 pushed a commit to Sahil-4555/erigon that referenced this pull request May 29, 2026
**Issue:** erigontech#20701

- When state's `commitBlock` is ahead of the chaindata canonical head
(`TxNums.Last`), for example when preverified snapshots ship state ahead
of blocks, an FCU's `unwindTarget` (the parent of the new head) lands at
or above canonical, so there is nothing above it to roll back. The old
code still entered the unwind branch and rejected the FCU as
`ReorgTooDeep`, because `minUnwindable` is bounded by state files past
canonical. Forkchoice now skips the unwind path in this case and only
canonicalizes the new blocks (`WriteCanonicalHash` +
`AppendCanonicalTxNums`), so `TxNums` catches up to state instead of
state being unwound.
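
The branch decision described above can be sketched as a small standalone Go program. This is only an illustration under assumptions: `chooseFCUAction`, `fcuAction`, and the two constants are hypothetical names that do not appear in the PR, and `lastCanonical` stands in for the block height covered by `TxNums.Last`. The real forkchoice code does considerably more than this comparison.

```go
package main

import "fmt"

// fcuAction is a hypothetical label for the path forkchoice takes.
type fcuAction string

const (
	actionUnwind       fcuAction = "unwind"
	actionCanonicalize fcuAction = "canonicalize-only"
)

// chooseFCUAction is an illustrative stand-in, not Erigon's function.
// unwindTarget is the parent of the new head; lastCanonical models the
// highest block known to the canonical index (TxNums.Last).
func chooseFCUAction(unwindTarget, lastCanonical uint64) fcuAction {
	// At or above canonical there is nothing to roll back, so the
	// unwind branch is skipped and the new blocks are only canonicalized.
	if unwindTarget >= lastCanonical {
		return actionCanonicalize
	}
	return actionUnwind
}

func main() {
	// State ahead of blocks: target is above the canonical head.
	fmt.Println(chooseFCUAction(1000, 430))
	// Ordinary reorg: target is below the canonical head.
	fmt.Println(chooseFCUAction(400, 430))
}
```

In this sketch the first call takes the canonicalize-only path and the second takes the unwind path; the block numbers are made up for the example.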

This change also removes the two other in-code workarounds for "state ahead of blocks":

- `FrozenBlocksProvider` cap + `SetFrozenBlocksProvider` +
`MaxCollatableTxNum` helper removed. `readyForCollation` now uses only
the `reorgBlockDepth` gate.

- `alignStateToBlockSnapshots` recovery (state-files-deletion when
`commitBlock > frozenBlocks`) removed from snapshot stage — block
catchup via forkchoice replaces it.


## Tests

### Unit (`execution/execmodule/exec_module_test.go`)

Both tests reproduce `commitBlock > TxNums.Last` by truncating canonical +
`TxNums` and clearing `ChangeSets3` (so `CanUnwindToBlockNum` hits
the commitment-block fallback), while leaving the committed domain state
in place. This is the same shape as a chaindata wipe with state
still in snapshot files. Both are confirmed red→green: they fail with
`ReorgTooDeep` when the no-unwind branch is reverted.

- **`TestUpdateForkChoiceRecoversWhenStateAheadOfTxNums`** — index
repair: state already at the head. Asserts the FCU returns
`ExecutionStatusTooFarAway` (not `ReorgTooDeep`) and `TxNums` is
re-extended to `commitBlock`.
- **`TestUpdateForkChoiceForwardExecutesAfterStateAheadRecovery`** —
forward drive: state executed to block 10, chain extends to 15.
First FCU repairs the index and returns `TooFarAway`; second FCU drives
execution forward (`execProgress` 10 → 15) and returns
`Success`.
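
The two-FCU sequence exercised by the second test can be modelled with a toy program. This is a rough sketch, not the test itself: the `node` type, its fields, and `forkChoice` are hypothetical, `execProgress` doubles as `commitBlock`, and the starting `TxNums` height of 5 is an arbitrary choice to represent a truncated index.

```go
package main

import "fmt"

// status is a hypothetical stand-in for the execution status an FCU returns.
type status string

const (
	statusTooFarAway status = "TooFarAway"
	statusSuccess    status = "Success"
)

// node is a toy model holding only the two heights the tests talk about.
type node struct {
	txNumsLast   uint64 // highest block indexed in TxNums
	execProgress uint64 // highest executed block (commitBlock in this model)
}

// forkChoice models the behaviour the tests assert, nothing more.
func (n *node) forkChoice(head uint64) status {
	if n.txNumsLast < n.execProgress {
		// Index repair: extend TxNums up to committed state, no execution.
		n.txNumsLast = n.execProgress
		return statusTooFarAway
	}
	// Forward drive: execute up to the requested head.
	if head > n.execProgress {
		n.execProgress = head
	}
	if head > n.txNumsLast {
		n.txNumsLast = head
	}
	return statusSuccess
}

func main() {
	n := &node{txNumsLast: 5, execProgress: 10}

	first := n.forkChoice(15)
	fmt.Println(first, n.txNumsLast, n.execProgress)

	second := n.forkChoice(15)
	fmt.Println(second, n.txNumsLast, n.execProgress)
}
```

Under these assumptions the first call repairs the index and leaves `execProgress` at 10, and the second call moves `execProgress` from 10 to 15, mirroring the assertions listed above.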

### Manual (mainnet-minimal)

`state ahead of blocks` reproduced by deleting block-snapshot files and
wiping chaindata. Three scenarios:

1. **Small gap, Caplin.** State 570 blocks ahead. Caplin filled the gap
via `InsertBlocks`, no-unwind path activated,
`AppendCanonicalTxNums` extended `TxNums` past `commitBlock`, execution
resumed at `commitBlock+1`, block retire rebuilt the deleted
snapshots. 0 `ReorgTooDeep`.

2. **Deep gap (~16k blocks), Caplin.** Without the no-unwind fix the
system deadlocks (~90+ FCUs rejected `ReorgTooDeep`). With it: 0
`ReorgTooDeep`, execution started ~5 min after restart, full saw-tooth
recovery.

3. **Deep gap (~17k blocks), Lighthouse (`--externalcl`).** Recovery
path goes through `EngineBlockDownloader` instead of
`BackwardBeaconDownloader` but lands in the same forkchoice. 0
`ReorgTooDeep`, execution started ~8 min after restart.

`behind commitment` fires 1–3 times during the catch-up window in each
case, then stops once `TxNums` overtakes `commitBlock`.

---------

Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>