stagedsync: generate changesets near tip even during initialCycle #20495

Merged
yperbasis merged 1 commit into main from yperbasis/initialCycle on Apr 11, 2026


yperbasis (Member) commented Apr 11, 2026

Summary

  • shouldGenerateChangeSets no longer short-circuits on initialCycle: changesets are now generated for blocks within MaxReorgDepth of the batch end regardless of initialCycle, so the node can always handle reorgs at the tip
  • Removes the initialCycle parameter from shouldGenerateChangeSets

Cherry-picked from #20445.

Instead of flipping initialCycle mid-batch in each executor, let
shouldGenerateChangeSets handle it directly: generate changesets for
blocks within MaxReorgDepth of the batch end regardless of initialCycle.

This removes the initialCycle parameter from shouldGenerateChangeSets
and the mid-batch initialCycle flips from exec3_serial and
exec3_parallel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
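To make the change concrete, here is a minimal Go sketch of the predicate before and after this PR. The function names, the `alwaysGenerate` flag, and the assumption that the old version returned `false` outright during `initialCycle` are illustrative, not the actual erigon code; the 96-block window matches the `MaxReorgDepth` value cited in the follow-up commit further down this page.

```go
package main

import "fmt"

// maxReorgDepth models MaxReorgDepth (96 blocks, per the follow-up commit).
const maxReorgDepth uint64 = 96

// inWindow reports whether blockNum is within maxReorgDepth of the batch
// end. Written as a guarded subtraction so it cannot overflow.
func inWindow(blockNum, maxBlockNum uint64) bool {
	return blockNum >= maxBlockNum || maxBlockNum-blockNum <= maxReorgDepth
}

// shouldGenerateOld is a hypothetical model of the pre-PR predicate:
// initialCycle short-circuits to false, so even tip blocks get no changeset
// unless an executor flips initialCycle mid-batch.
func shouldGenerateOld(blockNum, maxBlockNum uint64, initialCycle, alwaysGenerate bool) bool {
	if alwaysGenerate {
		return true
	}
	if initialCycle {
		return false
	}
	return inWindow(blockNum, maxBlockNum)
}

// shouldGenerateNew is a hypothetical model of the post-PR predicate:
// initialCycle is no longer a parameter, so the window is always covered.
func shouldGenerateNew(blockNum, maxBlockNum uint64, alwaysGenerate bool) bool {
	return alwaysGenerate || inWindow(blockNum, maxBlockNum)
}

func main() {
	const tip uint64 = 1000
	fmt.Println(shouldGenerateOld(tip, tip, true, false)) // false
	fmt.Println(shouldGenerateNew(tip, tip, false))       // true
	fmt.Println(shouldGenerateNew(tip-96, tip, false))    // true
	fmt.Println(shouldGenerateNew(tip-97, tip, false))    // false
}
```

Because the window check no longer depends on `initialCycle`, the executors have no reason to toggle it mid-batch, which is the flip this commit removes from `exec3_serial` and `exec3_parallel`.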
yperbasis added this pull request to the merge queue on Apr 11, 2026
Merged via the queue into main with commit d283d30 on Apr 11, 2026
35 checks passed
yperbasis deleted the yperbasis/initialCycle branch on Apr 11, 2026 10:28
AskAlexSharov added a commit that referenced this pull request Apr 11, 2026
…cle (#20496)

Cherry-pick of #20495

Co-authored-by: yperbasis <andrey.ashikhmin@gmail.com>
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request Jun 7, 2026
…ch as serial (erigontech#21659)

Fixes erigontech#21650

## Problem

Parallel exec evaluated `shouldGenerateChangeSets` once per batch, at
`startBlockNum` (`exec3_parallel.go`), while serial exec evaluates it
per block. The predicate ("is this block within `MaxReorgDepth` of the
batch end") therefore degenerated into "is the whole batch shorter than
`MaxReorgDepth`": any batch longer than 96 blocks produced **zero**
changesets, including for its last 96 blocks.
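The degeneration can be reproduced in a small Go model. `inWindow` is a hypothetical stand-in for `shouldGenerateChangeSets`; its inclusive boundary (`maxBlockNum - blockNum <= 96`) is an assumption chosen to match the `[head−96, head]` range reported under Validation. `perBatch` mimics the old parallel path and `perBlock` the serial one.

```go
package main

import "fmt"

const maxReorgDepth uint64 = 96

// inWindow: is blockNum within maxReorgDepth of the batch end?
func inWindow(blockNum, maxBlockNum uint64) bool {
	return blockNum >= maxBlockNum || maxBlockNum-blockNum <= maxReorgDepth
}

// perBatch evaluates the predicate once, at startBlockNum, and applies the
// answer to every block in the batch: the old parallel behaviour.
func perBatch(startBlockNum, maxBlockNum uint64) int {
	if !inWindow(startBlockNum, maxBlockNum) {
		return 0
	}
	return int(maxBlockNum - startBlockNum + 1)
}

// perBlock re-evaluates the predicate for each block: the serial behaviour.
func perBlock(startBlockNum, maxBlockNum uint64) int {
	n := 0
	for b := startBlockNum; b <= maxBlockNum; b++ {
		if inWindow(b, maxBlockNum) {
			n++
		}
	}
	return n
}

func main() {
	// A 110-block batch (blocks 1..110), as in the regression tests.
	fmt.Println(perBatch(1, 110)) // 0
	fmt.Println(perBlock(1, 110)) // 97: blocks 14..110
	// A short batch fits entirely inside the window either way.
	fmt.Println(perBatch(60, 110), perBlock(60, 110)) // 51 51
}
```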

After a large catch-up batch (initial sync, restart recovery,
post-downtime catch-up), the node could not unwind even one block:
`CanUnwindToBlockNum` found an empty `ChangeSets3` table, fell back to
the latest commitment block (= the tip itself), and every FCU requiring
a shallow reorg was rejected with `-38006 Too deep reorg`, permanently.
The bug had been latent since before the exec3 split. It was exposed by
erigontech#20495 (changesets near tip during initialCycle) and first hit on
glamsterdam-devnet-5, where a 4-block reorg after a batch-executed tip
bricked the node for 16h. It is not devnet-specific: `EXEC3_PARALLEL`
defaults to true, so any chain executing a >96-block batch was affected.
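A minimal Go model of why an empty `ChangeSets3` table bricks the node. `lowestUnwindable` and `checkUnwind` are hypothetical simplifications of `CanUnwindToBlockNum`, and `errTooDeepReorg` stands in for the `-38006` response; the real code reads the database rather than a slice.

```go
package main

import (
	"errors"
	"fmt"
)

// errTooDeepReorg stands in for the -38006 "Too deep reorg" response.
var errTooDeepReorg = errors.New("too deep reorg")

// lowestUnwindable returns the lowest block with a stored changeset or,
// when there are none, falls back to the latest commitment block.
func lowestUnwindable(changesetBlocks []uint64, latestCommitment uint64) uint64 {
	if len(changesetBlocks) == 0 {
		return latestCommitment
	}
	lowest := changesetBlocks[0]
	for _, b := range changesetBlocks[1:] {
		if b < lowest {
			lowest = b
		}
	}
	return lowest
}

// checkUnwind rejects an unwind target below the lowest unwindable block.
func checkUnwind(target uint64, changesetBlocks []uint64, tip uint64) error {
	if target < lowestUnwindable(changesetBlocks, tip) {
		return errTooDeepReorg
	}
	return nil
}

// window returns the block numbers [tip-depth, tip]; assumes depth <= tip.
func window(tip, depth uint64) []uint64 {
	blocks := make([]uint64, 0, depth+1)
	for b := tip - depth; b <= tip; b++ {
		blocks = append(blocks, b)
	}
	return blocks
}

func main() {
	const tip uint64 = 1000
	// No changesets after a large parallel batch: even a 4-block reorg fails.
	fmt.Println(checkUnwind(tip-4, nil, tip)) // too deep reorg
	// With the last 96 blocks covered, the same reorg is accepted.
	fmt.Println(checkUnwind(tip-4, window(tip, 96), tip)) // <nil>
}
```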

## Fix

- `changesetWindowStart` (new pure helper, `exec3.go`): first block of
`[startBlockNum, maxBlockNum]` for which `shouldGenerateChangeSets` is
true; `MaxUint64` when none. Single source of truth for both sides of
the pipeline.
- Exec loop: `pe.shouldGenerateChangesets bool` →
`pe.changesetWindowStart uint64`; `ensureChangesetAccumulator` gates per
block, so the existing lazy install sites start capturing exactly at the
window.
- Commitment calculator: new `perBlockFrom` — blocks `>= perBlockFrom`
compute per-block (changesets get correct per-block branch deltas); the
last pre-window block triggers `computeTransition`, which folds the
accumulated batch prefix under a **nil** changeset accumulator and
eagerly flushes the deferred branch update under the same swap. Without
that, the no-saved-CS fallbacks (`computeWithBlockAccumulator`,
`flushPendingUpdates`) would leak pre-window branch deltas into the
first window block's changeset and corrupt the very unwind being
enabled. A boundary flush also covers `BATCH_COMMITMENTS=false`, where
pre-window blocks compute per-block too.
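The window helper and the per-block gating from the first two Fix items can be sketched in Go. `windowStart`, `executor`, and this `ensureChangesetAccumulator` are hypothetical simplifications; the real helper is `changesetWindowStart` in `exec3.go`, whose exact signature is not shown here. Passing the predicate in as a parameter is a choice made for this sketch to reflect the "single source of truth" idea.

```go
package main

import (
	"fmt"
	"math"
)

const maxReorgDepth uint64 = 96

// inWindow is a hypothetical stand-in for shouldGenerateChangeSets.
func inWindow(blockNum, maxBlockNum uint64) bool {
	return blockNum >= maxBlockNum || maxBlockNum-blockNum <= maxReorgDepth
}

// windowStart returns the first block in [start, maxBlock] for which pred
// holds, or math.MaxUint64 if none does.
func windowStart(start, maxBlock uint64, pred func(blockNum, maxBlockNum uint64) bool) uint64 {
	if start > maxBlock {
		return math.MaxUint64
	}
	for b := start; ; b++ {
		if pred(b, maxBlock) {
			return b
		}
		if b == maxBlock {
			return math.MaxUint64
		}
	}
}

// executor holds only what this sketch needs.
type executor struct {
	changesetWindowStart uint64 // math.MaxUint64 means "never capture"
	current              uint64 // block of the installed accumulator
	installed            bool
	captured             []uint64 // blocks that received an accumulator
}

// ensureChangesetAccumulator is called lazily per block; it installs an
// accumulator only once the window is reached, at most once per block.
func (e *executor) ensureChangesetAccumulator(blockNum uint64) {
	if blockNum < e.changesetWindowStart {
		return
	}
	if e.installed && e.current == blockNum {
		return
	}
	e.current, e.installed = blockNum, true
	e.captured = append(e.captured, blockNum)
}

func main() {
	start := windowStart(1, 110, inWindow)
	fmt.Println(start) // 14

	e := &executor{changesetWindowStart: start}
	for b := uint64(1); b <= 110; b++ {
		e.ensureChangesetAccumulator(b)
		e.ensureChangesetAccumulator(b) // a second lazy site is a no-op
	}
	fmt.Println(len(e.captured), e.captured[0]) // 97 14
}
```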

Serial exec is untouched.
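The commitment-calculator transition in the third Fix item is the subtle part. The toy Go model below treats branch updates as strings; `calc`, `processBlock`, and the `transition` switch are hypothetical, and the real calculator's separate fold and deferred-update flush are collapsed into a single `flush` call. It shows why folding the pre-window prefix under a nil accumulator keeps the first window block's changeset clean.

```go
package main

import "fmt"

// calc is a toy commitment calculator; strings stand in for branch deltas.
type calc struct {
	perBlockFrom uint64
	transition   bool                // false reproduces the leak
	pending      []string            // deferred (batched) deltas
	trie         []string            // deltas folded into the trie so far
	changesets   map[uint64][]string // per-block changeset contents
}

// flush folds pending deltas into the trie and returns them.
func (c *calc) flush() []string {
	out := append([]string(nil), c.pending...)
	c.trie = append(c.trie, out...)
	c.pending = nil
	return out
}

func (c *calc) processBlock(b uint64, deltas ...string) {
	c.pending = append(c.pending, deltas...)
	if b < c.perBlockFrom {
		// Pre-window: batch the work. On the last pre-window block, fold the
		// prefix with no changeset accumulator attached, if enabled.
		if c.transition && b+1 == c.perBlockFrom {
			c.flush() // result discarded: recorded in no changeset
		}
		return
	}
	// In-window: compute per block and record exactly what was flushed.
	c.changesets[b] = c.flush()
}

func run(transition bool) *calc {
	c := &calc{perBlockFrom: 3, transition: transition, changesets: map[uint64][]string{}}
	c.processBlock(1, "a")
	c.processBlock(2, "b")
	c.processBlock(3, "c")
	c.processBlock(4, "d")
	return c
}

func main() {
	fmt.Println(run(true).changesets[3])  // [c]
	fmt.Println(run(false).changesets[3]) // [a b c]: pre-window deltas leaked
}
```

Both runs end with the same trie contents; only the changeset for the first window block differs, which is why the leak surfaces as a wrong state after unwinding rather than at execution time.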

## Tests

- `TestChangesetWindowStart` — table test for the window helper.
- `TestLargeBatchExecGeneratesChangesetsForReorgWindow` — e2e: a
110-block single-batch FCU must leave `ReadLowestUnwindableBlock ==
tip−96` (was `MaxUint64`).
- `TestUpdateForkChoiceShallowReorgAfterLargeBatchExec` — e2e incident
replay: 110-block batch, then FCU onto a fork branching 4 blocks below
tip must unwind + re-execute (was `ReorgTooDeep`). Also covers the
calculator transition: leaked branch deltas would produce a wrong trie
root on the fork re-exec.

Both e2e tests were written first and failed with the exact production
error codes.

`execmoduletester` no longer hardcodes `AlwaysGenerateChangesets=true`
(which had masked this bug from the entire suite) — it now inherits
production defaults. Tests that intentionally reorg deeper than
`MaxReorgDepth` (`TestLowDiffLongChain`, `TestLargeReorgTrieGC`) opt in
via the new `WithAlwaysGenerateChangesets(true)`, mirroring
`--experimental.always-generate-changesets`; the new regression tests
pin `false`.
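A `With...` option like the one named above is typically a Go functional option. The sketch below is hypothetical: `testerConfig`, `Option`, and `newTesterConfig` are not the actual `execmoduletester` types, and the real `WithAlwaysGenerateChangesets` signature may differ. It illustrates the intended default: production behavior unless a test opts in.

```go
package main

import "fmt"

// testerConfig is a hypothetical subset of a test harness configuration.
type testerConfig struct {
	alwaysGenerateChangesets bool
}

// Option mutates a testerConfig.
type Option func(*testerConfig)

// WithAlwaysGenerateChangesets opts a test into changesets for every block.
func WithAlwaysGenerateChangesets(v bool) Option {
	return func(c *testerConfig) { c.alwaysGenerateChangesets = v }
}

// newTesterConfig starts from the production default (false) and applies opts in order.
func newTesterConfig(opts ...Option) testerConfig {
	var c testerConfig
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Println(newTesterConfig().alwaysGenerateChangesets)                                   // false
	fmt.Println(newTesterConfig(WithAlwaysGenerateChangesets(true)).alwaysGenerateChangesets) // true
}
```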

## Validation

- `make lint` clean; full `execution/...` + rpc/db/cl/polygon
tester-consumer suites green; race detector on the new concurrency path.
- Live on glamsterdam-devnet-5 (erigon this branch + Prysm
`glamsterdam-devnet-5` image): synced 0→tip through 5,000-block FCU
batches; after every completed batch `reorgSafeBlock = batchEnd−96`;
`mdbx_dump` of `ChangeSets3` shows exactly `[head−96, head]`; graceful
restart + resync to tip with zero unwind-related errors.