stagedsync: generate changesets near tip even during initialCycle #20495
Merged
Instead of flipping initialCycle mid-batch in each executor, let shouldGenerateChangeSets handle it directly: generate changesets for blocks within MaxReorgDepth of the batch end regardless of initialCycle. This removes the initialCycle parameter from shouldGenerateChangeSets and the mid-batch initialCycle flips from exec3_serial and exec3_parallel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
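The predicate described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the simplified signature, the `maxReorgDepth` constant (96 is the depth implied by the referenced commit message in this thread), and the inclusive comparison are all assumptions, and the real function may consult additional configuration.

```go
package main

import "fmt"

// maxReorgDepth is an assumed constant for this sketch; the discussion in
// this thread implies a reorg depth of 96 blocks.
const maxReorgDepth uint64 = 96

// shouldGenerateChangeSets is a hypothetical, simplified stand-in for the
// predicate in the PR. It takes no initialCycle parameter: a block gets
// changesets whenever it lies within maxReorgDepth of the batch end.
func shouldGenerateChangeSets(blockNum, maxBlockNum uint64) bool {
	return blockNum+maxReorgDepth >= maxBlockNum
}

func main() {
	const batchEnd uint64 = 1000
	for _, b := range []uint64{1, 903, 904, 1000} {
		fmt.Printf("block %d: generate changesets = %v\n",
			b, shouldGenerateChangeSets(b, batchEnd))
	}
}
```

Under these assumptions, for a batch ending at block 1000 the first block that gets changesets is 904, whether or not the node is still in its initial cycle.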
AskAlexSharov approved these changes on Apr 11, 2026
AskAlexSharov pushed a commit that referenced this pull request on Apr 11, 2026
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request on Jun 7, 2026
…ch as serial (erigontech#21659)

Fixes erigontech#21650

## Problem

Parallel exec evaluated `shouldGenerateChangeSets` once per batch, at `startBlockNum` (`exec3_parallel.go`), while serial exec evaluates it per block. The predicate ("is this block within `MaxReorgDepth` of the batch end") therefore degenerated into "is the whole batch shorter than `MaxReorgDepth`": any batch longer than 96 blocks produced **zero** changesets, including for its last 96 blocks.

After a large catch-up batch (initial sync, restart recovery, post-downtime catch-up), the node could not unwind even one block: `CanUnwindToBlockNum` found an empty `ChangeSets3` table, fell back to the latest commitment block (= the tip itself), and every FCU requiring a shallow reorg was rejected with `-38006 Too deep reorg`, permanently.

Latent since before the exec3 split; exposed by erigontech#20495 (changesets near tip during initialCycle) and first hit on glamsterdam-devnet-5 (4-block reorg after a batch-executed tip, node bricked for 16h). Not devnet-specific: `EXEC3_PARALLEL` defaults to true, so any chain executing a >96-block batch was affected.

## Fix

- `changesetWindowStart` (new pure helper, `exec3.go`): first block of `[startBlockNum, maxBlockNum]` for which `shouldGenerateChangeSets` is true; `MaxUint64` when none. Single source of truth for both sides of the pipeline.
- Exec loop: `pe.shouldGenerateChangesets bool` → `pe.changesetWindowStart uint64`; `ensureChangesetAccumulator` gates per block, so the existing lazy install sites start capturing exactly at the window.
- Commitment calculator: new `perBlockFrom`; blocks `>= perBlockFrom` compute per-block (changesets get correct per-block branch deltas); the last pre-window block triggers `computeTransition`, which folds the accumulated batch prefix under a **nil** changeset accumulator and eagerly flushes the deferred branch update under the same swap. Without that, the no-saved-CS fallbacks (`computeWithBlockAccumulator`, `flushPendingUpdates`) would leak pre-window branch deltas into the first window block's changeset and corrupt the very unwind being enabled. A boundary flush also covers `BATCH_COMMITMENTS=false`, where pre-window blocks compute per-block too.

Serial exec is untouched.

## Tests

- `TestChangesetWindowStart`: table test for the window helper.
- `TestLargeBatchExecGeneratesChangesetsForReorgWindow` (e2e): a 110-block single-batch FCU must leave `ReadLowestUnwindableBlock == tip−96` (was `MaxUint64`).
- `TestUpdateForkChoiceShallowReorgAfterLargeBatchExec` (e2e incident replay): 110-block batch, then FCU onto a fork branching 4 blocks below tip must unwind + re-execute (was `ReorgTooDeep`). Also covers the calculator transition: leaked branch deltas would wrong-trie-root the fork re-exec.

Both e2e tests were written first and failed with the exact production error codes.

`execmoduletester` no longer hardcodes `AlwaysGenerateChangesets=true` (which had masked this bug from the entire suite); it now inherits production defaults. Tests that intentionally reorg deeper than `MaxReorgDepth` (`TestLowDiffLongChain`, `TestLargeReorgTrieGC`) opt in via the new `WithAlwaysGenerateChangesets(true)`, mirroring `--experimental.always-generate-changesets`; the new regression tests pin `false`.

## Validation

- `make lint` clean; full `execution/...` + rpc/db/cl/polygon tester-consumer suites green; race detector on the new concurrency path.
- Live on glamsterdam-devnet-5 (erigon this branch + Prysm `glamsterdam-devnet-5` image): synced 0→tip through 5,000-block FCU batches; after every completed batch `reorgSafeBlock = batchEnd−96`; `mdbx_dump` of `ChangeSets3` shows exactly `[head−96, head]`; graceful restart + resync to tip with zero unwind-related errors.
Summary
- `shouldGenerateChangeSets` no longer short-circuits on `initialCycle`: changesets are generated for blocks within `MaxReorgDepth` of the batch end regardless, so the node can always handle reorgs at the tip
- Removes the `initialCycle` parameter from `shouldGenerateChangeSets`

Cherry-picked from #20445.