Skip to content

execution: fix parallel exec to produce changesets at end of each batch as serial#21659

Merged
AskAlexSharov merged 1 commit into
mainfrom
worktree-parallel-exec-gen-changesets-fix
Jun 7, 2026
Merged

execution: fix parallel exec to produce changesets at end of each batch as serial#21659
AskAlexSharov merged 1 commit into
mainfrom
worktree-parallel-exec-gen-changesets-fix

Conversation

@taratorio

Copy link
Copy Markdown
Member

Fixes #21650

Problem

Parallel exec evaluated shouldGenerateChangeSets once per batch, at startBlockNum (exec3_parallel.go), while serial exec evaluates it per block. The predicate ("is this block within MaxReorgDepth of the batch end") therefore degenerated into "is the whole batch shorter than MaxReorgDepth": any batch longer than 96 blocks produced zero changesets, including for its last 96 blocks.

After a large catch-up batch (initial sync, restart recovery, post-downtime catch-up), the node could not unwind even one block: CanUnwindToBlockNum found an empty ChangeSets3 table, fell back to the latest commitment block (= the tip itself), and every FCU requiring a shallow reorg was rejected with -38006 Too deep reorg, permanently. Latent since before the exec3 split; exposed by #20495 (changesets near tip during initialCycle) and first hit on glamsterdam-devnet-5 (4-block reorg after a batch-executed tip, node bricked for 16h). Not devnet-specific: EXEC3_PARALLEL defaults to true, so any chain executing a >96-block batch was affected.

Fix

  • changesetWindowStart (new pure helper, exec3.go): first block of [startBlockNum, maxBlockNum] for which shouldGenerateChangeSets is true; MaxUint64 when none. Single source of truth for both sides of the pipeline.
  • Exec loop: pe.shouldGenerateChangesets boolpe.changesetWindowStart uint64; ensureChangesetAccumulator gates per block, so the existing lazy install sites start capturing exactly at the window.
  • Commitment calculator: new perBlockFrom — blocks >= perBlockFrom compute per-block (changesets get correct per-block branch deltas); the last pre-window block triggers computeTransition, which folds the accumulated batch prefix under a nil changeset accumulator and eagerly flushes the deferred branch update under the same swap. Without that, the no-saved-CS fallbacks (computeWithBlockAccumulator, flushPendingUpdates) would leak pre-window branch deltas into the first window block's changeset and corrupt the very unwind being enabled. A boundary flush also covers BATCH_COMMITMENTS=false, where pre-window blocks compute per-block too.

Serial exec is untouched.

Tests

  • TestChangesetWindowStart — table test for the window helper.
  • TestLargeBatchExecGeneratesChangesetsForReorgWindow — e2e: a 110-block single-batch FCU must leave ReadLowestUnwindableBlock == tip−96 (was MaxUint64).
  • TestUpdateForkChoiceShallowReorgAfterLargeBatchExec — e2e incident replay: 110-block batch, then FCU onto a fork branching 4 blocks below tip must unwind + re-execute (was ReorgTooDeep). Also covers the calculator transition: leaked branch deltas would wrong-trie-root the fork re-exec.

Both e2e tests were written first and failed with the exact production error codes.

execmoduletester no longer hardcodes AlwaysGenerateChangesets=true (which had masked this bug from the entire suite) — it now inherits production defaults. Tests that intentionally reorg deeper than MaxReorgDepth (TestLowDiffLongChain, TestLargeReorgTrieGC) opt in via the new WithAlwaysGenerateChangesets(true), mirroring --experimental.always-generate-changesets; the new regression tests pin false.

Validation

  • make lint clean; full execution/... + rpc/db/cl/polygon tester-consumer suites green; race detector on the new concurrency path.
  • Live on glamsterdam-devnet-5 (erigon this branch + Prysm glamsterdam-devnet-5 image): synced 0→tip through 5,000-block FCU batches; after every completed batch reorgSafeBlock = batchEnd−96; mdbx_dump of ChangeSets3 shows exactly [head−96, head]; graceful restart + resync to tip with zero unwind-related errors.

@taratorio taratorio requested review from mh0lt and yperbasis as code owners June 6, 2026 12:52
@taratorio taratorio requested a review from AskAlexSharov June 6, 2026 13:16
@mh0lt mh0lt added this pull request to the merge queue Jun 6, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 6, 2026
@AskAlexSharov AskAlexSharov added this pull request to the merge queue Jun 7, 2026
Merged via the queue into main with commit 2af9783 Jun 7, 2026
90 checks passed
@AskAlexSharov AskAlexSharov deleted the worktree-parallel-exec-gen-changesets-fix branch June 7, 2026 17:21
@taratorio taratorio added the Glamsterdam https://eips.ethereum.org/EIPS/eip-7773 label Jun 8, 2026
Lord1Egypt pushed a commit to Lord1Egypt/erigon that referenced this pull request Jun 10, 2026
…-queue runs (erigontech#21695)

## Problem

The kurtosis matrix's third-party images (lighthouse, teku, assertoor,
kurtosis engine/core/expander, vector, fluent-bit) are cached via
`actions/cache`, but that cache is **only ever written in PR scope,
never on the default branch** — so the merge queue gets no protection
from it.

Why: `cache-warming.yml` calls this workflow with `cache-warming-only:
true`, which skips the `assertoor_test` matrix job — the only job that
pulls + saves those images. So `docker-cl-*` is never saved to
`refs/heads/main`. (Every existing `docker-cl-*` cache entry is scoped
to `refs/pull/NNNNN/merge`.) Merge-queue runs execute on ephemeral
`gh-readonly-queue/main/*` branches and can only restore caches from the
default branch — so **every merge-queue run misses and pulls all 8
images from Docker Hub**, fully exposed to Docker Hub flakes on the path
that gates merges.

## Example failure


https://github.com/erigontech/erigon/actions/runs/27067265745/job/79891001617
— a merge-queue run for erigontech#21659. The
`assertoor_caplin-minimal_parallel_test` shard missed the cache, tried
to pull `sigp/lighthouse:v7.0.1`, hit `registry-1.docker.io … context
deadline exceeded` on all 3 retry attempts, and fast-cancelled the whole
merge-group run.

## Fix

Warm the `docker-cl-*` cache **on the default branch** via a dedicated
workflow (`warm-kurtosis-cl-images.yml`):

- **Triggers:** `push` to `main`/`release/**` filtered on `paths:
[test-kurtosis-assertoor.yml]` — the cache key is derived from the
pinned image versions, which live in that file, so it only needs
re-warming on a version bump — **plus a daily `schedule`** to repopulate
the cache if it's LRU-evicted between bumps (the repo sits at the 500 GB
cache ceiling, so eviction is active).
- It calls `test-kurtosis-assertoor.yml` with a new **`cl-images-only`**
input that runs *only* the warm job — `build-erigon-image` and the test
matrix are gated off, so the schedule/paths runs don't rebuild the image
or run tests.
- The warm job uses **`actions/cache/restore@v5` with `lookup-only:
true`** + an explicit `actions/cache/save@v5` on miss: when the cache
already exists it's a ~10 s no-op (no download), and it only pulls +
saves on a genuine miss.

Net effect: merge-queue and first-PR runs restore the CL cache from the
default branch instead of pulling from Docker Hub.

## Scope

- **`build-erigon-image` is intentionally left on its every-push
cadence.** Its BuildKit layer cache is source-dependent (the base + `go
mod download` layers track `Dockerfile`/`go.mod`/`go.sum`, the compile
layer changes every commit), a different concern from these static
version-pinned images. Optimizing *its* warming (paths on
`Dockerfile`/`go.mod`/`go.sum` + a deps-stage split so the warm skips
the compile) is a possible follow-up, not in this PR.
- Complementary to erigontech#21693 (retry on the erigon image build) — a
different Docker Hub touchpoint.

## Validation

- `actionlint` clean on both workflows (the one `SC2086` info is
pre-existing, in an untouched `run:` block).
- Gating verified: with `cl-images-only`, only `warm-third-party-images`
runs; `cache-warming.yml` (`cache-warming-only`) still warms
`build-erigon-image` every push; PR/merge_group still run the full
matrix and now restore the default-branch CL cache.
mh0lt added a commit that referenced this pull request Jun 12, 2026
Brings origin/main (through the latest tip) into the typed-vio branch (#21536).

Resolution:
- committer.go: dropped the BAL-ahead-fold (foldedAhead/maybeFoldAhead/
  foldBlockFromBAL/shadowCrossCheck) — it was introduced inside the typed-vio
  commit and is not on main. Took main's committer.go (the #21659
  changeset-window: perBlockFrom/computeTransition) and removed the matching
  wiring from exec3.go / exec3_parallel.go (blockRequest channel + type, the
  unused calcMode enum).
- execution/state: kept the typed per-path versioned reads/writes; applied
  main's #21667 (accept Dependency/Estimate cells), #21590 (SD-revival),
  #21659 (per-batch changesets), #21654 (phantom accesses). The storage
  net-zero read-value filter is retained in updateWrite, with new guard tests.

Verified: build, make lint, execution/{state,exec,stagedsync,commitment} unit
tests, and eest-spec parallel blocktests (devnet 82941/0, stable 69256/0).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Glamsterdam https://eips.ethereum.org/EIPS/eip-7773

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[glamsterdam-devnet-5] no change sets for unwinding after initial sync causes node to get stuck

3 participants