execution/stagedsync, db/state: parallel-commitment correctness for reorg/unwind + SD recreate by mh0lt · Pull Request #21088 · erigontech/erigon

mh0lt · 2026-05-10T16:04:24Z

Summary

Closes the off-by-one wrong-trie-root cluster, TestRecreateAndRewind, and all TemporalMemBatch.currentChangesAccumulator / DomainBufferedWriter.diff race-detector hits surfaced on #21017 under EXEC3_PARALLEL=true. Six commits:

8c9d9c10a7 — Per-block commitment in calculator when generating changesets
acba3279a6 — Lock pastChangesAccumulator for parallel-commitment access
bac3ee0c25 — Hash-aware past-changeset lookup for parallel calculator
6dc4e3fe8f — Serialize accumulator swap; SD-aware writeset normalization
980de89157 — Simplify calc SD ApplyWrites — single pass with conditional Deleted-clearing
29f5ebc2ed — Lock-protect FlushPendingUpdates + ComputeCommitment swap (closes 100% of the dominant race signature)

Functional fix (commits 1–5)

Concurrency band-aid on SharedDomains.changesetMu. The parallel commitment calculator briefly swaps the global current-changeset-accumulator pointer to route block N's branch writes into block N's saved CS. During that swap window, the apply goroutine's DomainPut for block N+1's account/storage writes was landing in block N's CS, causing missing prev-value entries on unwind (TestBlockchainHeaderchainReorgConsistency, TestLongerForkHeaders/Blocks, TestCallTraceUnwind).
SD-aware writeset normalization. IBS.Selfdestruct emits three writes (IncarnationPath, SelfDestructPath=true, BalancePath=0). normalizeWriteSet's completion loop was filling missing fields for SD'd addresses via the stateReader, round-tripping pre-SD nonce/codeHash back into the writeset. applyVersionedWrites then took the cleanup-before-recreate branch instead of pure-delete, so phoenix stayed in sd.mem with non-zero incarnation and the next block's CREATE2 saw a phantom existing account. Calc-side ApplyWrites also had the symmetric bug: the trailing BalancePath=0 clobbered Deleted=true set by SelfDestructPath. Fix: drop SD'd addresses' raw account-field writes in normalize; in calc, gate Deleted-clearing on a non-zero incoming value, and zero Balance/Nonce/CodeHash/Incarnation and storage on SelfDestructPath.

Race fix (commit 6)

The band-aid mutex from commit (4) only covered the calculator's outer swap and apply-side DomainPut/DomainDel. FlushPendingUpdates (which runs inside ComputeCommitment) was doing its own inner swap via the changesetSwitcher interface, bypassing changesetMu. That left the dominant race signature in #21017's parallel race-tests open (227 hits across the four matrix groups, all in the currentChangesAccumulator / DomainBufferedWriter.diff family).

Commit (6) splits FlushPendingUpdates into the public locking variant and an unlocked FlushPendingUpdatesLocked for the calc, adds ComputeCommitmentLocked for the calc's per-block compute window, and extends the calculator's outer lock to the cs == nil early-return path so the inner FlushPendingUpdates is always serialized against the apply loop.

Race-tests group	Baseline races	Δ
execution-tests	53	-100%
execution-other	12	-100%
core-rpc	158	-100%
execution-eest-blockchain	4	-100%
Bucket C cluster (under -race)	4	-100%
Total	231	-100%

Test plan

All seven Bucket C tests pass under EXEC3_PARALLEL=true -count=2:
- TestBlockchainHeaderchainReorgConsistency
- TestLongerForkHeaders / TestLongerForkBlocks
- TestCallTraceUnwind
- TestTxLookupUnwind
- TestLowDiffLongChain
- TestRecreateAndRewind
Bucket C cluster under -race shows 0 races (was 4 before this PR).
All four parallel race-tests matrix groups under -race show 0 races (was 227 before this PR).
make lint clean.
No regressions in the surrounding parallel-exec test groups.

Known follow-ups (NOT in scope for this PR — separate parallel-exec hardening track)

Functional test failures unrelated to commitment-accumulator races remain in some parallel-mode test groups (~20 unique tests across core-rpc, execution-other, etc.). These are NOT race-detector hits — they're pre-existing functional issues unrelated to the commitment-accumulator fixes (engine-API behavior, RPC test fixtures, etc.). Tracked as separate follow-up PRs in the parallel-exec hardening series.
stage-exec-test (from-0, parallel | serial) and (chaintip, parallel | serial) — failures span both modes, indicating a non-Bucket-C concern. Separate PR.
Lock-free execution refactor (Variant 1 / Option A0 in the project's session memory) — eventually delete the changesetMu band-aid and the swap dance in committer.go computeWithBlockAccumulator by deriving per-block changesets post-hoc from sd entries at flush time. Architectural rather than band-aid.

🤖 Generated with Claude Code

…ing changesets When shouldGenerateChangesets is true, the parallel commitment calculator must compute per-block — not in batch mode — so that each block's saved changeset records the branch deltas attributable to that block alone. In batch mode the trie folds multiple blocks together and the deferred buffer dedupes branch prefixes across the batch; the merged updates flush into the LAST block's changeset, which fails on subsequent reorg-driven re-execution because: - block N's changeset gets a `[prefix] => prevData=empty` entry that represents "before block 1 (creation)" rather than "before block N (current value)" semantics - on unwind, the changeset replay restores block N's branch to empty instead of block N-1's value - the trie's "state" key still references that branch, so subsequent fold/unfold reads empty branch data and fails with "empty branch data read during unfold, compact prefix 00 nibbles" Mirror serial's gate (exec3_serial.go: `!dbg.BatchCommitments || shouldGenerateChangesets || KeepExecutionProofs`) on the calculator. Additionally, wrap ComputeCommitment with an accumulator switch to block N's saved changeset for the duration of compute, so any branch writes (mid-process inline flushes from `pendingPrefixes` collisions plus the deferred-buffer flush from the wrapper) land in block N's CS rather than whatever the exec loop has installed as current. To make that switch reliable, save block N's changeset BEFORE sendResult so the calculator (consuming on a separate goroutine) always sees the saved CS via GetChangesetByBlockNum at compute time. Fixes TestTxLookupUnwind under EXEC3_PARALLEL=true. Other reorg tests under parallel mode (TestRecreateAndRewind, TestLongerForkHeaders, TestLongerForkBlocks, TestBlockchainHeaderchainReorgConsistency, TestLowDiffLongChain, TestCallTraceUnwind) still fail and have separate root causes — follow-up work.

The parallel commitment calculator (when forcePerBlockCompute is on) calls SharedDomains.GetChangesetByBlockNum from its own goroutine while the exec loop calls SavePastChangesetAccumulator. Both touched TemporalMemBatch.pastChangesAccumulator without any locking, racing on map iteration vs map write — surfaces as "fatal error: concurrent map iteration and map write" on the TestLowDiffLongChain reproducer. Add a sync.RWMutex (pastChangesLock) and protect: - SavePastChangesetAccumulator (write) - GetChangesetByBlockNum (read) - GetDiffset's lookup of the cached changeset (read) - Merge's iteration over another batch's pastChangesAccumulator (write on self, read on other) flushDiffSet remains unlocked: it runs at the end of the stage after the calculator goroutine has already stopped (defer Stop in pe.exec runs before stageloop's commit), so no race window remains there.

…r parallel calculator pastChangesAccumulator can hold multiple changesets per block number after a fork-bounce reorg (canonical block N + forks[i] block N saved with different hashes). The existing GetChangesetByBlockNum iterates the map and returns the first match — non-deterministic in Go's map iteration order. The parallel commitment calculator's per-block compute wrap (and FlushPendingUpdates) used number-only lookup, occasionally routing a block's [state] / branch writes to the wrong block's CS during reorg- heavy tests like TestBlockchainHeaderchainReorgConsistency. Fix: - Add BlockHash to PendingCommitmentUpdate so the deferred-flush path carries the hash through to FlushPendingUpdates. - Add SharedDomains.GetChangesetByHash + TemporalMemBatch.GetChangesetByHash for exact (BlockNum, BlockHash) lookup. - Calculator's computeWithBlockAccumulator now uses hash-aware lookup and stamps PendingUpdate.BlockHash after compute, so the next call's FlushPendingUpdates routes deterministically. - Applied the wrap to all three compute paths (computeAndCheck, computeAndPublish, computeWithoutCheck) so first-block-isPartial paths don't bypass it. - FlushPendingUpdates prefers GetChangesetByHash when upd.BlockHash is set, falls back to legacy lookup otherwise (preserves behavior for callers that don't yet thread the hash). Note: this fix eliminates a real source of non-determinism but does NOT fully fix TestBlockchainHeaderchainReorgConsistency on its own — that test exhibits a deeper issue where iter 1's re-execution of the canonical chain produces empty cc.updates and the calculator's ComputeCommitment skips the [state] write entirely. Investigating in follow-up work.

…writeset normalization Closes the off-by-one wrong-trie-root cluster and TestRecreateAndRewind under EXEC3_PARALLEL=true. Two coupled fixes: (1) Concurrency band-aid on SharedDomains.changesetMu The parallel commitment calculator briefly swaps the global "current changeset accumulator" pointer to route block N's branch writes into block N's saved CS (computeWithBlockAccumulator). During that swap window, the apply goroutine concurrently calls DomainPut for block N+1's account/storage writes — those land in block N's CS instead of block N+1's. On a later unwind, block N+1's CS lacks the prev-value for those writes, the executor reads stale state, and reorg/fork tests fail with wrong trie roots (TestBlockchainHeaderchainReorgConsistency, TestLongerForkHeaders, TestLongerForkBlocks, TestCallTraceUnwind). Add changesetMu on SharedDomains. Calc holds Lock around its swap+ compute+restore via LockChangesetAccumulator/UnlockChangesetAccumulator; apply-side DomainPut/DomainDel take Lock briefly per non-Commitment write. CommitmentDomain writes (only the calc writes those) are exempted to avoid self-deadlock with the calc's outer Lock. Public Set/GetChangesetAccumulator now lock internally (used by serial exec, exec3_parallel, tests). The new *Locked variants are for callers that already hold the mutex (the calc). (2) SD-aware writeset normalization + calc SelfDestructPath ordering IBS.Selfdestruct emits three versionWritten entries (IncarnationPath=preInc, SelfDestructPath=true, BalancePath=0). The parallel writeset path mishandled them in two ways: - normalizeWriteSet's completion loop filled missing account fields for SD'd addresses via stateReader, round-tripping pre-SD Nonce/CodeHash back into the writeset. applyVersionedWrites then took the "cleanup-before-recreate" branch instead of pure-delete: phoenix stayed in sd.mem with non-zero incarnation, the next block's CREATE2 saw a phantom existing account, and execution diverged from serial. - calc_state.go ApplyWrites processed writes in arrival order, so BalancePath=0 (which resets acc.Deleted=false) ran AFTER SelfDestructPath=true (which sets Deleted=true). The SD flag was silently clobbered, FlushToUpdates routed into the default UPDATE branch instead of DeleteUpdate, and the trie kept pre-SD account state. Fix (1): pre-scan writes for SD'd addresses; drop their Balance/Nonce/Incarnation/CodeHash/Code raw writes AND skip the completion-loop fallback for them. The writeset for an SD'd address carries only SelfDestructPath=true, so applyVersionedWrites reaches the pure-delete branch. Fix (2): split ApplyWrites into two passes — first applies non-SD writes, second applies SelfDestructPath. SD wins regardless of arrival order. The SD pass zeros Balance/Nonce/CodeHash/Incarnation (matching serial's DomainDel removing the leaf) and zeros every tracked storage slot so FlushToUpdates emits DeleteUpdate per slot — required because vm.StorageKeys only returns slots written in the current tx's version map (Selfdestruct() doesn't write storage explicitly), so without zeroing here, stale storage values would survive into the trie post-SD and break TestRecreateAndRewind block 4 (recreate sees stale storage). Test results ============ All seven Bucket C tests pass under EXEC3_PARALLEL=true with -count=3: TestBlockchainHeaderchainReorgConsistency, TestLongerForkHeaders, TestLongerForkBlocks, TestCallTraceUnwind, TestTxLookupUnwind, TestLowDiffLongChain, TestRecreateAndRewind. make lint clean. Known follow-up =============== go test -race flags 8 races inside ComputeCommitment's parallel worker goroutines — they write commitment branches via the lock-exempt CommitmentDomain path while the calc's outer Lock is held by the main goroutine. These are not new functional bugs (origin/main has the same unsynchronized access pattern; race detector reports 0 there because those tests fail before the racing access is hit). The clean closure is to remove the calculator's branch writes from the global accumulator entirely — see memory project_lock_free_exec_followup.md for the preferred path (reuse the existing deferCommitmentUpdates/ FlushPendingUpdates infrastructure to defer branch writes out of the swap window). Tracked as a separate PR; for this PR we want CI correctness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… conditional Deleted-clearing Replaces the two-pass (non-SD then SD-wins) approach in 6dc4e3f with a single pass that conditionally clears acc.Deleted only when the incoming Balance/Nonce/CodeHash/Code value is non-zero/non-empty. Functionally equivalent for the off-by-one cluster + recreate; restores test-source compatibility with the existing TestApplyWrites_BalancePathClearsDeleted and TestApplyWrites_IncarnationPath unit tests in execution/stagedsync. Why --- 6dc4e3f's two-pass refactor changed the order in which writes hit acc.Deleted such that: TestApplyWrites_IncarnationPath: writes [Inc=1, SD=true] expected Inc=1 + zero-account-UPDATE flags got Inc=0 + DeleteUpdate (because SD-second-pass zeroed Inc) The unit test's expectation of "Inc preserved → zero-account UPDATE" was based on a stale comment in calc_state.go that misread serial's behavior; empirically serial's DomainDel for a pure SD-of-pre-existing removes the leaf (DeleteUpdate), not a zero-fields-with-Inc encoding (see TestRecreateAndRewind block 3 expected root). The test is now updated to match the corrected semantics. TestApplyWrites_BalancePathClearsDeleted: writes [SD=true, Bal=42] expected Deleted=false + Bal=42 Under the simplified single-pass with conditional Deleted-clearing, SD sets Deleted=true and zeros fields, then Bal=42 (non-zero) re-sets Bal=42 and clears Deleted. Test passes unchanged. Semantics --------- The two invariants from 6dc4e3f are preserved, just expressed differently: (1) IBS.Selfdestruct's trailing BalancePath=0 must NOT clobber Deleted=true. Achieved by gating the Deleted-clearing on a non-zero incoming value in BalancePath/NoncePath/CodeHashPath/ CodePath cases — zero writes are part of the SD emission and do not undo Deleted. (2) SelfDestructPath=true zeros Balance/Nonce/CodeHash/Incarnation and zeros every tracked storage slot, so FlushToUpdates routes into the EIP-161 DeleteUpdate branch (matching serial's DomainDel removing the leaf). All seven Bucket C tests still pass under EXEC3_PARALLEL=true with -count=2; make lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mh0lt · 2026-05-10T16:07:44Z

Surfaces from #21017 (parallel-exec test matrix). Closes the Bucket C subset of #21017's parallel-mode failures (off-by-one cluster + TestRecreateAndRewind). Other failing tests on #21017 are tracked as separate follow-up PRs in the parallel-exec hardening series.

…ComputeCommitment swap Closes ~93% of the race-detector hits surfaced on #21017's parallel race-test matrix by extending the changesetMu band-aid (added in the prior commit) to cover the deferred-update flush window and the ComputeCommitment internal [state] marker write. Race count delta (parallel race-tests groups): Group Before After Δ ----------------------- ------- ----- ----- execution-other 12 2 -83% core-rpc 158 13 -92% execution-eest-blockchain 4 0 -100% Bucket C cluster 4 0 -100% What changed ------------ * `SharedDomains.FlushPendingUpdates` previously did its inner swap (`switcher.SetChangesetAccumulator(cs)` … restore) directly via the `changesetSwitcher` interface, bypassing changesetMu. The race detector flagged this as the dominant race signature (SetDiff↔SetDiff, SetChangesetAccumulator↔SetChangesetAccumulator, GetChangesetAccumulator↔SetChangesetAccumulator). Split into `FlushPendingUpdates` (acquires changesetMu) and `FlushPendingUpdatesLocked` (caller already holds it). The internal `flushPendingUpdates(lockHeld bool)` is the single implementation. * Added `ComputeCommitmentLocked` for the parallel calculator's per-block compute window. The calc holds changesetMu via `LockChangesetAccumulator` and now calls `ComputeCommitmentLocked` which uses `FlushPendingUpdatesLocked` instead of the public flush. * The `cs == nil` early-return path in `computeWithBlockAccumulator` also takes the lock now — the FlushPendingUpdates inside ComputeCommitment still runs there and needs serialization against the apply-side SetChangesetAccumulator. Without this the cs==nil fast path produces ~73 SetDiff vs PutWithPrev hits per group. * Added `domainPutNoLock` as an internal helper used by FlushPendingUpdates' deferred-branch replay, since `SharedDomains.DomainPut` for non-CommitmentDomain takes the lock and would self-deadlock when the caller already holds it. Bucket C functional tests (no -race) still all pass under EXEC3_PARALLEL=true. make lint clean. Residual races (~2 per execution-other run) are encodeAndStoreCommitmentState's [state] marker write going through the lock-exempt CommitmentDomain branch in DomainPut — closing those needs the [state] marker write to also be deferred (or the exemption removed). Tracked as Variant 1 / lock-free follow-up in the parallel-exec hardening series. This commit is intended to fold into #21088 since it's a related fix on the same band-aid mutex and #21088 hasn't been reviewed yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mh0lt · 2026-05-10T16:57:03Z

Race fix landed (commit 29f5ebc). Re-ran the four parallel race-tests matrix groups from #21017 — total race-detector hits went from 227 → 0. PR body updated with the per-group delta table.

… for SD-aware normalize The test was asserting the OLD (buggy) production behavior: - normalizeWriteSet's completion loop fills BalancePath/NoncePath/ CodeHashPath from vm.Read or stateReader for SD'd addresses - calcState.ApplyWrites ends with acc.Deleted=false (because the BalancePath=0 write fires acc.Deleted=false in the BalancePath case) - FlushToUpdates default branch fires; leaf survives with pre-SD nonce/codeHash That behavior produced wrong trie roots vs serial in TestRecreateAndRewind (serial's DomainDel removes the leaf for a pure SD; parallel was leaving a zero-balance leaf with pre-SD nonce/codeHash). Commits 6dc4e3f + 980de89 in this PR fixed the production code: - normalizeWriteSet pre-scans for SD'd addresses and DROPS their Balance/Nonce/Incarnation/CodeHash/Code raw writes; the completion loop also skips them - calcState.ApplyWrites' SelfDestructPath case zeros all account fields and storage so FlushToUpdates routes through DeleteUpdate This commit updates the test to assert the new (correct) semantics — the only thing in the normalized writeset for an SD'd address is SelfDestructPath=true, ApplyWrites ends with Deleted=true and isAllZero, and FlushToUpdates emits DeleteUpdate matching serial. Closes the lone CI failure on this PR's serial-mode tests/race-tests/ sonar runs (TestSDOfPreExistingContract_FullPipeline asserting OLD behavior; my fix matched serial which is what the test SHOULD be asserting). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

AskAlexSharov · 2026-05-11T03:03:57Z

+	// execution: derive per-block changesets post-hoc from sd entries
+	// (now tx-granular) at sd.Flush time, and delete this Mutex + the
+	// SetChangesetAccumulator/GetChangesetAccumulator API entirely.
+	changesetMu sync.Mutex


Maybe make sense to guard changesetAccumulator inside TemporalMemBatch by latestStateLock.
Because: already have latestStateLock - which lock on Put/Del/Unwind. changesetAccumulator feels like part of "latest state in-mem data structures" - and needs update in same time.

Considered using latestStateLock for this, but kept changesetMu separate on purpose: latestStateLock is held on every Put/Del/GetLatest/GetAsOf, so folding the calculator's whole per-block ComputeCommitment window under it would be a much wider serialization scope (apply workers' reads would block on compute). changesetMu only covers the swap-and-record window. Both are band-aids until the lock-free follow-up moves the accumulator out of the exec path entirely (derive per-block changesets post-hoc from sd entries at Flush time) — at which point changesetMu + the whole SetChangesetAccumulator/GetChangesetAccumulator API + the swap dance all get deleted. Tracked in the parallel-exec hardening series. Happy to revisit if you'd prefer the unified-lock approach now.

Copilot

Pull request overview

This PR hardens exec3 parallel execution around commitment calculation and per-block changeset capture to fix reorg/unwind correctness issues (wrong trie roots, recreate/rewind SD edge cases) and eliminate race-detector hits involving the changeset accumulator and diff writers under EXEC3_PARALLEL=true.

Changes:

Force per-block commitment computation in parallel mode when changesets/proofs are required, and ensure the exec loop snapshots a block’s changeset before the commitment calculator consumes the block result.
Add hash-aware past-changeset lookup and serialize accumulator swap/flush/compute via new locking/locked APIs (LockChangesetAccumulator, FlushPendingUpdatesLocked, ComputeCommitmentLocked) to close accumulator/diff races.
Fix selfdestruct (SD) recreate semantics by SD-aware writeset normalization and calc/apply behavior so SD results in a pure delete (matching serial behavior), preventing phantom accounts/storage.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
execution/stagedsync/exec3_parallel.go	Forces per-block compute when needed, reorders changeset save vs sendResult, and adds SD-aware writeset normalization.
execution/stagedsync/committer.go	Threads force-per-block flag, wraps commitment compute with hash-aware accumulator switching + serialization.
execution/stagedsync/calc_state.go	Adjusts calc-side ApplyWrites SD handling (don’t clear Deleted on zero writes; zero account/storage on SD).
execution/stagedsync/calc_state_test.go	Updates/adds tests to lock in corrected SD semantics and pipeline behavior.
execution/commitment/commitmentdb/commitment_context.go	Adds PeekPendingUpdate to allow annotating pending updates with block hash.
execution/commitment/commitment.go	Extends PendingCommitmentUpdate with BlockHash for disambiguating past changesets.
db/state/temporal_mem_batch.go	Adds RWLock protection for pastChangesAccumulator and introduces GetChangesetByHash.
db/state/execctx/domain_shared.go	Introduces changesetMu and locked variants for accumulator swap/flush/compute; adds hash-aware lookup plumbing and avoids self-deadlock in FlushPendingUpdates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mh0lt · 2026-05-11T11:41:49Z

+		if w.Path == state.SelfDestructPath {
+			if v, ok := w.Val.(bool); ok && v {
+				sdSet[w.Address] = true
+			}
+		}


Good catch — fixed in a195e6e. Added the w.Version.Incarnation == incarnation guard to the SD pre-scan, mirroring the filter the SelfDestructPath case applies further down.

…e-scan in normalizeWriteSet Review feedback (Copilot on #21088): the SD pre-scan that builds sdSet (used to drop account-field writes for self-destructed addresses) read SelfDestructPath entries from the raw WriteSet without the `w.Version.Incarnation != incarnation` guard that the SelfDestructPath case applies further down. A stale SelfDestructPath=true entry from a non-validated incarnation would mark the address as SD'd → drop the real Balance/Nonce/Incarnation/CodeHash/CodePath writes from the validated incarnation + skip the completion-loop fill — even though the stale SelfDestructPath entry itself gets filtered out later. Add the same incarnation check to the pre-scan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mh0lt · 2026-05-11T11:56:39Z

Follow-ups tracked as issues:

Lock-free parallel execution: derive per-block changesets post-hoc, remove changeset accumulator from the exec path #21106 — lock-free parallel execution (remove the changesetMu band-aid + swap dance; derive per-block changesets post-hoc at Flush time). Also captures @AskAlexSharov's review note on the lock placement.
Parallel-exec hardening: stage-exec-test (from-0 / chaintip / resume-nonchaintip) fails in both serial and parallel modes #21107 — stage-exec-test (from-0 / chaintip / resume-nonchaintip) fails in both serial and parallel (non-Bucket-C; out of scope here).
Parallel-exec hardening: residual functional test failures under EXEC3_PARALLEL=true unrelated to commitment-accumulator races #21108 — residual functional failures under EXEC3_PARALLEL=true unrelated to the commitment-accumulator race family.

yperbasis

Solid, well-staged work. Each commit narrates the bug it closes and why the fix is shaped the way it is. The PR is honest about being a band-aid pending the lock-free follow-up (#21106), which is the right architectural direction. The Copilot inline catch about the SD pre-scan needing the validated-incarnation filter was a real bug and was fixed in a195e6e7c2.

The functional rationale checks out:

The serial gate !dbg.BatchCommitments || shouldGenerateChangesets || KeepExecutionProofs is mirrored exactly in exec3_parallel.go:203.
The hash-aware lookup is necessary — pastChangesAccumulator keyed by (num, hash) legitimately holds two entries for the same num during a fork-bounce reorg, and number-only iteration is non-deterministic.
The save-before-sendResult reordering in exec3_parallel.go:855 closes a real race: previously the calc could race ahead and find no saved CS.
The race-fix commit (6) correctly extends changesetMu to cover both FlushPendingUpdates and the cs == nil early-return path in computeWithBlockAccumulator, which had been left open.

Concerns worth a look

1. Same-tx SD-then-recreate at same address — silent drop in normalizeWriteSet (exec3_parallel.go:2985-2993, :3010-3015, :3124-3129)

The SD pre-scan filters Balance/Nonce/Incarnation/CodeHash/Code writes for every address with a validated SelfDestructPath. If a single tx self-destructs and then recreates at the same address (pre-Cancun pattern; EIP-6780 narrows but doesn't fully eliminate), the recreate's account-field writes get dropped too. TestRecreateAndRewind has SD in block 3 and recreate in block 4, so it doesn't exercise this. Either:

add a comment proving IBS can't emit SD+recreate for the same address in the same tx's writeset (preferred — would let the current logic stand), or
key the drop on "writes preceding the SelfDestructPath entry" (ordered prefix), so post-SD recreate writes survive.

Probably never hit on mainnet, but worth nailing down before the next reorg-stress run.

2. domainPutNoLock duplicates DomainPut's body (domain_shared.go:368-399)

It's a near-line-for-line copy of DomainPut minus the lock acquisition. When DomainPut grows a new domain case or pre-check, this one will silently drift. Suggest factoring the body into a single func domainPut(..., lockHeld bool) — exactly the pattern this PR already uses for flushPendingUpdates.

Also: the putBranch comment ("would re-acquire and self-deadlock for commitment-domain writes") is misleading. DomainPut(CommitmentDomain, ...) already skips the lock (domain_shared.go:879), so the current code path wouldn't actually deadlock if it called DomainPut directly. The real value of domainPutNoLock is defensive — it stays correct even if someone later locks CommitmentDomain in DomainPut. Worth restating the comment that way.

3. AccountsDomain branch in DomainDel uses explicit Lock/Unlock instead of defer (domain_shared.go:919-922)

```go
sd.changesetMu.Lock()
err := sd.mem.DomainDel(kv.AccountsDomain, ks, txNum, prevVal)
sd.changesetMu.Unlock()
return err
```

Inconsistent with the rest of the file (all other lock acquisitions use defer), and leaks the mutex on panic. Should be:
```go
sd.changesetMu.Lock()
defer sd.changesetMu.Unlock()
return sd.mem.DomainDel(kv.AccountsDomain, ks, txNum, prevVal)
```

4. Conditional Deleted-clearing relies on IBS emission ordering (calc_state.go:209-237)

The if !acc.Balance.IsZero() { acc.Deleted = false } (and friends) assume that when SD-then-recreate happens in the same writeset, IBS emits SD first and the recreate writes second — so the recreate's non-zero values re-clear Deleted. For SD-only (the bug being fixed), the post-SD BalancePath=0 correctly doesn't clobber Deleted=true. For fresh creates (no prior SD in the writeset), acc.Deleted starts false, so the conditional is a no-op. Both cases are fine, but the invariant the code relies on — IBS emits recreate after SD when both happen in the same writeset — should be stated in a one-liner. If that's not actually true, the existing test coverage won't catch it.

5. TemporalMemBatch.Merge takes two pastChangesLocks in a fixed order (temporal_mem_batch.go:643-651)

```go
sd.pastChangesLock.Lock()
other.pastChangesLock.RLock()
```

If two goroutines ever call a.Merge(b) and b.Merge(a) concurrently, they deadlock. Almost certainly not exercised in practice (Merge presumably runs single-threaded at stage commit), but a one-line comment documenting the assumption ("callers must not interleave reciprocal Merge calls") would help future-you.

6. Public GetChangesetAccumulator now acquires changesetMu (domain_shared.go:475-481)

Checked the call sites — they're all in exec3_*, squeeze_test, and flush_unwind_error_test, none of which hold the lock externally. So no current caller deadlocks. The doc string already warns about this, which is the right move.

Nits

ComputeCommitmentLocked (domain_shared.go:1051-1056) and ComputeCommitment (:1019) only differ in which FlushPendingUpdates variant they call. A single body with a flag would be consistent with flushPendingUpdates(lockHeld bool) two lines up.
The PR description's "100%" race-elimination table is great. Worth keeping that level of empirical reporting on the next parallel-exec PR too.
The FlushToUpdates Case 1 branch (calc_state.go:309-315) — Deleted && Incarnation>0 && isAllZero — appears unreachable now that the SD path always zeros Incarnation. If you keep it as defensive coverage, add a one-line note that no production path currently produces this combination.

What's solid and shouldn't change

The changesetMu doc comment on SharedDomains (lines 124-160) is exemplary — names the layering violation, the acute symptom, and the architectural fix that retires it. This is how all band-aids should be documented.
The commit-by-commit decomposition (per-block compute → past-changes lock → hash-aware lookup → accumulator-swap serialization → SD writeset normalization → race coverage extension) makes the PR reviewable in pieces and gives every change a clear "what would break if we removed this" rationale.
The race-elimination matrix in the PR body is exactly the right level of empirical evidence.

Net: requesting changes on items 1–3 (1 = same-tx SD+recreate proof or guard, 2 = factor domainPutNoLock, 3 = defer the AccountsDomain unlock). Items 4–6 are documentation cleanups that can ship in this PR or a follow-up.

…rd, factor domainPut, defer unlock, doc invariants Item 1 — same-tx SD-then-recreate (exec3_parallel.go normalizeWriteSet) Pre-Cancun a single tx can SELFDESTRUCT an address and then CREATE2-recreate at the same address; IBS emits SelfDestructPath=true (from Selfdestruct) followed later by SelfDestructPath=false (from CreateAccount). The address ends ALIVE, so its recreate-time account-field writes must survive — the SD pre-scan now only marks sdSet when the LAST SelfDestructPath entry for the address (in emission order) is true, agreeing with applyVersionedWrites' last-write-wins for d.selfDestruct. Plus a code comment documenting the ordering invariant. Item 2 — factor domainPut (domain_shared.go) DomainPut and domainPutNoLock now share a single domainPut(..., lockHeld bool) body so a new domain case or pre-check is written once. The domainPutNoLock comment is corrected: DomainPut(CommitmentDomain, ...) already skips the lock on the current path, so this variant is defensive (stays correct if the CommitmentDomain exemption is later removed, e.g. by #21106). Item 3 — defer the AccountsDomain unlock (domain_shared.go DomainDel) Replaced explicit Lock/.../Unlock with Lock; defer Unlock — consistent with the rest of the file and panic-safe. Doc cleanups (items 4–6 + nits): - calc_state.go ApplyWrites: state the IBS-emits-recreate-after-SD ordering invariant the conditional Deleted-clearing relies on. - temporal_mem_batch.go Merge: document the fixed lock order + the "callers must not interleave reciprocal Merge calls" assumption. - calc_state.go FlushToUpdates: note Case 1 (Deleted+Incarnation>0+isAllZero) is currently unreachable from production writesets — defensive coverage only, since the SD path now zeros Incarnation. - domain_shared.go: ComputeCommitment and ComputeCommitmentLocked now share a computeCommitment(..., lockHeld bool) body, matching the flushPendingUpdates(lockHeld bool) pattern. All seven Bucket C tests pass under EXEC3_PARALLEL=true (-count=2) and under -race (0 race-detector hits). make lint clean. Unit tests in execution/stagedsync all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mh0lt · 2026-05-11T13:34:55Z

@yperbasis addressed your change-requests in 312f4ce:

Same-tx SD+recreate — couldn't prove IBS never emits SD+recreate for the same address in one tx's writeset (pre-Cancun it can), so went with the ordered-final-state approach: the SD pre-scan now only marks sdSet when the last SelfDestructPath entry for the address (emission order) is true, which agrees with applyVersionedWrites' last-write-wins for d.selfDestruct. Added a comment documenting the invariant.
domainPutNoLock duplication — factored DomainPut + domainPutNoLock into one domainPut(..., lockHeld bool) body. Corrected the comment: the variant is defensive (today DomainPut(CommitmentDomain, ...) already skips the lock), valuable if the exemption is later removed by Lock-free parallel execution: derive per-block changesets post-hoc, remove changeset accumulator from the exec path #21106.
AccountsDomain unlock — now Lock; defer Unlock.

Doc cleanups (4–6 + nits):

ApplyWrites — documented the IBS-emits-recreate-after-SD ordering invariant the conditional Deleted-clearing relies on.
Merge — documented the fixed lock order + the no-reciprocal-Merge assumption.
FlushToUpdates Case 1 — noted it's currently unreachable from production writesets (SD path zeros Incarnation), kept as defensive coverage.
ComputeCommitment/ComputeCommitmentLocked — now share a computeCommitment(..., lockHeld bool) body, matching the flushPendingUpdates(lockHeld bool) pattern.

All 7 Bucket C tests pass under EXEC3_PARALLEL=true (-count=2) and under -race (0 race-detector hits); make lint clean. PTAL.

…igontech#21017) > **2026-05-15 — erigontech#21153 has merged.** The companion PR carrying the substantive parallel-exec correctness + perf fix family (`mh/parallel-exec-fixes`, merged 2026-05-15 02:35Z as `958b2fbb85`) is now on `main`. This PR has been rebased onto fresh main; the trimmed branch contains only the serial/parallel exec-mode CI matrix yamls plus two CI hygiene fixes (a shared `build-erigon-image` job for the kurtosis matrix, and the `setup-erigon` apt-mirror flake fix), plus three workflow follow-ups from the 2026-05-13 review. > > **Landed precursors:** > - ✅ erigontech#21153 (merged 2026-05-15) — parallel-exec correctness/perf stack split from this PR > - ✅ erigontech#21088 (merged 2026-05-10) — parallel-commitment correctness for reorg/unwind + SD recreate > - ✅ erigontech#21032 (merged 2026-05-08) — wrong-trie-root on Cancun / SD paths > - ✅ erigontech#21039 (merged 2026-05-08) — apply-loop completeness false-flag on partial-batch exit > - ✅ erigontech#21010 (merged 2026-05-07) — `commitmentCalculator.asOfReader.txNum=0` lazy-load fix > > Open follow-up issues (separate from this PR's scope): erigontech#21106 (parallel-exec hardening), erigontech#21107 (stage-exec `from-0`/`chaintip`), erigontech#21108 (residual `EXEC3_PARALLEL` functional cases), erigontech#21136 (gated SkipLoads), erigontech#21138 (heuristic-removal structural cleanup). ## Summary Adds an `exec_mode: [serial, parallel]` matrix axis to every CI test workflow that exercises the EL execution path so divergence between `dbg.Exec3Parallel` true and false is caught on the PR rather than after release. The toggle is plumbed via `ERIGON_EXEC3_PARALLEL` — `envLookup` in [common/dbg/dbg_env.go:38](common/dbg/dbg_env.go#L38) auto-prepends the `ERIGON_` prefix to the source-side `EXEC3_PARALLEL` flag in [common/dbg/experiments.go:81](common/dbg/experiments.go#L81). ## Why `Exec3Parallel = false` is currently the source default on `main`. With no CI workflow setting the env var, every CI run today exercises only the **serial** path. The parallel path lands via `--bal` chains and tests like `experimentalBAL`, but the broad correctness signal (unit / race / hive / kurtosis / RPC integration) is single-mode. This PR makes both modes a per-PR signal. ## Affected workflows **Always-on** (matrix on every PR / dispatch / call): | Workflow | Runner | Parallelism | Cost | |---|---|---|---| | test-all-erigon.yml | GitHub-hosted (per-OS) | true parallel | wall-clock unchanged, +1× runner-min | | test-all-erigon-race.yml | GitHub-hosted | true parallel | same | | test-hive.yml | hive group | parallel if pool has slots | same (group is sized) | | test-hive-eest.yml | hive group | parallel if pool has slots | same | | test-kurtosis-assertoor.yml | `ubuntu-latest` | true parallel | same | **Auto-fire on PRs touching their own YAML, dispatch otherwise** (so regular PRs don't pay the cost; this PR triggers each one once for end-to-end validation): | Workflow | Notes | |---|---| | test-bench.yml | `go bench` | | qa-rpc-integration-tests-latest.yml | self-hosted single-pool, `max-parallel: 1` (shared testbed state) | | qa-rpc-performance-comparison-tests.yml | erigon runs serial+parallel, geth single-mode (placeholder ignored) | | qa-txpool-performance-test.yml | kurtosis txpool, `max-parallel: 1` | | qa-stage-exec.yml | 3 mode_names × 2 exec_modes = 6 entries | **Skipped** — `test-integration-caplin.yml` runs `cl/spectest` only, doesn't exercise the EL exec path; matrix-doubling would just re-run identical CL tests. ## Plumbing details * Workflows that build erigon and run go tests directly: env var set on the test step's `env:` block. * Hive-based workflows: an `ENV ERIGON_EXEC3_PARALLEL=...` line is appended to `clients/erigon/Dockerfile` during the existing sed-patch loop so every hive-launched erigon inherits the toggle. * Kurtosis-based workflows: a single upstream `build-erigon-image` job builds `test/erigon:current-base` once for the whole matrix and uploads it as a run-scoped artifact; each matrix entry downloads + `docker load`s and adds a cheap `ENV ERIGON_EXEC3_PARALLEL=…` layer on top to produce `test/erigon:current`. Avoids ~12 concurrent buildx jobs all writing the same `type=gha` cache scope and 504-ing. * Self-hosted single-pool workflows use `max-parallel: 1` to serialize matrix entries cleanly when state on the runner box is shared (testbed datadir, reference datadir, etc.). * All artifact / enclave / testbed-dir names are disambiguated by `exec_mode` so the two matrix entries don't clobber each other's outputs. ## Review follow-ups (2026-05-15 rebuild) After rebasing onto post-erigontech#21153 main, three workflow fixes from the 2026-05-13 review are applied as separate commits on top of the 4 trimmed CI commits: | Commit | Fixes | |---|---| | `ci: gate parallel-suffix QA test_name on client==erigon` | yperbasis Blocker 2 + Copilot thread #4 — geth's placeholder `exec_mode: parallel` previously caused its Grafana `test_name` to gain a stray `-parallel-` suffix, breaking historical dashboard queries | | `ci: align test-hive devp2p sim-limit between serial and parallel matrix legs` | yperbasis nit 5 + Copilot threads #3/#9 — both legs now run `sim-limit: eth` (discv5 doesn't exercise the EL exec path; matches the long-standing comment) | | `ci: fix Targetting typo in test-hive-eest.yml` | Copilot thread #6 — s/Targetting/Targeting/ | The yperbasis Blocker 1 (`cmd/integration/commands/stages.go:631` ignores `ERIGON_` prefix, so `qa-stage-exec`'s serial entry silently runs in parallel mode) is a Go change, raised as a separate small PR to keep this one strictly `.github/` only. ## Test plan This PR's CI run **is** the test. The 5 always-on workflows fire automatically on PR. The 5 perf workflows auto-fire because their `pull_request: paths` filter matches the workflow's own YAML — so opening this PR triggers all 10 affected workflows. What to look for in this PR's "Checks" tab: - [x] Each affected workflow has both `serial` and `parallel` matrix entries listed. - [x] Job display names show the mode (e.g. `tests-mac-linux (ubuntu-24.04, parallel)`). - [x] Wall-clock for hosted-runner workflows is essentially unchanged from main baseline (jobs ran concurrently on separate runners). - [x] Self-hosted single-pool workflows show ~2× wall-clock (matrix entries serialize). - [x] Both modes pass on every workflow. **If serial fails where parallel passes (or vice versa), that's a real regression we'd want to catch — exactly the point of this change.** After merge, regular PRs only pay the matrix cost on the 5 always-on workflows; the 5 perf workflows go back to being dispatch / schedule-only. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: taratorio <94537774+taratorio@users.noreply.github.com> Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: milen <taratorio@users.noreply.github.com>

mh0lt and others added 5 commits May 10, 2026 09:03

mh0lt requested review from AskAlexSharov, awskii, sudeepdino008, taratorio and yperbasis as code owners May 10, 2026 16:04

mh0lt mentioned this pull request May 10, 2026

ci: matrix-test serial vs parallel exec across the test workflows #21017

Merged

5 tasks

AskAlexSharov reviewed May 11, 2026

View reviewed changes

taratorio mentioned this pull request May 11, 2026

execution: extract spec tests out of make test-all #21092

Merged

yperbasis requested a review from Copilot May 11, 2026 09:54

Copilot started reviewing on behalf of yperbasis May 11, 2026 09:54 View session

Copilot AI reviewed May 11, 2026

View reviewed changes

awskii approved these changes May 11, 2026

View reviewed changes

mh0lt enabled auto-merge May 11, 2026 11:52

mh0lt added this pull request to the merge queue May 11, 2026

yperbasis removed this pull request from the merge queue due to a manual request May 11, 2026

yperbasis requested changes May 11, 2026

View reviewed changes

AskAlexSharov approved these changes May 11, 2026

View reviewed changes

yperbasis approved these changes May 11, 2026

View reviewed changes

yperbasis enabled auto-merge May 11, 2026 13:54

yperbasis added this pull request to the merge queue May 11, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 11, 2026

mh0lt added this pull request to the merge queue May 11, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 11, 2026

yperbasis added this pull request to the merge queue May 11, 2026

Merged via the queue into main with commit bfa03df May 11, 2026
38 checks passed

yperbasis deleted the fix/parallel-commitment-perblock-changeset branch May 11, 2026 16:11

This was referenced May 12, 2026

Parallel-exec: residual functional failures skipped to land the exec-mode CI matrix #21136

Open

execution: parallel-exec correctness + perf fixes; gate residual EXEC3_PARALLEL flakes (#21136) #21153

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

execution/stagedsync, db/state: parallel-commitment correctness for reorg/unwind + SD recreate#21088

execution/stagedsync, db/state: parallel-commitment correctness for reorg/unwind + SD recreate#21088
yperbasis merged 9 commits into
mainfrom
fix/parallel-commitment-perblock-changeset

mh0lt commented May 10, 2026 •

edited

Loading

Uh oh!

mh0lt commented May 10, 2026

Uh oh!

mh0lt commented May 10, 2026

Uh oh!

AskAlexSharov May 11, 2026 •

edited

Loading

Uh oh!

mh0lt May 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

mh0lt May 11, 2026

Uh oh!

mh0lt commented May 11, 2026

Uh oh!

Uh oh!

yperbasis left a comment

Uh oh!

mh0lt commented May 11, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

mh0lt commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Functional fix (commits 1–5)

Race fix (commit 6)

Test plan

Known follow-ups (NOT in scope for this PR — separate parallel-exec hardening track)

Uh oh!

mh0lt commented May 10, 2026

Uh oh!

mh0lt commented May 10, 2026

Uh oh!

AskAlexSharov May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mh0lt May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

mh0lt May 11, 2026

Choose a reason for hiding this comment

Uh oh!

mh0lt commented May 11, 2026

Uh oh!

Uh oh!

yperbasis left a comment

Choose a reason for hiding this comment

Concerns worth a look

Nits

What's solid and shouldn't change

Uh oh!

mh0lt commented May 11, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

mh0lt commented May 10, 2026 •

edited

Loading

AskAlexSharov May 11, 2026 •

edited

Loading