execution/state, stagedsync: parallel-exec SD-revival and metamorphic-CREATE2 fixes#21590
Conversation
…-CREATE2 fixes Bundle of correctness fixes for parallel exec around the SELFDESTRUCT revival and metamorphic-CREATE2 patterns. Validated by from-0 sync of chiado to tip (21.4M, including the regression noted below) and sepolia past block 4,913,058. execution/stagedsync/exec3_parallel.go - calcFees: nil pre-state must not short-circuit EIP-161 emptyPre when the worker has already bumped Nonce or set CodeHash; otherwise the emitted SelfDestructPath wipes those writes via normalizeWriteSet's sdSet filter. - normalizeWriteSet: detect SD-then-revival by scanning the versionMap history for any prior SelfDestructPath=true write (new AnyDoneBoolWriteEquals helper) rather than relying on the latest-value read which a revival flips back to false. Narrower than an IncarnationPath probe: pure CREATE (no prior SD=true) doesn't wipe pre-existing storage, so its same-value SSTOREs still no-op against pre-block. Earlier IncarnationPath-based form regressed chiado 7,221,794 by preserving same-value SSTOREs that canonical correctly dropped. execution/state/intra_block_state.go - getVersionedAccount: honour prior-tx SelfDestructPath cell even when CachedReaderV3 (which bypasses the versionMap) returns the pre-SD record. Includes the same-tx metamorphic SD+CREATE2 revival check (AddressPath at TxIdx >= destructTxIndex is the signal; strict > would miss it). - CreateAccount: default the synthetic IncarnationPath and BalancePath reads to (StorageRead, UnknownVersion) rather than inheriting the outer (source, version), which may carry refreshVersionedAccount's BalancePath promotion and trip the validator's recursive AddressPath check. - CreateAccount: only emit IncarnationPath on contractCreation=true. CreateAccount(false) leaves newObj.data.Incarnation at 0 and emitting it would clobber a prior tx's SD-side IncarnationPath cell. - accountRead: drop the (MapRead, V) promotion when vm.Read(AddressPath) returns None, to avoid a non-converging validator livelock. execution/state/rw_v3.go - LightCollector: only emit IncarnationPath on incarnation up-revs. A value-transfer to a same-block SD'd address leaves newObj.Incarnation at 0 while original carries the prior block's value; emitting the down-rev would clobber the SD-side IncarnationPath cell and break a same-block CREATE2's prevInc lookup. execution/state/versionedio.go - versionedStateReader.ReadAccountData: mirror IBS's same-tx metamorphic-recreation check (AddressPath at TxIdx >= destructTxIndex before falling back to the strict > subfield checks). execution/state/versionmap.go - New VersionMap.AnyDoneBoolWriteEquals helper that scans the cells for a Done write at TxIdx <= limit whose data equals target. Used to detect a prior in-block SelfDestructPath=true write that a later revival flipped back to false (a case Read alone, which is latest-only, misses). execution/state/versionedio_test.go - Four regression tests pinning the IBS / versionedStateReader behavior above: BalancePath-promoted AddressPath read, synthetic IncarnationPath stamp, prior-tx SD with stale stateReader record, and same-tx metamorphic SD+CREATE2.
Flip the EXEC3_PARALLEL env var default from false to true, making parallel execution the default for plain `./build/bin/erigon` runs. Setting EXEC3_PARALLEL=false (or ERIGON_EXEC3_PARALLEL=false) still forces the serial path. Depends on #21590 (the parallel-exec SD-revival and metamorphic-CREATE2 fixes) — that PR must merge first so the new default lands with the correctness fixes already in place. The 5-chain soak validation that backs the flip is documented in #21590's test plan.
Flip the EXEC3_PARALLEL env var default from false to true, making parallel execution the default for plain `./build/bin/erigon` runs. Setting EXEC3_PARALLEL=false (or ERIGON_EXEC3_PARALLEL=false) still forces the serial path. Depends on #21590 (the parallel-exec SD-revival and metamorphic-CREATE2 fixes) — that PR must merge first so the new default lands with the correctness fixes already in place. The 5-chain soak validation that backs the flip is documented in #21590's test plan.
yperbasis
left a comment
There was a problem hiding this comment.
Approving. Findings to address (follow-ups unless noted):
-
TestGetVersionedAccount_PriorTxSelfDestruct_ReturnsNilpasses onmain— theNewVersionedStateReaderwrapper's pre-existing SD check returns nil before the new gate is ever load-bearing, so the fix isn't pinned by any test. Hand the IBS the rawaccountStateReader(theCachedReaderV3analogue from the gate's comment); I verified that variant is red on main / green here. -
Question on the
>=revival check: the only production writer I can find that placesAddressPath@Nnext toSelfDestructPath=true@NiscalcFees' tip-credit sibling emission (workerTxOutfilters AddressPath out whenever final SD=true). For a coinbase contract that self-destructs without recreate, serial burns the tip (EIP-7708 case 2, perrunPostApplyMessageOnMinIBS), while the>=makes tx N+1'scalcFeessee the coinbase as revived withtip_Nand emittip_N + tip_{N+1}, carrying the pre-SD nonce/codeHash along. If there's no guard I missed, a targeted re-exec of the mainnet 25151825 window would settle it. Related: which flush other than calcFees can produce the vm state hand-built inTestGetVersionedAccount_SameTxMetamorphicRecreate_ReturnsAccount(SD=true + AddressPath at one TxIdx with the account ending alive)? -
The
LightCollector.UpdateAccountDatachange looks inert:CollectorWritesare captured and fee-adjusted but never flushed to the versionMap or applied to domains (applyResult.writescomes fromnormalizeWriteSet(blockIO.WriteSet);resolveStorageWrites/filterWritesByVersionMaphave only test callers). The head commit's rationale — "LightCollector's strict-greater gate already suppresses the down-rev on the apply side" — then doesn't hold: the down-revIncarnationPath=0cell fromCreateAccount(addr, false)reaches the vm via TxOut at HEAD, same as main. The revert stands onTestDeleteRecreateAccountalone; please fix the narrative, and the PR body still lists the reverted contractCreation gate as a live fix. -
The from-0 soaks predate the head commit, which changed IncarnationPath emission in exactly the SD-revival scenarios targeted here; only unit/EEST suites ran after the revert. Cheap ask: re-exec the chiado 7,221,794 and sepolia 4,913,058 windows at HEAD.
-
Residual sharp edges worth tracking issues: (a)
versionedRead's per-path revival is still strict->(versionedio.go:874) — probinggetVersionedAccountwith aCodeHashPathcell at the destruct TxIdx returns a zeroed CodeHash despite the cell holding the new hash, and the new>=gate now routes such reads into that merge; (b) theaccountReaddowngrade leaves a narrow validation blind spot — Done-path reads withcopyV==nilare never recorded, so the promoted sub-field dependency's only carrier was the AddressPath stamp; after the downgrade, a prior tx re-executing with a different balance while dropping its AddressPath write validates as None+StorageRead with no value check. Recording the promoted read at its own sub-path would close it. -
Nits: the 4-probe revival block is now duplicated (intra_block_state.go:1070, versionedio.go:289) and must stay in lockstep — extract a
VersionMaphelper;AnyDoneBoolWriteEqualscould be named for what it detects (e.g.HadSelfDestruct); several new comments run 3-6 sentences with scenario walk-throughs — per the repo comment guidelines the forensic detail belongs in the commit message.
| // Don't inherit outer (source, version): it may carry | ||
| // refreshVersionedAccount's BalancePath promotion (MapRead, V_bal), | ||
| // which stamps IncarnationPath with a cell vm never wrote and trips | ||
| // the validator's recursive AddressPath check. | ||
| incSource, incVersion := StorageRead, UnknownVersion | ||
| if sdb.versionMap != nil { |
There was a problem hiding this comment.
Pre-existing on this PR's earlier commits and reverted in f33af66 — gating IncarnationPath emit by contractCreation is exactly what TestDeleteRecreateAccount went red on (pre-Cancun SD + value-transfer revival needs the incarnation reset). See the revert commit message for the full rationale.
…IncarnationPath The gate was added to avoid clobbering a prior tx's SD-side IncarnationPath cell when a value-transfer revival writes newObj.data.Incarnation=0 — meant to protect the metamorphic SD+CREATE2 case. Reverting because TestDeleteRecreateAccount goes red: pre-Cancun SD followed by a value-transfer revival expects IncarnationPath to reset on the revival, and skipping the write leaves the SD's stale incarnation in the versionMap. The trie root then diverges from canonical at the recreate block. (Earlier rationale on this revert claimed LightCollector's strict-greater incarnation gate already suppresses the down-rev on the apply side. That's not true: CollectorWrites are captured and fee-adjusted but never flushed to the versionMap or applied to domains, so that gate is currently inert. The revert stands on TestDeleteRecreateAccount alone.)
…ct_ReturnsNil Hand the IBS the raw accountStateReader (the CachedReaderV3 analogue from the gate's comment) instead of wrapping it in NewVersionedStateReader. The wrapper has a pre-existing SD-revival check that short-circuits before getVersionedAccount's new gate is ever load-bearing, so the test passed on main too and didn't actually pin the fix. Verified: red on origin/main, green on this branch.
9d34aab to
e0f3b16
Compare
|
Thanks for the review! Addressed in force-push (commits f33af66 + e0f3b16):
Copilot inline reply: also addressed (it was suggesting the exact gate we reverted). |
…o typed-vio origin/main moved 21 commits forward since the last sync; #21590 (parallel-exec SD-revival and metamorphic-CREATE2 fixes, da0a479) touches the same files the typed-vio L2 refactor does and produced 5 conflicts. Resolutions: - execution/state/intra_block_state.go::CreateAccount — drop the `(source, version)` inheritance for synthetic IncarnationPath/BalancePath reads (each defaults to (StorageRead, UnknownVersion), only promoted to MapRead when the path's own versionMap cell exists). Gate the recordWriteIncarnation on contractCreation=true. Use typed ReadIncarnation / ReadBalance helpers instead of generic Read. - execution/state/intra_block_state.go::accountRead — demote a sub-field MapRead promotion to (StorageRead, UnknownVersion) when AddressPath has no cell, via typed ReadAddress. - execution/state/intra_block_state.go::getVersionedAccount — already had the SD-revival check (>=destructTxIndex for AddressPath, > for sub-fields). - execution/state/rw_v3.go::LightCollector.UpdateAccountData — emit IncarnationPath only on up-revs (`>` not `!=`) so a value-transfer to a same-block SD'd address can't clobber the SD-side cell. Retained typed &VersionedWrite[uint64]{...} form. - execution/state/versionedio.go::ReadAccountData — same-tx metamorphic recreation check (AddressPath at >=destructTxIndex first, sub-field > fallbacks) using revivalLimit = txIndex - 1. - execution/state/versionedio.go::addStorageUpdate — added the EIP-7928 no-op filter (skip when value equals slot's last recorded write); kept the free-function signature. - execution/state/versionmap.go — added AnyDoneBoolWriteEquals (scoped to SelfDestructPath given the typed-vio per-path WriteCell[T] layout). - execution/stagedsync/exec3_parallel.go::normalizeWriteSet — added the SD-then-revival history-scan branch (latest SelfDestructPath may be false after revival; AnyDoneBoolWriteEquals catches the prior true). - execution/state/versionedio_test.go — ported the 4 #21590 regression tests (TestAccountRead_BalancePathPromotion / TestCreateAccount_ SyntheticIncarnationStamp / TestGetVersionedAccount_PriorTxSelfDestruct / TestGetVersionedAccount_SameTxMetamorphicRecreate) and the 2 storage no-op/RM-DEP-fallthrough tests, re-typed to VersionedWrite[T]/ VersionedRead[T] and per-path Set helpers. Not ported: #21590's calcFees coinbase-emit fix (EIP-161 empty-removal of zero-balance coinbase). vio-on-main's calcFees uses the typed CollectorWrites machinery and a different flow; the fix would need a separate semantic merge. This affects coinbase emptiness on EIP-161 forks but does not affect the SD/CREATE2 reincarnation class the test_recreate eest-race flake exercises. Tracked as a follow-up. Also addresses sudeepdino008's review comment: removed the SetMaxCollationTxNum shim in db/state/aggregator_vio_shim.go and its one call site in execution/stagedsync/exec3.go (the upstream method no longer exists on main). Verification: - make erigon integration clean - make lint clean (2 runs) - make test-short clean (including the 4 ported regression tests)
…oval to typed finalizeTx Completes the semantic merge deferred in 6f1aa5a. vio-on-main's finalizeTx predicate was `coinbaseChanged := newBal != oldBal`; #21590 extends that to also fire on EIP-161 empty-removal (touch a zero-balance coinbase so the commitment calculator emits the empty-account delete, mirroring serial's AddBalance(coinbase, 0) → TouchAccount → SD path). Layered onto the typed-vio APIs: - Pre-scan result.TxOut for NoncePath / CodeHashPath worker writes to the coinbase address via typed Header() / ValAny() accessors, so the EIP-161 emptiness check sees the worker's post-write coinbase state (a sender== coinbase tx that bumped Nonce isn't mistakenly treated as empty when FeeTipped==0). - coinbaseEmptyPre = pre-tx balance zero & coinbaseNonce 0 & code empty & no worker code-hash write; coinbaseEmptyRemoval = emptyRemoval & coinbaseEmptyPre & newBalance zero. - emitCoinbase = newBal != oldBal || coinbaseEmptyRemoval. - When coinbaseEmptyRemoval fires: emit &VersionedWrite[bool]{SelfDestructPath, true}. Otherwise: emit BalancePath plus the AddressPath sibling write (mirrors serial's implicit account creation on first credit), as &VersionedWrite[*accounts.Account]{AddressPath, addrAcc}. Verification: make erigon clean, make lint clean (2 runs), make test-short -count=1 on execution/state/... + execution/stagedsync/... clean.
Closes erigontech#17630 ## Summary Flip the `EXEC3_PARALLEL` env var default from `false` to `true`, so a plain `./build/bin/erigon` run uses parallel execution by default. `EXEC3_PARALLEL=false` (or `ERIGON_EXEC3_PARALLEL=false`) still forces the serial path. One-line change in `common/dbg/experiments.go`. ## Test plan - [x] `make erigon` clean. - [x] `make lint` clean. - [ ] CI on this branch. ## Note on dependency Backed by the soak validation done in erigontech#21590 (chiado from-0 to tip, sepolia from-0 past 4,913,058, hoodi/mainnet clean — see that PR's test plan). erigontech#21590 should land first so the new default ships with the correctness fixes already in place. If erigontech#21591 merges first by accident, users hitting the SD-revival / metamorphic-CREATE2 patterns in parallel exec would regress until erigontech#21590 lands. A separate gnosis race-class bug (4 occurrences in ~300K blocks around 14.59M-14.89M, recurring + non-deterministic) surfaced during the same soak — it is **not** addressed by either PR and is tracked in [docs/plans/20260602-gnosis-parallel-exec-race-14594499.md](https://github.com/erigontech/erigon/blob/main/docs/plans/20260602-gnosis-parallel-exec-race-14594499.md). Flipping the default doesn't make it worse: users who hit it can always set `EXEC3_PARALLEL=false`. Co-authored-by: Andrew Ashikhmin <34320705+yperbasis@users.noreply.github.com>
Brings origin/main (through the latest tip) into the typed-vio branch (#21536). Resolution: - committer.go: dropped the BAL-ahead-fold (foldedAhead/maybeFoldAhead/ foldBlockFromBAL/shadowCrossCheck) — it was introduced inside the typed-vio commit and is not on main. Took main's committer.go (the #21659 changeset-window: perBlockFrom/computeTransition) and removed the matching wiring from exec3.go / exec3_parallel.go (blockRequest channel + type, the unused calcMode enum). - execution/state: kept the typed per-path versioned reads/writes; applied main's #21667 (accept Dependency/Estimate cells), #21590 (SD-revival), #21659 (per-batch changesets), #21654 (phantom accesses). The storage net-zero read-value filter is retained in updateWrite, with new guard tests. Verified: build, make lint, execution/{state,exec,stagedsync,commitment} unit tests, and eest-spec parallel blocktests (devnet 82941/0, stable 69256/0).
Closes #20711.
Summary
Correctness fixes for parallel execution around the SELFDESTRUCT-revival and metamorphic-CREATE2 patterns. Surfaced as a chiado from-0 wrong-trie-root and a sepolia tx-4 missed
CallNewAccountGas(25,000); both classes resolved by this PR.Interrelated sub-fixes:
execution/stagedsync/exec3_parallel.go—calcFees: nil pre-state must not short-circuit EIP-161 emptyPre when the worker has already bumped Nonce or set CodeHash, otherwise the emittedSelfDestructPath=truewipes those writes vianormalizeWriteSet's sdSet filter.execution/stagedsync/exec3_parallel.go—normalizeWriteSet: detect SD-then-revival by scanning the versionMap history for any priorSelfDestructPath=truewrite (newAnyDoneBoolWriteEqualshelper) rather than relying on the latest-value read, which a revival flips back to false. Narrower than anIncarnationPathprobe: pure CREATE (no prior SD=true) doesn't wipe pre-existing storage, so its same-value SSTOREs still no-op against pre-block.execution/state/intra_block_state.go—getVersionedAccount: honour prior-txSelfDestructPathcell even whenCachedReaderV3(which bypasses the versionMap) returns the pre-SD record. Includes the same-tx metamorphic SD+CREATE2 revival check (AddressPathat TxIdx>=destructTxIndex is the signal; strict>misses it).execution/state/intra_block_state.go—CreateAccount: default the syntheticIncarnationPath/BalancePathreads to(StorageRead, UnknownVersion)rather than inheriting the outer (source, version) which may carryrefreshVersionedAccount'sBalancePathpromotion and trip the validator.execution/state/intra_block_state.go—accountRead: drop the(MapRead, V)promotion whenvm.Read(AddressPath)returns None, to avoid a non-converging validator livelock.execution/state/versionedio.go—versionedStateReader.ReadAccountData: mirror IBS's same-tx metamorphic-recreation check (AddressPathat TxIdx>=destructTxIndex before falling back to strict>subfield checks).execution/state/versionmap.go: newVersionMap.AnyDoneBoolWriteEqualshelper scans cells for aDonewrite at TxIdx<=limit whose data equals target. Used by the narrowednormalizeWriteSetfilter above.Test plan
execution/state/versionedio_test.go:TestAccountRead_BalancePathPromotion_DoesNotInvalidate,TestCreateAccount_SyntheticIncarnationStamp_DoesNotInvalidate,TestGetVersionedAccount_PriorTxSelfDestruct_ReturnsNil(strengthened to use the rawaccountStateReaderso the IBS gate is actually load-bearing — red onorigin/main, green here),TestGetVersionedAccount_SameTxMetamorphicRecreate_ReturnsAccount. All pass.EXEC3_PARALLEL=true,--prune.mode=archive. Passed the original parallel_exec: chiado exec from 0 trie-root missmatch #20711 regression block (7,221,794) cleanly and reached tip.0x140da4236…metamorphic-deploy pattern at 4,913,056-57 was the original sepolia regression motivating this work.)make lintclean.Follow-ups
>=revival check vscalcFeescoinbase tip-credit: for a coinbase contract that self-destructs without recreate,calcFeesemitsAddressPath@NalongsideSelfDestructPath=true@N. The>=revival check ingetVersionedAccountcould then surface that as a "revived" account in tx N+1, causing tx N+1'scalcFeesto emittip_N + tip_{N+1}carrying pre-SD nonce/codeHash — while serial burnstip_Nvia EIP-7708 case 2. Mainnet block 25,151,825 tx 31 (the MEV pattern cited in thecalcFeescomment) is the target for a re-exec confirmation. To be addressed as a follow-up; the discriminator is likely "AddressPath revival only counts when paired with a CREATE-signal (non-empty CodeHash or bumped Incarnation), not a barecalcFeestip-credit emission".