execution/state: exclude own-tx writes from SD revival check by yperbasis · Pull Request #21286 · erigontech/erigon

yperbasis · 2026-05-19T16:39:09Z

Summary

The SD revival check in both versionedRead's SD short-circuit and versionedStateReader.ReadAccountData used s.txIndex / vr.txIndex as the upper bound for versionMap.LatestTxIndex. That bound is inclusive — so when a tx re-executes (incarnation N+1), the new incarnation sees writes from incarnation N at the same TxIndex, and the revival check declares a spurious revival across any intervening SELFDESTRUCT.

Concrete repro in TestDeleteRecreateSlotsAcrossManyBlocks block 1 under EXEC3_PARALLEL=true GOMAXPROCS=2 -race: TX3 incarnation 1 reads CodeHashPath for the CREATE2 collision check. TX3 inc 0 has already written BalancePath at (txIndex=3, inc=0). The revival check finds that write at txIndex=3 > destructTxIndex=2 (TX2's SD), declares revival, and bypasses the SD short-circuit. CodeHashPath then reads TX1's stale hash(aaCode) from the versionMap, CREATE2 fails with ContractAddressCollision, BB consumes the failed-child gas (gas remaining after CREATE2: 733 vs 2335 on success), and the per-tx gas accounting comes out exactly 1602 higher than the header.

Fix

versionMap.Read uses floor(txIdx-1), so it already excludes the current tx's own writes. LatestTxIndex's bound is inclusive — that's the inconsistency. Subtract one explicitly at the callsite: revivalLimit = txIndex - 1. Applied symmetrically in versionedRead and versionedStateReader.ReadAccountData.

Test plan

make lint clean.
Local stress: GOMAXPROCS=2 EXEC3_PARALLEL=true go test -race -count=500 -run TestDeleteRecreateSlotsAcrossManyBlocks ./execution/tests/ — 500/500 pass (vs ~15% failure rate baseline).
No regressions on the SD/recreate + broader blockchain family at -count=2 -race: TestSelfDestructReceive, TestDeleteRecreateSlots, TestDeleteRecreateAccount, TestDeleteCreateRevert, TestCVE2020_26265, TestEIP161AccountRemoval, TestModes, plus the reorg/EIP-155/EIP-2718/EIP-1559 tests.
CI green on race-tests / tests-linux (execution-tests).

🤖 Generated with Claude Code

Copilot

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

yperbasis · 2026-05-19T18:54:37Z

500-iter stress: 500/500 PASS on TestDeleteRecreateSlotsAcrossManyBlocks with both commits. (GOMAXPROCS=2 EXEC3_PARALLEL=true go test -race -count=500 ./execution/tests/ -run TestDeleteRecreateSlotsAcrossManyBlocks.)

The SD revival check in both versionedRead's SD short-circuit and versionedStateReader.ReadAccountData used s.txIndex / vr.txIndex as the upper bound for LatestTxIndex. versionMap.LatestTxIndex treats that bound as INCLUSIVE — so when a tx re-executes (incarnation N+1), the new incarnation sees writes from incarnation N at the same TxIndex, and the revival check declares a spurious revival across any intervening SELFDESTRUCT. Concrete repro in TestDeleteRecreateSlotsAcrossManyBlocks block 1 under EXEC3_PARALLEL=true GOMAXPROCS=2 -race: TX3 incarnation 1 reads CodeHashPath for the CREATE2 collision check. TX3 inc 0 has already written BalancePath at (txIndex=3, inc=0). The revival check finds that write at txIndex=3 > destructTxIndex=2 (TX2's SD), declares revival, and bypasses the SD short-circuit. CodeHashPath then reads TX1's stale hash(aaCode) from the versionMap, CREATE2 fails with ContractAddressCollision, BB consumes the failed-child gas, and the per-tx gas accounting comes out exactly 1602 higher than the header. versionMap.Read uses floor(txIdx-1), so it already excludes the current tx's own writes. LatestTxIndex's bound is inclusive, so the fix is to subtract one explicitly at the callsite: revivalLimit = txIndex - 1. Applied symmetrically in versionedRead and the parallel-tx reader. Test plan: GOMAXPROCS=2 EXEC3_PARALLEL=true go test -race -count=500 -run TestDeleteRecreateSlotsAcrossManyBlocks ./execution/tests/ — 500/500 pass (vs ~15% failure rate baseline). No regressions on the SD/recreate + reorg test family at -count=2 -race. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

b23da1a changed getStateObject to (a) seed refreshVersionedAccount with UnknownVersion and (b) drop the synthetic accountRead AddressPath recording, to quell validator path==AddressPath false positives behind the SD-recreate flake fix. Both are now redundant: the merged main carries #21286 (exclude own-tx writes from the SD revival check), which independently covers the +1602 / CREATE2-collision flake mode, and the validator already keys on the precise IncarnationPath signal. They are also harmful — without the accountRead recording the account is missing from the parallel- built block access list, so selfdestruct/create-destruct family blocks fail "block access list mismatch" under the parallel runner (serial unaffected). Restoring the original recording clears ~102 EEST bal-devnet (Amsterdam) failures. TestDeleteRecreateSlotsAcrossManyBlocks stays 200/200 under GOMAXPROCS=2 (100 with -race) — the flake fix is carried by #21286 and b23da1a's validator/scheduler changes.

…tives (erigontech#21294) ## Summary `TestDeleteRecreateSlotsAcrossManyBlocks` flaked ~3–5% under `GOMAXPROCS=2 -race` with `ERIGON_EXEC3_PARALLEL=true`, in two modes: a +1602 gas mismatch (a CREATE2 collision burned inner gas on a stale CodeHash read) and a runaway validator-invalid retry loop. Several distinct validator/recording false positives contributed; this PR fixes them in turn. - **`getStateObject`** initialised `accountVersion` to `sdb.Version()` (the worker's own incarnation), gating out every prior-tx sub-read update in `refreshVersionedAccount` and pinning source/version to a value vm never wrote. Switched to `UnknownVersion`. - **`accountRead`** synthesized an AddressPath read using the version that flowed through `refreshVersionedAccount` — usually a sub-read's (Balance/Nonce/etc) version. The validator's `path==AddressPath` checks then fired against an entry vm never wrote. Removed the synthetic recording; sub-reads are already recorded by `versionedRead`. - **`versionedRead` MVReadResultNone**: exclude AddressPath from the `readStorage==nil` probe recording. `getStateObject` probes vm then falls back to the stateReader (account does exist); the "we read nothing" recording made the validator's path==Address branch invalidate any pre-existing account that a prior tx in the block wrote BalancePath for. - **`validateRead`** wrapped in `validateReadImpl` with a `recursive` flag. Two false-positive paths in the cross-validate recursion produce spurious Invalid when invoked as a synthetic probe (readVal nil, source borrowed): the path==AddressPath BalancePath-presence check, and the `source!=MapRead+nil-readVal` tiebreaker. Both gated on `!recursive`. - **`validateRead`** also gained an SD-staleness cross-check in the `MVReadResultDone` branch: a later tx self-destructing without revival means the read predates the destruct and the serial path returns zero via the SD-zero short-circuit; `checkVersion` alone misses this because SD doesn't write the read's own path. `revivalLimit = txIndex - 1` excludes our own writes (validation runs post-flush). - **Deferred queue**: on ErrDependency with no effective blocker, OR validator-invalid, push to a new `deferred[]` queue instead of `pending`. `scheduleExecution` drains a deferred tx only when its predecessor has validated AND no worker at a lower index is in flight. Lower-indexed workers' flushes land at indices visible to N's reads via `vm.Read`'s `floor(N-1)`; higher-indexed ones don't. Non-deferred txs keep dispatching speculatively. A safety net force-drains when pending is empty and no workers are in flight. - **`tooManyRetries`** helper + cap on validator-invalid retries (mirrors the worker-error path) so a genuine loop can't spin forever. erigontech#21286 (yperbasis) fixes the same `revivalLimit = txIndex - 1` insight in the runtime SD short-circuit (`versionedRead` + `versionedStateReader.ReadAccountData`); this PR fixes the validator/recording side. The two are complementary — the runtime fix prevents the bad read at source, this PR prevents the validator from generating false-positive invalidations on legitimate reads. ## Test plan - [x] `make lint` clean (run multiple times). - [x] 200/200 pass on `TestDeleteRecreateSlotsAcrossManyBlocks` under `GOMAXPROCS=2 ERIGON_EXEC3_PARALLEL=true go test -race` (vs ~95-97/100 baseline). - [x] 100/100 pass on the same test rebased onto a clean `origin/main`. - [x] No EEST regression: `cancun` blocktests (8389) + `eip4844_blobs` race-build (6991) + the previously-affected `test_sufficient_balance_blob_tx_pre_fund_tx` (864) all pass under the 12-worker parallel runner. - [ ] CI green on race-tests + EEST parallel shards.

yperbasis requested a review from mh0lt as a code owner May 19, 2026 16:39

yperbasis marked this pull request as draft May 19, 2026 16:48

yperbasis requested a review from Copilot May 19, 2026 18:48

Copilot AI reviewed May 19, 2026

View reviewed changes

yperbasis marked this pull request as ready for review May 19, 2026 18:55

yperbasis requested a review from taratorio May 19, 2026 18:55

yperbasis changed the title ~~execution/state: fix stale-field reads on CREATE recreate after intervening SD~~ execution/state: fix parallel-exec stale reads across SD/recreate May 19, 2026

yperbasis changed the title ~~execution/state: fix parallel-exec stale reads across SD/recreate~~ execution/state: fix parallel-exec stale reads across SELFDESTRUCT/recreate May 19, 2026

yperbasis force-pushed the yperbasis/fix-exec3-parallel-tests branch from 441901a to fdc0b84 Compare May 19, 2026 21:44

yperbasis changed the title ~~execution/state: fix parallel-exec stale reads across SELFDESTRUCT/recreate~~ execution/state: exclude own-tx writes from SD revival check May 19, 2026

mh0lt mentioned this pull request May 20, 2026

execution: fix parallel-exec SD-recreate flake + validator false positives #21294

Merged

5 tasks

mh0lt approved these changes May 20, 2026

View reviewed changes

mh0lt added this pull request to the merge queue May 20, 2026

Merged via the queue into main with commit e3f36d9 May 20, 2026
69 checks passed

mh0lt deleted the yperbasis/fix-exec3-parallel-tests branch May 20, 2026 05:38

yperbasis mentioned this pull request May 20, 2026

Parallel exec: TestDeleteRecreateSlotsAcrossManyBlocks intermittently hits 'too many incarnations' cap under -race CI load #21215

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

execution/state: exclude own-tx writes from SD revival check#21286

execution/state: exclude own-tx writes from SD revival check#21286
mh0lt merged 1 commit into
mainfrom
yperbasis/fix-exec3-parallel-tests

yperbasis commented May 19, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

yperbasis commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yperbasis commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

yperbasis commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yperbasis commented May 19, 2026 •

edited

Loading