execution/state: exclude own-tx writes from SD revival check#21286
Merged
Conversation
Contributor
There was a problem hiding this comment.
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Member
Author
|
500-iter stress: 500/500 PASS on |
The SD revival check in both versionedRead's SD short-circuit and versionedStateReader.ReadAccountData used s.txIndex / vr.txIndex as the upper bound for LatestTxIndex. versionMap.LatestTxIndex treats that bound as INCLUSIVE — so when a tx re-executes (incarnation N+1), the new incarnation sees writes from incarnation N at the same TxIndex, and the revival check declares a spurious revival across any intervening SELFDESTRUCT. Concrete repro in TestDeleteRecreateSlotsAcrossManyBlocks block 1 under EXEC3_PARALLEL=true GOMAXPROCS=2 -race: TX3 incarnation 1 reads CodeHashPath for the CREATE2 collision check. TX3 inc 0 has already written BalancePath at (txIndex=3, inc=0). The revival check finds that write at txIndex=3 > destructTxIndex=2 (TX2's SD), declares revival, and bypasses the SD short-circuit. CodeHashPath then reads TX1's stale hash(aaCode) from the versionMap, CREATE2 fails with ContractAddressCollision, BB consumes the failed-child gas, and the per-tx gas accounting comes out exactly 1602 higher than the header. versionMap.Read uses floor(txIdx-1), so it already excludes the current tx's own writes. LatestTxIndex's bound is inclusive, so the fix is to subtract one explicitly at the callsite: revivalLimit = txIndex - 1. Applied symmetrically in versionedRead and the parallel-tx reader. Test plan: GOMAXPROCS=2 EXEC3_PARALLEL=true go test -race -count=500 -run TestDeleteRecreateSlotsAcrossManyBlocks ./execution/tests/ — 500/500 pass (vs ~15% failure rate baseline). No regressions on the SD/recreate + reorg test family at -count=2 -race. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
441901a to
fdc0b84
Compare
5 tasks
mh0lt
approved these changes
May 20, 2026
mh0lt
added a commit
that referenced
this pull request
May 20, 2026
b23da1a changed getStateObject to (a) seed refreshVersionedAccount with UnknownVersion and (b) drop the synthetic accountRead AddressPath recording, to quell validator path==AddressPath false positives behind the SD-recreate flake fix. Both are now redundant: the merged main carries #21286 (exclude own-tx writes from the SD revival check), which independently covers the +1602 / CREATE2-collision flake mode, and the validator already keys on the precise IncarnationPath signal. They are also harmful — without the accountRead recording the account is missing from the parallel- built block access list, so selfdestruct/create-destruct family blocks fail "block access list mismatch" under the parallel runner (serial unaffected). Restoring the original recording clears ~102 EEST bal-devnet (Amsterdam) failures. TestDeleteRecreateSlotsAcrossManyBlocks stays 200/200 under GOMAXPROCS=2 (100 with -race) — the flake fix is carried by #21286 and b23da1a's validator/scheduler changes.
pull Bot
pushed a commit
to Dustin4444/erigon
that referenced
this pull request
May 20, 2026
…tives (erigontech#21294) ## Summary `TestDeleteRecreateSlotsAcrossManyBlocks` flaked ~3–5% under `GOMAXPROCS=2 -race` with `ERIGON_EXEC3_PARALLEL=true`, in two modes: a +1602 gas mismatch (a CREATE2 collision burned inner gas on a stale CodeHash read) and a runaway validator-invalid retry loop. Several distinct validator/recording false positives contributed; this PR fixes them in turn. - **`getStateObject`** initialised `accountVersion` to `sdb.Version()` (the worker's own incarnation), gating out every prior-tx sub-read update in `refreshVersionedAccount` and pinning source/version to a value vm never wrote. Switched to `UnknownVersion`. - **`accountRead`** synthesized an AddressPath read using the version that flowed through `refreshVersionedAccount` — usually a sub-read's (Balance/Nonce/etc) version. The validator's `path==AddressPath` checks then fired against an entry vm never wrote. Removed the synthetic recording; sub-reads are already recorded by `versionedRead`. - **`versionedRead` MVReadResultNone**: exclude AddressPath from the `readStorage==nil` probe recording. `getStateObject` probes vm then falls back to the stateReader (account does exist); the "we read nothing" recording made the validator's path==Address branch invalidate any pre-existing account that a prior tx in the block wrote BalancePath for. - **`validateRead`** wrapped in `validateReadImpl` with a `recursive` flag. Two false-positive paths in the cross-validate recursion produce spurious Invalid when invoked as a synthetic probe (readVal nil, source borrowed): the path==AddressPath BalancePath-presence check, and the `source!=MapRead+nil-readVal` tiebreaker. Both gated on `!recursive`. - **`validateRead`** also gained an SD-staleness cross-check in the `MVReadResultDone` branch: a later tx self-destructing without revival means the read predates the destruct and the serial path returns zero via the SD-zero short-circuit; `checkVersion` alone misses this because SD doesn't write the read's own path. `revivalLimit = txIndex - 1` excludes our own writes (validation runs post-flush). - **Deferred queue**: on ErrDependency with no effective blocker, OR validator-invalid, push to a new `deferred[]` queue instead of `pending`. `scheduleExecution` drains a deferred tx only when its predecessor has validated AND no worker at a lower index is in flight. Lower-indexed workers' flushes land at indices visible to N's reads via `vm.Read`'s `floor(N-1)`; higher-indexed ones don't. Non-deferred txs keep dispatching speculatively. A safety net force-drains when pending is empty and no workers are in flight. - **`tooManyRetries`** helper + cap on validator-invalid retries (mirrors the worker-error path) so a genuine loop can't spin forever. erigontech#21286 (yperbasis) fixes the same `revivalLimit = txIndex - 1` insight in the runtime SD short-circuit (`versionedRead` + `versionedStateReader.ReadAccountData`); this PR fixes the validator/recording side. The two are complementary — the runtime fix prevents the bad read at source, this PR prevents the validator from generating false-positive invalidations on legitimate reads. ## Test plan - [x] `make lint` clean (run multiple times). - [x] 200/200 pass on `TestDeleteRecreateSlotsAcrossManyBlocks` under `GOMAXPROCS=2 ERIGON_EXEC3_PARALLEL=true go test -race` (vs ~95-97/100 baseline). - [x] 100/100 pass on the same test rebased onto a clean `origin/main`. - [x] No EEST regression: `cancun` blocktests (8389) + `eip4844_blobs` race-build (6991) + the previously-affected `test_sufficient_balance_blob_tx_pre_fund_tx` (864) all pass under the 12-worker parallel runner. - [ ] CI green on race-tests + EEST parallel shards.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The SD revival check in both
versionedRead's SD short-circuit andversionedStateReader.ReadAccountDatauseds.txIndex/vr.txIndexas the upper bound forversionMap.LatestTxIndex. That bound is inclusive — so when a tx re-executes (incarnationN+1), the new incarnation sees writes from incarnationNat the sameTxIndex, and the revival check declares a spurious revival across any interveningSELFDESTRUCT.Concrete repro in
TestDeleteRecreateSlotsAcrossManyBlocksblock 1 underEXEC3_PARALLEL=true GOMAXPROCS=2 -race: TX3 incarnation 1 readsCodeHashPathfor the CREATE2 collision check. TX3 inc 0 has already writtenBalancePathat(txIndex=3, inc=0). The revival check finds that write attxIndex=3 > destructTxIndex=2(TX2's SD), declares revival, and bypasses the SD short-circuit.CodeHashPaththen reads TX1's stalehash(aaCode)from theversionMap, CREATE2 fails withContractAddressCollision, BB consumes the failed-child gas (gas remaining after CREATE2: 733 vs 2335 on success), and the per-tx gas accounting comes out exactly 1602 higher than the header.Fix
versionMap.Readusesfloor(txIdx-1), so it already excludes the current tx's own writes.LatestTxIndex's bound is inclusive — that's the inconsistency. Subtract one explicitly at the callsite:revivalLimit = txIndex - 1. Applied symmetrically inversionedReadandversionedStateReader.ReadAccountData.Test plan
make lintclean.GOMAXPROCS=2 EXEC3_PARALLEL=true go test -race -count=500 -run TestDeleteRecreateSlotsAcrossManyBlocks ./execution/tests/— 500/500 pass (vs ~15% failure rate baseline).-count=2 -race:TestSelfDestructReceive,TestDeleteRecreateSlots,TestDeleteRecreateAccount,TestDeleteCreateRevert,TestCVE2020_26265,TestEIP161AccountRemoval,TestModes, plus the reorg/EIP-155/EIP-2718/EIP-1559 tests.race-tests / tests-linux (execution-tests).🤖 Generated with Claude Code