Skip to content

execution/state: exclude own-tx writes from SD revival check#21286

Merged
mh0lt merged 1 commit into
mainfrom
yperbasis/fix-exec3-parallel-tests
May 20, 2026
Merged

execution/state: exclude own-tx writes from SD revival check#21286
mh0lt merged 1 commit into
mainfrom
yperbasis/fix-exec3-parallel-tests

Conversation

@yperbasis

@yperbasis yperbasis commented May 19, 2026

Copy link
Copy Markdown
Member

Summary

The SD revival check in both versionedRead's SD short-circuit and versionedStateReader.ReadAccountData used s.txIndex / vr.txIndex as the upper bound for versionMap.LatestTxIndex. That bound is inclusive — so when a tx re-executes (incarnation N+1), the new incarnation sees writes from incarnation N at the same TxIndex, and the revival check declares a spurious revival across any intervening SELFDESTRUCT.

Concrete repro in TestDeleteRecreateSlotsAcrossManyBlocks block 1 under EXEC3_PARALLEL=true GOMAXPROCS=2 -race: TX3 incarnation 1 reads CodeHashPath for the CREATE2 collision check. TX3 inc 0 has already written BalancePath at (txIndex=3, inc=0). The revival check finds that write at txIndex=3 > destructTxIndex=2 (TX2's SD), declares revival, and bypasses the SD short-circuit. CodeHashPath then reads TX1's stale hash(aaCode) from the versionMap, CREATE2 fails with ContractAddressCollision, BB consumes the failed-child gas (gas remaining after CREATE2: 733 vs 2335 on success), and the per-tx gas accounting comes out exactly 1602 higher than the header.

Fix

versionMap.Read uses floor(txIdx-1), so it already excludes the current tx's own writes. LatestTxIndex's bound is inclusive — that's the inconsistency. Subtract one explicitly at the callsite: revivalLimit = txIndex - 1. Applied symmetrically in versionedRead and versionedStateReader.ReadAccountData.

Test plan

  • make lint clean.
  • Local stress: GOMAXPROCS=2 EXEC3_PARALLEL=true go test -race -count=500 -run TestDeleteRecreateSlotsAcrossManyBlocks ./execution/tests/500/500 pass (vs ~15% failure rate baseline).
  • No regressions on the SD/recreate + broader blockchain family at -count=2 -race: TestSelfDestructReceive, TestDeleteRecreateSlots, TestDeleteRecreateAccount, TestDeleteCreateRevert, TestCVE2020_26265, TestEIP161AccountRemoval, TestModes, plus the reorg/EIP-155/EIP-2718/EIP-1559 tests.
  • CI green on race-tests / tests-linux (execution-tests).

🤖 Generated with Claude Code

@yperbasis yperbasis requested a review from mh0lt as a code owner May 19, 2026 16:39
@yperbasis yperbasis marked this pull request as draft May 19, 2026 16:48
@yperbasis yperbasis requested a review from Copilot May 19, 2026 18:48

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@yperbasis

Copy link
Copy Markdown
Member Author

500-iter stress: 500/500 PASS on TestDeleteRecreateSlotsAcrossManyBlocks with both commits. (GOMAXPROCS=2 EXEC3_PARALLEL=true go test -race -count=500 ./execution/tests/ -run TestDeleteRecreateSlotsAcrossManyBlocks.)

@yperbasis yperbasis marked this pull request as ready for review May 19, 2026 18:55
@yperbasis yperbasis requested a review from taratorio May 19, 2026 18:55
@yperbasis yperbasis changed the title execution/state: fix stale-field reads on CREATE recreate after intervening SD execution/state: fix parallel-exec stale reads across SD/recreate May 19, 2026
@yperbasis yperbasis changed the title execution/state: fix parallel-exec stale reads across SD/recreate execution/state: fix parallel-exec stale reads across SELFDESTRUCT/recreate May 19, 2026
The SD revival check in both versionedRead's SD short-circuit and
versionedStateReader.ReadAccountData used s.txIndex / vr.txIndex as
the upper bound for LatestTxIndex. versionMap.LatestTxIndex treats
that bound as INCLUSIVE — so when a tx re-executes (incarnation
N+1), the new incarnation sees writes from incarnation N at the same
TxIndex, and the revival check declares a spurious revival across
any intervening SELFDESTRUCT.

Concrete repro in TestDeleteRecreateSlotsAcrossManyBlocks block 1
under EXEC3_PARALLEL=true GOMAXPROCS=2 -race: TX3 incarnation 1
reads CodeHashPath for the CREATE2 collision check. TX3 inc 0 has
already written BalancePath at (txIndex=3, inc=0). The revival
check finds that write at txIndex=3 > destructTxIndex=2 (TX2's SD),
declares revival, and bypasses the SD short-circuit. CodeHashPath
then reads TX1's stale hash(aaCode) from the versionMap, CREATE2
fails with ContractAddressCollision, BB consumes the failed-child
gas, and the per-tx gas accounting comes out exactly 1602 higher
than the header.

versionMap.Read uses floor(txIdx-1), so it already excludes the
current tx's own writes. LatestTxIndex's bound is inclusive, so the
fix is to subtract one explicitly at the callsite:
revivalLimit = txIndex - 1. Applied symmetrically in versionedRead
and the parallel-tx reader.

Test plan: GOMAXPROCS=2 EXEC3_PARALLEL=true go test -race -count=500
-run TestDeleteRecreateSlotsAcrossManyBlocks ./execution/tests/ —
500/500 pass (vs ~15% failure rate baseline). No regressions on the
SD/recreate + reorg test family at -count=2 -race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis force-pushed the yperbasis/fix-exec3-parallel-tests branch from 441901a to fdc0b84 Compare May 19, 2026 21:44
@yperbasis yperbasis changed the title execution/state: fix parallel-exec stale reads across SELFDESTRUCT/recreate execution/state: exclude own-tx writes from SD revival check May 19, 2026
@mh0lt mh0lt added this pull request to the merge queue May 20, 2026
Merged via the queue into main with commit e3f36d9 May 20, 2026
69 checks passed
@mh0lt mh0lt deleted the yperbasis/fix-exec3-parallel-tests branch May 20, 2026 05:38
mh0lt added a commit that referenced this pull request May 20, 2026
b23da1a changed getStateObject to (a) seed refreshVersionedAccount
with UnknownVersion and (b) drop the synthetic accountRead AddressPath
recording, to quell validator path==AddressPath false positives behind
the SD-recreate flake fix.

Both are now redundant: the merged main carries #21286 (exclude own-tx
writes from the SD revival check), which independently covers the
+1602 / CREATE2-collision flake mode, and the validator already keys
on the precise IncarnationPath signal. They are also harmful — without
the accountRead recording the account is missing from the parallel-
built block access list, so selfdestruct/create-destruct family blocks
fail "block access list mismatch" under the parallel runner (serial
unaffected).

Restoring the original recording clears ~102 EEST bal-devnet
(Amsterdam) failures. TestDeleteRecreateSlotsAcrossManyBlocks stays
200/200 under GOMAXPROCS=2 (100 with -race) — the flake fix is carried
by #21286 and b23da1a's validator/scheduler changes.
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request May 20, 2026
…tives (erigontech#21294)

## Summary

`TestDeleteRecreateSlotsAcrossManyBlocks` flaked ~3–5% under
`GOMAXPROCS=2 -race` with `ERIGON_EXEC3_PARALLEL=true`, in two modes: a
+1602 gas mismatch (a CREATE2 collision burned inner gas on a stale
CodeHash read) and a runaway validator-invalid retry loop. Several
distinct validator/recording false positives contributed; this PR fixes
them in turn.

- **`getStateObject`** initialised `accountVersion` to `sdb.Version()`
(the worker's own incarnation), gating out every prior-tx sub-read
update in `refreshVersionedAccount` and pinning source/version to a
value vm never wrote. Switched to `UnknownVersion`.
- **`accountRead`** synthesized an AddressPath read using the version
that flowed through `refreshVersionedAccount` — usually a sub-read's
(Balance/Nonce/etc) version. The validator's `path==AddressPath` checks
then fired against an entry vm never wrote. Removed the synthetic
recording; sub-reads are already recorded by `versionedRead`.
- **`versionedRead` MVReadResultNone**: exclude AddressPath from the
`readStorage==nil` probe recording. `getStateObject` probes vm then
falls back to the stateReader (account does exist); the "we read
nothing" recording made the validator's path==Address branch invalidate
any pre-existing account that a prior tx in the block wrote BalancePath
for.
- **`validateRead`** wrapped in `validateReadImpl` with a `recursive`
flag. Two false-positive paths in the cross-validate recursion produce
spurious Invalid when invoked as a synthetic probe (readVal nil, source
borrowed): the path==AddressPath BalancePath-presence check, and the
`source!=MapRead+nil-readVal` tiebreaker. Both gated on `!recursive`.
- **`validateRead`** also gained an SD-staleness cross-check in the
`MVReadResultDone` branch: a later tx self-destructing without revival
means the read predates the destruct and the serial path returns zero
via the SD-zero short-circuit; `checkVersion` alone misses this because
SD doesn't write the read's own path. `revivalLimit = txIndex - 1`
excludes our own writes (validation runs post-flush).
- **Deferred queue**: on ErrDependency with no effective blocker, OR
validator-invalid, push to a new `deferred[]` queue instead of
`pending`. `scheduleExecution` drains a deferred tx only when its
predecessor has validated AND no worker at a lower index is in flight.
Lower-indexed workers' flushes land at indices visible to N's reads via
`vm.Read`'s `floor(N-1)`; higher-indexed ones don't. Non-deferred txs
keep dispatching speculatively. A safety net force-drains when pending
is empty and no workers are in flight.
- **`tooManyRetries`** helper + cap on validator-invalid retries
(mirrors the worker-error path) so a genuine loop can't spin forever.

erigontech#21286 (yperbasis) fixes the same `revivalLimit = txIndex - 1` insight
in the runtime SD short-circuit (`versionedRead` +
`versionedStateReader.ReadAccountData`); this PR fixes the
validator/recording side. The two are complementary — the runtime fix
prevents the bad read at source, this PR prevents the validator from
generating false-positive invalidations on legitimate reads.

## Test plan

- [x] `make lint` clean (run multiple times).
- [x] 200/200 pass on `TestDeleteRecreateSlotsAcrossManyBlocks` under
`GOMAXPROCS=2 ERIGON_EXEC3_PARALLEL=true go test -race` (vs ~95-97/100
baseline).
- [x] 100/100 pass on the same test rebased onto a clean `origin/main`.
- [x] No EEST regression: `cancun` blocktests (8389) + `eip4844_blobs`
race-build (6991) + the previously-affected
`test_sufficient_balance_blob_tx_pre_fund_tx` (864) all pass under the
12-worker parallel runner.
- [ ] CI green on race-tests + EEST parallel shards.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants