execution/stagedsync: recover CodePath alongside CodeHashPath in parallel normalizeWriteSet#21706
execution/stagedsync: recover CodePath alongside CodeHashPath in parallel normalizeWriteSet#21706mh0lt wants to merge 4 commits into
Conversation
…WriteSet
normalizeWriteSet recovered an account's CodeHashPath from the versionMap (via
the CodeHashPath case and the fill-missing-fields loop) but had no equivalent
recovery for CodePath: the CodePath case kept the write only at the validated
incarnation, and the fill loop never emitted code. A tx whose validated
writeset lacked a fresh CodePath — e.g. an EIP-7702 delegating tx that
re-executes, where SetCode short-circuits because so.Code() already returns the
designator written by the prior incarnation (bytes.Equal(prevcode, code)) —
therefore persisted a non-empty codeHash with no code bytes. A later block then
read empty code for the delegated account, and the EIP-3607 sender check
wrongly rejected the 7702 sender ("sender not an eoa").
Recover the code this tx wrote from the versionMap (incarnation-agnostic,
scoped to this tx so a merely-touched contract's prior-tx code is not
re-emitted) whenever an account has a non-empty codeHash but no code in the
normalized output, mirroring CodeHashPath. Code can no longer be lost while its
hash survives.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The earlier recovery only re-emitted code found in THIS tx's versionMap (rr.Version().TxIndex == txIndex). On the real failure path that guard misses: a re-executing 7702 delegation whose code equals the already- committed designator makes IBS.SetCode short-circuit (bytes.Equal), so the validated incarnation writes no CodePath and the prior incarnation's versionMap entry is invalidated on re-exec — the versionMap holds nothing for this tx. The fill-missing loop still fills CodeHashPath from committed state, so the account persists a codeHash with no code; a later 7702 sender then reads empty code and is wrongly rejected "sender not an eoa" (observed re-executing mainnet blocks 25277235 / 25279079 / 25280960). Recover the designator from the versionMap, else fall back to the post-state via stateReader.ReadAccountCode (mirroring how CodeHashPath is recovered). Gate emission on types.ParseDelegation so only 7702 designators are re-emitted — never ordinary unchanged contract code for a touched contract (no write amplification, no callee-code misattribution). This prevents the drop during forward execution. It cannot repair state already collated into immutable snapshots with codeHash-but-no-code; that needs a snapshot unwind (separate, in development). Adds TestNormalizeWriteSet_CodePathRecoveredFromStateReader for the short-circuit/stateReader path; the existing versionMap-path test stays. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Update: on-node reproduction + strengthened fixRunning the perf-stack binary on a cold mainnet minimal resync reproduced this bug on re-execution — Root cause of the gap: the recovery only re-emitted code present in this tx's versionMap ( Strengthened fix ( Scope: this prevents the drop during forward execution. It cannot repair state already collated into immutable snapshots with codeHash-but-no-code — that requires a snapshot unwind (separate work, in development). Full forward validation therefore needs a fresh resync with this fix in place (the existing corrupt datadir can't validate it). |
There was a problem hiding this comment.
Pull request overview
Fixes an intermittent parallel-execution consistency gap in normalizeWriteSet where an address can end up with a non-empty CodeHashPath persisted without its companion CodePath, causing EIP-7702 delegated EOAs to be misclassified (surfacing as false EIP-3607 “sender not an eoa” rejections).
Changes:
- Add a recovery pass in
normalizeWriteSetto re-emit missingCodePathwhen a non-emptyCodeHashPathis present (and the code parses as an EIP-7702 delegation designator). - Add unit tests covering both version-map recovery and
stateReaderfallback recovery scenarios for the dropped-CodePathcase.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
execution/stagedsync/exec3_parallel.go |
Adds CodePath recovery logic to keep delegation designator code aligned with its emitted CodeHashPath during parallel normalize. |
execution/stagedsync/exec3_finalize_test.go |
Adds regression tests ensuring CodePath is recovered alongside CodeHashPath for EIP-7702 delegation scenarios. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if _, ok := types.ParseDelegation(code); !ok { | ||
| continue | ||
| } |
There was a problem hiding this comment.
if err != nil \|\| accounts.InternCodeHash(hh) != h {
this pattern looks very weird. don't know what it means
There was a problem hiding this comment.
Fixed by 953389b.
emit now verifies the recovered code against the emitted hash before re-emitting — crypto.Keccak256Hash(code) != want.Value() → skip — on both the versionMap and stateReader recovery paths, so a CodePath is only re-added when the code genuinely hashes to the CodeHashPath it is recovered for. Recovery can therefore never persist an account whose code disagrees with its codeHash.
Test: TestNormalizeWriteSet_CodePathRecoveryRejectsHashMismatch (constructs a candidate whose keccak ≠ the recovered hash and asserts no CodePath is emitted).
There was a problem hiding this comment.
The shipped check is a direct value comparison — no intern:
if crypto.Keccak256Hash(code) != want.Value() { return false }This leans on the trade-off unique is built around: Make pays a lookup once (and at most one canonical allocation per distinct value, shared thereafter) so that every read and comparison afterward is cheap and allocation-free. CodeHash is a unique.Handle[common.Hash], so a handle compare is a word compare and .Value() is an allocation-free read of the canonical value — no Make, no map lookup.
Escape analysis confirms it (go build -gcflags=-m):
exec3_parallel.go: inlining call to accounts.CodeHash.Value
exec3_parallel.go: inlining call to unique.Handle[go.shape.[32]uint8].Value
.Value() (and the underlying unique.Handle.Value) inline to a direct read — no escapes to heap / moved to heap note anywhere in the closure, and crypto.Keccak256Hash(code) is a stack [32]byte that's compared and discarded.
So InternCodeHash(hh) != h is the wrong side of the trade here: it re-pays the Make lookup to turn an already-cheap value compare into a handle compare — useful when you'll store and reuse the handle, pure overhead for a one-shot check. Reading .Value() and comparing is the intended path, and it allocates nothing.
yperbasis
left a comment
There was a problem hiding this comment.
Requesting changes on three points (plus a style one):
-
Verify recovered code against the emitted hash before re-emitting. The hash and the code can come from different sources (
vm.Readat ceilingtxIndex+1can return another tx's write; the stateReader current-view races the publish loop's apply progress). On disagreement this persists code that doesn't match the persisted codeHash — and CodeDomain isn't covered by the state root, so a wrong designator target would silently misdirect later execution, worse than today's loudsender not an eoa. A keccak over ≤23 bytes, skipping emission on mismatch, closes this. (Same gap as Copilot's inline comment.) -
Narrow the trigger. The fill-missing loop emits
CodeHashPathfor every touched account, so the recovery runs for every touched non-empty-code account on the serialized result-processing path: a fullstateReader.ReadAccountCodeper storage-touched contract per tx just to discard viaParseDelegation, and a redundantCodePathre-emit for every tx touching a delegated EOA (deduped at DomainPut, but still TouchKey + GetLatest + writeLog per touch — constant on post-Pectra mainnet). Suggest gating recovery on the address having aCodePath/CodeHashPathentry (any incarnation) in the raw writeset: a drop requires this tx to have written code; when the hash comes purely from the fill loop, the code is already committed in CodeDomain (or unrecoverable anyway). With that, recovery should ~never fire in production — so also add a log when it does; a silent safety net hides the upstream scheduler bug it catches (the[codepath-recovery]log mentioned in the PR comments isn't in this diff). -
Correct the mechanism claims in the comments/test docstrings. The premise of
TestNormalizeWriteSet_CodePathTravelsWithCodeHash— "blockIO.WriteSet retains both incarnations' entries (versionMap doesn't clear old)" — is contradicted by the pipeline:RecordWritesreplaces the per-tx write set (versionedio.go:1321) and re-exec shrinkage deletes dropped paths from the versionMap (exec3_parallel.go:2331); IBS emits CodePath/CodeHashPath/CodeSizePath atomically. The "short-circuit against an already-committed designator" path also produces no corruption — the committed designator bytes are already in CodeDomain, which is why the narrowed gate loses nothing. The tests are fine as robustness tests of normalizeWriteSet's contract against arbitrary inputs, but the docstrings shouldn't present a mechanism the scheduler prevents. If the trigger is narrowed per point 2,TestNormalizeWriteSet_CodePathRecoveredFromStateReadershould assert no emission instead. -
Comment style: the new multi-paragraph block comments are well over the repo's comment limits — condense to one or two sentences stating the invariant; the scenario forensics belong in the PR description.
…perf-statecache-lru-pr Reconcile #21386 onto the cleaned commitment-cache model from #21380. Resolution: - branch_cache.go (+test), temporal_mem_batch.go, kv_interface.go -> take #21380's cleaned model: reduced BranchCache API, txN-watermark UnwindTo, FlushOption pattern, flush-callback-after-MDBX-write. - preload*.go, trunk_pin_test.go -> deleted (trunk-pin extracted to mh/branch-cache-trunk-pin). - domain_shared.go -> combine: keep #21386's StateCache refresh, keccak code-cache fix, codeHash->code read bypass, per-worker kvmetrics (wm) and the collector/reqMetrics fields; adopt #21380's cleaned FlushOption multi-domain callback (cb-after-MDBX-write), MeteredGetterWithTxN watermark + txN=0 skip; drop the extracted adaptive-pin controller + PublishMetrics. Notable behavioural deltas (flagged for review): - BranchCache unwind: #21386's epoch/unwindFloor model -> #21380's txN-watermark UnwindTo (same effect: evict entries above the unwind point). - StateCache flush entries now stamped with sd.txNum (batch high-water) as the unwind watermark: the cleaned WithFlushCallback exposes step, not per-key txN; sd.txNum is a safe conservative upper bound. - temporal_mem_batch.go DomainMetrics refs retargeted changeset -> kvmetrics. Carries: keccak codeHash fix, #21706 CodePath recovery, stateCache. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… not just 7702 The CodePath recovery in normalizeWriteSet was bounded to EIP-7702 designators (types.ParseDelegation), so an ordinary CREATE/CREATE2 contract whose validated incarnation's SetCode short-circuited (re-execution: so.Code() already returns the prior incarnation's bytes) had its stale-incarnation CodePath dropped by the incarnation filter while CodeHashPath survived — persisting a non-empty codeHash with no code (codeHash-no-code). A later block that CALLs the contract then runs it as empty and diverges: mainnet 25291004 deploys a SafeProxyFactory CREATE2 proxy whose code is dropped; block 25297789 calls it, the call reverts on the missing code, and the block fails with a gas mismatch (−346,536 gas) — a deterministic wedge under both parallel and serial re-exec. Generalize the recovery: when the versionMap holds CodePath for the address (the code was written in THIS block — a deploy, code change, or 7702 designator), recover it unconditionally; it is always genuine in-block code, never an unchanged contract's bytes. Keep the ParseDelegation gate only on the stateReader fallback (versionMap miss), since stateReader also returns an unchanged contract's existing code and re-emitting that for every touched contract would be write amplification. Forward-prevention only — cannot repair codeHash-no-code already collated into immutable snapshot files (that needs a snapshot unwind). Adds TestNormalizeWriteSet_CodePathRecoveredForCreatedContract (ordinary bytecode, versionMap-hit path; fails under the old 7702-only gate). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…comments Addresses review on #21706: - Copilot: the CodePath recovery now validates that the recovered code hashes to the CodeHashPath it is recovered for; a mismatch is skipped rather than emitted, so recovery can never write code that disagrees with its hash. - AskAlexSharov: drop the cryptic InternCodeHash(hh) != h comparison — compare crypto.Keccak256Hash(code) against the codeHash's raw Value() (no interning, no heap escape). - Comment policy: trim the recovery comment to the invariant + gating rationale; the forward-only limitation and the mainnet reproduction move to the PR body. New test TestNormalizeWriteSet_CodePathRecoveryRejectsHashMismatch pins the reject-on-mismatch guard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Recover
CodePathalongsideCodeHashPathin the parallel executor'snormalizeWriteSet, so an account can never be persisted with a non-emptycodeHashbut no code bytes (codeHash-no-code).Root cause
On re-execution,
SetCodeshort-circuits (so.Code()already returns the priorincarnation's bytes,
bytes.Equal), so the validated incarnation emits no freshCodePath.normalizeWriteSetkeepsCodeHashPath(resolved from the versionmap / filled from committed state) but the incarnation filter drops the stale
CodePath— leaving a codeHash with no code. A later block that CALLs thecontract executes it as empty and diverges (wrong gas → wrong root). Two ways the
fresh
CodePathgoes missing:deployed via the SafeProxyFactory at block 25291004, CALLed and reverted
6785 blocks later at 25297789.
"sender not an eoa".
Fix
After the fill-missing pass in
normalizeWriteSet: for any address with anon-empty
CodeHashPathin the output but noCodePath(and notself-destructed), recover the code so code always travels with its hash —
contract (genuine in-block code).
stateReader, bounded to7702 designators (
stateReaderalso returns an unchanged contract's existingcode, and re-emitting that for every touched contract is write amplification with
no correctness benefit; an unchanged contract's code is already in
CodeDomain).The recovered code is validated to hash to the
codeHashit is recovered for(
crypto.Keccak256Hash(code) == codeHash.Value()); a mismatch is skipped, sorecovery never writes code that disagrees with its hash.
Tests
TestNormalizeWriteSet_CodePathTravelsWithCodeHash— 7702 designator via theversionMap.
TestNormalizeWriteSet_CodePathRecoveredFromStateReader— committed 7702designator via the
stateReaderfallback.TestNormalizeWriteSet_CodePathRecoveredForCreatedContract— ordinary CREATE2contract (the SafeProxyFactory case).
TestNormalizeWriteSet_CodePathRecoveryRejectsHashMismatch— code whose keccak≠ the recovered hash is rejected, not emitted.
All fail-without / pass-with the fix on
main.Scope and limitations
Forward-prevention only. This stops new codeHash-no-code from being written;
it cannot repair codeHash-no-code already collated into immutable snapshot
.kvfiles — that requires a snapshot unwind.
Targets
main; a cherry-pick torelease/3.5will follow (notrelease/3.4,which does not run parallel execution by default).