Skip to content

execution/stagedsync: recover CodePath alongside CodeHashPath in parallel normalizeWriteSet#21706

Open
mh0lt wants to merge 4 commits into
mainfrom
mh/parallel-codepath-recovery
Open

execution/stagedsync: recover CodePath alongside CodeHashPath in parallel normalizeWriteSet#21706
mh0lt wants to merge 4 commits into
mainfrom
mh/parallel-codepath-recovery

Conversation

@mh0lt

@mh0lt mh0lt commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Recover CodePath alongside CodeHashPath in the parallel executor's
normalizeWriteSet, so an account can never be persisted with a non-empty
codeHash but no code bytes (codeHash-no-code).

Root cause

On re-execution, SetCode short-circuits (so.Code() already returns the prior
incarnation's bytes, bytes.Equal), so the validated incarnation emits no fresh
CodePath. normalizeWriteSet keeps CodeHashPath (resolved from the version
map / filled from committed state) but the incarnation filter drops the stale
CodePath — leaving a codeHash with no code. A later block that CALLs the
contract executes it as empty and diverges (wrong gas → wrong root). Two ways the
fresh CodePath goes missing:

  • CREATE/CREATE2 of an ordinary contract — reproduced on mainnet: a Safe proxy
    deployed via the SafeProxyFactory at block 25291004, CALLed and reverted
    6785 blocks later at 25297789.
  • EIP-7702 delegating tx — the delegated sender then fails EIP-3607 as
    "sender not an eoa".

Fix

After the fill-missing pass in normalizeWriteSet: for any address with a
non-empty CodeHashPath in the output but no CodePath (and not
self-destructed), recover the code so code always travels with its hash —

  • versionMap hit → the code was written in this block; recover it for any
    contract (genuine in-block code).
  • versionMap miss → fall back to committed state via stateReader, bounded to
    7702 designators (stateReader also returns an unchanged contract's existing
    code, and re-emitting that for every touched contract is write amplification with
    no correctness benefit; an unchanged contract's code is already in CodeDomain).

The recovered code is validated to hash to the codeHash it is recovered for
(crypto.Keccak256Hash(code) == codeHash.Value()); a mismatch is skipped, so
recovery never writes code that disagrees with its hash.

Tests

  • TestNormalizeWriteSet_CodePathTravelsWithCodeHash — 7702 designator via the
    versionMap.
  • TestNormalizeWriteSet_CodePathRecoveredFromStateReader — committed 7702
    designator via the stateReader fallback.
  • TestNormalizeWriteSet_CodePathRecoveredForCreatedContract — ordinary CREATE2
    contract (the SafeProxyFactory case).
  • TestNormalizeWriteSet_CodePathRecoveryRejectsHashMismatch — code whose keccak
    ≠ the recovered hash is rejected, not emitted.

All fail-without / pass-with the fix on main.

Scope and limitations

Forward-prevention only. This stops new codeHash-no-code from being written;
it cannot repair codeHash-no-code already collated into immutable snapshot .kv
files — that requires a snapshot unwind.

Targets main; a cherry-pick to release/3.5 will follow (not release/3.4,
which does not run parallel execution by default).

…WriteSet

normalizeWriteSet recovered an account's CodeHashPath from the versionMap (via
the CodeHashPath case and the fill-missing-fields loop) but had no equivalent
recovery for CodePath: the CodePath case kept the write only at the validated
incarnation, and the fill loop never emitted code. A tx whose validated
writeset lacked a fresh CodePath — e.g. an EIP-7702 delegating tx that
re-executes, where SetCode short-circuits because so.Code() already returns the
designator written by the prior incarnation (bytes.Equal(prevcode, code)) —
therefore persisted a non-empty codeHash with no code bytes. A later block then
read empty code for the delegated account, and the EIP-3607 sender check
wrongly rejected the 7702 sender ("sender not an eoa").

Recover the code this tx wrote from the versionMap (incarnation-agnostic,
scoped to this tx so a merely-touched contract's prior-tx code is not
re-emitted) whenever an account has a non-empty codeHash but no code in the
normalized output, mirroring CodeHashPath. Code can no longer be lost while its
hash survives.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mh0lt mh0lt requested a review from yperbasis as a code owner June 9, 2026 15:32
The earlier recovery only re-emitted code found in THIS tx's versionMap
(rr.Version().TxIndex == txIndex). On the real failure path that guard
misses: a re-executing 7702 delegation whose code equals the already-
committed designator makes IBS.SetCode short-circuit (bytes.Equal), so
the validated incarnation writes no CodePath and the prior incarnation's
versionMap entry is invalidated on re-exec — the versionMap holds nothing
for this tx. The fill-missing loop still fills CodeHashPath from committed
state, so the account persists a codeHash with no code; a later 7702
sender then reads empty code and is wrongly rejected "sender not an eoa"
(observed re-executing mainnet blocks 25277235 / 25279079 / 25280960).

Recover the designator from the versionMap, else fall back to the
post-state via stateReader.ReadAccountCode (mirroring how CodeHashPath is
recovered). Gate emission on types.ParseDelegation so only 7702
designators are re-emitted — never ordinary unchanged contract code for a
touched contract (no write amplification, no callee-code misattribution).

This prevents the drop during forward execution. It cannot repair state
already collated into immutable snapshots with codeHash-but-no-code; that
needs a snapshot unwind (separate, in development).

Adds TestNormalizeWriteSet_CodePathRecoveredFromStateReader for the
short-circuit/stateReader path; the existing versionMap-path test stays.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mh0lt

mh0lt commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Update: on-node reproduction + strengthened fix

Running the perf-stack binary on a cold mainnet minimal resync reproduced this bug on re-execution — sender not an eoa at blocks 25277235, 25279079, and 25280960 (different addresses each time). The original recovery did not fire on these ([codepath-recovery] silent), which exposed a gap:

Root cause of the gap: the recovery only re-emitted code present in this tx's versionMap (rr.Version().TxIndex == txIndex). The real failure path is a re-executing 7702 delegation whose code equals the already-committed designator → IBS.SetCode short-circuits (bytes.Equal) → the validated incarnation writes no CodePath, and the prior incarnation's versionMap entry is invalidated on re-exec. So the versionMap has nothing for this tx; the fill-missing loop still fills CodeHashPath from committed state → codeHash persists with no code.

Strengthened fix (7a84cb413d): recover the designator from the versionMap, else fall back to stateReader.ReadAccountCode (mirroring how CodeHashPath is recovered), and gate emission on types.ParseDelegation so only 7702 designators are re-emitted — never ordinary unchanged contract code (no write amplification, no callee misattribution). New TestNormalizeWriteSet_CodePathRecoveredFromStateReader covers the stateReader path.

Scope: this prevents the drop during forward execution. It cannot repair state already collated into immutable snapshots with codeHash-but-no-code — that requires a snapshot unwind (separate work, in development). Full forward validation therefore needs a fresh resync with this fix in place (the existing corrupt datadir can't validate it).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an intermittent parallel-execution consistency gap in normalizeWriteSet where an address can end up with a non-empty CodeHashPath persisted without its companion CodePath, causing EIP-7702 delegated EOAs to be misclassified (surfacing as false EIP-3607 “sender not an eoa” rejections).

Changes:

  • Add a recovery pass in normalizeWriteSet to re-emit missing CodePath when a non-empty CodeHashPath is present (and the code parses as an EIP-7702 delegation designator).
  • Add unit tests covering both version-map recovery and stateReader fallback recovery scenarios for the dropped-CodePath case.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
execution/stagedsync/exec3_parallel.go Adds CodePath recovery logic to keep delegation designator code aligned with its emitted CodeHashPath during parallel normalize.
execution/stagedsync/exec3_finalize_test.go Adds regression tests ensuring CodePath is recovered alongside CodeHashPath for EIP-7702 delegation scenarios.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread execution/stagedsync/exec3_parallel.go Outdated
Comment on lines +3453 to +3455
if _, ok := types.ParseDelegation(code); !ok {
continue
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if err != nil \|\| accounts.InternCodeHash(hh) != h {

this pattern looks very weird. don't know what it means

@mh0lt mh0lt Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 953389b.

emit now verifies the recovered code against the emitted hash before re-emitting — crypto.Keccak256Hash(code) != want.Value() → skip — on both the versionMap and stateReader recovery paths, so a CodePath is only re-added when the code genuinely hashes to the CodeHashPath it is recovered for. Recovery can therefore never persist an account whose code disagrees with its codeHash.

Test: TestNormalizeWriteSet_CodePathRecoveryRejectsHashMismatch (constructs a candidate whose keccak ≠ the recovered hash and asserts no CodePath is emitted).

@mh0lt mh0lt Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shipped check is a direct value comparison — no intern:

if crypto.Keccak256Hash(code) != want.Value() { return false }

This leans on the trade-off unique is built around: Make pays a lookup once (and at most one canonical allocation per distinct value, shared thereafter) so that every read and comparison afterward is cheap and allocation-free. CodeHash is a unique.Handle[common.Hash], so a handle compare is a word compare and .Value() is an allocation-free read of the canonical value — no Make, no map lookup.

Escape analysis confirms it (go build -gcflags=-m):

exec3_parallel.go: inlining call to accounts.CodeHash.Value
exec3_parallel.go: inlining call to unique.Handle[go.shape.[32]uint8].Value

.Value() (and the underlying unique.Handle.Value) inline to a direct read — no escapes to heap / moved to heap note anywhere in the closure, and crypto.Keccak256Hash(code) is a stack [32]byte that's compared and discarded.

So InternCodeHash(hh) != h is the wrong side of the trade here: it re-pays the Make lookup to turn an already-cheap value compare into a handle compare — useful when you'll store and reuse the handle, pure overhead for a one-shot check. Reading .Value() and comparing is the intended path, and it allocates nothing.

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes on three points (plus a style one):

  1. Verify recovered code against the emitted hash before re-emitting. The hash and the code can come from different sources (vm.Read at ceiling txIndex+1 can return another tx's write; the stateReader current-view races the publish loop's apply progress). On disagreement this persists code that doesn't match the persisted codeHash — and CodeDomain isn't covered by the state root, so a wrong designator target would silently misdirect later execution, worse than today's loud sender not an eoa. A keccak over ≤23 bytes, skipping emission on mismatch, closes this. (Same gap as Copilot's inline comment.)

  2. Narrow the trigger. The fill-missing loop emits CodeHashPath for every touched account, so the recovery runs for every touched non-empty-code account on the serialized result-processing path: a full stateReader.ReadAccountCode per storage-touched contract per tx just to discard via ParseDelegation, and a redundant CodePath re-emit for every tx touching a delegated EOA (deduped at DomainPut, but still TouchKey + GetLatest + writeLog per touch — constant on post-Pectra mainnet). Suggest gating recovery on the address having a CodePath/CodeHashPath entry (any incarnation) in the raw writeset: a drop requires this tx to have written code; when the hash comes purely from the fill loop, the code is already committed in CodeDomain (or unrecoverable anyway). With that, recovery should ~never fire in production — so also add a log when it does; a silent safety net hides the upstream scheduler bug it catches (the [codepath-recovery] log mentioned in the PR comments isn't in this diff).

  3. Correct the mechanism claims in the comments/test docstrings. The premise of TestNormalizeWriteSet_CodePathTravelsWithCodeHash — "blockIO.WriteSet retains both incarnations' entries (versionMap doesn't clear old)" — is contradicted by the pipeline: RecordWrites replaces the per-tx write set (versionedio.go:1321) and re-exec shrinkage deletes dropped paths from the versionMap (exec3_parallel.go:2331); IBS emits CodePath/CodeHashPath/CodeSizePath atomically. The "short-circuit against an already-committed designator" path also produces no corruption — the committed designator bytes are already in CodeDomain, which is why the narrowed gate loses nothing. The tests are fine as robustness tests of normalizeWriteSet's contract against arbitrary inputs, but the docstrings shouldn't present a mechanism the scheduler prevents. If the trigger is narrowed per point 2, TestNormalizeWriteSet_CodePathRecoveredFromStateReader should assert no emission instead.

  4. Comment style: the new multi-paragraph block comments are well over the repo's comment limits — condense to one or two sentences stating the invariant; the scenario forensics belong in the PR description.

mh0lt added a commit that referenced this pull request Jun 11, 2026
…perf-statecache-lru-pr

Reconcile #21386 onto the cleaned commitment-cache model from #21380.

Resolution:
- branch_cache.go (+test), temporal_mem_batch.go, kv_interface.go -> take
  #21380's cleaned model: reduced BranchCache API, txN-watermark UnwindTo,
  FlushOption pattern, flush-callback-after-MDBX-write.
- preload*.go, trunk_pin_test.go -> deleted (trunk-pin extracted to
  mh/branch-cache-trunk-pin).
- domain_shared.go -> combine: keep #21386's StateCache refresh, keccak
  code-cache fix, codeHash->code read bypass, per-worker kvmetrics (wm) and
  the collector/reqMetrics fields; adopt #21380's cleaned FlushOption
  multi-domain callback (cb-after-MDBX-write), MeteredGetterWithTxN watermark
  + txN=0 skip; drop the extracted adaptive-pin controller + PublishMetrics.

Notable behavioural deltas (flagged for review):
- BranchCache unwind: #21386's epoch/unwindFloor model -> #21380's
  txN-watermark UnwindTo (same effect: evict entries above the unwind point).
- StateCache flush entries now stamped with sd.txNum (batch high-water) as the
  unwind watermark: the cleaned WithFlushCallback exposes step, not per-key
  txN; sd.txNum is a safe conservative upper bound.
- temporal_mem_batch.go DomainMetrics refs retargeted changeset -> kvmetrics.

Carries: keccak codeHash fix, #21706 CodePath recovery, stateCache.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mh0lt and others added 2 commits June 12, 2026 11:21
… not just 7702

The CodePath recovery in normalizeWriteSet was bounded to EIP-7702 designators
(types.ParseDelegation), so an ordinary CREATE/CREATE2 contract whose validated
incarnation's SetCode short-circuited (re-execution: so.Code() already returns
the prior incarnation's bytes) had its stale-incarnation CodePath dropped by the
incarnation filter while CodeHashPath survived — persisting a non-empty codeHash
with no code (codeHash-no-code). A later block that CALLs the contract then runs
it as empty and diverges: mainnet 25291004 deploys a SafeProxyFactory CREATE2
proxy whose code is dropped; block 25297789 calls it, the call reverts on the
missing code, and the block fails with a gas mismatch (−346,536 gas) — a
deterministic wedge under both parallel and serial re-exec.

Generalize the recovery: when the versionMap holds CodePath for the address
(the code was written in THIS block — a deploy, code change, or 7702 designator),
recover it unconditionally; it is always genuine in-block code, never an
unchanged contract's bytes. Keep the ParseDelegation gate only on the
stateReader fallback (versionMap miss), since stateReader also returns an
unchanged contract's existing code and re-emitting that for every touched
contract would be write amplification.

Forward-prevention only — cannot repair codeHash-no-code already collated into
immutable snapshot files (that needs a snapshot unwind).

Adds TestNormalizeWriteSet_CodePathRecoveredForCreatedContract (ordinary
bytecode, versionMap-hit path; fails under the old 7702-only gate).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…comments

Addresses review on #21706:
- Copilot: the CodePath recovery now validates that the recovered code hashes
  to the CodeHashPath it is recovered for; a mismatch is skipped rather than
  emitted, so recovery can never write code that disagrees with its hash.
- AskAlexSharov: drop the cryptic InternCodeHash(hh) != h comparison — compare
  crypto.Keccak256Hash(code) against the codeHash's raw Value() (no interning,
  no heap escape).
- Comment policy: trim the recovery comment to the invariant + gating rationale;
  the forward-only limitation and the mainnet reproduction move to the PR body.

New test TestNormalizeWriteSet_CodePathRecoveryRejectsHashMismatch pins the
reject-on-mismatch guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants