
StateCache LRU + Mode rework (PR #2 of the perf stack) #21386

Open
mh0lt wants to merge 44 commits into mh/perf-caches-pr from mh/perf-statecache-lru-pr

Conversation

mh0lt commented May 24, 2026

Contributor

This PR ships the execution/cache LRU/Mode rework + the StateCache population commits as a follow-on to PR #21380 (State Cache Consolidation). The LRU/Mode rework was always meant to ship separately so the policy change can be reviewed independently of #21380's BranchCache work.

Important

Stacks on #21380. Base is mh/perf-caches-pr, NOT main. Merge order: #21380 → this PR.

Important

Do not merge until CI is green on both parallel and serial.

Scope — 11 commits cherry-picked from mh/all-stack

| sha (rebased) | source | subject |
|---|---|---|
| cb4443bf51 | fba4ce8999 | execution/cache, db/state/execctx: SD-transparent ethHash bypass for CodeDomain |
| d75ec41fcd | 7d0998d0db | execution/cache, db/state, execution/state: codeSizeCache for EXTCODESIZE / EXTCODEHASH |
| 77cf879d9a | cbe9044e52 | execution/exec, execution/execmodule: BlockReadAheader populates cache.StateCache |
| 67297a5dfe | f2d4c3df74 | execution/state, execution/cache: stateObject.code populate + addrToHash LRU |
| cca736e34d | 7c3e054063 | execution/cache, db/state/execctx: addr → codeHash LRU above SD |
| 2a21a81608 | c8f10544c0 | execution/exec: cachePopulatingGetter caches negative results |
| 2eea7d2c61 | d01a345062 | execution/cache: surface fill-and-freeze cliff via inserts/dropped counters |
| 576c5ade3e | 8052c84831 | execution/cache: replace GenericCache map with sharded LRU + Mode |
| 8e239f3518 | 6b785d4360 | execution/cache: STATE_CACHE_MODE env override at NewStateCache time |
| ad9f74c897 | c55128565a | execution/cache: correct the LFU rationale in Mode docstring |
| 266e2979bd | f80655f6d2 | execution/cache: reduce default cache caps to 100 MB each (bench knob) |

One commit deferred

The 12th commit on the original handoff list — 66bcc44702 (BAL-driven BlockStateCache prewarm) — has been dropped from this PR because it depends on the execution/balcache package, which is introduced by PR-A (eth/71 BAL wire protocol) off main. It will be reintroduced as a small follow-up PR once both this PR and PR-A have merged.

🤖 Generated with Claude Code

Mark Holt and others added 11 commits May 25, 2026 07:28
…CodeDomain

Adds a third map (`ethHashToCode`) to CodeCache, keyed by the 32-byte
Ethereum codeHash (keccak256). New methods `GetByEthHash` and
`PutWithEthHash` expose direct L2b access without going through the
addr→maphash→code two-level path. The byte storage duplicates L2 in the
worst case (2x code-bytes memory at the cap); accepted for the per-key
fast path on many-addrs-one-code workloads.

`SharedDomains.GetLatest(CodeDomain, ...)` consults L2b transparently:
when the addr-keyed cache misses, resolve the codeHash from the
AccountsDomain (typically warm because the EVM just loaded the account),
probe `stateCache.GetCodeByHash` before falling through to the file
accessor stack. On miss, fill both L1 and L2b via PutCodeWithHash. The
fast path is unchanged.

Workload shape this targets: many addresses sharing one codeHash
(proxies, factory-deployed clones, ERC-20 holders, OpenZeppelin
templates). Today's addr-keyed cache misses on every fresh address even
when the bytecode is already cached. With this change a single L2b
entry serves N addresses after the first population.
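
The many-addresses-one-codeHash case can be sketched as below. This is a minimal illustration, not the CodeCache implementation: `codeCache`, `getCode`, and `simulateClones` are hypothetical names, plain maps stand in for the real layers, and the hash is passed in rather than resolved from the AccountsDomain.

```go
package main

import "fmt"

// Hypothetical types for illustration; not Erigon's actual CodeCache.
type (
	Address  [20]byte
	CodeHash [32]byte
)

// codeCache keeps an addr-keyed layer plus a layer keyed by the code hash.
type codeCache struct {
	byAddr map[Address][]byte
	byHash map[CodeHash][]byte
	loads  int // number of simulated file reads
}

func newCodeCache() *codeCache {
	return &codeCache{byAddr: map[Address][]byte{}, byHash: map[CodeHash][]byte{}}
}

// getCode probes the addr layer, then the hash layer, and only then
// falls through to the (simulated) file read, filling both layers.
func (c *codeCache) getCode(addr Address, hash CodeHash, load func() []byte) []byte {
	if code, ok := c.byAddr[addr]; ok {
		return code
	}
	if code, ok := c.byHash[hash]; ok {
		return code
	}
	c.loads++
	code := load()
	c.byAddr[addr] = code
	c.byHash[hash] = code
	return code
}

// simulateClones reads the same bytecode through n distinct addresses
// and reports how many file reads were needed.
func simulateClones(n int) int {
	c := newCodeCache()
	hash := CodeHash{0xAA}
	for i := 0; i < n; i++ {
		addr := Address{byte(i), byte(i >> 8)}
		c.getCode(addr, hash, func() []byte { return []byte{0x60, 0x00} })
	}
	return c.loads
}

func main() {
	fmt.Println("file reads for 1000 clones:", simulateClones(1000))
}
```

With only the addr-keyed layer, each fresh address would pay its own read; the hash-keyed layer collapses them into one.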

Microbench results:
- BenchmarkCodeCache_GetByEthHash_Hit:       17.01 ns/op
- BenchmarkCodeCache_GetByEthHash_Miss:      15.45 ns/op
- BenchmarkCodeCache_Get_AddrLevel_Hit:      32.44 ns/op (existing)
- BenchmarkCodeCache_GetByEthHash_ManyAddrs: 17.02 ns/op

L2b hit is ~2x faster than the existing two-level addr path (one map
probe vs two), and enables hits on workloads where L1 would miss.

Cross-client research at agentspecs/cross-client-state-access-2026-05-14.md
notes geth's separate codeSizeCache as the further (geth-proven) win
for EXTCODESIZE/EXTCODEHASH and addrToHash LRU as a one-line behaviour
fix; both queued as follow-up surgical commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…SIZE / EXTCODEHASH

Adds a third caching layer to CodeCache (alongside L1 addr→maphash and L2b
ethHash→bytes): codeSizeByEthHash maps the 32-byte Ethereum codeHash to
its byte length. Tiny per-entry footprint (32B key + 8B value vs 5-10 KB
for full bytes) so the same memory budget gives ~1000x the hit surface.
Capped at 1M entries (geth core/state/database_code.go uses the same size).

EXTCODESIZE / EXTCODEHASH callers — historically the slowest opcodes on
the lab dashboard's bench — answer from a single map probe without paying
the file accessor stack cost of the full bytes. Geth-proven; cross-client
writeup at agentspecs/cross-client-state-access-2026-05-14.md notes this
as the largest single available win for the synthetic bench.

Wiring:
- CodeCache.GetCodeSizeByEthHash / PutCodeSizeByEthHash — direct accessors.
- PutWithEthHash now populates the size layer alongside L2b, so every
  bytes-load creates a future fast-path entry "for free".
- StateCache wrappers GetCodeSizeByHash / PutCodeSizeByHash.
- SharedDomains.GetCodeSize(tx, addr) — the SD-transparent fast path:
  resolve codeHash via the AccountsDomain cache chain, probe the size
  cache, then L2b, then file-read+populate. Returns (0, false, nil) for
  EOAs and no-code accounts without paying any file read.
- temporalGetter.GetCodeSize so callers reach it via the existing getter.
- ReaderV3.ReadAccountCodeSize type-asserts on a codeSizeGetter interface
  and routes through the fast path when the underlying getter supports it;
  falls back to GetLatest+len otherwise. No kv.TemporalGetter interface
  change.
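
The probe order in the wiring above (size cache, then bytes by hash, then file read and populate) might look roughly like this. It is a sketch under assumptions: `sizeCache`, `codeSize`, and the all-zero `emptyHash` sentinel are hypothetical, and the real path returns an error as a third value.

```go
package main

import "fmt"

type CodeHash [32]byte

// emptyHash stands in for "EOA or no code"; the real sentinel is an assumption here.
var emptyHash CodeHash

type sizeCache struct {
	sizes     map[CodeHash]int    // codeHash -> byte length (tiny entries)
	code      map[CodeHash][]byte // codeHash -> full bytes
	fileReads int
}

func newSizeCache() *sizeCache {
	return &sizeCache{sizes: map[CodeHash]int{}, code: map[CodeHash][]byte{}}
}

// codeSize answers from the cheapest layer that knows the hash.
func (s *sizeCache) codeSize(hash CodeHash, readFile func() []byte) (int, bool) {
	if hash == emptyHash {
		return 0, false // no code: answered without any read
	}
	if n, ok := s.sizes[hash]; ok {
		return n, true
	}
	if b, ok := s.code[hash]; ok {
		s.sizes[hash] = len(b)
		return len(b), true
	}
	s.fileReads++
	b := readFile()
	s.code[hash] = b
	s.sizes[hash] = len(b) // every bytes-load also fills the size layer
	return len(b), true
}

func main() {
	s := newSizeCache()
	read := func() []byte { return make([]byte, 24) }
	for i := 0; i < 3; i++ {
		n, ok := s.codeSize(CodeHash{1}, read)
		fmt.Println(n, ok)
	}
	fmt.Println("file reads:", s.fileReads)
}
```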

Limitation: capacity is no-op-when-full, not LRU. A separate surgical
commit will swap to real LRU eviction; mirrors the addrToHash fix queued
from the same cross-client writeup.

Tests: 3 new (PopulatedAlongsideBytes, DirectPutAndGet, EmptyHashOrNegativeIsNoOp).
All existing CodeCache tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e.StateCache

The BlockReadAheader has always prefetched BAL-listed (and access-list)
addresses' account/code/storage via a fresh ReaderV3 on a separate RoTx.
Its prefetches warmed OS page cache + RoTx cursors — disconnected from
the process-global cache.StateCache that SharedDomains.GetLatest probes
on the EVM hot path. The two layers were two separate caches; nothing
the prefetcher loaded ever reached the EVM's lookup path.

Reth's structural advantage on EXTCODESIZE-loop benches is that its
prewarm writes to the same hashmap the EVM reads from
(crates/engine/execution-cache/src/cached_state.rs:663). When EVM enters,
every BAL-listed addr's first read is a 20 ns cache probe — no file
accessor stack, no decompression CPU. PR #21128 swapped this from
mini-moka to a lock-free fixed-cache for a measured +10.8 % mgas/s.

This commit closes the equivalent gap on Erigon: a thin cache-populating
TemporalGetter wrapper writes successful reads through to cache.StateCache
as a side effect. ReaderV3 is unchanged; the wrapper sits in front. When
the prefetcher already has the codeHash from a preceding account read,
the next CodeDomain read routes through StateCache.PutCodeWithHash so
the L2b (ethHash → bytes) + size-cache layers fill alongside the bare
addr-keyed L1.

Wiring:
- BlockReadAheader.SetStateCache(*cache.StateCache) setter.
- ExecModule construction calls readAheader.SetStateCache(domainCache),
  so the same StateCache the FCU/canonical path wires onto SD is the one
  the prefetcher warms.
- cachePopulatingGetter wraps the worker's ttx; both BAL-warming and
  tx-warming paths gain the same treatment.

Fgprof on the EXTCODESIZE-EXISTING_CONTRACT-30M bench had shown 95 % of
EVM wall-clock in seg.Getter.nextPos (Huffman decompression of code
values). With this commit, every BAL-listed addr's lookup should hit
the cache and skip the file accessor stack entirely — eliminating the
dominant cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ash LRU

Two surgical commits bundled (both touch the code-read hot path):

1. IntraBlockState.GetCodeSize now loads the full bytes via
   stateReader.ReadAccountCode on first touch and populates
   stateObject.code, so subsequent same-addr EXTCODESIZE /
   EXTCODEHASH / CALL within the tx are in-struct slice-len calls
   (~50 ns), not full reader round-trips. Mirrors geth's pattern
   at core/state/state_object.go ~Code() — pay one read per addr
   per tx, free for the rest.

2. CodeCache.addrToHash switched from a no-op-when-full
   maphash.Map[versionedAddressID] to an LRU
   lru.Cache[[20]byte, versionedAddressID] (hashicorp/golang-lru/v2,
   already imported elsewhere). Cap derived from the existing byte
   budget at ~28 bytes/entry (~580 k entries for the 16 MB default).
   Fresh-address workloads (mainnet thousands of new addrs per
   block) now warm up the addr layer over time instead of silently
   dropping new entries forever; matches geth's lru.Cache at
   core/state/database_code.go.

   The hashToCode layer is unchanged (content-addressed bytes,
   immutable, byte-capped with new-entry no-op when full — the same
   semantic as before since code bytes by codeHash never change).
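
The behavioural difference in item 2 can be seen in a small sketch. The commit uses hashicorp/golang-lru/v2; to stay self-contained this example uses a hand-rolled LRU on `container/list` instead, and the names `lru`, `frozen`, and `touch` are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
)

// lru evicts the least recently used key when full.
type lru struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element holding the key
}

func newLRU(limit int) *lru {
	return &lru{limit: limit, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lru) touch(key string) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(string))
	}
	c.items[key] = c.order.PushFront(key)
}

func (c *lru) has(key string) bool { _, ok := c.items[key]; return ok }

// frozen models no-op-when-full: once at the limit, new keys are dropped.
type frozen struct {
	limit int
	items map[string]struct{}
}

func newFrozen(limit int) *frozen { return &frozen{limit: limit, items: map[string]struct{}{}} }

func (c *frozen) touch(key string) {
	if _, ok := c.items[key]; ok || len(c.items) >= c.limit {
		return
	}
	c.items[key] = struct{}{}
}

func (c *frozen) has(key string) bool { _, ok := c.items[key]; return ok }

func main() {
	l, f := newLRU(2), newFrozen(2)
	for _, addr := range []string{"a", "b", "c"} {
		l.touch(addr)
		f.touch(addr)
	}
	fmt.Println("lru has c:", l.has("c"), "frozen has c:", f.has("c"))
}
```

Under fresh-address churn the frozen variant never learns the new address, which is the silent-drop behaviour the commit removes.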

Bench on the EXTCODESIZE-EXISTING_CONTRACT-30M family: 62.34 mgas/s
(was 61.50). The marginal gain is small on this bench because BAL
prefetch already populates the cache layers; neither lever fires
heavily. The expected wins are on non-BAL workloads where
EXTCODESIZE-loop patterns repeat within a tx (#1) and
fresh-address-churn mainnet blocks fill the addr layer (#2).

Updated TestCodeCache_AddrCapacityLimit to assert LRU eviction
(was asserting no-op-when-full); the prior behaviour was the bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nethermind-style addr → 32-byte codeHash LRU sitting above
SharedDomains.codeHashForAddr. When the EVM-known codeHash for an
address has already been resolved once, subsequent lookups skip the
entire AccountsDomain chain (sd.mem → sd.parent.mem → sd.stateCache →
tx.GetLatest) and the account-RLP decode.

Wiring:
- CodeCache adds addrToEthHash *lru.Cache[[20]byte, [32]byte] sized
  to the existing addrCapacityB budget; methods GetAddrCodeHash /
  PutAddrCodeHash / DeleteAddrCodeHash.
- StateCache wrappers route to the CodeCache instance.
- SD.codeHashForAddr probes the LRU first; on miss falls through to
  the existing chain and populates on the way out (including the
  zero-hash sentinel for missing-or-EOA accounts — repeat lookups
  return immediately).
- Invalidation: SD.DomainPut for AccountsDomain drops the entry
  (CREATE / CREATE2-replace path); SD.DomainDel for AccountsDomain
  also drops the entry (SELFDESTRUCT); StateCache.RevertWithDiffset
  drops on unwind.
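
A rough sketch of the populate-on-miss and invalidate-on-write shape described in the wiring list. All names here (`hashIndex`, `codeHashFor`, `invalidate`) are hypothetical, and an unbounded map stands in for the LRU; only the caching of the zero-hash sentinel and the invalidation hook are the point.

```go
package main

import "fmt"

type (
	Address  [20]byte
	CodeHash [32]byte
)

// hashIndex caches addr -> codeHash above a slow resolver.
type hashIndex struct {
	cached  map[Address]CodeHash
	resolve func(Address) CodeHash // slow chain; zero hash for missing or EOA accounts
	slow    int                    // how many times the slow chain ran
}

func newHashIndex(resolve func(Address) CodeHash) *hashIndex {
	return &hashIndex{cached: map[Address]CodeHash{}, resolve: resolve}
}

func (h *hashIndex) codeHashFor(a Address) CodeHash {
	if v, ok := h.cached[a]; ok {
		return v
	}
	h.slow++
	v := h.resolve(a)
	h.cached[a] = v // the zero-hash sentinel is cached too
	return v
}

// invalidate must run on every account write or delete for the address.
func (h *hashIndex) invalidate(a Address) { delete(h.cached, a) }

func main() {
	accounts := map[Address]CodeHash{}
	idx := newHashIndex(func(a Address) CodeHash { return accounts[a] })
	addr := Address{1}

	idx.codeHashFor(addr) // miss: resolves the zero hash and caches it
	idx.codeHashFor(addr) // hit

	accounts[addr] = CodeHash{0xCC} // e.g. a CREATE lands at addr
	idx.invalidate(addr)
	fmt.Println(idx.codeHashFor(addr) == CodeHash{0xCC}, "slow lookups:", idx.slow)
}
```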

Helps non-BAL workloads where codeHashForAddr is currently the cold
account-domain probe. On the EXISTING_CONTRACT bench (BAL prefetch
already populates everything), this is within noise; the lever is for
mainnet workloads where many addresses miss the BAL-prefetch window
but the cache is warm from prior lookups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cache-populating wrapper on the read-ahead worker's TemporalTx
previously gated cache writes on `len(v) > 0`. That dropped negative
results — i.e. missing accounts, empty storage slots, no-code probes —
on the floor. Repeated probes of the same missing address re-paid the
file accessor stack walk every time, instead of hitting a cached
negative entry.
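
The effect of the `len(v) > 0` gate can be reproduced with a toy wrapper. This is a sketch only: `getter`, `slowStore`, and `cachingGetter` are hypothetical stand-ins, not the TemporalGetter interface or the actual cachePopulatingGetter.

```go
package main

import "fmt"

// getter is a hypothetical stand-in for the wrapped read interface.
type getter interface {
	GetLatest(key string) ([]byte, bool)
}

// slowStore counts how often the expensive path is walked.
type slowStore struct {
	data  map[string][]byte
	reads int
}

func (s *slowStore) GetLatest(key string) ([]byte, bool) {
	s.reads++
	v, ok := s.data[key]
	return v, ok
}

type entry struct {
	val   []byte
	found bool
}

// cachingGetter writes reads through to a cache as a side effect.
type cachingGetter struct {
	inner         getter
	cache         map[string]entry
	cacheNegative bool // false reproduces the old len(v) > 0 gate
}

func (g *cachingGetter) GetLatest(key string) ([]byte, bool) {
	if e, ok := g.cache[key]; ok {
		return e.val, e.found
	}
	v, found := g.inner.GetLatest(key)
	if len(v) > 0 || g.cacheNegative {
		g.cache[key] = entry{val: v, found: found}
	}
	return v, found
}

// probeMissing probes one missing key n times and returns the slow reads paid.
func probeMissing(cacheNegative bool, n int) int {
	store := &slowStore{data: map[string][]byte{}}
	g := &cachingGetter{inner: store, cache: map[string]entry{}, cacheNegative: cacheNegative}
	for i := 0; i < n; i++ {
		g.GetLatest("missing-account")
	}
	return store.reads
}

func main() {
	fmt.Println("gated:", probeMissing(false, 5), "negative-caching:", probeMissing(true, 5))
}
```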

Mirrors the revm pattern that drives reth's 1700-3400 mgas/s on
account_access NON_EXISTING / EXISTING_EOA variants: revm represents
a missing address as a real CacheAccount{ account: None, status:
LoadedNotExisting } and reth's ExecutionCache.account_cache uses
FixedCache<Address, Option<Account>> where None is a first-class
cacheable value. Bottom of the reth path is: BAL prewarm calls
basic_account once → returns None → cache hit forever for that addr.

The cycle-2 sweep on account_access[EXTCODESIZE/NON_EXISTING/30M]
showed 3.65 → 494 mgas/s without this fix; with the fix the same
bench reports 508 mgas/s (within run-to-run noise but trending right).
Most of the win was already captured by the readAhead-populates-
cache.StateCache wiring (commit cbe9044) and the balcache port
(d41e2e8) — those raised the cache hit rate on populated entries
enough that the EVM rarely fell through to the file accessor on
this bench. The fix is mechanically correct regardless and should
matter more on workloads with mixed populated / negative probes
across blocks.

See agentspecs/reth-missing-eoa-fastpath-2026-05-15.md for the
detailed mechanism analysis and the three concrete copy-able
patterns from reth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…unters

GenericCache.Put has no eviction policy. When the byte budget is reached,
new keys are silently dropped until Clear/ClearWithHash/ValidateAndPrepare-
mismatch resets the cache. On a long-running node this manifests as a
monotonic miss-rate climb that's hard to attribute without instrumentation.

Add two counters next to hits/misses:
  inserts - new keys accepted
  dropped - new keys rejected at the budget check (the existing branch
            at the new-key cap; not a behaviour change)

PrintStatsAndReset logs both. Sets up the diagnostic baseline before the
eviction-policy swap in the follow-up commits on this branch.
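
A minimal sketch of the two counters on a byte-budgeted cache with no eviction. `budgetCache` and its fields are hypothetical names, not the GenericCache type.

```go
package main

import "fmt"

// budgetCache has a byte cap and no eviction: the fill-and-freeze shape.
type budgetCache struct {
	capBytes  int
	sizeBytes int
	data      map[string][]byte
	inserts   uint64 // new keys accepted
	dropped   uint64 // new keys rejected at the budget check
}

func newBudgetCache(capBytes int) *budgetCache {
	return &budgetCache{capBytes: capBytes, data: map[string][]byte{}}
}

func (c *budgetCache) Put(key string, val []byte) {
	if old, ok := c.data[key]; ok {
		// Updating an existing key is not an insert and is never dropped here.
		c.sizeBytes += len(val) - len(old)
		c.data[key] = val
		return
	}
	if c.sizeBytes+len(val) > c.capBytes {
		c.dropped++
		return
	}
	c.data[key] = val
	c.sizeBytes += len(val)
	c.inserts++
}

func main() {
	c := newBudgetCache(16)
	for i := 0; i < 10; i++ {
		c.Put(fmt.Sprintf("key-%d", i), make([]byte, 8))
	}
	fmt.Printf("inserts=%d dropped=%d\n", c.inserts, c.dropped)
}
```

A dropped count that keeps climbing while inserts stays flat is the signature of the frozen cache.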

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the maphash.Map[T] backing store in GenericCache with
freelru.ShardedLRU[uint64, entry[T]] (same lib as db/state/cache.go;
already in go.mod). Adds a Mode constructor flag:

  - ModeEvictLRU (default): per-shard LRU evicts the oldest entry on
    insert when its slot cap is reached. OnEvict drops bytes from
    currentSize.
  - ModeNoOp: preserves the historical fill-and-freeze behaviour
    (silently drop new keys at the byte cap; counted via dropped).
    Kept as the diagnostic baseline so the regression bench can
    compare A/B.

Per-shard eviction is a known trade-off of freelru.ShardedLRU —
RemoveOldest is shard-local, not globally LRU. Matches the trade-off
db/state/cache.go / execution/cache/code_cache.go /
execution/balcache/balcache.go already accept. LFU (W-TinyLFU, the
policy reth uses) is scan-resistant by design and would slot in
behind the same Mode wrapper as a follow-up; the seam is documented
at policy.go.

Key shape: pre-hash via common/maphash.Hash (Go's randomized stdlib
hasher, already used by the previous maphash.Map) into uint64; entry
stores the full key for collision check. Same pattern as
db/state/cache.go.

Byte-budget translation: per-domain avg-entry constants in
state_cache.go (avgAccountEntryBytes / avgStorageEntryBytes /
avgCommitmentEntryBytes) — account / storage are near-fixed sizes so
the translation is reliable. capacityBytes becomes a sizing hint
plus telemetry (SizeBytes / PrintStatsAndReset). Code domain is
unchanged; CodeCache wraps its own LRUs.
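
The key shape and the byte-budget translation can be illustrated as follows. This sketch assumes the stdlib `hash/maphash` package rather than the repository's `common/maphash` wrapper, uses a plain map instead of the sharded LRU, and the 128-byte average in `main` is a made-up figure, not one of the constants in state_cache.go.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/maphash"
)

// entry keeps the full key so a uint64 hash collision is detected on read.
type entry struct {
	key, val []byte
}

type hashedCache struct {
	seed  maphash.Seed
	slots map[uint64]entry
}

func newHashedCache() *hashedCache {
	return &hashedCache{seed: maphash.MakeSeed(), slots: map[uint64]entry{}}
}

func (c *hashedCache) put(key, val []byte) {
	c.slots[maphash.Bytes(c.seed, key)] = entry{key: key, val: val}
}

func (c *hashedCache) get(key []byte) ([]byte, bool) {
	e, ok := c.slots[maphash.Bytes(c.seed, key)]
	if !ok || !bytes.Equal(e.key, key) {
		return nil, false // absent, or a different key hashed to the same slot
	}
	return e.val, true
}

// entriesForBudget turns a byte budget into an entry cap using an average entry size.
func entriesForBudget(capacityBytes, avgEntryBytes int) int {
	if avgEntryBytes <= 0 {
		return 0
	}
	return capacityBytes / avgEntryBytes
}

func main() {
	c := newHashedCache()
	c.put([]byte("account-1"), []byte{0x01})
	v, ok := c.get([]byte("account-1"))
	fmt.Println(v, ok)
	fmt.Println("entry cap:", entriesForBudget(100<<20, 128))
}
```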

Adds metrics: inserts, evictions, dropped — all exposed in
PrintStatsAndReset alongside the existing hits / misses / hit_rate.
Mode is also logged.

Touches one external call site: execution/vm/contract.go's
jumpDestCache now constructs with ModeEvictLRU.

Tests: TestDomainCache_PutCapacityLimit renamed to ..._NoOpMode and
asserts the fill-and-freeze contract under explicit ModeNoOp. New
TestDomainCache_PutEvictsWhenFull_EvictMode asserts eviction under
ModeEvictLRU using a small entry-count cap (the byte→entry
translation is approximate; the test uses the entry-count knob via
the in-package newGenericCacheEntries constructor to make the
assertion deterministic).

Pre-existing lint issues on mh/sd-code-cache (intra_block_state.go
nilness, preload_parallel.go prealloc) are surfaced by lint
non-determinism but are out of this commit's scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single env knob read once at NewStateCache. Default ModeEvictLRU,
recognised override "noop" (for the regression-bench baseline so
ModeEvictLRU and ModeNoOp can be compared on the same binary).
Unrecognised values fall back to evict with a warn log.
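
The override logic might be sketched like this. The commit reads the variable through `dbg.EnvString`; this example uses `os.Getenv` to stay self-contained, and `parseMode` plus its trimming and case-folding are assumptions rather than the committed behaviour.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

type Mode int

const (
	ModeEvictLRU Mode = iota // default
	ModeNoOp                 // fill-and-freeze diagnostic baseline
)

// parseMode maps the raw env value to a Mode and reports whether it was recognised.
// Unset or empty means "use the default".
func parseMode(raw string) (Mode, bool) {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "":
		return ModeEvictLRU, true
	case "noop":
		return ModeNoOp, true
	default:
		return ModeEvictLRU, false
	}
}

func main() {
	raw := os.Getenv("STATE_CACHE_MODE")
	mode, ok := parseMode(raw)
	if !ok {
		fmt.Printf("warn: unrecognised STATE_CACHE_MODE %q, falling back to evict\n", raw)
	}
	if mode == ModeNoOp {
		fmt.Println("info: state cache running in no-op (fill-and-freeze) mode")
	}
	fmt.Println("mode:", mode)
}
```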

ModeNoOp engagement is logged at info level because the
fill-and-freeze behaviour is a deliberate diagnostic state, not a
production setting.

Pattern matches db/state/cache.go's D_LRU_ENABLED / D_LRU knobs
(dbg.EnvString from common/dbg).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous comment asserted "reth uses W-TinyLFU for state caches" —
that is wrong on the execution hot path. Reth's cross-block state cache
is `fixed-cache` (PR #21128, v1.11.0): a lock-free direct-mapped /
set-associative array with collision-evict semantics. No LRU list, no
LFU sketch. Their published wins (~25% newPayload p50 / +33% gas/s) came
from *removing* LRU/LFU bookkeeping, not adding LFU.
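
For intuition, a direct-mapped cache with collision-evict semantics can be sketched in a few lines. This is a single-goroutine illustration with hypothetical names; it is not lock-free and is not a port of reth's `fixed-cache`.

```go
package main

import "fmt"

type slot struct {
	key  uint64
	val  int
	used bool
}

// fixedCache is a direct-mapped array: each key has exactly one possible slot.
type fixedCache struct {
	slots []slot
	mask  uint64
}

// newFixedCache allocates 1<<bits slots so the index is a cheap mask.
func newFixedCache(bits uint) *fixedCache {
	n := uint64(1) << bits
	return &fixedCache{slots: make([]slot, n), mask: n - 1}
}

// put overwrites whatever occupies the slot: no LRU list, no frequency sketch.
func (c *fixedCache) put(key uint64, val int) {
	c.slots[key&c.mask] = slot{key: key, val: val, used: true}
}

func (c *fixedCache) get(key uint64) (int, bool) {
	s := c.slots[key&c.mask]
	if !s.used || s.key != key {
		return 0, false
	}
	return s.val, true
}

func main() {
	c := newFixedCache(3) // 8 slots
	c.put(1, 10)
	c.put(9, 90) // 9 & 7 == 1: collides with key 1 and evicts it
	_, ok1 := c.get(1)
	v9, ok9 := c.get(9)
	fmt.Println(ok1, v9, ok9)
}
```

The only bookkeeping per access is one index computation and one compare, which is the cost profile the paragraph above attributes to the structure.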

Where reth uses real LRU/LFU it's deliberate and not the execution cache
(schnellru::LruMap for networking; moka in precompile_cache.rs explicitly
configured with eviction_policy(EvictionPolicy::lru())).

The docstring now lists two follow-up policies, both real:
- ModeEvictFixedCache (reth's actual choice, more interesting structural
  option than LFU)
- ModeEvictLFU (W-TinyLFU; helps mainnet steady-state, not the cycle-2
  bloat fixtures which are pure cold scans)

Decision criterion (per agentspecs/lfu-vs-lru-state-cache-decision-2026-05-15.md):
ship ModeEvictLFU only if a 24h mainnet replay shows current sharded-LRU
hit-rate < 90 % on Account/Storage. Otter is the only credible Go
W-TinyLFU library; ristretto has documented correctness bugs and is
disqualified for an EL hot path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Investigation knob, NOT a permanent default. Account / Storage / Code
each capped at 100 MB so the bench measures layer contributions instead
of being dominated by preallocated cache memory pressure (1 GB / 1 GB /
512 MB defaults push sys past the GC/page-cache pressure band on this
hardware/workload mix).

Permanent defaults stay at 1 GB / 1 GB / 512 MB; this commit will be
reverted or dynamically gated by relative-to-available sizing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mh0lt mh0lt force-pushed the mh/perf-statecache-lru-pr branch from 266e297 to 4a512ce Compare May 25, 2026 07:29
mh0lt and others added 3 commits May 25, 2026 15:09
This PR ships the parallel-exec correctness fixes from
`mh/parallel-exec-fixes` onto the perf stack, packaged as a focused PR
on top of [#21386 (StateCache
LRU)](#21386) which itself
stacks on [#21380 (State Cache
Consolidation)](#21380).

> [!IMPORTANT]
> **Stacks on #21386 → #21380.** Base is `mh/perf-statecache-lru-pr`,
NOT `main`. Merge order: #21380 → #21386 → this PR.

> [!IMPORTANT]
> **Do not merge until CI is green on both parallel and serial.** Same
gating rule as #21380 / #21386.

## Scope — 13 commits from `mh/parallel-exec-fixes`

Brought in via a merge commit so the bisection trail is preserved.

| sha | what it fixes |
|---|---|
| `25053e38e9` | parallel SD-of-pre-existing-contract — the 197-line foundational fix |
| `2e2bf3ccc0` | clean exit when single-block batch already covered maxBlockNum |
| `6e451f5ed2` | don't emit StoragePath=0 writes from IBS.Selfdestruct |
| `616a4fa0a8` | clear calc Deleted on a non-SD account write even when zero |
| `d99f2f704d` | gate known parallel-exec failures behind EXEC3_PARALLEL (#21136) |
| `34e83e82b7` | install per-block changeset accumulator before any of the block's writes |
| `b340d7e592` | drop stale sd.mem 'Trim old version entries' comment |
| `629cc23566` | O(1) CollectorWrites fee-balance update, drop dead VersionedWrites.SetBalance |
| `a0ecfc7e12` | first-match-wins in CollectorWrites BalancePath index |
| `445f97e446` | emit EIP-7708 Burn log under parallel-exec when coinbase self-destructs |
| `5e1f5fa901` | mirror ReadAccountData SD-revival check into versionedRead |
| `a5dc83f509` | drop two stale EXEC3_PARALLEL t.Skips |
| `8af901104f` | drop TestReceiptHashFromRPC unit-suite RPC integration test |

## Merge conflicts resolved

3 files, 8 regions — all resolved by keeping HEAD's typed-readset /
per-path revival shape and confirming HEAD already absorbs each fix's
intent. See the merge commit message (`cfc4ec1418`) for the per-region
rationale.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Mark Holt <erigon@dev-bm-e3-ethmainnet-n4.erigon.io>
…govet)

stateObject and s are both verified non-nil earlier in their respective
scopes; the secondary checks at lines 749 and 783 are redundant. govet
nilness check fails on these.
codeHashForAddr resolves an account's codeHash from the AccountsDomain so
the CodeDomain ethHash bypass can serve shared bytecode without an
addr-keyed file read. decodeAccountCodeHash decoded the account value with
acc.DecodeForStorage, but AccountsDomain values are SerialiseV3-encoded.
DecodeForStorage is the legacy MDBX bitmask format with an incompatible
binary layout; applied to V3 bytes it silently misparses and leaves
CodeHash empty.

As a result codeHashForAddr returned nil for every account and the ethHash
bypass never engaged for any contract — every CodeDomain read that missed
the addr cache fell through to a file read. This is a decoding-correctness
bug: the wrong decoder is applied to the stored encoding.

Use accounts.DeserialiseV3, the matching decoder for AccountsDomain values.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sudeepdino008

Member

Deterministic gas-used mismatch at mainnet chain tip with state cache on (parallel exec)

Hit a reproducible invalid-block failure while running this branch (fd74979033) on a mainnet --prune.mode=minimal node at chain tip.

Symptom

Parallel catch-up execution (initialCycle=true, FCU path) fails the gas-used check, with execution computing less gas than the header:

invalid block, block=25246401, gas used by execution: 36458218, in header: 36468327   (Δ = −10,109)
  hash=fa7171beb846755c1f3f00d367bc3f6ea76db30513b2a6f4dd2b70f31ab90d88
invalid block, block=25246414, gas used by execution: 18322595, in header: 18366878   (Δ = −44,283)
  hash=a5c088ba8ec28f5007b15567e152e4a7216221b203942a7e2eb4f793f651c4a9

Fully deterministic: the FCU loop retried block 25246401 111 times and 25246414 114 times, producing identical wrong gas values on every retry. Failure stack: exec3_parallel.go:281/656 → exec3.go:264 → stage_execute.go:395 → forkchoice.go:507.

Environment

  • branch head fd74979033 plus one local commit adding read-only Prometheus counters on the read path (present in both the failing and the passing runs below, so not a factor)
  • mainnet, minimal datadir freshly synced same day, at tip
  • EXEC3_PARALLEL=true, 10 workers, --exec.state-cache=true (defaults otherwise)

Control experiment

Restarting the same binary on the same datadir with --exec.state-cache=false executed straight past both blocks on the first attempt and reached tip. For reference, a pre-this-branch build (main from ~Jun 3) ran all day at tip with the state cache enabled and parallel exec without a single invalid block.

So: state cache ON + parallel ⇒ deterministic wrong execution; state cache OFF ⇒ correct.

Notes / hypothesis

Gas is under-counted, and the block fails at the gas check rather than at a state-root mismatch first, which points at a stale value flowing into a gas-sensitive path: SSTORE current/original-value pricing, or the new CodeDomain L2b hash-bypass / GetCodeSize fast path feeding stale code bytes/size. Both failing blocks were executed inside a multi-block catch-up batch, not during single-block tip-following, if that helps narrow the window.

Happy to provide the datadir state, full logs, or run candidate fixes against the same node — currently digging into per-tx gas divergence on an unwound copy.

@sudeepdino008

Member

Root cause of the gas-mismatch / wrong-trie-root failures: L2b bypass breaks DomainPut's prevVal read on EIP-7702 delegations

Follow-up to my previous comment — fully bisected and root-caused with an offline repro (integration stage_exec with the state cache attached + unwind; fails deterministically at the same block with identical wrong root across 1/10/20 workers, passes with cache off).

Bisection

| toggle | result |
|---|---|
| full state cache | ❌ wrong root (also seen as gas-used mismatch at tip) |
| disable accounts / storage caches | ❌ still fails |
| disable code cache | ✅ passes |
| disable GetCodeSize fast path only | ❌ still fails |
| disable the CodeDomain L2b bypass only (SharedDomains.GetLatest) | ✅ passes |

Smoking gun

An assert inside the L2b bypass comparing against the authoritative read fired:

L2b divergence: addr=042201a835f9ab04bb098dee1756bb8a26a2e068
resolvedCodeHash=8b38194773e4314f48b6d8e1c5aef93b68fedc5a427c5e90df8b3f5f68873542
cachedLen=23 dbLen=0
cached=ef010027dbd0e71b85700e29994d6d3a51f2e32442aa61...   ← EIP-7702 delegation designator
stack: ... domain_shared.go (L2b in GetLatest) ← DomainPut prevVal read ← apply loop

Mechanism (EIP-7702 lost-write)

  1. Authority X delegates to delegate D earlier in the batch → L2b caches keccak(0xef0100‖D) → designator bytes. That mapping is immutable and shared by every authority delegating to D.
  2. Authority Y delegates to the same D later in the batch. Apply order: Y's account record is written first (codeHash = designator hash), then DomainPut(CodeDomain, Y, designator).
  3. DomainPut reads prevVal via sd.GetLatest(CodeDomain, Y): mem misses (code not yet written) → L2b bypass resolves the codeHash from Y's freshly written account record (mem hit in codeHashForAddr) → GetCodeByHash hits via X's entry → returns the new designator as the "previous" code. Authoritative prev is empty.
  4. The no-change short-circuit (bytes.Equal(prevVal, v)) silently drops the CodeDomain write. Y's designator never lands.
  5. Result: wrong trie root, or — when a later tx calls through Y's delegation — wrong execution and the gas-used mismatches reported above.

This explains the intermittency (requires ≥2 authorities delegating to the same delegate within one exec batch — common at tip, not universal) and the full determinism once a qualifying window exists.

Fix directions

The L2b shortcut is only safe for pure reads of committed state; on the DomainPut/DomainDel prevVal path it can observe the same tx's in-flight account write and time-travel the answer. Options:

  1. prevVal reads in DomainPut/DomainDel use an internal getLatest variant that skips the L2b bypass (smallest, targeted);
  2. codeHashForAddr refuses to resolve from sd.mem/parent-mem (only stateCache/db layers) — keeps the bypass for hot committed accounts, loses it for in-flight ones;
  3. writers pass prevVal explicitly for CodeDomain puts where known.

Option 1 seems strictly correct: the prevVal contract is "value before this write", which the hash-resolved shortcut cannot guarantee once the account record has already advanced.
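
The lost write and the effect of option 1 can be reproduced in a toy model. Everything below is a hypothetical simplification: string keys replace addresses and hashes, `sharedDomains` has three maps instead of the real layer chain, and `putCode` only mimics the prevVal read and the no-change short-circuit.

```go
package main

import (
	"bytes"
	"fmt"
)

type sharedDomains struct {
	codeByAddr map[string][]byte // authoritative per-address code
	hashByAddr map[string]string // account records: addr -> codeHash
	codeByHash map[string][]byte // hash-keyed shortcut shared by all addresses
}

func newSD() *sharedDomains {
	return &sharedDomains{
		codeByAddr: map[string][]byte{},
		hashByAddr: map[string]string{},
		codeByHash: map[string][]byte{},
	}
}

// getLatest reads code for addr; the hash bypass is optional.
func (sd *sharedDomains) getLatest(addr string, useBypass bool) []byte {
	if v, ok := sd.codeByAddr[addr]; ok {
		return v
	}
	if useBypass {
		if h, ok := sd.hashByAddr[addr]; ok {
			if v, ok := sd.codeByHash[h]; ok {
				return v // may reflect an account record written moments ago
			}
		}
	}
	return nil
}

// putCode mimics the prevVal read plus no-change short-circuit and
// reports whether the write actually landed.
func (sd *sharedDomains) putCode(addr, hash string, code []byte, bypassForPrev bool) bool {
	prev := sd.getLatest(addr, bypassForPrev)
	if bytes.Equal(prev, code) {
		return false // treated as "no change": the write is dropped
	}
	sd.codeByAddr[addr] = code
	sd.codeByHash[hash] = code
	return true
}

// secondDelegationLands delegates X then Y to the same delegate and
// reports whether Y's code write landed.
func secondDelegationLands(bypassForPrev bool) bool {
	sd := newSD()
	designator := []byte{0xef, 0x01, 0x00, 0xD0}
	for _, authority := range []string{"X", "Y"} {
		sd.hashByAddr[authority] = "hash(designator)" // account record first
		if authority == "Y" {
			return sd.putCode(authority, "hash(designator)", designator, bypassForPrev)
		}
		sd.putCode(authority, "hash(designator)", designator, bypassForPrev)
	}
	return false
}

func main() {
	fmt.Println("bypass on prevVal read:", secondDelegationLands(true))
	fmt.Println("bypass skipped for prevVal:", secondDelegationLands(false))
}
```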

Repro recipe (offline, ~3 min): patch integration stage_exec to attach cache.NewDefaultStateCache() to the SD, unwind ~90 blocks on a recent mainnet datadir, re-exec. Happy to share the exact patch/assert or test a candidate fix against the captured window.

…nchCache + stateCache

The optimization stack and the cache branch had grown two divergent
FlushWithCallback methods — one single-domain for the BranchCache
(commitment), one all-domains for the StateCache flush-only fix. Merge
them into one.

FlushWithCallback is now all-domains: it invokes cb for every
(domain, key, latest-value, step) across all domains (sd.domains +
sd.storage), then drains the mem-batch — callback, MDBX flush and drain
in one latestStateLock window. SharedDomains.Flush passes a single
callback that routes CommitmentDomain → BranchCache and
Accounts/Storage/Code → StateCache.

The per-write stateCache.Put/Delete in domainPut/DomainDel are removed:
a write is in-flight, fork-specific state living in sd.mem; mirroring it
into the process-wide cache let a sibling fork's re-execution read
another fork's uncommitted bytes. The cache is now refreshed only on
flush, so it mirrors committed, fork-agnostic state. drainLocked empties
sd.mem as part of the flush so a child SD chained as parent reads
through to the refreshed cache / DB instead of stale bytes.
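
A small model of the refresh-on-flush rule, assuming hypothetical names (`fork`, `put`, `get`, `flush`) and string values in place of domain entries. Writes stay in the fork's own batch; the shared cache is touched only when the batch is flushed.

```go
package main

import "fmt"

// fork holds in-flight writes privately and shares the cache and db with siblings.
type fork struct {
	mem   map[string]string // in-flight, fork-specific writes
	cache map[string]string // process-wide cache of committed state
	db    map[string]string // committed state
}

func newFork(cache, db map[string]string) *fork {
	return &fork{mem: map[string]string{}, cache: cache, db: db}
}

// put records the write in mem only; the shared cache is deliberately untouched.
func (f *fork) put(k, v string) { f.mem[k] = v }

func (f *fork) get(k string) string {
	if v, ok := f.mem[k]; ok {
		return v
	}
	if v, ok := f.cache[k]; ok {
		return v
	}
	return f.db[k]
}

// flush commits the batch, refreshes the cache from it, and drains mem.
func (f *fork) flush() {
	for k, v := range f.mem {
		f.db[k] = v
		f.cache[k] = v
	}
	f.mem = map[string]string{}
}

func main() {
	cache := map[string]string{}
	db := map[string]string{"balance": "committed"}
	a, b := newFork(cache, db), newFork(cache, db)

	a.put("balance", "fork-A-uncommitted")
	fmt.Println("sibling before flush:", b.get("balance"))
	a.flush()
	fmt.Println("sibling after flush: ", b.get("balance"))
}
```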

This folds the cache fix (was 527cb23077 on the cache branch) into the
optimization stack so the local group-test exercises the final code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mh0lt mh0lt force-pushed the mh/perf-statecache-lru-pr branch from 5053ad8 to ba6c67a Compare June 5, 2026 09:51
@mh0lt

mh0lt commented Jun 5, 2026

Contributor Author

Pushed 355e25f8a0 (execution/cache, stagedsync: don't advance state-cache blockHash during fork-validation).

ci-gate does not auto-run on this PR (it triggers only for base main/release/**/performance, and this PR targets mh/perf-caches-pr), so I manually dispatched the full CI Gate suite (lint, tests, race-tests, eest-spec-tests, hive, hive-eest, caplin, kurtosis, bench, repro) on this branch to get consensus coverage for the change — the state cache feeds execution reads, so EEST/hive matter here.

Dispatched run: https://github.com/erigontech/erigon/actions/runs/27019158872

@mh0lt mh0lt force-pushed the mh/perf-statecache-lru-pr branch from 355e25f to cf05caf Compare June 5, 2026 17:10
mh0lt and others added 3 commits June 7, 2026 10:24
Add a Collector, owned by the Aggregator (process lifetime), that aggregates
KV-read metrics across every read path. Producers fill their own *DomainMetrics
lock-free and hand ownership over a buffered channel tagged with a Source
(exec/commitment/warmup/rpc/engine); a single collector goroutine folds them
into map[Source]*DomainMetrics with no lock or atomics on the aggregate. The
goroutine also self-publishes source-labelled Prometheus gauges
(kv_read_count / kv_read_duration_ns, labels {source,domain,op}) on a ticker —
process-level and independent of whether a block is executing.

Wiring:
- Aggregator owns the Collector: Start in newAggregator (on a.wg), Stop+drain
  in Close before wg.Wait. Exposed to SharedDomains via the duck-typed
  kvmetrics.MetricsCollectorProvider on *AggregatorRoTx (same pattern as
  BranchCacheProvider), so the leaf kvmetrics package stays cycle-free.
- SharedDomains.MergeMetrics(source, wm) now hands a finished worker's metrics
  to BOTH sinks: the per-batch sd.metrics (under one lock, for the existing log
  line) and the collector (lock-free, for Prometheus). Ownership of wm transfers
  to the collector, so producers allocate a fresh instance afterwards.
- Producers tagged: exec workers (SourceExec, per task), the commitment fold
  (SourceCommitment), trie warmup and concurrent mount (SourceWarmup).
- AsGetterCollected(tx, source) gives concurrent short-lived callers (RPC,
  engine) a per-getter instance + flush closure; gated on KVReadLevelledMetrics.

The new gauges are additive — the existing exec-scoped mxExec* gauges and the
per-batch log line are unchanged. The memBatch put-path (CachePut*) is left on
the existing shared aggregate deliberately: those counters are load-bearing for
SizeEstimate's flush accounting, so moving them belongs in a separate change.
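
The single-goroutine fold can be sketched as follows. This is an illustration under assumptions: the type and method names are hypothetical, a single `uint64` replaces `*DomainMetrics`, and the Prometheus ticker is omitted. The aggregate map needs no lock because only the collector goroutine writes it, and `Stop` reads it only after that goroutine has exited.

```go
package main

import "fmt"

type Source string

type taggedMetrics struct {
	src   Source
	reads uint64
}

type Collector struct {
	in     chan taggedMetrics
	done   chan struct{}
	totals map[Source]uint64 // written only by the collector goroutine
}

// Start launches the goroutine that folds incoming samples into totals.
func Start(buffer int) *Collector {
	c := &Collector{
		in:     make(chan taggedMetrics, buffer),
		done:   make(chan struct{}),
		totals: map[Source]uint64{},
	}
	go func() {
		defer close(c.done)
		for m := range c.in { // also drains buffered samples after close
			c.totals[m.src] += m.reads
		}
	}()
	return c
}

// Send hands a finished sample over; it blocks only when the buffer is full.
func (c *Collector) Send(src Source, reads uint64) {
	c.in <- taggedMetrics{src: src, reads: reads}
}

// Stop closes the input, waits for the drain, and returns the aggregate.
func (c *Collector) Stop() map[Source]uint64 {
	close(c.in)
	<-c.done
	return c.totals
}

func main() {
	c := Start(8)
	c.Send("exec", 100)
	c.Send("commitment", 7)
	c.Send("exec", 50)
	fmt.Println(c.Stop())
}
```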

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a request-scoped read accumulator to SharedDomains
(StartRequestMetrics(source) / flush at Close) so a single-goroutine RPC handler
that reads through the plain AsGetter is metered without a shared accumulator or
a per-getter flush. getLatestMetered folds nil-metrics reads into it; this
short-circuits for exec workers (which pass their own per-worker instance), so
there is no cross-goroutine access to the request accumulator.

Wire eth_simulation's SimulateV1 to tag its reads as SourceRPC. Engine block
execution is already metered as SourceExec (it runs through the exec workers);
SourceEngine and the per-read-getter paths (exec_module CacheView, vm/runtime,
which build a getter per read) are left for a follow-up that needs a view-level
accumulator.

Also make Collector.Snapshot drain the buffered samples first so a snapshot
reflects everything sent so far, and add a -race collector test covering
concurrent Send + Snapshot + Close-drain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…roducers

The exec hot path must never block on metrics and must never lose counts. A
buffered channel alone satisfies neither at the boundary: a full buffer blocks a
plain send, and a non-blocking send drops. Resolve both with retain-on-full.

- Collector.TrySend is non-blocking and returns whether the sample was queued.
  Exec workers keep a retained accumulator (collectorAcc): each task folds its
  reads in and TrySends; on a full buffer the send is skipped and the worker
  keeps adding to the same accumulator, retrying next task. A single blocking
  flush at worker exit drains the remainder (off the hot path, lossless). So
  execution never waits on metrics and no count is dropped.
- Collector.Send is the blocking variant for low-frequency boundary producers
  (commitment fold, warmup teardown, an RPC request closing) where a rare brief
  wait is fine and losing the sample is not.
- SharedDomains.LogMergeMetrics folds a task into the per-batch sd.metrics log
  aggregate only; the exec path uses it each task and feeds the collector
  separately via the retained accumulator, so the two sinks (which reset on
  different conditions) never double-count.
- Dropped the drop counter and its gauge — nothing is dropped now.
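
The retain-on-full hand-off described above can be sketched as follows. This is a minimal illustration, not the kvmetrics implementation: `Sample`, `Collector`, `worker`, `finishTask`, `exit` and `runDemo` are hypothetical names, and only the `TrySend`/`Send` split comes from the commit message.

```go
package main

import "fmt"

// Sample is a hypothetical stand-in for one worker's accumulated read counts.
type Sample struct{ Reads int }

// Collector is a hypothetical bounded queue feeding a collector goroutine.
type Collector struct{ ch chan Sample }

func NewCollector(buf int) *Collector { return &Collector{ch: make(chan Sample, buf)} }

// TrySend never blocks: it reports whether the sample was queued.
func (c *Collector) TrySend(s Sample) bool {
	select {
	case c.ch <- s:
		return true
	default:
		return false // buffer full: the caller keeps the sample
	}
}

// Send blocks until the buffer has room.
func (c *Collector) Send(s Sample) { c.ch <- s }

// worker retains its accumulator whenever the buffer is full.
type worker struct {
	c   *Collector
	acc Sample
}

// finishTask folds one task's reads in and attempts a non-blocking hand-off.
func (w *worker) finishTask(reads int) {
	w.acc.Reads += reads
	if w.c.TrySend(w.acc) {
		w.acc = Sample{} // queued: start a fresh accumulator
	}
}

// exit performs the single blocking flush of whatever is still retained.
func (w *worker) exit() {
	if w.acc.Reads > 0 {
		w.c.Send(w.acc)
		w.acc = Sample{}
	}
}

// runDemo returns the total reads that reach the collector side.
func runDemo() int {
	c := NewCollector(1)
	w := &worker{c: c}
	w.finishTask(3)         // queued, buffer is now full
	w.finishTask(4)         // full: retained
	w.finishTask(5)         // still full: retained accumulator now holds 9
	total := (<-c.ch).Reads // collector side drains the first sample
	w.exit()                // blocking flush of the retained 9
	total += (<-c.ch).Reads
	return total
}

func main() {
	fmt.Println("reads delivered:", runDemo()) // prints: reads delivered: 12
}
```

With a buffer of one, the second and third tasks are never queued on their own, yet all 12 reads arrive: the hot path only ever takes the `default` branch, and the one blocking send happens at worker exit.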

Validated: build, make lint (0 issues), go test -race ./db/state/kvmetrics
(incl. a test proving TrySend never blocks on a full buffer), and hive cancun
with KV_READ_METRICS=true at the pinned ref = 226/0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sudeepdino008

Member

#21675 -- found an issue while running/testing this branch

mh0lt and others added 3 commits June 8, 2026 10:56
Ripples the perf-caches-pr main merge (+ #21380 review fixes) up the stack.
Resolved 5 conflicts by keeping statecache-lru's newer model where it has
evolved past perf-caches-pr:

- BranchCache: kept the txNum/epoch unwind model (Unwind + epoch invalidation)
  over perf-caches-pr's step/txN/UnwindTo model; ported the PutIfClean peek fix
  (avoid write-path miss-accounting). Removed the now-dead BranchCache.UnwindTo
  and converted the residual 5-arg PinEntry call (preload_parallel.go).
- domain_shared.go: kept the lock-free wm metrics path + epoch-stamped branch
  Put + ClearBranchCache/DetachBranchCache; added the statecfg import for
  PickTrieVariant. generic_cache.go kept the freelru LRU. exec3_parallel.go
  kept AsGetterNoMetrics.
- Restored statecache-lru's branch_cache_test.go / trunk_pin_test.go (they test
  the epoch model; the auto-merge had pulled perf-caches-pr's step/txN tests).

Validated: go build ./...; commitment + execmodule reorg/fork gate + stateCache
tests; make lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ripples the statecache-lru main merge up. Resolved 4 conflicts by combining
the metrics-collector additions with the rippled cache/commitment changes:

- domain_shared.go: keep the kvmetrics MetricsCollectorProvider lookup +
  statecache-lru's new TrieConfig commitment-context ctor (cfg, branchCache).
- aggregator.go: keep the kvmetrics collector init + statecache-lru's
  oldestVisible; the MeteredGetLatestWithTxN/getLatestWithTxN methods now take
  *kvmetrics.DomainMetrics (the metrics types moved from changeset to kvmetrics).
- commitment_context.go: Metrics() returns *kvmetrics.DomainMetrics.
- eth_simulation.go: keep StartRequestMetrics(SourceRPC); the defer toggle is
  now sharedDomains.SetDeferCommitmentUpdates(false) (renamed in the refactor).

Validated: go build ./...; commitment + kvmetrics + execmodule reorg tests;
make lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…" nickname

"ethHash" collided with Ethash (the proof-of-work algorithm) and obscured that
it is the keccak *code* hash. Rename the codeHash-keyed code-cache layer and its
API so the name says what it is:

  ethHashToCode        -> codeHashToCode
  GetByEthHash         -> GetByCodeHash
  PutWithEthHash       -> PutWithCodeHash
  codeSizeByEthHash    -> codeSizeByCodeHash
  GetCodeSizeByEthHash -> GetCodeSizeByCodeHash
  ethHashCodeSize/Hits/Misses -> codeHash...
  local codeEthHash    -> codeHash

The "L2b" tier nickname in comments/labels becomes codeHashToCode (it is a
content-addressed codeHash→code map, not a cache depth-level). Values and
behaviour are unchanged — mechanical rename. Test file renamed to
code_cache_codehash_test.go.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…or (#21663)

Stacked on #21386 (the state-cache PR); this is the metrics-only change.

## What

Replaces the per-task lock-merge of KV-read metrics with a
**process-level, channel-fed collector** owned by the Aggregator, so
every read path contributes (not just block execution) and the metrics
carry a `source` label.

## Why

`changeset.DomainMetrics` was execution-bound: only exec workers and the
commitment fold (during exec) produced metrics, folding into a
SharedDomains-scoped aggregate under a per-task lock. RPC/engine
`AsGetter` reads collected nothing, so "KV read IO" really meant "IO
during block execution." This makes IO observable process-wide,
lock-free on the aggregate, and broken out by subsystem.

## Design

- **New leaf package `db/state/kvmetrics`** — relocates
`DomainIOMetrics`/`DomainMetrics` + ctx helpers out of
`db/state/changeset` (they never belonged there) and adds the `Source`
enum, the `Collector`, and a shared `LogMetrics(level, source, detail)`
formatter. Imports only `kv` + stdlib + the metrics façade → no import
cycle.
- **Collector** owned by the Aggregator (process lifetime): `Start` on
`a.wg`, `Stop`+drain in `Close`. A single goroutine folds `{source,
metrics}` samples into `map[Source]*DomainMetrics` with **no
lock/atomics on the aggregate**, and self-publishes `source`-labelled
Prometheus gauges (`kv_read_count` / `kv_read_duration_ns`, labels
`{source,domain,op}`) on a ticker — additive to the existing `mxExec*`
gauges. Reached from SharedDomains via the duck-typed
`MetricsCollectorProvider` on `*AggregatorRoTx` (same pattern as
`BranchCacheProvider`).
- **Never block, never drop.** Exec workers retain an accumulator and
hand it off with a non-blocking `TrySend`; on a full buffer they keep
adding and retry next task, with one blocking flush at worker exit.
Boundary producers (commitment fold, warmup teardown, RPC request close)
use the blocking `Send` (off the hot path, lossless).
- **Sources**: exec, commitment, warmup, and RPC (`eth_simulation`'s
`SimulateV1`, via a request-scoped accumulator flushed at Close). Engine
block execution is already covered as `exec` (it runs through the exec
workers).
- The per-batch **log line** (`sd.metrics`) is kept unchanged via
`LogMergeMetrics`.
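
A minimal sketch of the single-goroutine fold, under stated assumptions: the type and function names (`collector`, `sample`, `start`, `send`, `stop`, `foldDemo`) are hypothetical, and a plain `uint64` counter stands in for `*DomainMetrics`. It only illustrates why the aggregate needs no lock: one goroutine owns the map, and `stop` waits for the drain before anyone else reads it.

```go
package main

import (
	"fmt"
	"sync"
)

// Source labels the subsystem that produced a sample (hypothetical subset).
type Source int

const (
	SourceExec Source = iota
	SourceRPC
)

type sample struct {
	src   Source
	reads uint64
}

type collector struct {
	ch  chan sample
	wg  sync.WaitGroup
	agg map[Source]uint64 // touched only by the fold goroutine until stop returns
}

// start launches the single goroutine that owns the aggregate.
func start(buf int) *collector {
	c := &collector{ch: make(chan sample, buf), agg: map[Source]uint64{}}
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		for s := range c.ch { // range keeps draining queued samples after close
			c.agg[s.src] += s.reads
		}
	}()
	return c
}

// send is the blocking variant used by boundary producers.
func (c *collector) send(src Source, reads uint64) {
	c.ch <- sample{src: src, reads: reads}
}

// stop closes the channel, waits for the drain, then hands back the aggregate.
func (c *collector) stop() map[Source]uint64 {
	close(c.ch)
	c.wg.Wait()
	return c.agg
}

// foldDemo runs eight concurrent producers against one collector.
func foldDemo() map[Source]uint64 {
	c := start(4)
	var producers sync.WaitGroup
	for i := 0; i < 8; i++ {
		producers.Add(1)
		go func(i int) {
			defer producers.Done()
			if i%2 == 0 {
				c.send(SourceExec, 10)
			} else {
				c.send(SourceRPC, 1)
			}
		}(i)
	}
	producers.Wait() // all sends complete before the channel is closed
	return c.stop()
}

func main() {
	agg := foldDemo()
	fmt.Println(agg[SourceExec], agg[SourceRPC]) // prints: 40 4
}
```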

## Deliberately scoped out (follow-ups)

- **memBatch put-path (`CachePut*`)** stays on the existing shared
aggregate: those counters are load-bearing for `SizeEstimate`'s flush
accounting, so moving them belongs in a separate change.
- **`SourceEngine` and per-read-getter paths** (`exec_module`
`CacheView`, `vm/runtime`) build a getter per read and need a view-level
accumulator to meter — left for a follow-up.

## Verification

- `make erigon` / `go build ./...` (the import-cycle gate), `make lint`
0 issues.
- `go test -race ./db/state/kvmetrics` — incl. a test proving `TrySend`
never blocks on a full buffer, concurrent Send+Snapshot+Close-drain, and
correct grouped folding.
- **hive `ethereum/engine` cancun with `KV_READ_METRICS=true` at the
CI-pinned hive ref = 226/0.**

🤖 Generated with [Claude Code](https://claude.com/claude-code)
mh0lt added a commit that referenced this pull request Jun 10, 2026
…bsystem

Remove the contract trunk-pin preload and the adaptive pin controller from
the consolidation PR so the BranchCache core (root slot + LRU tail) and the
#21138 fix can land independently. The subsystem carries a consensus blocker
(immortal txN=0 pins survive UnwindTo yet cache mutable MDBX state -> wrong
root on reorg) plus most of the review majors, and needs its own benchmarks
and an opt-in flag.

Deleted: adaptive_pin.go, preload.go, preload_parallel.go, preload_ranges.go,
trunk_pin_metrics.go and their tests. Removed the BranchCache pinned tier
(PinEntry/PinnedCount/PinnedStats/TryClaimPreload/MissCallback/onMiss and the
ContractHashFromPrefix helper) and the SharedDomains controller wiring
(adaptivePinController, triggerTrunkPreload, EnableParaTrieDB preload+Bind).
Kept the root slot, LRU tail, and EnableParaTrieDB core.

The subsystem is re-added on mh/branch-cache-trunk-pin for re-implementation
with watermark txN tagging, same-prefix tail eviction, and an opt-in flag,
to land after the state-cache PR (#21386).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mh0lt added a commit that referenced this pull request Jun 10, 2026
Contract storage-trunk pinning + the adaptive pin controller, extracted from
the consolidation PR (#21380) to be landed on its own after the state-cache
PR (#21386). This branch re-adds the subsystem unchanged on top of the clean
BranchCache core as a single additive commit, as the base for re-implementation.

KNOWN BLOCKER to fix before this lands (do NOT ship as-is):
  - PinEntry tags pins with txN=0, which UnwindTo treats as immortal, yet the
    pinned bytes come from mutable MDBX commitment state. A reorg below a
    pinned block can't evict the pin -> stale branch bytes -> wrong root.
  Fix: tag pins with the conservative watermark (step+1)*stepSize-1 so UnwindTo
  evicts a pin exactly when its source data is unwound (file-sourced pins get
  an ancient watermark and stay effectively immortal -> no perf loss); plus
  PinEntry must evict any pre-existing same-prefix LRU-tail entry; add a
  pin-then-unwind regression test; gate behind an opt-in ENABLE_ADAPTIVE_PIN.
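
The watermark arithmetic in the proposed fix can be checked with a short sketch. `pinWatermark` and `stepOf` are hypothetical helper names, and the step size of 1000 is an illustrative value, not the real configuration; the only thing taken from the text is the formula `(step+1)*stepSize-1`, which is the last txNum that still belongs to `step`.

```go
package main

import "fmt"

// pinWatermark returns the last txNum that belongs to the given step,
// following the (step+1)*stepSize-1 formula from the commit message.
func pinWatermark(step, stepSize uint64) uint64 {
	return (step+1)*stepSize - 1
}

// stepOf maps a txNum back to the step that contains it.
func stepOf(txNum, stepSize uint64) uint64 {
	return txNum / stepSize
}

func main() {
	const stepSize = 1000 // illustrative value only
	w := pinWatermark(2, stepSize)
	fmt.Println(w)                     // prints: 2999
	fmt.Println(stepOf(w, stepSize))   // prints: 2
	fmt.Println(stepOf(w+1, stepSize)) // prints: 3
}
```

Because the watermark is the highest txNum in the step, it is a conservative upper bound for any value written during that step.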

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mh0lt and others added 2 commits June 10, 2026 23:30
…st; make state cache an SD detail

The Nethermind-style addr→codeHash LRU is invalidated only at flush, so
consulting it before sd.mem could return a codeHash that is stale relative to
an in-batch account write (a 7702 set/clear, a selfdestruct). The result was a
non-empty codeHash next to an empty mem-routed code read, which surfaces on
re-exec as EIP-3607 "sender not an eoa" (codeHash-no-code). Route
codeHashForAddr through sd.mem / parent.mem first; the LRU becomes a
committed-state layer that may only answer once mem has missed.
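
The lookup precedence can be sketched as below. This is an illustration under assumptions, not the SharedDomains code: the `layers` struct and its three maps are hypothetical stand-ins for sd.mem, the addr→codeHash LRU and committed state, and only the function name `codeHashForAddr` is taken from the commit message.

```go
package main

import "fmt"

// layers is a hypothetical stand-in for the three read layers.
type layers struct {
	mem map[string]string // in-batch, uncommitted account writes
	lru map[string]string // committed-state addr -> codeHash cache
	db  map[string]string // committed state
}

// codeHashForAddr consults mem first; the LRU may answer only after mem misses.
func (l *layers) codeHashForAddr(addr string) string {
	if h, ok := l.mem[addr]; ok {
		return h // an in-batch write always wins, even when it clears the code
	}
	if h, ok := l.lru[addr]; ok {
		return h
	}
	h := l.db[addr]
	l.lru[addr] = h // read-through populate from committed state
	return h
}

func main() {
	l := &layers{
		mem: map[string]string{"0xabc": ""}, // delegation cleared in this batch
		lru: map[string]string{"0xabc": "0xold"},
		db:  map[string]string{"0xdef": "0xcafe"},
	}
	fmt.Printf("%q\n", l.codeHashForAddr("0xabc")) // prints: ""
	fmt.Printf("%q\n", l.codeHashForAddr("0xdef")) // prints: "0xcafe"
}
```

An LRU-first ordering would have returned `"0xold"` for the first lookup, which is the stale-hash case the regression test guards against.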

Make the state cache an SD implementation detail so no caller can consult
it out of precedence: drop GetStateCache(), fold StateCache.Unwind into
sd.Unwind (alongside the BranchCache unwind), and route PrintStatsAndReset
through a new nil-safe sd.PrintCacheStats().

Adds TestCodeHashForAddr_InBatchAccountWinsOverStaleLRU (an in-batch
account write must override a stale LRU entry; proven to fail on the
prior LRU-first precedence).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…revent poisoning

The content-addressed code cache (codeHash->code) was populated using a
separately-read account codeHash as the key. Under parallel/speculative exec
that hash can be skewed or cross-account, so a (codeHash, code) pair that
doesn't satisfy keccak(code)==codeHash could enter the shared map and corrupt
every account sharing that codeHash — surfacing as a wrong-forwarder gas
divergence, and (once the bad bytes are persisted) codeHash-no-code /
"sender not an eoa" on cold re-read.

Key every content-cache entry by the code's OWN hash, keccak(code), at the
single SharedDomains getter populate path and the flush callback, and bring
the read-ahead prefetcher onto the same model (dropping the skewable codeHash
hint). Speculative code stays in the version map and never enters the global
cache; the cache is populated only on a real read through the shared domain,
so a skewed account read can no longer produce a mismatched entry.
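
A minimal sketch of keying the content cache by the code's own hash. Two loud assumptions: `contentCache`, `hashCode`, `Put` and `Get` are hypothetical names, and SHA-256 from the Go standard library is used as a stand-in for Keccak-256 purely to keep the example dependency-free. The point it illustrates is that `Put` takes no hash argument, so a skewed caller-side hash cannot become a key.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

type hash [32]byte

// hashCode stands in for keccak(code); SHA-256 is an assumption of this sketch.
func hashCode(code []byte) hash { return sha256.Sum256(code) }

// contentCache is a hypothetical content-addressed codeHash -> code map.
type contentCache struct{ m map[hash][]byte }

func newContentCache() *contentCache { return &contentCache{m: map[hash][]byte{}} }

// Put derives the key from the bytes themselves and returns it.
func (c *contentCache) Put(code []byte) hash {
	h := hashCode(code)
	c.m[h] = append([]byte(nil), code...) // copy so callers cannot mutate the entry
	return h
}

func (c *contentCache) Get(h hash) ([]byte, bool) {
	v, ok := c.m[h]
	return v, ok
}

func main() {
	c := newContentCache()
	code := []byte{0x60, 0x00}
	skewed := hashCode([]byte{0x60, 0x01}) // a hash read for some other account

	key := c.Put(code)
	_, poisoned := c.Get(skewed)
	got, _ := c.Get(key)
	fmt.Println(poisoned, bytes.Equal(got, code)) // prints: false true
}
```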

Validated by a fresh mainnet resync across the former-corruption range
(25.27M-25.29M): the previously-deterministic gas divergence / sender-not-an-eoa
no longer reproduces, forward or on cold restart re-exec.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mh0lt added a commit that referenced this pull request Jun 11, 2026
…ters

The sstoreInsert/Update/Delete/Noop and hasStorageMiss package-global atomics
were incremented on the commitment/exec hot paths but their getters
(SstoreClassificationCounts, HasStorageMissCount) have no consumer anywhere in
the stack (#21380, #21386, the perfviz view) — write-only prototype perf-debug
scaffolding. The canonical metrics framework (kvmetrics, #21663) is in main and
covers KV reads, not these. Remove the counters, their Record* funcs/getters,
and the call sites; this also drops execution/state's only import of
execution/commitment.

If SSTORE classification is wanted in production later, express it via the
kvmetrics collector rather than bespoke globals.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mh0lt and others added 3 commits June 11, 2026 08:24
…WriteSet

normalizeWriteSet recovered an account's CodeHashPath from the versionMap (via
the CodeHashPath case and the fill-missing-fields loop) but had no equivalent
recovery for CodePath: the CodePath case kept the write only at the validated
incarnation, and the fill loop never emitted code. A tx whose validated
writeset lacked a fresh CodePath — e.g. an EIP-7702 delegating tx that
re-executes, where SetCode short-circuits because so.Code() already returns the
designator written by the prior incarnation (bytes.Equal(prevcode, code)) —
therefore persisted a non-empty codeHash with no code bytes. A later block then
read empty code for the delegated account, and the EIP-3607 sender check
wrongly rejected the 7702 sender ("sender not an eoa").

Recover the code this tx wrote from the versionMap (incarnation-agnostic,
scoped to this tx so a merely-touched contract's prior-tx code is not
re-emitted) whenever an account has a non-empty codeHash but no code in the
normalized output, mirroring CodeHashPath. Code can no longer be lost while its
hash survives.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The earlier recovery only re-emitted code found in THIS tx's versionMap
(rr.Version().TxIndex == txIndex). On the real failure path that guard
misses: a re-executing 7702 delegation whose code equals the already-
committed designator makes IBS.SetCode short-circuit (bytes.Equal), so
the validated incarnation writes no CodePath and the prior incarnation's
versionMap entry is invalidated on re-exec — the versionMap holds nothing
for this tx. The fill-missing loop still fills CodeHashPath from committed
state, so the account persists a codeHash with no code; a later 7702
sender then reads empty code and is wrongly rejected "sender not an eoa"
(observed re-executing mainnet blocks 25277235 / 25279079 / 25280960).

Recover the designator from the versionMap, else fall back to the
post-state via stateReader.ReadAccountCode (mirroring how CodeHashPath is
recovered). Gate emission on types.ParseDelegation so only 7702
designators are re-emitted — never ordinary unchanged contract code for a
touched contract (no write amplification, no callee-code misattribution).

This prevents the drop during forward execution. It cannot repair state
already collated into immutable snapshots with codeHash-but-no-code; that
needs a snapshot unwind (separate, in development).

Adds TestNormalizeWriteSet_CodePathRecoveredFromStateReader for the
short-circuit/stateReader path; the existing versionMap-path test stays.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…perf-statecache-lru-pr

Reconcile #21386 onto the cleaned commitment-cache model from #21380.

Resolution:
- branch_cache.go (+test), temporal_mem_batch.go, kv_interface.go -> take
  #21380's cleaned model: reduced BranchCache API, txN-watermark UnwindTo,
  FlushOption pattern, flush-callback-after-MDBX-write.
- preload*.go, trunk_pin_test.go -> deleted (trunk-pin extracted to
  mh/branch-cache-trunk-pin).
- domain_shared.go -> combine: keep #21386's StateCache refresh, keccak
  code-cache fix, codeHash->code read bypass, per-worker kvmetrics (wm) and
  the collector/reqMetrics fields; adopt #21380's cleaned FlushOption
  multi-domain callback (cb-after-MDBX-write), MeteredGetterWithTxN watermark
  + txN=0 skip; drop the extracted adaptive-pin controller + PublishMetrics.

Notable behavioural deltas (flagged for review):
- BranchCache unwind: #21386's epoch/unwindFloor model -> #21380's
  txN-watermark UnwindTo (same effect: evict entries above the unwind point).
- StateCache flush entries now stamped with sd.txNum (batch high-water) as the
  unwind watermark: the cleaned WithFlushCallback exposes step, not per-key
  txN; sd.txNum is a safe conservative upper bound.
- temporal_mem_batch.go DomainMetrics refs retargeted changeset -> kvmetrics.

Carries: keccak codeHash fix, #21706 CodePath recovery, stateCache.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mh0lt and others added 4 commits June 11, 2026 09:44
…h callback

Make the commitment BranchCache honor the same (txNum, epoch) unwind model as
the state cache GenericCache, and carry per-key txNum (not step) into flush-time
cache population. Addresses #21752.

BranchCache:
- Add epoch + unwindFloor; stamp entries with the write txN and the epoch they
  were written in. Get drops a superseded-epoch entry whose txN is at/above the
  floor lazily on read (>= floor matches GenericCache). Unwind bumps the epoch
  and lowers the floor — O(1), no tail scan. Replaces the O(n) UnwindTo
  iterate-and-evict. Frozen (txN 0) and current-epoch entries always survive.

Flush callback (kv tidy):
- WithFlushCallback / FlushConfig.DomainCallbacks now deliver the value's per-key
  txNum, not just the step. temporal_mem_batch passes latest.txNum.
- SharedDomains.Flush stamps branch and state cache entries with that per-key
  txNum, so unwind invalidation is tx-precise (an unwind to a txNum inside the
  latest step drops exactly the entries above it, not the whole step).

Tests: BranchCache unwind tests rewritten for the epoch model (lazy drop, floor
boundary, current-epoch survival, frozen survival).
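
The (txNum, epoch) model can be sketched in a few lines. This is an assumption-laden illustration, not the BranchCache code: `epochCache`, `entry` and the plain map are hypothetical, and the real cache is an LRU with more state. It follows the rules stated above: Unwind bumps the epoch and lowers the floor, and Get lazily drops a superseded-epoch entry whose txN is at or above the floor, while frozen (txN 0) and current-epoch entries survive.

```go
package main

import (
	"fmt"
	"math"
)

type entry struct {
	val   string
	txN   uint64 // write txNum; 0 marks a frozen entry
	epoch uint64 // epoch the entry was written in
}

type epochCache struct {
	m     map[string]entry
	epoch uint64
	floor uint64 // lowest txNum unwound to so far
}

func newEpochCache() *epochCache {
	return &epochCache{m: map[string]entry{}, floor: math.MaxUint64}
}

// Put stamps the entry with its write txN and the current epoch.
func (c *epochCache) Put(k, v string, txN uint64) {
	c.m[k] = entry{val: v, txN: txN, epoch: c.epoch}
}

// Get drops a superseded-epoch entry at or above the floor lazily on read.
func (c *epochCache) Get(k string) (string, bool) {
	e, ok := c.m[k]
	if !ok {
		return "", false
	}
	if e.epoch < c.epoch && e.txN != 0 && e.txN >= c.floor {
		delete(c.m, k)
		return "", false
	}
	return e.val, true
}

// Unwind is O(1): bump the epoch and lower the floor, with no scan.
func (c *epochCache) Unwind(toTxN uint64) {
	c.epoch++
	if toTxN < c.floor {
		c.floor = toTxN
	}
}

func main() {
	c := newEpochCache()
	c.Put("frozen", "f", 0)
	c.Put("old", "a", 50)
	c.Put("new", "b", 150)
	c.Unwind(100)
	c.Put("fresh", "c", 120) // written in the current epoch
	for _, k := range []string{"frozen", "old", "new", "fresh"} {
		_, ok := c.Get(k)
		fmt.Println(k, ok)
	}
}
```

Only `new` is dropped: `frozen` has txN 0, `old` sits below the floor, and `fresh` belongs to the current epoch. Because the floor only ever moves down, repeated unwinds in this sketch can over-invalidate, which costs a re-read but never serves stale data.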

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A contract's code value is invariant for a given hash, but its EXISTENCE is
not: code deployed on a fork that is later unwound must no longer be
discoverable — including by codeHash — so it can't be served as live state on
the surviving fork. The code cache therefore stops treating its content layers
as immutable and honors the same (txNum, epoch) model as the account/storage/
branch caches (#21752).

- Every layer (addr→code, addr→codeHash, codeHash→code, maphash→code, size)
  carries a (txNum, epoch) stamp. Get drops a superseded-epoch entry whose txNum
  is at/above the unwind floor lazily on read (decrementing the byte counters).
- Unwind bumps the epoch and lowers the floor — O(1), no scan — replacing the
  wholesale addr-layer Purge (which also nuked the whole warm working set every
  unwind). Re-deploying the same code on the live fork revives a stranded entry.
- Thread the value's write txNum (per-key on flush, step-derived on read, a
  conservative sd.txNum upper bound on derived populates) through
  PutWithCodeHash / PutAddrCodeHash / PutCodeSizeByCodeHash and the StateCache
  wrappers and call sites.
- Clear now hard-resets every layer (was: kept content as immutable).

Tradeoff: unwinding one deployment can drop code shared with a still-live one,
which is then re-fetched (a multiplicity cost) — accepted to keep stale code out
of the cache. Tests cover undiscoverable-after-unwind across all layers,
below-floor survival, and re-deploy revival.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Byte-match #21380's WithFlushCallback txNum doc so re-syncing this stacked
branch onto #21380 attributes the kv plumbing cleanly to #21380 (#21752).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…statecache-lru-pr

# Conflicts:
#	db/state/execctx/domain_shared.go
mh0lt added a commit that referenced this pull request Jun 11, 2026
Port the resident fixed-array trunk onto the #21380+#21386 cache stack: an
accountTrunk holding account-trie branches at nibble depths 1-4 in dense arrays
(d1[16]/d2[256]/d3[4096]/d4[65536]) indexed directly by the compact-hex prefix,
plus a per-contract storageTrunk (depths 0-3 + deep overflow) in a pinned map
keyed by account hash. Each slot is an atomic.Pointer, so trunk reads/writes
take no mutex and don't serialize through a shared lock the way the LRU tail
and a storage trunk's deep overflow map do.

Trunk and storage-trunk entries flow through the shared lookup/store/Invalidate
walk, so Get's (txN, epoch) staleness check covers them unchanged. Adds
PinEntry/PinnedCount and a SetMissCallback seam for the residency/adaptive layer
(added separately). BRANCH_CACHE_TRUNK_DISABLE routes depths 1-4 back to the
tail for A/B.
…sd.storage

The TemporalMemBatch flush-callback loop iterated sd.domains[domain] for every
domain, but StorageDomain values live in the separate sd.storage btree, not
sd.domains[StorageDomain] (see getLatest/DomainPut). So the StorageDomain flush
callback never fired: the stateCache's storage entries were only ever
read-populated and never flush-updated. Once a slot was cached, a later write
to it was invisible and the cache served the stale value on hit — surfacing
under parallel exec as a swap reading a stale reserve, reverting, and producing
a gas mismatch.

Iterate sd.storage for StorageDomain so its flush callback fires and the cache
is refreshed/invalidated on every committed slot change.

TestFlush_UpdatesStorageStateCache is a deterministic regression: it writes a
slot, flushes, overwrites it, flushes again, and asserts the cache reflects the
second write. It fails without this change and passes with it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mh0lt added a commit that referenced this pull request Jun 12, 2026
… both caches)

Pulls forward the minimal StateCache consistency guarantee from #21386 so this
PR is logically consistent across both aggregator-lifetime caches.

The StateCache write path (domainPut/DomainDel) now INVALIDATES the cache entry
instead of storing the written value. The value already lives in sd.mem (the
write path's local copy), so storing it in the cache both double-stored it and
placed an uncommitted value into a long-lived cache — which a failed commit
would leave ahead of MDBX (the same poisoning class fixed for the BranchCache).
The cache now holds only committed state; reads repopulate from committed files.

Consequently the ClearWithHash-on-invalid-block call is removed: an invalid
block (and fork validation, which never commits) only invalidates entries, never
stores wrong ones, so there is nothing to clear.

#21386 will, on rebase, add warm post-commit repopulation back under its
txNum/epoch model and remove the now-redundant block-hash machinery
(ValidateAndPrepare/ClearWithHash). Reorg invalidation (RevertWithDiffset) and
the read-through populate are unchanged here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…onto #21380)

Brings #21380's review-fix commit (f4c82e8: trimmed comments, dropped debug
logs) onto #21386, plus the net-zero invalidate-on-write commit + its revert.

Conflict resolutions:
- db/state/execctx/domain_shared.go: reconciled the two commit-gating models into
  one. flushMem/Flush stay plain (cold-but-correct). Commit now stashes the
  flush-callback tuples for ALL caches (CommitmentDomain->BranchCache,
  Accounts/Storage/Code->StateCache) during flush and applies them only after
  tx.Commit() succeeds — extending #21380's by-construction commit-gating to the
  StateCache so no cache can be advanced past durable MDBX on a failed commit.
  Restored ProbeReadLayers (dropped by auto-merge).
- db/state/changeset/state_changeset.go: kept #21386's relocation of the metrics
  types to db/state/kvmetrics; dropped the stale duplicate DomainIOMetrics block.
- execution/commitment/branch_cache_test.go: kept #21386's Unwind-API test; dropped
  #21380's UnwindTo-API tests (that API was renamed/superseded in #21386).
- execution/commitment/commitmentdb/commitment_context.go: kept #21386's sd
  interface additions (ProbeReadLayers, Metrics, kvmetrics import).
- db/state/execctx/flush_storage_cache_test.go: switched the test from Flush to
  Commit, since the caches are now commit-gated (Flush no longer warms them).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>