Add state caches that persist throughout blocks #18868

Merged
Giulio2002 merged 92 commits into main from code-cache on Feb 2, 2026

Conversation

@Giulio2002 Giulio2002 commented Jan 29, 2026

Contributor

Performance: Size-capped execution caches for chain-tip processing

Summary

This PR introduces a suite of size-capped caches designed to improve chain-tip block execution performance by reducing redundant database reads. The caches are optimized for data locality and, instead of LRU eviction, use a simple "stop growing when full" policy: once a cache reaches capacity it rejects new entries. This avoids the overhead of tracking access patterns while still providing cache benefits for hot data.

Key Design Decisions

Size-capped caches that don't evict

All caches in this PR are capped by byte size and do not evict entries when full - they simply stop accepting new entries. This design:

  • Favors data locality by keeping the first N entries that fit within the capacity
  • Avoids LRU overhead (no access time tracking, no eviction scans)
  • Is simpler to reason about and more predictable
  • Works well for block execution where working sets are relatively stable
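The "stop growing when full" behaviour described above can be sketched as follows. This is a minimal illustration, not the PR's code: the `BoundedCache` type, its methods, and the choice to count key plus value bytes are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// BoundedCache is a hypothetical byte-capped cache that never evicts.
type BoundedCache struct {
	mu       sync.RWMutex
	data     map[string][]byte
	size     int // bytes currently held (keys + values)
	capacity int // hard cap in bytes
}

func NewBoundedCache(capacity int) *BoundedCache {
	return &BoundedCache{data: make(map[string][]byte), capacity: capacity}
}

// Get returns the cached value and whether it was present.
func (c *BoundedCache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

// Put reports whether the value was accepted. New keys are rejected once
// the cap would be exceeded; nothing already cached is ever evicted.
// Updates to existing keys are always applied, even at capacity.
func (c *BoundedCache) Put(key string, value []byte) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if old, ok := c.data[key]; ok {
		c.size += len(value) - len(old)
		c.data[key] = value
		return true
	}
	entry := len(key) + len(value)
	if c.size+entry > c.capacity {
		return false // full: stop growing
	}
	c.data[key] = value
	c.size += entry
	return true
}

func main() {
	c := NewBoundedCache(16)
	fmt.Println(c.Put("a", []byte("12345678")))  // true: 9 of 16 bytes used
	fmt.Println(c.Put("b", []byte("12345678")))  // false: would need 18 bytes
	fmt.Println(c.Put("a", []byte("updated!!"))) // true: existing key
}
```

The trade-off is visible in the second `Put`: a key that arrives after the cache fills is never cached, however hot it later becomes.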

Cache Components

  1. StateCache (execution/cache/state_cache.go) - Unified cache for domain data:

    • Account: 256 MB
    • Storage: 128 MB
    • Commitment: 128 MB
    • Code: 512 MB (code bytes) + 16 MB (address mappings)

  2. CodeCache (execution/cache/code_cache.go) - Two-level cache for contract code:

    • Level 1: address → maphash(code) (mutable, cleared on reorg)
    • Level 2: maphash(code) → code (immutable, never cleared)
    • Enables efficient code deduplication (common with proxies/clones)

  3. GenericCache (execution/cache/generic_cache.go) - Bounded concurrent cache:

    • Thread-safe using maphash.Map
    • Tracks current byte size vs capacity
    • Updates to existing keys always allowed even at capacity
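The two-level CodeCache layout can be illustrated with a small sketch. It assumes that maphash(code) means a 64-bit hash from Go's standard `hash/maphash` package; the `Address` type, method names, and plain (non-concurrent) maps are hypothetical, and size caps are left out for brevity.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// Address is a stand-in for a 20-byte account address (assumption).
type Address [20]byte

// CodeCache is a hypothetical two-level cache: addresses map to a code
// hash, and each distinct code blob is stored once under that hash.
type CodeCache struct {
	seed       maphash.Seed
	addrToHash map[Address]uint64 // level 1: mutable, cleared on reorg
	hashToCode map[uint64][]byte  // level 2: immutable, never cleared
}

func NewCodeCache() *CodeCache {
	return &CodeCache{
		seed:       maphash.MakeSeed(),
		addrToHash: make(map[Address]uint64),
		hashToCode: make(map[uint64][]byte),
	}
}

// Put records the code for addr, storing the bytes only if this
// content has not been seen before. Hash collisions are ignored here.
func (c *CodeCache) Put(addr Address, code []byte) {
	h := maphash.Bytes(c.seed, code)
	c.addrToHash[addr] = h
	if _, ok := c.hashToCode[h]; !ok {
		c.hashToCode[h] = append([]byte(nil), code...) // private copy
	}
}

// Get resolves addr to its code through both levels.
func (c *CodeCache) Get(addr Address) ([]byte, bool) {
	h, ok := c.addrToHash[addr]
	if !ok {
		return nil, false
	}
	code, ok := c.hashToCode[h]
	return code, ok
}

// ClearAddresses drops only the mutable level, as on a reorg.
func (c *CodeCache) ClearAddresses() {
	c.addrToHash = make(map[Address]uint64)
}

// UniqueCodes reports how many distinct code blobs are stored.
func (c *CodeCache) UniqueCodes() int { return len(c.hashToCode) }

func main() {
	c := NewCodeCache()
	proxyA, proxyB := Address{1}, Address{2}
	code := []byte{0x60, 0x00, 0x60, 0x00}
	c.Put(proxyA, code)
	c.Put(proxyB, code) // same bytecode, different address
	fmt.Println(c.UniqueCodes()) // 1
	c.ClearAddresses()
	_, ok := c.Get(proxyA)
	fmt.Println(ok, c.UniqueCodes()) // false 1
}
```

Because level 2 is keyed by content, it stays valid across reorgs; only the address mapping can go stale.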

info@weblogix.biz and others added 11 commits January 29, 2026 16:45
- Remove code cache creation in exec3.go to prevent stale cache data
  between block generation and execution phases
- Simplify ValidateAndPrepare to only clear on actual hash mismatch
- Remove code cache integration from domain_shared.go GetLatest
  (cache management should happen at higher level)

This fixes TestDeleteRecreateSlotsAcrossManyBlocks which was failing
due to stale cached code being returned after selfdestruct/recreate.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…appings

This improves the code cache design:
- Level 1: addr→codeHash (262144 entries, mutable, cleared on reorg)
- Level 2: codeHash→code (2048 entries, immutable, never cleared)

Multiple addresses can share the same code (common with proxies/clones),
and code hash is immutable so the same hash always means same code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Prints cache sizes and hit rates at the end of each block:
- addr cache: hits/total (hit%) size
- code cache: hits/total (hit%) size

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
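A per-block stats line of the shape described in this commit could be produced along these lines. The `CacheStats` type, its methods, and the exact output format are assumptions for illustration; the commit does not show the real logging code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CacheStats is a hypothetical hit/total counter pair that is safe to
// update from concurrent readers.
type CacheStats struct {
	hits  atomic.Uint64
	total atomic.Uint64
}

// Record counts one lookup and whether it was served from the cache.
func (s *CacheStats) Record(hit bool) {
	s.total.Add(1)
	if hit {
		s.hits.Add(1)
	}
}

// Line formats "name cache: hits/total (hit%) size".
func (s *CacheStats) Line(name string, sizeBytes int) string {
	hits, total := s.hits.Load(), s.total.Load()
	pct := 0.0
	if total > 0 {
		pct = 100 * float64(hits) / float64(total)
	}
	return fmt.Sprintf("%s cache: %d/%d (%.1f%%) size=%dB", name, hits, total, pct, sizeBytes)
}

func main() {
	var addr CacheStats
	for _, hit := range []bool{true, true, false, true} {
		addr.Record(hit)
	}
	fmt.Println(addr.Line("addr", 1024)) // addr cache: 3/4 (75.0%) size=1024B
}
```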
- Don't populate cache for addresses already dirty in current block
- Capture dirty state at start of GetCode/GetCodeSize before getStateObject
- This prevents re-caching code after Selfdestruct removes the entry
- Fixes TestCVE2020_26265 and TestSelfDestructReceive with cache enabled
- Increase DefaultCodeCacheSize to 10_000

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
:qa
Merge remote-tracking branch 'origin/main' into code-cache
@Giulio2002 Giulio2002 changed the title from "Code cache" to "Add state caches that persist throughout blocks" on Jan 30, 2026
@Giulio2002 Giulio2002 enabled auto-merge (squash) February 2, 2026 14:53
Comment thread execution/vm/contract.go Outdated
codeHash = c.CodeHash.Value()
}

if !c.CodeHash.IsZero() {

Collaborator

Seems like IsZero() is called twice here.

Contributor Author

nice catch!

@AskAlexSharov

Collaborator

@Giulio2002 only question:
how do we handle the case when an err/panic is returned from exec/unwind? Maybe drop all caches in this case?

@Giulio2002

Giulio2002 commented Feb 2, 2026

Contributor Author

> @Giulio2002 only question: how do we handle the case when an err/panic is returned from exec/unwind? Maybe drop all caches in this case?

Yes, drop all caches: every invalid block clears all caches. We also have a correctness check in the form of the block hash stored in the state cache.

@AskAlexSharov AskAlexSharov left a comment

Collaborator

overall looks good

@AskAlexSharov

Collaborator

Maybe ProcessFrozenBlocks also needs it?

@Giulio2002

Contributor Author

> Maybe ProcessFrozenBlocks also needs it?

too complex

@AskAlexSharov

Collaborator

> > Maybe ProcessFrozenBlocks also needs it?
>
> too complex

What is the right way to perf-test these caches?

@Giulio2002 Giulio2002 merged commit 8f3279a into main Feb 2, 2026
21 checks passed
@Giulio2002 Giulio2002 deleted the code-cache branch February 2, 2026 16:47
mh0lt added a commit that referenced this pull request Jun 5, 2026
…emove the schedule-time ValidateAndPrepare purge

The state cache carried a per-domain blockHash and was scrubbed by ValidateAndPrepare
before every block. In the parallel executor that call sits in processRequest — a
*schedule* step, not an apply step — copied from the serial path (#18868). With a
32-deep pipeline and heavy retry traffic the single blockHash almost never equals the
next call's parentHash, so it took the wipe branch ~100% of the time: measured
storage-cache purge_rate ~100%, hit ~35% during catch-up, the cache wiped every block.

Make the cache what it should be — a SharedDomains implementation detail, populated only
at flush (committed, fork-agnostic state) and invalidated only on unwind. Coherence is
now txNum/epoch based, no block awareness and no diffset:

- Each GenericCache entry carries (txNum, epoch). A read is valid iff it was written in
  the current epoch OR its txNum is at/below unwindFloor (predates every unwind).
- Unwind(txNum) bumps the epoch and lowers the floor — O(1), no scan. Stale entries are
  dropped lazily on their next read. txNum slots are reused across forks, so the epoch
  (not the txNum) tells a dead fork's write from the live fork's at the same txNum.
- CodeCache clears its small mutable addr layers on unwind; immutable content-addressed
  code is kept.

Wiring: FlushWithCallback delivers txNum (cache stamps it; branchCache derives step);
read-population and read-ahead stamp the step's txNum upper bound. The three exec-flow
ValidateAndPrepare calls are removed; the unwind path calls stateCache.Unwind(txNum)
unconditionally (diffset-free, matching the overlay's maxtx prune — diffsets aren't
generated below the reorg window, so the old changeSet-gated cache revert left a stale
gap). RevertWithDiffset/blockHash/ClearWithHash and the fork-validation cache scrub are
removed. (DB-level diffset retirement is a follow-on.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
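The txNum/epoch validity rule from the commit message above can be sketched as follows. Only the rule itself comes from the commit text; the `EpochCache` type, its method signatures, and the initial floor of `math.MaxUint64` (meaning no unwind has happened yet) are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// entry carries the stamp used to decide validity on read.
type entry struct {
	value []byte
	txNum uint64
	epoch uint64
}

// EpochCache is a hypothetical cache whose coherence is based on
// (txNum, epoch) stamps rather than block hashes.
type EpochCache struct {
	entries     map[string]entry
	epoch       uint64
	unwindFloor uint64 // lowest txNum any unwind has rolled back to
}

func NewEpochCache() *EpochCache {
	return &EpochCache{entries: make(map[string]entry), unwindFloor: math.MaxUint64}
}

// Put stamps the value with its txNum and the current epoch.
func (c *EpochCache) Put(key string, value []byte, txNum uint64) {
	c.entries[key] = entry{value: value, txNum: txNum, epoch: c.epoch}
}

// Get returns a value only if it was written in the current epoch or
// its txNum is at/below the unwind floor. Stale entries are dropped
// lazily, at the moment they are read.
func (c *EpochCache) Get(key string) ([]byte, bool) {
	e, ok := c.entries[key]
	if !ok {
		return nil, false
	}
	if e.epoch == c.epoch || e.txNum <= c.unwindFloor {
		return e.value, true
	}
	delete(c.entries, key)
	return nil, false
}

// Unwind bumps the epoch and lowers the floor: O(1), no scan.
func (c *EpochCache) Unwind(txNum uint64) {
	c.epoch++
	if txNum < c.unwindFloor {
		c.unwindFloor = txNum
	}
}

func main() {
	c := NewEpochCache()
	c.Put("a", []byte("old"), 50)
	c.Put("b", []byte("dead fork"), 150)
	c.Unwind(100)

	_, okA := c.Get("a") // txNum 50 predates the unwind
	_, okB := c.Get("b") // txNum 150 was written on the dead fork
	fmt.Println(okA, okB) // true false

	c.Put("b", []byte("live fork"), 120) // same key, current epoch
	v, _ := c.Get("b")
	fmt.Println(string(v)) // live fork
}
```

Keeping the floor at the minimum over all unwinds is conservative: an entry written between two unwinds may be discarded even though the later unwind did not reach it, but a stale value is never served.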
3 participants