Skip to content

rpc: add canonical hash cache#19173

Merged
lupin012 merged 9 commits into
mainfrom
lupin012/addCacheOnCanonicalHash
May 3, 2026
Merged

rpc: add canonical hash cache#19173
lupin012 merged 9 commits into
mainfrom
lupin012/addCacheOnCanonicalHash

Conversation

@lupin012

@lupin012 lupin012 commented Feb 14, 2026

Copy link
Copy Markdown
Contributor

Adds a canonicalHashCache LRU (10,000 entries, 32 bytes/entry, ~320 KB) to BlockReader to cache blockNum → hash lookups for snapshot-range blocks.

Why a separate cache instead of increasing headerByNumCache:
headerByNumCache stores full *Header objects (~500 bytes/entry, 1,000 slots, ~500 KB). To cover the same 10,000-block working set it would require ~5 MB. canonicalHashCache achieves the same coverage at 16× less memory by storing only the 32-byte hash.

Two regimes:

  • Working set ≤ 1,000 blocks (headerByNumCache warm): canonicalHashCache short-circuits ViewSingleFile (refcount increments + closure allocation) which main always pays even when the *Header is cached → ~3.3× (572 → 167 ns)
  • Working set > 1,000 and ≤ 10,000 blocks (headerByNumCache evicted, canonicalHashCache still warm): main falls back to full snapshot path → ~33× (5,119 → 167 ns)

Results (real mainnet snapshots, i9-14900HX):

Benchmark ns/op B/op allocs/op Notes
BenchmarkCanonicalHash_RealSnapshot 167 8 1 this PR — canonicalHashCache hit
BenchmarkCanonicalHash_RealSnapshot_MainEquivalent 572 176 5 main equivalent — headerByNumCache warm
BenchmarkCanonicalHash_RealSnapshot_Cold 5,119 1,657 11 working set > 1,000 blocks

Why MainEquivalent (572 ns) is higher than the theoretical estimate (~235 ns): ViewSingleFile is called on every CanonicalHash invocation even when headerByNumCache is warm. It atomically increments the refcount of each non-frozen segment and allocates a release closure — 5 allocs, 176 B per call. The theoretical estimate only counted MDBX miss + LRU lookup + atomic hash load and did not account for this overhead.

@lupin012 lupin012 changed the title rpc: add canoncal hash cache rpc: add canonical hash cache Feb 14, 2026
@Giulio2002

Giulio2002 commented Feb 14, 2026

Copy link
Copy Markdown
Contributor

why

Thought a cache for BN-to-hash translation (for historical block) would be more efficient (normally the api requests BN).
The cache is populated only post-snapshot, once the mapping between block number and hash is consolidated and finalized. Since these records are immutable and immune to reorgs at this stage, the cache should provides a better performance boost without requiring any invalidation logic
If you think it's not needed, I'll revert it.

Comment thread db/snapshotsync/freezeblocks/block_reader.go
@yperbasis yperbasis added the RPC label Feb 16, 2026
@lupin012 lupin012 marked this pull request as ready for review February 18, 2026 21:30
@yperbasis yperbasis added this to the 3.5.0 milestone Mar 22, 2026
@yperbasis yperbasis requested a review from Copilot April 17, 2026 10:58

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide performance measures

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an in-memory LRU cache for canonical block hashes returned from snapshot headers to reduce repeated snapshot lookups during RPC reads.

Changes:

  • Introduce canonicalHashCache LRU on BlockReader, configurable via RPC_CANONICAL_HASH_LRU.
  • Use the cache in BlockReader.CanonicalHash for snapshot-derived canonical hashes (DB results still take precedence and are not cached).
  • Add unit tests around DB-hit/DB-miss behavior and ensuring DB results are not cached.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
db/snapshotsync/freezeblocks/block_reader.go Adds canonical-hash LRU cache and uses it on the snapshot-read path in CanonicalHash.
db/snapshotsync/freezeblocks/block_reader_test.go Adds tests validating non-caching behavior for DB hits and misses.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/snapshotsync/freezeblocks/block_reader_test.go Outdated
Comment thread db/snapshotsync/freezeblocks/block_reader.go
@lupin012 lupin012 force-pushed the lupin012/addCacheOnCanonicalHash branch from 6451ea4 to 5d8a6cc Compare April 29, 2026 19:33
lupin012 and others added 3 commits April 29, 2026 21:40
Three benchmarks to quantify the effect of the LRU cache (PR #19173):
- DBPath: raw MDBX lookup; identical on main and this branch
- SnapshotNoCache: header.Hash() per call; what main pays for every
  snapshot-range lookup (real cost also includes headerFromSnapshot)
- SnapshotCacheHit: LRU Get() per call; steady-state cost on this branch

Run with: go test -bench=BenchmarkCanonicalHash -benchmem ./db/snapshotsync/freezeblocks/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…istic cost

header.Hash() caches the result per-object via an atomic pointer; reusing the
same Header across benchmark iterations measured the atomic load (~18 ns) rather
than the actual hash computation. Switch to CalcHash() which always recomputes
the RLP+keccak256, matching the real cost of headerFromSnapshot producing a
fresh *Header on every CanonicalHash call on main.

Updated numbers on i9-14900HX:
  MDBXLookup:           201 ns  (baseline, same on both branches)
  HeaderHash_Realistic: 1138 ns (what main pays per call for snapshot blocks)
  LRUCacheHit:           35 ns  (what this branch pays after cache warmup)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add TestCanonicalHashCache_SnapshotPath: creates a minimal real headers
  snapshot segment (no DB canonical entry) and asserts that CanonicalHash
  populates canonicalHashCache on the first call and serves subsequent
  calls from the cache.
- Fix misleading comment in TestCanonicalHashCache_MultipleBlocks: reads go
  through the DB path which intentionally does not populate the cache.
- Add defer rwTx.Rollback() in BenchmarkCanonicalHash_MDBXLookup to satisfy
  ruleguard lint rule.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/snapshotsync/freezeblocks/block_reader_test.go
Comment thread db/snapshotsync/freezeblocks/block_reader_bench_test.go Outdated

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The headline "~39× speedup" overstates the win vs. main

The benchmark compares warm cache (209 ns) against the cold path (8088 ns). But main already caches the full *Header in headerByNumCache inside headerFromSnapshot
(block_reader.go:953), and Header.Hash() is itself per-object atomic-cached (execution/types/block.go:511-521). On main, the second call for the same block pays only:

  • MDBX miss (~200 ns)
  • headerByNumCache.Get (~30 ns)
  • header.Hash() returning the per-object cached value (~5 ns)

So on main the steady-state hot call is already ~235 ns — very close to this PR's 209 ns. The real wins this PR delivers are:

  • Larger working set (10k vs 1k slots) → more hits when the request stream cycles through many distinct historical blocks (e.g., archive eth_getLogs, range tracing, snapshot-range
    IsCanonical calls).
  • A small per-call constant saved on cache hits (~50 ns from skipping headerByNumCache.Get + cached Hash()).

A BenchmarkCanonicalHash_RealSnapshot_MainEquivalent (warm headerByNumCache only, no canonicalHashCache) would let you report the actual delta vs. main. The PR's own caveat — "less
than 1.5% of total RPC latency" — is the more honest framing; consider reflecting that in the PR description.

@yperbasis yperbasis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz address the two new copilot comments

… benchmark comment

Close compressor and recsplit index explicitly before calling snapshots.OpenFolder()
in TestCanonicalHashCache_SnapshotPath — deferred closes run too late on Windows
where open mmapped files block subsequent mmap of the same path.

Remove BenchmarkCanonicalHash_HeaderHash_Minimal from the overview comment; the
function was never added.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012

Copy link
Copy Markdown
Contributor Author

@yperbasis

Results (real mainnet snapshots, i9-14900HX): - BenchmarkCanonicalHash_RealSnapshot 167 ns/op 8 B/op 1 allocs/op (this PR — canonicalHashCache hit)

  • BenchmarkCanonicalHash_RealSnapshot_MainEquivalent 572 ns/op 176 B/op 5 allocs/op (main equivalent — headerByNumCache warm)
  • BenchmarkCanonicalHash_RealSnapshot_Cold 5,119 ns/op 1,657 B/op 11 allocs/op (working set > 1,000 blocks)
    Why MainEquivalent (572 ns) is higher than the theoretical estimate (~235 ns): ViewSingleFile is called on every CanonicalHash invocation even when headerByNumCache is warm. It atomically increments the refcount of each non-frozen segment and allocates a release closure — 5 allocs, 176 B per call.

…_MainEquivalent

Measures main's steady-state cost with headerByNumCache warm and no
canonicalHashCache — simulated by replacing canonicalHashCache with a
size-1 LRU that always misses when cycling through 1,000 distinct blocks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lupin012 lupin012 added this pull request to the merge queue May 3, 2026
Merged via the queue into main with commit 4d282a4 May 3, 2026
38 checks passed
@lupin012 lupin012 deleted the lupin012/addCacheOnCanonicalHash branch May 3, 2026 07:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants