rpc: add canonical hash cache#19173
Conversation
|
why Thought a cache for BN-to-hash translation (for historical block) would be more efficient (normally the api requests BN). |
yperbasis
left a comment
There was a problem hiding this comment.
Please provide performance measures
There was a problem hiding this comment.
Pull request overview
Adds an in-memory LRU cache for canonical block hashes returned from snapshot headers to reduce repeated snapshot lookups during RPC reads.
Changes:
- Introduce
canonicalHashCacheLRU onBlockReader, configurable viaRPC_CANONICAL_HASH_LRU. - Use the cache in
BlockReader.CanonicalHashfor snapshot-derived canonical hashes (DB results still take precedence and are not cached). - Add unit tests around DB-hit/DB-miss behavior and ensuring DB results are not cached.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| db/snapshotsync/freezeblocks/block_reader.go | Adds canonical-hash LRU cache and uses it on the snapshot-read path in CanonicalHash. |
| db/snapshotsync/freezeblocks/block_reader_test.go | Adds tests validating non-caching behavior for DB hits and misses. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
6451ea4 to
5d8a6cc
Compare
Three benchmarks to quantify the effect of the LRU cache (PR #19173): - DBPath: raw MDBX lookup; identical on main and this branch - SnapshotNoCache: header.Hash() per call; what main pays for every snapshot-range lookup (real cost also includes headerFromSnapshot) - SnapshotCacheHit: LRU Get() per call; steady-state cost on this branch Run with: go test -bench=BenchmarkCanonicalHash -benchmem ./db/snapshotsync/freezeblocks/ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…istic cost header.Hash() caches the result per-object via an atomic pointer; reusing the same Header across benchmark iterations measured the atomic load (~18 ns) rather than the actual hash computation. Switch to CalcHash() which always recomputes the RLP+keccak256, matching the real cost of headerFromSnapshot producing a fresh *Header on every CanonicalHash call on main. Updated numbers on i9-14900HX: MDBXLookup: 201 ns (baseline, same on both branches) HeaderHash_Realistic: 1138 ns (what main pays per call for snapshot blocks) LRUCacheHit: 35 ns (what this branch pays after cache warmup) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add TestCanonicalHashCache_SnapshotPath: creates a minimal real headers snapshot segment (no DB canonical entry) and asserts that CanonicalHash populates canonicalHashCache on the first call and serves subsequent calls from the cache. - Fix misleading comment in TestCanonicalHashCache_MultipleBlocks: reads go through the DB path which intentionally does not populate the cache. - Add defer rwTx.Rollback() in BenchmarkCanonicalHash_MDBXLookup to satisfy ruleguard lint rule. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
yperbasis
left a comment
There was a problem hiding this comment.
The headline "~39× speedup" overstates the win vs. main
The benchmark compares warm cache (209 ns) against the cold path (8088 ns). But main already caches the full *Header in headerByNumCache inside headerFromSnapshot
(block_reader.go:953), and Header.Hash() is itself per-object atomic-cached (execution/types/block.go:511-521). On main, the second call for the same block pays only:
- MDBX miss (~200 ns)
- headerByNumCache.Get (~30 ns)
- header.Hash() returning the per-object cached value (~5 ns)
So on main the steady-state hot call is already ~235 ns — very close to this PR's 209 ns. The real wins this PR delivers are:
- Larger working set (10k vs 1k slots) → more hits when the request stream cycles through many distinct historical blocks (e.g., archive eth_getLogs, range tracing, snapshot-range
IsCanonical calls). - A small per-call constant saved on cache hits (~50 ns from skipping headerByNumCache.Get + cached Hash()).
A BenchmarkCanonicalHash_RealSnapshot_MainEquivalent (warm headerByNumCache only, no canonicalHashCache) would let you report the actual delta vs. main. The PR's own caveat — "less
than 1.5% of total RPC latency" — is the more honest framing; consider reflecting that in the PR description.
yperbasis
left a comment
There was a problem hiding this comment.
Plz address the two new copilot comments
… benchmark comment Close compressor and recsplit index explicitly before calling snapshots.OpenFolder() in TestCanonicalHashCache_SnapshotPath — deferred closes run too late on Windows where open mmapped files block subsequent mmap of the same path. Remove BenchmarkCanonicalHash_HeaderHash_Minimal from the overview comment; the function was never added. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Results (real mainnet snapshots, i9-14900HX): - BenchmarkCanonicalHash_RealSnapshot 167 ns/op 8 B/op 1 allocs/op (this PR — canonicalHashCache hit)
|
…_MainEquivalent Measures main's steady-state cost with headerByNumCache warm and no canonicalHashCache — simulated by replacing canonicalHashCache with a size-1 LRU that always misses when cycling through 1,000 distinct blocks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a
canonicalHashCacheLRU (10,000 entries, 32 bytes/entry, ~320 KB) toBlockReaderto cacheblockNum → hashlookups for snapshot-range blocks.Why a separate cache instead of increasing
headerByNumCache:headerByNumCachestores full*Headerobjects (~500 bytes/entry, 1,000 slots, ~500 KB). To cover the same 10,000-block working set it would require ~5 MB.canonicalHashCacheachieves the same coverage at 16× less memory by storing only the 32-byte hash.Two regimes:
headerByNumCachewarm):canonicalHashCacheshort-circuitsViewSingleFile(refcount increments + closure allocation) which main always pays even when the*Headeris cached → ~3.3× (572 → 167 ns)headerByNumCacheevicted,canonicalHashCachestill warm): main falls back to full snapshot path → ~33× (5,119 → 167 ns)Results (real mainnet snapshots, i9-14900HX):
BenchmarkCanonicalHash_RealSnapshotcanonicalHashCachehitBenchmarkCanonicalHash_RealSnapshot_MainEquivalentheaderByNumCachewarmBenchmarkCanonicalHash_RealSnapshot_Cold