perf(txmetacache): pointer-backed cache backend + interface dispatch#915
Conversation
|
🤖 Claude Code Review Status: Complete SummaryThis PR introduces a well-designed pointer-backed cache implementation as an experimental alternative to the byte-backed cache. The code demonstrates careful attention to correctness, particularly around pointer aliasing prevention and concurrent access safety. All previously identified issues from earlier reviews have been resolved:
Key Strengths
No Critical Issues FoundThe implementation is production-ready pending the end-to-end smoketest mentioned in the PR description. The code follows project conventions from CLAUDE.md and AGENTS.md with minimal changes, good test coverage, and appropriate caution flags (opt-in via BucketType.Pointer). Recommendation: Approve after the planned smoketest validates behavior under production workload patterns. |
There was a problem hiding this comment.
Pull request overview
This PR refactors TxMetaCache to dispatch through a small internal backend interface and introduces an opt-in experimental pointer-backed cache backend (BucketType.Pointer) intended to reduce allocation churn on cache hits by storing *meta.Data directly. It also removes unused height-appending logic and drops the hitOldTx metric that depended on it.
Changes:
- Add
cacheBackendinterface and adapt the existingImprovedCacheviaimprovedCacheBackend; updateTxMetaCacheto use backend dispatch. - Introduce
PointerCache(sharded map + per-bucket FIFO ring eviction) plus tests/benchmarks, selectable viaBucketType.Pointer. - Update consumers/tests to use the new
GetMetaCached(ctx, hash) (*meta.Data, bool)API; remove height-encoding + related metric.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| stores/txmetacache/txmetacache.go | Switch TxMetaCache to backend interface dispatch, add Pointer bucket type, simplify Get/Set paths, remove height appending. |
| stores/txmetacache/cache_backend.go | Define the internal backend interface used by TxMetaCache. |
| stores/txmetacache/improved_cache_meta.go | Adapter implementing cacheBackend for the existing byte-backed ImprovedCache. |
| stores/txmetacache/pointer_cache.go | New pointer-backed cache implementation (sharded map + FIFO key ring eviction). |
| stores/txmetacache/pointer_cache_test.go | Unit tests for the new pointer-backed cache behavior. |
| stores/txmetacache/pointer_cache_bench_test.go | Benchmarks comparing byte-backed vs pointer-backed paths and GC behavior. |
| stores/txmetacache/metrics.go | Remove hit_old_tx Prometheus metric wiring (no longer applicable). |
| stores/txmetacache/txmetacache_test.go | Update tests/benchmarks for new GetMetaCached API and backend adapter access. |
| stores/txmetacache/txmetacache_conflicting_test.go | Update conflicting-transaction tests to use new GetMetaCached signature. |
| services/subtreevalidation/processTxMetaUsingCache.go | Switch cache reads to the new GetMetaCached API. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ordishs
left a comment
There was a problem hiding this comment.
Overview
Introduces an opt-in PointerCache backend storing *meta.Data directly in sharded map[uint64]*meta.Data shards with per-bucket FIFO eviction. Refactors TxMetaCache to dispatch through a new internal cacheBackend interface. Collapses GetMetaCached/GetMetaCachedWithBuffer into a single zero-buffer API returning (*meta.Data, bool). Drops the height-encoding/expiration dead code path (height was appended on every write but never read by any eviction logic).
Benchmarks show ~4.5× faster Set, ~11.5× faster Get, and zero GCs vs. the byte backend. The default BucketType is unchanged; Pointer mode is opt-in.
🚨 Critical Issues
1. Hash collisions in PointerCache silently corrupt data
stores/txmetacache/pointer_cache.go (the pointerBucket map and Get):
The map is keyed solely by xxhash.Sum64(hash[:]) — the original 32-byte hash is never stored or compared on read:
type pointerBucket struct {
m map[uint64]*meta.Data // uint64 key, no full-hash verification
...
}
func (c *PointerCache) Get(hash chainhash.Hash) (*meta.Data, bool) {
k := xxhash.Sum64(hash[:])
b := c.bucketFor(k)
b.mu.RLock()
d, ok := b.m[k]
b.mu.RUnlock()
return d, ok
}Compare to ImprovedCache.bucketNative.Get (improved_cache.go:971), which verifies the full key on hit:
if string(key) == string(chunk[idx:idx+keyLen]) {For a 64-bit hash with N cached entries, expected collisions are ~N² / 2⁶⁵. At ~10M live entries the per-pair probability multiplied by pair count is ~3×10⁻⁶ — across a fleet running for days this will happen. Impact: tx A's lookup returns tx B's metadata (Fee, SizeInBytes, IsCoinbase, TxInpoints), feeding directly into validation. This is a consensus-correctness risk if PointerCache is ever wired into a path that influences acceptance.
Fix options:
- Use
map[chainhash.Hash]*meta.Data—[32]byteis comparable and works as a map key. Slightly slower lookups but eliminates the collision class entirely. Likely still beats the byte path by >10×. - Or store
(chainhash.Hash, *meta.Data)in a struct and compare the hash on read.
Either is a small change. Given the PR positions PointerCache as experimental and opt-in, this is fixable before any real adoption — but it MUST be fixed before anyone flips the switch.
⚠️ Moderate Issues
2. Silent error swallowing in improvedCacheBackend.Get
stores/txmetacache/improved_cache_meta.go (the new Get):
if err := b.cache.Get(&scratch, hash[:]); err != nil {
if errors.Is(err, errors.ErrNotFound) {
return nil, false
}
return nil, false // any other error: silently treated as miss
}
...
if err := meta.NewMetaDataFromBytes(scratch, data); err != nil {
return nil, false // deserialise failure: silently treated as miss
}The old GetMetaCachedWithBuffer returned non-NotFound errors to the caller, who logged them (e.g., processTxMetaUsingCache.go's Warnf("error retrieving txMeta…")). After this PR, real cache corruption or deserialiser failures become invisible misses. The cacheBackend.Get signature is (*meta.Data, bool) with no error channel.
Fix: Either log inside improvedCacheBackend.Get (e.g., debug-level — watch for log floods if corruption is repeated) or change the interface to surface errors. Quietly degrading correctness to a cache miss is exactly the kind of regression that's hard to detect in production.
3. GetMeta does shallow struct copy — aliases slices with the cache in Pointer mode
stores/txmetacache/txmetacache.go (post-PR GetMeta):
if cached, ok := t.cache.Get(*hash); ok {
t.metrics.hits.Add(1)
*data = *cached // shallow copy — slice headers aliased
data.BlockIDs = make([]uint32, 0)
return nil
}*data = *cached copies the struct but slice headers in meta.Data (notably TxInpoints.ParentTxHashes) still point at the cache's backing arrays. In byte mode this is fine (each Get allocates a fresh *meta.Data with fresh slices). In Pointer mode, callers of GetMeta now silently share slice memory with every other reader and with the cache itself. The pointer-mode contract ("treat returned *meta.Data as read-only") is documented on cacheBackend.Get, but GetMeta's callers don't see that interface and have no signal that their data argument is now an alias.
This is a quiet semantic change that's easy to miss when reviewing future callers. Worth either deep-copying slices here, or documenting at the GetMeta doc comment that the result must be treated as read-only.
4. Del-then-Set leaks FIFO ring slots
stores/txmetacache/pointer_cache.go (Del and insert):
Del removes from the map but doesn't touch the ring:
func (c *PointerCache) Del(key []byte) {
k := xxhash.Sum64(key)
b := c.bucketFor(k)
b.mu.Lock()
delete(b.m, k) // ring slot still references k
b.mu.Unlock()
}If the same key is later re-inserted, insert sees exists == false and allocates a fresh ring slot. The old ring slot still names the (now-deleted) key, so when its eviction turn arrives we delete(b.m, evict) — a no-op — but the ring has effectively wasted that slot for a full ring lap.
Worse, the pattern Set → Del → Set → Del → Set can fill a bucket's ring with duplicate references to the same key, silently reducing effective capacity. This matches a real workflow: SetCache → Delete (from SetMinedMulti eviction) → SetCache again on the same hash.
Fix: In Del, mark the ring slot stale (e.g., sentinel 0) so it's skipped on eviction; or accept and document the trade-off.
ℹ️ Minor Issues / Nits
5. Prometheus metric hit_old_tx deleted with no migration note
stores/txmetacache/metrics.go
Genuinely dead code internally — the height-based expiration was never wired to eviction — but the Prometheus gauge teranode_tx_meta_cache_hit_old_tx is a public surface. Any Grafana dashboard, alert, or recording rule referencing it will break on deploy. Worth a release-notes line.
6. getTxMetaCacheReadBuffer / putTxMetaCacheReadBuffer pool become dead code
stores/txmetacache/txmetacache.go
After this PR neither GetMetaCached nor GetMeta calls these helpers. The pool helpers and txMetaCacheReadBufferPool should be removed for clarity (along with txMetaCacheReadBufferMaxRetain if no longer needed by improvedCacheBackend), unless intentionally kept for future callers.
7. Removed warning log "txMetaCache empty for %s"
Previously emitted when a hit returned zero bytes. Corner-case, but it had operational value as a "corruption smoke" signal. Worth re-adding inside improvedCacheBackend.Get once it's logging non-NotFound errors (issue #2).
8. pointerCacheAvgEntryBytes = 300 is acknowledged guesswork
stores/txmetacache/pointer_cache.go
The PR description flags this. A tx with many parents (>5–6) overflows the estimate and the bucket holds fewer entries than the MB budget suggests. Consider deriving per-bucket entry count from observed runtime.MemStats at warmup, or at least logging the effective capacity at construction so operators can audit it.
9. Benchmark comment uses "Bitcoin transactions"
stores/txmetacache/pointer_cache_bench_test.go
"Two parents is a reasonable median for Bitcoin transactions…"
Prefer "BSV transactions" or just "transactions" in this codebase.
10. _ = entries[0].hash keepalive hack
stores/txmetacache/pointer_cache_bench_test.go (benchMemoryEfficiency)
Use runtime.KeepAlive(entries) for clarity.
✅ Strengths
- Compile-time interface checks (
var _ cacheBackend = (*PointerCache)(nil)). - Concurrent reader/writer test (
TestPointerCache_ConcurrentReadersAndWriters) and deterministic FIFO eviction test (TestPointerCache_FIFOEvictionWhenRingFull). - Benchmarks cover Set/Get/concurrent-read-heavy/GC-pause/memory-efficiency with both backends.
- The dead-code removal (height encoding,
hitOldTxinternals) is honestly dead. - Interface adapter (
improvedCacheBackend) preserves existing tests/benchmarks againstImprovedCache's native byte API. - PR description is unusually thorough — caveats, alloc counts, and untouched production paths all called out.
Recommendation
Issue #1 (collision-safety) is non-negotiable before any candidacy for production use: a 64-bit hash with no full-key verification will produce silent wrong-data returns at fleet scale, and the cached fields feed validation. Issues #2, #3, and #4 are quiet correctness/observability regressions that should be addressed in the same pass. The rest is bookkeeping.
The architecture and benchmark results are genuinely compelling — the byte-mode allocation cost is real, and pointer-native storage is the right idea. Worth landing once the collision class is closed and the silent-error paths are loud again.
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-26 13:32 UTC |
PR #915 added txmetacache.BucketType.Pointer (experimental pointer-backed cache) but the resolver in services/subtreevalidation/Server.go only mapped strings unallocated|preallocated|trimmed|native. Setting subtreevalidation_txMetaCacheBucketType=pointer in production would have hit the default branch, logged a Warn, and silently downgraded to Unallocated — pointer mode unreachable via configuration. This wires the missing case ("pointer" -> txmetacache.Pointer), extends the setting longdesc with the value + canary guidance (zero-alloc Get path at the cost of larger steady-state heap), and adds the TestResolveTxMetaCacheBucketType test cases.
Three correctness/observability fixes from the Copilot review on bsv-blockchain#915, plus the doc cleanup it flagged. 1. **PointerCache: full-hash map key (collision safety).** Switched the per- shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount for speed; the map lookup itself uses the full hash, eliminating the 2⁻⁶⁴ collision window that could silently return wrong metadata for a different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns. 2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs, ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer backend was retaining whatever the caller passed in, diverging in semantics and potentially pinning whole transactions in memory. Set now clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns). SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes already produces a metadata-only struct. 3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt cached entry would otherwise be re-parsed on every subsequent Get for the same key. The backend now deletes the bad entry, increments an unmarshalErrors counter, and emits a warn log (logger is plumbed from NewTxMetaCache). Plus the three smaller items: GetMetaCached docstring updated to match the new (*meta.Data, bool) signature; "height encoding" subtest names/comments replaced with their current behaviour ("bypasses cache", "round-trip"); and the package doc no longer claims "height-based expiration" — eviction is the backend's job in both modes. Test added: TestPointerCache_SetStripsNonCachedFields pins the new contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.
Three correctness/observability fixes from the Copilot review on #915, plus the doc cleanup it flagged. 1. **PointerCache: full-hash map key (collision safety).** Switched the per- shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount for speed; the map lookup itself uses the full hash, eliminating the 2⁻⁶⁴ collision window that could silently return wrong metadata for a different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns. 2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs, ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer backend was retaining whatever the caller passed in, diverging in semantics and potentially pinning whole transactions in memory. Set now clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns). SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes already produces a metadata-only struct. 3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt cached entry would otherwise be re-parsed on every subsequent Get for the same key. The backend now deletes the bad entry, increments an unmarshalErrors counter, and emits a warn log (logger is plumbed from NewTxMetaCache). Plus the three smaller items: GetMetaCached docstring updated to match the new (*meta.Data, bool) signature; "height encoding" subtest names/comments replaced with their current behaviour ("bypasses cache", "round-trip"); and the package doc no longer claims "height-based expiration" — eviction is the backend's job in both modes. Test added: TestPointerCache_SetStripsNonCachedFields pins the new contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.
Three correctness/observability fixes from the Copilot review on bsv-blockchain#915, plus the doc cleanup it flagged. 1. **PointerCache: full-hash map key (collision safety).** Switched the per- shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount for speed; the map lookup itself uses the full hash, eliminating the 2⁻⁶⁴ collision window that could silently return wrong metadata for a different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns. 2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs, ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer backend was retaining whatever the caller passed in, diverging in semantics and potentially pinning whole transactions in memory. Set now clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns). SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes already produces a metadata-only struct. 3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt cached entry would otherwise be re-parsed on every subsequent Get for the same key. The backend now deletes the bad entry, increments an unmarshalErrors counter, and emits a warn log (logger is plumbed from NewTxMetaCache). Plus the three smaller items: GetMetaCached docstring updated to match the new (*meta.Data, bool) signature; "height encoding" subtest names/comments replaced with their current behaviour ("bypasses cache", "round-trip"); and the package doc no longer claims "height-based expiration" — eviction is the backend's job in both modes. Test added: TestPointerCache_SetStripsNonCachedFields pins the new contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.
692c955 to
6ebc707
Compare
oskarszoon
left a comment
There was a problem hiding this comment.
Request changes. Findings beyond ordishs's review.
-
setMinedInCacheParallel(txmetacache.go:709-738) silently drops BlockIDs in pointer mode. Path: retrieve txMeta → append blockID → SetCache →metadataOnly()strips BlockIDs. The append is a no-op. Anything gating on block membership returns stale answers. -
PR body says
appendHeightToValuewas dropped — it wasn't. Still at line 770, called in all four SetCacheMulti* methods. Comment added but the description is factually wrong. In pointer mode trailing 4 bytes are tolerated byNewMetaDataFromBytes, butlen+4allocation perSetCacheFromBytespartially undermines the zero-alloc claim for byte ingestion. -
metadataOnly()value-copies TxInpoints but shares the backing slice arrays —GetMeta's*data = *cachedthen propagates aliasing to all callers. Reinforces ordishs's #3: callers must treat result read-only, nothing enforces it. Detectable with-race. -
GC bench at 64 MB / 100K entries — 12× below production (256 MB / ~800K). Pause numbers not representative at scale.
-
SetMultiSequentialWithHashesignores supplied hashes in pointer mode (keys by embedded hash). Divergence from byte-backend keying — latent hazard for callers relying on consistent semantics across modes.
Four issues raised in the second-round review on PR bsv-blockchain#915: 1. **SetMultiFromBytes was a serial loop.** A 1000-tx Kafka batch was 1000 sequential lock-acquire/insert cycles, while ImprovedCache.SetMulti fans out across shards. Restored fan-out: group keys by xxhash bucket, run one goroutine per non-empty bucket. Below smallSetMultiBatchThreshold the serial loop is still used to skip bucketing overhead on tiny batches. 2. **Del left stale entries in the FIFO ring.** Del removed the map entry but b.ring[head..] kept pointing at the deleted hash, and b.full stayed true. A Set→Del→Set cycle could fill the ring with duplicate references to the same key, silently reducing effective capacity. Fixed in insert: when the ring is full, walk past stale slots (key absent from map) before evicting, up to one full lap. Del stays O(1). 3. **metadataOnly aliased TxInpoints slices with the caller.** A struct copy of TxInpoints leaves ParentTxHashes (exported) and voutIdxs sharing backing arrays. Concurrent readers can observe in-place mutations from a caller doing `append(ParentTxHashes, …)`. Deep-clone via Serialize + NewTxInpointsFromBytes — the only public path through subtree's API that produces fresh backing slices given voutIdxs is unexported. Cost: ~160 ns + 5 allocs on the SetCache(hash, *meta.Data) path; SetCacheFromBytes (Kafka) is unaffected because the deserialiser already returns a fresh TxInpoints. 4. **Redundant defensive code in improvedCacheBackend.Get.** Removed the `len(scratch) == 0` early return (NewMetaDataFromBytes errors on len<17 via its own length check) and the verbose log+logger plumbing on unmarshal failure. Kept the evict + counter increment — a corrupt entry would otherwise be reparsed on every subsequent Get for the same key. Added regression test (TestPointerCache_DelDoesNotLeakRingCapacity) that pins bsv-blockchain#2: per-bucket ring size 1, Set → Del → Set on a colliding key MUST not count an eviction. All race tests pass with -tags testtxmetacache. Pointer-mode benchmark numbers shift on the SetCache path (125 ns → 288 ns) due to the TxInpoints deep-clone — correctness over headline speed. Get path is unchanged at ~39 ns / 0 allocs.
ordishs
left a comment
There was a problem hiding this comment.
Overall
Pointer mode is well-designed, well-tested, and explicitly opt-in. Headline number (11× Get speedup, 0 GCs) is meaningful for a hot path. Approving with changes — one issue should be fixed before pointer mode is enabled in production.
Required before flipping pointer mode on
SetMultiSequential contract violation in PointerCache
stores/txmetacache/pointer_cache.go:295-301. The interface contract (cache_backend.go:30-35) says SetMultiSequential is "without goroutine fan-out" — the byte path's whole purpose is to skip errgroup spawning for partition-aligned receivers. The pointer implementation forwards to SetMultiFromBytes, which does fan out via errgroup per non-empty bucket.
Under pointer mode, the v2 Kafka receiver's "sequential" path will multiply goroutines: N partition workers × ~M buckets each. This defeats the design. Fix is to implement SetMultiSequential as a flat loop over keys/values (mirroring the byte path's small-batch branch) and skip the errgroup branch entirely.
Recommended
PR description is stale vs. the actual diff
Description claims appendHeightToValue, noOfBlocksToKeepInTxMetaCache, and hitOldTx are dropped. None are dropped:
appendHeightToValueis still called on every byte-mode insert (txmetacache.go:293, 315, 342, 365)noOfBlocksToKeepInTxMetaCacheis still computed and stored but never read (txmetacache.go:233, 246) — dead codehitOldTxis still in metrics struct and reported via Prometheus (metrics.go:131, 146, 165)
Either update the description to reflect what was reverted, or remove the unused pieces as originally intended. The claim "Saves a 4-byte alloc + per-Set GetBlockHeight() call" is no longer accurate.
GetMeta shallow-copies into caller's struct, aliasing the cache in pointer mode
stores/txmetacache/txmetacache.go:428-432:
if cached, ok := t.cache.Get(*hash); ok {
t.metrics.hits.Add(1)
*data = *cached
data.BlockIDs = make([]uint32, 0)*data = *cached is a struct copy — data.TxInpoints.ParentTxHashes now aliases the cache entry's backing slice. In byte mode this is benign (the returned *meta.Data is freshly allocated). In pointer mode the caller's data shares slices with the cache.
The read-only contract is documented on GetMetaCached, but GetMeta takes a caller-provided *meta.Data — the conventional "mine to mutate" pattern. A caller doing data.TxInpoints.ParentTxHashes = append(...) would corrupt the cache for every concurrent reader. Either document the aliasing rule on GetMeta or deep-clone TxInpoints on the GetMeta path under pointer mode.
metadataOnly fallback path leaks aliased TxInpoints
stores/txmetacache/pointer_cache.go:162-190. When Serialize() or NewTxInpointsFromBytes fails, the function falls through to cloned.TxInpoints = data.TxInpoints — a struct copy sharing ParentTxHashes with the caller. The comment says this is "preferable to dropping the entry entirely," but if the cause is real corruption, holding an entry whose slice is externally mutable is worse than dropping. Consider returning nil and bumping a counter so the rate is observable.
pointerCacheAvgEntryBytes = 320 is workload-dependent
Acknowledged in the PR caveats. 320 is calibrated against 2-parent synthetic txs; production txs with >5-6 parents will overflow and the cache holds fewer entries than budgeted. The MB sizing parameter has a hidden parent-count multiplier. Suggestions:
- Track actual entries-per-MB in
UpdateStatsand expose it - Or, after the first ~1k inserts, recompute an average entry size and resize the per-bucket rings
Low-priority
- Ring walk under lock when entries are stale (pointer_cache.go:114-131): worst case after a
SetMinedMultiburst is a linear walk of the ring under the bucket lock. Worth a metric (ring_walk_distance_p99or similar) so the operator can see if this becomes a tail-latency contributor. - Pointer mode still pays
appendHeightToValuecost even though the deserializer ignores trailing bytes. Skip the append whenbucketType == Pointer. - Missing test for
improvedCacheBackend.Getcorruption path — theunmarshalErrorseviction is fail-safe code that should have a regression test.
Strengths
- Compile-time interface assertion catches contract drift at build time
- Defensive cloning in
metadataOnly—TxInpointsround-trips through serialize/deserialize so the cache never aliases caller-owned backing slices. Exactly the right shape for a shared-pointer cache. - Per-shard FIFO eviction is simple and predictable; lock granularity matches the byte path
TestPointerCache_DelDoesNotLeakRingCapacityis exactly the right shape — pins a subtle invariant with a deterministic same-bucket collision setup- Benchmark coverage is more thorough than most PRs
Introduces an experimental pointer-backed implementation of the txmeta cache, selected via the new BucketType.Pointer enum. PointerCache stores *meta.Data directly in sharded native maps with per-bucket FIFO key-ring eviction; reads return the stored pointer with no copy or deserialisation. Both backends (ImprovedCache and PointerCache) now satisfy a small internal cacheBackend interface (Set/SetFromBytes/SetMultiFromBytes/Get/Del/UpdateStats). TxMetaCache holds one interface field and dispatches uniformly — no per-call-site if/else branching. The reader path in subtreevalidation switches to the simplified GetMetaCached(ctx, hash) (*meta.Data, bool) signature; the old caller-managed scratch-buffer variant is removed. Also drops dead height-encoding code: appendHeightToValue, noOfBlocksToKeep- InTxMetaCache, and hitOldTx were referenced in docs but never read by any eviction path (eviction is the ring buffer's job). Head-to-head benchmarks (Apple M3 Max, 64 MB cache, count=3 x 2s): Operation Existing byte path Pointer native Delta Set 233 ns / 1 alloc 52 ns / 0 allocs 4.5x faster Get 426 ns / 8 allocs 37 ns / 0 allocs 11.5x faster Concurrent (32:1 R:W) 121 ns / 7 allocs 74 ns / 0 allocs 39% faster GC pause p99 ~0.06-0.19 ms 0 ms 0 GCs in loop Pointer cache produces zero GCs during the benchmark window: the byte path's per-Get deserialisation allocates 7 small objects, dominating allocation traffic. The pointer cache's larger steady-state heap is offset by near-zero allocation churn. Default BucketType remains Unallocated; pointer mode is opt-in via the Pointer enum value. No callsite or wire-protocol changes are required.
Three correctness/observability fixes from the Copilot review on bsv-blockchain#915, plus the doc cleanup it flagged. 1. **PointerCache: full-hash map key (collision safety).** Switched the per- shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount for speed; the map lookup itself uses the full hash, eliminating the 2⁻⁶⁴ collision window that could silently return wrong metadata for a different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns. 2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs, ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer backend was retaining whatever the caller passed in, diverging in semantics and potentially pinning whole transactions in memory. Set now clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns). SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes already produces a metadata-only struct. 3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt cached entry would otherwise be re-parsed on every subsequent Get for the same key. The backend now deletes the bad entry, increments an unmarshalErrors counter, and emits a warn log (logger is plumbed from NewTxMetaCache). Plus the three smaller items: GetMetaCached docstring updated to match the new (*meta.Data, bool) signature; "height encoding" subtest names/comments replaced with their current behaviour ("bypasses cache", "round-trip"); and the package doc no longer claims "height-based expiration" — eviction is the backend's job in both modes. Test added: TestPointerCache_SetStripsNonCachedFields pins the new contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.
PR bsv-blockchain#912 (perf(txmeta): v2 wire format + partition-aware receiver) landed on main while this PR was open and added new uses of appendHeightToValue plus two new SetCacheMultiSequential* methods that call ImprovedCache's new SetMultiSequential / SetMultiSequentialWithHashes. My earlier "dead code removal" was wrong by the time it landed — height encoding is part of the active contract, not legacy. Changes: - Restore appendHeightToValue method, hitOldTx metric (declaration + atomic counter + Prometheus gauge), and noOfBlocksToKeepInTxMetaCache field + init. Functionally the height bytes are still ignored on read in the byte backend, but they're now load-bearing for the v2 receiver in bsv-blockchain#912 and the metric is still wired up. - Re-introduce the height suffix on SetCacheFromBytes and SetCacheMulti. - Add SetMultiSequential and SetMultiSequentialWithHashes to the cacheBackend interface; improvedCacheBackend forwards to the underlying *ImprovedCache; PointerCache shares its implementation with SetMultiFromBytes (the per-bucket goroutine fan-out doesn't exist there, so the partition-aware variant is identical for free). - Fix the empty-value SetCacheMulti subtest assertion: stored entry is again 4 bytes (the height suffix), not 0.
CI's golangci-lint flagged stores/txmetacache/txmetacache.go for gci import-grouping violations after the rebase. Ran gci with the project's default settings (stdlib + third-party, no per-prefix split) across the files touched by this PR. No behavioural change.
Four issues raised in the second-round review on PR bsv-blockchain#915: 1. **SetMultiFromBytes was a serial loop.** A 1000-tx Kafka batch was 1000 sequential lock-acquire/insert cycles, while ImprovedCache.SetMulti fans out across shards. Restored fan-out: group keys by xxhash bucket, run one goroutine per non-empty bucket. Below smallSetMultiBatchThreshold the serial loop is still used to skip bucketing overhead on tiny batches. 2. **Del left stale entries in the FIFO ring.** Del removed the map entry but b.ring[head..] kept pointing at the deleted hash, and b.full stayed true. A Set→Del→Set cycle could fill the ring with duplicate references to the same key, silently reducing effective capacity. Fixed in insert: when the ring is full, walk past stale slots (key absent from map) before evicting, up to one full lap. Del stays O(1). 3. **metadataOnly aliased TxInpoints slices with the caller.** A struct copy of TxInpoints leaves ParentTxHashes (exported) and voutIdxs sharing backing arrays. Concurrent readers can observe in-place mutations from a caller doing `append(ParentTxHashes, …)`. Deep-clone via Serialize + NewTxInpointsFromBytes — the only public path through subtree's API that produces fresh backing slices given voutIdxs is unexported. Cost: ~160 ns + 5 allocs on the SetCache(hash, *meta.Data) path; SetCacheFromBytes (Kafka) is unaffected because the deserialiser already returns a fresh TxInpoints. 4. **Redundant defensive code in improvedCacheBackend.Get.** Removed the `len(scratch) == 0` early return (NewMetaDataFromBytes errors on len<17 via its own length check) and the verbose log+logger plumbing on unmarshal failure. Kept the evict + counter increment — a corrupt entry would otherwise be reparsed on every subsequent Get for the same key. Added regression test (TestPointerCache_DelDoesNotLeakRingCapacity) that pins bsv-blockchain#2: per-bucket ring size 1, Set → Del → Set on a colliding key MUST not count an eviction. All race tests pass with -tags testtxmetacache. Pointer-mode benchmark numbers shift on the SetCache path (125 ns → 288 ns) due to the TxInpoints deep-clone — correctness over headline speed. Get path is unchanged at ~39 ns / 0 allocs.
Three corrections from the second round of reviews on PR bsv-blockchain#915: 1. **SetMultiSequential must not fan out** (ordishs, required). The cacheBackend.SetMultiSequential contract says "no goroutine fan-out — caller already has parallelism." PointerCache.SetMultiSequential was forwarding to SetMultiFromBytes which fans out via errgroup. Under the v2 Kafka receiver this multiplies goroutines: N partition workers × M non-empty buckets each. Now a flat serial loop. SetMultiSequentialWith- Hashes mirrors it. Regression test pins the contract at a coarse level. 2. **GetMeta TxInpoints aliasing in pointer mode** (ordishs + oskarszoon). `*data = *cached` is a struct copy but ParentTxHashes still points at the cache's backing slice. In pointer mode the cache hands the same pointer to every concurrent reader, so a caller doing `append(data.TxInpoints.ParentTxHashes, …)` would corrupt every other reader. GetMeta now deep-clones TxInpoints under pointer mode via the new cloneTxInpoints helper. Byte mode is unaffected (its cached.Data is already freshly allocated). On clone failure GetMeta falls through to the store path rather than serving aliased data. 3. **metadataOnly fallback must not return aliased struct** (ordishs). On Serialize/Deserialize round-trip failure, the previous code returned a shallow-copy *meta.Data sharing ParentTxHashes with the caller — reintroducing the very aliasing that the clone exists to prevent. Now returns (nil, err); Set surfaces the error. Practically infallible for valid TxInpoints; on the rare pathological case, refusing to cache is better than caching corruptly. cloneTxInpoints is extracted as a shared helper so the same Serialize + NewTxInpointsFromBytes round-trip is used by metadataOnly (Set path) and GetMeta (Get path with pointer-mode aliasing guard). Regression tests: - TestPointerCache_SetDoesNotAliasCallerSlice — mutates the caller's ParentTxHashes after Set and asserts the cached entry is unchanged. - TestPointerCache_SetMultiSequentialDoesNotFanOut — exercises the sequential path with a 1024-entry batch and verifies every key is present after the synchronous call. All race tests pass under -tags testtxmetacache. Pointer-mode Set is unchanged (~288 ns including the deep-clone). GetMeta under pointer mode gains one extra Serialize+Deserialize per hit (~150 ns) — paid only on the non-hot-path Get (validator hot path uses GetMetaCached).
1b08533 to
453f53f
Compare
oskarszoon
left a comment
There was a problem hiding this comment.
Re-review at d5e8118. Most prior findings addressed, but the BlockIDs blocker is still unfixed.
Still outstanding — must fix:
setMinedInCacheParallel (txmetacache.go:735-765) wasn't touched by any of the fixup commits. Path is still: append blockID → SetCache → cache.Set → metadataOnly() strips BlockIDs. Mined updates via this path don't persist in pointer mode. Two-option fix:
- (A) Replace
SetCachewithcache.Delto force a fresh store fetch on next read. - (B) Skip the SetCache call when BlockIDs are already set, matching byte-mode semantics.
Resolved:
- TxInpoints aliasing —
metadataOnlyandGetMetadeep-clone viacloneTxInpoints(Serialize + NewTxInpointsFromBytes).TestPointerCache_SetDoesNotAliasCallerSlicepins it. Race tests green. SetMultiSequentialWithHashesignored-hash divergence — documented as accepted (both backends produce identical end-state via re-derived bucket from key bytes).appendHeightToValueretention — documented with rationale in code, accepted.- ordishs's
SetMultiSequentialfan-out — fixed, flat serial loop, test added. - ordishs's
Delorphan ring slot — fixed, stale-slot walk ininsert(), test added. - 64-bit hash collision class — keys switched to full
chainhash.Hash.
Still worth noting (not blocking):
- GC bench at 64 MB / 100K entries — 12× below prod load (256 MB / 800K). The headline GC pause numbers in the PR body aren't representative.
setMinedInCacheParallel was reading the tx, appending the new blockID, then calling SetCache. SetCache routes through the cacheBackend.Set adapter which strips BlockIDs in both backends — ImprovedCache because MetaBytes's wire format doesn't carry the field, PointerCache because metadataOnly strips it to match. The append + SetCache dance was therefore a no-op: mined updates via this path didn't persist. Replace the SetCache call with an eviction. The next read on each hash falls through to the underlying store, which has the up-to-date BlockIDs. Matches the established pattern in SetMinedMulti above. The function's `blockID` parameter is now unused — kept for API symmetry with the byte path. The redundant Get + utxoStore.Get fallback is also gone; we don't need to read the entry just to delete it from cache. Regression test added: TestTxMetaCache_Coverage/SetMinedMultiParallel_EvictsCache seeds the cache, calls SetMinedMultiParallel, and asserts the entry is absent via GetMetaCached.
|
oskarszoon
left a comment
There was a problem hiding this comment.
Approve at cb606b158a. BlockIDs blocker correctly fixed.
ae4d4c4386 goes further than my suggested Option A. The whole Get+append+SetCache dance in setMinedInCacheParallel is gone, replaced with a single t.Delete(ctx, hash) per hash. blockID param is now _ uint32 (API symmetry only).
End-to-end in pointer mode: Delete evicts stale entry → next GetMeta misses cache → falls through to utxoStore.GetMeta → returns updated BlockIDs from store → len(data.BlockIDs) != 0 so SetCache is skipped and the entry stays out of cache. Matches SetMinedMulti semantics. Bug resolved.
SetMinedMultiParallel_EvictsCache test pins the eviction contract.
The 5 main merges (742793902, 958f99088, bb7e4f049, d2e5e09aa, 50822d938) are in unrelated areas (orphan pool, opcodeInvert, p2p fee policy, testcontainers deps, dashboards) with zero interaction with txmetacache.
Worth flagging for future maintainers (not a blocker)
SetMinedMultiParallel is now a cache-only operation — it never calls utxoStore.SetMinedMulti. The "store written first, then evict cache" invariant is the caller's responsibility. There are zero production callers today (tests only). When a real caller lands, it must follow the same pattern SetMinedMulti uses: write to store first, then call this to evict. Worth a one-line godoc on the function to lock that in.


Summary
PointerCache) selected via a newBucketType.Pointerenum value. Stores*meta.Datadirectly in sharded native maps with per-bucket FIFO key-ring eviction; reads return the stored pointer with no copy or deserialisation.cacheBackendinterface soTxMetaCachedispatches through a single field. AddsGetMetaCached(ctx, hash) (*meta.Data, bool)as the simplified read API; the old caller-managed scratch-buffer variant is removed.appendHeightToValue,noOfBlocksToKeepInTxMetaCache,hitOldTx) is preserved — earlier drafts attempted to remove these but PR perf(txmeta): v2 wire format + partition-aware receiver + pooled buffers #912 actively usesappendHeightToValuein its newSetCacheMultiSequential*paths. Cleanup deferred to a follow-up PR after perf(txmeta): v2 wire format + partition-aware receiver + pooled buffers #912 settles.Default
BucketTyperemainsUnallocated; pointer mode is opt-in via the new enum value. Kafka wire protocol is unchanged.Head-to-head benchmarks
Apple M3 Max, 64 MB cache, BucketsCount=8192,
count=3 -benchtime=2s. Numbers reflect the final code with full-hash keying, metadata-only stripping with deep-cloned TxInpoints, and per-bucket fan-out onSetMultiFromBytes.Pointer-native
Setis slower than the byte path due to the mandatory TxInpoints deep-clone (correctness fix from reviewer feedback — without it, callers could corrupt cached entries via slice aliasing). The Kafka path (SetCacheFromBytes) is not materially affected becauseNewMetaDataFromBytesalready produces a fresh TxInpoints. The dominant win is onGet, which is the validator hot path.The pointer cache produces zero GCs during the benchmark window: the byte path's per-Get deserialisation allocates 7 small objects, dominating allocation traffic. The pointer cache's larger steady-state heap is offset by near-zero allocation churn.
Reviewer-requested fixes (this PR)
chainhash.Hashmap key (eliminates 2⁻⁶⁴ collision risk)metadataOnlystrips non-cached fields and deep-clonesTxInpoints(no slice aliasing with caller; matches byte backend's effective field set)Del+inserthandle stale ring slots (no silent capacity loss under Set→Del→Set cycles)SetMultiSequentialis a flat serial loop in pointer mode (honours the no-fan-out contract used by the v2 Kafka receiver)GetMetadeep-clonesTxInpointsunder pointer mode (breaks aliasing with the shared cache entry)improvedCacheBackend.Getevicts + counts unmarshal failures (prevents repeated bad-parse CPU drain)Caveats
pointerCacheAvgEntryBytes = 320is calibrated against 2-parent synthetic txs. Production txs with >5–6 parents will overflow that estimate and the cache holds fewer entries than budgeted. Either expose entries/MB viaUpdateStatsor resize per-bucket rings dynamically in a follow-up.setMinedInCacheParallelappendsBlockIDsthen callsSetCache, which strips them — has been a no-op forBlockIDspurposes in the byte path since before this PR (sinceMetaBytesdoesn't carry BlockIDs). Not introduced or worsened here; flagged for a separate cleanup.TxMetaCacheso no UTXO-store fan-out cost is included. A smoketest withBucketType.Pointerselected (requires a single-line flip atservices/subtreevalidation/Server.go:231) is the next step before any production use.Test plan
go build ./...go vet ./stores/txmetacache/... ./services/subtreevalidation/...go test -race -tags testtxmetacache ./stores/txmetacache/... ./services/subtreevalidation/...go test -bench=. -benchmem ./stores/txmetacache/— numbers aboveBucketType.Pointerselected (manual step before flipping in production)