perf(txmetacache): pointer-backed cache backend + interface dispatch by icellan · Pull Request #915 · bsv-blockchain/teranode

icellan · 2026-05-20T15:50:29Z

Summary

Introduces an experimental pointer-backed implementation of the txmeta cache (PointerCache) selected via a new BucketType.Pointer enum value. Stores *meta.Data directly in sharded native maps with per-bucket FIFO key-ring eviction; reads return the stored pointer with no copy or deserialisation.
Collapses the dual cache implementations behind a small internal cacheBackend interface so TxMetaCache dispatches through a single field. Adds GetMetaCached(ctx, hash) (*meta.Data, bool) as the simplified read API; the old caller-managed scratch-buffer variant is removed.
All existing dead-code referenced by PR perf(txmeta): v2 wire format + partition-aware receiver + pooled buffers #912 (appendHeightToValue, noOfBlocksToKeepInTxMetaCache, hitOldTx) is preserved — earlier drafts attempted to remove these but PR perf(txmeta): v2 wire format + partition-aware receiver + pooled buffers #912 actively uses appendHeightToValue in its new SetCacheMultiSequential* paths. Cleanup deferred to a follow-up PR after perf(txmeta): v2 wire format + partition-aware receiver + pooled buffers #912 settles.

Default BucketType remains Unallocated; pointer mode is opt-in via the new enum value. Kafka wire protocol is unchanged.

Head-to-head benchmarks

Apple M3 Max, 64 MB cache, BucketsCount=8192, count=3 -benchtime=2s. Numbers reflect the final code with full-hash keying, metadata-only stripping with deep-cloned TxInpoints, and per-bucket fan-out on SetMultiFromBytes.

Operation	Existing byte path	Pointer (current)	Delta
Set (pointer-native)	233 ns / 1 alloc	288 ns / 6 allocs	-24% (slower)
Set via SetFromBytes	233 ns / 1 alloc	~250 ns / 7 allocs	comparable
Get	426 ns / 8 allocs	39 ns / 0 allocs	10.9× faster
Concurrent (32:1 R:W)	121 ns / 7 allocs	74 ns / 0 allocs	39% faster
GC pause p99 (sat.)	~0.06–0.19 ms	0 ms	0 GCs in loop

Pointer-native Set is slower than the byte path due to the mandatory TxInpoints deep-clone (correctness fix from reviewer feedback — without it, callers could corrupt cached entries via slice aliasing). The Kafka path (SetCacheFromBytes) is not materially affected because NewMetaDataFromBytes already produces a fresh TxInpoints. The dominant win is on Get, which is the validator hot path.

The pointer cache produces zero GCs during the benchmark window: the byte path's per-Get deserialisation allocates 7 small objects, dominating allocation traffic. The pointer cache's larger steady-state heap is offset by near-zero allocation churn.

Reviewer-requested fixes (this PR)

Full 32-byte chainhash.Hash map key (eliminates 2⁻⁶⁴ collision risk)
metadataOnly strips non-cached fields and deep-clones TxInpoints (no slice aliasing with caller; matches byte backend's effective field set)
Del + insert handle stale ring slots (no silent capacity loss under Set→Del→Set cycles)
SetMultiSequential is a flat serial loop in pointer mode (honours the no-fan-out contract used by the v2 Kafka receiver)
GetMeta deep-clones TxInpoints under pointer mode (breaks aliasing with the shared cache entry)
improvedCacheBackend.Get evicts + counts unmarshal failures (prevents repeated bad-parse CPU drain)

Caveats

pointerCacheAvgEntryBytes = 320 is calibrated against 2-parent synthetic txs. Production txs with >5–6 parents will overflow that estimate and the cache holds fewer entries than budgeted. Either expose entries/MB via UpdateStats or resize per-bucket rings dynamically in a follow-up.
setMinedInCacheParallel appends BlockIDs then calls SetCache, which strips them — has been a no-op for BlockIDs purposes in the byte path since before this PR (since MetaBytes doesn't carry BlockIDs). Not introduced or worsened here; flagged for a separate cleanup.
GC benchmark is at 64 MB / 100k entries. The trend (zero alloc on hot Get) is unchanged at scale; absolute pause numbers at 256 MB / ~800k entries may differ.
All benchmarks ran on a nullstore-backed TxMetaCache so no UTXO-store fan-out cost is included. A smoketest with BucketType.Pointer selected (requires a single-line flip at services/subtreevalidation/Server.go:231) is the next step before any production use.

Test plan

go build ./...
go vet ./stores/txmetacache/... ./services/subtreevalidation/...
go test -race -tags testtxmetacache ./stores/txmetacache/... ./services/subtreevalidation/...
go test -bench=. -benchmem ./stores/txmetacache/ — numbers above
End-to-end smoketest with BucketType.Pointer selected (manual step before flipping in production)

github-actions · 2026-05-20T15:51:48Z

🤖 Claude Code Review

Status: Complete

Summary

This PR introduces a well-designed pointer-backed cache implementation as an experimental alternative to the byte-backed cache. The code demonstrates careful attention to correctness, particularly around pointer aliasing prevention and concurrent access safety.

All previously identified issues from earlier reviews have been resolved:

✅ Full 32-byte hash keys (no collision risk)
✅ Proper metadata stripping to match byte backend semantics
✅ Deep cloning of TxInpoints to prevent aliasing
✅ Parallel processing in SetMultiFromBytes for large batches
✅ Stale ring slot handling in insert/Del
✅ Accurate documentation and comments
✅ Unmarshal failure handling with eviction

Key Strengths

Defensive aliasing prevention: The implementation correctly deep-clones TxInpoints in both metadataOnly() and GetMeta() to prevent shared slice corruption.
Backend parity: metadataOnly() ensures PointerCache matches the byte backend field set, avoiding semantic drift between implementations.
Correct eviction logic: The stale ring slot handling properly walks past deleted entries without silently losing capacity.
Smart batching: SetMultiFromBytes uses serial processing for small batches (≤256 entries) and parallel bucketing for larger batches, avoiding overhead when unnecessary.
Clear documentation: Comments accurately describe tradeoffs, contracts, and edge cases throughout.

No Critical Issues Found

The implementation is production-ready pending the end-to-end smoketest mentioned in the PR description. The code follows project conventions from CLAUDE.md and AGENTS.md with minimal changes, good test coverage, and appropriate caution flags (opt-in via BucketType.Pointer).

Recommendation: Approve after the planned smoketest validates behavior under production workload patterns.

Copilot

Pull request overview

This PR refactors TxMetaCache to dispatch through a small internal backend interface and introduces an opt-in experimental pointer-backed cache backend (BucketType.Pointer) intended to reduce allocation churn on cache hits by storing *meta.Data directly. It also removes unused height-appending logic and drops the hitOldTx metric that depended on it.

Changes:

Add cacheBackend interface and adapt the existing ImprovedCache via improvedCacheBackend; update TxMetaCache to use backend dispatch.
Introduce PointerCache (sharded map + per-bucket FIFO ring eviction) plus tests/benchmarks, selectable via BucketType.Pointer.
Update consumers/tests to use the new GetMetaCached(ctx, hash) (*meta.Data, bool) API; remove height-encoding + related metric.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
stores/txmetacache/txmetacache.go	Switch `TxMetaCache` to backend interface dispatch, add `Pointer` bucket type, simplify Get/Set paths, remove height appending.
stores/txmetacache/cache_backend.go	Define the internal backend interface used by `TxMetaCache`.
stores/txmetacache/improved_cache_meta.go	Adapter implementing `cacheBackend` for the existing byte-backed `ImprovedCache`.
stores/txmetacache/pointer_cache.go	New pointer-backed cache implementation (sharded map + FIFO key ring eviction).
stores/txmetacache/pointer_cache_test.go	Unit tests for the new pointer-backed cache behavior.
stores/txmetacache/pointer_cache_bench_test.go	Benchmarks comparing byte-backed vs pointer-backed paths and GC behavior.
stores/txmetacache/metrics.go	Remove `hit_old_tx` Prometheus metric wiring (no longer applicable).
stores/txmetacache/txmetacache_test.go	Update tests/benchmarks for new `GetMetaCached` API and backend adapter access.
stores/txmetacache/txmetacache_conflicting_test.go	Update conflicting-transaction tests to use new `GetMetaCached` signature.
services/subtreevalidation/processTxMetaUsingCache.go	Switch cache reads to the new `GetMetaCached` API.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ordishs

Overview

Introduces an opt-in PointerCache backend storing *meta.Data directly in sharded map[uint64]*meta.Data shards with per-bucket FIFO eviction. Refactors TxMetaCache to dispatch through a new internal cacheBackend interface. Collapses GetMetaCached/GetMetaCachedWithBuffer into a single zero-buffer API returning (*meta.Data, bool). Drops the height-encoding/expiration dead code path (height was appended on every write but never read by any eviction logic).

Benchmarks show ~4.5× faster Set, ~11.5× faster Get, and zero GCs vs. the byte backend. The default BucketType is unchanged; Pointer mode is opt-in.

🚨 Critical Issues

1. Hash collisions in `PointerCache` silently corrupt data

stores/txmetacache/pointer_cache.go (the pointerBucket map and Get):

The map is keyed solely by xxhash.Sum64(hash[:]) — the original 32-byte hash is never stored or compared on read:

type pointerBucket struct {
    m map[uint64]*meta.Data   // uint64 key, no full-hash verification
    ...
}

func (c *PointerCache) Get(hash chainhash.Hash) (*meta.Data, bool) {
    k := xxhash.Sum64(hash[:])
    b := c.bucketFor(k)
    b.mu.RLock()
    d, ok := b.m[k]
    b.mu.RUnlock()
    return d, ok
}

Compare to ImprovedCache.bucketNative.Get (improved_cache.go:971), which verifies the full key on hit:

if string(key) == string(chunk[idx:idx+keyLen]) {

For a 64-bit hash with N cached entries, expected collisions are ~N² / 2⁶⁵. At ~10M live entries the per-pair probability multiplied by pair count is ~3×10⁻⁶ — across a fleet running for days this will happen. Impact: tx A's lookup returns tx B's metadata (Fee, SizeInBytes, IsCoinbase, TxInpoints), feeding directly into validation. This is a consensus-correctness risk if PointerCache is ever wired into a path that influences acceptance.

Fix options:

Use map[chainhash.Hash]*meta.Data — [32]byte is comparable and works as a map key. Slightly slower lookups but eliminates the collision class entirely. Likely still beats the byte path by >10×.
Or store (chainhash.Hash, *meta.Data) in a struct and compare the hash on read.

Either is a small change. Given the PR positions PointerCache as experimental and opt-in, this is fixable before any real adoption — but it MUST be fixed before anyone flips the switch.

⚠️ Moderate Issues

2. Silent error swallowing in `improvedCacheBackend.Get`

stores/txmetacache/improved_cache_meta.go (the new Get):

if err := b.cache.Get(&scratch, hash[:]); err != nil {
    if errors.Is(err, errors.ErrNotFound) {
        return nil, false
    }
    return nil, false   // any other error: silently treated as miss
}
...
if err := meta.NewMetaDataFromBytes(scratch, data); err != nil {
    return nil, false   // deserialise failure: silently treated as miss
}

The old GetMetaCachedWithBuffer returned non-NotFound errors to the caller, who logged them (e.g., processTxMetaUsingCache.go's Warnf("error retrieving txMeta…")). After this PR, real cache corruption or deserialiser failures become invisible misses. The cacheBackend.Get signature is (*meta.Data, bool) with no error channel.

Fix: Either log inside improvedCacheBackend.Get (e.g., debug-level — watch for log floods if corruption is repeated) or change the interface to surface errors. Quietly degrading correctness to a cache miss is exactly the kind of regression that's hard to detect in production.

3. `GetMeta` does shallow struct copy — aliases slices with the cache in Pointer mode

stores/txmetacache/txmetacache.go (post-PR GetMeta):

if cached, ok := t.cache.Get(*hash); ok {
    t.metrics.hits.Add(1)
    *data = *cached                       // shallow copy — slice headers aliased
    data.BlockIDs = make([]uint32, 0)
    return nil
}

*data = *cached copies the struct but slice headers in meta.Data (notably TxInpoints.ParentTxHashes) still point at the cache's backing arrays. In byte mode this is fine (each Get allocates a fresh *meta.Data with fresh slices). In Pointer mode, callers of GetMeta now silently share slice memory with every other reader and with the cache itself. The pointer-mode contract ("treat returned *meta.Data as read-only") is documented on cacheBackend.Get, but GetMeta's callers don't see that interface and have no signal that their data argument is now an alias.

This is a quiet semantic change that's easy to miss when reviewing future callers. Worth either deep-copying slices here, or documenting at the GetMeta doc comment that the result must be treated as read-only.

4. Del-then-Set leaks FIFO ring slots

stores/txmetacache/pointer_cache.go (Del and insert):

Del removes from the map but doesn't touch the ring:

func (c *PointerCache) Del(key []byte) {
    k := xxhash.Sum64(key)
    b := c.bucketFor(k)
    b.mu.Lock()
    delete(b.m, k)        // ring slot still references k
    b.mu.Unlock()
}

If the same key is later re-inserted, insert sees exists == false and allocates a fresh ring slot. The old ring slot still names the (now-deleted) key, so when its eviction turn arrives we delete(b.m, evict) — a no-op — but the ring has effectively wasted that slot for a full ring lap.

Worse, the pattern Set → Del → Set → Del → Set can fill a bucket's ring with duplicate references to the same key, silently reducing effective capacity. This matches a real workflow: SetCache → Delete (from SetMinedMulti eviction) → SetCache again on the same hash.

Fix: In Del, mark the ring slot stale (e.g., sentinel 0) so it's skipped on eviction; or accept and document the trade-off.

ℹ️ Minor Issues / Nits

5. Prometheus metric `hit_old_tx` deleted with no migration note

stores/txmetacache/metrics.go

Genuinely dead code internally — the height-based expiration was never wired to eviction — but the Prometheus gauge teranode_tx_meta_cache_hit_old_tx is a public surface. Any Grafana dashboard, alert, or recording rule referencing it will break on deploy. Worth a release-notes line.

6. `getTxMetaCacheReadBuffer` / `putTxMetaCacheReadBuffer` pool become dead code

stores/txmetacache/txmetacache.go

After this PR neither GetMetaCached nor GetMeta calls these helpers. The pool helpers and txMetaCacheReadBufferPool should be removed for clarity (along with txMetaCacheReadBufferMaxRetain if no longer needed by improvedCacheBackend), unless intentionally kept for future callers.

7. Removed warning log `"txMetaCache empty for %s"`

Previously emitted when a hit returned zero bytes. Corner-case, but it had operational value as a "corruption smoke" signal. Worth re-adding inside improvedCacheBackend.Get once it's logging non-NotFound errors (issue #2).

8. `pointerCacheAvgEntryBytes = 300` is acknowledged guesswork

stores/txmetacache/pointer_cache.go

The PR description flags this. A tx with many parents (>5–6) overflows the estimate and the bucket holds fewer entries than the MB budget suggests. Consider deriving per-bucket entry count from observed runtime.MemStats at warmup, or at least logging the effective capacity at construction so operators can audit it.

9. Benchmark comment uses "Bitcoin transactions"

stores/txmetacache/pointer_cache_bench_test.go

"Two parents is a reasonable median for Bitcoin transactions…"

Prefer "BSV transactions" or just "transactions" in this codebase.

10. `_ = entries[0].hash` keepalive hack

stores/txmetacache/pointer_cache_bench_test.go (benchMemoryEfficiency)

Use runtime.KeepAlive(entries) for clarity.

✅ Strengths

Compile-time interface checks (var _ cacheBackend = (*PointerCache)(nil)).
Concurrent reader/writer test (TestPointerCache_ConcurrentReadersAndWriters) and deterministic FIFO eviction test (TestPointerCache_FIFOEvictionWhenRingFull).
Benchmarks cover Set/Get/concurrent-read-heavy/GC-pause/memory-efficiency with both backends.
The dead-code removal (height encoding, hitOldTx internals) is honestly dead.
Interface adapter (improvedCacheBackend) preserves existing tests/benchmarks against ImprovedCache's native byte API.
PR description is unusually thorough — caveats, alloc counts, and untouched production paths all called out.

Recommendation

Issue #1 (collision-safety) is non-negotiable before any candidacy for production use: a 64-bit hash with no full-key verification will produce silent wrong-data returns at fleet scale, and the cached fields feed validation. Issues #2, #3, and #4 are quiet correctness/observability regressions that should be addressed in the same pass. The rest is bookkeeping.

The architecture and benchmark results are genuinely compelling — the byte-mode allocation cost is real, and pointer-native storage is the right idea. Worth landing once the collision class is closed and the silent-error paths are loud again.

github-actions · 2026-05-20T16:06:18Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-915 (5a1fe8b)

Summary

Regressions: 0
Improvements: 0
Unchanged: 144
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.632µ	1.628µ	~	0.500
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	71.36n	71.73n	~	0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	71.85n	71.31n	~	0.400
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	71.23n	71.28n	~	0.600
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	40.89n	39.16n	~	0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	60.13n	60.04n	~	0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	209.5n	199.1n	~	0.800
MiningCandidate_Stringify_Short-4	222.6n	221.8n	~	0.200
MiningCandidate_Stringify_Long-4	1.673µ	1.658µ	~	0.200
MiningSolution_Stringify-4	864.8n	860.5n	~	0.100
BlockInfo_MarshalJSON-4	1.810µ	1.808µ	~	1.000
NewFromBytes-4	130.6n	143.0n	~	0.100
AddTxBatchColumnar_Validation-4	2.711µ	2.489µ	~	0.700
OffsetValidationLoop-4	641.7n	641.9n	~	1.000
Mine_EasyDifficulty-4	68.41µ	67.78µ	~	0.100
Mine_WithAddress-4	7.159µ	7.850µ	~	0.100
BlockAssembler_AddTx-4	0.02938n	0.02736n	~	0.700
AddNode-4	11.27	11.29	~	1.000
AddNodeWithMap-4	12.34	11.84	~	0.100
DirectSubtreeAdd/4_per_subtree-4	58.53n	60.45n	~	0.700
DirectSubtreeAdd/64_per_subtree-4	30.48n	29.26n	~	0.700
DirectSubtreeAdd/256_per_subtree-4	28.96n	28.28n	~	0.100
DirectSubtreeAdd/1024_per_subtree-4	27.85n	27.23n	~	0.100
DirectSubtreeAdd/2048_per_subtree-4	27.51n	26.90n	~	0.100
SubtreeProcessorAdd/4_per_subtree-4	289.5n	289.5n	~	0.800
SubtreeProcessorAdd/64_per_subtree-4	285.9n	292.3n	~	0.700
SubtreeProcessorAdd/256_per_subtree-4	285.3n	284.7n	~	1.000
SubtreeProcessorAdd/1024_per_subtree-4	281.7n	279.8n	~	0.700
SubtreeProcessorAdd/2048_per_subtree-4	279.2n	281.4n	~	0.400
SubtreeProcessorRotate/4_per_subtree-4	281.7n	280.5n	~	0.700
SubtreeProcessorRotate/64_per_subtree-4	279.6n	277.6n	~	0.700
SubtreeProcessorRotate/256_per_subtree-4	279.1n	278.5n	~	0.400
SubtreeProcessorRotate/1024_per_subtree-4	279.1n	279.6n	~	0.800
SubtreeNodeAddOnly/4_per_subtree-4	54.62n	56.24n	~	0.400
SubtreeNodeAddOnly/64_per_subtree-4	34.26n	34.26n	~	1.000
SubtreeNodeAddOnly/256_per_subtree-4	33.32n	33.21n	~	0.100
SubtreeNodeAddOnly/1024_per_subtree-4	32.70n	32.56n	~	0.100
SubtreeCreationOnly/4_per_subtree-4	115.3n	114.2n	~	0.300
SubtreeCreationOnly/64_per_subtree-4	404.5n	402.9n	~	0.100
SubtreeCreationOnly/256_per_subtree-4	1.374µ	1.349µ	~	0.100
SubtreeCreationOnly/1024_per_subtree-4	4.507µ	4.410µ	~	0.200
SubtreeCreationOnly/2048_per_subtree-4	8.375µ	8.243µ	~	0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	277.2n	279.8n	~	0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	276.7n	277.6n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	2.192m	2.209m	~	0.100
ParallelGetAndSetIfNotExists/10k_nodes-4	5.275m	5.328m	~	0.100
ParallelGetAndSetIfNotExists/50k_nodes-4	7.161m	7.398m	~	0.700
ParallelGetAndSetIfNotExists/100k_nodes-4	9.772m	9.757m	~	0.700
SequentialGetAndSetIfNotExists/1k_nodes-4	1.970m	1.958m	~	0.400
SequentialGetAndSetIfNotExists/10k_nodes-4	4.395m	4.380m	~	0.700
SequentialGetAndSetIfNotExists/50k_nodes-4	13.18m	12.08m	~	0.100
SequentialGetAndSetIfNotExists/100k_nodes-4	23.41m	21.85m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	2.240m	2.244m	~	1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	8.080m	8.122m	~	0.200
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	12.90m	12.91m	~	0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	1.993m	1.983m	~	0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	7.441m	7.408m	~	1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	41.17m	40.22m	~	0.400
DiskTxMap_SetIfNotExists-4	3.860µ	3.743µ	~	0.200
DiskTxMap_SetIfNotExists_Parallel-4	3.462µ	3.538µ	~	0.200
DiskTxMap_ExistenceOnly-4	341.2n	337.1n	~	0.400
Queue-4	184.9n	187.0n	~	0.100
AtomicPointer-4	3.635n	3.634n	~	1.000
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	818.9µ	816.6µ	~	1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4	753.6µ	785.4µ	~	0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4	106.5µ	104.7µ	~	0.100
ReorgOptimizations/AllMarkFalse/New/10K-4	64.25µ	64.44µ	~	0.700
ReorgOptimizations/HashSlicePool/Old/10K-4	50.89µ	51.79µ	~	0.700
ReorgOptimizations/HashSlicePool/New/10K-4	11.05µ	11.17µ	~	0.700
ReorgOptimizations/NodeFlags/Old/10K-4	4.407µ	5.064µ	~	0.100
ReorgOptimizations/NodeFlags/New/10K-4	1.487µ	1.943µ	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	9.355m	9.171m	~	0.200
ReorgOptimizations/DedupFilterPipeline/New/100K-4	9.521m	9.557m	~	1.000
ReorgOptimizations/AllMarkFalse/Old/100K-4	1.071m	1.078m	~	0.200
ReorgOptimizations/AllMarkFalse/New/100K-4	708.6µ	705.5µ	~	0.200
ReorgOptimizations/HashSlicePool/Old/100K-4	520.9µ	614.4µ	~	0.100
ReorgOptimizations/HashSlicePool/New/100K-4	211.7µ	217.5µ	~	0.400
ReorgOptimizations/NodeFlags/Old/100K-4	44.47µ	42.62µ	~	0.700
ReorgOptimizations/NodeFlags/New/100K-4	14.66µ	16.37µ	~	0.200
TxMapSetIfNotExists-4	49.59n	49.34n	~	0.400
TxMapSetIfNotExistsDuplicate-4	41.54n	41.53n	~	1.000
ChannelSendReceive-4	578.0n	570.1n	~	0.100
CalcBlockWork-4	526.8n	528.9n	~	0.100
CalculateWork-4	713.0n	752.6n	~	0.700
BuildBlockLocatorString_Helpers/Size_10-4	1.337µ	1.355µ	~	0.100
BuildBlockLocatorString_Helpers/Size_100-4	12.93µ	13.10µ	~	0.400
BuildBlockLocatorString_Helpers/Size_1000-4	156.4µ	151.8µ	~	1.000
CatchupWithHeaderCache-4	104.5m	104.4m	~	0.400
_BufferPoolAllocation/16KB-4	3.925µ	4.055µ	~	0.100
_BufferPoolAllocation/32KB-4	9.244µ	9.011µ	~	1.000
_BufferPoolAllocation/64KB-4	22.49µ	17.96µ	~	0.700
_BufferPoolAllocation/128KB-4	35.99µ	32.38µ	~	1.000
_BufferPoolAllocation/512KB-4	113.8µ	112.2µ	~	0.700
_BufferPoolConcurrent/32KB-4	18.28µ	19.62µ	~	0.100
_BufferPoolConcurrent/64KB-4	30.10µ	30.24µ	~	1.000
_BufferPoolConcurrent/512KB-4	145.2µ	146.5µ	~	0.700
_SubtreeDeserializationWithBufferSizes/16KB-4	639.9µ	649.1µ	~	0.400
_SubtreeDeserializationWithBufferSizes/32KB-4	633.5µ	636.3µ	~	0.100
_SubtreeDeserializationWithBufferSizes/64KB-4	642.4µ	640.5µ	~	1.000
_SubtreeDeserializationWithBufferSizes/128KB-4	640.3µ	644.7µ	~	1.000
_SubtreeDeserializationWithBufferSizes/512KB-4	635.7µ	633.4µ	~	0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4	35.97m	36.49m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4	35.70m	36.36m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4	35.80m	36.30m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4	35.86m	36.01m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/512KB-4	35.50m	35.67m	~	1.000
_PooledVsNonPooled/Pooled-4	736.4n	738.7n	~	0.100
_PooledVsNonPooled/NonPooled-4	7.576µ	7.835µ	~	0.100
_MemoryFootprint/Current_512KB_32concurrent-4	6.463µ	6.458µ	~	0.700
_MemoryFootprint/Proposed_32KB_32concurrent-4	9.576µ	9.513µ	~	0.400
_MemoryFootprint/Alternative_64KB_32concurrent-4	9.064µ	9.321µ	~	0.600
_prepareTxsPerLevel-4	428.4m	417.7m	~	0.100
_prepareTxsPerLevelOrdered-4	4.046m	3.776m	~	0.100
_prepareTxsPerLevel_Comparison/Original-4	427.4m	415.1m	~	0.100
_prepareTxsPerLevel_Comparison/Optimized-4	4.858m	3.901m	~	0.100
SubtreeSizes/10k_tx_4_per_subtree-4	1.257m	1.253m	~	1.000
SubtreeSizes/10k_tx_16_per_subtree-4	296.8µ	298.0µ	~	1.000
SubtreeSizes/10k_tx_64_per_subtree-4	72.94µ	70.60µ	~	0.100
SubtreeSizes/10k_tx_256_per_subtree-4	18.20µ	17.83µ	~	0.100
SubtreeSizes/10k_tx_512_per_subtree-4	8.912µ	8.792µ	~	0.100
SubtreeSizes/10k_tx_1024_per_subtree-4	4.449µ	4.368µ	~	0.200
SubtreeSizes/10k_tx_2k_per_subtree-4	2.196µ	2.170µ	~	0.400
BlockSizeScaling/10k_tx_64_per_subtree-4	70.20µ	70.55µ	~	0.400
BlockSizeScaling/10k_tx_256_per_subtree-4	17.49µ	17.54µ	~	1.000
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.367µ	4.404µ	~	0.700
BlockSizeScaling/50k_tx_64_per_subtree-4	367.4µ	368.9µ	~	0.700
BlockSizeScaling/50k_tx_256_per_subtree-4	89.97µ	88.62µ	~	0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4	21.73µ	21.85µ	~	0.700
SubtreeAllocations/small_subtrees_exists_check-4	150.4µ	151.3µ	~	0.700
SubtreeAllocations/small_subtrees_data_fetch-4	160.5µ	160.9µ	~	0.700
SubtreeAllocations/small_subtrees_full_validation-4	308.6µ	311.5µ	~	0.200
SubtreeAllocations/medium_subtrees_exists_check-4	9.035µ	8.940µ	~	1.000
SubtreeAllocations/medium_subtrees_data_fetch-4	9.385µ	9.378µ	~	0.700
SubtreeAllocations/medium_subtrees_full_validation-4	17.55µ	17.46µ	~	1.000
SubtreeAllocations/large_subtrees_exists_check-4	2.094µ	2.129µ	~	0.200
SubtreeAllocations/large_subtrees_data_fetch-4	2.281µ	2.251µ	~	0.700
SubtreeAllocations/large_subtrees_full_validation-4	4.421µ	4.380µ	~	0.400
StoreBlock_Sequential/BelowCSVHeight-4	337.6µ	339.1µ	~	1.000
StoreBlock_Sequential/AboveCSVHeight-4	336.3µ	338.1µ	~	0.400
GetUtxoHashes-4	273.6n	273.2n	~	1.000
GetUtxoHashes_ManyOutputs-4	45.45µ	45.53µ	~	1.000
_NewMetaDataFromBytes-4	214.8n	214.1n	~	0.700
_Bytes-4	397.7n	399.8n	~	0.400
_MetaBytes-4	140.8n	138.6n	~	0.100

Threshold: >10% with p < 0.05 | Generated: 2026-05-26 13:32 UTC

PR #915 added txmetacache.BucketType.Pointer (experimental pointer-backed cache) but the resolver in services/subtreevalidation/Server.go only mapped strings unallocated|preallocated|trimmed|native. Setting subtreevalidation_txMetaCacheBucketType=pointer in production would have hit the default branch, logged a Warn, and silently downgraded to Unallocated — pointer mode unreachable via configuration. This wires the missing case ("pointer" -> txmetacache.Pointer), extends the setting longdesc with the value + canary guidance (zero-alloc Get path at the cost of larger steady-state heap), and adds the TestResolveTxMetaCacheBucketType test cases.

Three correctness/observability fixes from the Copilot review on bsv-blockchain#915, plus the doc cleanup it flagged. 1. **PointerCache: full-hash map key (collision safety).** Switched the per- shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount for speed; the map lookup itself uses the full hash, eliminating the 2⁻⁶⁴ collision window that could silently return wrong metadata for a different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns. 2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs, ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer backend was retaining whatever the caller passed in, diverging in semantics and potentially pinning whole transactions in memory. Set now clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns). SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes already produces a metadata-only struct. 3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt cached entry would otherwise be re-parsed on every subsequent Get for the same key. The backend now deletes the bad entry, increments an unmarshalErrors counter, and emits a warn log (logger is plumbed from NewTxMetaCache). Plus the three smaller items: GetMetaCached docstring updated to match the new (*meta.Data, bool) signature; "height encoding" subtest names/comments replaced with their current behaviour ("bypasses cache", "round-trip"); and the package doc no longer claims "height-based expiration" — eviction is the backend's job in both modes. Test added: TestPointerCache_SetStripsNonCachedFields pins the new contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.

Three correctness/observability fixes from the Copilot review on #915, plus the doc cleanup it flagged. 1. **PointerCache: full-hash map key (collision safety).** Switched the per- shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount for speed; the map lookup itself uses the full hash, eliminating the 2⁻⁶⁴ collision window that could silently return wrong metadata for a different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns. 2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs, ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer backend was retaining whatever the caller passed in, diverging in semantics and potentially pinning whole transactions in memory. Set now clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns). SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes already produces a metadata-only struct. 3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt cached entry would otherwise be re-parsed on every subsequent Get for the same key. The backend now deletes the bad entry, increments an unmarshalErrors counter, and emits a warn log (logger is plumbed from NewTxMetaCache). Plus the three smaller items: GetMetaCached docstring updated to match the new (*meta.Data, bool) signature; "height encoding" subtest names/comments replaced with their current behaviour ("bypasses cache", "round-trip"); and the package doc no longer claims "height-based expiration" — eviction is the backend's job in both modes. Test added: TestPointerCache_SetStripsNonCachedFields pins the new contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.

Three correctness/observability fixes from the Copilot review on bsv-blockchain#915, plus the doc cleanup it flagged. 1. **PointerCache: full-hash map key (collision safety).** Switched the per- shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount for speed; the map lookup itself uses the full hash, eliminating the 2⁻⁶⁴ collision window that could silently return wrong metadata for a different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns. 2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs, ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer backend was retaining whatever the caller passed in, diverging in semantics and potentially pinning whole transactions in memory. Set now clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns). SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes already produces a metadata-only struct. 3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt cached entry would otherwise be re-parsed on every subsequent Get for the same key. The backend now deletes the bad entry, increments an unmarshalErrors counter, and emits a warn log (logger is plumbed from NewTxMetaCache). Plus the three smaller items: GetMetaCached docstring updated to match the new (*meta.Data, bool) signature; "height encoding" subtest names/comments replaced with their current behaviour ("bypasses cache", "round-trip"); and the package doc no longer claims "height-based expiration" — eviction is the backend's job in both modes. Test added: TestPointerCache_SetStripsNonCachedFields pins the new contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.

oskarszoon

Request changes. Findings beyond ordishs's review.

setMinedInCacheParallel (txmetacache.go:709-738) silently drops BlockIDs in pointer mode. Path: retrieve txMeta → append blockID → SetCache → metadataOnly() strips BlockIDs. The append is a no-op. Anything gating on block membership returns stale answers.
PR body says appendHeightToValue was dropped — it wasn't. Still at line 770, called in all four SetCacheMulti* methods. Comment added but the description is factually wrong. In pointer mode trailing 4 bytes are tolerated by NewMetaDataFromBytes, but len+4 allocation per SetCacheFromBytes partially undermines the zero-alloc claim for byte ingestion.
metadataOnly() value-copies TxInpoints but shares the backing slice arrays — GetMeta's *data = *cached then propagates aliasing to all callers. Reinforces ordishs's #3: callers must treat result read-only, nothing enforces it. Detectable with -race.
GC bench at 64 MB / 100K entries — 12× below production (256 MB / ~800K). Pause numbers not representative at scale.
SetMultiSequentialWithHashes ignores supplied hashes in pointer mode (keys by embedded hash). Divergence from byte-backend keying — latent hazard for callers relying on consistent semantics across modes.

Four issues raised in the second-round review on PR bsv-blockchain#915: 1. **SetMultiFromBytes was a serial loop.** A 1000-tx Kafka batch was 1000 sequential lock-acquire/insert cycles, while ImprovedCache.SetMulti fans out across shards. Restored fan-out: group keys by xxhash bucket, run one goroutine per non-empty bucket. Below smallSetMultiBatchThreshold the serial loop is still used to skip bucketing overhead on tiny batches. 2. **Del left stale entries in the FIFO ring.** Del removed the map entry but b.ring[head..] kept pointing at the deleted hash, and b.full stayed true. A Set→Del→Set cycle could fill the ring with duplicate references to the same key, silently reducing effective capacity. Fixed in insert: when the ring is full, walk past stale slots (key absent from map) before evicting, up to one full lap. Del stays O(1). 3. **metadataOnly aliased TxInpoints slices with the caller.** A struct copy of TxInpoints leaves ParentTxHashes (exported) and voutIdxs sharing backing arrays. Concurrent readers can observe in-place mutations from a caller doing `append(ParentTxHashes, …)`. Deep-clone via Serialize + NewTxInpointsFromBytes — the only public path through subtree's API that produces fresh backing slices given voutIdxs is unexported. Cost: ~160 ns + 5 allocs on the SetCache(hash, *meta.Data) path; SetCacheFromBytes (Kafka) is unaffected because the deserialiser already returns a fresh TxInpoints. 4. **Redundant defensive code in improvedCacheBackend.Get.** Removed the `len(scratch) == 0` early return (NewMetaDataFromBytes errors on len<17 via its own length check) and the verbose log+logger plumbing on unmarshal failure. Kept the evict + counter increment — a corrupt entry would otherwise be reparsed on every subsequent Get for the same key. Added regression test (TestPointerCache_DelDoesNotLeakRingCapacity) that pins bsv-blockchain#2: per-bucket ring size 1, Set → Del → Set on a colliding key MUST not count an eviction. All race tests pass with -tags testtxmetacache. Pointer-mode benchmark numbers shift on the SetCache path (125 ns → 288 ns) due to the TxInpoints deep-clone — correctness over headline speed. Get path is unchanged at ~39 ns / 0 allocs.

ordishs

Overall

Pointer mode is well-designed, well-tested, and explicitly opt-in. Headline number (11× Get speedup, 0 GCs) is meaningful for a hot path. Approving with changes — one issue should be fixed before pointer mode is enabled in production.

Required before flipping pointer mode on

SetMultiSequential contract violation in PointerCache

stores/txmetacache/pointer_cache.go:295-301. The interface contract (cache_backend.go:30-35) says SetMultiSequential is "without goroutine fan-out" — the byte path's whole purpose is to skip errgroup spawning for partition-aligned receivers. The pointer implementation forwards to SetMultiFromBytes, which does fan out via errgroup per non-empty bucket.

Under pointer mode, the v2 Kafka receiver's "sequential" path will multiply goroutines: N partition workers × ~M buckets each. This defeats the design. Fix is to implement SetMultiSequential as a flat loop over keys/values (mirroring the byte path's small-batch branch) and skip the errgroup branch entirely.

appendHeightToValue is still called on every byte-mode insert (txmetacache.go:293, 315, 342, 365)
noOfBlocksToKeepInTxMetaCache is still computed and stored but never read (txmetacache.go:233, 246) — dead code
hitOldTx is still in metrics struct and reported via Prometheus (metrics.go:131, 146, 165)

Either update the description to reflect what was reverted, or remove the unused pieces as originally intended. The claim "Saves a 4-byte alloc + per-Set GetBlockHeight() call" is no longer accurate.

GetMeta shallow-copies into caller's struct, aliasing the cache in pointer mode

stores/txmetacache/txmetacache.go:428-432:

if cached, ok := t.cache.Get(*hash); ok {
    t.metrics.hits.Add(1)
    *data = *cached
    data.BlockIDs = make([]uint32, 0)

*data = *cached is a struct copy — data.TxInpoints.ParentTxHashes now aliases the cache entry's backing slice. In byte mode this is benign (the returned *meta.Data is freshly allocated). In pointer mode the caller's data shares slices with the cache.

The read-only contract is documented on GetMetaCached, but GetMeta takes a caller-provided *meta.Data — the conventional "mine to mutate" pattern. A caller doing data.TxInpoints.ParentTxHashes = append(...) would corrupt the cache for every concurrent reader. Either document the aliasing rule on GetMeta or deep-clone TxInpoints on the GetMeta path under pointer mode.

metadataOnly fallback path leaks aliased TxInpoints

stores/txmetacache/pointer_cache.go:162-190. When Serialize() or NewTxInpointsFromBytes fails, the function falls through to cloned.TxInpoints = data.TxInpoints — a struct copy sharing ParentTxHashes with the caller. The comment says this is "preferable to dropping the entry entirely," but if the cause is real corruption, holding an entry whose slice is externally mutable is worse than dropping. Consider returning nil and bumping a counter so the rate is observable.

pointerCacheAvgEntryBytes = 320 is workload-dependent

Acknowledged in the PR caveats. 320 is calibrated against 2-parent synthetic txs; production txs with >5-6 parents will overflow and the cache holds fewer entries than budgeted. The MB sizing parameter has a hidden parent-count multiplier. Suggestions:

Track actual entries-per-MB in UpdateStats and expose it
Or, after the first ~1k inserts, recompute an average entry size and resize the per-bucket rings

Low-priority

Ring walk under lock when entries are stale (pointer_cache.go:114-131): worst case after a SetMinedMulti burst is a linear walk of the ring under the bucket lock. Worth a metric (ring_walk_distance_p99 or similar) so the operator can see if this becomes a tail-latency contributor.
Pointer mode still pays appendHeightToValue cost even though the deserializer ignores trailing bytes. Skip the append when bucketType == Pointer.
Missing test for improvedCacheBackend.Get corruption path — the unmarshalErrors eviction is fail-safe code that should have a regression test.

Strengths

Compile-time interface assertion catches contract drift at build time
Defensive cloning in metadataOnly — TxInpoints round-trips through serialize/deserialize so the cache never aliases caller-owned backing slices. Exactly the right shape for a shared-pointer cache.
Per-shard FIFO eviction is simple and predictable; lock granularity matches the byte path
TestPointerCache_DelDoesNotLeakRingCapacity is exactly the right shape — pins a subtle invariant with a deterministic same-bucket collision setup
Benchmark coverage is more thorough than most PRs

Introduces an experimental pointer-backed implementation of the txmeta cache, selected via the new BucketType.Pointer enum. PointerCache stores *meta.Data directly in sharded native maps with per-bucket FIFO key-ring eviction; reads return the stored pointer with no copy or deserialisation. Both backends (ImprovedCache and PointerCache) now satisfy a small internal cacheBackend interface (Set/SetFromBytes/SetMultiFromBytes/Get/Del/UpdateStats). TxMetaCache holds one interface field and dispatches uniformly — no per-call-site if/else branching. The reader path in subtreevalidation switches to the simplified GetMetaCached(ctx, hash) (*meta.Data, bool) signature; the old caller-managed scratch-buffer variant is removed. Also drops dead height-encoding code: appendHeightToValue, noOfBlocksToKeep- InTxMetaCache, and hitOldTx were referenced in docs but never read by any eviction path (eviction is the ring buffer's job). Head-to-head benchmarks (Apple M3 Max, 64 MB cache, count=3 x 2s): Operation Existing byte path Pointer native Delta Set 233 ns / 1 alloc 52 ns / 0 allocs 4.5x faster Get 426 ns / 8 allocs 37 ns / 0 allocs 11.5x faster Concurrent (32:1 R:W) 121 ns / 7 allocs 74 ns / 0 allocs 39% faster GC pause p99 ~0.06-0.19 ms 0 ms 0 GCs in loop Pointer cache produces zero GCs during the benchmark window: the byte path's per-Get deserialisation allocates 7 small objects, dominating allocation traffic. The pointer cache's larger steady-state heap is offset by near-zero allocation churn. Default BucketType remains Unallocated; pointer mode is opt-in via the Pointer enum value. No callsite or wire-protocol changes are required.

Three correctness/observability fixes from the Copilot review on bsv-blockchain#915, plus the doc cleanup it flagged. 1. **PointerCache: full-hash map key (collision safety).** Switched the per- shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount for speed; the map lookup itself uses the full hash, eliminating the 2⁻⁶⁴ collision window that could silently return wrong metadata for a different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns. 2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs, ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer backend was retaining whatever the caller passed in, diverging in semantics and potentially pinning whole transactions in memory. Set now clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns). SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes already produces a metadata-only struct. 3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt cached entry would otherwise be re-parsed on every subsequent Get for the same key. The backend now deletes the bad entry, increments an unmarshalErrors counter, and emits a warn log (logger is plumbed from NewTxMetaCache). Plus the three smaller items: GetMetaCached docstring updated to match the new (*meta.Data, bool) signature; "height encoding" subtest names/comments replaced with their current behaviour ("bypasses cache", "round-trip"); and the package doc no longer claims "height-based expiration" — eviction is the backend's job in both modes. Test added: TestPointerCache_SetStripsNonCachedFields pins the new contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.

PR bsv-blockchain#912 (perf(txmeta): v2 wire format + partition-aware receiver) landed on main while this PR was open and added new uses of appendHeightToValue plus two new SetCacheMultiSequential* methods that call ImprovedCache's new SetMultiSequential / SetMultiSequentialWithHashes. My earlier "dead code removal" was wrong by the time it landed — height encoding is part of the active contract, not legacy. Changes: - Restore appendHeightToValue method, hitOldTx metric (declaration + atomic counter + Prometheus gauge), and noOfBlocksToKeepInTxMetaCache field + init. Functionally the height bytes are still ignored on read in the byte backend, but they're now load-bearing for the v2 receiver in bsv-blockchain#912 and the metric is still wired up. - Re-introduce the height suffix on SetCacheFromBytes and SetCacheMulti. - Add SetMultiSequential and SetMultiSequentialWithHashes to the cacheBackend interface; improvedCacheBackend forwards to the underlying *ImprovedCache; PointerCache shares its implementation with SetMultiFromBytes (the per-bucket goroutine fan-out doesn't exist there, so the partition-aware variant is identical for free). - Fix the empty-value SetCacheMulti subtest assertion: stored entry is again 4 bytes (the height suffix), not 0.

CI's golangci-lint flagged stores/txmetacache/txmetacache.go for gci import-grouping violations after the rebase. Ran gci with the project's default settings (stdlib + third-party, no per-prefix split) across the files touched by this PR. No behavioural change.

Four issues raised in the second-round review on PR bsv-blockchain#915: 1. **SetMultiFromBytes was a serial loop.** A 1000-tx Kafka batch was 1000 sequential lock-acquire/insert cycles, while ImprovedCache.SetMulti fans out across shards. Restored fan-out: group keys by xxhash bucket, run one goroutine per non-empty bucket. Below smallSetMultiBatchThreshold the serial loop is still used to skip bucketing overhead on tiny batches. 2. **Del left stale entries in the FIFO ring.** Del removed the map entry but b.ring[head..] kept pointing at the deleted hash, and b.full stayed true. A Set→Del→Set cycle could fill the ring with duplicate references to the same key, silently reducing effective capacity. Fixed in insert: when the ring is full, walk past stale slots (key absent from map) before evicting, up to one full lap. Del stays O(1). 3. **metadataOnly aliased TxInpoints slices with the caller.** A struct copy of TxInpoints leaves ParentTxHashes (exported) and voutIdxs sharing backing arrays. Concurrent readers can observe in-place mutations from a caller doing `append(ParentTxHashes, …)`. Deep-clone via Serialize + NewTxInpointsFromBytes — the only public path through subtree's API that produces fresh backing slices given voutIdxs is unexported. Cost: ~160 ns + 5 allocs on the SetCache(hash, *meta.Data) path; SetCacheFromBytes (Kafka) is unaffected because the deserialiser already returns a fresh TxInpoints. 4. **Redundant defensive code in improvedCacheBackend.Get.** Removed the `len(scratch) == 0` early return (NewMetaDataFromBytes errors on len<17 via its own length check) and the verbose log+logger plumbing on unmarshal failure. Kept the evict + counter increment — a corrupt entry would otherwise be reparsed on every subsequent Get for the same key. Added regression test (TestPointerCache_DelDoesNotLeakRingCapacity) that pins bsv-blockchain#2: per-bucket ring size 1, Set → Del → Set on a colliding key MUST not count an eviction. All race tests pass with -tags testtxmetacache. Pointer-mode benchmark numbers shift on the SetCache path (125 ns → 288 ns) due to the TxInpoints deep-clone — correctness over headline speed. Get path is unchanged at ~39 ns / 0 allocs.

Three corrections from the second round of reviews on PR bsv-blockchain#915: 1. **SetMultiSequential must not fan out** (ordishs, required). The cacheBackend.SetMultiSequential contract says "no goroutine fan-out — caller already has parallelism." PointerCache.SetMultiSequential was forwarding to SetMultiFromBytes which fans out via errgroup. Under the v2 Kafka receiver this multiplies goroutines: N partition workers × M non-empty buckets each. Now a flat serial loop. SetMultiSequentialWith- Hashes mirrors it. Regression test pins the contract at a coarse level. 2. **GetMeta TxInpoints aliasing in pointer mode** (ordishs + oskarszoon). `*data = *cached` is a struct copy but ParentTxHashes still points at the cache's backing slice. In pointer mode the cache hands the same pointer to every concurrent reader, so a caller doing `append(data.TxInpoints.ParentTxHashes, …)` would corrupt every other reader. GetMeta now deep-clones TxInpoints under pointer mode via the new cloneTxInpoints helper. Byte mode is unaffected (its cached.Data is already freshly allocated). On clone failure GetMeta falls through to the store path rather than serving aliased data. 3. **metadataOnly fallback must not return aliased struct** (ordishs). On Serialize/Deserialize round-trip failure, the previous code returned a shallow-copy *meta.Data sharing ParentTxHashes with the caller — reintroducing the very aliasing that the clone exists to prevent. Now returns (nil, err); Set surfaces the error. Practically infallible for valid TxInpoints; on the rare pathological case, refusing to cache is better than caching corruptly. cloneTxInpoints is extracted as a shared helper so the same Serialize + NewTxInpointsFromBytes round-trip is used by metadataOnly (Set path) and GetMeta (Get path with pointer-mode aliasing guard). Regression tests: - TestPointerCache_SetDoesNotAliasCallerSlice — mutates the caller's ParentTxHashes after Set and asserts the cached entry is unchanged. - TestPointerCache_SetMultiSequentialDoesNotFanOut — exercises the sequential path with a 1024-entry batch and verifies every key is present after the synchronous call. All race tests pass under -tags testtxmetacache. Pointer-mode Set is unchanged (~288 ns including the deep-clone). GetMeta under pointer mode gains one extra Serialize+Deserialize per hit (~150 ns) — paid only on the non-hot-path Get (validator hot path uses GetMetaCached).

oskarszoon

Re-review at d5e8118. Most prior findings addressed, but the BlockIDs blocker is still unfixed.

Still outstanding — must fix:

setMinedInCacheParallel (txmetacache.go:735-765) wasn't touched by any of the fixup commits. Path is still: append blockID → SetCache → cache.Set → metadataOnly() strips BlockIDs. Mined updates via this path don't persist in pointer mode. Two-option fix:

(A) Replace SetCache with cache.Del to force a fresh store fetch on next read.
(B) Skip the SetCache call when BlockIDs are already set, matching byte-mode semantics.

Resolved:

TxInpoints aliasing — metadataOnly and GetMeta deep-clone via cloneTxInpoints (Serialize + NewTxInpointsFromBytes). TestPointerCache_SetDoesNotAliasCallerSlice pins it. Race tests green.
SetMultiSequentialWithHashes ignored-hash divergence — documented as accepted (both backends produce identical end-state via re-derived bucket from key bytes).
appendHeightToValue retention — documented with rationale in code, accepted.
ordishs's SetMultiSequential fan-out — fixed, flat serial loop, test added.
ordishs's Del orphan ring slot — fixed, stale-slot walk in insert(), test added.
64-bit hash collision class — keys switched to full chainhash.Hash.

Still worth noting (not blocking):

GC bench at 64 MB / 100K entries — 12× below prod load (256 MB / 800K). The headline GC pause numbers in the PR body aren't representative.

setMinedInCacheParallel was reading the tx, appending the new blockID, then calling SetCache. SetCache routes through the cacheBackend.Set adapter which strips BlockIDs in both backends — ImprovedCache because MetaBytes's wire format doesn't carry the field, PointerCache because metadataOnly strips it to match. The append + SetCache dance was therefore a no-op: mined updates via this path didn't persist. Replace the SetCache call with an eviction. The next read on each hash falls through to the underlying store, which has the up-to-date BlockIDs. Matches the established pattern in SetMinedMulti above. The function's `blockID` parameter is now unused — kept for API symmetry with the byte path. The redundant Get + utxoStore.Get fallback is also gone; we don't need to read the entry just to delete it from cache. Regression test added: TestTxMetaCache_Coverage/SetMinedMultiParallel_EvictsCache seeds the cache, calls SetMinedMultiParallel, and asserts the entry is absent via GetMetaCached.

sonarqubecloud · 2026-05-26T13:28:58Z

Quality Gate failed

Failed conditions
69.2% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

oskarszoon

Approve at cb606b158a. BlockIDs blocker correctly fixed.

ae4d4c4386 goes further than my suggested Option A. The whole Get+append+SetCache dance in setMinedInCacheParallel is gone, replaced with a single t.Delete(ctx, hash) per hash. blockID param is now _ uint32 (API symmetry only).

End-to-end in pointer mode: Delete evicts stale entry → next GetMeta misses cache → falls through to utxoStore.GetMeta → returns updated BlockIDs from store → len(data.BlockIDs) != 0 so SetCache is skipped and the entry stays out of cache. Matches SetMinedMulti semantics. Bug resolved.

SetMinedMultiParallel_EvictsCache test pins the eviction contract.

The 5 main merges (742793902, 958f99088, bb7e4f049, d2e5e09aa, 50822d938) are in unrelated areas (orphan pool, opcodeInvert, p2p fee policy, testcontainers deps, dashboards) with zero interaction with txmetacache.

Worth flagging for future maintainers (not a blocker)

SetMinedMultiParallel is now a cache-only operation — it never calls utxoStore.SetMinedMulti. The "store written first, then evict cache" invariant is the caller's responsibility. There are zero production callers today (tests only). When a real caller lands, it must follow the same pattern SetMinedMulti uses: write to store first, then call this to evict. Worth a one-line godoc on the function to lock that in.

icellan requested review from Copilot, gokutheengineer and ordishs May 20, 2026 15:50

Copilot started reviewing on behalf of icellan May 20, 2026 15:51 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

Comment thread stores/txmetacache/pointer_cache.go

Comment thread stores/txmetacache/txmetacache.go

Comment thread stores/txmetacache/txmetacache.go Outdated

Comment thread stores/txmetacache/improved_cache_meta.go

Comment thread stores/txmetacache/txmetacache_test.go Outdated

ordishs reviewed May 20, 2026

View reviewed changes

icellan self-assigned this May 20, 2026

icellan force-pushed the perf/txmeta-pointer-cache branch from 692c955 to 6ebc707 Compare May 21, 2026 08:19

gokutheengineer reviewed May 21, 2026

View reviewed changes

Comment thread stores/txmetacache/improved_cache_meta.go

Comment thread stores/txmetacache/pointer_cache.go

Comment thread stores/txmetacache/pointer_cache.go

Comment thread stores/txmetacache/pointer_cache.go

oskarszoon requested changes May 21, 2026

View reviewed changes

ordishs approved these changes May 21, 2026

View reviewed changes

icellan added 6 commits May 26, 2026 11:13

icellan force-pushed the perf/txmeta-pointer-cache branch from 1b08533 to 453f53f Compare May 26, 2026 09:25

icellan requested a review from oskarszoon May 26, 2026 09:42

Merge branch 'main' into perf/txmeta-pointer-cache

d5e8118

oskarszoon requested changes May 26, 2026

View reviewed changes

icellan and others added 2 commits May 26, 2026 12:10

Merge branch 'main' into perf/txmeta-pointer-cache

cb606b1

icellan requested a review from oskarszoon May 26, 2026 13:28

oskarszoon approved these changes May 26, 2026

View reviewed changes

icellan merged commit 148efa6 into bsv-blockchain:main May 26, 2026
25 checks passed

Conversation

icellan commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Head-to-head benchmarks

Reviewer-requested fixes (this PR)

Caveats

Test plan

Uh oh!

github-actions Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Strengths

No Critical Issues Found

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ordishs left a comment

Choose a reason for hiding this comment

Overview

🚨 Critical Issues

1. Hash collisions in PointerCache silently corrupt data

⚠️ Moderate Issues

2. Silent error swallowing in improvedCacheBackend.Get

3. GetMeta does shallow struct copy — aliases slices with the cache in Pointer mode

4. Del-then-Set leaks FIFO ring slots

ℹ️ Minor Issues / Nits

5. Prometheus metric hit_old_tx deleted with no migration note

6. getTxMetaCacheReadBuffer / putTxMetaCacheReadBuffer pool become dead code

7. Removed warning log "txMetaCache empty for %s"

8. pointerCacheAvgEntryBytes = 300 is acknowledged guesswork

9. Benchmark comment uses "Bitcoin transactions"

10. _ = entries[0].hash keepalive hack

✅ Strengths

Recommendation

Uh oh!

github-actions Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Comparison Report

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

oskarszoon left a comment

Choose a reason for hiding this comment

Uh oh!

ordishs left a comment

Choose a reason for hiding this comment

Overall

Required before flipping pointer mode on

SetMultiSequential contract violation in PointerCache

Recommended

PR description is stale vs. the actual diff

GetMeta shallow-copies into caller's struct, aliasing the cache in pointer mode

metadataOnly fallback path leaks aliased TxInpoints

pointerCacheAvgEntryBytes = 320 is workload-dependent

Low-priority

Strengths

Uh oh!

oskarszoon left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented May 26, 2026

Quality Gate failed

Uh oh!

oskarszoon left a comment

Choose a reason for hiding this comment

Worth flagging for future maintainers (not a blocker)

Uh oh!

Uh oh!

icellan commented May 20, 2026 •

edited

Loading

github-actions Bot commented May 20, 2026 •

edited

Loading

1. Hash collisions in `PointerCache` silently corrupt data

2. Silent error swallowing in `improvedCacheBackend.Get`

3. `GetMeta` does shallow struct copy — aliases slices with the cache in Pointer mode

5. Prometheus metric `hit_old_tx` deleted with no migration note

6. `getTxMetaCacheReadBuffer` / `putTxMetaCacheReadBuffer` pool become dead code

7. Removed warning log `"txMetaCache empty for %s"`

8. `pointerCacheAvgEntryBytes = 300` is acknowledged guesswork

10. `_ = entries[0].hash` keepalive hack

github-actions Bot commented May 20, 2026 •

edited

Loading