Skip to content

perf(txmetacache): pointer-backed cache backend + interface dispatch#915

Merged
icellan merged 9 commits into
bsv-blockchain:mainfrom
icellan:perf/txmeta-pointer-cache
May 26, 2026
Merged

perf(txmetacache): pointer-backed cache backend + interface dispatch#915
icellan merged 9 commits into
bsv-blockchain:mainfrom
icellan:perf/txmeta-pointer-cache

Conversation

@icellan

@icellan icellan commented May 20, 2026

Copy link
Copy Markdown
Contributor

Summary

Default BucketType remains Unallocated; pointer mode is opt-in via the new enum value. Kafka wire protocol is unchanged.

Head-to-head benchmarks

Apple M3 Max, 64 MB cache, BucketsCount=8192, count=3 -benchtime=2s. Numbers reflect the final code with full-hash keying, metadata-only stripping with deep-cloned TxInpoints, and per-bucket fan-out on SetMultiFromBytes.

Operation Existing byte path Pointer (current) Delta
Set (pointer-native) 233 ns / 1 alloc 288 ns / 6 allocs -24% (slower)
Set via SetFromBytes 233 ns / 1 alloc ~250 ns / 7 allocs comparable
Get 426 ns / 8 allocs 39 ns / 0 allocs 10.9× faster
Concurrent (32:1 R:W) 121 ns / 7 allocs 74 ns / 0 allocs 39% faster
GC pause p99 (sat.) ~0.06–0.19 ms 0 ms 0 GCs in loop

Pointer-native Set is slower than the byte path due to the mandatory TxInpoints deep-clone (correctness fix from reviewer feedback — without it, callers could corrupt cached entries via slice aliasing). The Kafka path (SetCacheFromBytes) is not materially affected because NewMetaDataFromBytes already produces a fresh TxInpoints. The dominant win is on Get, which is the validator hot path.

The pointer cache produces zero GCs during the benchmark window: the byte path's per-Get deserialisation allocates 7 small objects, dominating allocation traffic. The pointer cache's larger steady-state heap is offset by near-zero allocation churn.

Reviewer-requested fixes (this PR)

  • Full 32-byte chainhash.Hash map key (eliminates 2⁻⁶⁴ collision risk)
  • metadataOnly strips non-cached fields and deep-clones TxInpoints (no slice aliasing with caller; matches byte backend's effective field set)
  • Del + insert handle stale ring slots (no silent capacity loss under Set→Del→Set cycles)
  • SetMultiSequential is a flat serial loop in pointer mode (honours the no-fan-out contract used by the v2 Kafka receiver)
  • GetMeta deep-clones TxInpoints under pointer mode (breaks aliasing with the shared cache entry)
  • improvedCacheBackend.Get evicts + counts unmarshal failures (prevents repeated bad-parse CPU drain)

Caveats

  • pointerCacheAvgEntryBytes = 320 is calibrated against 2-parent synthetic txs. Production txs with >5–6 parents will overflow that estimate and the cache holds fewer entries than budgeted. Either expose entries/MB via UpdateStats or resize per-bucket rings dynamically in a follow-up.
  • setMinedInCacheParallel appends BlockIDs then calls SetCache, which strips them — has been a no-op for BlockIDs purposes in the byte path since before this PR (since MetaBytes doesn't carry BlockIDs). Not introduced or worsened here; flagged for a separate cleanup.
  • GC benchmark is at 64 MB / 100k entries. The trend (zero alloc on hot Get) is unchanged at scale; absolute pause numbers at 256 MB / ~800k entries may differ.
  • All benchmarks ran on a nullstore-backed TxMetaCache so no UTXO-store fan-out cost is included. A smoketest with BucketType.Pointer selected (requires a single-line flip at services/subtreevalidation/Server.go:231) is the next step before any production use.

Test plan

  • go build ./...
  • go vet ./stores/txmetacache/... ./services/subtreevalidation/...
  • go test -race -tags testtxmetacache ./stores/txmetacache/... ./services/subtreevalidation/...
  • go test -bench=. -benchmem ./stores/txmetacache/ — numbers above
  • End-to-end smoketest with BucketType.Pointer selected (manual step before flipping in production)

@github-actions

github-actions Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Summary

This PR introduces a well-designed pointer-backed cache implementation as an experimental alternative to the byte-backed cache. The code demonstrates careful attention to correctness, particularly around pointer aliasing prevention and concurrent access safety.

All previously identified issues from earlier reviews have been resolved:

  • ✅ Full 32-byte hash keys (no collision risk)
  • ✅ Proper metadata stripping to match byte backend semantics
  • ✅ Deep cloning of TxInpoints to prevent aliasing
  • ✅ Parallel processing in SetMultiFromBytes for large batches
  • ✅ Stale ring slot handling in insert/Del
  • ✅ Accurate documentation and comments
  • ✅ Unmarshal failure handling with eviction

Key Strengths

  1. Defensive aliasing prevention: The implementation correctly deep-clones TxInpoints in both metadataOnly() and GetMeta() to prevent shared slice corruption.

  2. Backend parity: metadataOnly() ensures PointerCache matches the byte backend field set, avoiding semantic drift between implementations.

  3. Correct eviction logic: The stale ring slot handling properly walks past deleted entries without silently losing capacity.

  4. Smart batching: SetMultiFromBytes uses serial processing for small batches (≤256 entries) and parallel bucketing for larger batches, avoiding overhead when unnecessary.

  5. Clear documentation: Comments accurately describe tradeoffs, contracts, and edge cases throughout.

No Critical Issues Found

The implementation is production-ready pending the end-to-end smoketest mentioned in the PR description. The code follows project conventions from CLAUDE.md and AGENTS.md with minimal changes, good test coverage, and appropriate caution flags (opt-in via BucketType.Pointer).

Recommendation: Approve after the planned smoketest validates behavior under production workload patterns.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors TxMetaCache to dispatch through a small internal backend interface and introduces an opt-in experimental pointer-backed cache backend (BucketType.Pointer) intended to reduce allocation churn on cache hits by storing *meta.Data directly. It also removes unused height-appending logic and drops the hitOldTx metric that depended on it.

Changes:

  • Add cacheBackend interface and adapt the existing ImprovedCache via improvedCacheBackend; update TxMetaCache to use backend dispatch.
  • Introduce PointerCache (sharded map + per-bucket FIFO ring eviction) plus tests/benchmarks, selectable via BucketType.Pointer.
  • Update consumers/tests to use the new GetMetaCached(ctx, hash) (*meta.Data, bool) API; remove height-encoding + related metric.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
stores/txmetacache/txmetacache.go Switch TxMetaCache to backend interface dispatch, add Pointer bucket type, simplify Get/Set paths, remove height appending.
stores/txmetacache/cache_backend.go Define the internal backend interface used by TxMetaCache.
stores/txmetacache/improved_cache_meta.go Adapter implementing cacheBackend for the existing byte-backed ImprovedCache.
stores/txmetacache/pointer_cache.go New pointer-backed cache implementation (sharded map + FIFO key ring eviction).
stores/txmetacache/pointer_cache_test.go Unit tests for the new pointer-backed cache behavior.
stores/txmetacache/pointer_cache_bench_test.go Benchmarks comparing byte-backed vs pointer-backed paths and GC behavior.
stores/txmetacache/metrics.go Remove hit_old_tx Prometheus metric wiring (no longer applicable).
stores/txmetacache/txmetacache_test.go Update tests/benchmarks for new GetMetaCached API and backend adapter access.
stores/txmetacache/txmetacache_conflicting_test.go Update conflicting-transaction tests to use new GetMetaCached signature.
services/subtreevalidation/processTxMetaUsingCache.go Switch cache reads to the new GetMetaCached API.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread stores/txmetacache/pointer_cache.go
Comment thread stores/txmetacache/txmetacache.go
Comment thread stores/txmetacache/txmetacache.go Outdated
Comment thread stores/txmetacache/improved_cache_meta.go
Comment thread stores/txmetacache/txmetacache_test.go Outdated

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

Introduces an opt-in PointerCache backend storing *meta.Data directly in sharded map[uint64]*meta.Data shards with per-bucket FIFO eviction. Refactors TxMetaCache to dispatch through a new internal cacheBackend interface. Collapses GetMetaCached/GetMetaCachedWithBuffer into a single zero-buffer API returning (*meta.Data, bool). Drops the height-encoding/expiration dead code path (height was appended on every write but never read by any eviction logic).

Benchmarks show ~4.5× faster Set, ~11.5× faster Get, and zero GCs vs. the byte backend. The default BucketType is unchanged; Pointer mode is opt-in.


🚨 Critical Issues

1. Hash collisions in PointerCache silently corrupt data

stores/txmetacache/pointer_cache.go (the pointerBucket map and Get):

The map is keyed solely by xxhash.Sum64(hash[:]) — the original 32-byte hash is never stored or compared on read:

type pointerBucket struct {
    m map[uint64]*meta.Data   // uint64 key, no full-hash verification
    ...
}

func (c *PointerCache) Get(hash chainhash.Hash) (*meta.Data, bool) {
    k := xxhash.Sum64(hash[:])
    b := c.bucketFor(k)
    b.mu.RLock()
    d, ok := b.m[k]
    b.mu.RUnlock()
    return d, ok
}

Compare to ImprovedCache.bucketNative.Get (improved_cache.go:971), which verifies the full key on hit:

if string(key) == string(chunk[idx:idx+keyLen]) {

For a 64-bit hash with N cached entries, expected collisions are ~N² / 2⁶⁵. At ~10M live entries the per-pair probability multiplied by pair count is ~3×10⁻⁶ — across a fleet running for days this will happen. Impact: tx A's lookup returns tx B's metadata (Fee, SizeInBytes, IsCoinbase, TxInpoints), feeding directly into validation. This is a consensus-correctness risk if PointerCache is ever wired into a path that influences acceptance.

Fix options:

  • Use map[chainhash.Hash]*meta.Data[32]byte is comparable and works as a map key. Slightly slower lookups but eliminates the collision class entirely. Likely still beats the byte path by >10×.
  • Or store (chainhash.Hash, *meta.Data) in a struct and compare the hash on read.

Either is a small change. Given the PR positions PointerCache as experimental and opt-in, this is fixable before any real adoption — but it MUST be fixed before anyone flips the switch.


⚠️ Moderate Issues

2. Silent error swallowing in improvedCacheBackend.Get

stores/txmetacache/improved_cache_meta.go (the new Get):

if err := b.cache.Get(&scratch, hash[:]); err != nil {
    if errors.Is(err, errors.ErrNotFound) {
        return nil, false
    }
    return nil, false   // any other error: silently treated as miss
}
...
if err := meta.NewMetaDataFromBytes(scratch, data); err != nil {
    return nil, false   // deserialise failure: silently treated as miss
}

The old GetMetaCachedWithBuffer returned non-NotFound errors to the caller, who logged them (e.g., processTxMetaUsingCache.go's Warnf("error retrieving txMeta…")). After this PR, real cache corruption or deserialiser failures become invisible misses. The cacheBackend.Get signature is (*meta.Data, bool) with no error channel.

Fix: Either log inside improvedCacheBackend.Get (e.g., debug-level — watch for log floods if corruption is repeated) or change the interface to surface errors. Quietly degrading correctness to a cache miss is exactly the kind of regression that's hard to detect in production.

3. GetMeta does shallow struct copy — aliases slices with the cache in Pointer mode

stores/txmetacache/txmetacache.go (post-PR GetMeta):

if cached, ok := t.cache.Get(*hash); ok {
    t.metrics.hits.Add(1)
    *data = *cached                       // shallow copy — slice headers aliased
    data.BlockIDs = make([]uint32, 0)
    return nil
}

*data = *cached copies the struct but slice headers in meta.Data (notably TxInpoints.ParentTxHashes) still point at the cache's backing arrays. In byte mode this is fine (each Get allocates a fresh *meta.Data with fresh slices). In Pointer mode, callers of GetMeta now silently share slice memory with every other reader and with the cache itself. The pointer-mode contract ("treat returned *meta.Data as read-only") is documented on cacheBackend.Get, but GetMeta's callers don't see that interface and have no signal that their data argument is now an alias.

This is a quiet semantic change that's easy to miss when reviewing future callers. Worth either deep-copying slices here, or documenting at the GetMeta doc comment that the result must be treated as read-only.

4. Del-then-Set leaks FIFO ring slots

stores/txmetacache/pointer_cache.go (Del and insert):

Del removes from the map but doesn't touch the ring:

func (c *PointerCache) Del(key []byte) {
    k := xxhash.Sum64(key)
    b := c.bucketFor(k)
    b.mu.Lock()
    delete(b.m, k)        // ring slot still references k
    b.mu.Unlock()
}

If the same key is later re-inserted, insert sees exists == false and allocates a fresh ring slot. The old ring slot still names the (now-deleted) key, so when its eviction turn arrives we delete(b.m, evict) — a no-op — but the ring has effectively wasted that slot for a full ring lap.

Worse, the pattern Set → Del → Set → Del → Set can fill a bucket's ring with duplicate references to the same key, silently reducing effective capacity. This matches a real workflow: SetCacheDelete (from SetMinedMulti eviction) → SetCache again on the same hash.

Fix: In Del, mark the ring slot stale (e.g., sentinel 0) so it's skipped on eviction; or accept and document the trade-off.


ℹ️ Minor Issues / Nits

5. Prometheus metric hit_old_tx deleted with no migration note

stores/txmetacache/metrics.go

Genuinely dead code internally — the height-based expiration was never wired to eviction — but the Prometheus gauge teranode_tx_meta_cache_hit_old_tx is a public surface. Any Grafana dashboard, alert, or recording rule referencing it will break on deploy. Worth a release-notes line.

6. getTxMetaCacheReadBuffer / putTxMetaCacheReadBuffer pool become dead code

stores/txmetacache/txmetacache.go

After this PR neither GetMetaCached nor GetMeta calls these helpers. The pool helpers and txMetaCacheReadBufferPool should be removed for clarity (along with txMetaCacheReadBufferMaxRetain if no longer needed by improvedCacheBackend), unless intentionally kept for future callers.

7. Removed warning log "txMetaCache empty for %s"

Previously emitted when a hit returned zero bytes. Corner-case, but it had operational value as a "corruption smoke" signal. Worth re-adding inside improvedCacheBackend.Get once it's logging non-NotFound errors (issue #2).

8. pointerCacheAvgEntryBytes = 300 is acknowledged guesswork

stores/txmetacache/pointer_cache.go

The PR description flags this. A tx with many parents (>5–6) overflows the estimate and the bucket holds fewer entries than the MB budget suggests. Consider deriving per-bucket entry count from observed runtime.MemStats at warmup, or at least logging the effective capacity at construction so operators can audit it.

9. Benchmark comment uses "Bitcoin transactions"

stores/txmetacache/pointer_cache_bench_test.go

"Two parents is a reasonable median for Bitcoin transactions…"

Prefer "BSV transactions" or just "transactions" in this codebase.

10. _ = entries[0].hash keepalive hack

stores/txmetacache/pointer_cache_bench_test.go (benchMemoryEfficiency)

Use runtime.KeepAlive(entries) for clarity.


✅ Strengths

  • Compile-time interface checks (var _ cacheBackend = (*PointerCache)(nil)).
  • Concurrent reader/writer test (TestPointerCache_ConcurrentReadersAndWriters) and deterministic FIFO eviction test (TestPointerCache_FIFOEvictionWhenRingFull).
  • Benchmarks cover Set/Get/concurrent-read-heavy/GC-pause/memory-efficiency with both backends.
  • The dead-code removal (height encoding, hitOldTx internals) is honestly dead.
  • Interface adapter (improvedCacheBackend) preserves existing tests/benchmarks against ImprovedCache's native byte API.
  • PR description is unusually thorough — caveats, alloc counts, and untouched production paths all called out.

Recommendation

Issue #1 (collision-safety) is non-negotiable before any candidacy for production use: a 64-bit hash with no full-key verification will produce silent wrong-data returns at fleet scale, and the cached fields feed validation. Issues #2, #3, and #4 are quiet correctness/observability regressions that should be addressed in the same pass. The rest is bookkeeping.

The architecture and benchmark results are genuinely compelling — the byte-mode allocation cost is real, and pointer-native storage is the right idea. Worth landing once the collision class is closed and the silent-error paths are loud again.

@github-actions

github-actions Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-915 (5a1fe8b)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.632µ 1.628µ ~ 0.500
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.36n 71.73n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.85n 71.31n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.23n 71.28n ~ 0.600
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 40.89n 39.16n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 60.13n 60.04n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 209.5n 199.1n ~ 0.800
MiningCandidate_Stringify_Short-4 222.6n 221.8n ~ 0.200
MiningCandidate_Stringify_Long-4 1.673µ 1.658µ ~ 0.200
MiningSolution_Stringify-4 864.8n 860.5n ~ 0.100
BlockInfo_MarshalJSON-4 1.810µ 1.808µ ~ 1.000
NewFromBytes-4 130.6n 143.0n ~ 0.100
AddTxBatchColumnar_Validation-4 2.711µ 2.489µ ~ 0.700
OffsetValidationLoop-4 641.7n 641.9n ~ 1.000
Mine_EasyDifficulty-4 68.41µ 67.78µ ~ 0.100
Mine_WithAddress-4 7.159µ 7.850µ ~ 0.100
BlockAssembler_AddTx-4 0.02938n 0.02736n ~ 0.700
AddNode-4 11.27 11.29 ~ 1.000
AddNodeWithMap-4 12.34 11.84 ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 58.53n 60.45n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 30.48n 29.26n ~ 0.700
DirectSubtreeAdd/256_per_subtree-4 28.96n 28.28n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 27.85n 27.23n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 27.51n 26.90n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 289.5n 289.5n ~ 0.800
SubtreeProcessorAdd/64_per_subtree-4 285.9n 292.3n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 285.3n 284.7n ~ 1.000
SubtreeProcessorAdd/1024_per_subtree-4 281.7n 279.8n ~ 0.700
SubtreeProcessorAdd/2048_per_subtree-4 279.2n 281.4n ~ 0.400
SubtreeProcessorRotate/4_per_subtree-4 281.7n 280.5n ~ 0.700
SubtreeProcessorRotate/64_per_subtree-4 279.6n 277.6n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 279.1n 278.5n ~ 0.400
SubtreeProcessorRotate/1024_per_subtree-4 279.1n 279.6n ~ 0.800
SubtreeNodeAddOnly/4_per_subtree-4 54.62n 56.24n ~ 0.400
SubtreeNodeAddOnly/64_per_subtree-4 34.26n 34.26n ~ 1.000
SubtreeNodeAddOnly/256_per_subtree-4 33.32n 33.21n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.70n 32.56n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 115.3n 114.2n ~ 0.300
SubtreeCreationOnly/64_per_subtree-4 404.5n 402.9n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.374µ 1.349µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.507µ 4.410µ ~ 0.200
SubtreeCreationOnly/2048_per_subtree-4 8.375µ 8.243µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 277.2n 279.8n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 276.7n 277.6n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 2.192m 2.209m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.275m 5.328m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.161m 7.398m ~ 0.700
ParallelGetAndSetIfNotExists/100k_nodes-4 9.772m 9.757m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 1.970m 1.958m ~ 0.400
SequentialGetAndSetIfNotExists/10k_nodes-4 4.395m 4.380m ~ 0.700
SequentialGetAndSetIfNotExists/50k_nodes-4 13.18m 12.08m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 23.41m 21.85m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.240m 2.244m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.080m 8.122m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 12.90m 12.91m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.993m 1.983m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.441m 7.408m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 41.17m 40.22m ~ 0.400
DiskTxMap_SetIfNotExists-4 3.860µ 3.743µ ~ 0.200
DiskTxMap_SetIfNotExists_Parallel-4 3.462µ 3.538µ ~ 0.200
DiskTxMap_ExistenceOnly-4 341.2n 337.1n ~ 0.400
Queue-4 184.9n 187.0n ~ 0.100
AtomicPointer-4 3.635n 3.634n ~ 1.000
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 818.9µ 816.6µ ~ 1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4 753.6µ 785.4µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 106.5µ 104.7µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 64.25µ 64.44µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/10K-4 50.89µ 51.79µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.05µ 11.17µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/10K-4 4.407µ 5.064µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.487µ 1.943µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.355m 9.171m ~ 0.200
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.521m 9.557m ~ 1.000
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.071m 1.078m ~ 0.200
ReorgOptimizations/AllMarkFalse/New/100K-4 708.6µ 705.5µ ~ 0.200
ReorgOptimizations/HashSlicePool/Old/100K-4 520.9µ 614.4µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 211.7µ 217.5µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 44.47µ 42.62µ ~ 0.700
ReorgOptimizations/NodeFlags/New/100K-4 14.66µ 16.37µ ~ 0.200
TxMapSetIfNotExists-4 49.59n 49.34n ~ 0.400
TxMapSetIfNotExistsDuplicate-4 41.54n 41.53n ~ 1.000
ChannelSendReceive-4 578.0n 570.1n ~ 0.100
CalcBlockWork-4 526.8n 528.9n ~ 0.100
CalculateWork-4 713.0n 752.6n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.337µ 1.355µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 12.93µ 13.10µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_1000-4 156.4µ 151.8µ ~ 1.000
CatchupWithHeaderCache-4 104.5m 104.4m ~ 0.400
_BufferPoolAllocation/16KB-4 3.925µ 4.055µ ~ 0.100
_BufferPoolAllocation/32KB-4 9.244µ 9.011µ ~ 1.000
_BufferPoolAllocation/64KB-4 22.49µ 17.96µ ~ 0.700
_BufferPoolAllocation/128KB-4 35.99µ 32.38µ ~ 1.000
_BufferPoolAllocation/512KB-4 113.8µ 112.2µ ~ 0.700
_BufferPoolConcurrent/32KB-4 18.28µ 19.62µ ~ 0.100
_BufferPoolConcurrent/64KB-4 30.10µ 30.24µ ~ 1.000
_BufferPoolConcurrent/512KB-4 145.2µ 146.5µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 639.9µ 649.1µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/32KB-4 633.5µ 636.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 642.4µ 640.5µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/128KB-4 640.3µ 644.7µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/512KB-4 635.7µ 633.4µ ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4 35.97m 36.49m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 35.70m 36.36m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4 35.80m 36.30m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 35.86m 36.01m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.50m 35.67m ~ 1.000
_PooledVsNonPooled/Pooled-4 736.4n 738.7n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.576µ 7.835µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.463µ 6.458µ ~ 0.700
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.576µ 9.513µ ~ 0.400
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.064µ 9.321µ ~ 0.600
_prepareTxsPerLevel-4 428.4m 417.7m ~ 0.100
_prepareTxsPerLevelOrdered-4 4.046m 3.776m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 427.4m 415.1m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 4.858m 3.901m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.257m 1.253m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 296.8µ 298.0µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 72.94µ 70.60µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 18.20µ 17.83µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 8.912µ 8.792µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.449µ 4.368µ ~ 0.200
SubtreeSizes/10k_tx_2k_per_subtree-4 2.196µ 2.170µ ~ 0.400
BlockSizeScaling/10k_tx_64_per_subtree-4 70.20µ 70.55µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 17.49µ 17.54µ ~ 1.000
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.367µ 4.404µ ~ 0.700
BlockSizeScaling/50k_tx_64_per_subtree-4 367.4µ 368.9µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 89.97µ 88.62µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.73µ 21.85µ ~ 0.700
SubtreeAllocations/small_subtrees_exists_check-4 150.4µ 151.3µ ~ 0.700
SubtreeAllocations/small_subtrees_data_fetch-4 160.5µ 160.9µ ~ 0.700
SubtreeAllocations/small_subtrees_full_validation-4 308.6µ 311.5µ ~ 0.200
SubtreeAllocations/medium_subtrees_exists_check-4 9.035µ 8.940µ ~ 1.000
SubtreeAllocations/medium_subtrees_data_fetch-4 9.385µ 9.378µ ~ 0.700
SubtreeAllocations/medium_subtrees_full_validation-4 17.55µ 17.46µ ~ 1.000
SubtreeAllocations/large_subtrees_exists_check-4 2.094µ 2.129µ ~ 0.200
SubtreeAllocations/large_subtrees_data_fetch-4 2.281µ 2.251µ ~ 0.700
SubtreeAllocations/large_subtrees_full_validation-4 4.421µ 4.380µ ~ 0.400
StoreBlock_Sequential/BelowCSVHeight-4 337.6µ 339.1µ ~ 1.000
StoreBlock_Sequential/AboveCSVHeight-4 336.3µ 338.1µ ~ 0.400
GetUtxoHashes-4 273.6n 273.2n ~ 1.000
GetUtxoHashes_ManyOutputs-4 45.45µ 45.53µ ~ 1.000
_NewMetaDataFromBytes-4 214.8n 214.1n ~ 0.700
_Bytes-4 397.7n 399.8n ~ 0.400
_MetaBytes-4 140.8n 138.6n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-05-26 13:32 UTC

icellan added a commit that referenced this pull request May 20, 2026
PR #915 added txmetacache.BucketType.Pointer (experimental pointer-backed
cache) but the resolver in services/subtreevalidation/Server.go only
mapped strings unallocated|preallocated|trimmed|native. Setting
subtreevalidation_txMetaCacheBucketType=pointer in production would have
hit the default branch, logged a Warn, and silently downgraded to
Unallocated — pointer mode unreachable via configuration.

This wires the missing case ("pointer" -> txmetacache.Pointer), extends
the setting longdesc with the value + canary guidance (zero-alloc Get
path at the cost of larger steady-state heap), and adds the
TestResolveTxMetaCacheBucketType test cases.
@icellan icellan self-assigned this May 20, 2026
icellan added a commit to icellan/teranode that referenced this pull request May 20, 2026
Three correctness/observability fixes from the Copilot review on bsv-blockchain#915, plus
the doc cleanup it flagged.

1. **PointerCache: full-hash map key (collision safety).** Switched the per-
   shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash
   (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount
   for speed; the map lookup itself uses the full hash, eliminating the
   2⁻⁶⁴ collision window that could silently return wrong metadata for a
   different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns.

2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores
   MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs,
   ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer
   backend was retaining whatever the caller passed in, diverging in
   semantics and potentially pinning whole transactions in memory. Set now
   clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's
   field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns).
   SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes
   already produces a metadata-only struct.

3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt
   cached entry would otherwise be re-parsed on every subsequent Get for the
   same key. The backend now deletes the bad entry, increments an
   unmarshalErrors counter, and emits a warn log (logger is plumbed from
   NewTxMetaCache).

Plus the three smaller items: GetMetaCached docstring updated to match the
new (*meta.Data, bool) signature; "height encoding" subtest names/comments
replaced with their current behaviour ("bypasses cache", "round-trip"); and
the package doc no longer claims "height-based expiration" — eviction is the
backend's job in both modes.

Test added: TestPointerCache_SetStripsNonCachedFields pins the new
contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.
icellan added a commit that referenced this pull request May 20, 2026
Three correctness/observability fixes from the Copilot review on #915, plus
the doc cleanup it flagged.

1. **PointerCache: full-hash map key (collision safety).** Switched the per-
   shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash
   (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount
   for speed; the map lookup itself uses the full hash, eliminating the
   2⁻⁶⁴ collision window that could silently return wrong metadata for a
   different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns.

2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores
   MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs,
   ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer
   backend was retaining whatever the caller passed in, diverging in
   semantics and potentially pinning whole transactions in memory. Set now
   clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's
   field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns).
   SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes
   already produces a metadata-only struct.

3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt
   cached entry would otherwise be re-parsed on every subsequent Get for the
   same key. The backend now deletes the bad entry, increments an
   unmarshalErrors counter, and emits a warn log (logger is plumbed from
   NewTxMetaCache).

Plus the three smaller items: GetMetaCached docstring updated to match the
new (*meta.Data, bool) signature; "height encoding" subtest names/comments
replaced with their current behaviour ("bypasses cache", "round-trip"); and
the package doc no longer claims "height-based expiration" — eviction is the
backend's job in both modes.

Test added: TestPointerCache_SetStripsNonCachedFields pins the new
contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.
icellan added a commit to icellan/teranode that referenced this pull request May 21, 2026
Three correctness/observability fixes from the Copilot review on bsv-blockchain#915, plus
the doc cleanup it flagged.

1. **PointerCache: full-hash map key (collision safety).** Switched the per-
   shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash
   (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount
   for speed; the map lookup itself uses the full hash, eliminating the
   2⁻⁶⁴ collision window that could silently return wrong metadata for a
   different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns.

2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores
   MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs,
   ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer
   backend was retaining whatever the caller passed in, diverging in
   semantics and potentially pinning whole transactions in memory. Set now
   clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's
   field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns).
   SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes
   already produces a metadata-only struct.

3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt
   cached entry would otherwise be re-parsed on every subsequent Get for the
   same key. The backend now deletes the bad entry, increments an
   unmarshalErrors counter, and emits a warn log (logger is plumbed from
   NewTxMetaCache).

Plus the three smaller items: GetMetaCached docstring updated to match the
new (*meta.Data, bool) signature; "height encoding" subtest names/comments
replaced with their current behaviour ("bypasses cache", "round-trip"); and
the package doc no longer claims "height-based expiration" — eviction is the
backend's job in both modes.

Test added: TestPointerCache_SetStripsNonCachedFields pins the new
contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.
@icellan icellan force-pushed the perf/txmeta-pointer-cache branch from 692c955 to 6ebc707 Compare May 21, 2026 08:19
Comment thread stores/txmetacache/improved_cache_meta.go
Comment thread stores/txmetacache/pointer_cache.go
Comment thread stores/txmetacache/pointer_cache.go
Comment thread stores/txmetacache/pointer_cache.go

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request changes. Findings beyond ordishs's review.

  1. setMinedInCacheParallel (txmetacache.go:709-738) silently drops BlockIDs in pointer mode. Path: retrieve txMeta → append blockID → SetCache → metadataOnly() strips BlockIDs. The append is a no-op. Anything gating on block membership returns stale answers.

  2. PR body says appendHeightToValue was dropped — it wasn't. Still at line 770, called in all four SetCacheMulti* methods. Comment added but the description is factually wrong. In pointer mode trailing 4 bytes are tolerated by NewMetaDataFromBytes, but len+4 allocation per SetCacheFromBytes partially undermines the zero-alloc claim for byte ingestion.

  3. metadataOnly() value-copies TxInpoints but shares the backing slice arrays — GetMeta's *data = *cached then propagates aliasing to all callers. Reinforces ordishs's #3: callers must treat result read-only, nothing enforces it. Detectable with -race.

  4. GC bench at 64 MB / 100K entries — 12× below production (256 MB / ~800K). Pause numbers not representative at scale.

  5. SetMultiSequentialWithHashes ignores supplied hashes in pointer mode (keys by embedded hash). Divergence from byte-backend keying — latent hazard for callers relying on consistent semantics across modes.

icellan added a commit to icellan/teranode that referenced this pull request May 21, 2026
Four issues raised in the second-round review on PR bsv-blockchain#915:

1. **SetMultiFromBytes was a serial loop.** A 1000-tx Kafka batch was
   1000 sequential lock-acquire/insert cycles, while ImprovedCache.SetMulti
   fans out across shards. Restored fan-out: group keys by xxhash bucket,
   run one goroutine per non-empty bucket. Below smallSetMultiBatchThreshold
   the serial loop is still used to skip bucketing overhead on tiny batches.

2. **Del left stale entries in the FIFO ring.** Del removed the map entry
   but b.ring[head..] kept pointing at the deleted hash, and b.full stayed
   true. A Set→Del→Set cycle could fill the ring with duplicate references
   to the same key, silently reducing effective capacity. Fixed in insert:
   when the ring is full, walk past stale slots (key absent from map)
   before evicting, up to one full lap. Del stays O(1).

3. **metadataOnly aliased TxInpoints slices with the caller.** A struct
   copy of TxInpoints leaves ParentTxHashes (exported) and voutIdxs
   sharing backing arrays. Concurrent readers can observe in-place
   mutations from a caller doing `append(ParentTxHashes, …)`. Deep-clone
   via Serialize + NewTxInpointsFromBytes — the only public path through
   subtree's API that produces fresh backing slices given voutIdxs is
   unexported. Cost: ~160 ns + 5 allocs on the SetCache(hash, *meta.Data)
   path; SetCacheFromBytes (Kafka) is unaffected because the deserialiser
   already returns a fresh TxInpoints.

4. **Redundant defensive code in improvedCacheBackend.Get.** Removed the
   `len(scratch) == 0` early return (NewMetaDataFromBytes errors on len<17
   via its own length check) and the verbose log+logger plumbing on
   unmarshal failure. Kept the evict + counter increment — a corrupt entry
   would otherwise be reparsed on every subsequent Get for the same key.

Added regression test (TestPointerCache_DelDoesNotLeakRingCapacity) that
pins bsv-blockchain#2: per-bucket ring size 1, Set → Del → Set on a colliding key MUST
not count an eviction.

All race tests pass with -tags testtxmetacache. Pointer-mode benchmark
numbers shift on the SetCache path (125 ns → 288 ns) due to the
TxInpoints deep-clone — correctness over headline speed. Get path is
unchanged at ~39 ns / 0 allocs.

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall

Pointer mode is well-designed, well-tested, and explicitly opt-in. Headline number (11× Get speedup, 0 GCs) is meaningful for a hot path. Approving with changes — one issue should be fixed before pointer mode is enabled in production.

Required before flipping pointer mode on

SetMultiSequential contract violation in PointerCache

stores/txmetacache/pointer_cache.go:295-301. The interface contract (cache_backend.go:30-35) says SetMultiSequential is "without goroutine fan-out" — the byte path's whole purpose is to skip errgroup spawning for partition-aligned receivers. The pointer implementation forwards to SetMultiFromBytes, which does fan out via errgroup per non-empty bucket.

Under pointer mode, the v2 Kafka receiver's "sequential" path will multiply goroutines: N partition workers × ~M buckets each. This defeats the design. Fix is to implement SetMultiSequential as a flat loop over keys/values (mirroring the byte path's small-batch branch) and skip the errgroup branch entirely.

Recommended

PR description is stale vs. the actual diff

Description claims appendHeightToValue, noOfBlocksToKeepInTxMetaCache, and hitOldTx are dropped. None are dropped:

  • appendHeightToValue is still called on every byte-mode insert (txmetacache.go:293, 315, 342, 365)
  • noOfBlocksToKeepInTxMetaCache is still computed and stored but never read (txmetacache.go:233, 246) — dead code
  • hitOldTx is still in metrics struct and reported via Prometheus (metrics.go:131, 146, 165)

Either update the description to reflect what was reverted, or remove the unused pieces as originally intended. The claim "Saves a 4-byte alloc + per-Set GetBlockHeight() call" is no longer accurate.

GetMeta shallow-copies into caller's struct, aliasing the cache in pointer mode

stores/txmetacache/txmetacache.go:428-432:

if cached, ok := t.cache.Get(*hash); ok {
    t.metrics.hits.Add(1)
    *data = *cached
    data.BlockIDs = make([]uint32, 0)

*data = *cached is a struct copy — data.TxInpoints.ParentTxHashes now aliases the cache entry's backing slice. In byte mode this is benign (the returned *meta.Data is freshly allocated). In pointer mode the caller's data shares slices with the cache.

The read-only contract is documented on GetMetaCached, but GetMeta takes a caller-provided *meta.Data — the conventional "mine to mutate" pattern. A caller doing data.TxInpoints.ParentTxHashes = append(...) would corrupt the cache for every concurrent reader. Either document the aliasing rule on GetMeta or deep-clone TxInpoints on the GetMeta path under pointer mode.

metadataOnly fallback path leaks aliased TxInpoints

stores/txmetacache/pointer_cache.go:162-190. When Serialize() or NewTxInpointsFromBytes fails, the function falls through to cloned.TxInpoints = data.TxInpoints — a struct copy sharing ParentTxHashes with the caller. The comment says this is "preferable to dropping the entry entirely," but if the cause is real corruption, holding an entry whose slice is externally mutable is worse than dropping. Consider returning nil and bumping a counter so the rate is observable.

pointerCacheAvgEntryBytes = 320 is workload-dependent

Acknowledged in the PR caveats. 320 is calibrated against 2-parent synthetic txs; production txs with >5-6 parents will overflow and the cache holds fewer entries than budgeted. The MB sizing parameter has a hidden parent-count multiplier. Suggestions:

  • Track actual entries-per-MB in UpdateStats and expose it
  • Or, after the first ~1k inserts, recompute an average entry size and resize the per-bucket rings

Low-priority

  • Ring walk under lock when entries are stale (pointer_cache.go:114-131): worst case after a SetMinedMulti burst is a linear walk of the ring under the bucket lock. Worth a metric (ring_walk_distance_p99 or similar) so the operator can see if this becomes a tail-latency contributor.
  • Pointer mode still pays appendHeightToValue cost even though the deserializer ignores trailing bytes. Skip the append when bucketType == Pointer.
  • Missing test for improvedCacheBackend.Get corruption path — the unmarshalErrors eviction is fail-safe code that should have a regression test.

Strengths

  • Compile-time interface assertion catches contract drift at build time
  • Defensive cloning in metadataOnlyTxInpoints round-trips through serialize/deserialize so the cache never aliases caller-owned backing slices. Exactly the right shape for a shared-pointer cache.
  • Per-shard FIFO eviction is simple and predictable; lock granularity matches the byte path
  • TestPointerCache_DelDoesNotLeakRingCapacity is exactly the right shape — pins a subtle invariant with a deterministic same-bucket collision setup
  • Benchmark coverage is more thorough than most PRs

icellan added 6 commits May 26, 2026 11:13
Introduces an experimental pointer-backed implementation of the txmeta cache,
selected via the new BucketType.Pointer enum. PointerCache stores *meta.Data
directly in sharded native maps with per-bucket FIFO key-ring eviction; reads
return the stored pointer with no copy or deserialisation.

Both backends (ImprovedCache and PointerCache) now satisfy a small internal
cacheBackend interface (Set/SetFromBytes/SetMultiFromBytes/Get/Del/UpdateStats).
TxMetaCache holds one interface field and dispatches uniformly — no per-call-site
if/else branching. The reader path in subtreevalidation switches to the simplified
GetMetaCached(ctx, hash) (*meta.Data, bool) signature; the old caller-managed
scratch-buffer variant is removed.

Also drops dead height-encoding code: appendHeightToValue, noOfBlocksToKeep-
InTxMetaCache, and hitOldTx were referenced in docs but never read by any
eviction path (eviction is the ring buffer's job).

Head-to-head benchmarks (Apple M3 Max, 64 MB cache, count=3 x 2s):

  Operation              Existing byte path   Pointer native   Delta
  Set                    233 ns / 1 alloc     52 ns / 0 allocs  4.5x faster
  Get                    426 ns / 8 allocs    37 ns / 0 allocs  11.5x faster
  Concurrent (32:1 R:W)  121 ns / 7 allocs    74 ns / 0 allocs  39% faster
  GC pause p99           ~0.06-0.19 ms        0 ms              0 GCs in loop

Pointer cache produces zero GCs during the benchmark window: the byte path's
per-Get deserialisation allocates 7 small objects, dominating allocation
traffic. The pointer cache's larger steady-state heap is offset by near-zero
allocation churn.

Default BucketType remains Unallocated; pointer mode is opt-in via the
Pointer enum value. No callsite or wire-protocol changes are required.
Three correctness/observability fixes from the Copilot review on bsv-blockchain#915, plus
the doc cleanup it flagged.

1. **PointerCache: full-hash map key (collision safety).** Switched the per-
   shard map and FIFO ring from uint64 (truncated hash) to chainhash.Hash
   (full 32 bytes). Bucket selection still uses xxhash.Sum64(key) % BucketsCount
   for speed; the map lookup itself uses the full hash, eliminating the
   2⁻⁶⁴ collision window that could silently return wrong metadata for a
   different txid. Memory cost ~24 bytes more per cell, Get latency 37→40 ns.

2. **PointerCache.Set: strip non-cached fields.** ImprovedCache stores
   MetaBytes(), which drops Tx, BlockIDs, BlockHeights, SubtreeIdxs,
   ConflictingChildren, SpendingDatas, LockTime, CreatedAt, etc. The pointer
   backend was retaining whatever the caller passed in, diverging in
   semantics and potentially pinning whole transactions in memory. Set now
   clones into a metadata-only snapshot (`metadataOnly`) matching MetaBytes's
   field set. Cost: 1 alloc per Set in the SetCache path (52→125 ns).
   SetCacheFromBytes (Kafka ingest) is unaffected — NewMetaDataFromBytes
   already produces a metadata-only struct.

3. **improvedCacheBackend.Get: evict + log on unmarshal failure.** A corrupt
   cached entry would otherwise be re-parsed on every subsequent Get for the
   same key. The backend now deletes the bad entry, increments an
   unmarshalErrors counter, and emits a warn log (logger is plumbed from
   NewTxMetaCache).

Plus the three smaller items: GetMetaCached docstring updated to match the
new (*meta.Data, bool) signature; "height encoding" subtest names/comments
replaced with their current behaviour ("bypasses cache", "round-trip"); and
the package doc no longer claims "height-based expiration" — eviction is the
backend's job in both modes.

Test added: TestPointerCache_SetStripsNonCachedFields pins the new
contract — Set with a fat *meta.Data must not retain Tx/BlockIDs/etc.
PR bsv-blockchain#912 (perf(txmeta): v2 wire format + partition-aware receiver) landed on
main while this PR was open and added new uses of appendHeightToValue plus
two new SetCacheMultiSequential* methods that call ImprovedCache's new
SetMultiSequential / SetMultiSequentialWithHashes. My earlier "dead code
removal" was wrong by the time it landed — height encoding is part of the
active contract, not legacy.

Changes:
- Restore appendHeightToValue method, hitOldTx metric (declaration + atomic
  counter + Prometheus gauge), and noOfBlocksToKeepInTxMetaCache field +
  init. Functionally the height bytes are still ignored on read in the byte
  backend, but they're now load-bearing for the v2 receiver in bsv-blockchain#912 and the
  metric is still wired up.
- Re-introduce the height suffix on SetCacheFromBytes and SetCacheMulti.
- Add SetMultiSequential and SetMultiSequentialWithHashes to the
  cacheBackend interface; improvedCacheBackend forwards to the underlying
  *ImprovedCache; PointerCache shares its implementation with
  SetMultiFromBytes (the per-bucket goroutine fan-out doesn't exist there,
  so the partition-aware variant is identical for free).
- Fix the empty-value SetCacheMulti subtest assertion: stored entry is
  again 4 bytes (the height suffix), not 0.
CI's golangci-lint flagged stores/txmetacache/txmetacache.go for gci
import-grouping violations after the rebase. Ran gci with the project's
default settings (stdlib + third-party, no per-prefix split) across the
files touched by this PR.

No behavioural change.
Four issues raised in the second-round review on PR bsv-blockchain#915:

1. **SetMultiFromBytes was a serial loop.** A 1000-tx Kafka batch was
   1000 sequential lock-acquire/insert cycles, while ImprovedCache.SetMulti
   fans out across shards. Restored fan-out: group keys by xxhash bucket,
   run one goroutine per non-empty bucket. Below smallSetMultiBatchThreshold
   the serial loop is still used to skip bucketing overhead on tiny batches.

2. **Del left stale entries in the FIFO ring.** Del removed the map entry
   but b.ring[head..] kept pointing at the deleted hash, and b.full stayed
   true. A Set→Del→Set cycle could fill the ring with duplicate references
   to the same key, silently reducing effective capacity. Fixed in insert:
   when the ring is full, walk past stale slots (key absent from map)
   before evicting, up to one full lap. Del stays O(1).

3. **metadataOnly aliased TxInpoints slices with the caller.** A struct
   copy of TxInpoints leaves ParentTxHashes (exported) and voutIdxs
   sharing backing arrays. Concurrent readers can observe in-place
   mutations from a caller doing `append(ParentTxHashes, …)`. Deep-clone
   via Serialize + NewTxInpointsFromBytes — the only public path through
   subtree's API that produces fresh backing slices given voutIdxs is
   unexported. Cost: ~160 ns + 5 allocs on the SetCache(hash, *meta.Data)
   path; SetCacheFromBytes (Kafka) is unaffected because the deserialiser
   already returns a fresh TxInpoints.

4. **Redundant defensive code in improvedCacheBackend.Get.** Removed the
   `len(scratch) == 0` early return (NewMetaDataFromBytes errors on len<17
   via its own length check) and the verbose log+logger plumbing on
   unmarshal failure. Kept the evict + counter increment — a corrupt entry
   would otherwise be reparsed on every subsequent Get for the same key.

Added regression test (TestPointerCache_DelDoesNotLeakRingCapacity) that
pins bsv-blockchain#2: per-bucket ring size 1, Set → Del → Set on a colliding key MUST
not count an eviction.

All race tests pass with -tags testtxmetacache. Pointer-mode benchmark
numbers shift on the SetCache path (125 ns → 288 ns) due to the
TxInpoints deep-clone — correctness over headline speed. Get path is
unchanged at ~39 ns / 0 allocs.
Three corrections from the second round of reviews on PR bsv-blockchain#915:

1. **SetMultiSequential must not fan out** (ordishs, required).
   The cacheBackend.SetMultiSequential contract says "no goroutine fan-out
   — caller already has parallelism." PointerCache.SetMultiSequential was
   forwarding to SetMultiFromBytes which fans out via errgroup. Under the
   v2 Kafka receiver this multiplies goroutines: N partition workers × M
   non-empty buckets each. Now a flat serial loop. SetMultiSequentialWith-
   Hashes mirrors it. Regression test pins the contract at a coarse level.

2. **GetMeta TxInpoints aliasing in pointer mode** (ordishs + oskarszoon).
   `*data = *cached` is a struct copy but ParentTxHashes still points at
   the cache's backing slice. In pointer mode the cache hands the same
   pointer to every concurrent reader, so a caller doing
   `append(data.TxInpoints.ParentTxHashes, …)` would corrupt every other
   reader. GetMeta now deep-clones TxInpoints under pointer mode via the
   new cloneTxInpoints helper. Byte mode is unaffected (its cached.Data is
   already freshly allocated). On clone failure GetMeta falls through to
   the store path rather than serving aliased data.

3. **metadataOnly fallback must not return aliased struct** (ordishs).
   On Serialize/Deserialize round-trip failure, the previous code returned
   a shallow-copy *meta.Data sharing ParentTxHashes with the caller —
   reintroducing the very aliasing that the clone exists to prevent. Now
   returns (nil, err); Set surfaces the error. Practically infallible for
   valid TxInpoints; on the rare pathological case, refusing to cache is
   better than caching corruptly.

cloneTxInpoints is extracted as a shared helper so the same Serialize +
NewTxInpointsFromBytes round-trip is used by metadataOnly (Set path) and
GetMeta (Get path with pointer-mode aliasing guard).

Regression tests:
- TestPointerCache_SetDoesNotAliasCallerSlice — mutates the caller's
  ParentTxHashes after Set and asserts the cached entry is unchanged.
- TestPointerCache_SetMultiSequentialDoesNotFanOut — exercises the
  sequential path with a 1024-entry batch and verifies every key is
  present after the synchronous call.

All race tests pass under -tags testtxmetacache. Pointer-mode Set is
unchanged (~288 ns including the deep-clone). GetMeta under pointer mode
gains one extra Serialize+Deserialize per hit (~150 ns) — paid only on
the non-hot-path Get (validator hot path uses GetMetaCached).
@icellan icellan force-pushed the perf/txmeta-pointer-cache branch from 1b08533 to 453f53f Compare May 26, 2026 09:25
@icellan icellan requested a review from oskarszoon May 26, 2026 09:42

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review at d5e8118. Most prior findings addressed, but the BlockIDs blocker is still unfixed.

Still outstanding — must fix:

setMinedInCacheParallel (txmetacache.go:735-765) wasn't touched by any of the fixup commits. Path is still: append blockID → SetCachecache.SetmetadataOnly() strips BlockIDs. Mined updates via this path don't persist in pointer mode. Two-option fix:

  • (A) Replace SetCache with cache.Del to force a fresh store fetch on next read.
  • (B) Skip the SetCache call when BlockIDs are already set, matching byte-mode semantics.

Resolved:

  • TxInpoints aliasing — metadataOnly and GetMeta deep-clone via cloneTxInpoints (Serialize + NewTxInpointsFromBytes). TestPointerCache_SetDoesNotAliasCallerSlice pins it. Race tests green.
  • SetMultiSequentialWithHashes ignored-hash divergence — documented as accepted (both backends produce identical end-state via re-derived bucket from key bytes).
  • appendHeightToValue retention — documented with rationale in code, accepted.
  • ordishs's SetMultiSequential fan-out — fixed, flat serial loop, test added.
  • ordishs's Del orphan ring slot — fixed, stale-slot walk in insert(), test added.
  • 64-bit hash collision class — keys switched to full chainhash.Hash.

Still worth noting (not blocking):

  • GC bench at 64 MB / 100K entries — 12× below prod load (256 MB / 800K). The headline GC pause numbers in the PR body aren't representative.

icellan and others added 2 commits May 26, 2026 12:10
setMinedInCacheParallel was reading the tx, appending the new blockID, then
calling SetCache. SetCache routes through the cacheBackend.Set adapter
which strips BlockIDs in both backends — ImprovedCache because MetaBytes's
wire format doesn't carry the field, PointerCache because metadataOnly
strips it to match. The append + SetCache dance was therefore a no-op:
mined updates via this path didn't persist.

Replace the SetCache call with an eviction. The next read on each hash
falls through to the underlying store, which has the up-to-date BlockIDs.
Matches the established pattern in SetMinedMulti above.

The function's `blockID` parameter is now unused — kept for API symmetry
with the byte path. The redundant Get + utxoStore.Get fallback is also
gone; we don't need to read the entry just to delete it from cache.

Regression test added: TestTxMetaCache_Coverage/SetMinedMultiParallel_EvictsCache
seeds the cache, calls SetMinedMultiParallel, and asserts the entry is
absent via GetMetaCached.
@icellan icellan requested a review from oskarszoon May 26, 2026 13:28
@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
69.2% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve at cb606b158a. BlockIDs blocker correctly fixed.

ae4d4c4386 goes further than my suggested Option A. The whole Get+append+SetCache dance in setMinedInCacheParallel is gone, replaced with a single t.Delete(ctx, hash) per hash. blockID param is now _ uint32 (API symmetry only).

End-to-end in pointer mode: Delete evicts stale entry → next GetMeta misses cache → falls through to utxoStore.GetMeta → returns updated BlockIDs from store → len(data.BlockIDs) != 0 so SetCache is skipped and the entry stays out of cache. Matches SetMinedMulti semantics. Bug resolved.

SetMinedMultiParallel_EvictsCache test pins the eviction contract.

The 5 main merges (742793902, 958f99088, bb7e4f049, d2e5e09aa, 50822d938) are in unrelated areas (orphan pool, opcodeInvert, p2p fee policy, testcontainers deps, dashboards) with zero interaction with txmetacache.

Worth flagging for future maintainers (not a blocker)

SetMinedMultiParallel is now a cache-only operation — it never calls utxoStore.SetMinedMulti. The "store written first, then evict cache" invariant is the caller's responsibility. There are zero production callers today (tests only). When a real caller lands, it must follow the same pattern SetMinedMulti uses: write to store first, then call this to evict. Worth a one-line godoc on the function to lock that in.

@icellan icellan merged commit 148efa6 into bsv-blockchain:main May 26, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants