
tx metacache performance improvements #820

Merged
gokutheengineer merged 2 commits into bsv-blockchain:main from gokutheengineer:gokhan/improve-txmetacache-performance
May 14, 2026

Conversation

@gokutheengineer

Collaborator

No description provided.

@github-actions

github-actions Bot commented May 6, 2026

Contributor

🤖 Claude Code Review

Status: Complete


Summary: This PR implements focused performance optimizations for tx metacache: buffer pooling, small-batch optimization, metrics aggregation, and a critical slice bounds fix. The changes are well-structured and maintain correctness.

Positive findings:

  • Buffer pooling (txmetacache.go:60-81): Reduces allocations in hot paths with proper size limits and cleanup
  • Small-batch optimization (improved_cache.go:92, 397-403, 460-466, 513-519): Avoids goroutine overhead for ≤32 operations
  • Metrics aggregation (metrics.go:55-169): Clean registry pattern for multi-instance metrics with proper locking
  • Critical fix (improved_cache.go:1574): Changed len(data) > 0 to len(data) >= ChunkSize, preventing undersized chunk allocation
  • New API GetMetaCachedWithBuffer: Enables buffer reuse across calls in hot loops (processTxMetaUsingCache.go:105, 124)
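The buffer-pooling finding above can be illustrated with a minimal sketch. It assumes a `sync.Pool` of byte slices with an upper bound on what gets returned to the pool. The names `bufPool`, `getBuf`, `putBuf` and the 64 KiB cap are hypothetical and are not taken from `txmetacache.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// maxPooledCap is a hypothetical limit: buffers that grew beyond it are
// dropped instead of pooled, so one large value cannot pin memory forever.
const maxPooledCap = 64 * 1024

// bufPool stores *[]byte rather than []byte so that Put does not allocate
// when the slice header is converted to an interface value.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// getBuf returns an empty buffer, reusing a pooled one when available.
func getBuf() *[]byte {
	return bufPool.Get().(*[]byte)
}

// putBuf resets the buffer length and returns it to the pool.
// It reports false when the buffer was too large and was dropped instead.
func putBuf(b *[]byte) bool {
	if cap(*b) > maxPooledCap {
		return false
	}
	*b = (*b)[:0]
	bufPool.Put(b)
	return true
}

func main() {
	b := getBuf()
	*b = append(*b, "tx metadata bytes"...)
	fmt.Println(len(*b), putBuf(b))
}
```

The size check is what "proper size limits" would mean in this sketch: without it, a single oversized entry could keep a large backing array alive in the pool.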

No critical issues found. The existing inline comment threads on lines 74 and 399 have already been addressed by previous reviews.

Comment thread stores/txmetacache/improved_cache.go
Comment thread stores/txmetacache/txmetacache.go
@github-actions

github-actions Bot commented May 6, 2026

Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-820 (313aff7)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.746µ 1.582µ ~ 0.700
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.48n 71.29n ~ 0.200
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.33n 72.33n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.16n 70.98n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 34.03n 34.29n ~ 0.500
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 57.99n 57.81n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 143.1n 143.0n ~ 1.000
MiningCandidate_Stringify_Short-4 224.6n 222.5n ~ 0.100
MiningCandidate_Stringify_Long-4 1.651µ 1.679µ ~ 0.100
MiningSolution_Stringify-4 849.4n 860.6n ~ 0.100
BlockInfo_MarshalJSON-4 1.755µ 1.782µ ~ 0.300
NewFromBytes-4 128.3n 129.0n ~ 0.200
Mine_EasyDifficulty-4 67.21µ 67.38µ ~ 0.700
Mine_WithAddress-4 7.453µ 7.302µ ~ 0.400
DirectSubtreeAdd/4_per_subtree-4 57.68n 61.70n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 31.42n 31.59n ~ 1.000
DirectSubtreeAdd/256_per_subtree-4 30.22n 30.19n ~ 0.400
DirectSubtreeAdd/1024_per_subtree-4 28.96n 29.04n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 28.53n 28.64n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 279.3n 279.2n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 271.1n 271.4n ~ 1.000
SubtreeProcessorAdd/256_per_subtree-4 272.2n 274.8n ~ 1.000
SubtreeProcessorAdd/1024_per_subtree-4 265.6n 264.6n ~ 0.400
SubtreeProcessorAdd/2048_per_subtree-4 262.8n 266.3n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 268.9n 271.0n ~ 0.200
SubtreeProcessorRotate/64_per_subtree-4 269.7n 270.7n ~ 0.400
SubtreeProcessorRotate/256_per_subtree-4 268.9n 270.7n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 269.3n 268.6n ~ 1.000
SubtreeNodeAddOnly/4_per_subtree-4 54.54n 54.05n ~ 0.400
SubtreeNodeAddOnly/64_per_subtree-4 34.14n 34.50n ~ 0.400
SubtreeNodeAddOnly/256_per_subtree-4 33.41n 33.34n ~ 0.700
SubtreeNodeAddOnly/1024_per_subtree-4 32.58n 32.69n ~ 0.200
SubtreeCreationOnly/4_per_subtree-4 113.2n 112.1n ~ 0.200
SubtreeCreationOnly/64_per_subtree-4 391.2n 391.9n ~ 1.000
SubtreeCreationOnly/256_per_subtree-4 1.318µ 1.320µ ~ 1.000
SubtreeCreationOnly/1024_per_subtree-4 4.371µ 4.335µ ~ 0.700
SubtreeCreationOnly/2048_per_subtree-4 7.924µ 8.087µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 268.4n 266.9n ~ 0.800
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 268.6n 270.8n ~ 0.700
ParallelGetAndSetIfNotExists/1k_nodes-4 795.8µ 583.2µ ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 1.579m 1.342m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 6.735m 6.618m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 13.38m 13.22m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 657.0µ 655.0µ ~ 0.700
SequentialGetAndSetIfNotExists/10k_nodes-4 2.772m 2.737m ~ 0.200
SequentialGetAndSetIfNotExists/50k_nodes-4 10.39m 10.37m ~ 0.700
SequentialGetAndSetIfNotExists/100k_nodes-4 19.79m 19.95m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 621.6µ 627.8µ ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.158m 4.141m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 16.50m 16.48m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 685.8µ 689.7µ ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 5.811m 5.838m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 37.72m 37.53m ~ 0.400
BlockAssembler_AddTx-4 0.02803n 0.02802n ~ 0.700
AddNode-4 11.54 10.98 ~ 0.700
AddNodeWithMap-4 10.74 10.90 ~ 1.000
DiskTxMap_SetIfNotExists-4 3.570µ 3.674µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.210µ 3.724µ ~ 0.100
DiskTxMap_ExistenceOnly-4 300.3n 299.3n ~ 1.000
Queue-4 152.2n 151.0n ~ 0.400
AtomicPointer-4 2.834n 2.842n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 664.5µ 684.8µ ~ 0.200
ReorgOptimizations/DedupFilterPipeline/New/10K-4 610.4µ 652.2µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 95.79µ 93.02µ ~ 0.700
ReorgOptimizations/AllMarkFalse/New/10K-4 49.91µ 49.51µ ~ 0.400
ReorgOptimizations/HashSlicePool/Old/10K-4 42.92µ 55.97µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 8.539µ 8.684µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/10K-4 3.616µ 3.793µ ~ 0.700
ReorgOptimizations/NodeFlags/New/10K-4 1.249µ 1.310µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 7.448m 7.849m ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/100K-4 7.386m 7.436m ~ 0.700
ReorgOptimizations/AllMarkFalse/Old/100K-4 893.8µ 881.3µ ~ 0.700
ReorgOptimizations/AllMarkFalse/New/100K-4 546.8µ 549.6µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 477.8µ 464.8µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/100K-4 193.5µ 203.1µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/100K-4 38.75µ 40.75µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 13.97µ 14.16µ ~ 0.100
TxMapSetIfNotExists-4 35.70n 35.67n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 29.94n 29.91n ~ 1.000
ChannelSendReceive-4 469.8n 446.9n ~ 0.100
CalcBlockWork-4 532.6n 547.0n ~ 1.000
CalculateWork-4 668.1n 671.7n ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 1.308µ 1.290µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 15.89µ 13.12µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_1000-4 123.0µ 121.9µ ~ 0.200
CatchupWithHeaderCache-4 104.2m 104.3m ~ 0.400
_prepareTxsPerLevel-4 406.7m 425.2m ~ 0.100
_prepareTxsPerLevelOrdered-4 3.693m 3.877m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 413.3m 426.9m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.709m 3.575m ~ 0.200
_BufferPoolAllocation/16KB-4 2.376µ 2.507µ ~ 0.400
_BufferPoolAllocation/32KB-4 5.363µ 4.999µ ~ 0.400
_BufferPoolAllocation/64KB-4 10.58µ 12.25µ ~ 0.100
_BufferPoolAllocation/128KB-4 21.65µ 24.59µ ~ 0.100
_BufferPoolAllocation/512KB-4 89.95µ 85.42µ ~ 0.700
_BufferPoolConcurrent/32KB-4 13.47µ 12.45µ ~ 0.400
_BufferPoolConcurrent/64KB-4 21.60µ 19.45µ ~ 0.100
_BufferPoolConcurrent/512KB-4 108.3µ 107.0µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/16KB-4 481.4µ 468.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 482.5µ 483.9µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/64KB-4 475.0µ 491.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 477.0µ 488.6µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/512KB-4 490.1µ 490.8µ ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/16KB-4 27.31m 27.78m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 27.25m 27.31m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/64KB-4 27.63m 27.08m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 27.18m 27.17m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4 27.17m 26.86m ~ 0.700
_PooledVsNonPooled/Pooled-4 642.1n 641.6n ~ 0.700
_PooledVsNonPooled/NonPooled-4 4.891µ 5.606µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 5.322µ 5.440µ ~ 0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4 7.249µ 7.072µ ~ 0.400
_MemoryFootprint/Alternative_64KB_32concurrent-4 6.952µ 6.728µ ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.468m 1.410m ~ 0.100
SubtreeSizes/10k_tx_16_per_subtree-4 337.3µ 326.4µ ~ 0.100
SubtreeSizes/10k_tx_64_per_subtree-4 82.37µ 78.95µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 20.99µ 20.05µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 10.282µ 9.844µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 5.078µ 4.898µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.553µ 2.455µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 80.07µ 77.31µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 20.30µ 19.70µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 5.114µ 4.875µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 401.6µ 398.7µ ~ 0.400
BlockSizeScaling/50k_tx_256_per_subtree-4 101.42µ 96.55µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 25.31µ 23.83µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 164.8µ 157.3µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 174.3µ 169.8µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 333.8µ 319.0µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 10.127µ 9.634µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 11.00µ 10.69µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 21.00µ 20.06µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.537µ 2.376µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.707µ 2.636µ ~ 0.400
SubtreeAllocations/large_subtrees_full_validation-4 5.245µ 5.081µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 326.7µ 323.9µ ~ 0.400
StoreBlock_Sequential/AboveCSVHeight-4 317.3µ 328.3µ ~ 0.100
GetUtxoHashes-4 257.2n 256.1n ~ 0.400
GetUtxoHashes_ManyOutputs-4 47.58µ 43.71µ ~ 0.100
_NewMetaDataFromBytes-4 239.9n 241.8n ~ 0.100
_Bytes-4 634.7n 635.6n ~ 1.000
_MetaBytes-4 575.3n 573.3n ~ 0.700

Threshold: >10% with p < 0.05 | Generated: 2026-05-13 12:57 UTC
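To make the threshold line concrete, here is a minimal sketch of how a single row might be classified. It assumes a row counts as a regression or improvement only when the relative change exceeds 10% and the p-value is below 0.05. The `classify` function and `verdict` type are hypothetical and are not the report generator's code.

```go
package main

import "fmt"

type verdict string

const (
	regression  verdict = "regression"
	improvement verdict = "improvement"
	unchanged   verdict = "unchanged"
)

// classify compares two timings in the same unit. A change is reported only
// if it is statistically significant (p < 0.05) and larger than 10%.
func classify(baseline, current, pValue float64) verdict {
	if baseline <= 0 || pValue >= 0.05 {
		return unchanged
	}
	change := (current - baseline) / baseline
	switch {
	case change > 0.10:
		return regression // slower by more than 10%
	case change < -0.10:
		return improvement // faster by more than 10%
	default:
		return unchanged
	}
}

func main() {
	// ParallelGetAndSetIfNotExists/1k_nodes-4: 795.8µ -> 583.2µ, p = 0.100
	fmt.Println(classify(795.8, 583.2, 0.100))
}
```

Under these assumptions the `ParallelGetAndSetIfNotExists/1k_nodes-4` row stays "unchanged" even though its mean is roughly 27% lower, because p = 0.100 does not meet the significance level. That is consistent with the summary reporting 0 improvements.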

@gokutheengineer gokutheengineer force-pushed the gokhan/improve-txmetacache-performance branch from feda242 to 8a0594b Compare May 11, 2026 12:52
@gokutheengineer gokutheengineer enabled auto-merge (squash) May 13, 2026 11:26
@gokutheengineer gokutheengineer disabled auto-merge May 13, 2026 11:26
@gokutheengineer gokutheengineer force-pushed the gokhan/improve-txmetacache-performance branch from 532bc21 to 2028ab0 Compare May 13, 2026 11:31
@sonarqubecloud


}

- found, err = p.cache.GetMetaCached(p.ctx, txHash, &txMeta)
+ cachedBytes, found, err = p.cache.GetMetaCachedWithBuffer(p.ctx, txHash, &txMeta, cachedBytes)

Contributor


Why pass the buffer in and also return it? If the variable is passed in, or passed in by reference, it stays on the stack instead of the heap.
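One possible reading of the pass-in-and-return shape is that it follows the same convention as Go's built-in `append`: the callee may need to grow the buffer, so it hands back the possibly reallocated slice for the caller to reuse on the next iteration. The sketch below shows that pattern with a hypothetical `lookupInto` helper over a plain map. It is not the implementation of `GetMetaCachedWithBuffer`.

```go
package main

import "fmt"

// lookupInto is a hypothetical stand-in for a buffer-reusing cache read.
// It copies the stored value into buf's backing array and returns the
// resulting slice, which may be a new allocation if buf was too small.
func lookupInto(store map[string][]byte, key string, buf []byte) ([]byte, bool) {
	val, ok := store[key]
	if !ok {
		return buf[:0], false // keep the capacity for the next call
	}
	return append(buf[:0], val...), true
}

func main() {
	store := map[string][]byte{
		"a": []byte("short"),
		"b": []byte("a much longer cached value"),
	}

	var buf []byte // grows as needed, then gets reused
	for _, k := range []string{"a", "b", "missing"} {
		var found bool
		buf, found = lookupInto(store, k, buf)
		fmt.Println(k, found, len(buf), cap(buf))
	}
}
```

In this sketch, passing only the slice value without returning it would lose any growth, since the caller's slice header would still point at the old, smaller backing array. Passing a `*[]byte` would be an alternative design that avoids the extra return value.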

@gokutheengineer gokutheengineer merged commit 3c30bc9 into bsv-blockchain:main May 14, 2026
25 checks passed

3 participants