Skip to content

perf(subtreeprocessor): optimize reorg with parallel bulk operations#526

Merged
freemans13 merged 32 commits into
bsv-blockchain:mainfrom
freemans13:stu/blockassembly-reorg-performance-optimisation
Jun 5, 2026
Merged

perf(subtreeprocessor): optimize reorg with parallel bulk operations#526
freemans13 merged 32 commits into
bsv-blockchain:mainfrom
freemans13:stu/blockassembly-reorg-performance-optimisation

Conversation

@freemans13

@freemans13 freemans13 commented Feb 24, 2026

Copy link
Copy Markdown
Collaborator

Summary

Profiling reorg operations at 100K transaction scale revealed three main bottlenecks: per-node lock acquisition in SplitTxInpointsMap (O(N) sequential lock/unlock cycles), sequential UTXO marking where mark(true) and mark(false) operate on disjoint hash sets but ran one after the other, and sequential subtree construction through addNode() calls.

This PR addresses each bottleneck:

  • Replace txmap.SyncedMap with swiss.Map + sync.Mutex per bucket in SplitTxInpointsMap, matching the proven pattern already used by SplitSwissMap, SplitSyncedParentMap, and the block persister UTXO maps. Contiguous []txInpointsBucket slice instead of map[uint16]* for cache-friendly access.

  • Add ParallelBulkSetIfNotExists that groups entries by bucket, then processes all non-empty buckets in parallel with a single lock acquisition per bucket. Reduces lock overhead from O(N) sequential to O(N/buckets) parallel. Wired into moveBackBlockBulkBuild and processRemainderTxHashes.

  • Introduce bulkBuildSubtrees for parallel subtree construction, replacing per-node addNode() calls (which each acquire a mutex and check IsComplete). Full subtrees are built concurrently via errgroup.

  • Run MarkTransactionsOnLongestChain(true) and MarkTransactionsOnLongestChain(false) concurrently via errgroup — they operate on disjoint hash sets.

  • Parallel subtree announcements: batch send to newSubtreeChan, then batch wait for responses, overlapping send/receive.

  • Remove hash_slice_pool.gosync.Pool overhead exceeded the allocation savings at reorg scale.

Benchmark Results

Reorg performance (mock UTXO store, no race detector):

Scale MoveBack MoveForward Total Alloc (MB) Throughput
1K 2ms 7ms 8ms 146 239K tx/sec
10K 7ms 28ms 35ms 294 565K tx/sec
50K 10ms 34ms 44ms 389 2.3M tx/sec
100K 13ms 55ms 68ms 443 2.9M tx/sec

Behavioral change (not purely perf)

Beyond the performance work above, this PR also introduces new invalidation-gating logic that does not exist on main. When the reorg is a single-block invalidation (len(moveBackBlocks) == 1 && len(moveForwardBlocks) == 0) and the moved-back block's header is marked Invalid, markNotOnLongestChain no longer unconditionally marks every passed tx as not-on-longest-chain. Instead, the new checkMarkNotOnLongestChain filters the set:

  • A tx whose only BlockID is the invalid block → marked not-on-longest-chain.
  • A tx still present in any of the last 1000 block headers → left untouched (still on longest chain).
  • Otherwise → resolved via blockchainClient.CheckBlockIsInCurrentChain, marking only those not on the current chain.

This is a consensus-adjacent change to observable behavior during block invalidations, not a performance optimization. Flagging it explicitly so it is not overlooked under the perf(...) title.

Covered by new tests: TestCheckMarkNotOnLongestChain, TestCheckMarkNotOnLongestChain_TxNotFound, and TestMarkNotOnLongestChain_InvalidGating.

Test plan

  • All existing subtreeprocessor tests pass with -race
  • New SplitTxInpointsMap unit tests (9 tests covering CRUD, concurrency, bucket distribution, bulk operations)
  • New reorg benchmarks at 1K–100K scale

🤖 Generated with Claude Code

…InpointsMap with parallel bulk operations

- Deleted hashSlicePool to streamline memory management.
- Introduced parallel bulk operations in SplitTxInpointsMap for improved performance during transaction handling.
- Added comprehensive tests for SplitTxInpointsMap functionalities, ensuring correctness under concurrent access.
- Removed outdated benchmark tests and added new benchmarks for reorg operations.
@freemans13 freemans13 self-assigned this Feb 24, 2026
@github-actions

github-actions Bot commented Feb 24, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


All previously reported issues have been verified as fixed. The PR demonstrates strong engineering discipline:

Performance Optimizations (verified):

  • SplitTxInpointsMap now uses swiss.Map + per-bucket mutex (matching existing patterns)
  • Parallel bulk operations reduce lock overhead from O(N) to O(N/buckets)
  • Concurrent mark(true)/mark(false) operations with disjoint hash sets
  • Parallel subtree construction with bounded concurrency (GOMAXPROCS)

Behavioral Change (well-tested):
The invalidation-gating logic introduces new consensus-adjacent behavior. When a single block is invalidated (len(moveBackBlocks)==1, len(moveForwardBlocks)==0, blockHeaderMeta.Invalid==true), transactions are filtered:

  • Only in invalid block → marked not-on-longest-chain
  • In recent 1000 blocks → left untouched
  • Otherwise → resolved via CheckBlockIsInCurrentChain

This is thoroughly covered by TestCheckMarkNotOnLongestChain, TestCheckMarkNotOnLongestChain_TxNotFound, and TestMarkNotOnLongestChain_InvalidGating.

Code Quality:

  • Consistent use of SafeSetLimit across all errgroup operations
  • Proper context propagation through all call chains
  • Comprehensive test coverage including edge cases and concurrency safety
  • Clear documentation of safety invariants (disjointness guarantees, etc.)

History:

  • ✅ Fixed: Context now properly passed through bulkBuildSubtrees (line 2169)
  • ✅ Fixed: foundInRecentBlock flag correctly skips RPC call (lines 3551-3559)

No new issues identified.

@freemans13 freemans13 changed the title =refactor(subtreeprocessor): remove hashSlicePool and enhance SplitTx… perf(subtreeprocessor): optimize reorg with parallel bulk operations Feb 24, 2026
…ocessing

- Updated the calculation of numWorkers in processRemainderTxHashes to cap the maximum number of workers at 16, enhancing resource management during parallel processing.
…and improve error handling

- Updated the moveBackBlock method to return fewer values, enhancing clarity and reducing complexity.
- Adjusted related benchmark and test cases to reflect the new method signature.
- Improved error handling for cases where a block is not provided, ensuring consistent error messaging.
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go
…eorg-performance-optimisation

# Conflicts:
#	services/blockassembly/subtreeprocessor/SubtreeProcessor.go
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR focuses on improving subtreeprocessor reorg throughput by reducing lock contention and introducing parallel/bulk-style processing paths, alongside adding benchmarks to quantify the impact.

Changes:

  • Reworks SplitTxInpointsMap to use swiss.Map with per-bucket mutexes and adds ParallelBulkSetIfNotExists.
  • Adds bulk/parallel subtree rebuild logic for reorg paths and runs mark-on-longest-chain operations concurrently.
  • Adds production-scale reorg/marking benchmarks and removes older allocation-focused benchmark/pooling code.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
services/blockassembly/subtreeprocessor/SubtreeProcessor.go Adds bulk subtree construction + concurrent marking; refactors reorg/moveBack remainder processing to use bulk paths.
services/blockassembly/subtreeprocessor/map.go Updates SplitTxInpointsMap implementation and introduces ParallelBulkSetIfNotExists.
services/blockassembly/subtreeprocessor/map_test.go Adds unit tests for the new SplitTxInpointsMap behavior and bulk insert correctness/concurrency.
services/blockassembly/subtreeprocessor/reorg_benchmark_test.go Adds large-scale reorg benchmarks and baseline test runner.
stores/utxo/aerospike/longest_chain_production_bench_test.go Adds benchmark comparing sequential vs concurrent marking on disjoint sets.
services/blockassembly/subtreeprocessor/reorg_alloc_benchmark_test.go Removes prior allocation-focused benchmark suite.
services/blockassembly/subtreeprocessor/hash_slice_pool.go Removes hash slice pooling implementation.
services/blockassembly/subtreeprocessor/SubtreeProcessor_test.go Updates tests to call moveBackBlock directly and uses a blockchain mock where now required.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go
Comment thread services/blockassembly/subtreeprocessor/reorg_benchmark_test.go Outdated
Comment thread services/blockassembly/subtreeprocessor/reorg_benchmark_test.go Outdated
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated
Comment thread services/blockassembly/subtreeprocessor/map.go
- Fix logic bug where continue only broke inner loop, causing
  CheckBlockIsInCurrentChain to always be called even when tx was
  found in recent blocks. Use flag variable instead.
- Thread ctx through bulkBuildSubtrees to propagate cancellation
  and tracing spans via errgroup.WithContext(ctx).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use stp.newSubtree() instead of subtreepkg.NewTreeByLeafCount() in
  bulkBuildSubtrees to respect mmap configuration
- Add SubtreeIndex bookkeeping for diskTxMap in bulkBuildSubtrees so
  removeTxFromSubtrees can do O(1) lookup instead of linear scan
- Reset currentSubtree when completed subtree moves to chainedSubtrees
  to prevent dual-reference corruption
- Add nil guard for previousCurrentSubtree in moveBackBlockBulkBuild
- Fix error wrapping that included potentially nil err variable
- Add precondition checks in ParallelBulkSetIfNotExists
- Fix benchmark goroutine leak with done channel cleanup
- Replace var _ = bt.NewTx() with var _ *bt.Tx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go
Comment thread services/blockassembly/subtreeprocessor/reorg_benchmark_test.go
Comment thread services/blockassembly/subtreeprocessor/reorg_benchmark_test.go
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go
Comment thread services/blockassembly/subtreeprocessor/map.go
- Add DeletedTxs and OnStorageComplete to finalizeBulkBuildSubtrees
  NewSubtreeRequest to prevent race conditions with tx inpoints and
  cleanup leaked deletedTxs entries
- Replace goroutine-per-subtree with batch send/wait pattern using
  buffered errChans to bound concurrency
- Use explicit per-iteration copy for hash pointer in
  checkMarkNotOnLongestChain for clarity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Mar 25, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-526 (8c9b7c7)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 132
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.615µ 1.610µ ~ 1.000
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.26n 71.39n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.40n 71.56n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.18n 71.49n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 33.44n 33.17n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 56.52n 55.63n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 132.7n 144.3n ~ 0.100
MiningCandidate_Stringify_Short-4 221.6n 221.2n ~ 1.000
MiningCandidate_Stringify_Long-4 1.630µ 1.702µ ~ 0.100
MiningSolution_Stringify-4 852.6n 854.9n ~ 0.700
BlockInfo_MarshalJSON-4 1.758µ 1.790µ ~ 0.200
NewFromBytes-4 128.9n 131.1n ~ 0.200
AddTxBatchColumnar_Validation-4 2.510µ 2.497µ ~ 0.300
OffsetValidationLoop-4 636.6n 640.6n ~ 1.000
Mine_EasyDifficulty-4 66.14µ 66.48µ ~ 0.400
Mine_WithAddress-4 7.413µ 7.461µ ~ 0.700
BlockAssembler_AddTx-4 0.02492n 0.02152n ~ 0.100
AddNode-4 10.970 9.797 ~ 0.100
AddNodeWithMap-4 11.500 9.950 ~ 0.100
DiskTxMap_SetIfNotExists-4 3.293µ 3.566µ ~ 0.100
DiskTxMap_SetIfNotExists_Parallel-4 3.307µ 3.397µ ~ 0.100
DiskTxMap_ExistenceOnly-4 292.9n 311.6n ~ 0.100
Queue-4 188.3n 188.5n ~ 1.000
AtomicPointer-4 4.915n 4.492n ~ 0.100
TxMapSetIfNotExists-4 52.59n 51.94n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 40.60n 39.92n ~ 0.100
ChannelSendReceive-4 603.0n 581.2n ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 57.36n 63.52n ~ 0.200
DirectSubtreeAdd/64_per_subtree-4 29.91n 29.56n ~ 0.400
DirectSubtreeAdd/256_per_subtree-4 27.92n 28.22n ~ 0.200
DirectSubtreeAdd/1024_per_subtree-4 26.72n 26.83n ~ 0.600
DirectSubtreeAdd/2048_per_subtree-4 26.44n 26.32n ~ 0.400
SubtreeProcessorAdd/4_per_subtree-4 300.1n 332.9n ~ 0.100
SubtreeProcessorAdd/64_per_subtree-4 294.5n 309.2n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 301.3n 298.6n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 293.8n 311.1n ~ 0.100
SubtreeProcessorAdd/2048_per_subtree-4 295.7n 298.3n ~ 0.700
SubtreeProcessorRotate/4_per_subtree-4 290.3n 313.9n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 302.0n 302.6n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 310.3n 305.6n ~ 1.000
SubtreeProcessorRotate/1024_per_subtree-4 374.6n 299.2n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 55.96n 54.94n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 36.47n 36.25n ~ 0.200
SubtreeNodeAddOnly/256_per_subtree-4 35.43n 35.21n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 34.84n 34.64n ~ 0.600
SubtreeCreationOnly/4_per_subtree-4 113.4n 115.8n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 371.5n 397.8n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.305µ 1.299µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.016µ 4.599µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 7.277µ 8.471µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 283.2n 303.2n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 337.8n 322.6n ~ 0.700
ParallelGetAndSetIfNotExists/1k_nodes-4 2.064m 12.948m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.434m 15.595m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 8.325m 17.866m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 11.42m 21.69m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 1.822m 12.189m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 6.080m 17.264m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 19.74m 25.67m ~ 0.700
SequentialGetAndSetIfNotExists/100k_nodes-4 28.31m 31.10m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.114m 11.105m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.920m 15.356m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 16.29m 19.76m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.916m 12.959m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 8.597m 17.435m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 49.91m 67.03m ~ 0.100
CalcBlockWork-4 475.2n 468.8n ~ 0.100
CalculateWork-4 640.1n 647.3n ~ 0.200
CheckOldBlockIDs/on-chain-prefetch/1000-4 68.60µ 68.61µ ~ 1.000
CheckOldBlockIDs/off-chain-prefetch/1000-4 51.92µ 52.59µ ~ 1.000
CheckOldBlockIDs/on-chain-prefetch/10000-4 464.0µ 461.0µ ~ 0.100
CheckOldBlockIDs/off-chain-prefetch/10000-4 347.9µ 349.2µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_10-4 1.358µ 1.339µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 12.94µ 12.91µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_1000-4 127.0µ 127.1µ ~ 1.000
CatchupWithHeaderCache-4 104.3m 104.6m ~ 0.700
_prepareTxsPerLevel-4 453.3m 459.7m ~ 1.000
_prepareTxsPerLevelOrdered-4 4.130m 4.237m ~ 0.400
_prepareTxsPerLevel_Comparison/Original-4 446.2m 464.1m ~ 0.200
_prepareTxsPerLevel_Comparison/Optimized-4 4.256m 4.336m ~ 0.400
_BufferPoolAllocation/16KB-4 4.507µ 4.005µ ~ 0.200
_BufferPoolAllocation/32KB-4 8.328µ 8.609µ ~ 1.000
_BufferPoolAllocation/64KB-4 16.76µ 21.77µ ~ 0.100
_BufferPoolAllocation/128KB-4 33.38µ 31.10µ ~ 0.200
_BufferPoolAllocation/512KB-4 107.1µ 120.2µ ~ 0.100
_BufferPoolConcurrent/32KB-4 19.46µ 19.64µ ~ 1.000
_BufferPoolConcurrent/64KB-4 32.86µ 30.83µ ~ 0.100
_BufferPoolConcurrent/512KB-4 152.6µ 154.3µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/16KB-4 686.6µ 678.8µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/32KB-4 701.4µ 662.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 690.8µ 674.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 714.0µ 656.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 649.8µ 604.9µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.69m 36.88m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.25m 36.47m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.19m 36.81m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4 37.20m 36.92m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.81m 36.48m ~ 0.700
_PooledVsNonPooled/Pooled-4 841.6n 676.9n ~ 0.100
_PooledVsNonPooled/NonPooled-4 8.328µ 7.827µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 7.509µ 7.691µ ~ 1.000
_MemoryFootprint/Proposed_32KB_32concurrent-4 11.025µ 9.489µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 10.418µ 9.089µ ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.270m 1.424m ~ 0.100
SubtreeSizes/10k_tx_16_per_subtree-4 303.2µ 307.7µ ~ 0.400
SubtreeSizes/10k_tx_64_per_subtree-4 71.71µ 72.87µ ~ 0.200
SubtreeSizes/10k_tx_256_per_subtree-4 17.74µ 18.22µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 8.807µ 8.952µ ~ 0.200
SubtreeSizes/10k_tx_1024_per_subtree-4 4.361µ 4.362µ ~ 0.700
SubtreeSizes/10k_tx_2k_per_subtree-4 2.224µ 2.186µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 70.87µ 69.99µ ~ 0.700
BlockSizeScaling/10k_tx_256_per_subtree-4 17.52µ 17.60µ ~ 1.000
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.310µ 4.327µ ~ 0.400
BlockSizeScaling/50k_tx_64_per_subtree-4 370.3µ 370.2µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 87.37µ 88.17µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.28µ 21.65µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 150.0µ 150.0µ ~ 0.700
SubtreeAllocations/small_subtrees_data_fetch-4 158.7µ 157.8µ ~ 1.000
SubtreeAllocations/small_subtrees_full_validation-4 309.7µ 310.8µ ~ 0.200
SubtreeAllocations/medium_subtrees_exists_check-4 8.806µ 8.944µ ~ 0.700
SubtreeAllocations/medium_subtrees_data_fetch-4 9.257µ 9.342µ ~ 0.200
SubtreeAllocations/medium_subtrees_full_validation-4 17.41µ 17.35µ ~ 1.000
SubtreeAllocations/large_subtrees_exists_check-4 2.074µ 2.078µ ~ 1.000
SubtreeAllocations/large_subtrees_data_fetch-4 2.225µ 2.239µ ~ 0.700
SubtreeAllocations/large_subtrees_full_validation-4 4.341µ 4.354µ ~ 1.000
StoreBlock_Sequential/BelowCSVHeight-4 338.0µ 333.3µ ~ 0.200
StoreBlock_Sequential/AboveCSVHeight-4 337.5µ 337.3µ ~ 1.000
GetUtxoHashes-4 277.0n 279.3n ~ 0.400
GetUtxoHashes_ManyOutputs-4 46.78µ 46.00µ ~ 0.100
_NewMetaDataFromBytes-4 213.1n 214.1n ~ 0.200
_Bytes-4 390.0n 391.2n ~ 0.400
_MetaBytes-4 138.7n 139.1n ~ 1.000

Threshold: >10% with p < 0.05 | Generated: 2026-06-05 16:34 UTC

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
return true
})
rawLosingTxHashes = append(rawLosingTxHashes, losingTxHashesMap.Keys()...)

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reorgBlocks, building rawLosingTxHashes via losingTxHashesMap.Keys() reintroduces a full-slice allocation/copy of all losing hashes. Since txmap.TxMap already supports Iter(...) (used previously), this can be avoided by iterating the map directly and building the deduped/filtered losing set without the intermediate []chainhash.Hash.

This is in the critical reorg hot path and appears to negate the intended “avoid Keys()” optimization described in the PR summary.

Suggested change
rawLosingTxHashes = append(rawLosingTxHashes, losingTxHashesMap.Keys()...)
losingTxHashesMap.Iter(func(hash chainhash.Hash, _ struct{}) bool {
rawLosingTxHashes = append(rawLosingTxHashes, hash)
return true
})

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — switched from losingTxHashesMap.Keys() to losingTxHashesMap.Iter() to avoid the intermediate slice allocation, consistent with how transactionMap.Iter() is used just above for the winning tx set.

Comment on lines +4070 to +4073
// Pre-allocate result arrays indexed by position
existedInTxMap := make([]bool, n) // true if SetIfExists found the key
existsInLosingMap := make([]bool, n) // true if in losingTxHashesMap
isRemoveMap := make([]bool, n) // true if in removeMap (to delete)

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processRemainderTxHashes switched from a packed []byte bitset to three separate []bool arrays (existedInTxMap, existsInLosingMap, isRemoveMap). For large subtrees this is a 3× memory increase (plus GC pressure) in a very hot reorg path.

If the previous bitset approach was correct, consider restoring it (or another packed representation) to keep per-node state to ~1 byte rather than ~3 bytes.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this and the 3x memory difference (3 bytes vs 1 byte per node for []bool vs packed bitset) is negligible relative to the node data itself (each subtreepkg.Node is ~72 bytes with hash + fee + size). At 100K nodes, it's ~300KB vs ~100KB for the flags, while the node data is ~7MB. The clarity benefit of separate named bools (existedInTxMap, existsInLosingMap, isRemoveMap) outweighs the marginal memory saving in this case. Happy to revisit if profiling shows this becoming a bottleneck at larger scales.

Comment on lines +754 to +760
// TestReorgBenchmarkBaseline runs the full reorg at multiple scales and prints a summary table.
// This is a test (not benchmark) for easy "go test -run" invocation with human-readable output.
func TestReorgBenchmarkBaseline(t *testing.T) {
if testing.Short() {
t.Skip("skipping baseline benchmark in short mode")
}

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestReorgBenchmarkBaseline is a regular test (not a benchmark) and will run in normal go test runs unless -short is enabled. Given it performs multi-scale reorg runs up to 100K transactions, this is likely to make CI/unit test runs very slow or flaky.

Consider moving this into a Benchmark... (so it only runs under -bench), gating it behind a build tag (e.g., //go:build longtest), or requiring an explicit env var to run.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test already has a testing.Short() guard that skips it in short mode (line 758). Our CI uses make test which passes -short, so this won't run in normal CI. For local go test without -short, the 100K scale takes ~15s total which is reasonable for a correctness baseline. I think the current gating is sufficient, but happy to add a build tag if it causes issues in practice.

- Replace losingTxHashesMap.Keys() with Iter() to avoid intermediate
  slice allocation in the reorg hot path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +3017 to +3024
h := hash // per-iteration copy for pointer safety
g.Go(func() error {
txMeta, err := stp.utxoStore.Get(gCtx, &h, fields.BlockIDs)
if err != nil {
return errors.NewProcessingError("[reorgBlocks] error getting transaction from utxo store", err)
}

txMetas[idx] = txMeta

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goroutine closure inside the for idx, hash := range markNotOnLongestChain loop captures the loop variable idx. Because range variables are reused, concurrent goroutines can write to the wrong index in txMetas, leading to nil entries or out-of-bounds behavior and incorrect marking decisions. Capture idx per iteration (e.g., idx := idx or pass it as a parameter to the closure) before assigning txMetas[idx] = txMeta.

Suggested change
h := hash // per-iteration copy for pointer safety
g.Go(func() error {
txMeta, err := stp.utxoStore.Get(gCtx, &h, fields.BlockIDs)
if err != nil {
return errors.NewProcessingError("[reorgBlocks] error getting transaction from utxo store", err)
}
txMetas[idx] = txMeta
h := hash // per-iteration copy for pointer safety
idxCopy := idx // per-iteration copy to avoid closure capturing loop variable
g.Go(func() error {
txMeta, err := stp.utxoStore.Get(gCtx, &h, fields.BlockIDs)
if err != nil {
return errors.NewProcessingError("[reorgBlocks] error getting transaction from utxo store", err)
}
txMetas[idxCopy] = txMeta

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer applies on current HEAD (Go 1.26 per go.mod). Since Go 1.22 loop variables are per-iteration scoped, so the closure capturing idx is race-free, and each goroutine writes a distinct txMetas[idx]. The code also keeps the defensive h := hash copy for the &h pointer. No shared-index write race exists.

Comment on lines +254 to +260
for i := 0; i < count; i++ {
if isFirst && i == 0 {
// Skip first slot for coinbase in first subtree
continue
}
err = subtree.AddSubtreeNode(subtreepkg.Node{
Hash: txHashes[idx+i],

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the first subtree case, the loop for i := 0; i < count; i++ { if isFirst && i == 0 { continue } ... txHashes[idx+i] } skips txHashes[idx] entirely, but idx is later advanced by subtree.Length()-1. This drops the first tx hash and then causes the last tx of the first subtree to be duplicated at the start of the next subtree, skewing the benchmark scenario (and potentially introducing duplicates). Adjust the indexing/count logic for the coinbase slot (e.g., iterate over tx hashes separately from leaf positions, or set count to min(remaining, subtreeSize-1) for the first subtree).

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit dc3e520. buildAndStoreSubtrees now tracks a separate consumed counter and indexes txHashes[idx+consumed], and advances idx by consumed — so txHashes[idx] is no longer dropped and no tx is duplicated across subtree boundaries.

Comment on lines +989 to +995
for i := 0; i < count; i++ {
if isFirst && i == 0 {
continue
}
err = subtree.AddSubtreeNode(subtreepkg.Node{
Hash: txHashes[idx+i],
Fee: 100,

Copilot AI Mar 25, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same off-by-one issue as buildAndStoreSubtrees: when isFirst && i == 0 is skipped, txHashes[idx] is never added but idx is only advanced by subtree.Length()-1, causing the last tx from the first subtree to be reused as the first tx in the next subtree. This makes the benchmark data contain unintended gaps/duplicates. Fix by aligning tx-hash indexing with the coinbase leaf handling (e.g., reduce count for the first subtree or increment a separate tx index when adding nodes).

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit dc3e520, same approach as buildAndStoreSubtrees: buildAndStoreSubtreesT uses a consumed counter for txHashes indexing and advances idx by consumed, eliminating the dropped-first-tx / duplicated-last-tx off-by-one.

freemans13 and others added 4 commits April 9, 2026 09:35
…erge

The merge from upstream/main changed the markNotOnLongestChain function
signature to include moveBackBlocks and moveForwardBlocks parameters.
The reset path passes nil for both since it only needs to unmark
assembly tx hashes directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ck test

TestHandleReorgWithInvalidBlock_Integration (added in bsv-blockchain#545) fails on this
branch and on main because the test setup does not run the BlockValidation
service. InvalidateBlock sets mined_set=false and emits BlockMinedUnset,
which BlockValidation normally consumes to re-set mined_set=true. With no
BlockValidation running, BA's reset() blocks in waitForBlockMinedSet until
its retry budget exhausts — past the test's 15s Eventually window.

Capture the hashes returned by InvalidateBlock and call SetBlockMinedSet
on each to simulate BlockValidation's async reaction.

This change duplicates the fix on main in bsv-blockchain#714 to unblock CI on this PR
before bsv-blockchain#714 is merged and rolled forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
freemans13 and others added 4 commits June 2, 2026 10:57
…d Idxs field

go-subtree v1.4.2 removed the public TxInpoints.Idxs [][]uint32 field in favour
of a packed layout; external callers must construct via NewTxInpointsFromPacked.
The reorg benchmark/test was still using the struct-literal Idxs field, failing
golangci-lint typecheck.
… bulk build

Review follow-ups on the parallel reorg optimization:

- Extract partitionLongestChainMarks so the concurrent
  MarkTransactionsOnLongestChain(true/false) calls operate on provably
  disjoint, de-duplicated sets and can never race on the same UTXO
  record. Overlapping hashes resolve to mark(false), preserving the
  prior sequential last-write-wins semantics.
- Bound bulkBuildSubtrees goroutine fan-out at GOMAXPROCS.
- Remove dead finalizeBulkBuildSubtrees (never called).
- Tests: partition disjointness/dedup (incl. heavy-overlap stress) and
  checkMarkNotOnLongestChain branch coverage + invalidation gating.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

[Critical] Race condition found at services/blockassembly/subtreeprocessor/SubtreeProcessor.go:3511

The loop variable idx is captured by reference in the goroutine closure. All goroutines will see the final value of idx after the loop completes, causing multiple goroutines to write to the same txMetas[idx] position and leaving other positions uninitialized.

Fix: Add idx := idx inside the loop body before g.Go() to capture the loop variable, just like the existing h := hash capture on line 3504.

@freemans13

Copy link
Copy Markdown
Collaborator Author

Re: the flagged [Critical] race at SubtreeProcessor.go:3511 (loop variable idx capture) — this is not a bug on the current code.

The module is on go 1.26.0 (see go.mod), where loop variables have per-iteration scope (the Go 1.22 loopvar change). Each g.Go closure captures its own idx, so every goroutine writes a distinct txMetas[idx] — there is no shared final-value aliasing, and no positions are left uninitialized. The suggested idx := idx capture is therefore a no-op under this toolchain.

Verified empirically: go test -race across the package (243 tests) is clean. The h := hash copy on the preceding line is retained only because we take &h. Resolving as not-an-issue.

@freemans13

Copy link
Copy Markdown
Collaborator Author

Follow-up commit 85c55fee3 — review hardening + tests

A fresh review pass surfaced one real correctness gap plus some cleanup. All addressed here.

Correctness — disjoint mark sets (concurrency safety)
The parallel MarkTransactionsOnLongestChain(true) / (false) calls relied on an assumed disjointness between the winning-tx set and the assembly/losing set — but filteredMarkOnLongestChain was only filtered against the losing set, not the assembly set. If a winning-block tx were also still in assembly, the two concurrent store writes could race on the same record with a nondeterministic winner. (The previous sequential code was immune because mark(false) ran last and deterministically won.)

Extracted partitionLongestChainMarks, which builds provably disjoint, de-duplicated true/false sets. Overlapping hashes resolve to mark(false), preserving the prior last-write-wins semantics.

Hardening / cleanup

  • Bounded bulkBuildSubtrees goroutine fan-out at GOMAXPROCS (small subtree sizes on a large reorg could otherwise spawn thousands of goroutines).
  • Removed dead finalizeBulkBuildSubtrees (zero call sites — see thread above).

Tests

  • partitionLongestChainMarks: 7 table cases (incl. winning-tx-also-in-assembly → false wins) + a 5k heavy-overlap disjointness stress test.
  • checkMarkNotOnLongestChain: all four decision branches (only-in-invalid-block, recent-header, on-current-chain, forked-chain) against a real sqlitememory store, plus the tx-not-found error path and invalidation gating.

Verification

  • go test -race -short ./services/blockassembly/subtreeprocessor/243 pass, race-clean
  • go vet + gofmt clean; stores/utxo ProcessConflicting -race clean

Benchmarks — no regression vs pre-fix (TestReorgBenchmarkBaseline, sqlitememory):

Scale Total Alloc
1K 11 ms 155 MB
10K 23 ms 168 MB
50K 37 ms 213 MB
100K 47 ms 263 MB

BenchmarkFullReorg 200k-tx reorg: 41.8 ms / 275 MB / 594k allocs. For contrast, the per-node addNode path this replaces costs 24.7 ms for 100k nodes alone (BenchmarkAddNodeSequential) — so the parallel bulk-build win is fully intact.

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First full review across consensus, Go, perf, and QA (my earlier note was just the rebase). The parallelization is consensus-faithful — reorg tx-set/ordering/conflicting-cascade/coinbase semantics are preserved vs the serial path, concurrent marks are race-safe by disjoint-set construction (partitionLongestChainMarks, stress-tested), and the full package passes -race. The perf win is real (100K-block reorg ~62ms, normal tx-ingress path untouched). On the deleted reverse-cascade dequeue: it's redundant, not a bug — stp.Remove routes those hashes into removeMap, which persists across the reorg and the dequeue drops them; but the comment at :3066-3068 still describes the deleted call and is now stale.

Two things before merge:

  • P1 — reorgBlocks can hang on cancellation (SubtreeProcessor.go:3360-3372). The newSubtreeChan send loop has no select/ctx; if the consumer is cancelled while the loop is blocked on a full buffer during a >1000-subtree reorg, the reorg goroutine blocks forever. Your other sends (:669/731/874) all use select { case chan <- …; case <-processorCtx.Done(): } — match that here.
  • P1 — reorg tests don't pin the correctness this PR claims. reset_reorg_test.go:1074/1247 assert only the chain-tip header; the tx-set checks are t.Logf("✅/❌") that never fail (a wrong bulkBuildSubtrees set passes green). Every Reorg() test uses empty Subtrees, so the new moveBackBlockBulkBuild deserialization path is never exercised by a correctness test, and there's no serial-vs-parallel equivalence test nor a double-spend-across-reorg case. Promote the t.Logf assertions to require, add a reorg through real subtrees, and a competing-spend-across-switch case.

Cleanup: int16(baseIdx+i+1) at :2263 overflows past 32767 subtrees (use int32); delete the now-dead addMoveBackBlockNodesToSubtrees/flatIndexToSubtreeNode/etc.; fix the stale :3066-3068 comment; revert winningTxSet/losingTxSet to map[…]struct{}. Details in the review notes.

…, cleanup)

Addresses oskarszoon's CHANGES_REQUESTED review on PR bsv-blockchain#526:

- P1 cancellation hang: the reorg newSubtreeChan batch send/wait loops had
  no ctx awareness; a full buffer during a >1000-subtree reorg could block
  the reorg goroutine forever after the consumer was cancelled. Both the
  send and the response-wait are now wrapped in ctx-aware selects (matching
  the :669/731/874 pattern), preserving the send/receive overlap.

- P1 test correctness: the Reorg tests asserted only via non-failing
  t.Logf("✅/❌") guarded by `if err == nil`, so a failing reorg (or a wrong
  bulkBuildSubtrees set) passed green. Promoted those three tests to
  require.NoError + require tx-set assertions (and fixed their headers so the
  reorg actually succeeds), and added TestSubtreeProcessor_ReorgThroughReal-
  Subtrees with two cases that exercise the moveBackBlockBulkBuild
  deserialization path through REAL serialized subtrees: tx recovery, and a
  competing-spend-across-switch (double-spend) case.

- Cleanup: reverted winningTxSet/losingTxSet to map[chainhash.Hash]struct{};
  deleted the now-dead addMoveBackBlockNodesToSubtrees / flatIndexToSubtreeNode
  (and their test); and fixed the stale reverseCascadedConflictingSet comment
  that described a deleted final dequeueDuringBlockMovement call.

The int16 SubtreeIndex cast is left unchanged: SubtreeIndex is int16 in the
external go-subtree TxInpoints type, so the cast matches the storage width and
the three pre-existing sibling sites; a true int32 widening requires an
upstream go-subtree change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@freemans13

Copy link
Copy Markdown
Collaborator Author

Thanks for the thorough review, @oskarszoon. Addressed in bfd429f:

P1 — cancellation hang (newSubtreeChan batch send): both the send loop and the response-wait loop are now wrapped in ctx-aware selects (matching the :669/731/874 pattern), returning a ProcessingError on ctx.Done(). Kept the two-loop batch structure so the send/receive overlap is preserved.

P1 — reorg tests don't pin correctness:

  • Promoted the masked if err == nil { … t.Logf("✅/❌") } checks in the three Reorg subtests to require.NoError + require tx-set assertions. While doing so I found those tests were silently passing via their else branch — the moved-forward block built on the wrong parent, so Reorg was actually erroring and the assertions never ran. Fixed the headers so the reorg succeeds and the assertions are real.
  • Added TestSubtreeProcessor_ReorgThroughRealSubtrees with two cases that exercise the moveBackBlockBulkBuild deserialization path through real serialized subtrees (not empty Subtrees): (a) moved-back txs are recovered into block assembly, and (b) a competing-spend-across-switch where the double-spent tx is mined in the moved-forward block and must not be left pending, while the back-only tx is recovered.

Cleanup:

  • winningTxSet/losingTxSet reverted to map[chainhash.Hash]struct{}.
  • Deleted the dead addMoveBackBlockNodesToSubtrees / flatIndexToSubtreeNode (and their TestFlatIndexToSubtreeNode_SkipsEmptySubtrees).
  • Fixed the stale reverseCascadedConflictingSet comment — it described a deleted final dequeueDuringBlockMovement; it now describes the actual mechanism (the set is returned as both conflictingSet and losingTxHashesMap, feeding processRemainderTxHashes and dequeueDuringBlockMovement).

On the int16(baseIdx+i+1) overflow: I left this as int16. The value is stored into subtreepkg.TxInpoints.SubtreeIndex, which is int16 in the external go-subtree dependency (inpoints.go:32) and serialized as 2 bytes by DiskTxMap; the cast also matches the three pre-existing sites (:2109 / :2198 / :5243), so this PR doesn't introduce a new overflow class. A genuine fix would require widening SubtreeIndex to int32 in go-subtree itself plus a coordinated dep bump — happy to file that separately if you'd like, but it felt out of scope for this perf PR (and >32767 subtrees in a single moved-back block isn't reachable at current subtree sizes).

Verification: full subtreeprocessor package green under -race; the four reorg tests pass -count=3 with no flakiness.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Comment on lines +286 to +292
if len(inpoints) != n {
panic("SplitTxInpointsMap.ParallelBulkSetIfNotExists: len(inpoints) must equal len(hashes)")
}
if wasSet == nil || len(wasSet) != n {
panic("SplitTxInpointsMap.ParallelBulkSetIfNotExists: len(wasSet) must equal len(hashes)")
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the upcoming commit: ParallelBulkSetIfNotExists now explicitly sets wasSet[idx] = false on the existing-key path, so the slice no longer carries stale true values if a caller reuses it. The sole caller allocates a fresh slice each call, but this makes the documented contract hold unconditionally.

Comment on lines +254 to +269
for i := 0; i < count; i++ {
if isFirst && i == 0 {
// Skip first slot for coinbase in first subtree
continue
}
err = subtree.AddSubtreeNode(subtreepkg.Node{
Hash: txHashes[idx+i],
Fee: 100,
SizeInBytes: 250,
})
require.NoError(b, err)

if subtree.IsComplete() {
break
}
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. buildAndStoreSubtrees now tracks a separate consumed counter and indexes txHashes[idx+consumed] instead of txHashes[idx+i], so the coinbase slot in the first subtree no longer drops txHashes[idx]. idx now advances by the number of tx hashes actually consumed.

Comment on lines +986 to +1000
for i := 0; i < count; i++ {
if isFirst && i == 0 {
continue
}
err = subtree.AddSubtreeNode(subtreepkg.Node{
Hash: txHashes[idx+i],
Fee: 100,
SizeInBytes: 250,
})
require.NoError(t, err)

if subtree.IsComplete() {
break
}
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with the same approach as buildAndStoreSubtrees: buildAndStoreSubtreesT now uses a consumed counter for txHashes indexing so no tx hash is skipped, keeping TestReorgBenchmarkBaseline operating on the configured tx count (verified passing).

Comment on lines +216 to +228
for i := 0; i < b.N; i++ {
start := time.Now()

err := store.MarkTransactionsOnLongestChain(ctx, markTrueHashes, true)
require.NoError(b, err)

err = store.MarkTransactionsOnLongestChain(ctx, markFalseHashes, false)
require.NoError(b, err)

elapsed := time.Since(start)
b.Logf(" Iter %d: %v (%.0f tx/sec total)", i+1, elapsed,
float64(tc.count*2)/elapsed.Seconds())
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed the per-iteration time.Now/time.Since and b.Logf from the timed b.N loop in the sequential benchmark; throughput is reported from b.Elapsed()/ReportMetric after StopTimer, which is the authoritative measurement.

Comment on lines +246 to +262
for i := 0; i < b.N; i++ {
start := time.Now()

g, gCtx := errgroup.WithContext(ctx)
g.Go(func() error {
return store.MarkTransactionsOnLongestChain(gCtx, markTrueHashes, true)
})
g.Go(func() error {
return store.MarkTransactionsOnLongestChain(gCtx, markFalseHashes, false)
})
err := g.Wait()
require.NoError(b, err)

elapsed := time.Since(start)
b.Logf(" Iter %d: %v (%.0f tx/sec total)", i+1, elapsed,
float64(tc.count*2)/elapsed.Seconds())
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the concurrent loop the same way (and also in the third b.N loop in BenchmarkMarkTransactionsOnLongestChain which had the identical pattern). The timed sections no longer include logging or time bookkeeping.

- ParallelBulkSetIfNotExists: explicitly set wasSet[idx]=false on existing
  keys so the result slice honors its contract under caller reuse
- reorg benchmark helpers: fix off-by-one that dropped txHashes[idx] in the
  first subtree (coinbase slot); track consumed tx count explicitly
- aerospike longest-chain bench: remove per-iteration time.Now/time.Since and
  b.Logf from the timed b.N loops so logging overhead is not measured

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Comment on lines +301 to +325
var wg sync.WaitGroup
for bIdx := uint16(0); bIdx < s.nrOfBuckets; bIdx++ {
indices := bucketIndices[bIdx]
if len(indices) == 0 {
continue
}
wg.Add(1)
go func(b *txInpointsBucket, indices []int) {
defer wg.Done()
b.mu.Lock()
for _, idx := range indices {
if !b.m.Has(hashes[idx]) {
b.m.Put(hashes[idx], inpoints[idx])
wasSet[idx] = true
} else {
// Explicitly record the existing-key result so the contract
// (wasSet[i] reflects the outcome for hashes[i]) holds even if
// the caller reuses a slice with stale true values.
wasSet[idx] = false
}
}
b.mu.Unlock()
}(&s.buckets[bIdx], indices)
}
wg.Wait()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed. ParallelBulkSetIfNotExists now uses a bounded pool of stride workers (numWorkers = min(GOMAXPROCS, nrOfBuckets)) instead of one goroutine per non-empty bucket, matching the existing SplitSwissMap.Clear / bulkBuildSubtrees pattern. Each worker strides over buckets, so a 100K-entry insert no longer spawns ~16K goroutines. Verified passing under -race.

Comment on lines 264 to +270
func (s *SplitTxInpointsMap) Clear() {
for _, syncedMap := range s.m {
syncedMap.Clear()
for i := uint16(0); i < s.nrOfBuckets; i++ {
b := &s.buckets[i]
b.mu.Lock()
b.m = swiss.NewMap[chainhash.Hash, *subtreepkg.TxInpoints](64)
b.mu.Unlock()
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. SplitTxInpointsMap.Clear now calls b.m.Clear() in place (swiss.Map.Clear zeroes the control/group arrays and retains capacity) instead of allocating a fresh swiss.NewMap(64), matching SplitSwissMap.Clear. This is on the currentTxMap double-buffer reuse path, so it avoids repeated allocation + rehashing on refill. Double-buffer and map tests pass under -race.

- ParallelBulkSetIfNotExists: bound fan-out with a stride-worker pool
  (min(GOMAXPROCS, nrOfBuckets)) instead of one goroutine per non-empty
  bucket, avoiding ~16K goroutines on large bulk inserts
- SplitTxInpointsMap.Clear: clear each bucket in place via swiss.Map.Clear
  to retain capacity on the currentTxMap double-buffer reuse path instead
  of reallocating a fresh map per Clear

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Comment on lines 195 to 199
func NewSplitTxInpointsMap(nrOfBuckets uint16) *SplitTxInpointsMap {
m := make(map[uint16]*txmap.SyncedMap[chainhash.Hash, *subtreepkg.TxInpoints], nrOfBuckets)
buckets := make([]txInpointsBucket, nrOfBuckets)
for i := uint16(0); i < nrOfBuckets; i++ {
m[i] = txmap.NewSyncedMap[chainhash.Hash, *subtreepkg.TxInpoints]()
buckets[i].m = swiss.NewMap[chainhash.Hash, *subtreepkg.TxInpoints](64)
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a false positive. Go's make accepts any integer type for the length argument (per the spec, the size arguments must be of integer type), so make([]txInpointsBucket, nrOfBuckets) with a uint16 compiles fine — and this is pre-existing code (introduced in #385, unchanged by this PR). The package builds and go vet passes. No change needed.

Comment on lines +299 to +303
// Phase 1: Group indices by bucket (O(N), no locks)
bucketIndices := make([][]int, s.nrOfBuckets)
for i := 0; i < n; i++ {
bucket := txmap.Bytes2Uint16Buckets(hashes[i], s.nrOfBuckets)
bucketIndices[bucket] = append(bucketIndices[bucket], i)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same false positive: make([][]int, s.nrOfBuckets) with a uint16 length is valid Go and compiles (the make size argument may be any integer type). This line is pre-existing in ParallelBulkSetIfNotExists and was not changed by this PR; the package builds and vets cleanly. No change needed.

Comment on lines +3512 to +3514
txMetas := make([]*meta.Data, len(markNotOnLongestChain))
g, gCtx := errgroup.WithContext(ctx)
g.SetLimit(max(stp.settings.UtxoStore.MaxMinedRoutines, stp.settings.UtxoStore.GetBatcherSize*2))

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. Strictly, errgroup.SetLimit(0) does not panic (n==0 creates a zero-capacity semaphore; only n<0 disables the limit) — the failure mode would be a g.Go deadlock, and only if both MaxMinedRoutines and GetBatcherSize were 0 (defaults are 128, so max(...) is always >= 1). To match the convention used by every other SetLimit call in this file and surface the misconfiguration loudly, I switched this to util.SafeSetLimit, which panics at setup if the computed limit is 0 instead of deadlocking later.

- checkMarkNotOnLongestChain: use util.SafeSetLimit for the errgroup
  concurrency cap (matching every other SetLimit call in the file) so a
  0 limit panics loudly at setup instead of risking a g.Go deadlock

The two reported make([]T, uint16) 'won't compile' comments are false
positives — make accepts any integer-type length and those lines are
pre-existing; no change made there.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated no new comments.

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review @ d6e16b6 — approving; both blocking asks resolved.

  • Cancellation hang fixed: both the send and response-wait loops in reorgBlocks (SubtreeProcessor.go:3358-3391) now have ctx-aware select/ctx.Done() guards on the same processorCtx as the canonical sends.
  • Reorg tests pin correctness: the soft t.Logf("✅/❌") checks are promoted to hard require, the silent else-branch error is fixed, and TestSubtreeProcessor_ReorgThroughRealSubtrees exercises the real moveBackBlockBulkBuild deserialization path including a competing-spend-across-switch (require.False(doubleSpend mined) / require.True(backOnly recovered)). Full package -race green, 246 tests.

The parallelization checks out: consensus-faithful (tx-set/ordering/cascade preserved, double-spend invariant enforced, merkle untouched), the concurrent longest-chain marks are provably disjoint at slice + UTXO-record level (partitionLongestChainMarks), and the perf win is real (~41ms for a 100K reorg vs ~25ms for the old per-node build alone). Copilot's ":3511 loopvar race" is a false positive (go 1.26 per-iteration scoping + explicit copy).

One non-blocking follow-up: the int16(...) casts at SubtreeProcessor.go:2198,2263 have no overflow guard. Not a consensus or corruption risk — SubtreeIndex is runtime-only (never serialized into any block/merkle/UTXO commitment) and the consumer re-validates with a linear-scan fallback, so a wrap only degrades O(1)→O(N) — and it's unreachable at the default 1M-item subtree size (~34B txs needed in one moved-back block). A cheap > math.MaxInt16 guard + warn (or safeconversion.IntToInt16) would harden a misconfigured tiny-subtree node, but it's not a blocker.

…y CI job

The subtreeprocessor-heavy benchmark-compare job timed out (>30m). Root cause
was test-only, not production code: the new BenchmarkAddNodeSequential matched
the job's unanchored `BenchmarkAddNode` filter, ignored `-short`, and called
runtime.GC() every b.N iteration. With tiny per-iteration timed work, b.N
ramped into the thousands, so thousands of full GCs ran (~4min on a 16-core
machine, far worse on the CI runner). The production map rewrite is unaffected
and is actually slightly faster on BenchmarkAddNode.

- Gate all 6 heavy reorg benchmarks behind testing.Short() (CI passes -short),
  so they skip in CI while remaining runnable locally.
- Remove the per-iteration runtime.GC() from the 5 throughput benchmarks; it
  inflated wall-clock without affecting B/op accuracy. Kept the GC in
  BenchmarkReorgMemoryProfile where it establishes the MemStats baseline.
- Anchor the subtreeprocessor-heavy bench_filter to the three intended
  benchmarks so future additions can't be swept in by substring matching.

Heavy filter under -short now completes in ~1m (was projected >25m); all six
benchmarks still run correctly without -short.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sonarqubecloud

sonarqubecloud Bot commented Jun 5, 2026

Copy link
Copy Markdown

@freemans13 freemans13 merged commit 3cb4027 into bsv-blockchain:main Jun 5, 2026
50 of 53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants