perf(subtreeprocessor): optimize reorg with parallel bulk operations by freemans13 · Pull Request #526 · bsv-blockchain/teranode

freemans13 · 2026-02-24T11:56:24Z

Summary

Profiling reorg operations at 100K transaction scale revealed three main bottlenecks: per-node lock acquisition in SplitTxInpointsMap (O(N) sequential lock/unlock cycles), sequential UTXO marking where mark(true) and mark(false) operate on disjoint hash sets but ran one after the other, and sequential subtree construction through addNode() calls.

This PR addresses each bottleneck:

Replace txmap.SyncedMap with swiss.Map + sync.Mutex per bucket in SplitTxInpointsMap, matching the proven pattern already used by SplitSwissMap, SplitSyncedParentMap, and the block persister UTXO maps. Contiguous []txInpointsBucket slice instead of map[uint16]* for cache-friendly access.
Add ParallelBulkSetIfNotExists that groups entries by bucket, then processes all non-empty buckets in parallel with a single lock acquisition per bucket. Reduces lock overhead from O(N) sequential to O(N/buckets) parallel. Wired into moveBackBlockBulkBuild and processRemainderTxHashes.
Introduce bulkBuildSubtrees for parallel subtree construction, replacing per-node addNode() calls (which each acquire a mutex and check IsComplete). Full subtrees are built concurrently via errgroup.
Run MarkTransactionsOnLongestChain(true) and MarkTransactionsOnLongestChain(false) concurrently via errgroup — they operate on disjoint hash sets.
Parallel subtree announcements: batch send to newSubtreeChan, then batch wait for responses, overlapping send/receive.
Remove hash_slice_pool.go — sync.Pool overhead exceeded the allocation savings at reorg scale.

Benchmark Results

Reorg performance (mock UTXO store, no race detector):

Scale	MoveBack	MoveForward	Total	Alloc (MB)	Throughput
1K	2ms	7ms	8ms	146	239K tx/sec
10K	7ms	28ms	35ms	294	565K tx/sec
50K	10ms	34ms	44ms	389	2.3M tx/sec
100K	13ms	55ms	68ms	443	2.9M tx/sec

Behavioral change (not purely perf)

Beyond the performance work above, this PR also introduces new invalidation-gating logic that does not exist on main. When the reorg is a single-block invalidation (len(moveBackBlocks) == 1 && len(moveForwardBlocks) == 0) and the moved-back block's header is marked Invalid, markNotOnLongestChain no longer unconditionally marks every passed tx as not-on-longest-chain. Instead, the new checkMarkNotOnLongestChain filters the set:

A tx whose only BlockID is the invalid block → marked not-on-longest-chain.
A tx still present in any of the last 1000 block headers → left untouched (still on longest chain).
Otherwise → resolved via blockchainClient.CheckBlockIsInCurrentChain, marking only those not on the current chain.

This is a consensus-adjacent change to observable behavior during block invalidations, not a performance optimization. Flagging it explicitly so it is not overlooked under the perf(...) title.

Covered by new tests: TestCheckMarkNotOnLongestChain, TestCheckMarkNotOnLongestChain_TxNotFound, and TestMarkNotOnLongestChain_InvalidGating.

Test plan

All existing subtreeprocessor tests pass with -race
New SplitTxInpointsMap unit tests (9 tests covering CRUD, concurrency, bucket distribution, bulk operations)
New reorg benchmarks at 1K–100K scale

🤖 Generated with Claude Code

…InpointsMap with parallel bulk operations - Deleted hashSlicePool to streamline memory management. - Introduced parallel bulk operations in SplitTxInpointsMap for improved performance during transaction handling. - Added comprehensive tests for SplitTxInpointsMap functionalities, ensuring correctness under concurrent access. - Removed outdated benchmark tests and added new benchmarks for reorg operations.

github-actions · 2026-02-24T11:57:29Z

🤖 Claude Code Review

Status: Complete

All previously reported issues have been verified as fixed. The PR demonstrates strong engineering discipline:

Performance Optimizations (verified):

SplitTxInpointsMap now uses swiss.Map + per-bucket mutex (matching existing patterns)
Parallel bulk operations reduce lock overhead from O(N) to O(N/buckets)
Concurrent mark(true)/mark(false) operations with disjoint hash sets
Parallel subtree construction with bounded concurrency (GOMAXPROCS)

Behavioral Change (well-tested):
The invalidation-gating logic introduces new consensus-adjacent behavior. When a single block is invalidated (len(moveBackBlocks)==1, len(moveForwardBlocks)==0, blockHeaderMeta.Invalid==true), transactions are filtered:

Only in invalid block → marked not-on-longest-chain
In recent 1000 blocks → left untouched
Otherwise → resolved via CheckBlockIsInCurrentChain

This is thoroughly covered by TestCheckMarkNotOnLongestChain, TestCheckMarkNotOnLongestChain_TxNotFound, and TestMarkNotOnLongestChain_InvalidGating.

Code Quality:

Consistent use of SafeSetLimit across all errgroup operations
Proper context propagation through all call chains
Comprehensive test coverage including edge cases and concurrency safety
Clear documentation of safety invariants (disjointness guarantees, etc.)

History:

✅ Fixed: Context now properly passed through bulkBuildSubtrees (line 2169)
✅ Fixed: foundInRecentBlock flag correctly skips RPC call (lines 3551-3559)

No new issues identified.

…ocessing - Updated the calculation of numWorkers in processRemainderTxHashes to cap the maximum number of workers at 16, enhancing resource management during parallel processing.

…and improve error handling - Updated the moveBackBlock method to return fewer values, enhancing clarity and reducing complexity. - Adjusted related benchmark and test cases to reflect the new method signature. - Improved error handling for cases where a block is not provided, ensuring consistent error messaging.

…eorg-performance-optimisation # Conflicts: # services/blockassembly/subtreeprocessor/SubtreeProcessor.go

Copilot

Pull request overview

This PR focuses on improving subtreeprocessor reorg throughput by reducing lock contention and introducing parallel/bulk-style processing paths, alongside adding benchmarks to quantify the impact.

Changes:

Reworks SplitTxInpointsMap to use swiss.Map with per-bucket mutexes and adds ParallelBulkSetIfNotExists.
Adds bulk/parallel subtree rebuild logic for reorg paths and runs mark-on-longest-chain operations concurrently.
Adds production-scale reorg/marking benchmarks and removes older allocation-focused benchmark/pooling code.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
`services/blockassembly/subtreeprocessor/SubtreeProcessor.go`	Adds bulk subtree construction + concurrent marking; refactors reorg/moveBack remainder processing to use bulk paths.
`services/blockassembly/subtreeprocessor/map.go`	Updates `SplitTxInpointsMap` implementation and introduces `ParallelBulkSetIfNotExists`.
`services/blockassembly/subtreeprocessor/map_test.go`	Adds unit tests for the new `SplitTxInpointsMap` behavior and bulk insert correctness/concurrency.
`services/blockassembly/subtreeprocessor/reorg_benchmark_test.go`	Adds large-scale reorg benchmarks and baseline test runner.
`stores/utxo/aerospike/longest_chain_production_bench_test.go`	Adds benchmark comparing sequential vs concurrent marking on disjoint sets.
`services/blockassembly/subtreeprocessor/reorg_alloc_benchmark_test.go`	Removes prior allocation-focused benchmark suite.
`services/blockassembly/subtreeprocessor/hash_slice_pool.go`	Removes hash slice pooling implementation.
`services/blockassembly/subtreeprocessor/SubtreeProcessor_test.go`	Updates tests to call `moveBackBlock` directly and uses a blockchain mock where now required.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Fix logic bug where continue only broke inner loop, causing CheckBlockIsInCurrentChain to always be called even when tx was found in recent blocks. Use flag variable instead. - Thread ctx through bulkBuildSubtrees to propagate cancellation and tracing spans via errgroup.WithContext(ctx). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use stp.newSubtree() instead of subtreepkg.NewTreeByLeafCount() in bulkBuildSubtrees to respect mmap configuration - Add SubtreeIndex bookkeeping for diskTxMap in bulkBuildSubtrees so removeTxFromSubtrees can do O(1) lookup instead of linear scan - Reset currentSubtree when completed subtree moves to chainedSubtrees to prevent dual-reference corruption - Add nil guard for previousCurrentSubtree in moveBackBlockBulkBuild - Fix error wrapping that included potentially nil err variable - Add precondition checks in ParallelBulkSetIfNotExists - Fix benchmark goroutine leak with done channel cleanup - Replace var _ = bt.NewTx() with var _ *bt.Tx Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Add DeletedTxs and OnStorageComplete to finalizeBulkBuildSubtrees NewSubtreeRequest to prevent race conditions with tx inpoints and cleanup leaked deletedTxs entries - Replace goroutine-per-subtree with batch send/wait pattern using buffered errChans to bound concurrency - Use explicit per-iteration copy for hash pointer in checkMarkNotOnLongestChain for clarity Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-25T16:07:30Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-526 (8c9b7c7)

Summary

Regressions: 0
Improvements: 0
Unchanged: 132
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.615µ	1.610µ	~	1.000
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	71.26n	71.39n	~	0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	71.40n	71.56n	~	1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	71.18n	71.49n	~	0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	33.44n	33.17n	~	1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	56.52n	55.63n	~	0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	132.7n	144.3n	~	0.100
MiningCandidate_Stringify_Short-4	221.6n	221.2n	~	1.000
MiningCandidate_Stringify_Long-4	1.630µ	1.702µ	~	0.100
MiningSolution_Stringify-4	852.6n	854.9n	~	0.700
BlockInfo_MarshalJSON-4	1.758µ	1.790µ	~	0.200
NewFromBytes-4	128.9n	131.1n	~	0.200
AddTxBatchColumnar_Validation-4	2.510µ	2.497µ	~	0.300
OffsetValidationLoop-4	636.6n	640.6n	~	1.000
Mine_EasyDifficulty-4	66.14µ	66.48µ	~	0.400
Mine_WithAddress-4	7.413µ	7.461µ	~	0.700
BlockAssembler_AddTx-4	0.02492n	0.02152n	~	0.100
AddNode-4	10.970	9.797	~	0.100
AddNodeWithMap-4	11.500	9.950	~	0.100
DiskTxMap_SetIfNotExists-4	3.293µ	3.566µ	~	0.100
DiskTxMap_SetIfNotExists_Parallel-4	3.307µ	3.397µ	~	0.100
DiskTxMap_ExistenceOnly-4	292.9n	311.6n	~	0.100
Queue-4	188.3n	188.5n	~	1.000
AtomicPointer-4	4.915n	4.492n	~	0.100
TxMapSetIfNotExists-4	52.59n	51.94n	~	0.100
TxMapSetIfNotExistsDuplicate-4	40.60n	39.92n	~	0.100
ChannelSendReceive-4	603.0n	581.2n	~	0.100
DirectSubtreeAdd/4_per_subtree-4	57.36n	63.52n	~	0.200
DirectSubtreeAdd/64_per_subtree-4	29.91n	29.56n	~	0.400
DirectSubtreeAdd/256_per_subtree-4	27.92n	28.22n	~	0.200
DirectSubtreeAdd/1024_per_subtree-4	26.72n	26.83n	~	0.600
DirectSubtreeAdd/2048_per_subtree-4	26.44n	26.32n	~	0.400
SubtreeProcessorAdd/4_per_subtree-4	300.1n	332.9n	~	0.100
SubtreeProcessorAdd/64_per_subtree-4	294.5n	309.2n	~	0.700
SubtreeProcessorAdd/256_per_subtree-4	301.3n	298.6n	~	0.700
SubtreeProcessorAdd/1024_per_subtree-4	293.8n	311.1n	~	0.100
SubtreeProcessorAdd/2048_per_subtree-4	295.7n	298.3n	~	0.700
SubtreeProcessorRotate/4_per_subtree-4	290.3n	313.9n	~	0.100
SubtreeProcessorRotate/64_per_subtree-4	302.0n	302.6n	~	0.700
SubtreeProcessorRotate/256_per_subtree-4	310.3n	305.6n	~	1.000
SubtreeProcessorRotate/1024_per_subtree-4	374.6n	299.2n	~	0.100
SubtreeNodeAddOnly/4_per_subtree-4	55.96n	54.94n	~	0.100
SubtreeNodeAddOnly/64_per_subtree-4	36.47n	36.25n	~	0.200
SubtreeNodeAddOnly/256_per_subtree-4	35.43n	35.21n	~	0.100
SubtreeNodeAddOnly/1024_per_subtree-4	34.84n	34.64n	~	0.600
SubtreeCreationOnly/4_per_subtree-4	113.4n	115.8n	~	0.100
SubtreeCreationOnly/64_per_subtree-4	371.5n	397.8n	~	0.100
SubtreeCreationOnly/256_per_subtree-4	1.305µ	1.299µ	~	0.100
SubtreeCreationOnly/1024_per_subtree-4	4.016µ	4.599µ	~	0.100
SubtreeCreationOnly/2048_per_subtree-4	7.277µ	8.471µ	~	0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	283.2n	303.2n	~	0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	337.8n	322.6n	~	0.700
ParallelGetAndSetIfNotExists/1k_nodes-4	2.064m	12.948m	~	0.100
ParallelGetAndSetIfNotExists/10k_nodes-4	5.434m	15.595m	~	0.100
ParallelGetAndSetIfNotExists/50k_nodes-4	8.325m	17.866m	~	0.100
ParallelGetAndSetIfNotExists/100k_nodes-4	11.42m	21.69m	~	0.100
SequentialGetAndSetIfNotExists/1k_nodes-4	1.822m	12.189m	~	0.100
SequentialGetAndSetIfNotExists/10k_nodes-4	6.080m	17.264m	~	0.100
SequentialGetAndSetIfNotExists/50k_nodes-4	19.74m	25.67m	~	0.700
SequentialGetAndSetIfNotExists/100k_nodes-4	28.31m	31.10m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	2.114m	11.105m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	8.920m	15.356m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	16.29m	19.76m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	1.916m	12.959m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	8.597m	17.435m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	49.91m	67.03m	~	0.100
CalcBlockWork-4	475.2n	468.8n	~	0.100
CalculateWork-4	640.1n	647.3n	~	0.200
CheckOldBlockIDs/on-chain-prefetch/1000-4	68.60µ	68.61µ	~	1.000
CheckOldBlockIDs/off-chain-prefetch/1000-4	51.92µ	52.59µ	~	1.000
CheckOldBlockIDs/on-chain-prefetch/10000-4	464.0µ	461.0µ	~	0.100
CheckOldBlockIDs/off-chain-prefetch/10000-4	347.9µ	349.2µ	~	1.000
BuildBlockLocatorString_Helpers/Size_10-4	1.358µ	1.339µ	~	0.100
BuildBlockLocatorString_Helpers/Size_100-4	12.94µ	12.91µ	~	0.200
BuildBlockLocatorString_Helpers/Size_1000-4	127.0µ	127.1µ	~	1.000
CatchupWithHeaderCache-4	104.3m	104.6m	~	0.700
_prepareTxsPerLevel-4	453.3m	459.7m	~	1.000
_prepareTxsPerLevelOrdered-4	4.130m	4.237m	~	0.400
_prepareTxsPerLevel_Comparison/Original-4	446.2m	464.1m	~	0.200
_prepareTxsPerLevel_Comparison/Optimized-4	4.256m	4.336m	~	0.400
_BufferPoolAllocation/16KB-4	4.507µ	4.005µ	~	0.200
_BufferPoolAllocation/32KB-4	8.328µ	8.609µ	~	1.000
_BufferPoolAllocation/64KB-4	16.76µ	21.77µ	~	0.100
_BufferPoolAllocation/128KB-4	33.38µ	31.10µ	~	0.200
_BufferPoolAllocation/512KB-4	107.1µ	120.2µ	~	0.100
_BufferPoolConcurrent/32KB-4	19.46µ	19.64µ	~	1.000
_BufferPoolConcurrent/64KB-4	32.86µ	30.83µ	~	0.100
_BufferPoolConcurrent/512KB-4	152.6µ	154.3µ	~	1.000
_SubtreeDeserializationWithBufferSizes/16KB-4	686.6µ	678.8µ	~	0.400
_SubtreeDeserializationWithBufferSizes/32KB-4	701.4µ	662.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/64KB-4	690.8µ	674.5µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	714.0µ	656.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/512KB-4	649.8µ	604.9µ	~	0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4	37.69m	36.88m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4	37.25m	36.47m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4	37.19m	36.81m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4	37.20m	36.92m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/512KB-4	36.81m	36.48m	~	0.700
_PooledVsNonPooled/Pooled-4	841.6n	676.9n	~	0.100
_PooledVsNonPooled/NonPooled-4	8.328µ	7.827µ	~	0.100
_MemoryFootprint/Current_512KB_32concurrent-4	7.509µ	7.691µ	~	1.000
_MemoryFootprint/Proposed_32KB_32concurrent-4	11.025µ	9.489µ	~	0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4	10.418µ	9.089µ	~	0.100
SubtreeSizes/10k_tx_4_per_subtree-4	1.270m	1.424m	~	0.100
SubtreeSizes/10k_tx_16_per_subtree-4	303.2µ	307.7µ	~	0.400
SubtreeSizes/10k_tx_64_per_subtree-4	71.71µ	72.87µ	~	0.200
SubtreeSizes/10k_tx_256_per_subtree-4	17.74µ	18.22µ	~	0.100
SubtreeSizes/10k_tx_512_per_subtree-4	8.807µ	8.952µ	~	0.200
SubtreeSizes/10k_tx_1024_per_subtree-4	4.361µ	4.362µ	~	0.700
SubtreeSizes/10k_tx_2k_per_subtree-4	2.224µ	2.186µ	~	1.000
BlockSizeScaling/10k_tx_64_per_subtree-4	70.87µ	69.99µ	~	0.700
BlockSizeScaling/10k_tx_256_per_subtree-4	17.52µ	17.60µ	~	1.000
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.310µ	4.327µ	~	0.400
BlockSizeScaling/50k_tx_64_per_subtree-4	370.3µ	370.2µ	~	1.000
BlockSizeScaling/50k_tx_256_per_subtree-4	87.37µ	88.17µ	~	0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4	21.28µ	21.65µ	~	0.400
SubtreeAllocations/small_subtrees_exists_check-4	150.0µ	150.0µ	~	0.700
SubtreeAllocations/small_subtrees_data_fetch-4	158.7µ	157.8µ	~	1.000
SubtreeAllocations/small_subtrees_full_validation-4	309.7µ	310.8µ	~	0.200
SubtreeAllocations/medium_subtrees_exists_check-4	8.806µ	8.944µ	~	0.700
SubtreeAllocations/medium_subtrees_data_fetch-4	9.257µ	9.342µ	~	0.200
SubtreeAllocations/medium_subtrees_full_validation-4	17.41µ	17.35µ	~	1.000
SubtreeAllocations/large_subtrees_exists_check-4	2.074µ	2.078µ	~	1.000
SubtreeAllocations/large_subtrees_data_fetch-4	2.225µ	2.239µ	~	0.700
SubtreeAllocations/large_subtrees_full_validation-4	4.341µ	4.354µ	~	1.000
StoreBlock_Sequential/BelowCSVHeight-4	338.0µ	333.3µ	~	0.200
StoreBlock_Sequential/AboveCSVHeight-4	337.5µ	337.3µ	~	1.000
GetUtxoHashes-4	277.0n	279.3n	~	0.400
GetUtxoHashes_ManyOutputs-4	46.78µ	46.00µ	~	0.100
_NewMetaDataFromBytes-4	213.1n	214.1n	~	0.200
_Bytes-4	390.0n	391.2n	~	0.400
_MetaBytes-4	138.7n	139.1n	~	1.000

Threshold: >10% with p < 0.05 | Generated: 2026-06-05 16:34 UTC

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-25T16:10:13Z

-				}
-				return true
-			})
+			rawLosingTxHashes = append(rawLosingTxHashes, losingTxHashesMap.Keys()...)


In reorgBlocks, building rawLosingTxHashes via losingTxHashesMap.Keys() reintroduces a full-slice allocation/copy of all losing hashes. Since txmap.TxMap already supports Iter(...) (used previously), this can be avoided by iterating the map directly and building the deduped/filtered losing set without the intermediate []chainhash.Hash.

This is in the critical reorg hot path and appears to negate the intended “avoid Keys()” optimization described in the PR summary.

Suggested change

rawLosingTxHashes = append(rawLosingTxHashes, losingTxHashesMap.Keys()...)

losingTxHashesMap.Iter(func(hash chainhash.Hash, _ struct{}) bool {

rawLosingTxHashes = append(rawLosingTxHashes, hash)

return true

})

Good catch — switched from losingTxHashesMap.Keys() to losingTxHashesMap.Iter() to avoid the intermediate slice allocation, consistent with how transactionMap.Iter() is used just above for the winning tx set.

Copilot · 2026-03-25T16:10:14Z

+			// Pre-allocate result arrays indexed by position
+			existedInTxMap := make([]bool, n)    // true if SetIfExists found the key
+			existsInLosingMap := make([]bool, n) // true if in losingTxHashesMap
+			isRemoveMap := make([]bool, n)       // true if in removeMap (to delete)


processRemainderTxHashes switched from a packed []byte bitset to three separate []bool arrays (existedInTxMap, existsInLosingMap, isRemoveMap). For large subtrees this is a 3× memory increase (plus GC pressure) in a very hot reorg path.

If the previous bitset approach was correct, consider restoring it (or another packed representation) to keep per-node state to ~1 byte rather than ~3 bytes.

I looked into this and the 3x memory difference (3 bytes vs 1 byte per node for []bool vs packed bitset) is negligible relative to the node data itself (each subtreepkg.Node is ~72 bytes with hash + fee + size). At 100K nodes, it's ~300KB vs ~100KB for the flags, while the node data is ~7MB. The clarity benefit of separate named bools (existedInTxMap, existsInLosingMap, isRemoveMap) outweighs the marginal memory saving in this case. Happy to revisit if profiling shows this becoming a bottleneck at larger scales.

Copilot · 2026-03-25T16:10:14Z

+// TestReorgBenchmarkBaseline runs the full reorg at multiple scales and prints a summary table.
+// This is a test (not benchmark) for easy "go test -run" invocation with human-readable output.
+func TestReorgBenchmarkBaseline(t *testing.T) {
+	if testing.Short() {
+		t.Skip("skipping baseline benchmark in short mode")
+	}
+


TestReorgBenchmarkBaseline is a regular test (not a benchmark) and will run in normal go test runs unless -short is enabled. Given it performs multi-scale reorg runs up to 100K transactions, this is likely to make CI/unit test runs very slow or flaky.

Consider moving this into a Benchmark... (so it only runs under -bench), gating it behind a build tag (e.g., //go:build longtest), or requiring an explicit env var to run.

The test already has a testing.Short() guard that skips it in short mode (line 758). Our CI uses make test which passes -short, so this won't run in normal CI. For local go test without -short, the 100K scale takes ~15s total which is reasonable for a correctness baseline. I think the current gating is sufficient, but happy to add a build tag if it causes issues in practice.

- Replace losingTxHashesMap.Keys() with Iter() to avoid intermediate slice allocation in the reorg hot path Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-25T16:22:04Z

+		h := hash // per-iteration copy for pointer safety
+		g.Go(func() error {
+			txMeta, err := stp.utxoStore.Get(gCtx, &h, fields.BlockIDs)
+			if err != nil {
+				return errors.NewProcessingError("[reorgBlocks] error getting transaction from utxo store", err)
+			}
+
+			txMetas[idx] = txMeta


The goroutine closure inside the for idx, hash := range markNotOnLongestChain loop captures the loop variable idx. Because range variables are reused, concurrent goroutines can write to the wrong index in txMetas, leading to nil entries or out-of-bounds behavior and incorrect marking decisions. Capture idx per iteration (e.g., idx := idx or pass it as a parameter to the closure) before assigning txMetas[idx] = txMeta.

Suggested change

h := hash // per-iteration copy for pointer safety

g.Go(func() error {

txMeta, err := stp.utxoStore.Get(gCtx, &h, fields.BlockIDs)

if err != nil {

return errors.NewProcessingError("[reorgBlocks] error getting transaction from utxo store", err)

}

txMetas[idx] = txMeta

h := hash // per-iteration copy for pointer safety

idxCopy := idx // per-iteration copy to avoid closure capturing loop variable

g.Go(func() error {

txMeta, err := stp.utxoStore.Get(gCtx, &h, fields.BlockIDs)

if err != nil {

return errors.NewProcessingError("[reorgBlocks] error getting transaction from utxo store", err)

}

txMetas[idxCopy] = txMeta

No longer applies on current HEAD (Go 1.26 per go.mod). Since Go 1.22 loop variables are per-iteration scoped, so the closure capturing idx is race-free, and each goroutine writes a distinct txMetas[idx]. The code also keeps the defensive h := hash copy for the &h pointer. No shared-index write race exists.

Copilot · 2026-03-25T16:22:04Z

+		for i := 0; i < count; i++ {
+			if isFirst && i == 0 {
+				// Skip first slot for coinbase in first subtree
+				continue
+			}
+			err = subtree.AddSubtreeNode(subtreepkg.Node{
+				Hash:        txHashes[idx+i],


In the first subtree case, the loop for i := 0; i < count; i++ { if isFirst && i == 0 { continue } ... txHashes[idx+i] } skips txHashes[idx] entirely, but idx is later advanced by subtree.Length()-1. This drops the first tx hash and then causes the last tx of the first subtree to be duplicated at the start of the next subtree, skewing the benchmark scenario (and potentially introducing duplicates). Adjust the indexing/count logic for the coinbase slot (e.g., iterate over tx hashes separately from leaf positions, or set count to min(remaining, subtreeSize-1) for the first subtree).

Fixed in commit dc3e520. buildAndStoreSubtrees now tracks a separate consumed counter and indexes txHashes[idx+consumed], and advances idx by consumed — so txHashes[idx] is no longer dropped and no tx is duplicated across subtree boundaries.

Copilot · 2026-03-25T16:22:05Z

+		for i := 0; i < count; i++ {
+			if isFirst && i == 0 {
+				continue
+			}
+			err = subtree.AddSubtreeNode(subtreepkg.Node{
+				Hash:        txHashes[idx+i],
+				Fee:         100,


Same off-by-one issue as buildAndStoreSubtrees: when isFirst && i == 0 is skipped, txHashes[idx] is never added but idx is only advanced by subtree.Length()-1, causing the last tx from the first subtree to be reused as the first tx in the next subtree. This makes the benchmark data contain unintended gaps/duplicates. Fix by aligning tx-hash indexing with the coinbase leaf handling (e.g., reduce count for the first subtree or increment a separate tx index when adding nodes).

Fixed in commit dc3e520, same approach as buildAndStoreSubtrees: buildAndStoreSubtreesT uses a consumed counter for txHashes indexing and advances idx by consumed, eliminating the dropped-first-tx / duplicated-last-tx off-by-one.

…eorg-performance-optimisation

…erge The merge from upstream/main changed the markNotOnLongestChain function signature to include moveBackBlocks and moveForwardBlocks parameters. The reset path passes nil for both since it only needs to unmark assembly tx hashes directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ck test TestHandleReorgWithInvalidBlock_Integration (added in bsv-blockchain#545) fails on this branch and on main because the test setup does not run the BlockValidation service. InvalidateBlock sets mined_set=false and emits BlockMinedUnset, which BlockValidation normally consumes to re-set mined_set=true. With no BlockValidation running, BA's reset() blocks in waitForBlockMinedSet until its retry budget exhausts — past the test's 15s Eventually window. Capture the hashes returned by InvalidateBlock and call SetBlockMinedSet on each to simulate BlockValidation's async reaction. This change duplicates the fix on main in bsv-blockchain#714 to unblock CI on this PR before bsv-blockchain#714 is merged and rolled forward. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…d Idxs field go-subtree v1.4.2 removed the public TxInpoints.Idxs [][]uint32 field in favour of a packed layout; external callers must construct via NewTxInpointsFromPacked. The reorg benchmark/test was still using the struct-literal Idxs field, failing golangci-lint typecheck.

… bulk build Review follow-ups on the parallel reorg optimization: - Extract partitionLongestChainMarks so the concurrent MarkTransactionsOnLongestChain(true/false) calls operate on provably disjoint, de-duplicated sets and can never race on the same UTXO record. Overlapping hashes resolve to mark(false), preserving the prior sequential last-write-wins semantics. - Bound bulkBuildSubtrees goroutine fan-out at GOMAXPROCS. - Remove dead finalizeBulkBuildSubtrees (never called). - Tests: partition disjointness/dedup (incl. heavy-overlap stress) and checkMarkNotOnLongestChain branch coverage + invalidation gating. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-04T13:59:16Z

[Critical] Race condition found at services/blockassembly/subtreeprocessor/SubtreeProcessor.go:3511

The loop variable idx is captured by reference in the goroutine closure. All goroutines will see the final value of idx after the loop completes, causing multiple goroutines to write to the same txMetas[idx] position and leaving other positions uninitialized.

Fix: Add idx := idx inside the loop body before g.Go() to capture the loop variable, just like the existing h := hash capture on line 3504.

freemans13 · 2026-06-04T15:56:39Z

Re: the flagged [Critical] race at SubtreeProcessor.go:3511 (loop variable idx capture) — this is not a bug on the current code.

The module is on go 1.26.0 (see go.mod), where loop variables have per-iteration scope (the Go 1.22 loopvar change). Each g.Go closure captures its own idx, so every goroutine writes a distinct txMetas[idx] — there is no shared final-value aliasing, and no positions are left uninitialized. The suggested idx := idx capture is therefore a no-op under this toolchain.

Verified empirically: go test -race across the package (243 tests) is clean. The h := hash copy on the preceding line is retained only because we take &h. Resolving as not-an-issue.

freemans13 · 2026-06-04T15:56:40Z

Follow-up commit `85c55fee3` — review hardening + tests

A fresh review pass surfaced one real correctness gap plus some cleanup. All addressed here.

Correctness — disjoint mark sets (concurrency safety)
The parallel MarkTransactionsOnLongestChain(true) / (false) calls relied on an assumed disjointness between the winning-tx set and the assembly/losing set — but filteredMarkOnLongestChain was only filtered against the losing set, not the assembly set. If a winning-block tx were also still in assembly, the two concurrent store writes could race on the same record with a nondeterministic winner. (The previous sequential code was immune because mark(false) ran last and deterministically won.)

Extracted partitionLongestChainMarks, which builds provably disjoint, de-duplicated true/false sets. Overlapping hashes resolve to mark(false), preserving the prior last-write-wins semantics.

Hardening / cleanup

Bounded bulkBuildSubtrees goroutine fan-out at GOMAXPROCS (small subtree sizes on a large reorg could otherwise spawn thousands of goroutines).
Removed dead finalizeBulkBuildSubtrees (zero call sites — see thread above).

Tests

partitionLongestChainMarks: 7 table cases (incl. winning-tx-also-in-assembly → false wins) + a 5k heavy-overlap disjointness stress test.
checkMarkNotOnLongestChain: all four decision branches (only-in-invalid-block, recent-header, on-current-chain, forked-chain) against a real sqlitememory store, plus the tx-not-found error path and invalidation gating.

Verification

go test -race -short ./services/blockassembly/subtreeprocessor/ → 243 pass, race-clean
go vet + gofmt clean; stores/utxo ProcessConflicting -race clean

Benchmarks — no regression vs pre-fix (TestReorgBenchmarkBaseline, sqlitememory):

Scale	Total	Alloc
1K	11 ms	155 MB
10K	23 ms	168 MB
50K	37 ms	213 MB
100K	47 ms	263 MB

BenchmarkFullReorg 200k-tx reorg: 41.8 ms / 275 MB / 594k allocs. For contrast, the per-node addNode path this replaces costs 24.7 ms for 100k nodes alone (BenchmarkAddNodeSequential) — so the parallel bulk-build win is fully intact.

oskarszoon

First full review across consensus, Go, perf, and QA (my earlier note was just the rebase). The parallelization is consensus-faithful — reorg tx-set/ordering/conflicting-cascade/coinbase semantics are preserved vs the serial path, concurrent marks are race-safe by disjoint-set construction (partitionLongestChainMarks, stress-tested), and the full package passes -race. The perf win is real (100K-block reorg ~62ms, normal tx-ingress path untouched). On the deleted reverse-cascade dequeue: it's redundant, not a bug — stp.Remove routes those hashes into removeMap, which persists across the reorg and the dequeue drops them; but the comment at :3066-3068 still describes the deleted call and is now stale.

Two things before merge:

P1 — reorgBlocks can hang on cancellation (SubtreeProcessor.go:3360-3372). The newSubtreeChan send loop has no select/ctx; if the consumer is cancelled while the loop is blocked on a full buffer during a >1000-subtree reorg, the reorg goroutine blocks forever. Your other sends (:669/731/874) all use select { case chan <- …; case <-processorCtx.Done(): } — match that here.
P1 — reorg tests don't pin the correctness this PR claims. reset_reorg_test.go:1074/1247 assert only the chain-tip header; the tx-set checks are t.Logf("✅/❌") that never fail (a wrong bulkBuildSubtrees set passes green). Every Reorg() test uses empty Subtrees, so the new moveBackBlockBulkBuild deserialization path is never exercised by a correctness test, and there's no serial-vs-parallel equivalence test nor a double-spend-across-reorg case. Promote the t.Logf assertions to require, add a reorg through real subtrees, and a competing-spend-across-switch case.

Cleanup: int16(baseIdx+i+1) at :2263 overflows past 32767 subtrees (use int32); delete the now-dead addMoveBackBlockNodesToSubtrees/flatIndexToSubtreeNode/etc.; fix the stale :3066-3068 comment; revert winningTxSet/losingTxSet to map[…]struct{}. Details in the review notes.

…, cleanup) Addresses oskarszoon's CHANGES_REQUESTED review on PR bsv-blockchain#526: - P1 cancellation hang: the reorg newSubtreeChan batch send/wait loops had no ctx awareness; a full buffer during a >1000-subtree reorg could block the reorg goroutine forever after the consumer was cancelled. Both the send and the response-wait are now wrapped in ctx-aware selects (matching the :669/731/874 pattern), preserving the send/receive overlap. - P1 test correctness: the Reorg tests asserted only via non-failing t.Logf("✅/❌") guarded by `if err == nil`, so a failing reorg (or a wrong bulkBuildSubtrees set) passed green. Promoted those three tests to require.NoError + require tx-set assertions (and fixed their headers so the reorg actually succeeds), and added TestSubtreeProcessor_ReorgThroughReal- Subtrees with two cases that exercise the moveBackBlockBulkBuild deserialization path through REAL serialized subtrees: tx recovery, and a competing-spend-across-switch (double-spend) case. - Cleanup: reverted winningTxSet/losingTxSet to map[chainhash.Hash]struct{}; deleted the now-dead addMoveBackBlockNodesToSubtrees / flatIndexToSubtreeNode (and their test); and fixed the stale reverseCascadedConflictingSet comment that described a deleted final dequeueDuringBlockMovement call. The int16 SubtreeIndex cast is left unchanged: SubtreeIndex is int16 in the external go-subtree TxInpoints type, so the cast matches the storage width and the three pre-existing sibling sites; a true int32 widening requires an upstream go-subtree change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

freemans13 · 2026-06-05T11:17:04Z

Thanks for the thorough review, @oskarszoon. Addressed in bfd429f:

P1 — cancellation hang (newSubtreeChan batch send): both the send loop and the response-wait loop are now wrapped in ctx-aware selects (matching the :669/731/874 pattern), returning a ProcessingError on ctx.Done(). Kept the two-loop batch structure so the send/receive overlap is preserved.

P1 — reorg tests don't pin correctness:

Promoted the masked if err == nil { … t.Logf("✅/❌") } checks in the three Reorg subtests to require.NoError + require tx-set assertions. While doing so I found those tests were silently passing via their else branch — the moved-forward block built on the wrong parent, so Reorg was actually erroring and the assertions never ran. Fixed the headers so the reorg succeeds and the assertions are real.
Added TestSubtreeProcessor_ReorgThroughRealSubtrees with two cases that exercise the moveBackBlockBulkBuild deserialization path through real serialized subtrees (not empty Subtrees): (a) moved-back txs are recovered into block assembly, and (b) a competing-spend-across-switch where the double-spent tx is mined in the moved-forward block and must not be left pending, while the back-only tx is recovered.

Cleanup:

winningTxSet/losingTxSet reverted to map[chainhash.Hash]struct{}.
Deleted the dead addMoveBackBlockNodesToSubtrees / flatIndexToSubtreeNode (and their TestFlatIndexToSubtreeNode_SkipsEmptySubtrees).
Fixed the stale reverseCascadedConflictingSet comment — it described a deleted final dequeueDuringBlockMovement; it now describes the actual mechanism (the set is returned as both conflictingSet and losingTxHashesMap, feeding processRemainderTxHashes and dequeueDuringBlockMovement).

On the int16(baseIdx+i+1) overflow: I left this as int16. The value is stored into subtreepkg.TxInpoints.SubtreeIndex, which is int16 in the external go-subtree dependency (inpoints.go:32) and serialized as 2 bytes by DiskTxMap; the cast also matches the three pre-existing sites (:2109 / :2198 / :5243), so this PR doesn't introduce a new overflow class. A genuine fix would require widening SubtreeIndex to int32 in go-subtree itself plus a coordinated dep bump — happy to file that separately if you'd like, but it felt out of scope for this perf PR (and >32767 subtrees in a single moved-back block isn't reachable at current subtree sizes).

Verification: full subtreeprocessor package green under -race; the four reorg tests pass -count=3 with no flakiness.

…eorg-performance-optimisation

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

freemans13 · 2026-06-05T11:45:33Z

+	if len(inpoints) != n {
+		panic("SplitTxInpointsMap.ParallelBulkSetIfNotExists: len(inpoints) must equal len(hashes)")
 	}
+	if wasSet == nil || len(wasSet) != n {
+		panic("SplitTxInpointsMap.ParallelBulkSetIfNotExists: len(wasSet) must equal len(hashes)")
+	}
+


Fixed in the upcoming commit: ParallelBulkSetIfNotExists now explicitly sets wasSet[idx] = false on the existing-key path, so the slice no longer carries stale true values if a caller reuses it. The sole caller allocates a fresh slice each call, but this makes the documented contract hold unconditionally.

freemans13 · 2026-06-05T11:45:34Z

+		for i := 0; i < count; i++ {
+			if isFirst && i == 0 {
+				// Skip first slot for coinbase in first subtree
+				continue
+			}
+			err = subtree.AddSubtreeNode(subtreepkg.Node{
+				Hash:        txHashes[idx+i],
+				Fee:         100,
+				SizeInBytes: 250,
+			})
+			require.NoError(b, err)
+
+			if subtree.IsComplete() {
+				break
+			}
+		}


Fixed. buildAndStoreSubtrees now tracks a separate consumed counter and indexes txHashes[idx+consumed] instead of txHashes[idx+i], so the coinbase slot in the first subtree no longer drops txHashes[idx]. idx now advances by the number of tx hashes actually consumed.

freemans13 · 2026-06-05T11:45:35Z

+		for i := 0; i < count; i++ {
+			if isFirst && i == 0 {
+				continue
+			}
+			err = subtree.AddSubtreeNode(subtreepkg.Node{
+				Hash:        txHashes[idx+i],
+				Fee:         100,
+				SizeInBytes: 250,
+			})
+			require.NoError(t, err)
+
+			if subtree.IsComplete() {
+				break
+			}
+		}


Fixed with the same approach as buildAndStoreSubtrees: buildAndStoreSubtreesT now uses a consumed counter for txHashes indexing so no tx hash is skipped, keeping TestReorgBenchmarkBaseline operating on the configured tx count (verified passing).

freemans13 · 2026-06-05T11:45:37Z

+				for i := 0; i < b.N; i++ {
+					start := time.Now()
+
+					err := store.MarkTransactionsOnLongestChain(ctx, markTrueHashes, true)
+					require.NoError(b, err)
+
+					err = store.MarkTransactionsOnLongestChain(ctx, markFalseHashes, false)
+					require.NoError(b, err)
+
+					elapsed := time.Since(start)
+					b.Logf("  Iter %d: %v (%.0f tx/sec total)", i+1, elapsed,
+						float64(tc.count*2)/elapsed.Seconds())
+				}


Fixed. Removed the per-iteration time.Now/time.Since and b.Logf from the timed b.N loop in the sequential benchmark; throughput is reported from b.Elapsed()/ReportMetric after StopTimer, which is the authoritative measurement.

freemans13 · 2026-06-05T11:45:38Z

+				for i := 0; i < b.N; i++ {
+					start := time.Now()
+
+					g, gCtx := errgroup.WithContext(ctx)
+					g.Go(func() error {
+						return store.MarkTransactionsOnLongestChain(gCtx, markTrueHashes, true)
+					})
+					g.Go(func() error {
+						return store.MarkTransactionsOnLongestChain(gCtx, markFalseHashes, false)
+					})
+					err := g.Wait()
+					require.NoError(b, err)
+
+					elapsed := time.Since(start)
+					b.Logf("  Iter %d: %v (%.0f tx/sec total)", i+1, elapsed,
+						float64(tc.count*2)/elapsed.Seconds())
+				}


Fixed in the concurrent loop the same way (and also in the third b.N loop in BenchmarkMarkTransactionsOnLongestChain which had the identical pattern). The timed sections no longer include logging or time bookkeeping.

- ParallelBulkSetIfNotExists: explicitly set wasSet[idx]=false on existing keys so the result slice honors its contract under caller reuse - reorg benchmark helpers: fix off-by-one that dropped txHashes[idx] in the first subtree (coinbase slot); track consumed tx count explicitly - aerospike longest-chain bench: remove per-iteration time.Now/time.Since and b.Logf from the timed b.N loops so logging overhead is not measured Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

freemans13 · 2026-06-05T12:06:30Z

+	var wg sync.WaitGroup
+	for bIdx := uint16(0); bIdx < s.nrOfBuckets; bIdx++ {
+		indices := bucketIndices[bIdx]
+		if len(indices) == 0 {
+			continue
+		}
+		wg.Add(1)
+		go func(b *txInpointsBucket, indices []int) {
+			defer wg.Done()
+			b.mu.Lock()
+			for _, idx := range indices {
+				if !b.m.Has(hashes[idx]) {
+					b.m.Put(hashes[idx], inpoints[idx])
+					wasSet[idx] = true
+				} else {
+					// Explicitly record the existing-key result so the contract
+					// (wasSet[i] reflects the outcome for hashes[i]) holds even if
+					// the caller reuses a slice with stale true values.
+					wasSet[idx] = false
+				}
+			}
+			b.mu.Unlock()
+		}(&s.buckets[bIdx], indices)
+	}
+	wg.Wait()


Good catch — fixed. ParallelBulkSetIfNotExists now uses a bounded pool of stride workers (numWorkers = min(GOMAXPROCS, nrOfBuckets)) instead of one goroutine per non-empty bucket, matching the existing SplitSwissMap.Clear / bulkBuildSubtrees pattern. Each worker strides over buckets, so a 100K-entry insert no longer spawns ~16K goroutines. Verified passing under -race.

freemans13 · 2026-06-05T12:06:32Z

 func (s *SplitTxInpointsMap) Clear() {
-	for _, syncedMap := range s.m {
-		syncedMap.Clear()
+	for i := uint16(0); i < s.nrOfBuckets; i++ {
+		b := &s.buckets[i]
+		b.mu.Lock()
+		b.m = swiss.NewMap[chainhash.Hash, *subtreepkg.TxInpoints](64)
+		b.mu.Unlock()
+	}


Fixed. SplitTxInpointsMap.Clear now calls b.m.Clear() in place (swiss.Map.Clear zeroes the control/group arrays and retains capacity) instead of allocating a fresh swiss.NewMap(64), matching SplitSwissMap.Clear. This is on the currentTxMap double-buffer reuse path, so it avoids repeated allocation + rehashing on refill. Double-buffer and map tests pass under -race.

- ParallelBulkSetIfNotExists: bound fan-out with a stride-worker pool (min(GOMAXPROCS, nrOfBuckets)) instead of one goroutine per non-empty bucket, avoiding ~16K goroutines on large bulk inserts - SplitTxInpointsMap.Clear: clear each bucket in place via swiss.Map.Clear to retain capacity on the currentTxMap double-buffer reuse path instead of reallocating a fresh map per Clear Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

freemans13 · 2026-06-05T12:40:17Z

 func NewSplitTxInpointsMap(nrOfBuckets uint16) *SplitTxInpointsMap {
-	m := make(map[uint16]*txmap.SyncedMap[chainhash.Hash, *subtreepkg.TxInpoints], nrOfBuckets)
+	buckets := make([]txInpointsBucket, nrOfBuckets)
 	for i := uint16(0); i < nrOfBuckets; i++ {
-		m[i] = txmap.NewSyncedMap[chainhash.Hash, *subtreepkg.TxInpoints]()
+		buckets[i].m = swiss.NewMap[chainhash.Hash, *subtreepkg.TxInpoints](64)
 	}


This is a false positive. Go's make accepts any integer type for the length argument (per the spec, the size arguments must be of integer type), so make([]txInpointsBucket, nrOfBuckets) with a uint16 compiles fine — and this is pre-existing code (introduced in #385, unchanged by this PR). The package builds and go vet passes. No change needed.

freemans13 · 2026-06-05T12:40:19Z

+	// Phase 1: Group indices by bucket (O(N), no locks)
+	bucketIndices := make([][]int, s.nrOfBuckets)
+	for i := 0; i < n; i++ {
+		bucket := txmap.Bytes2Uint16Buckets(hashes[i], s.nrOfBuckets)
+		bucketIndices[bucket] = append(bucketIndices[bucket], i)


Same false positive: make([][]int, s.nrOfBuckets) with a uint16 length is valid Go and compiles (the make size argument may be any integer type). This line is pre-existing in ParallelBulkSetIfNotExists and was not changed by this PR; the package builds and vets cleanly. No change needed.

freemans13 · 2026-06-05T12:40:20Z

+	txMetas := make([]*meta.Data, len(markNotOnLongestChain))
+	g, gCtx := errgroup.WithContext(ctx)
+	g.SetLimit(max(stp.settings.UtxoStore.MaxMinedRoutines, stp.settings.UtxoStore.GetBatcherSize*2))


Addressed. Strictly, errgroup.SetLimit(0) does not panic (n==0 creates a zero-capacity semaphore; only n<0 disables the limit) — the failure mode would be a g.Go deadlock, and only if both MaxMinedRoutines and GetBatcherSize were 0 (defaults are 128, so max(...) is always >= 1). To match the convention used by every other SetLimit call in this file and surface the misconfiguration loudly, I switched this to util.SafeSetLimit, which panics at setup if the computed limit is 0 instead of deadlocking later.

- checkMarkNotOnLongestChain: use util.SafeSetLimit for the errgroup concurrency cap (matching every other SetLimit call in the file) so a 0 limit panics loudly at setup instead of risking a g.Go deadlock The two reported make([]T, uint16) 'won't compile' comments are false positives — make accepts any integer-type length and those lines are pre-existing; no change made there. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated no new comments.

oskarszoon

Re-review @ d6e16b6 — approving; both blocking asks resolved.

Cancellation hang fixed: both the send and response-wait loops in reorgBlocks (SubtreeProcessor.go:3358-3391) now have ctx-aware select/ctx.Done() guards on the same processorCtx as the canonical sends.
Reorg tests pin correctness: the soft t.Logf("✅/❌") checks are promoted to hard require, the silent else-branch error is fixed, and TestSubtreeProcessor_ReorgThroughRealSubtrees exercises the real moveBackBlockBulkBuild deserialization path including a competing-spend-across-switch (require.False(doubleSpend mined) / require.True(backOnly recovered)). Full package -race green, 246 tests.

The parallelization checks out: consensus-faithful (tx-set/ordering/cascade preserved, double-spend invariant enforced, merkle untouched), the concurrent longest-chain marks are provably disjoint at slice + UTXO-record level (partitionLongestChainMarks), and the perf win is real (~41ms for a 100K reorg vs ~25ms for the old per-node build alone). Copilot's ":3511 loopvar race" is a false positive (go 1.26 per-iteration scoping + explicit copy).

One non-blocking follow-up: the int16(...) casts at SubtreeProcessor.go:2198,2263 have no overflow guard. Not a consensus or corruption risk — SubtreeIndex is runtime-only (never serialized into any block/merkle/UTXO commitment) and the consumer re-validates with a linear-scan fallback, so a wrap only degrades O(1)→O(N) — and it's unreachable at the default 1M-item subtree size (~34B txs needed in one moved-back block). A cheap > math.MaxInt16 guard + warn (or safeconversion.IntToInt16) would harden a misconfigured tiny-subtree node, but it's not a blocker.

…y CI job The subtreeprocessor-heavy benchmark-compare job timed out (>30m). Root cause was test-only, not production code: the new BenchmarkAddNodeSequential matched the job's unanchored `BenchmarkAddNode` filter, ignored `-short`, and called runtime.GC() every b.N iteration. With tiny per-iteration timed work, b.N ramped into the thousands, so thousands of full GCs ran (~4min on a 16-core machine, far worse on the CI runner). The production map rewrite is unaffected and is actually slightly faster on BenchmarkAddNode. - Gate all 6 heavy reorg benchmarks behind testing.Short() (CI passes -short), so they skip in CI while remaining runnable locally. - Remove the per-iteration runtime.GC() from the 5 throughput benchmarks; it inflated wall-clock without affecting B/op accuracy. Kept the GC in BenchmarkReorgMemoryProfile where it establishes the MemStats baseline. - Anchor the subtreeprocessor-heavy bench_filter to the three intended benchmarks so future additions can't be swept in by substring matching. Heavy filter under -short now completes in ~1m (was projected >25m); all six benchmarks still run correctly without -short. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sonarqubecloud · 2026-06-05T16:57:57Z

Quality Gate passed

Issues
8 New issues
0 Accepted issues

Measures
0 Security Hotspots
83.5% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

freemans13 self-assigned this Feb 24, 2026

freemans13 changed the title ~~=refactor(subtreeprocessor): remove hashSlicePool and enhance SplitTx…~~ perf(subtreeprocessor): optimize reorg with parallel bulk operations Feb 24, 2026

freemans13 added 2 commits February 24, 2026 12:05

=refactor(subtreeprocessor): limit maximum workers for transaction pr…

9426d9e

…ocessing - Updated the calculation of numWorkers in processRemainderTxHashes to cap the maximum number of workers at 16, enhancing resource management during parallel processing.

github-actions Bot reviewed Feb 24, 2026

View reviewed changes

Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go

Merge remote-tracking branch 'upstream/main' into stu/blockassembly-r…

9dcc3ae

…eorg-performance-optimisation # Conflicts: # services/blockassembly/subtreeprocessor/SubtreeProcessor.go

freemans13 requested a review from Copilot March 25, 2026 15:36

Copilot started reviewing on behalf of freemans13 March 25, 2026 15:36 View session

github-actions Bot reviewed Mar 25, 2026

View reviewed changes

Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated

Copilot AI reviewed Mar 25, 2026

View reviewed changes

freemans13 requested a review from Copilot March 25, 2026 15:51

Copilot started reviewing on behalf of freemans13 March 25, 2026 15:51 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

freemans13 requested a review from Copilot March 25, 2026 16:03

Copilot started reviewing on behalf of freemans13 March 25, 2026 16:03 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

fix: address copilot review round 4

19a9e9a

- Replace losingTxHashesMap.Keys() with Iter() to avoid intermediate slice allocation in the reorg hot path Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

freemans13 requested a review from Copilot March 25, 2026 16:15

Copilot started reviewing on behalf of freemans13 March 25, 2026 16:16 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

freemans13 and others added 4 commits April 9, 2026 09:35

Merge remote-tracking branch 'upstream/main' into stu/blockassembly-r…

722fba8

…eorg-performance-optimisation

Merge remote-tracking branch 'upstream/main' into stu/blockassembly-r…

23c8e38

…eorg-performance-optimisation

freemans13 and others added 4 commits June 2, 2026 10:57

Merge remote-tracking branch 'upstream/main' into HEAD

e5a930b

Merge remote-tracking branch 'upstream/main' into HEAD

2df5f33

oskarszoon requested changes Jun 5, 2026

View reviewed changes

Merge remote-tracking branch 'upstream/main' into stu/blockassembly-r…

34bc3ad

…eorg-performance-optimisation

freemans13 requested a review from Copilot June 5, 2026 11:29

Copilot started reviewing on behalf of freemans13 June 5, 2026 11:29 View session

Copilot AI reviewed Jun 5, 2026

View reviewed changes

freemans13 requested a review from Copilot June 5, 2026 11:47

Copilot started reviewing on behalf of freemans13 June 5, 2026 11:47 View session

Copilot AI reviewed Jun 5, 2026

View reviewed changes

freemans13 requested a review from Copilot June 5, 2026 12:07

Copilot started reviewing on behalf of freemans13 June 5, 2026 12:07 View session

Copilot AI reviewed Jun 5, 2026

View reviewed changes

freemans13 requested a review from Copilot June 5, 2026 12:41

Copilot started reviewing on behalf of freemans13 June 5, 2026 12:41 View session

Copilot AI reviewed Jun 5, 2026

View reviewed changes

oskarszoon approved these changes Jun 5, 2026

View reviewed changes

freemans13 merged commit 3cb4027 into bsv-blockchain:main Jun 5, 2026
50 of 53 checks passed

Conversation

freemans13 commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Benchmark Results

Behavioral change (not purely perf)

Test plan

Uh oh!

github-actions Bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Comparison Report

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

freemans13 commented Feb 24, 2026 •

edited

Loading

github-actions Bot commented Feb 24, 2026 •

edited

Loading

github-actions Bot commented Mar 25, 2026 •

edited

Loading

Follow-up commit `85c55fee3` — review hardening + tests