perf(blockassembly): recycle tx maps + bucket-sharded inserts to cut moveForwardBlock latency by icellan · Pull Request #877 · bsv-blockchain/teranode

icellan · 2026-05-15T13:18:11Z

Summary

Block-assembly's processNewBlockAnnouncement is on the critical path between a competing miner's block landing and BA being ready to mine on top — every second of it is lost expected revenue. On a 558M-tx workload (scale-2) the window measured 26-33 s wall-clock per block; CPU profiling attributed:

~39 % of total CPU to Go runtime GC (scanObject / scanSpan / tryDeferToSpanScan / spanClass.sizeclass), driven by per-block fresh allocation of a ~614 M-entry SplitSwissMap (transactionMap in CreateTransactionMap) and a fresh 4096-shard SplitTxInpointsMap (currentTxMap in resetSubtreeState).
~17 % of every SyncedMap.SetIfNotExists call inside RWMutex.Lock / Unlock — real per-bucket contention with the default 375 workers.
Only ~3.5 of 170 requested cores effectively used during the phase. The pod is GC-bound and wait-bound, not CPU-bound.

This PR attacks every measured source of slack across moveForwardBlock, moveBackBlock, and reorgBlocks. No semantics change.

Changes

#	Change	Files
1	Pool the transactionMap (`SplitSwissMap`). Adds `Clear()` (in-place per-bucket `swiss.Map.Clear`), a lazy `SubtreeProcessor.txMapPool` field; `CreateTransactionMap` now `Clear()`s the pool instead of allocating fresh per block.	`map.go`, `SubtreeProcessor.go`
2	Double-buffer the currentTxMap (`SplitTxInpointsMap`). Adds a pre-allocated `currentTxMapShadow`; `resetSubtreeState` swaps pointers, `moveForwardBlock` `Clear()`s the shadow at commit. An internal deferred rollback inside `moveForwardBlock` swaps back if any post-reset step fails, preserving the caller's `originalCurrentTxMap`-based rollback contract.	`SubtreeProcessor.go`
3	Bucket-affinity sharding in `parallelGetAndSetIfNotExists`. Adds `PutMultiBucketTxInpoints` + `BucketFor`. The hot path bucket-partitions input nodes and dispatches one goroutine per non-empty destination bucket — every bucket has exactly one writer, so the per-bucket `RWMutex` is uncontended across workers. The legacy per-node parallel scheme is kept as a fallback for the DiskTxMap path. Promotes `splitMapBuckets` from a `const` to `BlockAssembly.SplitMapBuckets` (default bumped from 4096 → 16384).	`map.go`, `SubtreeProcessor.go`, `settings/blockassembly_settings.go`, `settings/settings.go`
4	Drop the literal 16-worker cap in `processRemainderTxHashes` Phase 1. The old `min(NumCPU, n/100, 16)` hard-limited Phase 1 to 16 workers on a 170-core pod; new `min(NumCPU, max(2, n/1024))` scales linearly with `NumCPU`.	`SubtreeProcessor.go`
5	Block + mutex profiling. New top-level `BlockProfileRate` and `MutexProfileFraction` settings (default 0 = off). `startProfilerAndMetrics` wires both into `runtime.SetBlockProfileRate` / `SetMutexProfileFraction`. Required to quantify residual wait time directly — both endpoints previously reported `Count=0`.	`settings/interface.go`, `settings/settings.go`, `daemon/daemon_services.go`
6	Parallel moveBackBlock txInpoints insert. New helper `addMoveBackBlockNodesToSubtrees` splits the previous serial `addNode` loop into Stage A (parallel chunked `SetIfNotExists` into `currentTxMap`) and Stage B (sequential subtree fill via the existing `addNodePreValidated`, skipping `wasInserted=false` duplicates exactly like the original `addNode`). Same fallback for DiskTxMap / sub-threshold inputs.	`SubtreeProcessor.go`
7	reorgBlocks rollback contract preservation. `reorgBlocks` captures `originalCurrentTxMap` once and expects the pointer to remain valid for the full multi-block loop. New `disableCurrentTxMapPool` flag toggled by `reorgBlocks` for the duration of its call — `resetSubtreeState` then falls back to fresh allocation (original behaviour) and `swapCurrentTxMapBack` / `clearCurrentTxMapShadow` become no-ops. Single-goroutine driven; no synchronisation needed.	`SubtreeProcessor.go`

Expected effect (to verify post-deploy)

Metric	Before	Target
`processNewBlockAnnouncement DONE in X`	26-33 s	< 10 s
`CreateTransactionMap DONE`	~8-11 s	2-4 s
`processing N remainder tx hashes DONE`	~17-22 s	5-8 s
CPU profile GC funcs (cumulative)	~39 %	single digits
`RWMutex.Lock`/`Unlock` under `SyncedMap.SetIfNotExists`	17 %	near zero

Recommended cluster configuration

For dev-scale-2 verification, set in the block-assembly configmap:

blockassembly_splitMapBuckets=16384
profiler_blockProfileRate=1000000
profiler_mutexProfileFraction=100

SubtreeProcessorConcurrentReads and ProcessRemainderTxHashesConcurrency already default to 375 — no change needed.

Memory cost

Pooled txMapPool: ~40 GB resident vs. ~40 GB allocated-and-freed per block before — same peak, allocation rate drops to near zero.
Shadow currentTxMapShadow: holds peak ~32 M TxInpoints pointers between blocks. Estimated 1-2 GB. Pod limit 950 Gi; current peak ~263 GB. Plenty of headroom.

Test plan

Out of scope

Block-validator (services/blockvalidation/...). Block.Valid runs in background and does not gate the next block being mined.
Subtree-validator's Kafka cache throughput.
Any short-circuit tied to incoming block content — the block arrives from a competing miner; BA has no a-priori knowledge of its contents.
GOMAXPROCS exposure; CPU is not the binding resource.
Parallelising the still-serial AddSubtreeNode fill in moveBackBlock. If it becomes the new bottleneck post-deploy, follow-up.

…ps and bucket-sharded inserts Profiling block-assembly on a 558M-tx workload showed processNewBlockAnnouncement spending 26-33 s wall-clock between block landing and BA being ready to mine on top — the entirety of the "lost revenue" window between two blocks. CPU profile attribution: - ~39 % runtime GC (scanObject/scanSpan/tryDeferToSpanScan/spanClass) driven by per-block fresh allocation of a ~614M-entry SplitSwissMap (transactionMap in CreateTransactionMap) and a fresh 4096-shard SplitTxInpointsMap (currentTxMap in resetSubtreeState) - ~17 % inside SyncedMap.SetIfNotExists RWMutex Lock/Unlock paths — real per-bucket contention with 375 workers - effective parallelism ~3.5 of 170 requested cores during the phase This change attacks every measured source of slack across moveForwardBlock, moveBackBlock, and reorgBlocks. 1. Pool the transactionMap (SplitSwissMap). Adds SplitSwissMap.Clear() that walks each per-bucket swiss.Map and calls its in-place Clear (which retains capacity by zeroing control + group arrays — no allocation). Adds a SubtreeProcessor.txMapPool field allocated lazily on the first CreateTransactionMap call and Clear()ed between blocks. The returned map is purely local to one moveForwardBlock and never escapes, so single- goroutine ownership is sufficient. 2. Double-buffer the currentTxMap (SplitTxInpointsMap). Adds a currentTxMapShadow field initialized at constructor time alongside the active map. resetSubtreeState now swaps the two pointers instead of allocating a fresh map; moveForwardBlock Clear()s the shadow after processCoinbaseUtxos succeeds. An internal deferred rollback inside moveForwardBlock swaps back if any post-reset step fails, preserving the captured originalCurrentTxMap pointer for the surrounding rollback code at the callers' three rollback sites (line 745 main-loop, lines 2760/2801 reorg paths) — those sites continue to do stp.currentTxMap = originalCurrentTxMap as a harmless no-op. 3. Bucket-affinity sharding in parallelGetAndSetIfNotExists. Adds SplitTxInpointsMap.PutMultiBucketTxInpoints and BucketFor. The hot path now bucket-partitions input nodes (sequential pre-filter for coinbase + removeMap; sequential bucket count + slice fill) and dispatches one goroutine per non-empty bucket, capped by ProcessRemainderTxHashesConcurrency. Every destination bucket has exactly one writer — the per-bucket RWMutex is uncontended across workers, eliminating the 17 % mutex tax shown in profiling. The legacy per-node parallel scheme is retained as a fallback for the DiskTxMap path under legacyParallelGetAndSetIfNotExists. Bumps splitMapBuckets default from 4096 to 16384 via the new BlockAssembly.SplitMapBuckets setting; smaller per-bucket maps Clear faster and any residual contention drops further. Total memory unchanged. 4. processRemainderTxHashes Phase 1 — drop the literal 16-worker cap. The previous min(NumCPU, n/100, 16) hard-limited Phase 1 to 16 workers on a 170-core pod when processing 30M-tx remainders. New min(NumCPU, max(2, n/1024)) chunks input in 1K blocks and scales with NumCPU. No new setting. 5. Block + mutex profiling. Adds top-level BlockProfileRate and MutexProfileFraction settings (default 0 = off; recommended on-cluster values 1000000 and 100). startProfilerAndMetrics now wires both into the Go runtime via SetBlockProfileRate / SetMutexProfileFraction. Necessary to quantify any residual wait time directly rather than infer from CPU-side mutex samples — both endpoints previously reported Count=0. 6. moveBackBlock parallel txInpoints insert. The previous serial addNode loops in moveBackBlockCreateNewSubtrees ran ~558M SetIfNotExists + AddSubtreeNode calls one at a time — a multi-minute serial path during reorgs. New helper addMoveBackBlockNodesToSubtrees splits the work into: Stage A — parallel chunked SetIfNotExists into currentTxMap, tracking per-node wasInserted using flatIndexToSubtreeNode for chunk-start translation. The map pointers alias the input subtreeMetaTxInpoints slice elements directly (no per-node allocation, same semantics as the original addNode call). Stage B — sequential subtree fill in canonical order calling addNodePreValidated (already existed in the codebase from the moveForward path). Duplicates from Stage A's wasInserted=false are skipped to match the original addNode's wasSet==false branch exactly. Falls back to the original serial path for DiskTxMap or sub- ParallelSetIfNotExistsThreshold inputs. 7. reorgBlocks rollback contract preservation. reorgBlocks captures originalCurrentTxMap once and expects the pointer to remain valid for the full multi-block loop so rollback can restore it. The double-buffer swap pattern breaks that contract across multiple iterations. Added a disableCurrentTxMapPool flag toggled by reorgBlocks for the duration of its call — resetSubtreeState then falls back to fresh allocation (original behaviour) and swapCurrentTxMapBack / clearCurrentTxMapShadow become no-ops. Single-goroutine, no synchronisation needed. New unit tests cover: - SplitSwissMap.Clear empties every bucket and preserves usability for further inserts. - SplitTxInpointsMap.PutMultiBucketTxInpoints honours duplicate handling matching SetIfNotExists. - SplitTxInpointsMap.Clear preserves bucket structure. - Double-buffer swap-then-clear lifecycle: swap → shadow holds prior data, current is empty; swap-back restores original pointer identity; clear-shadow on commit leaves the shadow empty. - disableCurrentTxMapPool=true reduces swap-back and clear-shadow to no-ops, preserving the reorg rollback contract. Existing tests in services/blockassembly/subtreeprocessor pass unchanged under -race. No code outside blockassembly, settings, daemon changes. Expected effect: moveForwardBlock wall-clock should drop from 26-33 s toward under 10 s on a 500M-tx block at scale-2 settings; CreateTransactionMap from ~11 s toward 2-4 s; processing N remainder tx hashes from ~20 s toward 5-8 s. GC-related funcs in CPU profiles should fall from cumulative ~39 % to single digits; RWMutex Lock/Unlock under SyncedMap.SetIfNotExists from 17 % to near zero.

github-actions · 2026-05-15T13:19:21Z

🤖 Claude Code Review

Status: Complete

Summary

This PR implements significant performance optimizations for block assembly by recycling transaction maps and introducing bucket-sharded parallel inserts. The implementation is technically sound with comprehensive test coverage and proper error handling.

Issues Found

[Minor] Dependency on pre-release version
The PR depends on go-tx-map v1.3.6-0.20260515143932-e509e98caded (pseudo-version), but v1.3.6 is now officially released (2026-05-15) and includes the required SetIfNotExistsMulti feature from PR #114. The pseudo-version commit diverges slightly from the release tag.

Recommendation: Update go.mod line 196 to use the stable release:

github.com/bsv-blockchain/go-tx-map v1.3.6

Remove the explanatory comment at lines 190-195 once updated.

Previously Reported Issues (Now Fixed)

All previously flagged issues have been addressed with proper fixes and test coverage:

✅ Double-buffer rollback corruption - Fixed with Clear() before swap
✅ SplitMapBuckets validation - Added range checks and clamping
✅ Empty subtree panic - Added loop to skip empty slices
✅ TxMapPool growth - Implemented high-water mark resizing

Code Quality Notes

Strengths:

Excellent inline documentation explaining complex double-buffering logic
Comprehensive unit tests covering edge cases (empty subtrees, partial failures, growth scenarios)
Proper error path handling with deferred rollback
Settings validation with sensible defaults and warnings

Performance Impact:
The optimizations target measured bottlenecks (39% GC time, 17% mutex contention) with data-driven approaches. The bucket-sharded insert design eliminates cross-worker contention by construction.

Overall Assessment: Well-engineered performance optimization with one minor dependency hygiene issue. Recommend updating to stable v1.3.6 release before merge.

Copilot

Pull request overview

Reduces moveForwardBlock latency on the block-assembly critical path by recycling the per-block transactionMap (SplitSwissMap.Clear), double-buffering the currentTxMap (SplitTxInpointsMap), partitioning the parallel SetIfNotExists workload by destination bucket so each bucket has a single writer, lifting the literal 16-worker cap in processRemainderTxHashes, parallelising the moveBackBlock map insert, and exposing block/mutex profiling via new top-level settings. Bucket count is promoted from a const to the configurable BlockAssembly.SplitMapBuckets (default raised 4096 → 16384).

Changes:

Pool transactionMap and double-buffer currentTxMap, with a disableCurrentTxMapPool flag to keep the multi-block reorg rollback contract intact.
Bucket-affinity sharding for parallelGetAndSetIfNotExists plus a parallel two-stage addMoveBackBlockNodesToSubtrees.
New top-level BlockProfileRate / MutexProfileFraction settings wired into runtime.SetBlockProfileRate / SetMutexProfileFraction.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
settings/settings.go	Wires new `profiler_blockProfileRate`, `profiler_mutexProfileFraction`, and `blockassembly_splitMapBuckets` config keys.
settings/interface.go	Adds `BlockProfileRate` / `MutexProfileFraction` fields with docs.
settings/blockassembly_settings.go	Adds `SplitMapBuckets` field with docs (claims "Must be in (0, 65535]").
daemon/daemon_services.go	Enables runtime block / mutex profiling when configured.
services/blockassembly/subtreeprocessor/map.go	Adds `SplitSwissMap.Clear`, `SplitTxInpointsMap.Buckets`/`BucketFor`/`PutMultiBucketTxInpoints`.
services/blockassembly/subtreeprocessor/SubtreeProcessor.go	Pools tx map, double-buffers `currentTxMap`, bucket-shards parallel inserts, parallelises moveBack subtree fill, removes 16-worker cap, adds `disableCurrentTxMapPool` for reorgs.
services/blockassembly/subtreeprocessor/map_test.go	Tests for `Clear` and bulk-bucket insert helpers.
services/blockassembly/subtreeprocessor/double_buffer_test.go	Tests for double-buffer swap/clear and disable-via-flag.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-05-15T13:34:22Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-877 (c37e5fa)

Summary

Regressions: 0
Improvements: 0
Unchanged: 142
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.669µ	1.652µ	~	0.200
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	61.42n	61.71n	~	0.200
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	61.49n	62.11n	~	0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	61.52n	61.67n	~	0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	29.56n	29.54n	~	1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	51.64n	50.77n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	107.5n	111.2n	~	0.100
MiningCandidate_Stringify_Short-4	279.2n	262.0n	~	0.100
MiningCandidate_Stringify_Long-4	1.914µ	1.913µ	~	0.700
MiningSolution_Stringify-4	977.0n	972.6n	~	0.600
BlockInfo_MarshalJSON-4	1.762µ	1.758µ	~	0.700
NewFromBytes-4	125.8n	150.3n	~	0.100
Mine_EasyDifficulty-4	60.55µ	61.36µ	~	0.700
Mine_WithAddress-4	7.481µ	6.790µ	~	0.700
BlockAssembler_AddTx-4	0.02969n	0.02801n	~	1.000
AddNode-4	10.98	10.49	~	1.000
AddNodeWithMap-4	11.13	10.73	~	0.700
DiskTxMap_SetIfNotExists-4	4.831µ	4.482µ	~	0.200
DiskTxMap_SetIfNotExists_Parallel-4	4.145µ	4.242µ	~	0.700
DiskTxMap_ExistenceOnly-4	385.3n	389.1n	~	0.700
Queue-4	209.1n	203.9n	~	0.100
AtomicPointer-4	4.879n	4.625n	~	0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	1.066m	1.042m	~	0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4	929.3µ	921.6µ	~	0.700
ReorgOptimizations/AllMarkFalse/Old/10K-4	140.7µ	118.5µ	~	0.100
ReorgOptimizations/AllMarkFalse/New/10K-4	62.60µ	62.02µ	~	0.400
ReorgOptimizations/HashSlicePool/Old/10K-4	69.01µ	65.21µ	~	0.100
ReorgOptimizations/HashSlicePool/New/10K-4	11.35µ	11.40µ	~	0.700
ReorgOptimizations/NodeFlags/Old/10K-4	6.930µ	6.171µ	~	0.100
ReorgOptimizations/NodeFlags/New/10K-4	2.962µ	2.218µ	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	15.75m	13.94m	~	0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4	14.07m	14.34m	~	1.000
ReorgOptimizations/AllMarkFalse/Old/100K-4	1.292m	1.250m	~	0.100
ReorgOptimizations/AllMarkFalse/New/100K-4	705.2µ	701.1µ	~	0.700
ReorgOptimizations/HashSlicePool/Old/100K-4	713.1µ	590.4µ	~	0.100
ReorgOptimizations/HashSlicePool/New/100K-4	308.5µ	330.2µ	~	0.200
ReorgOptimizations/NodeFlags/Old/100K-4	60.16µ	60.34µ	~	0.700
ReorgOptimizations/NodeFlags/New/100K-4	20.84µ	20.49µ	~	0.100
TxMapSetIfNotExists-4	52.17n	53.02n	~	0.100
TxMapSetIfNotExistsDuplicate-4	38.26n	38.00n	~	0.400
ChannelSendReceive-4	625.5n	628.5n	~	0.700
DirectSubtreeAdd/4_per_subtree-4	56.58n	56.97n	~	1.000
DirectSubtreeAdd/64_per_subtree-4	29.14n	28.96n	~	1.000
DirectSubtreeAdd/256_per_subtree-4	28.71n	27.67n	~	0.100
DirectSubtreeAdd/1024_per_subtree-4	26.40n	26.51n	~	0.500
DirectSubtreeAdd/2048_per_subtree-4	25.99n	26.03n	~	0.700
SubtreeProcessorAdd/4_per_subtree-4	281.2n	288.7n	~	0.100
SubtreeProcessorAdd/64_per_subtree-4	273.0n	283.9n	~	0.100
SubtreeProcessorAdd/256_per_subtree-4	277.5n	285.8n	~	0.700
SubtreeProcessorAdd/1024_per_subtree-4	267.0n	277.4n	~	0.100
SubtreeProcessorAdd/2048_per_subtree-4	270.1n	276.2n	~	0.100
SubtreeProcessorRotate/4_per_subtree-4	270.3n	281.2n	~	0.100
SubtreeProcessorRotate/64_per_subtree-4	266.9n	278.3n	~	0.100
SubtreeProcessorRotate/256_per_subtree-4	267.9n	277.7n	~	0.100
SubtreeProcessorRotate/1024_per_subtree-4	268.4n	281.5n	~	0.100
SubtreeNodeAddOnly/4_per_subtree-4	55.34n	55.79n	~	0.700
SubtreeNodeAddOnly/64_per_subtree-4	36.53n	36.19n	~	0.800
SubtreeNodeAddOnly/256_per_subtree-4	35.08n	35.24n	~	0.100
SubtreeNodeAddOnly/1024_per_subtree-4	34.48n	34.58n	~	0.100
SubtreeCreationOnly/4_per_subtree-4	110.2n	110.1n	~	1.000
SubtreeCreationOnly/64_per_subtree-4	347.4n	346.3n	~	0.400
SubtreeCreationOnly/256_per_subtree-4	1.206µ	1.234µ	~	0.100
SubtreeCreationOnly/1024_per_subtree-4	3.771µ	3.821µ	~	0.100
SubtreeCreationOnly/2048_per_subtree-4	6.717µ	6.848µ	~	0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	272.5n	278.7n	~	0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	266.9n	277.8n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	542.8µ	1998.5µ	~	0.100
ParallelGetAndSetIfNotExists/10k_nodes-4	1.310m	5.173m	~	0.100
ParallelGetAndSetIfNotExists/50k_nodes-4	6.544m	7.072m	~	0.100
ParallelGetAndSetIfNotExists/100k_nodes-4	13.375m	9.965m	~	0.100
SequentialGetAndSetIfNotExists/1k_nodes-4	628.7µ	1800.2µ	~	0.100
SequentialGetAndSetIfNotExists/10k_nodes-4	3.160m	4.463m	~	0.100
SequentialGetAndSetIfNotExists/50k_nodes-4	11.70m	13.48m	~	0.100
SequentialGetAndSetIfNotExists/100k_nodes-4	22.58m	24.83m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	613.2µ	2079.5µ	~	0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	4.559m	8.558m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	16.83m	13.32m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	686.1µ	1823.0µ	~	0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	6.270m	8.070m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	39.49m	42.57m	~	0.100
CalcBlockWork-4	356.3n	353.9n	~	0.200
CalculateWork-4	472.9n	489.3n	~	0.100
BuildBlockLocatorString_Helpers/Size_10-4	1.371µ	1.349µ	~	0.100
BuildBlockLocatorString_Helpers/Size_100-4	12.99µ	12.93µ	~	0.400
BuildBlockLocatorString_Helpers/Size_1000-4	154.3µ	157.7µ	~	0.700
CatchupWithHeaderCache-4	104.3m	104.4m	~	1.000
_BufferPoolAllocation/16KB-4	3.796µ	4.711µ	~	0.700
_BufferPoolAllocation/32KB-4	9.544µ	7.492µ	~	0.700
_BufferPoolAllocation/64KB-4	16.39µ	14.44µ	~	0.100
_BufferPoolAllocation/128KB-4	22.73µ	23.76µ	~	1.000
_BufferPoolAllocation/512KB-4	98.76µ	93.25µ	~	0.400
_BufferPoolConcurrent/32KB-4	18.26µ	17.47µ	~	0.700
_BufferPoolConcurrent/64KB-4	27.89µ	28.61µ	~	0.700
_BufferPoolConcurrent/512KB-4	142.5µ	139.2µ	~	0.200
_SubtreeDeserializationWithBufferSizes/16KB-4	610.2µ	696.8µ	~	0.100
_SubtreeDeserializationWithBufferSizes/32KB-4	604.8µ	662.7µ	~	0.100
_SubtreeDeserializationWithBufferSizes/64KB-4	598.5µ	639.5µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	585.2µ	585.5µ	~	1.000
_SubtreeDeserializationWithBufferSizes/512KB-4	598.1µ	598.7µ	~	1.000
_SubtreeDataDeserializationWithBufferSizes/16KB-4	36.01m	36.12m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4	35.91m	35.93m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/64KB-4	35.76m	35.80m	~	1.000
_SubtreeDataDeserializationWithBufferSizes/128KB-4	35.81m	35.84m	~	1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4	35.51m	35.29m	~	0.700
_PooledVsNonPooled/Pooled-4	735.0n	736.3n	~	0.800
_PooledVsNonPooled/NonPooled-4	8.326µ	7.145µ	~	0.100
_MemoryFootprint/Current_512KB_32concurrent-4	6.483µ	6.405µ	~	0.400
_MemoryFootprint/Proposed_32KB_32concurrent-4	9.251µ	9.601µ	~	0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4	8.856µ	9.234µ	~	0.200
_prepareTxsPerLevel-4	463.5m	432.5m	~	0.400
_prepareTxsPerLevelOrdered-4	4.380m	5.212m	~	0.100
_prepareTxsPerLevel_Comparison/Original-4	498.2m	429.1m	~	0.100
_prepareTxsPerLevel_Comparison/Optimized-4	4.262m	5.227m	~	0.100
SubtreeSizes/10k_tx_4_per_subtree-4	1.446m	1.371m	~	0.400
SubtreeSizes/10k_tx_16_per_subtree-4	345.4µ	320.6µ	~	0.100
SubtreeSizes/10k_tx_64_per_subtree-4	80.78µ	76.88µ	~	0.100
SubtreeSizes/10k_tx_256_per_subtree-4	19.93µ	19.34µ	~	0.100
SubtreeSizes/10k_tx_512_per_subtree-4	9.976µ	9.550µ	~	0.100
SubtreeSizes/10k_tx_1024_per_subtree-4	4.910µ	4.700µ	~	0.100
SubtreeSizes/10k_tx_2k_per_subtree-4	2.443µ	2.353µ	~	0.100
BlockSizeScaling/10k_tx_64_per_subtree-4	79.15µ	75.50µ	~	0.100
BlockSizeScaling/10k_tx_256_per_subtree-4	19.61µ	18.78µ	~	0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.894µ	4.658µ	~	0.100
BlockSizeScaling/50k_tx_64_per_subtree-4	415.2µ	396.3µ	~	0.100
BlockSizeScaling/50k_tx_256_per_subtree-4	100.76µ	93.76µ	~	0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4	24.31µ	23.34µ	~	0.100
SubtreeAllocations/small_subtrees_exists_check-4	166.5µ	157.7µ	~	0.100
SubtreeAllocations/small_subtrees_data_fetch-4	174.3µ	165.8µ	~	0.100
SubtreeAllocations/small_subtrees_full_validation-4	339.6µ	329.3µ	~	0.100
SubtreeAllocations/medium_subtrees_exists_check-4	10.011µ	9.249µ	~	0.100
SubtreeAllocations/medium_subtrees_data_fetch-4	10.208µ	9.747µ	~	0.100
SubtreeAllocations/medium_subtrees_full_validation-4	20.04µ	19.07µ	~	0.100
SubtreeAllocations/large_subtrees_exists_check-4	2.392µ	2.209µ	~	0.100
SubtreeAllocations/large_subtrees_data_fetch-4	2.483µ	2.350µ	~	0.100
SubtreeAllocations/large_subtrees_full_validation-4	4.986µ	4.772µ	~	0.100
StoreBlock_Sequential/BelowCSVHeight-4	318.5µ	320.1µ	~	0.100
StoreBlock_Sequential/AboveCSVHeight-4	321.0µ	322.6µ	~	1.000
GetUtxoHashes-4	271.2n	275.2n	~	1.000
GetUtxoHashes_ManyOutputs-4	46.16µ	46.02µ	~	0.700
_NewMetaDataFromBytes-4	231.3n	231.3n	~	1.000
_Bytes-4	620.4n	611.8n	~	0.200
_MetaBytes-4	570.3n	561.4n	~	0.100

Threshold: >10% with p < 0.05 | Generated: 2026-05-18 09:44 UTC

Each per-bucket swiss.Map.Clear walks its own ctrl + group arrays and is independent of every other bucket (separate RWMutex, separate storage). On the pooled transactionMap (~614M slots, 1024 buckets, ~19 GB of key/value/ ctrl memory to zero), the previous serial loop was memory-bandwidth-bound and synchronously held up the start of CreateTransactionMap on every block. Dispatch min(NumCPU, nrOfBuckets) workers; each strides through its share of bucket indices and Clear()s them. Stride pattern keeps the share even for any bucket count and avoids a separate chunking step. Single-worker (or single-bucket) path falls through to the previous loop to skip goroutine overhead on tiny maps. No correctness change; existing TestSplitSwissMap_ClearEmptiesAndPreservesUsability continues to pass under -race.

PutMultiBucketTxInpoints previously delegated to per-entry SyncedMap.SetIfNotExists, which acquires and releases the SyncedMap's internal RWMutex on every call. In the bucket-affinity hot path each worker owns one or more buckets exclusively, so the lock is never contended across workers — but the per-entry Lock/Unlock pair is still measurable work (profiling attributed ~17 % of every SetIfNotExists call to Lock/Unlock paths). go-tx-map upstream now exposes SetIfNotExistsMulti (see bsv-blockchain/go-tx-map#114) which takes the write-lock once for a batch of (key, value) pairs and walks both slices, inserting each pair via setUnlocked iff the key is absent. Per-element wasInserted semantics match SetIfNotExists exactly. This change: - Bumps go-tx-map to a pseudo-version pinned to commit e509e98cad on go-tx-map#114. go.mod carries a multi-line comment explaining why we are on a pseudo-version and that it should be replaced with the next tagged release (>= v1.3.6) once the upstream PR ships. - Switches PutMultiBucketTxInpoints to parallel-slice signature (keys []Hash, values []*TxInpoints) so it can pass through to the new bulk method without re-packing. Drops the unused TxInpointsEntry helper struct. - Updates the sole caller bucketShardedGetAndSetIfNotExists to build the parallel slices directly while iterating its bucket-local indices. - Updates the corresponding unit test to use the new signature via a small testInpointPair helper (renamed and tidied so the inline struct literals don't bloat the assertions). No semantics change. Existing tests in services/blockassembly/subtreeprocessor continue to pass under -race; staticcheck and golangci-lint on the package report zero issues.

…on + SplitMapBuckets bounds Two issues surfaced in code review on #877: 1. swapCurrentTxMapBack left the shadow in a corrupted state on rollback. Trace: after resetSubtreeState swaps, workers may partially populate the freshly-current map before moveForwardBlock returns an error. The deferred swapCurrentTxMapBack then swaps the pointers back so callers see pre-reset state in stp.currentTxMap — correct — but the shadow now holds those partial entries. The next resetSubtreeState swap exposes that garbage as the new "current", silently violating the "freshly-current is always empty" invariant. Any later SetIfNotExists whose key happens to collide with a leftover partial entry returns wasSet=false and the corresponding tx is silently dropped from the subtree fill. Fix: Clear the freshly-current map before swapping it back to shadow. Adds work to the rare error path but preserves the invariant unconditionally. Regression test TestDoubleBuffer_SwapBackWipesPartialFailedWrite exercises the exact sequence: seed current → swap → partially populate new current → swapCurrentTxMapBack → assert shadow is empty and original data restored. 2. SplitMapBuckets was narrowed to uint16 in the constructor without range validation. A configured value of 0 fell through to the internal const (legacy 4096, not the documented default), and a value above 65535 silently wrapped modulo 65536 (e.g. 70000 → 4464). The nolint:gosec marker claimed validation existed but none did. Fix: - Bump the internal constant splitMapBuckets from 4096 to 16384 so the fallback value matches the documented setting default. - Add splitMapBucketsMax (65535) for explicit bounds. - Switch on the configured value: <=0 falls back to the default, > 65535 clamps to 65535 with a Warnf log, otherwise pass through. - Drop the nolint marker — the cast is now provably safe. - Update the settings longdesc with an explicit Range section describing each branch. Regression test TestSplitMapBuckets_ConfigClampingAndDefault covers zero, negative, in-range, exact max, above-max, and far-above-max inputs via table-driven cases. No behaviour change on the happy path; full subtreeprocessor test pass under -race remains green; staticcheck and golangci-lint clean on the touched packages.

icellan · 2026-05-15T15:17:03Z

Both review comments addressed in e135f2c27:

1. swapCurrentTxMapBack shadow corruption (SubtreeProcessor.go:4204)

You're right — the partial-failed-write left in the freshly-current map would survive as the shadow and re-emerge as "current" on the next reset, silently dropping any tx whose hash collides with the leftover entry via SetIfNotExists returning wasSet=false. Fixed by calling Clear() on the partial-data map before swapping it back to shadow, so the "freshly-current is always empty" invariant holds unconditionally. Adds work only on the rare error path.

Regression test TestDoubleBuffer_SwapBackWipesPartialFailedWrite walks the exact corruption sequence end-to-end (double_buffer_test.go).

2. SplitMapBuckets silent narrowing (SubtreeProcessor.go:470, settings/blockassembly_settings.go)

You're right on both points. Fix:

Bumped the internal const splitMapBuckets 4096 → 16384 so the fallback matches the documented setting default.
Added explicit splitMapBucketsMax = 65535 and a switch in the constructor: <=0 falls back to default, >65535 clamps to max with a Warnf log, in-range values pass through.
Dropped the //nolint:gosec marker — the cast is now provably safe.
Updated the longdesc with an explicit Range section describing each branch.

Regression test TestSplitMapBuckets_ConfigClampingAndDefault is table-driven across zero, negative, in-range, exact max, above-max, and far-above-max inputs.

Full subtreeprocessor test pass under -race remains green; staticcheck and golangci-lint clean on the touched packages.

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

…on + SplitMapBuckets bounds Two issues surfaced in code review on #877: 1. swapCurrentTxMapBack left the shadow in a corrupted state on rollback. Trace: after resetSubtreeState swaps, workers may partially populate the freshly-current map before moveForwardBlock returns an error. The deferred swapCurrentTxMapBack then swaps the pointers back so callers see pre-reset state in stp.currentTxMap — correct — but the shadow now holds those partial entries. The next resetSubtreeState swap exposes that garbage as the new "current", silently violating the "freshly-current is always empty" invariant. Any later SetIfNotExists whose key happens to collide with a leftover partial entry returns wasSet=false and the corresponding tx is silently dropped from the subtree fill. Fix: Clear the freshly-current map before swapping it back to shadow. Adds work to the rare error path but preserves the invariant unconditionally. Regression test TestDoubleBuffer_SwapBackWipesPartialFailedWrite exercises the exact sequence: seed current → swap → partially populate new current → swapCurrentTxMapBack → assert shadow is empty and original data restored. 2. SplitMapBuckets was narrowed to uint16 in the constructor without range validation. A configured value of 0 fell through to the internal const (legacy 4096, not the documented default), and a value above 65535 silently wrapped modulo 65536 (e.g. 70000 → 4464). The nolint:gosec marker claimed validation existed but none did. Fix: - Bump the internal constant splitMapBuckets from 4096 to 16384 so the fallback value matches the documented setting default. - Add splitMapBucketsMax (65535) for explicit bounds. - Switch on the configured value: <=0 falls back to the default, > 65535 clamps to 65535 with a Warnf log, otherwise pass through. - Drop the nolint marker — the cast is now provably safe. - Update the settings longdesc with an explicit Range section describing each branch. Regression test TestSplitMapBuckets_ConfigClampingAndDefault covers zero, negative, in-range, exact max, above-max, and far-above-max inputs via table-driven cases. No behaviour change on the happy path; full subtreeprocessor test pass under -race remains green; staticcheck and golangci-lint clean on the touched packages.

…p on demand Two issues surfaced in CI code review on #877: 1. addMoveBackBlockNodesToSubtrees Stage A worker advanced its flat (sIdx, i) cursor with a single if-step: i++ then, on overflow, sIdx++ and i=0. If the next subtree slice was empty (or if several consecutive slices were empty), the very next iteration would access subtreesNodes[sIdx][0] with len==0 and panic. flatIndexToSubtreeNode already handled empty slices in its sequential count-down loop, but the worker's increment did not. Replace the single if with a for-loop that consumes any number of empty subtrees in a row. Added regression test TestFlatIndexToSubtreeNode_SkipsEmptySubtrees that walks a contrived subtreesNodes layout with three empty inner slices interleaved with non-empty ones and asserts the flat-to-(sIdx, i) mapping skips them. 2. CreateTransactionMap sized txMapPool lazily on the first block and never grew it afterwards. If the first observed block was modest in size (e.g. node starts during a low-traffic window) and later blocks were much larger, every subsequent block would silently trigger swiss.Map's internal auto-rehash inside the pool — allocating on the hot path and partially negating the pooling win that was the whole point of the change. Track txMapPoolCapacity on the SubtreeProcessor and adopt a high- water-mark policy: - nil pool: allocate at mapSize. - mapSize > capacity: reallocate at the new mapSize, record it as the new capacity. One alloc instead of an in-flight rehash. - else: Clear() in place (existing fast path). Reallocations only happen when block size grows beyond the recorded high-water mark; on steady-state larger blocks the pool is reused indefinitely. Added regression test TestTxMapPool_GrowsOnLargerBlock that asserts a smaller-or-equal request reuses the same *SplitSwissMap pointer (no reallocation) and a larger request returns a fresh pointer with updated capacity. No behaviour change on the happy path. Existing tests under -race remain green; staticcheck and go vet on the touched package clean.

icellan · 2026-05-15T15:59:14Z

Three follow-up Copilot comments addressed in `aff1a0f81`:

1. Empty-subtree iteration panic (`SubtreeProcessor.go:3518`) — fixed

You're right. The addMoveBackBlockNodesToSubtrees Stage A worker advanced its flat (sIdx, i) cursor with a single `if-step` — if the next subtree slice was empty (or several were), the next iteration's `subtreesNodes[sIdx][0]` with `len==0` would panic. `flatIndexToSubtreeNode` already handles empty slices for the chunk-start lookup, but the worker's increment did not.

Replaced the `if` with a `for`-loop that consumes any number of consecutive empty subtrees. Regression test `TestFlatIndexToSubtreeNode_SkipsEmptySubtrees` walks a contrived layout with three empty inner slices interleaved with non-empty ones and asserts the flat-to-`(sIdx, i)` mapping skips them.

2. `txMapPool` sized to first block only (`SubtreeProcessor.go:4957`) — fixed

Excellent catch — the lazy first-block sizing was a real foot-gun. If the first observed block were small (node started during a low-traffic window) and later blocks were much larger, every subsequent block would silently trigger `swiss.Map`'s auto-rehash inside the pool, defeating most of the pooling win.

Tracked `txMapPoolCapacity` on the `SubtreeProcessor` and switched to a high-water-mark policy in `CreateTransactionMap`:

nil pool: allocate at `mapSize`.
`mapSize > capacity`: reallocate at the new `mapSize`, record as new capacity. One alloc instead of an in-flight rehash.
else: `Clear()` in place.

Reallocations only happen when block size grows past the recorded high-water mark; steady-state larger blocks reuse the pool indefinitely. Regression test `TestTxMapPool_GrowsOnLargerBlock` asserts a smaller-or-equal request reuses the same `*SplitSwissMap` pointer (no realloc) and a larger request returns a fresh pointer with updated capacity.

3. Pseudo-version dependency on go-tx-map (`go.mod:203`) — noted

This is the documented un-blocker, not a fix this PR can apply. The pseudo-version pins bsv-blockchain/go-tx-map#114 (`SetIfNotExistsMulti`), which is required by `PutMultiBucketTxInpoints` to take one Lock per bucket instead of N. The 6-line comment in `go.mod` flags this and names the exact replacement step: when go-tx-map#114 merges and a `v1.3.6+` tag is cut, swap the pseudo-version for the tag and delete the comment. Until then this PR is gated on that upstream merge, which is appropriate — the coupling is intentional.

Full `go test -race ./services/blockassembly/subtreeprocessor/` remains green; `staticcheck` and `go vet` clean on the touched package.

…p on demand Two issues surfaced in CI code review on #877: 1. addMoveBackBlockNodesToSubtrees Stage A worker advanced its flat (sIdx, i) cursor with a single if-step: i++ then, on overflow, sIdx++ and i=0. If the next subtree slice was empty (or if several consecutive slices were empty), the very next iteration would access subtreesNodes[sIdx][0] with len==0 and panic. flatIndexToSubtreeNode already handled empty slices in its sequential count-down loop, but the worker's increment did not. Replace the single if with a for-loop that consumes any number of empty subtrees in a row. Added regression test TestFlatIndexToSubtreeNode_SkipsEmptySubtrees that walks a contrived subtreesNodes layout with three empty inner slices interleaved with non-empty ones and asserts the flat-to-(sIdx, i) mapping skips them. 2. CreateTransactionMap sized txMapPool lazily on the first block and never grew it afterwards. If the first observed block was modest in size (e.g. node starts during a low-traffic window) and later blocks were much larger, every subsequent block would silently trigger swiss.Map's internal auto-rehash inside the pool — allocating on the hot path and partially negating the pooling win that was the whole point of the change. Track txMapPoolCapacity on the SubtreeProcessor and adopt a high- water-mark policy: - nil pool: allocate at mapSize. - mapSize > capacity: reallocate at the new mapSize, record it as the new capacity. One alloc instead of an in-flight rehash. - else: Clear() in place (existing fast path). Reallocations only happen when block size grows beyond the recorded high-water mark; on steady-state larger blocks the pool is reused indefinitely. Added regression test TestTxMapPool_GrowsOnLargerBlock that asserts a smaller-or-equal request reuses the same *SplitSwissMap pointer (no reallocation) and a larger request returns a fresh pointer with updated capacity. No behaviour change on the happy path. Existing tests under -race remain green; staticcheck and go vet on the touched package clean.

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated no new comments.

ordishs · 2026-05-16T07:20:15Z

Latent rollback hole in `moveForwardBlock`

services/blockassembly/subtreeprocessor/SubtreeProcessor.go:4167-4177:

if err = stp.resetSubtreeState(createProperlySizedSubtrees); err != nil {
    return nil, nil, errors.NewProcessingError(...)
}

defer func() {
    if err != nil {
        stp.swapCurrentTxMapBack()
    }
}()

The in-memory swap inside resetSubtreeState (line 3900) happens before the call that can fail (stp.newSubtree(subtreeSize) at line 3914). If newSubtree errors, resetSubtreeState returns and moveForwardBlock early-returns at 4168 — before the rollback defer at 4173 is registered. The swap is left committed: currentTxMap points to the empty shadow buffer, the prior block's data is stranded in currentTxMapShadow.

The next moveForwardBlock would then capture the empty pointer as originalCurrentTxMap and processRemainderTxHashes parent lookups would fail with "node X not found in currentTxMap", cascading the failure.

Suggested fix: either register the defer before resetSubtreeState (safe — the in-memory swap is unconditional and atomic) or fold rollback into resetSubtreeState so the function is atomic w.r.t. swap state. A failure-injection test on newSubtree would lock the fix in. TestDoubleBuffer_SwapBackWipesPartialFailedWrite already pins the analogous hazard one layer down, so this is the natural neighbour test.

@ordishs

…inside resetSubtreeState resetSubtreeState performs the in-memory pool swap (currentTxMap ↔ currentTxMapShadow) before calling stp.newSubtree, which can fail (e.g. subtreepkg.NewTreeByLeafCount returns ErrNotPowerOfTwo for a non-power- of-two leaf count). Until now, that failure path left the swap committed: - moveForwardBlock's swap-back defer is registered AFTER the call to resetSubtreeState returns, so an early-return at line 4167 from a resetSubtreeState error never reaches the defer registration. - resetSubtreeState itself returned the error without any compensating action. Net result: currentTxMap pointed to the empty shadow buffer, the prior block's data was stranded in currentTxMapShadow, and the next moveForwardBlock would capture the empty pointer as originalCurrentTxMap — processRemainderTxHashes parent lookups then failed with "node X not found in currentTxMap", cascading the failure across blocks. Make resetSubtreeState atomic w.r.t. the swap. Track whether the swap ran via a local swappedHere bool, and in a deferred guard call stp.swapCurrentTxMapBack when err != nil && swappedHere. The other branches (DiskTxMap, disableCurrentTxMapPool) don't swap and so don't need rollback; the flag ensures we only roll back when we have to. moveForwardBlock's own swap-back defer (registered after a successful resetSubtreeState return) continues to handle failures from processRemainderTransactionsAndDequeue, processCoinbaseUtxos, etc., so each layer is responsible for the rollback of its own swap window. Adds failure-injection test TestResetSubtreeState_RollsBackSwapOnNewSubtreeFailure which sets currentItemsPerFile to 3 (non-power-of-two), seeds the pre- swap map with a sentinel tx, calls resetSubtreeState(true), and asserts that the returned error is surfaced AND that both the current and shadow map pointers round-trip to their pre-call identity, with the sentinel data still visible via currentTxMap. Pins the analogous-to-but-deeper- than TestDoubleBuffer_SwapBackWipesPartialFailedWrite hazard. Reported by @ordishs in #877.

icellan · 2026-05-18T09:30:49Z

@ordishs nice catch — fixed in `77b0059cb`.

You're right: the swap in `resetSubtreeState` is unconditional and happens at line 3900, but `stp.newSubtree(subtreeSize)` at line 3914 can fail (`subtreepkg.NewTreeByLeafCount` returns `ErrNotPowerOfTwo` for a non-power-of-two leaf count). On that error path, `resetSubtreeState` returned to `moveForwardBlock`'s line 4167 early-return — before the swap-back defer at line 4173 was registered — and the swap was left committed.

Took your suggestion of folding the rollback into `resetSubtreeState` (your option B). Approach:

A local `swappedHere bool` flag is set true only in the in-memory swap branch (`DiskTxMap` and `disableCurrentTxMapPool` branches don't swap).
A deferred guard at the top of the function calls `stp.swapCurrentTxMapBack()` iff `err != nil && swappedHere`.
`moveForwardBlock`'s existing rollback defer continues to handle the post-reset window (`processRemainderTransactionsAndDequeue`, `processCoinbaseUtxos`, …). Each layer is now responsible for the rollback of its own swap window.

This option (B) is more defensive than option A (move the caller's defer up): a future addition of any new failure path before the swap in `resetSubtreeState` would otherwise drive `swapCurrentTxMapBack` to swap shadow→current when no swap happened, which would clobber the prior block's data. The flag pins the rollback decision to the actual swap site.

The failure-injection test `TestResetSubtreeState_RollsBackSwapOnNewSubtreeFailure` sets `currentItemsPerFile=3` to make `NewTreeByLeafCount` fail, seeds the pre-swap map with a sentinel tx, calls `resetSubtreeState(true)`, and asserts:

the error is surfaced;
`stp.currentTxMap` round-trips to the pre-call pointer identity;
`stp.currentTxMapShadow` round-trips to the pre-call shadow pointer;
the sentinel tx is still visible via `stp.currentTxMap.Exists(...)`;
the shadow is empty.

This pins the analogous-to-but-deeper-than `TestDoubleBuffer_SwapBackWipesPartialFailedWrite` hazard, as you suggested.

Full `go test -race ./services/blockassembly/subtreeprocessor/` remains green (211 s); `staticcheck` and `go vet` clean on the touched package.

Pre-existing note (not introduced by this PR but worth flagging): the line at 3911-3913 that `cs.Close()`s the old subtree before `stp.newSubtree` runs means a `newSubtree` failure also leaves the prior `currentSubtree` closed and the field still pointing at it. That's orthogonal to the swap rollback and would benefit from its own audit, but I haven't tackled it in this PR — happy to follow up if you want.

@ordishs

…inside resetSubtreeState resetSubtreeState performs the in-memory pool swap (currentTxMap ↔ currentTxMapShadow) before calling stp.newSubtree, which can fail (e.g. subtreepkg.NewTreeByLeafCount returns ErrNotPowerOfTwo for a non-power- of-two leaf count). Until now, that failure path left the swap committed: - moveForwardBlock's swap-back defer is registered AFTER the call to resetSubtreeState returns, so an early-return at line 4167 from a resetSubtreeState error never reaches the defer registration. - resetSubtreeState itself returned the error without any compensating action. Net result: currentTxMap pointed to the empty shadow buffer, the prior block's data was stranded in currentTxMapShadow, and the next moveForwardBlock would capture the empty pointer as originalCurrentTxMap — processRemainderTxHashes parent lookups then failed with "node X not found in currentTxMap", cascading the failure across blocks. Make resetSubtreeState atomic w.r.t. the swap. Track whether the swap ran via a local swappedHere bool, and in a deferred guard call stp.swapCurrentTxMapBack when err != nil && swappedHere. The other branches (DiskTxMap, disableCurrentTxMapPool) don't swap and so don't need rollback; the flag ensures we only roll back when we have to. moveForwardBlock's own swap-back defer (registered after a successful resetSubtreeState return) continues to handle failures from processRemainderTransactionsAndDequeue, processCoinbaseUtxos, etc., so each layer is responsible for the rollback of its own swap window. Adds failure-injection test TestResetSubtreeState_RollsBackSwapOnNewSubtreeFailure which sets currentItemsPerFile to 3 (non-power-of-two), seeds the pre- swap map with a sentinel tx, calls resetSubtreeState(true), and asserts that the returned error is surfaced AND that both the current and shadow map pointers round-trip to their pre-call identity, with the sentinel data still visible via currentTxMap. Pins the analogous-to-but-deeper- than TestDoubleBuffer_SwapBackWipesPartialFailedWrite hazard. Reported by @ordishs in #877.

sonarqubecloud · 2026-05-18T09:42:27Z

Quality Gate failed

Failed conditions
66.8% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

icellan requested review from Copilot, freemans13 and ordishs and removed request for freemans13 and ordishs May 15, 2026 13:25

Copilot started reviewing on behalf of icellan May 15, 2026 13:26 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go

Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated

icellan added 3 commits May 15, 2026 16:34

icellan requested a review from Copilot May 15, 2026 15:23

Copilot started reviewing on behalf of icellan May 15, 2026 15:24 View session

freemans13 approved these changes May 15, 2026

View reviewed changes

Copilot AI reviewed May 15, 2026

View reviewed changes

Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go

Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated

Comment thread go.mod

icellan self-assigned this May 15, 2026

icellan requested a review from Copilot May 15, 2026 16:02

Copilot started reviewing on behalf of icellan May 15, 2026 16:03 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

Merge branch 'main' into perf/block-assembly-chain-movement

25fd070

oskarszoon approved these changes May 18, 2026

View reviewed changes

icellan merged commit 8fea71e into main May 18, 2026
30 checks passed

icellan deleted the perf/block-assembly-chain-movement branch May 18, 2026 11:20

Conversation

icellan commented May 15, 2026

Summary

Changes

Expected effect (to verify post-deploy)

Recommended cluster configuration

Memory cost

Test plan

Out of scope

Uh oh!

github-actions Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Issues Found

Previously Reported Issues (Now Fixed)

Code Quality Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Comparison Report

Summary

Uh oh!

icellan commented May 15, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

icellan commented May 15, 2026

1. Empty-subtree iteration panic (SubtreeProcessor.go:3518) — fixed

2. `txMapPool` sized to first block only (`SubtreeProcessor.go:4957`) — fixed

3. Pseudo-version dependency on go-tx-map (`go.mod:203`) — noted

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

ordishs commented May 16, 2026

Latent rollback hole in moveForwardBlock

Uh oh!

icellan commented May 18, 2026

Uh oh!

sonarqubecloud Bot commented May 18, 2026

Quality Gate failed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

github-actions Bot commented May 15, 2026 •

edited

Loading

github-actions Bot commented May 15, 2026 •

edited

Loading

1. Empty-subtree iteration panic (`SubtreeProcessor.go:3518`) — fixed

Latent rollback hole in `moveForwardBlock`