perf(blockassembly): recycle tx maps + bucket-sharded inserts to cut moveForwardBlock latency#877
Conversation
…ps and bucket-sharded inserts
Profiling block-assembly on a 558M-tx workload showed processNewBlockAnnouncement
spending 26-33 s wall-clock between block landing and BA being ready to mine on
top — the entirety of the "lost revenue" window between two blocks. CPU
profile attribution:
- ~39 % runtime GC (scanObject/scanSpan/tryDeferToSpanScan/spanClass)
driven by per-block fresh allocation of a ~614M-entry SplitSwissMap
(transactionMap in CreateTransactionMap) and a fresh 4096-shard
SplitTxInpointsMap (currentTxMap in resetSubtreeState)
- ~17 % inside SyncedMap.SetIfNotExists RWMutex Lock/Unlock paths —
real per-bucket contention with 375 workers
- effective parallelism ~3.5 of 170 requested cores during the phase
This change attacks every measured source of slack across moveForwardBlock,
moveBackBlock, and reorgBlocks.
1. Pool the transactionMap (SplitSwissMap). Adds SplitSwissMap.Clear() that
walks each per-bucket swiss.Map and calls its in-place Clear (which retains
capacity by zeroing control + group arrays — no allocation). Adds a
SubtreeProcessor.txMapPool field allocated lazily on the first
CreateTransactionMap call and Clear()ed between blocks. The returned map
is purely local to one moveForwardBlock and never escapes, so single-
goroutine ownership is sufficient.
2. Double-buffer the currentTxMap (SplitTxInpointsMap). Adds a
currentTxMapShadow field initialized at constructor time alongside the
active map. resetSubtreeState now swaps the two pointers instead of
allocating a fresh map; moveForwardBlock Clear()s the shadow after
processCoinbaseUtxos succeeds. An internal deferred rollback inside
moveForwardBlock swaps back if any post-reset step fails, preserving the
captured originalCurrentTxMap pointer for the surrounding rollback code at
the callers' three rollback sites (line 745 main-loop, lines 2760/2801
reorg paths) — those sites continue to do
stp.currentTxMap = originalCurrentTxMap as a harmless no-op.
3. Bucket-affinity sharding in parallelGetAndSetIfNotExists. Adds
SplitTxInpointsMap.PutMultiBucketTxInpoints and BucketFor. The hot path
now bucket-partitions input nodes (sequential pre-filter for coinbase +
removeMap; sequential bucket count + slice fill) and dispatches one
goroutine per non-empty bucket, capped by ProcessRemainderTxHashesConcurrency.
Every destination bucket has exactly one writer — the per-bucket RWMutex
is uncontended across workers, eliminating the 17 % mutex tax shown in
profiling. The legacy per-node parallel scheme is retained as a fallback
for the DiskTxMap path under legacyParallelGetAndSetIfNotExists. Bumps
splitMapBuckets default from 4096 to 16384 via the new
BlockAssembly.SplitMapBuckets setting; smaller per-bucket maps Clear
faster and any residual contention drops further. Total memory unchanged.
4. processRemainderTxHashes Phase 1 — drop the literal 16-worker cap. The
previous min(NumCPU, n/100, 16) hard-limited Phase 1 to 16 workers on a
170-core pod when processing 30M-tx remainders. New
min(NumCPU, max(2, n/1024)) chunks input in 1K blocks and scales with
NumCPU. No new setting.
5. Block + mutex profiling. Adds top-level BlockProfileRate and
MutexProfileFraction settings (default 0 = off; recommended on-cluster
values 1000000 and 100). startProfilerAndMetrics now wires both into the
Go runtime via SetBlockProfileRate / SetMutexProfileFraction. Necessary
to quantify any residual wait time directly rather than infer from
CPU-side mutex samples — both endpoints previously reported Count=0.
6. moveBackBlock parallel txInpoints insert. The previous serial
addNode loops in moveBackBlockCreateNewSubtrees ran ~558M
SetIfNotExists + AddSubtreeNode calls one at a time — a multi-minute
serial path during reorgs. New helper addMoveBackBlockNodesToSubtrees
splits the work into:
Stage A — parallel chunked SetIfNotExists into currentTxMap, tracking
per-node wasInserted using flatIndexToSubtreeNode for chunk-start
translation. The map pointers alias the input subtreeMetaTxInpoints
slice elements directly (no per-node allocation, same semantics as the
original addNode call).
Stage B — sequential subtree fill in canonical order calling
addNodePreValidated (already existed in the codebase from the
moveForward path). Duplicates from Stage A's wasInserted=false are
skipped to match the original addNode's wasSet==false branch exactly.
Falls back to the original serial path for DiskTxMap or sub-
ParallelSetIfNotExistsThreshold inputs.
7. reorgBlocks rollback contract preservation. reorgBlocks captures
originalCurrentTxMap once and expects the pointer to remain valid for the
full multi-block loop so rollback can restore it. The double-buffer swap
pattern breaks that contract across multiple iterations. Added a
disableCurrentTxMapPool flag toggled by reorgBlocks for the duration of
its call — resetSubtreeState then falls back to fresh allocation
(original behaviour) and swapCurrentTxMapBack / clearCurrentTxMapShadow
become no-ops. Single-goroutine, no synchronisation needed.
New unit tests cover:
- SplitSwissMap.Clear empties every bucket and preserves usability for
further inserts.
- SplitTxInpointsMap.PutMultiBucketTxInpoints honours duplicate handling
matching SetIfNotExists.
- SplitTxInpointsMap.Clear preserves bucket structure.
- Double-buffer swap-then-clear lifecycle: swap → shadow holds prior data,
current is empty; swap-back restores original pointer identity;
clear-shadow on commit leaves the shadow empty.
- disableCurrentTxMapPool=true reduces swap-back and clear-shadow to
no-ops, preserving the reorg rollback contract.
Existing tests in services/blockassembly/subtreeprocessor pass unchanged under
-race. No code outside blockassembly, settings, daemon changes.
Expected effect: moveForwardBlock wall-clock should drop from 26-33 s toward
under 10 s on a 500M-tx block at scale-2 settings; CreateTransactionMap from
~11 s toward 2-4 s; processing N remainder tx hashes from ~20 s toward 5-8 s.
GC-related funcs in CPU profiles should fall from cumulative ~39 % to single
digits; RWMutex Lock/Unlock under SyncedMap.SetIfNotExists from 17 % to near
zero.
|
🤖 Claude Code Review Status: Complete SummaryThis PR implements significant performance optimizations for block assembly by recycling transaction maps and introducing bucket-sharded parallel inserts. The implementation is technically sound with comprehensive test coverage and proper error handling. Issues Found[Minor] Dependency on pre-release version Recommendation: Update Remove the explanatory comment at lines 190-195 once updated. Previously Reported Issues (Now Fixed)All previously flagged issues have been addressed with proper fixes and test coverage:
Code Quality NotesStrengths:
Performance Impact: Overall Assessment: Well-engineered performance optimization with one minor dependency hygiene issue. Recommend updating to stable |
There was a problem hiding this comment.
Pull request overview
Reduces moveForwardBlock latency on the block-assembly critical path by recycling the per-block transactionMap (SplitSwissMap.Clear), double-buffering the currentTxMap (SplitTxInpointsMap), partitioning the parallel SetIfNotExists workload by destination bucket so each bucket has a single writer, lifting the literal 16-worker cap in processRemainderTxHashes, parallelising the moveBackBlock map insert, and exposing block/mutex profiling via new top-level settings. Bucket count is promoted from a const to the configurable BlockAssembly.SplitMapBuckets (default raised 4096 → 16384).
Changes:
- Pool
transactionMapand double-buffercurrentTxMap, with adisableCurrentTxMapPoolflag to keep the multi-block reorg rollback contract intact. - Bucket-affinity sharding for
parallelGetAndSetIfNotExistsplus a parallel two-stageaddMoveBackBlockNodesToSubtrees. - New top-level
BlockProfileRate/MutexProfileFractionsettings wired intoruntime.SetBlockProfileRate/SetMutexProfileFraction.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| settings/settings.go | Wires new profiler_blockProfileRate, profiler_mutexProfileFraction, and blockassembly_splitMapBuckets config keys. |
| settings/interface.go | Adds BlockProfileRate / MutexProfileFraction fields with docs. |
| settings/blockassembly_settings.go | Adds SplitMapBuckets field with docs (claims "Must be in (0, 65535]"). |
| daemon/daemon_services.go | Enables runtime block / mutex profiling when configured. |
| services/blockassembly/subtreeprocessor/map.go | Adds SplitSwissMap.Clear, SplitTxInpointsMap.Buckets/BucketFor/PutMultiBucketTxInpoints. |
| services/blockassembly/subtreeprocessor/SubtreeProcessor.go | Pools tx map, double-buffers currentTxMap, bucket-shards parallel inserts, parallelises moveBack subtree fill, removes 16-worker cap, adds disableCurrentTxMapPool for reorgs. |
| services/blockassembly/subtreeprocessor/map_test.go | Tests for Clear and bulk-bucket insert helpers. |
| services/blockassembly/subtreeprocessor/double_buffer_test.go | Tests for double-buffer swap/clear and disable-via-flag. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-18 09:44 UTC |
Each per-bucket swiss.Map.Clear walks its own ctrl + group arrays and is independent of every other bucket (separate RWMutex, separate storage). On the pooled transactionMap (~614M slots, 1024 buckets, ~19 GB of key/value/ ctrl memory to zero), the previous serial loop was memory-bandwidth-bound and synchronously held up the start of CreateTransactionMap on every block. Dispatch min(NumCPU, nrOfBuckets) workers; each strides through its share of bucket indices and Clear()s them. Stride pattern keeps the share even for any bucket count and avoids a separate chunking step. Single-worker (or single-bucket) path falls through to the previous loop to skip goroutine overhead on tiny maps. No correctness change; existing TestSplitSwissMap_ClearEmptiesAndPreservesUsability continues to pass under -race.
PutMultiBucketTxInpoints previously delegated to per-entry SyncedMap.SetIfNotExists, which acquires and releases the SyncedMap's internal RWMutex on every call. In the bucket-affinity hot path each worker owns one or more buckets exclusively, so the lock is never contended across workers — but the per-entry Lock/Unlock pair is still measurable work (profiling attributed ~17 % of every SetIfNotExists call to Lock/Unlock paths). go-tx-map upstream now exposes SetIfNotExistsMulti (see bsv-blockchain/go-tx-map#114) which takes the write-lock once for a batch of (key, value) pairs and walks both slices, inserting each pair via setUnlocked iff the key is absent. Per-element wasInserted semantics match SetIfNotExists exactly. This change: - Bumps go-tx-map to a pseudo-version pinned to commit e509e98cad on go-tx-map#114. go.mod carries a multi-line comment explaining why we are on a pseudo-version and that it should be replaced with the next tagged release (>= v1.3.6) once the upstream PR ships. - Switches PutMultiBucketTxInpoints to parallel-slice signature (keys []Hash, values []*TxInpoints) so it can pass through to the new bulk method without re-packing. Drops the unused TxInpointsEntry helper struct. - Updates the sole caller bucketShardedGetAndSetIfNotExists to build the parallel slices directly while iterating its bucket-local indices. - Updates the corresponding unit test to use the new signature via a small testInpointPair helper (renamed and tidied so the inline struct literals don't bloat the assertions). No semantics change. Existing tests in services/blockassembly/subtreeprocessor continue to pass under -race; staticcheck and golangci-lint on the package report zero issues.
…on + SplitMapBuckets bounds Two issues surfaced in code review on #877: 1. swapCurrentTxMapBack left the shadow in a corrupted state on rollback. Trace: after resetSubtreeState swaps, workers may partially populate the freshly-current map before moveForwardBlock returns an error. The deferred swapCurrentTxMapBack then swaps the pointers back so callers see pre-reset state in stp.currentTxMap — correct — but the shadow now holds those partial entries. The next resetSubtreeState swap exposes that garbage as the new "current", silently violating the "freshly-current is always empty" invariant. Any later SetIfNotExists whose key happens to collide with a leftover partial entry returns wasSet=false and the corresponding tx is silently dropped from the subtree fill. Fix: Clear the freshly-current map before swapping it back to shadow. Adds work to the rare error path but preserves the invariant unconditionally. Regression test TestDoubleBuffer_SwapBackWipesPartialFailedWrite exercises the exact sequence: seed current → swap → partially populate new current → swapCurrentTxMapBack → assert shadow is empty and original data restored. 2. SplitMapBuckets was narrowed to uint16 in the constructor without range validation. A configured value of 0 fell through to the internal const (legacy 4096, not the documented default), and a value above 65535 silently wrapped modulo 65536 (e.g. 70000 → 4464). The nolint:gosec marker claimed validation existed but none did. Fix: - Bump the internal constant splitMapBuckets from 4096 to 16384 so the fallback value matches the documented setting default. - Add splitMapBucketsMax (65535) for explicit bounds. - Switch on the configured value: <=0 falls back to the default, > 65535 clamps to 65535 with a Warnf log, otherwise pass through. - Drop the nolint marker — the cast is now provably safe. - Update the settings longdesc with an explicit Range section describing each branch. Regression test TestSplitMapBuckets_ConfigClampingAndDefault covers zero, negative, in-range, exact max, above-max, and far-above-max inputs via table-driven cases. No behaviour change on the happy path; full subtreeprocessor test pass under -race remains green; staticcheck and golangci-lint clean on the touched packages.
|
Both review comments addressed in 1. You're right — the partial-failed-write left in the freshly-current map would survive as the shadow and re-emerge as "current" on the next reset, silently dropping any tx whose hash collides with the leftover entry via Regression test 2. You're right on both points. Fix:
Regression test Full subtreeprocessor test pass under |
…on + SplitMapBuckets bounds Two issues surfaced in code review on #877: 1. swapCurrentTxMapBack left the shadow in a corrupted state on rollback. Trace: after resetSubtreeState swaps, workers may partially populate the freshly-current map before moveForwardBlock returns an error. The deferred swapCurrentTxMapBack then swaps the pointers back so callers see pre-reset state in stp.currentTxMap — correct — but the shadow now holds those partial entries. The next resetSubtreeState swap exposes that garbage as the new "current", silently violating the "freshly-current is always empty" invariant. Any later SetIfNotExists whose key happens to collide with a leftover partial entry returns wasSet=false and the corresponding tx is silently dropped from the subtree fill. Fix: Clear the freshly-current map before swapping it back to shadow. Adds work to the rare error path but preserves the invariant unconditionally. Regression test TestDoubleBuffer_SwapBackWipesPartialFailedWrite exercises the exact sequence: seed current → swap → partially populate new current → swapCurrentTxMapBack → assert shadow is empty and original data restored. 2. SplitMapBuckets was narrowed to uint16 in the constructor without range validation. A configured value of 0 fell through to the internal const (legacy 4096, not the documented default), and a value above 65535 silently wrapped modulo 65536 (e.g. 70000 → 4464). The nolint:gosec marker claimed validation existed but none did. Fix: - Bump the internal constant splitMapBuckets from 4096 to 16384 so the fallback value matches the documented setting default. - Add splitMapBucketsMax (65535) for explicit bounds. - Switch on the configured value: <=0 falls back to the default, > 65535 clamps to 65535 with a Warnf log, otherwise pass through. - Drop the nolint marker — the cast is now provably safe. - Update the settings longdesc with an explicit Range section describing each branch. Regression test TestSplitMapBuckets_ConfigClampingAndDefault covers zero, negative, in-range, exact max, above-max, and far-above-max inputs via table-driven cases. No behaviour change on the happy path; full subtreeprocessor test pass under -race remains green; staticcheck and golangci-lint clean on the touched packages.
…p on demand Two issues surfaced in CI code review on #877: 1. addMoveBackBlockNodesToSubtrees Stage A worker advanced its flat (sIdx, i) cursor with a single if-step: i++ then, on overflow, sIdx++ and i=0. If the next subtree slice was empty (or if several consecutive slices were empty), the very next iteration would access subtreesNodes[sIdx][0] with len==0 and panic. flatIndexToSubtreeNode already handled empty slices in its sequential count-down loop, but the worker's increment did not. Replace the single if with a for-loop that consumes any number of empty subtrees in a row. Added regression test TestFlatIndexToSubtreeNode_SkipsEmptySubtrees that walks a contrived subtreesNodes layout with three empty inner slices interleaved with non-empty ones and asserts the flat-to-(sIdx, i) mapping skips them. 2. CreateTransactionMap sized txMapPool lazily on the first block and never grew it afterwards. If the first observed block was modest in size (e.g. node starts during a low-traffic window) and later blocks were much larger, every subsequent block would silently trigger swiss.Map's internal auto-rehash inside the pool — allocating on the hot path and partially negating the pooling win that was the whole point of the change. Track txMapPoolCapacity on the SubtreeProcessor and adopt a high- water-mark policy: - nil pool: allocate at mapSize. - mapSize > capacity: reallocate at the new mapSize, record it as the new capacity. One alloc instead of an in-flight rehash. - else: Clear() in place (existing fast path). Reallocations only happen when block size grows beyond the recorded high-water mark; on steady-state larger blocks the pool is reused indefinitely. Added regression test TestTxMapPool_GrowsOnLargerBlock that asserts a smaller-or-equal request reuses the same *SplitSwissMap pointer (no reallocation) and a larger request returns a fresh pointer with updated capacity. No behaviour change on the happy path. Existing tests under -race remain green; staticcheck and go vet on the touched package clean.
|
Three follow-up Copilot comments addressed in `aff1a0f81`: 1. Empty-subtree iteration panic (
|
…p on demand Two issues surfaced in CI code review on #877: 1. addMoveBackBlockNodesToSubtrees Stage A worker advanced its flat (sIdx, i) cursor with a single if-step: i++ then, on overflow, sIdx++ and i=0. If the next subtree slice was empty (or if several consecutive slices were empty), the very next iteration would access subtreesNodes[sIdx][0] with len==0 and panic. flatIndexToSubtreeNode already handled empty slices in its sequential count-down loop, but the worker's increment did not. Replace the single if with a for-loop that consumes any number of empty subtrees in a row. Added regression test TestFlatIndexToSubtreeNode_SkipsEmptySubtrees that walks a contrived subtreesNodes layout with three empty inner slices interleaved with non-empty ones and asserts the flat-to-(sIdx, i) mapping skips them. 2. CreateTransactionMap sized txMapPool lazily on the first block and never grew it afterwards. If the first observed block was modest in size (e.g. node starts during a low-traffic window) and later blocks were much larger, every subsequent block would silently trigger swiss.Map's internal auto-rehash inside the pool — allocating on the hot path and partially negating the pooling win that was the whole point of the change. Track txMapPoolCapacity on the SubtreeProcessor and adopt a high- water-mark policy: - nil pool: allocate at mapSize. - mapSize > capacity: reallocate at the new mapSize, record it as the new capacity. One alloc instead of an in-flight rehash. - else: Clear() in place (existing fast path). Reallocations only happen when block size grows beyond the recorded high-water mark; on steady-state larger blocks the pool is reused indefinitely. Added regression test TestTxMapPool_GrowsOnLargerBlock that asserts a smaller-or-equal request reuses the same *SplitSwissMap pointer (no reallocation) and a larger request returns a fresh pointer with updated capacity. No behaviour change on the happy path. Existing tests under -race remain green; staticcheck and go vet on the touched package clean.
Latent rollback hole in
|
…inside resetSubtreeState
resetSubtreeState performs the in-memory pool swap (currentTxMap ↔
currentTxMapShadow) before calling stp.newSubtree, which can fail (e.g.
subtreepkg.NewTreeByLeafCount returns ErrNotPowerOfTwo for a non-power-
of-two leaf count). Until now, that failure path left the swap committed:
- moveForwardBlock's swap-back defer is registered AFTER the call to
resetSubtreeState returns, so an early-return at line 4167 from a
resetSubtreeState error never reaches the defer registration.
- resetSubtreeState itself returned the error without any compensating
action.
Net result: currentTxMap pointed to the empty shadow buffer, the prior
block's data was stranded in currentTxMapShadow, and the next
moveForwardBlock would capture the empty pointer as originalCurrentTxMap
— processRemainderTxHashes parent lookups then failed with "node X not
found in currentTxMap", cascading the failure across blocks.
Make resetSubtreeState atomic w.r.t. the swap. Track whether the swap
ran via a local swappedHere bool, and in a deferred guard call
stp.swapCurrentTxMapBack when err != nil && swappedHere. The other
branches (DiskTxMap, disableCurrentTxMapPool) don't swap and so don't
need rollback; the flag ensures we only roll back when we have to.
moveForwardBlock's own swap-back defer (registered after a successful
resetSubtreeState return) continues to handle failures from
processRemainderTransactionsAndDequeue, processCoinbaseUtxos, etc., so
each layer is responsible for the rollback of its own swap window.
Adds failure-injection test TestResetSubtreeState_RollsBackSwapOnNewSubtreeFailure
which sets currentItemsPerFile to 3 (non-power-of-two), seeds the pre-
swap map with a sentinel tx, calls resetSubtreeState(true), and asserts
that the returned error is surfaced AND that both the current and shadow
map pointers round-trip to their pre-call identity, with the sentinel
data still visible via currentTxMap. Pins the analogous-to-but-deeper-
than TestDoubleBuffer_SwapBackWipesPartialFailedWrite hazard.
Reported by @ordishs in #877.
|
@ordishs nice catch — fixed in `77b0059cb`. You're right: the swap in `resetSubtreeState` is unconditional and happens at line 3900, but `stp.newSubtree(subtreeSize)` at line 3914 can fail (`subtreepkg.NewTreeByLeafCount` returns `ErrNotPowerOfTwo` for a non-power-of-two leaf count). On that error path, `resetSubtreeState` returned to `moveForwardBlock`'s line 4167 early-return — before the swap-back defer at line 4173 was registered — and the swap was left committed. Took your suggestion of folding the rollback into `resetSubtreeState` (your option B). Approach:
This option (B) is more defensive than option A (move the caller's defer up): a future addition of any new failure path before the swap in `resetSubtreeState` would otherwise drive `swapCurrentTxMapBack` to swap shadow→current when no swap happened, which would clobber the prior block's data. The flag pins the rollback decision to the actual swap site. The failure-injection test `TestResetSubtreeState_RollsBackSwapOnNewSubtreeFailure` sets `currentItemsPerFile=3` to make `NewTreeByLeafCount` fail, seeds the pre-swap map with a sentinel tx, calls `resetSubtreeState(true)`, and asserts:
This pins the analogous-to-but-deeper-than `TestDoubleBuffer_SwapBackWipesPartialFailedWrite` hazard, as you suggested. Full `go test -race ./services/blockassembly/subtreeprocessor/` remains green (211 s); `staticcheck` and `go vet` clean on the touched package. Pre-existing note (not introduced by this PR but worth flagging): the line at 3911-3913 that `cs.Close()`s the old subtree before `stp.newSubtree` runs means a `newSubtree` failure also leaves the prior `currentSubtree` closed and the field still pointing at it. That's orthogonal to the swap rollback and would benefit from its own audit, but I haven't tackled it in this PR — happy to follow up if you want. |
…inside resetSubtreeState
resetSubtreeState performs the in-memory pool swap (currentTxMap ↔
currentTxMapShadow) before calling stp.newSubtree, which can fail (e.g.
subtreepkg.NewTreeByLeafCount returns ErrNotPowerOfTwo for a non-power-
of-two leaf count). Until now, that failure path left the swap committed:
- moveForwardBlock's swap-back defer is registered AFTER the call to
resetSubtreeState returns, so an early-return at line 4167 from a
resetSubtreeState error never reaches the defer registration.
- resetSubtreeState itself returned the error without any compensating
action.
Net result: currentTxMap pointed to the empty shadow buffer, the prior
block's data was stranded in currentTxMapShadow, and the next
moveForwardBlock would capture the empty pointer as originalCurrentTxMap
— processRemainderTxHashes parent lookups then failed with "node X not
found in currentTxMap", cascading the failure across blocks.
Make resetSubtreeState atomic w.r.t. the swap. Track whether the swap
ran via a local swappedHere bool, and in a deferred guard call
stp.swapCurrentTxMapBack when err != nil && swappedHere. The other
branches (DiskTxMap, disableCurrentTxMapPool) don't swap and so don't
need rollback; the flag ensures we only roll back when we have to.
moveForwardBlock's own swap-back defer (registered after a successful
resetSubtreeState return) continues to handle failures from
processRemainderTransactionsAndDequeue, processCoinbaseUtxos, etc., so
each layer is responsible for the rollback of its own swap window.
Adds failure-injection test TestResetSubtreeState_RollsBackSwapOnNewSubtreeFailure
which sets currentItemsPerFile to 3 (non-power-of-two), seeds the pre-
swap map with a sentinel tx, calls resetSubtreeState(true), and asserts
that the returned error is surfaced AND that both the current and shadow
map pointers round-trip to their pre-call identity, with the sentinel
data still visible via currentTxMap. Pins the analogous-to-but-deeper-
than TestDoubleBuffer_SwapBackWipesPartialFailedWrite hazard.
Reported by @ordishs in #877.
|


Summary
Block-assembly's
processNewBlockAnnouncementis on the critical path between a competing miner's block landing and BA being ready to mine on top — every second of it is lost expected revenue. On a 558M-tx workload (scale-2) the window measured 26-33 s wall-clock per block; CPU profiling attributed:scanObject/scanSpan/tryDeferToSpanScan/spanClass.sizeclass), driven by per-block fresh allocation of a ~614 M-entrySplitSwissMap(transactionMapinCreateTransactionMap) and a fresh 4096-shardSplitTxInpointsMap(currentTxMapinresetSubtreeState).SyncedMap.SetIfNotExistscall insideRWMutex.Lock/Unlock— real per-bucket contention with the default 375 workers.This PR attacks every measured source of slack across
moveForwardBlock,moveBackBlock, andreorgBlocks. No semantics change.Changes
SplitSwissMap). AddsClear()(in-place per-bucketswiss.Map.Clear), a lazySubtreeProcessor.txMapPoolfield;CreateTransactionMapnowClear()s the pool instead of allocating fresh per block.map.go,SubtreeProcessor.goSplitTxInpointsMap). Adds a pre-allocatedcurrentTxMapShadow;resetSubtreeStateswaps pointers,moveForwardBlockClear()s the shadow at commit. An internal deferred rollback insidemoveForwardBlockswaps back if any post-reset step fails, preserving the caller'soriginalCurrentTxMap-based rollback contract.SubtreeProcessor.goparallelGetAndSetIfNotExists. AddsPutMultiBucketTxInpoints+BucketFor. The hot path bucket-partitions input nodes and dispatches one goroutine per non-empty destination bucket — every bucket has exactly one writer, so the per-bucketRWMutexis uncontended across workers. The legacy per-node parallel scheme is kept as a fallback for the DiskTxMap path. PromotessplitMapBucketsfrom aconsttoBlockAssembly.SplitMapBuckets(default bumped from 4096 → 16384).map.go,SubtreeProcessor.go,settings/blockassembly_settings.go,settings/settings.goprocessRemainderTxHashesPhase 1. The oldmin(NumCPU, n/100, 16)hard-limited Phase 1 to 16 workers on a 170-core pod; newmin(NumCPU, max(2, n/1024))scales linearly withNumCPU.SubtreeProcessor.goBlockProfileRateandMutexProfileFractionsettings (default 0 = off).startProfilerAndMetricswires both intoruntime.SetBlockProfileRate/SetMutexProfileFraction. Required to quantify residual wait time directly — both endpoints previously reportedCount=0.settings/interface.go,settings/settings.go,daemon/daemon_services.goaddMoveBackBlockNodesToSubtreessplits the previous serialaddNodeloop into Stage A (parallel chunkedSetIfNotExistsintocurrentTxMap) and Stage B (sequential subtree fill via the existingaddNodePreValidated, skippingwasInserted=falseduplicates exactly like the originaladdNode). Same fallback for DiskTxMap / sub-threshold inputs.SubtreeProcessor.goreorgBlockscapturesoriginalCurrentTxMaponce and expects the pointer to remain valid for the full multi-block loop. NewdisableCurrentTxMapPoolflag toggled byreorgBlocksfor the duration of its call —resetSubtreeStatethen falls back to fresh allocation (original behaviour) andswapCurrentTxMapBack/clearCurrentTxMapShadowbecome no-ops. Single-goroutine driven; no synchronisation needed.SubtreeProcessor.goExpected effect (to verify post-deploy)
processNewBlockAnnouncement DONE in XCreateTransactionMap DONEprocessing N remainder tx hashes DONERWMutex.Lock/UnlockunderSyncedMap.SetIfNotExistsRecommended cluster configuration
For dev-scale-2 verification, set in the block-assembly configmap:
SubtreeProcessorConcurrentReadsandProcessRemainderTxHashesConcurrencyalready default to 375 — no change needed.Memory cost
txMapPool: ~40 GB resident vs. ~40 GB allocated-and-freed per block before — same peak, allocation rate drops to near zero.currentTxMapShadow: holds peak ~32 MTxInpointspointers between blocks. Estimated 1-2 GB. Pod limit 950 Gi; current peak ~263 GB. Plenty of headroom.Test plan
go build ./...cleango vet ./...cleanstaticcheckon touched packages cleangolangci-lint runon touched packages clean (one pre-existing prealloc warning in unrelateddaemon/test_daemon.go)go test -race ./services/blockassembly/subtreeprocessor/passed (213 s)go test -race ./services/blockassembly/mining/,./settings/,./daemon/all passedTestSplitSwissMap_ClearEmptiesAndPreservesUsabilityTestSplitTxInpointsMap_PutMultiBucketTxInpointsRespectsExistingEntriesTestSplitTxInpointsMap_ClearPreservesBucketStructureTestDoubleBuffer_SwapAndClearLifecycleTestDoubleBuffer_DisableViaFlagmoveForwardBlock DONE in Xdrops under 10 s on the next 500M-tx blockRWMutexshare approaches zero/debug/pprof/blockand/debug/pprof/mutexfor any new top wait callersdisableCurrentTxMapPoolrollback path end-to-endOut of scope
services/blockvalidation/...).Block.Validruns in background and does not gate the next block being mined.AddSubtreeNodefill inmoveBackBlock. If it becomes the new bottleneck post-deploy, follow-up.