perf(subtreeprocessor): eliminate SplitSwissMap contention in moveForwardBlock by icellan · Pull Request #1078 · bsv-blockchain/teranode

icellan · 2026-06-12T09:53:15Z

Problem

A CPU profile captured on the scaling cluster (scale-2 block-assembly, 192-core node) during a foreign 420M-tx block's moveForwardBlock showed both hot phases plateauing at ~15M map-ops/s with only ~27/192 cores busy:

CreateTransactionMap: ~375 concurrent per-subtree goroutines each spawned one goroutine per non-empty bucket — all contending on the same 1024 bucket RWMutexes (mutex queueing + 375×1024 goroutine churn).
processRemainderTxHashes / dequeueDuringBlockMovement: ~440M Exists() probes each paying RLock/RUnlock — 11% of all CPU was sync/atomic.(*Int32).Add (readerCount cache-line ping-pong) on a map that is strictly read-only during those phases.

This is the dominant constant cost behind non-mining nodes lagging the miner by a median ~55s per block in "mining on top of the new tip".

Changes

bucketInserter (new): single-writer-per-bucket streaming insert pipeline for CreateTransactionMap. Each bucket is exclusively owned by one worker; submits are batched one message per (submitter, worker). The per-bucket handoff variant left 39% of profile samples in runtime.runqgrab scheduler churn — batching removed it.
SplitSwissMap.Freeze(): called once the insert pipeline drains (the errgroup join + worker join is the happens-before edge). Exists then skips the per-bucket RLock; Put/PutMultiBucket fail loudly on a frozen map; Clear() un-freezes for pooled reuse across blocks.
Slice-indexed buckets: m/mu are []*swiss.Map/[]*sync.RWMutex instead of map[uint16]… — the per-op Go-map lookup cost ~5% of insert CPU.

TDD: baseline benchmarks were written and recorded first (txmap_contention_benchmark_test.go), then map_freeze_test.go / bucket_inserter_test.go red→green, then the wire-in. All new concurrency tests run under -race.

Benchmarks (M3 Max, 16 cores)

Benchmark	Before	After	Delta
PhaseA insert 4M / 64 subtrees	165M tx/s	279M tx/s	1.7×
PhaseA insert 16M / 256 subtrees	126M tx/s	227M tx/s	1.8×
PhaseA insert 40M / 256 subtrees	147M tx/s	195M tx/s	1.3×
PhaseB Exists 16M (all cores)	15.2 ns/op	6.0 ns/op	2.6×
`CreateTransactionMap` e2e	8.13 ms	3.43 ms	2.4×

A 16-core desktop does not reproduce the 192-core contention collapse (~15M ops/s in production vs ~130M+ locally), so the production win should be substantially larger. Post-deploy verification plan: re-capture a 30s CPU profile on a scaling node during a foreign-block moveForwardBlock and compare the "processing subtrees into transaction map DONE" / "remainder tx hashes DONE" log durations against the current 28s/30s.

Also includes a standalone commit pinning pruner_utxoDefensiveEnabled=false in the sql utxostore test setup — TestMinedThenSpendAllPrunes asserts default-mode pruning and flipped when a developer's settings_local.conf set the flag; CI is unchanged.

Verification

go test -race ./services/blockassembly/... — pass
Full repo unit suite (-race -tags testtxmetacache, all packages) — pass after the pruner-test pin
go vet, golangci-lint run (0 issues), staticcheck — clean
No regression at GOMAXPROCS=4 (insert 1.1× faster, Exists strictly fewer ops/probe)

…wardBlock A CPU profile of a foreign 420M-tx block on a 192-core scaling node showed moveForwardBlock plateauing at ~15M map-ops/s with only ~27 cores busy: CreateTransactionMap's ~375 concurrent subtree goroutines each spawned one goroutine per bucket, all contending on the 1024 bucket RWMutexes, and the ~440M Exists probes in processRemainderTxHashes each paid RLock/RUnlock readerCount atomics (11% of all CPU) on a map that is read-only by then. Changes: - bucketInserter: single-writer-per-bucket streaming insert pipeline for CreateTransactionMap. Each bucket is owned by exactly one worker, so the per-bucket mutex is never contended. Submits are batched one message per (submitter, worker) - the per-bucket variant showed 39% of samples in runtime.runqgrab scheduler churn. - SplitSwissMap.Freeze(): after the insert pipeline drains, the map is frozen; Exists skips the per-bucket RLock, writes fail loudly, Clear() un-freezes for pooled reuse across blocks. - SplitSwissMap buckets are slice-indexed instead of map[uint16]-keyed (the per-op Go-map lookup cost ~5% of insert CPU). Benchmarks (M3 Max, 16 cores; baselines in txmap_contention_benchmark_test.go): - insert 16M hashes/256 subtrees: 126M -> 227M tx/s (1.8x) - insert 40M hashes/256 subtrees: 147M -> 195M tx/s (1.3x) - Exists 16M, all cores: 15.2ns -> 6.0ns per probe (2.6x) - CreateTransactionMap e2e: 8.13ms -> 3.43ms (2.4x) The production win is larger: the old shape collapsed under contention at 192 cores in a way a 16-core box does not reproduce.

…est setup TestMinedThenSpendAllPrunes asserts that the pruner deletes a fully-spent mined tx in default pruning mode. A developer settings_local.conf setting pruner_utxoDefensiveEnabled=true leaks into local test runs and flips the assertion. Pin the flag in the two store setup helpers so the suite is deterministic regardless of local config; CI behaviour is unchanged (default is already false).

github-actions · 2026-06-12T09:57:09Z

🤖 Claude Code Review

Status: Complete

Current Review: No new issues found.

I focused on the concurrency-critical parts of this perf change and the design holds up:

bucketInserter (single-writer-per-bucket): worker w exclusively owns buckets {b : b%numWorkers==w}, so the per-bucket mutex inside PutMultiBucket is taken but never contended. Concurrent submit() → channel sends are safe; workers drain until close, so submitters cannot deadlock (the frozen-error drain path is explicitly tested). closeAndWait is idempotent via CompareAndSwap.
Freeze() + lock-free Exists: the happens-before edge is sound — g.Wait() joins all submitters, closeAndWait’s wg.Wait() joins all writer workers, then Freeze() stores the atomic. A reader observing frozen==true transitively sees every bucket write. The non-frozen Exists path still RLocks, so existing concurrent-with-writer callers are unaffected.
Clear() un-freeze for pool reuse: correct given the established single-goroutine, sequential block-movement lifecycle (read phase of block N completes before block N+1’s CreateTransactionMap→Clear). Writes after freeze fail loudly rather than silently corrupting the lock-free read path.
Slice-indexed buckets: safe — the bucket router was already the indexing contract; out-of-range would have panicked under the old map[uint16] shape too.
sql test pins (pruner_utxoDefensiveEnabled=false): sensible determinism fix; CI default is already false, so no behavior change.

The lock-free read’s safety rests on the freeze→read→clear ordering invariant (call-site discipline, not a type guarantee). This is well-documented in map.go and already raised in @freemans13’s open thread — worth considering the suggested debug assertion/metric on rejected frozen writes as future insurance, but non-blocking.

History:

✅ Fixed: Clear() worker pool now sized by GOMAXPROCS(0) instead of NumCPU() (commit 36ff3dd9f) — resolves my earlier consistency note.
✅ Addressed: test goroutine require→t.Errorf fix and bucketInserter GOMAXPROCS sizing landed in follow-up commits.

github-actions · 2026-06-12T10:08:30Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1078 (9cb590e)

Summary

Regressions: 0
Improvements: 0
Unchanged: 132
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.813µ	1.792µ	~	1.000
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	61.69n	61.64n	~	1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	61.63n	61.80n	~	0.100
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	61.60n	61.65n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	30.79n	31.00n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	53.10n	52.26n	~	1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	107.1n	105.6n	~	0.100
MiningCandidate_Stringify_Short-4	261.4n	262.4n	~	0.800
MiningCandidate_Stringify_Long-4	1.903µ	1.906µ	~	0.300
MiningSolution_Stringify-4	971.6n	971.1n	~	0.700
BlockInfo_MarshalJSON-4	1.790µ	1.789µ	~	1.000
NewFromBytes-4	125.5n	123.7n	~	0.100
AddTxBatchColumnar_Validation-4	2.414µ	2.370µ	~	0.700
OffsetValidationLoop-4	725.7n	719.9n	~	0.400
Mine_EasyDifficulty-4	56.54µ	55.60µ	~	1.000
Mine_WithAddress-4	5.797µ	5.818µ	~	1.000
DirectSubtreeAdd/4_per_subtree-4	58.30n	59.21n	~	0.400
DirectSubtreeAdd/64_per_subtree-4	30.11n	29.89n	~	0.400
DirectSubtreeAdd/256_per_subtree-4	29.07n	28.89n	~	0.100
DirectSubtreeAdd/1024_per_subtree-4	28.21n	28.08n	~	0.700
DirectSubtreeAdd/2048_per_subtree-4	27.69n	27.57n	~	0.100
SubtreeProcessorAdd/4_per_subtree-4	239.4n	240.7n	~	0.300
SubtreeProcessorAdd/64_per_subtree-4	236.1n	236.5n	~	0.400
SubtreeProcessorAdd/256_per_subtree-4	236.4n	238.1n	~	0.100
SubtreeProcessorAdd/1024_per_subtree-4	235.4n	236.3n	~	1.000
SubtreeProcessorAdd/2048_per_subtree-4	230.5n	231.1n	~	0.700
SubtreeProcessorRotate/4_per_subtree-4	234.8n	232.2n	~	0.100
SubtreeProcessorRotate/64_per_subtree-4	232.8n	231.0n	~	0.400
SubtreeProcessorRotate/256_per_subtree-4	232.6n	229.5n	~	0.700
SubtreeProcessorRotate/1024_per_subtree-4	232.9n	232.9n	~	0.700
SubtreeNodeAddOnly/4_per_subtree-4	56.26n	54.13n	~	0.100
SubtreeNodeAddOnly/64_per_subtree-4	34.43n	34.24n	~	0.100
SubtreeNodeAddOnly/256_per_subtree-4	33.40n	33.34n	~	1.000
SubtreeNodeAddOnly/1024_per_subtree-4	32.62n	32.63n	~	0.800
SubtreeCreationOnly/4_per_subtree-4	114.9n	113.6n	~	0.100
SubtreeCreationOnly/64_per_subtree-4	399.0n	393.5n	~	0.100
SubtreeCreationOnly/256_per_subtree-4	1.337µ	1.431µ	~	0.100
SubtreeCreationOnly/1024_per_subtree-4	4.380µ	4.680µ	~	0.100
SubtreeCreationOnly/2048_per_subtree-4	7.960µ	8.198µ	~	0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	230.8n	232.7n	~	0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	230.7n	233.7n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	9.040m	10.364m	~	0.100
ParallelGetAndSetIfNotExists/10k_nodes-4	13.17m	13.09m	~	1.000
ParallelGetAndSetIfNotExists/50k_nodes-4	15.96m	16.02m	~	0.700
ParallelGetAndSetIfNotExists/100k_nodes-4	19.28m	19.54m	~	0.400
SequentialGetAndSetIfNotExists/1k_nodes-4	9.278m	11.542m	~	0.100
SequentialGetAndSetIfNotExists/10k_nodes-4	13.40m	15.98m	~	0.100
SequentialGetAndSetIfNotExists/50k_nodes-4	24.31m	21.45m	~	0.200
SequentialGetAndSetIfNotExists/100k_nodes-4	29.88m	29.04m	~	0.700
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	9.368m	12.319m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	16.87m	19.28m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	20.91m	21.48m	~	1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	12.94m	13.11m	~	1.000
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	16.36m	16.79m	~	0.700
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	61.06m	56.28m	~	0.100
DiskTxMap_SetIfNotExists-4	3.567µ	3.614µ	~	0.700
DiskTxMap_SetIfNotExists_Parallel-4	3.351µ	3.384µ	~	1.000
DiskTxMap_ExistenceOnly-4	326.7n	322.9n	~	0.700
Queue-4	188.9n	190.7n	~	0.400
AtomicPointer-4	4.730n	4.483n	~	0.700
TxMapSetIfNotExists-4	52.92n	52.81n	~	1.000
TxMapSetIfNotExistsDuplicate-4	40.01n	40.29n	~	0.200
ChannelSendReceive-4	584.5n	579.5n	~	0.100
BlockAssembler_AddTx-4	0.02933n	0.03126n	~	0.400
AddNode-4	11.05	11.11	~	0.400
AddNodeWithMap-4	11.31	11.59	~	0.700
CalcBlockWork-4	473.2n	469.7n	~	0.700
CalculateWork-4	641.9n	643.4n	~	1.000
CheckOldBlockIDs/on-chain-prefetch/1000-4	63.91µ	57.75µ	~	0.400
CheckOldBlockIDs/off-chain-prefetch/1000-4	47.99µ	48.40µ	~	0.200
CheckOldBlockIDs/on-chain-prefetch/10000-4	428.6µ	436.9µ	~	0.400
CheckOldBlockIDs/off-chain-prefetch/10000-4	343.4µ	350.9µ	~	0.200
BuildBlockLocatorString_Helpers/Size_10-4	1.393µ	1.500µ	~	0.100
BuildBlockLocatorString_Helpers/Size_100-4	13.26µ	14.29µ	~	0.100
BuildBlockLocatorString_Helpers/Size_1000-4	130.6µ	136.5µ	~	0.100
CatchupWithHeaderCache-4	104.6m	104.7m	~	0.700
_prepareTxsPerLevel-4	409.9m	432.0m	~	0.200
_prepareTxsPerLevelOrdered-4	3.858m	4.402m	~	0.400
_prepareTxsPerLevel_Comparison/Original-4	425.0m	421.5m	~	0.200
_prepareTxsPerLevel_Comparison/Optimized-4	3.824m	4.015m	~	0.200
SubtreeSizes/10k_tx_4_per_subtree-4	1.340m	1.322m	~	0.700
SubtreeSizes/10k_tx_16_per_subtree-4	313.5µ	312.7µ	~	0.700
SubtreeSizes/10k_tx_64_per_subtree-4	75.46µ	74.62µ	~	0.400
SubtreeSizes/10k_tx_256_per_subtree-4	18.57µ	18.63µ	~	0.700
SubtreeSizes/10k_tx_512_per_subtree-4	9.249µ	9.246µ	~	0.700
SubtreeSizes/10k_tx_1024_per_subtree-4	4.584µ	4.613µ	~	0.200
SubtreeSizes/10k_tx_2k_per_subtree-4	2.308µ	2.315µ	~	1.000
BlockSizeScaling/10k_tx_64_per_subtree-4	73.27µ	73.00µ	~	0.700
BlockSizeScaling/10k_tx_256_per_subtree-4	18.44µ	18.34µ	~	0.700
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.623µ	4.595µ	~	1.000
BlockSizeScaling/50k_tx_64_per_subtree-4	386.8µ	399.2µ	~	0.700
BlockSizeScaling/50k_tx_256_per_subtree-4	92.66µ	92.91µ	~	0.400
BlockSizeScaling/50k_tx_1024_per_subtree-4	23.08µ	22.73µ	~	0.700
SubtreeAllocations/small_subtrees_exists_check-4	155.1µ	156.5µ	~	0.400
SubtreeAllocations/small_subtrees_data_fetch-4	163.0µ	161.1µ	~	0.100
SubtreeAllocations/small_subtrees_full_validation-4	324.3µ	321.0µ	~	0.400
SubtreeAllocations/medium_subtrees_exists_check-4	9.258µ	9.193µ	~	0.400
SubtreeAllocations/medium_subtrees_data_fetch-4	9.642µ	9.595µ	~	1.000
SubtreeAllocations/medium_subtrees_full_validation-4	18.72µ	18.84µ	~	0.100
SubtreeAllocations/large_subtrees_exists_check-4	2.207µ	2.205µ	~	1.000
SubtreeAllocations/large_subtrees_data_fetch-4	2.333µ	2.351µ	~	1.000
SubtreeAllocations/large_subtrees_full_validation-4	4.676µ	4.681µ	~	1.000
_BufferPoolAllocation/16KB-4	3.827µ	3.838µ	~	0.700
_BufferPoolAllocation/32KB-4	8.821µ	8.027µ	~	0.700
_BufferPoolAllocation/64KB-4	17.25µ	15.63µ	~	0.400
_BufferPoolAllocation/128KB-4	33.52µ	26.53µ	~	0.100
_BufferPoolAllocation/512KB-4	111.7µ	112.6µ	~	0.400
_BufferPoolConcurrent/32KB-4	18.70µ	18.84µ	~	0.700
_BufferPoolConcurrent/64KB-4	29.99µ	29.14µ	~	1.000
_BufferPoolConcurrent/512KB-4	146.0µ	150.4µ	~	0.100
_SubtreeDeserializationWithBufferSizes/16KB-4	629.2µ	679.8µ	~	0.100
_SubtreeDeserializationWithBufferSizes/32KB-4	613.5µ	604.1µ	~	0.700
_SubtreeDeserializationWithBufferSizes/64KB-4	651.2µ	616.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	656.7µ	608.1µ	~	0.100
_SubtreeDeserializationWithBufferSizes/512KB-4	663.4µ	635.7µ	~	0.200
_SubtreeDataDeserializationWithBufferSizes/16KB-4	36.38m	36.83m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4	36.68m	36.73m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/64KB-4	36.53m	36.47m	~	1.000
_SubtreeDataDeserializationWithBufferSizes/128KB-4	36.16m	36.82m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4	35.95m	36.39m	~	0.100
_PooledVsNonPooled/Pooled-4	831.1n	838.5n	~	0.100
_PooledVsNonPooled/NonPooled-4	8.429µ	8.009µ	~	0.100
_MemoryFootprint/Current_512KB_32concurrent-4	6.720µ	6.756µ	~	1.000
_MemoryFootprint/Proposed_32KB_32concurrent-4	10.30µ	10.94µ	~	0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4	10.20µ	10.06µ	~	0.400
StoreBlock_Sequential/BelowCSVHeight-4	329.9µ	327.6µ	~	0.200
StoreBlock_Sequential/AboveCSVHeight-4	330.7µ	335.1µ	~	0.200
GetUtxoHashes-4	256.6n	257.6n	~	1.000
GetUtxoHashes_ManyOutputs-4	52.03µ	49.48µ	~	0.100
_NewMetaDataFromBytes-4	224.2n	222.7n	~	0.100
_Bytes-4	425.2n	421.1n	~	0.700
_MetaBytes-4	136.2n	137.1n	~	0.700

Threshold: >10% with p < 0.05 | Generated: 2026-06-12 11:10 UTC

Copilot

Pull request overview

This PR targets a major scalability bottleneck in subtreeprocessor during foreign-block moveForwardBlock by reducing contention and lock overhead in the transaction-hash membership map used across Phase A (insertion) and Phase B (mass Exists() probing).

Changes:

Introduces a bucketInserter pipeline to enforce single-writer-per-bucket inserts during CreateTransactionMap, avoiding RWMutex contention and excessive goroutine churn.
Adds SplitSwissMap.Freeze() + pooled reuse behavior (Clear() unfreezes) so Phase B Exists() probes can skip per-bucket read locks when the map is known read-only.
Adds benchmarks and concurrency tests to validate and quantify the contention fixes; also pins Pruner.UTXODefensiveEnabled=false in SQL UTXO store tests to avoid developer-local config affecting assertions.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
stores/utxo/sql/unlock_batcher_postgres_test.go	Pins pruning mode in test settings to prevent local config from flipping pruning behavior.
stores/utxo/sql/sql_test.go	Same pruning-mode pin for shared SQL store test setup.
services/blockassembly/subtreeprocessor/txmap_contention_benchmark_test.go	Adds contention-reproduction + comparison benchmarks for Phase A inserts and Phase B `Exists()`.
services/blockassembly/subtreeprocessor/SubtreeProcessor.go	Wires in `bucketInserter`, joins inserter workers reliably, and freezes the map post-build.
services/blockassembly/subtreeprocessor/map.go	Switches buckets to slice-indexed storage and adds `Freeze()`/frozen fast path for `Exists()` plus write rejection.
services/blockassembly/subtreeprocessor/map_freeze_test.go	Adds tests for freeze correctness and lock-free concurrent read behavior.
services/blockassembly/subtreeprocessor/bucket_inserter.go	Implements the single-writer-per-bucket insert pipeline with batched submit semantics.
services/blockassembly/subtreeprocessor/bucket_inserter_test.go	Adds concurrency and misuse tests for the inserter.
services/blockassembly/subtreeprocessor/benchmark.go	Updates remainder benchmark to `Freeze()` the map like production.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ordishs

Reviewed the single-writer-per-bucket inserter, the Freeze/Clear lifecycle, and the slice-indexed buckets.

Verified independently:

No production writes to the returned transactionMap after Freeze() — only Exists and Iter (reorg loop), so the freeze contract holds.
The happens-before chain is sound: g.Wait() → closeAndWait() (channel close + worker join) → Freeze(), so lock-free readers see fully published bucket contents.
closeAndWait() runs even when g.Wait() errors, and workers drain after an insert error, so no goroutine leak or submitter deadlock — both covered by tests.
Ran go vet and the new concurrency tests under -race locally: pass.

Minor, non-blocking: submit()'s closed.Load() check only catches sequential misuse — a truly concurrent closeAndWait() would still panic on send-to-closed-channel. The doc comment states the contract correctly; a one-liner noting the guard's limits would help future readers. Also worth remembering that frozen reads make any future reader/Clear() overlap a data race rather than just a logic bug — the serialized block-movement loop is the invariant protecting this.

Benchmark methodology is solid (old and new shapes side by side, shared cases, -short gating) and the post-deploy profile re-capture plan is the right way to confirm the 192-core win.

…oroutine asserts Size the bucketInserter worker pool with runtime.GOMAXPROCS(0) instead of runtime.NumCPU(). The workers are CPU-bound, so the pool should track runnable parallelism; under a cgroup CPU quota (the normal container/k8s deployment) NumCPU() reports host cores and oversubscribes. Matches the existing GOMAXPROCS sizing in map.go and the errgroup limit in moveBlock. Also replace goroutine-local require.NoError with t.Errorf in the inserter and freeze tests: require's FailNow is only safe on the test goroutine.

Clear()'s parallel per-bucket clear sized its worker pool with runtime.NumCPU(), oversubscribing under a cgroup CPU quota. Use runtime.GOMAXPROCS(0) to match ParallelBulkSetIfNotExists and the bucketInserter sizing — the clear is bandwidth-bound and should track runnable parallelism, not host cores.

freemans13

Approve. Correct, well-tested, and unusually well-documented concurrency work — the single-writer-per-bucket insert pipeline plus the freeze-based lock-free read path is the right shape for the profiled 192-core contention collapse, and the happens-before reasoning (g.Wait → closeAndWait join → Freeze) is sound. Build passes; freeze lifecycle verified against all live callers and pool-reuse paths.

One non-blocking inline note on hardening the freeze invariant. Recommend landing with the production profile re-capture tracked as follow-up, and optionally splitting out the unrelated pruner-test pin.

sonarqubecloud · 2026-06-12T11:06:18Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
95.5% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

…hase Bump go-tx-map to v1.4.0, which adds the Freeze/Clear lifecycle to the map interfaces, and use it in Block.Valid. Block.Valid is two-phase over b.txMap: checkDuplicateTransactions fills it (one Put per tx), then validOrderAndBlessed only reads it — one Get per tx plus one per parent, fanned out across every core. Each Get takes a per-bucket RWMutex.RLock, whose reader-counter atomic add/sub cache-line ping-pongs across cores and dominates the read phase on many-core validation nodes (the same contention bsv-blockchain#1078 removed in block assembly's moveForwardBlock). Block.Valid now calls b.txMap.Freeze() after the duplicate-check phase has completed and flushed, before validOrderAndBlessed. On the pooled in-memory SplitSwissMapUint64 this makes every read skip the RLock; releaseTxMap's PutTxMap -> Clear path un-freezes the map on its way back to the pool. checkDuplicateTransactions' internal errgroup.Wait is the happens-before edge guaranteeing all writes precede the Freeze. DiskTxMapUint64 (the disk-backed alternative, used when DiskMapDirs is set) implements the two new TxMap methods: Freeze is a write guard (Put returns txmap.ErrMapFrozen); Clear only lifts the freeze, since the disk map is single-use (released via Close, not pooled) and content-clearing is neither needed nor cheaply implementable over its multi-disk Badger + cuckoo layout. Also fixes a pre-existing data race (present on main, independent of this change but in the same file and surfaced by go test -race ./model): Stats reads dtmDiskShard.bytesWritten from the deferred releaseTxMap on the dup-detected early-return path while a writer goroutine is still incrementing it. Made bytesWritten an atomic.Int64.

…hase Bump go-tx-map to v1.4.0, which adds the Freeze/Clear lifecycle to the map interfaces, and use it in Block.Valid. Block.Valid is two-phase over b.txMap: checkDuplicateTransactions fills it (one Put per tx), then validOrderAndBlessed only reads it — one Get per tx plus one per parent, fanned out across every core. Each Get takes a per-bucket RWMutex.RLock, whose reader-counter atomic add/sub cache-line ping-pongs across cores and dominates the read phase on many-core validation nodes (the same contention bsv-blockchain#1078 removed in block assembly's moveForwardBlock). Block.Valid now calls b.txMap.Freeze() after the duplicate-check phase has completed and flushed, before validOrderAndBlessed. On the pooled in-memory SplitSwissMapUint64 this makes every read skip the RLock; releaseTxMap's PutTxMap -> Clear path un-freezes the map on its way back to the pool. checkDuplicateTransactions' internal errgroup.Wait is the happens-before edge guaranteeing all writes precede the Freeze. DiskTxMapUint64 (the mmap-backed alternative from bsv-blockchain#1053, used when DiskMapDirs is set) implements the two new TxMap methods: Freeze is a write guard (Put returns txmap.ErrMapFrozen); Clear only lifts the freeze, since the disk map is ephemeral and single-use (released via Close, not pooled). Its mmap reads are not made lock-free — the in-memory split map is the contended read path Freeze targets.

icellan added 2 commits June 12, 2026 11:50

github-actions Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread services/blockassembly/subtreeprocessor/bucket_inserter.go

icellan requested review from Copilot, freemans13 and ordishs June 12, 2026 10:10

Copilot started reviewing on behalf of icellan June 12, 2026 10:11 View session

icellan mentioned this pull request Jun 12, 2026

bench(subtreeprocessor): foreign-block move benchmark + inverted-diff gate verdict #1079

Closed

Copilot AI reviewed Jun 12, 2026

View reviewed changes

ordishs approved these changes Jun 12, 2026

View reviewed changes

github-actions Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread services/blockassembly/subtreeprocessor/map.go Outdated

freemans13 reviewed Jun 12, 2026

View reviewed changes

Comment thread services/blockassembly/subtreeprocessor/map.go

freemans13 approved these changes Jun 12, 2026

View reviewed changes

icellan merged commit 840a8a1 into bsv-blockchain:main Jun 12, 2026
34 checks passed

icellan mentioned this pull request Jun 12, 2026

perf(blockvalidation): freeze txMap before the read-only validation phase #1082

Merged

icellan self-assigned this Jun 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(subtreeprocessor): eliminate SplitSwissMap contention in moveForwardBlock#1078

perf(subtreeprocessor): eliminate SplitSwissMap contention in moveForwardBlock#1078
icellan merged 4 commits into
bsv-blockchain:mainfrom
icellan:perf/splitswissmap-contention

icellan commented Jun 12, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 12, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ordishs left a comment

Uh oh!

Uh oh!

Uh oh!

freemans13 left a comment

Uh oh!

sonarqubecloud Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

icellan commented Jun 12, 2026

Problem

Changes

Benchmarks (M3 Max, 16 cores)

Verification

Uh oh!

Uh oh!

github-actions Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Comparison Report

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ordishs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

freemans13 left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Jun 12, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions Bot commented Jun 12, 2026 •

edited

Loading

github-actions Bot commented Jun 12, 2026 •

edited

Loading