Skip to content

fix(model): replace O(N²) prefix sum in checkDuplicateTransactionsInSubtree (regression from #198)#904

Closed
oskarszoon wants to merge 2 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/900-dedup-quadratic
Closed

fix(model): replace O(N²) prefix sum in checkDuplicateTransactionsInSubtree (regression from #198)#904
oskarszoon wants to merge 2 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/900-dedup-quadratic

Conversation

@oskarszoon

Copy link
Copy Markdown
Contributor

Closes #900.

Summary

Block.checkDuplicateTransactionsInSubtree walked all preceding subtrees on every invocation, calling Subtree.Size() per subtree — an O(N) prefix sum executed under checkDuplicateTransactionsConcurrency workers in parallel, producing O(N²) atomic RWMutex traffic. On a 450k-subtree block this saturated 11.6 cores in Size() lock churn with effectively zero useful work.

Regression introduced in 58a68d376 (PR #198).

Fix

Replace the O(N) loop with baseIdx := subIdx * subtreeSize. The "all subtrees need to be the same size as the first tree, except the last one" invariant (model/Block.go:1289) means every subtree before subIdx is full-size — the smaller-last-subtree never appears in the sum. Closed-form computation is O(1) per call.

Comment quotes the invariant inline so a future drift in the invariant's location doesn't decouple from the formula.

Test plan

  • TestCheckDuplicateTransactions_LargeBlockNotQuadratic — synthetic 20k-subtree block; verifies wall-time well under a 30s budget. (Local: ~18ms.)
  • TestCheckDuplicateTransactionsInSubtree_BaseIndexCorrectness — 4 subtests covering subIdx=0, smaller-last-subtree, subtreeSize=1 degenerate, cross-subtree duplicate detection.
  • Pre-existing dedup tests from 4364: Test with blocks which have early duplicates which are fully sp… #198 (early-duplicate and fully-spent-duplicate paths) continue to pass.

go test -race ./model/... clean. make lint-new clean.

…ubtree (regression from bsv-blockchain#198)

The loop summing b.SubtreeSlices[i].Size() for i < subIdx was O(N) per
subtree, making the full checkDuplicateTransactions pass O(N²). Each
Size() acquires a read lock, so at N=450k subtrees this saturates all
cores with lock contention.

The loop comment claiming (subIdx * subtreeSize) is invalid was wrong:
baseIdx is the sum of subtrees BEFORE subIdx. Per Block.go:1293, every
subtree except the last must be full-size, so all subtrees before
subIdx are full-size regardless of whether subIdx itself is the smaller
last subtree. The O(1) formula is always correct.

Adds TestCheckDuplicateTransactions_LargeBlockNotQuadratic (N=20k
subtrees, completes in <1s vs tens of seconds with the old code) and
TestCheckDuplicateTransactionsInSubtree_BaseIndexCorrectness covering
subIdx=0, smaller last subtree, subtreeSize=1, and cross-subtree dup
detection.
@github-actions

github-actions Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Review Summary

This PR correctly fixes a critical O(N²) performance regression in duplicate transaction checking. The change is mathematically sound, well-tested, and follows project conventions.

Findings: No issues found.

Analysis:

The fix replaces an O(N) loop that summed all predecessor subtree sizes with a closed-form O(1) calculation (baseIdx = subIdx * subtreeSize). This is correct because:

  1. The invariant at model/Block.go:1291 guarantees all subtrees except the last have identical size
  2. The smaller-last-subtree can never appear in the sum when calculating baseIdx for any subtree
  3. The PR includes comprehensive test coverage verifying:
    • Performance improvement (20k subtrees in <30s vs. previously saturating 11.6 cores)
    • Correctness across edge cases (subIdx=0, smaller-last-subtree, degenerate subtreeSize=1)
    • Duplicate detection still functions across subtree boundaries

The inline comment correctly quotes the invariant and explains the optimization, satisfying the "Philosophy" requirement from AGENTS.md to make tradeoffs explicit.

@sonarqubecloud

Copy link
Copy Markdown

@oskarszoon oskarszoon requested a review from ordishs May 19, 2026 19:07
@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-904 (9d14d6d)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.559µ 1.554µ ~ 1.000
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.16n 71.64n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.16n 71.24n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.30n 71.25n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 31.57n 32.75n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 53.75n 53.97n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 133.3n 136.4n ~ 1.000
MiningCandidate_Stringify_Short-4 216.6n 215.2n ~ 0.100
MiningCandidate_Stringify_Long-4 1.617µ 1.602µ ~ 0.100
MiningSolution_Stringify-4 844.7n 845.6n ~ 0.700
BlockInfo_MarshalJSON-4 1.721µ 1.719µ ~ 0.400
NewFromBytes-4 138.9n 124.5n ~ 0.100
AddTxBatchColumnar_Validation-4 2.490µ 2.426µ ~ 0.100
OffsetValidationLoop-4 544.2n 730.9n ~ 0.100
Mine_EasyDifficulty-4 60.38µ 60.92µ ~ 0.400
Mine_WithAddress-4 7.137µ 6.807µ ~ 0.100
BlockAssembler_AddTx-4 0.02846n 0.02841n ~ 0.700
AddNode-4 10.83 11.02 ~ 0.400
AddNodeWithMap-4 11.69 11.93 ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 58.72n 61.39n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 31.69n 28.72n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 30.65n 27.41n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 29.38n 26.62n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 28.93n 26.14n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 280.4n 282.8n ~ 0.400
SubtreeProcessorAdd/64_per_subtree-4 277.8n 276.9n ~ 1.000
SubtreeProcessorAdd/256_per_subtree-4 283.8n 278.9n ~ 0.100
SubtreeProcessorAdd/1024_per_subtree-4 273.0n 271.6n ~ 0.200
SubtreeProcessorAdd/2048_per_subtree-4 272.3n 270.0n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 276.5n 275.1n ~ 0.200
SubtreeProcessorRotate/64_per_subtree-4 274.0n 273.2n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 275.9n 273.0n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 274.1n 274.2n ~ 0.800
SubtreeNodeAddOnly/4_per_subtree-4 54.67n 54.75n ~ 0.700
SubtreeNodeAddOnly/64_per_subtree-4 34.55n 34.48n ~ 0.200
SubtreeNodeAddOnly/256_per_subtree-4 33.59n 33.64n ~ 0.700
SubtreeNodeAddOnly/1024_per_subtree-4 33.06n 32.94n ~ 1.000
SubtreeCreationOnly/4_per_subtree-4 115.9n 115.0n ~ 0.400
SubtreeCreationOnly/64_per_subtree-4 416.3n 417.8n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.400µ 1.388µ ~ 0.400
SubtreeCreationOnly/1024_per_subtree-4 4.480µ 4.473µ ~ 1.000
SubtreeCreationOnly/2048_per_subtree-4 8.281µ 8.281µ ~ 1.000
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 275.0n 272.0n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 272.8n 270.2n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 2.191m 2.207m ~ 0.200
ParallelGetAndSetIfNotExists/10k_nodes-4 5.354m 5.419m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.105m 7.111m ~ 1.000
ParallelGetAndSetIfNotExists/100k_nodes-4 9.620m 9.610m ~ 0.400
SequentialGetAndSetIfNotExists/1k_nodes-4 1.947m 1.962m ~ 0.200
SequentialGetAndSetIfNotExists/10k_nodes-4 4.386m 4.310m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 12.34m 12.00m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 22.66m 21.72m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.248m 2.267m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.199m 8.175m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 12.79m 12.82m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.988m 1.985m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.516m 7.450m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 39.69m 39.37m ~ 0.400
DiskTxMap_SetIfNotExists-4 3.637µ 3.560µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.296µ 3.258µ ~ 0.400
DiskTxMap_ExistenceOnly-4 319.1n 313.9n ~ 0.400
Queue-4 187.3n 187.9n ~ 0.100
AtomicPointer-4 4.514n 4.647n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 856.6µ 846.7µ ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/10K-4 803.6µ 791.3µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 103.9µ 103.9µ ~ 0.700
ReorgOptimizations/AllMarkFalse/New/10K-4 62.73µ 62.20µ ~ 0.200
ReorgOptimizations/HashSlicePool/Old/10K-4 54.34µ 54.57µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.75µ 12.09µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/10K-4 4.823µ 4.551µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.658µ 1.577µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.626m 9.697m ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.891m 9.847m ~ 1.000
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.117m 1.076m ~ 0.400
ReorgOptimizations/AllMarkFalse/New/100K-4 686.2µ 694.7µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/100K-4 544.5µ 549.0µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/100K-4 342.3µ 342.9µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 47.99µ 50.50µ ~ 0.200
ReorgOptimizations/NodeFlags/New/100K-4 15.64µ 17.54µ ~ 0.100
TxMapSetIfNotExists-4 52.55n 51.95n ~ 0.700
TxMapSetIfNotExistsDuplicate-4 39.82n 40.25n ~ 0.200
ChannelSendReceive-4 614.0n 593.2n ~ 0.100
CalcBlockWork-4 541.4n 495.3n ~ 0.200
CalculateWork-4 689.6n 694.5n ~ 0.200
BuildBlockLocatorString_Helpers/Size_10-4 1.360µ 1.385µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_100-4 14.00µ 13.05µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_1000-4 127.7µ 155.1µ ~ 0.400
CatchupWithHeaderCache-4 104.4m 104.6m ~ 0.400
_prepareTxsPerLevel-4 421.6m 441.7m ~ 0.200
_prepareTxsPerLevelOrdered-4 3.545m 4.342m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 422.2m 447.6m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.734m 4.340m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.332m 1.363m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 320.8µ 316.0µ ~ 0.100
SubtreeSizes/10k_tx_64_per_subtree-4 76.16µ 76.24µ ~ 1.000
SubtreeSizes/10k_tx_256_per_subtree-4 18.98µ 19.07µ ~ 1.000
SubtreeSizes/10k_tx_512_per_subtree-4 9.357µ 9.359µ ~ 0.700
SubtreeSizes/10k_tx_1024_per_subtree-4 4.661µ 4.658µ ~ 0.700
SubtreeSizes/10k_tx_2k_per_subtree-4 2.328µ 2.345µ ~ 0.200
BlockSizeScaling/10k_tx_64_per_subtree-4 74.16µ 73.98µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 18.71µ 18.74µ ~ 1.000
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.628µ 4.688µ ~ 0.300
BlockSizeScaling/50k_tx_64_per_subtree-4 388.7µ 393.1µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 93.27µ 93.21µ ~ 0.700
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.31µ 23.23µ ~ 0.700
SubtreeAllocations/small_subtrees_exists_check-4 162.6µ 157.2µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 159.4µ 161.6µ ~ 0.700
SubtreeAllocations/small_subtrees_full_validation-4 321.5µ 329.2µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 9.667µ 9.456µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 9.397µ 9.730µ ~ 0.400
SubtreeAllocations/medium_subtrees_full_validation-4 18.82µ 18.66µ ~ 0.700
SubtreeAllocations/large_subtrees_exists_check-4 2.319µ 2.234µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.276µ 2.312µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.715µ 4.752µ ~ 0.200
_BufferPoolAllocation/16KB-4 2.677µ 3.428µ ~ 0.400
_BufferPoolAllocation/32KB-4 6.488µ 5.814µ ~ 0.700
_BufferPoolAllocation/64KB-4 12.19µ 12.46µ ~ 1.000
_BufferPoolAllocation/128KB-4 26.03µ 24.59µ ~ 0.100
_BufferPoolAllocation/512KB-4 90.07µ 85.35µ ~ 0.400
_BufferPoolConcurrent/32KB-4 13.42µ 14.43µ ~ 0.400
_BufferPoolConcurrent/64KB-4 20.84µ 21.66µ ~ 0.200
_BufferPoolConcurrent/512KB-4 107.9µ 104.9µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/16KB-4 476.5µ 482.0µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 472.7µ 480.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 477.9µ 474.7µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/128KB-4 456.6µ 464.9µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 456.3µ 456.1µ ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/16KB-4 27.64m 27.28m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/32KB-4 27.46m 27.40m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/64KB-4 27.67m 27.11m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 27.24m 27.06m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 27.54m 27.43m ~ 0.200
_PooledVsNonPooled/Pooled-4 643.3n 643.1n ~ 1.000
_PooledVsNonPooled/NonPooled-4 5.532µ 5.891µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 5.108µ 5.065µ ~ 0.700
_MemoryFootprint/Proposed_32KB_32concurrent-4 7.056µ 7.204µ ~ 0.200
_MemoryFootprint/Alternative_64KB_32concurrent-4 7.178µ 7.384µ ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 337.2µ 333.4µ ~ 0.100
StoreBlock_Sequential/AboveCSVHeight-4 339.0µ 337.4µ ~ 1.000
GetUtxoHashes-4 263.3n 277.8n ~ 0.700
GetUtxoHashes_ManyOutputs-4 41.97µ 41.99µ ~ 1.000
_NewMetaDataFromBytes-4 227.3n 243.0n ~ 0.100
_Bytes-4 396.1n 398.9n ~ 0.700
_MetaBytes-4 339.7n 341.0n ~ 0.300

Threshold: >10% with p < 0.05 | Generated: 2026-05-19 19:07 UTC

@oskarszoon

Copy link
Copy Markdown
Contributor Author

Closing as duplicate of #910 — Simon's PR lands the same baseIdx := subIdx * subtreeSize change with a tighter regression-test wall budget (2s vs the 30s here). Substantively equivalent on the code path.

A subset of edge-case subtests from this PR is worth carrying forward — subtreeSize=1 degenerate path and explicit cross-subtree duplicate detection are not directly covered in #910's ManySubtrees regression guard. Either backport on top of #910 or open a small follow-up PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

checkDuplicateTransactionsInSubtree O(N²) prefix sum pins all cores on big blocks (regression from #198)

2 participants