fix(model): O(1) base index in dedup, drop O(N^2) prefix-sum (#900)#910
Conversation
…ckchain#900) PR bsv-blockchain#198 replaced the O(1) baseIdx = subIdx*subtreeSize formula with a prefix-sum loop summing Subtree.Size() across all previous subtrees. Subtree.Size() takes the subtree's RWMutex internally, so under the concurrent dedup workers the per-block cost scaled as O(N^2) atomic ops on shared cache lines. On a 450k-subtree block in production this pinned every available core on lock contention with effectively zero useful dedup work being done — blockvalidation sat at 1161% CPU for 6+ minutes. The justification comment was incorrect: validateAndSetTransactionCount enforces "all subtrees must be the same size as the first tree, except the last one". Because the loop only sums subtrees strictly before subIdx, every summed subtree is full-size, so subIdx*subtreeSize gives the exact same baseIdx the loop would produce. Adds a regression test that exercises the dedup path with 2000 subtrees (last one smaller) and asserts both index correctness at scale and a generous wall-time budget any reintroduced O(N^2) would trivially blow past.
|
🤖 Claude Code Review Status: Complete Current Review: No issues found. The fix correctly addresses the O(N²) performance regression from PR #198. Analysis:
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-19 23:26 UTC |
The O(1) path completes the checkDuplicateTransactions call in well under 100 ms; 30s left no real fence against a re-introduced O(N^2) regression. 2s sits ~20x above measured headroom, comfortably accommodates slow CI runners, and still trips immediately on the production-style scaling issue from bsv-blockchain#900.
|
oskarszoon
left a comment
There was a problem hiding this comment.
Same root fix as my (now-closed) #904: baseIdx := subIdx * subtreeSize per the validateAndSetTransactionCount invariant. Closed mine in favour of this.
Comment explanation of the RWMutex/atomic contention mechanism reads better than mine. Perf-test budget at 2s is a meaningful improvement — tight enough to catch a regression in CI without the 30s budget I had. Probed-correctness via global indices including the smaller-last edge is clean.
Two edge cases worth carrying as a follow-up (not blocking):
subtreeSize=1degenerate path (verifies the formula still holds when there's no "predecessor full-size" assumption to lean on).- Explicit cross-subtree duplicate detection (a duplicate where one occurrence is at
subIdx=0and another atsubIdx>0— exercises the global-index arithmetic in the dedup logic itself, not just at thebaseIdxboundary).
Either backport onto this branch or open as a separate PR.



Summary
Restores the O(1)
baseIdx = subIdx * subtreeSizeformula inmodel.Block.checkDuplicateTransactionsInSubtree. PR #198 had replaced it with a prefix-sum loop overSubtree.Size(), which scaled as O(N²) under the concurrent dedup workers becauseSubtree.Size()takes the subtree's RWMutex.Fixes #900.
Why it was broken
PR #198's comment claimed the multiplication was wrong because "the last subtree may be smaller". But
validateAndSetTransactionCount(Block.go:1293) enforces "all subtrees need to be the same size as the first tree, except the last one". The prefix-sum loop only sums subtrees strictly beforesubIdx, so every summed subtree is full-size —subIdx * subtreeSizeis exact in every case.Production impact
The cost is O(N²) atomic-add bus transactions on shared cache lines, with
checkDuplicateTransactionsConcurrencyworkers thrashing in parallel. Onbsva-ovh-teranode-ttn-eu-4(height 1622352, 900,679 txs, 450,340 subtrees, 177 MB):docker statsshowed blockvalidation at 1161% CPU (11.6 cores) for 6+ minutes with effectively zero useful dedup work.Changes
model/Block.go: replace the prefix-sum loop withbaseIdx := subIdx * subtreeSize, updating the comment to explain the invariant being relied on and the contention pattern the loop produced.model/Block_test.go: addTestBlock_CheckDuplicateTransactions_ManySubtrees— a regression guard that runs the concurrent dedup path on a 2,000-subtree block (last subtree smaller), asserts global-index correctness via probed nodes, and enforces a generous wall-time budget any reintroduced O(N²) would blow past.Test plan
go test -race ./model/...— full model package green (17s, race detector on)go vet ./...— no new warningsgolangci-lint run ./model/...— clean (pre-existing prealloc warnings only)go build ./...— whole repo buildsTestBlock_CheckDuplicateTransactionsInSubtree,TestCheckDuplicateTransactions,TestBlock_Valid_DupTxDetected_NilSubtreeStore) still catch duplicates correctly — the fix changes only the base-index computation, not the dedup logic itself.Closes #900