Skip to content

checkDuplicateTransactionsInSubtree O(N²) prefix sum pins all cores on big blocks (regression from #198) #900

@oskarszoon

Description

@oskarszoon

Summary

model.Block.checkDuplicateTransactionsInSubtree walks all previous subtrees on every invocation, calling Subtree.Size() for each — an O(N²) prefix-sum where N is the subtree count. On a large block (450k subtrees) this saturates every core in atomic contention on Subtree.Size's internal RWMutex, with no useful work done.

Regression source

Introduced in commit 58a68d376 (PR #198, "4364: Test with blocks which have early duplicates which are fully sp…", 2025-11-26).

Before:

idx64, err = safeconversion.IntToUint64((subIdx * subtreeSize) + txIdx)

After:

// Calculate the base index for this subtree by summing sizes of all previous subtrees.
// We cannot use (subIdx * subtreeSize) because the last subtree may be smaller.
// Per Block.go:1192: "all subtrees need to be the same size as the first tree, except the last one"
baseIdx := 0
for i := 0; i < subIdx; i++ {
    baseIdx += b.SubtreeSlices[i].Size()
}
...
idx64, err = safeconversion.IntToUint64(baseIdx + txIdx)

The justification comment is incorrect: baseIdx is the sum of sizes of subtrees before subIdx. Even when subIdx is the last (smaller) subtree, the subtrees being summed are all full-size by the same "first tree, except the last one" invariant. So subIdx * subtreeSize is correct for all subIdx.

Cost

  • N = number of subtrees in the block. Total Size() calls across all goroutines = N × (N−1) / 2.
  • Each Size() call (in go-subtree):
// github.com/bsv-blockchain/go-subtree.(*Subtree).Size()
func (st *Subtree) Size() int {
    st.mu.RLock()       // atomic.AddInt32(&readerCount, +1)
    size := cap(st.Nodes)
    st.mu.RUnlock()     // atomic.AddInt32(&readerCount, -1)
    return size
}

Two atomic Int32 ops per call → ~N² atomic-add bus transactions, all on the same cache lines, with checkDuplicateTransactionsConcurrency workers thrashing in parallel.

Production reproduction

bsva-ovh-teranode-ttn-eu-4 today, height 1622352 (txCount = 900,679, 450,340 subtrees, 177 MB).

30 s CPU profile of blockvalidation while processing this block:

99.94%  model.(*Block).checkDuplicateTransactionsInSubtree
98.33%  → go-subtree.(*Subtree).Size
67.29%  → sync.(*RWMutex).RLock
96.56%  → sync/atomic.(*Int32).Add  (inline)

docker stats showed blockvalidation at 1161% CPU (11.6 cores) for ~6+ minutes, all wasted on contention. Effective work done in the dedup step during the sample was zero.

Proposed fix

func (b *Block) checkDuplicateTransactionsInSubtree(subtree *subtreepkg.Subtree, subIdx, subtreeSize int) (err error) {
    var idx64 uint64

    baseIdx := subIdx * subtreeSize  // O(1); valid because all subtrees before subIdx are full-size per Block.go:1192 invariant

    for txIdx := 0; txIdx < len(subtree.Nodes); txIdx++ {
        ...
    }
    return nil
}

Optional defensiveness: compute the prefix sums once in checkDuplicateTransactions (the caller) and pass baseIdx directly to each goroutine, so the function doesn't have to reason about the invariant. Either way the cost drops from O(N²) atomic-ops to O(1) per subtree.

Test plan

  • Add a regression test using a synthetic block with several thousand subtrees and verify dedup wall time is sub-second (current code would take minutes).
  • Re-run the existing dedup tests from PR 4364: Test with blocks which have early duplicates which are fully sp… #198 — confirm they still catch early duplicates / fully spent duplicates correctly. The fix changes the base-index computation, nothing about the dedup logic itself.

Related

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions