fix(subtreevalidation): decouple peer subtree fetch cap from local assembly policy#911
Conversation
…ckchain#900) PR bsv-blockchain#198 replaced the O(1) baseIdx = subIdx*subtreeSize formula with a prefix-sum loop summing Subtree.Size() across all previous subtrees. Subtree.Size() takes the subtree's RWMutex internally, so under the concurrent dedup workers the per-block cost scaled as O(N^2) atomic ops on shared cache lines. On a 450k-subtree block in production this pinned every available core on lock contention with effectively zero useful dedup work being done — blockvalidation sat at 1161% CPU for 6+ minutes. The justification comment was incorrect: validateAndSetTransactionCount enforces "all subtrees must be the same size as the first tree, except the last one". Because the loop only sums subtrees strictly before subIdx, every summed subtree is full-size, so subIdx*subtreeSize gives the exact same baseIdx the loop would produce. Adds a regression test that exercises the dedup path with 2000 subtrees (last one smaller) and asserts both index correctness at scale and a generous wall-time budget any reintroduced O(N^2) would trivially blow past.
The O(1) path completes the checkDuplicateTransactions call in well under 100 ms; 30s left no real fence against a re-introduced O(N^2) regression. 2s sits ~20x above measured headroom, comfortably accommodates slow CI runners, and still trips immediately on the production-style scaling issue from bsv-blockchain#900.
…sembly policy PR bsv-blockchain#772 bounded incoming peer subtree responses by BlockAssembly.MaximumMerkleItemsPerSubtree * chainhash.HashSize. That value controls what *this node* assembles, not what peers send, so any node whose local cap is smaller than the network's actual subtree size rejects every peer response and catchup stalls. Repro: docker quickstart on teratestnet (maximum_merkle_items_per_subtree = 32768 -> 1 MiB cap). Real teratestnet subtrees exceed 1 MiB, so the bounded reader returns ErrExternal for every peer, fetchAndStore aborts, and the node never advances past the height where the bound was first exceeded. Fix: add SubtreeValidation.MaxIncomingSubtreeBytes (default 128 MiB) as a receive-side DoS cap independent of assembly policy, and use it at the three peer-fetch sites (services/blockvalidation/get_blocks.go, services/subtreevalidation/SubtreeValidation.go, services/subtreevalidation/check_block_subtrees.go). 128 MiB gives 4x headroom over the default mainnet subtree size (1M items * 32 bytes = 32 MiB) while preserving the OOM protection PR bsv-blockchain#772 added. Closes bsv-blockchain#905
|
🤖 Claude Code Review Status: Complete Analysis Summary: This PR correctly fixes issue #905 by decoupling the peer subtree fetch cap from local assembly policy. The fix is well-designed and properly tested. Verified: ✅ All three peer-fetch sites updated consistently History:
No issues found. The implementation follows the epistemic honesty and verification principles from AGENTS.md. |
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-20 15:19 UTC |
| // malicious peer can't OOM us by streaming oversized responses. This must be | ||
| // independent of local BlockAssembly.MaximumMerkleItemsPerSubtree, which only | ||
| // controls what *this node* assembles; peers may legitimately produce larger subtrees. | ||
| maxSubtreeBytes := int64(u.settings.SubtreeValidation.MaxIncomingSubtreeBytes) |
There was a problem hiding this comment.
[Critical] Incomplete fix downstream
While this line correctly uses MaxIncomingSubtreeBytes for the HTTP body read (fixing the immediate DoS issue), the subsequent validation at line 233 still checks leaf count against BlockAssembly.MaximumMerkleItemsPerSubtree:
if err := validateSubtreeLeafCount(subtreeHash, leafCount, u.settings.BlockAssembly.MaximumMerkleItemsPerSubtree)
This means a docker profile node will accept the 2 MiB response here but immediately reject it 15 lines later because 65536 items exceeds the 32768 assembly cap. The fix is incomplete for the CheckBlockSubtrees path.
The other two sites (fetchSubtreeFromPeer and getSubtreeTxHashes) correctly omit the post-read leaf count validation.
…leaf-count gate The peer-fetch fallback at check_block_subtrees.go bounds the HTTP body by MaxIncomingSubtreeBytes but immediately re-validates the derived leaf count against BlockAssembly.MaximumMerkleItemsPerSubtree. On the docker quickstart profile (assembly cap 32k * 32 bytes = 1 MiB) a 2 MiB peer subtree passes the new 128 MiB body cap and then fails the leaf-count gate with "exceeds policy max" — the same stuck-at-height regression bsv-blockchain#905 set out to fix, one statement later. Derive the leaf-count bound from MaxIncomingSubtreeBytes / chainhash.HashSize. The bounded HTTP read already enforces this implicitly; the explicit check remains as a guard before subtreepkg.NewIncompleteTreeByLeafCount allocates against the count. Also: - Change MaxIncomingSubtreeBytes from int to int64; it's a byte count and DoHTTPRequestBounded already takes int64. Drop the redundant int64(...) casts at the three call sites. - Add TestCheckBlockSubtrees_LocalAssemblyPolicyIgnored mirroring the existing regression tests for fetchSubtreeFromPeer and getSubtreeTxHashes.
|



Closes #905.
Problem
PR #772 (
fix(subtreevalidation): bound HTTP body reads when fetching subtree data) introduced a hard regression for any node whose localmaximum_merkle_items_per_subtreeis smaller than the network's real subtree size. Catchup aborts on every peer and the node stops advancing.The bound was derived from local assembly policy:
But
MaximumMerkleItemsPerSubtreecontrols what this node assembles, not what peers produce. On the docker profile (settings.conf:1089):→ cap = 32768 × 32 = 1,048,576 bytes (1 MiB). Real teratestnet subtrees exceed 1 MiB → bounded reader rejects every peer response → catchup aborts every cycle.
Observed on
bsva-ovh-teranode-ttn-eu-3runningv0.15.1-beta-2syncing teratestnet: node tip stuck at height 174, network tip ~19k.Fix
Add
SubtreeValidation.MaxIncomingSubtreeBytes(default 128 MiB) as a receive-side DoS cap that is independent of local assembly policy. Use it at the three peer-fetch sites:services/blockvalidation/get_blocks.go— catchup subtree fetchservices/subtreevalidation/SubtreeValidation.go—getSubtreeTxHashesservices/subtreevalidation/check_block_subtrees.go— peer fallback inCheckBlockSubtrees128 MiB gives 4× headroom over the default mainnet subtree size (1M items × 32 bytes = 32 MiB) while preserving the OOM protection PR #772 added.
Test plan
*_OversizedBodytests to drive the cap viaMaxIncomingSubtreeBytes.TestFetchSubtreeFromPeer_LocalAssemblyPolicyIgnored,TestGetSubtreeTxHashes_LocalAssemblyPolicyIgnored) that simulate the docker quickstart profile (small local assembly cap + generous receive cap) and verify peer responses larger than the local assembly policy are accepted.go build ./...clean.go vet ./settings/... ./services/blockvalidation/... ./services/subtreevalidation/... ./util/...clean.go test -race -tags testtxmetacache ./services/blockvalidation/ ./services/subtreevalidation/ ./settings/ ./util/passes (full package suites).