Skip to content

perf(blockassembly): pool subtree leaf buffer + DoS-bound the deserializer#959

Merged
icellan merged 3 commits into
bsv-blockchain:mainfrom
icellan:perf/subtree-deserialize-pool
May 28, 2026
Merged

perf(blockassembly): pool subtree leaf buffer + DoS-bound the deserializer#959
icellan merged 3 commits into
bsv-blockchain:mainfrom
icellan:perf/subtree-deserialize-pool

Conversation

@icellan

@icellan icellan commented May 27, 2026

Copy link
Copy Markdown
Contributor

Summary

DeserializeHashesFromReaderIntoBuckets previously wrapped the subtree reader in bufio.NewReaderSize(reader, numLeaves*48), allocating a fresh ~48 MiB buffer per call. At SubtreeProcessorConcurrentReads=375 that is ~18 GiB of transient allocation per moveForwardBlock on a receiver, producing the GC pressure that dominates phase 1 wallclock variance on receivers (we've observed phase 1 swinging between 5 s and 70 s on a 415 M-tx block at scale).

This PR replaces the per-call bufio with a pooled scratch slice, reads the leaf-data section in a single io.ReadFull, and adds receive-side bounds against the existing subtreevalidation_max_incoming_subtree_bytes DoS cap (independent of local MaximumMerkleItemsPerSubtree, per the existing setting's documented contract).

What changed

  • Pool (sync.Pool of []byte) sized at first use, grows to a high-water mark. Steady-state memory bounded by concurrency × max-observed-subtree-leaf-bytes. New is intentionally nil so we never waste memory on subtrees smaller than the cap.
  • Single-shot read of the entire leaf-data section into the pooled buffer; per-leaf parse runs against memory. Removes the per-leaf io.ReadFull overhead and the 1 M-call inner loop's bufio dispatch cost.
  • Latent bug fixed: the original first read of the 48-byte header used reader.Read (not io.ReadFull); replaced by a single io.ReadFull of header+leaf-count (56 bytes) in one shot.
  • DoS bounds on numLeaves and numConflicting, overflow-safe via division (numLeaves > maxLeafDataBytes / subtreeLeafBytes instead of numLeaves * subtreeLeafBytes > maxLeafDataBytes). A peer can no longer claim 2^63 leaves in the header.
  • Variable peer subtree sizes are handled natively: each call reslices the pooled buffer to the actual numLeaves*48. Small subtrees use a small portion of the buffer; larger ones fit as long as they're under the cap.

Signature changed from

func DeserializeHashesFromReaderIntoBuckets(reader io.Reader, nBuckets uint16, hashes *map[uint16][]chainhash.Hash, conflictingNodes *[]chainhash.Hash) error

to

func DeserializeHashesFromReaderIntoBuckets(reader io.Reader, nBuckets uint16, maxLeafDataBytes int64, hashes *map[uint16][]chainhash.Hash, conflictingNodes *[]chainhash.Hash) error

There are two callers in the repo (CreateTransactionMap in production, and one long-running test); both updated to pass settings.SubtreeValidation.MaxIncomingSubtreeBytes (default 128 MiB).

Tests

New file SubtreeProcessor_deserialize_test.go with seven focused unit tests:

  • _HappyPath — round-trip a small leaf set + conflicting-node trailer
  • _EmptySubtree — zero leaves, zero conflicting; no allocation
  • _NumLeavesExceedsCap — forged 1 B-leaf header rejected before allocating
  • _NumLeavesOverflowSafenumLeaves = 2^64-1 rejected cleanly, no panic
  • _NumConflictingExceedsCap — forged trailer count rejected
  • _TruncatedStream — short stream surfaces io.ErrUnexpectedEOF, no panic
  • _PoolReuse — two back-to-back calls of different sizes, no cross-contamination

Test plan

  • go build ./...
  • go vet ./services/blockassembly/subtreeprocessor/... ./test/longtest/services/blockassembly/subtreeprocessor/...
  • go test -race -count=1 -timeout 300s ./services/blockassembly/subtreeprocessor/ — ok (195s)
  • go test -race -run TestDeserializeHashesFromReaderIntoBuckets ./services/blockassembly/subtreeprocessor/ — 7/7 PASS
  • go test -race -run TestSubtreeProcessor_CreateTransactionMap ./services/blockassembly/subtreeprocessor/ — PASS
  • go test -race -run Test_DeserializeHashesFromReaderIntoBuckets ./test/longtest/services/blockassembly/subtreeprocessor/ — PASS
  • Cluster verification: deploy to one receiver in scaling-2/scaling-3, watch block-assembly working-set memory across phase 1 (should be flat between blocks, no per-block spike of concurrency × 48 MiB).

Risks / notes

  • Behaviour change on malformed peer subtrees: previously could allocate up to numLeaves*48 bytes from the header before failing on read. Now bounded by maxLeafDataBytes upfront. Desired hardening; no benign code path produces a header above the cap.
  • Pool worst-case memory: at SubtreeProcessorConcurrentReads=375 and 48 MiB subtrees, pool peak is ~18 GiB (vs. 18 GiB transient per block before this change — net memory the same, but no GC churn). If subtreeProcessorConcurrentReads is reduced (recommended separately, for Ceph queue-depth reasons), the pool peak drops proportionally — e.g. 64 → ~3 GiB pool peak.
  • staticcheck SA6002 (slice header into sync.Pool) is suppressed with a rationale comment: using *[]byte would cost a pointer indirection in the hot path; the 24-byte slice-header copy is dominated by the multi-MiB backing array being reused.
  • Doesn't change the on-disk file format, doesn't change the wire protocol, doesn't change chainedSubtrees matching logic. Reverts cleanly.

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Current Review:
No critical issues found. The PR successfully addresses the GC pressure from per-call bufio allocations by introducing a pooled buffer approach with comprehensive DoS bounds.

Previously Resolved:

Minor observations (non-blocking):

  • conflictingNodes slice handling assumes callers pass empty slices (documented at SubtreeProcessor.go:5586-5591). Current callers comply, but the append pattern at line 5712 could accumulate stale entries if a non-empty slice were passed with sufficient capacity. The godoc correctly notes this precondition.

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-959 (ce8e46f)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.739µ 1.769µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 59.30n 59.52n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 59.83n 59.48n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 59.44n 59.84n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 35.31n 36.56n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 62.96n 64.44n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 160.5n 174.5n ~ 0.100
MiningCandidate_Stringify_Short-4 253.6n 255.1n ~ 0.400
MiningCandidate_Stringify_Long-4 1.782µ 1.765µ ~ 0.400
MiningSolution_Stringify-4 914.0n 921.7n ~ 0.400
BlockInfo_MarshalJSON-4 1.764µ 1.765µ ~ 0.600
NewFromBytes-4 131.8n 131.9n ~ 1.000
AddTxBatchColumnar_Validation-4 2.556µ 2.512µ ~ 0.100
OffsetValidationLoop-4 645.7n 657.3n ~ 1.000
Mine_EasyDifficulty-4 66.86µ 67.04µ ~ 0.700
Mine_WithAddress-4 7.014µ 7.753µ ~ 0.200
DirectSubtreeAdd/4_per_subtree-4 57.64n 57.87n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 31.69n 29.49n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 30.67n 28.91n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 29.42n 27.47n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 28.91n 27.17n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 285.6n 284.2n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 281.6n 278.6n ~ 0.500
SubtreeProcessorAdd/256_per_subtree-4 280.5n 281.3n ~ 1.000
SubtreeProcessorAdd/1024_per_subtree-4 268.8n 272.4n ~ 0.100
SubtreeProcessorAdd/2048_per_subtree-4 271.8n 272.4n ~ 1.000
SubtreeProcessorRotate/4_per_subtree-4 272.3n 275.1n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 293.0n 274.3n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 270.6n 276.2n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 268.4n 272.9n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 53.95n 54.55n ~ 0.200
SubtreeNodeAddOnly/64_per_subtree-4 34.06n 34.40n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.27n 33.28n ~ 0.500
SubtreeNodeAddOnly/1024_per_subtree-4 32.57n 32.53n ~ 1.000
SubtreeCreationOnly/4_per_subtree-4 113.6n 115.2n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 395.8n 408.5n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.343µ 1.379µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.393µ 4.405µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 7.967µ 8.361µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 270.1n 273.1n ~ 0.400
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 269.8n 269.3n ~ 1.000
ParallelGetAndSetIfNotExists/1k_nodes-4 2.167m 2.196m ~ 0.400
ParallelGetAndSetIfNotExists/10k_nodes-4 5.370m 5.269m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.105m 7.047m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 9.609m 9.566m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 1.940m 1.943m ~ 0.400
SequentialGetAndSetIfNotExists/10k_nodes-4 4.306m 4.283m ~ 0.700
SequentialGetAndSetIfNotExists/50k_nodes-4 12.13m 12.04m ~ 0.200
SequentialGetAndSetIfNotExists/100k_nodes-4 21.78m 21.70m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.234m 2.240m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.249m 8.125m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 12.87m 12.68m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.967m 1.969m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.367m 7.346m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 39.75m 38.96m ~ 0.100
BlockAssembler_AddTx-4 0.02614n 0.02804n ~ 0.700
AddNode-4 11.60 10.56 ~ 0.100
AddNodeWithMap-4 11.94 11.00 ~ 0.100
DiskTxMap_SetIfNotExists-4 3.805µ 3.970µ ~ 0.400
DiskTxMap_SetIfNotExists_Parallel-4 3.641µ 3.701µ ~ 0.100
DiskTxMap_ExistenceOnly-4 340.2n 443.9n ~ 0.100
Queue-4 186.1n 189.5n ~ 0.500
AtomicPointer-4 3.327n 3.655n ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 874.2µ 882.1µ ~ 1.000
ReorgOptimizations/DedupFilterPipeline/New/10K-4 796.1µ 769.1µ ~ 0.700
ReorgOptimizations/AllMarkFalse/Old/10K-4 111.6µ 108.0µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 64.20µ 63.85µ ~ 0.400
ReorgOptimizations/HashSlicePool/Old/10K-4 66.59µ 59.74µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 11.24µ 11.04µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/10K-4 4.369µ 4.798µ ~ 0.700
ReorgOptimizations/NodeFlags/New/10K-4 1.516µ 1.708µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.809m 10.298m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.235m 10.393m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.173m 1.215m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 707.5µ 705.5µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 530.9µ 631.0µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 213.3µ 212.3µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 44.16µ 48.78µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 15.00µ 17.08µ ~ 0.100
TxMapSetIfNotExists-4 60.80n 49.38n ~ 0.200
TxMapSetIfNotExistsDuplicate-4 41.41n 41.31n ~ 0.700
ChannelSendReceive-4 631.2n 588.9n ~ 0.100
CalcBlockWork-4 504.9n 500.8n ~ 0.200
CalculateWork-4 684.6n 687.6n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.449µ 1.672µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_100-4 12.81µ 12.86µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 127.5µ 127.7µ ~ 1.000
CatchupWithHeaderCache-4 104.1m 104.2m ~ 1.000
_BufferPoolAllocation/16KB-4 3.979µ 3.981µ ~ 1.000
_BufferPoolAllocation/32KB-4 8.871µ 10.231µ ~ 0.400
_BufferPoolAllocation/64KB-4 20.40µ 16.56µ ~ 0.100
_BufferPoolAllocation/128KB-4 35.69µ 27.01µ ~ 0.100
_BufferPoolAllocation/512KB-4 114.5µ 113.0µ ~ 0.700
_BufferPoolConcurrent/32KB-4 20.34µ 18.92µ ~ 0.400
_BufferPoolConcurrent/64KB-4 30.96µ 30.96µ ~ 1.000
_BufferPoolConcurrent/512KB-4 149.7µ 153.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 658.3µ 645.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 723.8µ 650.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 722.9µ 654.7µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 735.4µ 657.7µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 654.7µ 640.7µ ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.09m 36.86m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.05m 36.91m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.67m 37.27m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.74m 37.15m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.31m 36.54m ~ 0.200
_PooledVsNonPooled/Pooled-4 741.8n 741.7n ~ 1.000
_PooledVsNonPooled/NonPooled-4 8.829µ 8.133µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.594µ 6.764µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 11.28µ 11.43µ ~ 0.700
_MemoryFootprint/Alternative_64KB_32concurrent-4 10.68µ 11.31µ ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.328m 1.357m ~ 0.100
SubtreeSizes/10k_tx_16_per_subtree-4 320.4µ 332.9µ ~ 0.400
SubtreeSizes/10k_tx_64_per_subtree-4 75.90µ 78.97µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 18.97µ 19.78µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.313µ 9.731µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.725µ 4.802µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.343µ 2.389µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 75.93µ 76.91µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 19.17µ 19.53µ ~ 0.200
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.755µ 4.785µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 393.2µ 397.8µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 95.58µ 96.11µ ~ 0.400
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.39µ 23.87µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 157.9µ 162.6µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 167.0µ 169.7µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 330.1µ 332.3µ ~ 0.200
SubtreeAllocations/medium_subtrees_exists_check-4 9.423µ 9.470µ ~ 1.000
SubtreeAllocations/medium_subtrees_data_fetch-4 10.030µ 9.918µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 19.41µ 19.58µ ~ 0.700
SubtreeAllocations/large_subtrees_exists_check-4 2.298µ 2.247µ ~ 0.200
SubtreeAllocations/large_subtrees_data_fetch-4 2.454µ 2.442µ ~ 0.700
SubtreeAllocations/large_subtrees_full_validation-4 4.823µ 4.844µ ~ 1.000
_prepareTxsPerLevel-4 408.3m 418.7m ~ 1.000
_prepareTxsPerLevelOrdered-4 3.880m 3.680m ~ 0.400
_prepareTxsPerLevel_Comparison/Original-4 401.8m 405.9m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 4.428m 4.264m ~ 0.700
StoreBlock_Sequential/BelowCSVHeight-4 250.4µ 250.0µ ~ 1.000
StoreBlock_Sequential/AboveCSVHeight-4 252.2µ 250.0µ ~ 0.400
GetUtxoHashes-4 273.2n 267.7n ~ 0.400
GetUtxoHashes_ManyOutputs-4 45.37µ 47.87µ ~ 0.100
_NewMetaDataFromBytes-4 213.8n 213.6n ~ 0.400
_Bytes-4 396.4n 397.1n ~ 0.400
_MetaBytes-4 140.0n 137.9n ~ 0.700

Threshold: >10% with p < 0.05 | Generated: 2026-05-28 09:50 UTC

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes subtree deserialization in block assembly by replacing per-call buffered allocations with pooled scratch buffers and adding bounds checks around serialized subtree counts.

Changes:

  • Adds a pooled leaf-data buffer and single-shot leaf section read in DeserializeHashesFromReaderIntoBuckets.
  • Wires SubtreeValidation.MaxIncomingSubtreeBytes into the production and long-test callers.
  • Adds focused unit tests for happy path, empty input, malformed counts, truncation, and pool reuse.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
services/blockassembly/subtreeprocessor/SubtreeProcessor.go Updates subtree deserialization performance path and adds count bounds.
services/blockassembly/subtreeprocessor/SubtreeProcessor_deserialize_test.go Adds unit coverage for the new deserializer behavior.
test/longtest/services/blockassembly/subtreeprocessor/SubtreeProcessor_test.go Updates direct long-test call to pass the subtree byte cap.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated
Comment thread services/blockassembly/subtreeprocessor/SubtreeProcessor.go Outdated
…lizer

DeserializeHashesFromReaderIntoBuckets previously wrapped the subtree
reader in bufio.NewReaderSize(reader, numLeaves*48), allocating a fresh
~48 MiB buffer per call. With SubtreeProcessorConcurrentReads=375 that
is 18 GiB of transient allocation per moveForwardBlock on a receiver,
producing the GC pressure that dominated phase 1 wallclock variance.

Changes:
- sync.Pool of leaf-data scratch slices, sized at first use, growing
  to a high-water mark. Steady-state memory is bounded by
  concurrency * max-observed-subtree-leaf-bytes.
- Single io.ReadFull of header+leaf-count (was two reads, the first
  using reader.Read - a latent short-read bug).
- Single-shot io.ReadFull of the entire leaf-data section into the
  pooled buffer; the per-leaf parse runs against memory, no bufio.
- numLeaves and numConflicting are bounds-checked against a new
  maxLeafDataBytes parameter wired from
  settings.SubtreeValidation.MaxIncomingSubtreeBytes (default 128 MiB,
  the existing receive-side DoS cap independent of local
  MaximumMerkleItemsPerSubtree). Overflow-safe via division.

Variable peer subtree sizes are handled natively: each call reslices
the pooled buffer to the actual numLeaves*48, so a small subtree uses
a small portion of the buffer and a larger one fits as long as it is
under the DoS cap.

Tests: SubtreeProcessor_deserialize_test.go adds seven focused unit
tests covering happy path, empty subtree, num-leaves-exceeds-cap,
numLeaves=max uint64 (overflow safety), conflicting-count cap,
truncated stream (returns io.ErrUnexpectedEOF), and pool reuse across
calls of different sizes.

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving — the latent reader.Readio.ReadFull fix on the header alone justifies landing this, and the pool + DoS-bound work on top of it is well-executed. Bounds-via-division for overflow safety is the right call, and the unit tests cover exactly the adversarial shapes (NumLeavesOverflowSafe, TruncatedStream asserting io.ErrUnexpectedEOF) that would catch regressions.

Two small follow-ups (non-blocking):

  1. Strengthen TestDeserializeHashesFromReaderIntoBuckets_PoolReuse — currently only asserts require.Len(t, got2, len(second)). A pool corruption bug that preserved length but altered bytes would slip through. In practice impossible because io.ReadFull overwrites the full slice, but flattening the buckets and asserting require.ElementsMatch(t, second, got2) would lock the invariant in.

  2. Defensive handling of non-positive maxLeafDataBytesuint64(int64(-1)) = ^uint64(0), so maxLeaves = ^uint64(0)/48 ≈ 3.8e17, effectively unbounded. Not reachable through current settings defaults, but a misconfigured env var could disable the cap silently. Either an early if maxLeafDataBytes <= 0 { return errors.NewProcessingError("...") } or a doc note that the cap must be positive.

Optional nice-to-have if you're iterating anyway: an in-tree go test -bench comparing old vs new would let CI catch future regressions in the hot path rather than rediscovering them at scale. The cluster verification is the right ground truth, but a micro-benchmark would be a cheap safety net.

Minor semantic note (probably ignore): the conflictingNodes branch now resets length to 0 on a capacity miss (make(..., 0, numConflicting)), whereas the original always appended. Both current callers pass freshly-created empty slices, so no behavior change today — worth a one-line docstring precondition if a future caller might pass a populated slice.

… cap + non-positive guard

Two review findings from the Copilot + ordishs reviews on PR bsv-blockchain#959:

1. Per-section caps allowed up to 2x the intended DoS bound. A peer
   could send maxLeafDataBytes worth of leaves *and* another
   maxLeafDataBytes of conflicting hashes and still pass. The setting
   MaxIncomingSubtreeBytes is documented as a whole-body cap.

2. uint64(int64(-1)) wraps to ~2^64 and silently disables the bound. A
   misconfigured non-positive setting would let the cap fail open.

Changes:
- Renamed maxLeafDataBytes -> maxSubtreeBodyBytes (matches the setting).
- Reject non-positive caps before any read or allocation.
- Bounds now track a remainingBudget that subtracts the fixed header
  overhead and reserves the trailer count field; conflicting trailer
  is bounded against the budget *after* the leaves section.
- Strengthened PoolReuse test with require.ElementsMatch (per ordishs).
- New NonPositiveCapRejected test covering 0 / -1 / large-negative.
- Doc-noted the conflictingNodes slice-reset precondition.
@icellan icellan force-pushed the perf/subtree-deserialize-pool branch from 2a44dd0 to e170d57 Compare May 28, 2026 09:29
@icellan

icellan commented May 28, 2026

Copy link
Copy Markdown
Contributor Author

Addressed both reviews in e170d57be:

Copilot #1 / ordishs #2 — non-positive cap silently disables the guard.
Added an early if maxSubtreeBodyBytes <= 0 reject before any read or allocation. Matches the fail-closed behaviour of the existing io.LimitReader-based path elsewhere on the subtree fetch surface. New test _NonPositiveCapRejected covers 0, -1, and a large negative.

Copilot #2 — per-section caps allowed 2× the intended bound.
Reworked the bound to track a remainingBudget against the whole-body cap. The header + leaf-count + reserved trailer-count field are subtracted up front, numLeaves is bounded against that budget, and numConflicting is bounded against the budget that remains after the leaves section. A subtree can no longer pass by claiming cap bytes of leaves plus another cap bytes of conflicting hashes. Renamed the parameter maxLeafDataBytesmaxSubtreeBodyBytes to match the meaning of the setting it's wired from.

ordishs #1PoolReuse test only asserted length.
Switched both back-to-back assertions to require.ElementsMatch, so a pool-corruption bug that preserved length but altered bytes would no longer slip through.

ordishs note — conflictingNodes reallocation precondition.
Added a docstring note that the function may reset the slice to len 0 on a capacity miss. Both current callers pass freshly empty slices, so no behavioural impact; documented to guard future callers.

Skipping the optional in-tree benchmark for now to keep the diff small — happy to follow up in a separate PR if useful.

Verification:

  • go build ./... clean
  • go vet ./services/blockassembly/subtreeprocessor/... clean
  • All 8 unit tests in _deserialize_test.go pass (7 previously + 1 new _NonPositiveCapRejected)
  • TestSubtreeProcessor_CreateTransactionMap still PASS
  • Test_DeserializeHashesFromReaderIntoBuckets in test/longtest still PASS

@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@icellan icellan merged commit 845989d into bsv-blockchain:main May 28, 2026
25 checks passed
@icellan icellan self-assigned this May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants