perf(blockassembly): pool subtree leaf buffer + DoS-bound the deserializer#959
Conversation
|
🤖 Claude Code Review Status: Complete Current Review: Previously Resolved:
Minor observations (non-blocking):
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-28 09:50 UTC |
There was a problem hiding this comment.
Pull request overview
This PR optimizes subtree deserialization in block assembly by replacing per-call buffered allocations with pooled scratch buffers and adding bounds checks around serialized subtree counts.
Changes:
- Adds a pooled leaf-data buffer and single-shot leaf section read in
DeserializeHashesFromReaderIntoBuckets. - Wires
SubtreeValidation.MaxIncomingSubtreeBytesinto the production and long-test callers. - Adds focused unit tests for happy path, empty input, malformed counts, truncation, and pool reuse.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
services/blockassembly/subtreeprocessor/SubtreeProcessor.go |
Updates subtree deserialization performance path and adds count bounds. |
services/blockassembly/subtreeprocessor/SubtreeProcessor_deserialize_test.go |
Adds unit coverage for the new deserializer behavior. |
test/longtest/services/blockassembly/subtreeprocessor/SubtreeProcessor_test.go |
Updates direct long-test call to pass the subtree byte cap. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…lizer DeserializeHashesFromReaderIntoBuckets previously wrapped the subtree reader in bufio.NewReaderSize(reader, numLeaves*48), allocating a fresh ~48 MiB buffer per call. With SubtreeProcessorConcurrentReads=375 that is 18 GiB of transient allocation per moveForwardBlock on a receiver, producing the GC pressure that dominated phase 1 wallclock variance. Changes: - sync.Pool of leaf-data scratch slices, sized at first use, growing to a high-water mark. Steady-state memory is bounded by concurrency * max-observed-subtree-leaf-bytes. - Single io.ReadFull of header+leaf-count (was two reads, the first using reader.Read - a latent short-read bug). - Single-shot io.ReadFull of the entire leaf-data section into the pooled buffer; the per-leaf parse runs against memory, no bufio. - numLeaves and numConflicting are bounds-checked against a new maxLeafDataBytes parameter wired from settings.SubtreeValidation.MaxIncomingSubtreeBytes (default 128 MiB, the existing receive-side DoS cap independent of local MaximumMerkleItemsPerSubtree). Overflow-safe via division. Variable peer subtree sizes are handled natively: each call reslices the pooled buffer to the actual numLeaves*48, so a small subtree uses a small portion of the buffer and a larger one fits as long as it is under the DoS cap. Tests: SubtreeProcessor_deserialize_test.go adds seven focused unit tests covering happy path, empty subtree, num-leaves-exceeds-cap, numLeaves=max uint64 (overflow safety), conflicting-count cap, truncated stream (returns io.ErrUnexpectedEOF), and pool reuse across calls of different sizes.
ordishs
left a comment
There was a problem hiding this comment.
Approving — the latent reader.Read → io.ReadFull fix on the header alone justifies landing this, and the pool + DoS-bound work on top of it is well-executed. Bounds-via-division for overflow safety is the right call, and the unit tests cover exactly the adversarial shapes (NumLeavesOverflowSafe, TruncatedStream asserting io.ErrUnexpectedEOF) that would catch regressions.
Two small follow-ups (non-blocking):
-
Strengthen
TestDeserializeHashesFromReaderIntoBuckets_PoolReuse— currently only assertsrequire.Len(t, got2, len(second)). A pool corruption bug that preserved length but altered bytes would slip through. In practice impossible becauseio.ReadFulloverwrites the full slice, but flattening the buckets and assertingrequire.ElementsMatch(t, second, got2)would lock the invariant in. -
Defensive handling of non-positive
maxLeafDataBytes—uint64(int64(-1))=^uint64(0), somaxLeaves = ^uint64(0)/48 ≈ 3.8e17, effectively unbounded. Not reachable through current settings defaults, but a misconfigured env var could disable the cap silently. Either an earlyif maxLeafDataBytes <= 0 { return errors.NewProcessingError("...") }or a doc note that the cap must be positive.
Optional nice-to-have if you're iterating anyway: an in-tree go test -bench comparing old vs new would let CI catch future regressions in the hot path rather than rediscovering them at scale. The cluster verification is the right ground truth, but a micro-benchmark would be a cheap safety net.
Minor semantic note (probably ignore): the conflictingNodes branch now resets length to 0 on a capacity miss (make(..., 0, numConflicting)), whereas the original always appended. Both current callers pass freshly-created empty slices, so no behavior change today — worth a one-line docstring precondition if a future caller might pass a populated slice.
… cap + non-positive guard Two review findings from the Copilot + ordishs reviews on PR bsv-blockchain#959: 1. Per-section caps allowed up to 2x the intended DoS bound. A peer could send maxLeafDataBytes worth of leaves *and* another maxLeafDataBytes of conflicting hashes and still pass. The setting MaxIncomingSubtreeBytes is documented as a whole-body cap. 2. uint64(int64(-1)) wraps to ~2^64 and silently disables the bound. A misconfigured non-positive setting would let the cap fail open. Changes: - Renamed maxLeafDataBytes -> maxSubtreeBodyBytes (matches the setting). - Reject non-positive caps before any read or allocation. - Bounds now track a remainingBudget that subtracts the fixed header overhead and reserves the trailer count field; conflicting trailer is bounded against the budget *after* the leaves section. - Strengthened PoolReuse test with require.ElementsMatch (per ordishs). - New NonPositiveCapRejected test covering 0 / -1 / large-negative. - Doc-noted the conflictingNodes slice-reset precondition.
2a44dd0 to
e170d57
Compare
|
Addressed both reviews in Copilot #1 / ordishs #2 — non-positive cap silently disables the guard. Copilot #2 — per-section caps allowed 2× the intended bound. ordishs #1 — ordishs note — Skipping the optional in-tree benchmark for now to keep the diff small — happy to follow up in a separate PR if useful. Verification:
|
|




Summary
DeserializeHashesFromReaderIntoBucketspreviously wrapped the subtree reader inbufio.NewReaderSize(reader, numLeaves*48), allocating a fresh ~48 MiB buffer per call. AtSubtreeProcessorConcurrentReads=375that is ~18 GiB of transient allocation permoveForwardBlockon a receiver, producing the GC pressure that dominates phase 1 wallclock variance on receivers (we've observed phase 1 swinging between 5 s and 70 s on a 415 M-tx block at scale).This PR replaces the per-call bufio with a pooled scratch slice, reads the leaf-data section in a single
io.ReadFull, and adds receive-side bounds against the existingsubtreevalidation_max_incoming_subtree_bytesDoS cap (independent of localMaximumMerkleItemsPerSubtree, per the existing setting's documented contract).What changed
sync.Poolof[]byte) sized at first use, grows to a high-water mark. Steady-state memory bounded byconcurrency × max-observed-subtree-leaf-bytes.Newis intentionally nil so we never waste memory on subtrees smaller than the cap.io.ReadFulloverhead and the 1 M-call inner loop's bufio dispatch cost.reader.Read(notio.ReadFull); replaced by a singleio.ReadFullof header+leaf-count (56 bytes) in one shot.numLeavesandnumConflicting, overflow-safe via division (numLeaves > maxLeafDataBytes / subtreeLeafBytesinstead ofnumLeaves * subtreeLeafBytes > maxLeafDataBytes). A peer can no longer claim 2^63 leaves in the header.numLeaves*48. Small subtrees use a small portion of the buffer; larger ones fit as long as they're under the cap.Signature changed from
to
There are two callers in the repo (
CreateTransactionMapin production, and one long-running test); both updated to passsettings.SubtreeValidation.MaxIncomingSubtreeBytes(default 128 MiB).Tests
New file
SubtreeProcessor_deserialize_test.gowith seven focused unit tests:_HappyPath— round-trip a small leaf set + conflicting-node trailer_EmptySubtree— zero leaves, zero conflicting; no allocation_NumLeavesExceedsCap— forged 1 B-leaf header rejected before allocating_NumLeavesOverflowSafe—numLeaves = 2^64-1rejected cleanly, no panic_NumConflictingExceedsCap— forged trailer count rejected_TruncatedStream— short stream surfacesio.ErrUnexpectedEOF, no panic_PoolReuse— two back-to-back calls of different sizes, no cross-contaminationTest plan
go build ./...go vet ./services/blockassembly/subtreeprocessor/... ./test/longtest/services/blockassembly/subtreeprocessor/...go test -race -count=1 -timeout 300s ./services/blockassembly/subtreeprocessor/— ok (195s)go test -race -run TestDeserializeHashesFromReaderIntoBuckets ./services/blockassembly/subtreeprocessor/— 7/7 PASSgo test -race -run TestSubtreeProcessor_CreateTransactionMap ./services/blockassembly/subtreeprocessor/— PASSgo test -race -run Test_DeserializeHashesFromReaderIntoBuckets ./test/longtest/services/blockassembly/subtreeprocessor/— PASSscaling-2/scaling-3, watch block-assembly working-set memory across phase 1 (should be flat between blocks, no per-block spike ofconcurrency × 48 MiB).Risks / notes
numLeaves*48bytes from the header before failing on read. Now bounded bymaxLeafDataBytesupfront. Desired hardening; no benign code path produces a header above the cap.SubtreeProcessorConcurrentReads=375and 48 MiB subtrees, pool peak is ~18 GiB (vs. 18 GiB transient per block before this change — net memory the same, but no GC churn). IfsubtreeProcessorConcurrentReadsis reduced (recommended separately, for Ceph queue-depth reasons), the pool peak drops proportionally — e.g. 64 → ~3 GiB pool peak.staticcheck SA6002(slice header intosync.Pool) is suppressed with a rationale comment: using*[]bytewould cost a pointer indirection in the hot path; the 24-byte slice-header copy is dominated by the multi-MiB backing array being reused.chainedSubtreesmatching logic. Reverts cleanly.