catchup broken on docker/test profiles after PR #772: subtree fetch bound by local policy rejects peer responses #905

@oskarszoon

Summary

PR #772 (fix(subtreevalidation): bound HTTP body reads when fetching subtree data) introduces a hard regression for any node whose local maximum_merkle_items_per_subtree is smaller than the network's actual subtree size. Catchup aborts on every peer and the node stops advancing.

Observed on bsva-ovh-teranode-ttn-eu-3 running v0.15.1-beta-2 syncing teratestnet:

  • node tip stuck at height 174
  • network tip ~19k
  • p2p tip-block announcements continue; catchup runs and fails immediately

Error

SERVICE_ERROR (59): [catchup:fetchAndStoreSubtreeAndSubtreeData] All peers failed to fetch subtree 03061e277dcb02638e9a7692bb4913dfce5ce3462162c8c369f7d3dc75ea3738
 -> SERVICE_ERROR (59): [catchup:fetchSubtreeFromPeer] failed to fetch subtree from https://testnet.teranode.sv/api/v1/subtree/03061e27...
 -> EXTERNAL (8): http request [...] response body exceeds 1048576 bytes

Root cause

services/blockvalidation/get_blocks.go:663:

maxSubtreeBytes := int64(u.settings.BlockAssembly.MaximumMerkleItemsPerSubtree) * int64(chainhash.HashSize)
subtreeBytes, err := util.DoHTTPRequestBounded(ctx, url, maxSubtreeBytes)

Cap derived from local MaximumMerkleItemsPerSubtree. On the docker profile (settings.conf:1089):

maximum_merkle_items_per_subtree.docker = 32768

→ cap = 32768 × 32 = 1,048,576 bytes (1 MiB). Real teratestnet subtrees exceed 1 MiB → bounded reader rejects → all peers fail → catchup aborts every cycle.
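The arithmetic above can be checked with a small standalone sketch. It does not use the teranode code: `capBytes` is a hypothetical helper that repeats the derivation quoted from get_blocks.go, `hashSize` stands in for `chainhash.HashSize` (32 bytes), and the 65536-item peer subtree is an assumed example size, not a measured one.

```go
package main

import "fmt"

// hashSize stands in for chainhash.HashSize (32 bytes per hash).
const hashSize = 32

// capBytes is a hypothetical helper that repeats the derivation from the
// issue: local item limit multiplied by the hash size.
func capBytes(maxItems int) int64 {
	return int64(maxItems) * int64(hashSize)
}

func main() {
	// Docker profile value of maximum_merkle_items_per_subtree.
	limit := capBytes(32768)
	fmt.Println(limit) // 1048576

	// Assumed example: a peer that builds subtrees with twice as many items.
	peerItems := 65536
	peerBytes := int64(peerItems) * hashSize
	fmt.Println(peerBytes > limit) // true: the bounded read would reject it
}
```

Any peer subtree with more items than the local limit produces a body larger than the cap, regardless of whether the subtree is valid.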

Same pattern in:

  • services/subtreevalidation/SubtreeValidation.go:960
  • services/subtreevalidation/check_block_subtrees.go:218

Why the bound is wrong

A peer's subtree size is set by the node that produced it, not by this node's policy. Local MaximumMerkleItemsPerSubtree controls what we assemble, not what we accept. Bounding incoming subtrees by local assembly policy is a category error.

Suggested fix

Bound by a network/consensus max (largest legitimate subtree size), not by the receiving node's assembly policy. Or remove the bound and rely on the existing 5-minute streaming timeout + connection-level limits.

If the goal is DoS protection, the right knob is a separate max_incoming_subtree_bytes policy with a generous default (e.g. matching mainnet 32 MiB or higher), not a derived value from assembly config.

Unblock right now

Repro

  1. Bring up docker quickstart on teratestnet with default .env
  2. Wait — node hits subtrees > 1 MiB
  3. catchup logs response body exceeds 1048576 bytes, never advances
