Skip to content

fix(legacy): prevent sync stalls from backpressure-triggered peer rotation and dropped orphan continuation#1067

Merged
oskarszoon merged 2 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/legacy-service-error
Jun 11, 2026
Merged

fix(legacy): prevent sync stalls from backpressure-triggered peer rotation and dropped orphan continuation#1067
oskarszoon merged 2 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/legacy-service-error

Conversation

@oskarszoon

@oskarszoon oskarszoon commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Two fixes for legacy sync stalling mid-catchup (observed on testnet in CATCHINGBLOCKS past the checkpoints).

Fix 3 (never announce block-originated transactions) was split out to #1073 per review, with the provenance flag reworked as an explicit validator option.

Fix 1: stall detector blames the sync peer for local validation backpressure

  1. Block processing is single-threaded with backpressure — OnBlock blocks on <-sp.blockProcessed (peer_server.go:966), so the node stops reading from the sync peer while a block validates.
  2. A 2958-tx block took 48.5s in checkSubtreeFromBlock (full validation path past the highest checkpoint). Measured throughput dropped to zero, network-speed violations accrued each 30s tick, and at 3 violations handleCheckSyncPeer disconnected the healthy sync peer and removed it from peerStates.
  3. The ~92 blocks the rotated peer had already streamed ahead on its DATA1 stream then drained from the block queue, each failing the peerStates lookup (primary peer gone): SERVICE_ERROR (59): [handleBlockMsg] Received block message from unknown peer.
  4. The new sync peer restarted from the last accepted height and re-downloaded everything that was dropped. With every big block this repeats: validate slow → rotate → drop queued work → re-download.

The violation counter also ratchets: it only resets on a good-throughput tick, so violations accumulated across earlier backpressure ticks.

Fix: track the local block backlog on SyncManager (blockBacklog: blockQueue depth plus the block inside handleBlockMsg). While it is non-zero, handleCheckSyncPeer skips both stall checks — zero throughput during local validation measures our own validation speed, not the peer's health. The deferred updateNetwork still runs, so throughput samples stay fresh for the first tick after the backlog drains. A genuinely stalled peer delivers no blocks, so the backlog is empty and the detector behaves exactly as before.

Tradeoffs (raised in review, intentional): the suppression is not bounded by MaxBlockDownloadTime — rotation cannot fix slow local validation, and the suppression only persists while the sync peer keeps delivering blocks, which is the opposite of stalled. The counter is currently peer-agnostic; scoping it to the sync peer's association is a possible follow-up hardening.

Fix 2: orphan blocks dropped during sync break the getblocks continuation

In the legacy sync protocol the peer pushes its tip inv after delivering a getblocks batch and sends nothing further until the next getblocks arrives — the orphan tip block IS the batch-continuation signal (upstream btcd answers any sync-peer orphan with PushGetBlocksMsg). The legacy-sync/catching-blocks branch of handleBlockMsg dropped orphans with a bare return nil:

[HandleBlockDirect][...] failed to get block header for previous block ...: BLOCK_NOT_FOUND
Failed to process new block in legacy mode ...

After that, the pipeline ran dry and sync stalled until the stall detector rotated the sync peer (~90s dead time plus re-download) — the rotation machinery was accidentally the only recovery path.

Fix: remove the special case — answer every ErrBlockNotFound with getblocks from our best block, as the non-legacy branch already did. Safe mid-batch: PushGetBlocksMsg filters duplicate requests, requestedBlocks dedups in-flight getdata, and the peer only invs blocks past the locator fork point, so a redundant request costs one inv message (≤500 hashes).

Testing

  • TestHandleCheckSyncPeer_LocalBacklog: backlog pending → peer kept, no violation accrued; backlog drained → same zero-throughput state still rotates.
  • TestHandleBlockMsg_OrphanDuringCatchup: orphan in CATCHINGBLOCKS → no error, no disconnect, and the getblocks continuation path runs (block locator requested).
  • Both watched fail before their fix. go test -race ./services/legacy/netsync/ clean, go vet clean, golangci-lint 0 issues.

…nding

OnBlock backpressures network reads while a block is queued or
mid-validation (it blocks on blockProcessed), so zero throughput and a
stale last-block-time during local validation measure our own speed,
not the peer's. The detector counted those ticks as network-speed
violations and rotated a healthy sync peer mid-catchup; the rotated
peer's queued DATA1 blocks then failed the peerStates lookup, spamming
SERVICE_ERROR (59) and forcing a re-download from the new peer.

Track the local backlog (blockQueue depth plus the block inside
handleBlockMsg) and skip stall checks while it is non-zero. Throughput
samples keep updating, and a genuinely stalled peer still fails the
checks once the backlog drains.
@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Current Review:

No issues found. Both fixes are minimal, correct, and well covered by tests.

  • Fix 1 (backlog-aware stall detector): blockBacklog accounting is balanced — Add(1) before queueing in the producer loop, Add(-1) after every handleBlockMsg in the consumer (runs on both success and error paths). The atomic.Int64 is read in the same goroutine that increments it and decremented in the consumer goroutine, which is safe. Skipping the check correctly bypasses validNetworkSpeed (the only caller that increments violations) while the deferred updateNetwork still runs to keep throughput samples fresh.
  • Fix 2 (orphan continuation): Collapsing the (legacySyncMode || catchingBlocks) early-return nil into the unified ErrBlockNotFound getblocks path is a clean change — the getblocks body is byte-for-byte the prior else branch, so non-legacy behavior is unchanged. legacySyncMode/catchingBlocks remain used (manager.go:1407), so no dead variables.
  • Both new tests assert the intended behavior (no rotation/violation while backlog pending vs. rotation once drained; getblocks continuation on orphan), and the inline godoc comments match the implementation.

The peer-agnostic violation counter and the unbounded suppression are acknowledged tradeoffs in the PR description, not regressions.

Note: the two pre-existing inline threads on Validator.go / data.go belong to code split out to #1073 and are no longer part of this diff.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1067 (92c5ca3)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 132
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.768µ 1.780µ ~ 0.200
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.54n 61.51n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.48n 61.56n ~ 0.600
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.38n 61.48n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 31.90n 30.24n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 50.45n 51.89n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 107.0n 107.4n ~ 1.000
MiningCandidate_Stringify_Short-4 273.8n 273.6n ~ 1.000
MiningCandidate_Stringify_Long-4 1.915µ 1.905µ ~ 0.100
MiningSolution_Stringify-4 991.8n 982.3n ~ 0.100
BlockInfo_MarshalJSON-4 1.795µ 1.804µ ~ 0.100
NewFromBytes-4 141.3n 159.3n ~ 1.000
AddTxBatchColumnar_Validation-4 2.413µ 2.430µ ~ 1.000
OffsetValidationLoop-4 638.1n 636.6n ~ 0.400
Mine_EasyDifficulty-4 60.26µ 60.81µ ~ 0.700
Mine_WithAddress-4 6.799µ 6.878µ ~ 0.200
DirectSubtreeAdd/4_per_subtree-4 59.02n 58.22n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 32.05n 31.97n ~ 0.700
DirectSubtreeAdd/256_per_subtree-4 30.63n 30.58n ~ 0.300
DirectSubtreeAdd/1024_per_subtree-4 29.49n 29.31n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 28.97n 28.88n ~ 0.200
SubtreeProcessorAdd/4_per_subtree-4 283.2n 286.2n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 268.4n 292.5n ~ 0.200
SubtreeProcessorAdd/256_per_subtree-4 266.1n 274.0n ~ 0.400
SubtreeProcessorAdd/1024_per_subtree-4 271.3n 267.8n ~ 0.700
SubtreeProcessorAdd/2048_per_subtree-4 282.0n 271.0n ~ 0.400
SubtreeProcessorRotate/4_per_subtree-4 284.4n 270.3n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 284.8n 268.3n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 278.5n 269.0n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 273.5n 272.3n ~ 0.400
SubtreeNodeAddOnly/4_per_subtree-4 54.52n 53.63n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 34.12n 34.17n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.11n 33.13n ~ 1.000
SubtreeNodeAddOnly/1024_per_subtree-4 32.50n 32.47n ~ 0.300
SubtreeCreationOnly/4_per_subtree-4 99.52n 120.70n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 484.0n 503.2n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.489µ 1.380µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.409µ 5.545µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 7.403µ 9.311µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 279.8n 277.5n ~ 0.700
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 284.5n 276.1n ~ 0.200
ParallelGetAndSetIfNotExists/1k_nodes-4 11.07m 13.10m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 13.89m 15.94m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 16.73m 18.73m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 18.37m 23.25m ~ 0.400
SequentialGetAndSetIfNotExists/1k_nodes-4 10.83m 13.15m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 14.77m 18.12m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 21.75m 18.83m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 33.95m 25.54m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 12.29m 12.50m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 13.63m 14.61m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 18.73m 18.70m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 12.39m 14.07m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 14.63m 14.48m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 51.29m 50.34m ~ 0.200
DiskTxMap_SetIfNotExists-4 3.581µ 3.580µ ~ 1.000
DiskTxMap_SetIfNotExists_Parallel-4 3.234µ 3.202µ ~ 0.400
DiskTxMap_ExistenceOnly-4 313.5n 305.7n ~ 0.300
Queue-4 189.3n 188.2n ~ 0.200
AtomicPointer-4 4.797n 4.943n ~ 0.700
TxMapSetIfNotExists-4 52.20n 53.25n ~ 0.200
TxMapSetIfNotExistsDuplicate-4 40.20n 40.05n ~ 0.700
ChannelSendReceive-4 580.6n 579.0n ~ 0.700
BlockAssembler_AddTx-4 0.02767n 0.02728n ~ 1.000
AddNode-4 11.39 11.07 ~ 0.700
AddNodeWithMap-4 11.59 11.18 ~ 0.100
CalcBlockWork-4 365.0n 368.0n ~ 0.100
CalculateWork-4 501.9n 501.4n ~ 1.000
CheckOldBlockIDs/on-chain-prefetch/1000-4 63.76µ 64.45µ ~ 0.100
CheckOldBlockIDs/off-chain-prefetch/1000-4 55.23µ 59.35µ ~ 0.700
CheckOldBlockIDs/on-chain-prefetch/10000-4 469.8µ 479.3µ ~ 0.700
CheckOldBlockIDs/off-chain-prefetch/10000-4 350.9µ 372.0µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_10-4 1.349µ 1.420µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 12.82µ 13.59µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 126.7µ 134.8µ ~ 0.100
CatchupWithHeaderCache-4 104.5m 105.0m ~ 0.100
_BufferPoolAllocation/16KB-4 3.946µ 3.937µ ~ 1.000
_BufferPoolAllocation/32KB-4 8.295µ 9.758µ ~ 0.700
_BufferPoolAllocation/64KB-4 16.48µ 16.31µ ~ 0.400
_BufferPoolAllocation/128KB-4 27.22µ 34.80µ ~ 0.100
_BufferPoolAllocation/512KB-4 125.3µ 122.2µ ~ 0.700
_BufferPoolConcurrent/32KB-4 18.97µ 20.23µ ~ 0.400
_BufferPoolConcurrent/64KB-4 30.00µ 31.85µ ~ 0.200
_BufferPoolConcurrent/512KB-4 154.0µ 153.1µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 658.7µ 632.7µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 677.7µ 626.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 668.9µ 620.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 706.0µ 612.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 710.0µ 606.8µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.26m 37.44m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.99m 37.11m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.97m 36.94m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.84m 36.82m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/512KB-4 37.08m 36.71m ~ 0.200
_PooledVsNonPooled/Pooled-4 742.2n 738.6n ~ 0.200
_PooledVsNonPooled/NonPooled-4 8.387µ 8.374µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 6.835µ 6.862µ ~ 0.700
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.675µ 9.589µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 10.024µ 9.284µ ~ 0.100
_prepareTxsPerLevel-4 403.9m 411.5m ~ 0.400
_prepareTxsPerLevelOrdered-4 3.963m 4.156m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 398.8m 413.1m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.981m 3.838m ~ 0.400
SubtreeSizes/10k_tx_4_per_subtree-4 1.254m 1.257m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 304.4µ 301.2µ ~ 0.700
SubtreeSizes/10k_tx_64_per_subtree-4 70.78µ 72.08µ ~ 0.200
SubtreeSizes/10k_tx_256_per_subtree-4 17.69µ 17.80µ ~ 0.800
SubtreeSizes/10k_tx_512_per_subtree-4 8.786µ 8.838µ ~ 0.200
SubtreeSizes/10k_tx_1024_per_subtree-4 4.352µ 4.358µ ~ 0.700
SubtreeSizes/10k_tx_2k_per_subtree-4 2.157µ 2.180µ ~ 0.700
BlockSizeScaling/10k_tx_64_per_subtree-4 69.80µ 68.90µ ~ 0.200
BlockSizeScaling/10k_tx_256_per_subtree-4 17.45µ 17.28µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.283µ 4.304µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 366.1µ 369.6µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 86.87µ 86.92µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 21.33µ 21.25µ ~ 0.700
SubtreeAllocations/small_subtrees_exists_check-4 149.6µ 147.6µ ~ 0.700
SubtreeAllocations/small_subtrees_data_fetch-4 157.5µ 157.2µ ~ 1.000
SubtreeAllocations/small_subtrees_full_validation-4 307.4µ 309.0µ ~ 0.400
SubtreeAllocations/medium_subtrees_exists_check-4 8.780µ 8.709µ ~ 0.400
SubtreeAllocations/medium_subtrees_data_fetch-4 9.189µ 9.357µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 17.31µ 17.22µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.075µ 2.050µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.193µ 2.187µ ~ 0.700
SubtreeAllocations/large_subtrees_full_validation-4 4.283µ 4.260µ ~ 0.400
StoreBlock_Sequential/BelowCSVHeight-4 328.3µ 319.4µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 318.7µ 322.4µ ~ 0.100
GetUtxoHashes-4 267.8n 269.1n ~ 0.700
GetUtxoHashes_ManyOutputs-4 43.47µ 43.41µ ~ 0.400
_NewMetaDataFromBytes-4 229.8n 227.6n ~ 0.400
_Bytes-4 402.1n 399.3n ~ 0.700
_MetaBytes-4 138.8n 142.1n ~ 0.200

Threshold: >10% with p < 0.05 | Generated: 2026-06-11 09:52 UTC

…sync

In the legacy sync protocol the peer pushes its tip inv after
delivering a getblocks batch and sends nothing further until the next
getblocks arrives — the orphan tip block is the batch-continuation
signal. The legacy-sync/catching-blocks branch dropped orphans without
responding, so the pipeline ran dry at every unexpected tip delivery
and sync stalled until the stall detector rotated the sync peer.

Drop the special case: answer every ErrBlockNotFound with a getblocks
from our best block, as the non-legacy branch (and upstream btcd's
orphan path) already did. PushGetBlocksMsg filters duplicates and the
peer only invs blocks past the locator fork point, so a redundant
request mid-batch costs one inv message.
@oskarszoon oskarszoon changed the title fix(legacy): skip sync-peer stall checks while local block backlog pending fix(legacy): prevent sync stalls from backpressure-triggered peer rotation and dropped orphan continuation Jun 10, 2026
@oskarszoon oskarszoon changed the title fix(legacy): prevent sync stalls from backpressure-triggered peer rotation and dropped orphan continuation fix(legacy): prevent sync stalls and mined-tx announce flood during and after catchup Jun 10, 2026
Comment thread services/validator/Validator.go Outdated
Comment thread stores/utxo/meta/data.go Outdated
@ordishs

ordishs commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

+1 to both of Siggi's comments on the Mined flag — and I think the first one points at a real correctness bug, not just a wording problem.

Mined = SkipPolicyChecks is incorrect, not just misleading

The comment at Validator.go claims "SkipPolicyChecks is only set when validating transactions from an already-mined block (block validation, legacy sync)." That premise doesn't hold. SkipPolicyChecks is a general validation option that also arrives from external paths:

  • services/validator/Server.go:466 — taken straight off the gRPC Validate request: opts.SkipPolicyChecks = *req.SkipPolicyChecks
  • services/validator/Server.go:358 — off the Kafka validation message
  • services/propagation/Server.go:1386 — propagation forwards it from the options it received

So a genuinely fresh tx submitted through Validate/propagation with SkipPolicyChecks=true would be persisted/published as Mined=true and then suppressed from relay — the inverse of the bug this fix targets. Deriving a durable provenance flag from a behavioural skip-flag overloads SkipPolicyChecks with a meaning it doesn't carry.

Suggested direction: thread an explicit "came from a block under validation" signal from the actual block-validation call sites (subtreevalidation, check_block_subtrees, legacy/netsync/handle_block) rather than inferring it. That also resolves Siggi's naming point — the flag's real meaning is provenance, so InBlock (or similar) is more accurate than Mined.

Other issues (fix #1, blockBacklog)

The increment/decrement accounting is correctly paired and the atomic usage is sound, but two things worth a look:

  1. Bypasses the MaxBlockDownloadTime cap. The existing throughput-based suppression in handleCheckSyncPeer is deliberately bounded by peerpkg.MaxBlockDownloadTime so a peer can't hold the single sync slot forever. The new blockBacklog > 0 check early-returns before that cap, so while the backlog is non-zero rotation is suppressed with no wall-clock bound at all. For a legitimately slow local validation that's the intended tradeoff, but it does drop the bounded-suppression guarantee — worth calling out explicitly (or bounding the skip).

  2. Counter is peer-agnostic. blockBacklog increments for blocks from any peer, not just the sync peer. During IBD that's effectively the sync peer, but in principle a second peer feeding blocks could keep the backlog non-zero and shield a stalled sync peer from rotation. Low severity, but combined with (1) it means the suppression window isn't bounded by anything peer-specific.

Process note

These are three independent fixes. Splitting #3 (Mined) into its own PR would let #1 and #2 — which look good — merge without waiting on the provenance-flag rework.

@oskarszoon oskarszoon force-pushed the fix/legacy-service-error branch from fd8272c to 8fcb0a0 Compare June 11, 2026 09:37
@oskarszoon oskarszoon changed the title fix(legacy): prevent sync stalls and mined-tx announce flood during and after catchup fix(legacy): prevent sync stalls from backpressure-triggered peer rotation and dropped orphan continuation Jun 11, 2026
@oskarszoon

Copy link
Copy Markdown
Contributor Author

Split done: fix 3 moved to #1073 with the provenance flag reworked as an explicit WithInBlock option set at the block-context call sites (subtreevalidation, check_block_subtrees, legacy handle_block) and carried as a new proto field — no longer inferred from SkipPolicyChecks. #1073's tests pin that SkipPolicyChecks alone doesn't set it.

On the blockBacklog notes: the missing MaxBlockDownloadTime bound is intentional — suppression only persists while blocks keep arriving and queuing, and a peer that keeps delivering isn't stalled; rotation can't fix slow local validation. Documented in the PR body now. The peer-agnostic counter is a fair point; scoping it to the sync peer's association is a small change and I'd do it as a follow-up rather than grow this PR further.

@oskarszoon oskarszoon requested a review from icellan June 11, 2026 09:39
@sonarqubecloud

Copy link
Copy Markdown

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the trimmed PR (now blockBacklog + orphan getblocks continuation, both in services/legacy/netsync/). The contested Mined/SkipPolicyChecks change was split out, so the earlier blocking objection no longer applies here.

Fix #1 (blockBacklog) — accounting is balanced (single producer/consumer site), atomic usage is correct, and the defer ordering keeps throughput samples fresh. Solid.

Fix #2 (orphan getblocks) — unifying all ErrBlockNotFound orphans onto the getblocks-from-best-block path is correct: in legacy sync the peer's tip inv after a batch is the continuation signal, so swallowing it stalled sync until the detector rotated the peer. Matches the non-legacy branch and upstream btcd. The getblocks-storm concern is bounded by PushGetBlocksMsg's consecutive-duplicate filter (begin hash advances with our best block, so real continuations aren't filtered). TestHandleBlockMsg_OrphanDuringCatchup is a genuine regression guard.

Verified at head 8fcb0a08:

go test -race -run 'TestHandleCheckSyncPeer_LocalBacklog|TestHandleBlockMsg_OrphanDuringCatchup' ./services/legacy/netsync/  → PASS
go test -race ./services/legacy/netsync/  → ok (full package)
go vet ./services/legacy/netsync/          → clean

Non-blocking follow-ups:

  • Stale comment at manager.go:1378 (// Create a block locator starting from the parent hash) — the code locates from the best block.
  • Fix #1's backlog skip early-returns before the MaxBlockDownloadTime cap and the counter is peer-agnostic; acceptable tradeoffs, worth a note in the comment.

@oskarszoon oskarszoon merged commit 24144c4 into bsv-blockchain:main Jun 11, 2026
66 checks passed
oskarszoon added a commit to oskarszoon/teranode that referenced this pull request Jun 11, 2026
…mber resolution flag to proto field 12

Conflicts resolved:
- validator_api.proto: bsv-blockchain#1073 took field 11 for in_block (merged upstream
  first, owns the number); unconfirmed_parents_at_candidate_height moves
  to field 12. Both fields kept.
- validator_api.pb.go: regenerated from the resolved proto.
- validator/Client.go: buildValidateTxRequest carries both InBlock
  (upstream) and UnconfirmedParentsAtCandidateHeight (this branch).

WIRE SKEW WARNING for the existing beta deployment: the beta build sends
the resolution flag as field 11, which post-merge means in_block. A beta
subtreevalidation talking to a post-merge validator would silently set
the wrong option (wedge returns, wrong provenance). Replace beta
deployments wholesale; do not mix beta and post-merge services.
oskarszoon added a commit that referenced this pull request Jun 11, 2026
oskarszoon added a commit to oskarszoon/teranode that referenced this pull request Jun 11, 2026
…-legacysyncing

Eleven upstream commits, several touching legacy code this branch also
changed. Resolutions:

- netsync/manager.go: upstream bsv-blockchain#1067 deleted the catchingBlocks early-return
  in handleBlockMsg (the swallowed orphan tip is the legacy batch-continuation
  signal; swallowing it stalled sync). Took upstream's always-request
  behaviour and updated the surrounding comments to post-LEGACYSYNCING wording.
- subtreevalidation/Server.go: kept upstream bsv-blockchain#1065's richer rationale for
  disabling block-assembly additions, on this branch's CATCHINGBLOCKS-only
  condition.
- Re-applied the LEGACYSYNCING removal to symbols upstream reintroduced:
  validator publishPolicyRejectedTx gate (bsv-blockchain#799) now checks CATCHINGBLOCKS
  only; the new legacy gate-pinning tests (bsv-blockchain#1065) drive CATCHINGBLOCKS
  instead of the removed state; comment references updated in adaptivefetch,
  Validator_test, and the Server.go incident note.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants