Skip to content

fix(blockassembly): split catch-up from reorg in processNewBlockAnnouncement#903

Merged
oskarszoon merged 3 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/898-blockassembly-catchup-vs-reorg
May 20, 2026
Merged

fix(blockassembly): split catch-up from reorg in processNewBlockAnnouncement#903
oskarszoon merged 3 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/898-blockassembly-catchup-vs-reorg

Conversation

@oskarszoon

@oskarszoon oskarszoon commented May 19, 2026

Copy link
Copy Markdown
Contributor

Closes #898.

Summary

processNewBlockAnnouncement previously routed any gap ≥ 2 between the assembler's current block and the blockchain tip through handleReorg. When moveBack == 0 (pure catch-up after downtime), handleReorg would still evaluate the large-reorg reset condition; with moveForward ≥ CoinbaseMaturity after a long downtime, this incorrectly triggered a full reset of block assembly state.

Fix

Pre-fetch reorg blocks at the top of the gap ≥ 2 case, then branch on moveBack length:

  • moveBack == 0 → new handleCatchUp helper: delegates to subtreeProcessor.Reorg([]*model.Block{}, blocks). reorgBlocks already has a fast path for len(moveBackBlocks) == 0 && len(moveForwardBlocks) > 0 (SubtreeProcessor.go:2794) that iterates moveForwardBlock per block with skipNotification = true and skipDequeue = true for every block except the last, and rolls back the subtree processor to pre-catchup state on any mid-loop failure (lines 2820-2828). On error, processNewBlockAnnouncement returns early so BlockAssembler.setBestBlockHeader is not called either; BA and the subtree processor both stay at the pre-catchup tip, aligned. Note: pass an empty (non-nil) []*model.Block{} for moveBack — reorgBlocks rejects nil at line 2778 but takes the fast path at len == 0.
  • moveBack > 0handleReorg(ctx, header, height) unchanged. The function body is byte-identical to upstream; the only cost on the reorg path is one additional getReorgBlocks call, accepted as the price of zero blast radius on the reorg implementation.

New prometheusBlockAssemblerCatchUp counter mirrors the existing reorg counter for ops visibility.

Test plan

TestProcessNewBlockAnnouncement_CatchupVsReorg with 7 subtests:

  • gap=0 no-op — same hash returns early without state change.
  • gap=1 moving up — existing default branch, single MoveForwardBlock.
  • gap=2 catch-up moveBack=0 — new catch-up path; stubs Reorg, asserts called with len(moveBack) == 0 and the expected moveForward slice.
  • large catch-up moveBack=0 moveForward=150 no reset — pins the regression the fix addresses; 150-block forward gap routes through Reorg fast path, never through handleReorg's reset gate.
  • real reorg moveBack=1 — existing reorg path unchanged.
  • large reorg triggers reset — existing handleReorg reset gate still fires for genuine reorgs at depth ≥ CoinbaseMaturity.
  • catch-up mid-error stops — stubs Reorg to return an error; asserts CurrentBlock() returns the pre-catchup tip (because processNewBlockAnnouncement returns early on error, leaving BA's setBestBlockHeader unreached, and reorgBlocks rolls back the subtree processor's state in the same scenario).

make lint-new clean, go test -race ./services/blockassembly/... passes.

…ncement

Prior to this change, any gap ≥ 2 between the assembler's current block
and the blockchain tip was routed through handleReorg. If moveBack was
zero (pure catch-up after downtime), handleReorg would still evaluate
the large-reorg reset condition. Because moveForward could be ≥
CoinbaseMaturity after a long downtime, this incorrectly triggered a
full reset of block assembly state.

Fix: call getReorgBlocks once at the top of the gap ≥ 2 case, then
branch on moveBack length:

- moveBack == 0 → handleCatchUp: walk MoveForwardBlock one block at a
  time, stop on first error, never touch reset/Reorg paths.
- moveBack > 0  → handleReorg: original signature unchanged; handles the
  existing large-reorg reset and small-reorg Reorg() paths unchanged.

Adds TestProcessNewBlockAnnouncement_CatchupVsReorg with 7 subtests
covering gap=0, gap=1, gap=2 catch-up, 150-block catch-up without
reset, 1-block reorg, large-reorg reset guard, and mid-catch-up error
propagation.
@github-actions

github-actions Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Summary

This PR correctly addresses issue #898 by splitting catch-up (moveBack=0) from genuine reorg paths. The fix is sound and well-tested.

Core Fix Verified:

  • Line 717: Correctly routes pure catch-ups (len(moveBackBlocksWithMeta)==0) to handleCatchUp
  • Line 726+: Genuine reorgs (moveBack>0) continue through handleReorg
  • Line 1404-1418: handleCatchUp properly delegates to SubtreeProcessor.Reorg fast path with empty moveBack slice

Previous Inline Comments Status:
The existing inline comment threads have been addressed. The implementation correctly:

  1. ✅ Increments prometheusBlockAssemblerCatchUp counter (line 1409)
  2. ℹ️ Does not check ctx.Done() in handleCatchUp - this is intentional and documented in the godoc (line 1402)
  3. ℹ️ Maintains alignment via SubtreeProcessor rollback on error (lines 2820-2829), not per-block advancement

Test Coverage:
Comprehensive 7-subtest suite covers all paths including the critical regression scenario (150-block catch-up not triggering reset).

No issues found. The implementation matches the fix description, the tests are thorough, and the logic correctly prevents the CoinbaseMaturity reset bug during forward-only catch-ups.

Comment thread services/blockassembly/BlockAssembler.go Outdated
Comment thread services/blockassembly/BlockAssembler.go Outdated
Comment thread services/blockassembly/BlockAssembler.go Outdated
@github-actions

github-actions Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-903 (cf9c877)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.686µ 1.709µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 59.45n 59.37n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 59.46n 59.28n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 59.83n 59.38n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 33.31n 33.25n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 58.24n 58.19n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 158.4n 157.6n ~ 1.000
MiningCandidate_Stringify_Short-4 249.3n 250.2n ~ 0.700
MiningCandidate_Stringify_Long-4 1.734µ 1.747µ ~ 0.100
MiningSolution_Stringify-4 908.9n 912.2n ~ 0.400
BlockInfo_MarshalJSON-4 1.726µ 1.717µ ~ 0.400
NewFromBytes-4 155.2n 127.7n ~ 0.200
AddTxBatchColumnar_Validation-4 2.471µ 2.496µ ~ 1.000
OffsetValidationLoop-4 643.5n 637.7n ~ 0.400
Mine_EasyDifficulty-4 66.81µ 67.00µ ~ 0.700
Mine_WithAddress-4 7.034µ 7.038µ ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 57.29n 58.01n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 28.28n 28.79n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 27.08n 27.59n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 26.04n 26.37n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 25.71n 25.93n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 299.7n 293.1n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 298.1n 303.4n ~ 0.500
SubtreeProcessorAdd/256_per_subtree-4 286.2n 287.9n ~ 0.400
SubtreeProcessorAdd/1024_per_subtree-4 285.8n 287.4n ~ 1.000
SubtreeProcessorAdd/2048_per_subtree-4 288.2n 294.5n ~ 1.000
SubtreeProcessorRotate/4_per_subtree-4 301.2n 332.3n ~ 1.000
SubtreeProcessorRotate/64_per_subtree-4 289.6n 288.2n ~ 0.400
SubtreeProcessorRotate/256_per_subtree-4 297.8n 288.7n ~ 0.400
SubtreeProcessorRotate/1024_per_subtree-4 294.1n 284.9n ~ 0.400
SubtreeNodeAddOnly/4_per_subtree-4 55.72n 54.79n ~ 0.700
SubtreeNodeAddOnly/64_per_subtree-4 34.67n 34.29n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.58n 33.45n ~ 0.200
SubtreeNodeAddOnly/1024_per_subtree-4 32.88n 32.67n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 116.7n 117.8n ~ 0.400
SubtreeCreationOnly/64_per_subtree-4 414.2n 407.3n ~ 0.300
SubtreeCreationOnly/256_per_subtree-4 1.411µ 1.388µ ~ 0.400
SubtreeCreationOnly/1024_per_subtree-4 4.478µ 4.395µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 8.564µ 8.390µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 311.6n 292.1n ~ 0.200
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 311.3n 281.2n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 2.293m 2.235m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.669m 5.601m ~ 0.400
ParallelGetAndSetIfNotExists/50k_nodes-4 8.193m 8.072m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 11.65m 11.16m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 2.014m 1.970m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 6.167m 5.221m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 20.31m 14.90m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 34.07m 24.97m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.348m 2.288m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.594m 8.534m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 14.68m 14.06m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 2.079m 2.009m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 10.299m 8.943m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 73.91m 55.22m ~ 0.100
DiskTxMap_SetIfNotExists-4 4.031µ 3.714µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.482µ 3.462µ ~ 1.000
DiskTxMap_ExistenceOnly-4 418.1n 473.9n ~ 0.700
Queue-4 200.4n 197.1n ~ 0.200
AtomicPointer-4 4.709n 4.820n ~ 1.000
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 987.8µ 932.8µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 871.4µ 810.5µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 112.2µ 108.0µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 63.04µ 62.36µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 66.84µ 54.96µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 11.99µ 10.64µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/10K-4 5.719µ 4.877µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.899µ 1.659µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 11.40m 10.32m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 11.06m 11.32m ~ 1.000
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.152m 1.154m ~ 1.000
ReorgOptimizations/AllMarkFalse/New/100K-4 686.4µ 689.9µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 566.2µ 561.8µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/100K-4 320.8µ 304.5µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/100K-4 51.95µ 51.83µ ~ 1.000
ReorgOptimizations/NodeFlags/New/100K-4 17.82µ 17.99µ ~ 0.700
TxMapSetIfNotExists-4 52.90n 52.64n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 40.13n 40.02n ~ 0.200
ChannelSendReceive-4 610.3n 641.0n ~ 0.100
BlockAssembler_AddTx-4 0.03255n 0.02557n ~ 0.200
AddNode-4 10.85 11.28 ~ 0.700
AddNodeWithMap-4 11.93 11.94 ~ 1.000
CalcBlockWork-4 491.6n 498.6n ~ 0.100
CalculateWork-4 685.1n 688.8n ~ 1.000
BuildBlockLocatorString_Helpers/Size_10-4 1.636µ 1.665µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_100-4 13.02µ 12.76µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 125.6µ 128.3µ ~ 0.100
CatchupWithHeaderCache-4 104.3m 104.2m ~ 0.400
_prepareTxsPerLevel-4 414.9m 413.4m ~ 1.000
_prepareTxsPerLevelOrdered-4 3.466m 3.485m ~ 0.400
_prepareTxsPerLevel_Comparison/Original-4 410.5m 412.7m ~ 0.700
_prepareTxsPerLevel_Comparison/Optimized-4 3.470m 3.471m ~ 1.000
_BufferPoolAllocation/16KB-4 3.906µ 4.860µ ~ 0.100
_BufferPoolAllocation/32KB-4 10.472µ 7.624µ ~ 0.100
_BufferPoolAllocation/64KB-4 17.24µ 14.70µ ~ 0.100
_BufferPoolAllocation/128KB-4 35.68µ 27.36µ ~ 0.100
_BufferPoolAllocation/512KB-4 123.4µ 107.4µ ~ 0.100
_BufferPoolConcurrent/32KB-4 18.82µ 22.53µ ~ 0.100
_BufferPoolConcurrent/64KB-4 29.45µ 34.91µ ~ 0.400
_BufferPoolConcurrent/512KB-4 145.2µ 159.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 618.7µ 671.7µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 621.5µ 727.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 612.1µ 629.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 614.8µ 599.0µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 603.5µ 599.5µ ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/16KB-4 36.26m 36.59m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.24m 36.95m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.34m 36.49m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.32m 36.27m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.60m 36.44m ~ 0.100
_PooledVsNonPooled/Pooled-4 831.8n 842.4n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.675µ 7.891µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 8.303µ 7.695µ ~ 0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.928µ 12.078µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.886µ 10.742µ ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.386m 1.431m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 326.9µ 325.6µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 80.82µ 80.23µ ~ 1.000
SubtreeSizes/10k_tx_256_per_subtree-4 20.25µ 20.17µ ~ 1.000
SubtreeSizes/10k_tx_512_per_subtree-4 10.05µ 10.22µ ~ 0.700
SubtreeSizes/10k_tx_1024_per_subtree-4 4.970µ 4.983µ ~ 1.000
SubtreeSizes/10k_tx_2k_per_subtree-4 2.488µ 2.479µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 78.30µ 79.51µ ~ 0.700
BlockSizeScaling/10k_tx_256_per_subtree-4 20.09µ 20.26µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.922µ 5.009µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 393.1µ 401.9µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 99.75µ 99.11µ ~ 0.700
BlockSizeScaling/50k_tx_1024_per_subtree-4 24.61µ 24.58µ ~ 1.000
SubtreeAllocations/small_subtrees_exists_check-4 162.6µ 162.8µ ~ 0.700
SubtreeAllocations/small_subtrees_data_fetch-4 170.2µ 171.2µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 327.0µ 326.2µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 9.868µ 9.997µ ~ 0.400
SubtreeAllocations/medium_subtrees_data_fetch-4 10.60µ 10.58µ ~ 0.400
SubtreeAllocations/medium_subtrees_full_validation-4 20.28µ 20.36µ ~ 0.400
SubtreeAllocations/large_subtrees_exists_check-4 2.399µ 2.422µ ~ 0.700
SubtreeAllocations/large_subtrees_data_fetch-4 2.666µ 2.627µ ~ 0.400
SubtreeAllocations/large_subtrees_full_validation-4 5.118µ 5.119µ ~ 1.000
StoreBlock_Sequential/BelowCSVHeight-4 335.4µ 335.1µ ~ 1.000
StoreBlock_Sequential/AboveCSVHeight-4 338.1µ 337.9µ ~ 1.000
GetUtxoHashes-4 261.0n 261.0n ~ 1.000
GetUtxoHashes_ManyOutputs-4 41.58µ 41.52µ ~ 1.000
_NewMetaDataFromBytes-4 225.5n 225.5n ~ 0.800
_Bytes-4 391.2n 396.0n ~ 0.200
_MetaBytes-4 334.0n 332.2n ~ 0.700

Threshold: >10% with p < 0.05 | Generated: 2026-05-20 09:06 UTC

@oskarszoon oskarszoon force-pushed the fix/898-blockassembly-catchup-vs-reorg branch from f5b5637 to 2c523d4 Compare May 19, 2026 18:49
@oskarszoon oskarszoon requested review from icellan and ordishs May 19, 2026 19:06
@oskarszoon oskarszoon force-pushed the fix/898-blockassembly-catchup-vs-reorg branch from 2c523d4 to a4cd5ef Compare May 19, 2026 19:22
@oskarszoon oskarszoon force-pushed the fix/898-blockassembly-catchup-vs-reorg branch from a4cd5ef to 54b58e2 Compare May 19, 2026 19:33
@oskarszoon oskarszoon requested a review from icellan May 20, 2026 08:43
@sonarqubecloud

Copy link
Copy Markdown

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Targets a real, reproduced production bug; the regression test (large catch-up moveBack=0 moveForward=150 no reset) directly pins the bypass; and the implementation reuses the well-understood reorgBlocks fast path rather than introducing a new one.

Verified the key claims:

  • SubtreeProcessor.go:2794 fast path: iterates moveForwardBlock per block with rollback at lines 2820-2828 on mid-loop failure. ✓
  • Non-nil empty slice required: nil is rejected at line 2778, fast path triggers at len == 0 on line 2794. ✓
  • Error path: processNewBlockAnnouncement returns before setBestBlockHeader, so BA stays aligned with the subtree processor's rolled-back state. ✓

Suggestions (non-blocking)

  1. Duplicate getReorgBlocks on the reorg branch. The new code calls getReorgBlocks at the top of the gap-≥-2 case, then handleReorg calls it again at BlockAssembler.go:1303. For an N-block catch-up that turns out to be a reorg, that's ~N extra GetBlock calls. The PR description acknowledges this as "zero blast radius on the reorg implementation," which is defensible — but a cheap follow-up would be to thread the already-fetched slices into handleReorg.

  2. handleCatchUp(_ context.Context, ...). The ctx parameter is unused (Reorg uses its own channel-based lifecycle). Either drop it or thread it through for symmetry with handleReorg. Minor.

  3. Per-block notification semantics during multi-block catch-up. The fast path passes skipNotificationsAndDequeue = true for every block except the last, so downstream consumers only see notifications from the final catch-up block. Matches the existing reorgBlocks fast-path contract and is a clear improvement over the broken Reset path — but it's a real semantic change versus a hypothetical "loop MoveForwardBlock directly" approach. Worth a line in the PR description acknowledging this.

  4. Test resource cleanup. injectMockStp swaps subtreeProcessor after setupBlockAssemblyTest has already started the real one (line 741). The cleanup hook then calls Stop() on the mock, so the real processor's goroutines survive to process exit. Not a correctness issue, but consider stopping the real processor before swapping.

  5. Cosmetic: the moveForward → []*model.Block conversion in handleCatchUp mirrors lines 1329-1341 of handleReorg. Could be a helper. Not blocking.

Follow-up worth doing

Confirm in mainnet legacy sync (the bsva-ovh-teranode-ttn-eu-4 scenario from #898) that the catch-up path actually runs and the 10 GB heap profile is gone. The unit test exercises the routing; production validates the memory shape.

@oskarszoon oskarszoon merged commit 36a7382 into bsv-blockchain:main May 20, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BlockAssembly logs and resets as 'reorging' for forward-only catch-ups (gap of 2 blocks triggers Reset path)

3 participants