Skip to content

fix(blockassembly/subtreeprocessor): stop losing the boundary batch in Reset drain#851

Merged
liam merged 1 commit into
bsv-blockchain:mainfrom
liam:liam/blockassembly-reset-drain-fix
May 12, 2026
Merged

fix(blockassembly/subtreeprocessor): stop losing the boundary batch in Reset drain#851
liam merged 1 commit into
bsv-blockchain:mainfrom
liam:liam/blockassembly-reset-drain-fix

Conversation

@liam

@liam liam commented May 12, 2026

Copy link
Copy Markdown
Collaborator

Summary

The Reset drain loop at SubtreeProcessor.go:1355-1361 silently drops one boundary batch per Reset call. Fix: introduce LockFreeQueue.dequeueBatchUntil (peek-before-dequeue) and rewrite the drain loop to use it.

The bug

validUntilMillis := time.Now().UnixMilli()
for {
    batch, found := stp.queue.dequeueBatch(0)
    if !found || batch.time > validUntilMillis {
        // we are done
        break
    }
}

dequeueBatch(0) bypasses the queue filter and removes the head batch unconditionally. Then batch.time is checked. If the batch is too new, the loop breaks - but the batch is already gone from the queue and nothing puts it back. Net effect: one batch worth of fresh txs lost per Reset call.

Reset runs during reorgs, FSM transitions, and recovery operations. Under heavy concurrent enqueue (the documented use case for this codebase), each Reset statistically catches a batch in the window between time.Now() snapshot and the drain reaching the boundary. Those txs never get mined.

Fix

New dequeueBatchUntil on LockFreeQueue with inclusive-until semantics (admit iff batch.time <= maxTimeMillis). Peeks at batch.time before removing the batch, so the boundary batch is never lost.

The Reset drain becomes:

validUntilMillis := stp.clock.Now().UnixMilli()
for {
    if _, found := stp.queue.dequeueBatchUntil(validUntilMillis); !found {
        break
    }
}

Also switches the time anchor from time.Now() to stp.clock.Now() so the Reset path runs through the existing clock seam - consistent with the other validFromMillis call sites and necessary for deterministic future testing of Reset under reorg pressure.

Test changes

  • New Test_dequeueBatchUntilPreservesPostBoundaryBatch - regression guard. Enqueues a pre-snapshot and a post-snapshot batch via the clock seam, drains up to the snapshot, asserts the post-snapshot batch survives in the queue. Also covers the inclusive boundary (batch.time == maxTimeMillis admits) and the empty-queue case.

Test plan

  • go vet ./services/blockassembly/subtreeprocessor/ - clean
  • go test -race -count=1 ./services/blockassembly/subtreeprocessor/ - pass (155s). TestResetMarksAssemblyTxsAsNotOnLongestChainBeforeClearing still passes - Reset behavior is preserved for the documented invariants.
  • go test -race -count=1 ./services/blockassembly/ - pass (101s).

Series context

Fourth in a series from the same investigation:

Independent of #848 - depends only on #841's clock seam (already on main).

…n Reset drain

The Reset drain loop at SubtreeProcessor.go:1355-1361 used a
dequeue-then-time-check pattern that silently drops one boundary
batch per Reset call:

    validUntilMillis := time.Now().UnixMilli()
    for {
        batch, found := stp.queue.dequeueBatch(0)
        if !found || batch.time > validUntilMillis {
            // we are done
            break
        }
    }

dequeueBatch(0) bypasses the queue filter and removes the head batch
unconditionally. Only THEN does the loop check batch.time. If the
batch is too new, the loop breaks - but the batch has already been
removed from the queue and nothing puts it back, so it is dropped on
the floor.

Under heavy concurrent enqueue (Reset happens during reorgs / FSM
transitions / recovery), this drops one batch worth of in-flight txs
each call. The lost txs never get mined.

Fix introduces dequeueBatchUntil on LockFreeQueue with inclusive-until
semantics (admit iff batch.time <= maxTimeMillis). Unlike dequeueBatch
it peeks at batch.time BEFORE removing the batch, so the boundary
batch is never lost. The Reset drain loop becomes:

    validUntilMillis := stp.clock.Now().UnixMilli()
    for {
        if _, found := stp.queue.dequeueBatchUntil(validUntilMillis); !found {
            break
        }
    }

Also switches the time anchor from time.Now() to stp.clock.Now() so
the Reset path runs through the existing clock seam, consistent with
the other validFromMillis call sites.

Test changes:

* New Test_dequeueBatchUntilPreservesPostBoundaryBatch: regression
  guard. Enqueues a pre-snapshot and a post-snapshot batch via the
  clock seam, drains up to the snapshot, asserts the post-snapshot
  batch survives. Also covers the inclusive boundary
  (batch.time == maxTimeMillis admits) and empty-queue case.
@github-actions

github-actions Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Summary: No issues found. The PR correctly fixes a boundary-batch loss bug in the Reset drain loop through a well-designed peek-before-dequeue method.

Analysis:

  • Bug fix correctness: The old code dequeued first then checked time, losing the boundary batch. The new dequeueBatchUntil method peeks before removing, solving the issue.
  • Implementation quality: Clean, follows existing patterns, maintains thread-safety constraints, properly updates queue state.
  • Documentation accuracy: Godoc, inline comments, and test descriptions accurately match the implementation. The inclusive-until semantics (batch.time <= maxTimeMillis) are correctly documented and tested.
  • Test coverage: Comprehensive regression guard covers the exact bug scenario, boundary conditions, and empty-queue case.
  • Consistency: Uses the clock seam consistently with PR fix(blockassembly/subtreeprocessor): zero-guard validFromMillis in dequeueDuringBlockMovement #846, enabling deterministic testing without wall-time dependencies.

The change is minimal (31 lines added across queue.go and tests), well-scoped to the bug, and builds correctly on the existing clock abstraction from #841.

@liam liam requested review from icellan and ordishs May 12, 2026 14:16
@sonarqubecloud

Copy link
Copy Markdown

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-851 (c7ae8db)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.699µ 1.674µ ~ 0.200
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.45n 61.55n ~ 0.600
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.57n 61.52n ~ 0.300
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.63n 61.69n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 31.07n 31.55n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 53.70n 54.21n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 108.4n 109.8n ~ 0.100
MiningCandidate_Stringify_Short-4 265.3n 264.6n ~ 1.000
MiningCandidate_Stringify_Long-4 1.888µ 1.883µ ~ 0.400
MiningSolution_Stringify-4 996.7n 981.3n ~ 0.100
BlockInfo_MarshalJSON-4 1.781µ 1.742µ ~ 0.100
NewFromBytes-4 127.1n 145.8n ~ 0.200
Mine_EasyDifficulty-4 67.14µ 67.01µ ~ 0.400
Mine_WithAddress-4 6.999µ 6.943µ ~ 0.200
BlockAssembler_AddTx-4 0.03020n 0.02836n ~ 1.000
AddNode-4 10.65 11.23 ~ 0.700
AddNodeWithMap-4 11.03 11.33 ~ 0.400
DirectSubtreeAdd/4_per_subtree-4 62.06n 63.63n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 29.21n 32.46n ~ 0.100
DirectSubtreeAdd/256_per_subtree-4 28.42n 30.87n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 26.97n 29.79n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 26.73n 29.43n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 296.1n 301.3n ~ 0.400
SubtreeProcessorAdd/64_per_subtree-4 282.6n 297.9n ~ 0.200
SubtreeProcessorAdd/256_per_subtree-4 291.1n 309.6n ~ 0.200
SubtreeProcessorAdd/1024_per_subtree-4 283.4n 317.1n ~ 0.200
SubtreeProcessorAdd/2048_per_subtree-4 290.4n 316.8n ~ 0.400
SubtreeProcessorRotate/4_per_subtree-4 288.9n 291.3n ~ 0.700
SubtreeProcessorRotate/64_per_subtree-4 292.5n 292.3n ~ 1.000
SubtreeProcessorRotate/256_per_subtree-4 284.1n 305.6n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 283.4n 296.4n ~ 0.700
SubtreeNodeAddOnly/4_per_subtree-4 55.79n 55.82n ~ 0.700
SubtreeNodeAddOnly/64_per_subtree-4 35.04n 35.32n ~ 0.300
SubtreeNodeAddOnly/256_per_subtree-4 33.99n 34.82n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 33.75n 33.85n ~ 0.700
SubtreeCreationOnly/4_per_subtree-4 117.5n 123.8n ~ 0.200
SubtreeCreationOnly/64_per_subtree-4 448.4n 445.9n ~ 1.000
SubtreeCreationOnly/256_per_subtree-4 1.549µ 1.569µ ~ 1.000
SubtreeCreationOnly/1024_per_subtree-4 5.017µ 5.444µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 9.060µ 9.381µ ~ 1.000
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 282.1n 323.1n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 283.6n 321.2n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 627.0µ 622.1µ ~ 0.400
ParallelGetAndSetIfNotExists/10k_nodes-4 1.430m 1.392m ~ 0.400
ParallelGetAndSetIfNotExists/50k_nodes-4 7.172m 6.927m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 14.08m 14.21m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 688.9µ 702.9µ ~ 0.400
SequentialGetAndSetIfNotExists/10k_nodes-4 3.091m 4.206m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 11.32m 13.72m ~ 0.200
SequentialGetAndSetIfNotExists/100k_nodes-4 23.20m 23.93m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 661.4µ 655.7µ ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.540m 4.610m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 17.83m 17.48m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 721.7µ 703.0µ ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 6.184m 7.935m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 46.33m 45.76m ~ 1.000
DiskTxMap_SetIfNotExists-4 3.845µ 4.050µ ~ 1.000
DiskTxMap_SetIfNotExists_Parallel-4 3.707µ 3.966µ ~ 0.400
DiskTxMap_ExistenceOnly-4 334.1n 370.7n ~ 0.700
Queue-4 158.8n 155.7n ~ 0.400
AtomicPointer-4 2.515n 2.841n ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 692.6µ 712.4µ ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/10K-4 672.1µ 659.7µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 91.29µ 97.30µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 49.92µ 49.81µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/10K-4 55.01µ 53.02µ ~ 1.000
ReorgOptimizations/HashSlicePool/New/10K-4 8.703µ 8.594µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/10K-4 4.290µ 3.968µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.753µ 1.353µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 8.310m 10.083m ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4 8.838m 8.950m ~ 0.400
ReorgOptimizations/AllMarkFalse/Old/100K-4 948.5µ 941.6µ ~ 1.000
ReorgOptimizations/AllMarkFalse/New/100K-4 551.1µ 547.4µ ~ 0.200
ReorgOptimizations/HashSlicePool/Old/100K-4 452.7µ 490.3µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 200.6µ 195.1µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 41.96µ 41.35µ ~ 0.200
ReorgOptimizations/NodeFlags/New/100K-4 14.31µ 13.93µ ~ 0.400
TxMapSetIfNotExists-4 35.81n 35.56n ~ 0.400
TxMapSetIfNotExistsDuplicate-4 29.74n 29.90n ~ 0.100
ChannelSendReceive-4 452.5n 425.0n ~ 0.100
CalcBlockWork-4 471.7n 481.2n ~ 0.200
CalculateWork-4 643.7n 630.9n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.328µ 1.305µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_100-4 16.17µ 14.54µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_1000-4 125.6µ 125.3µ ~ 1.000
CatchupWithHeaderCache-4 104.4m 104.7m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.376m 1.365m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 328.0µ 328.2µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 79.18µ 77.80µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 19.48µ 19.50µ ~ 1.000
SubtreeSizes/10k_tx_512_per_subtree-4 9.783µ 9.580µ ~ 0.400
SubtreeSizes/10k_tx_1024_per_subtree-4 4.791µ 4.795µ ~ 0.400
SubtreeSizes/10k_tx_2k_per_subtree-4 2.406µ 2.386µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 77.03µ 76.05µ ~ 0.700
BlockSizeScaling/10k_tx_256_per_subtree-4 19.26µ 19.16µ ~ 0.700
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.796µ 4.813µ ~ 0.700
BlockSizeScaling/50k_tx_64_per_subtree-4 403.6µ 406.3µ ~ 0.100
BlockSizeScaling/50k_tx_256_per_subtree-4 97.99µ 96.08µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.90µ 23.62µ ~ 1.000
SubtreeAllocations/small_subtrees_exists_check-4 157.2µ 159.7µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 173.3µ 171.2µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 331.0µ 332.3µ ~ 0.400
SubtreeAllocations/medium_subtrees_exists_check-4 9.515µ 9.438µ ~ 0.400
SubtreeAllocations/medium_subtrees_data_fetch-4 10.20µ 10.25µ ~ 1.000
SubtreeAllocations/medium_subtrees_full_validation-4 19.65µ 19.44µ ~ 0.400
SubtreeAllocations/large_subtrees_exists_check-4 2.256µ 2.279µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.483µ 2.489µ ~ 1.000
SubtreeAllocations/large_subtrees_full_validation-4 4.864µ 4.851µ ~ 1.000
_BufferPoolAllocation/16KB-4 3.432µ 3.400µ ~ 1.000
_BufferPoolAllocation/32KB-4 6.424µ 7.880µ ~ 0.200
_BufferPoolAllocation/64KB-4 15.76µ 16.55µ ~ 0.400
_BufferPoolAllocation/128KB-4 28.74µ 32.01µ ~ 0.100
_BufferPoolAllocation/512KB-4 111.6µ 111.6µ ~ 1.000
_BufferPoolConcurrent/32KB-4 18.41µ 17.70µ ~ 0.700
_BufferPoolConcurrent/64KB-4 28.92µ 27.99µ ~ 0.700
_BufferPoolConcurrent/512KB-4 139.3µ 143.2µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 620.7µ 607.6µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/32KB-4 612.6µ 624.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 608.5µ 630.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 609.0µ 622.3µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 628.6µ 621.6µ ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4 36.48m 35.89m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.57m 35.87m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 35.98m 35.32m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 35.79m 35.46m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.39m 35.01m ~ 0.200
_PooledVsNonPooled/Pooled-4 840.9n 828.9n ~ 0.200
_PooledVsNonPooled/NonPooled-4 6.740µ 6.786µ ~ 1.000
_MemoryFootprint/Current_512KB_32concurrent-4 7.577µ 7.668µ ~ 0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.974µ 9.717µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.724µ 9.362µ ~ 0.100
_prepareTxsPerLevel-4 397.7m 389.6m ~ 1.000
_prepareTxsPerLevelOrdered-4 4.432m 4.071m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 408.4m 390.2m ~ 0.200
_prepareTxsPerLevel_Comparison/Optimized-4 4.545m 4.294m ~ 0.100
StoreBlock_Sequential/BelowCSVHeight-4 318.7µ 330.4µ ~ 0.200
StoreBlock_Sequential/AboveCSVHeight-4 326.5µ 326.3µ ~ 0.700
GetUtxoHashes-4 258.5n 253.8n ~ 0.400
GetUtxoHashes_ManyOutputs-4 44.03µ 43.29µ ~ 0.200
_NewMetaDataFromBytes-4 240.5n 239.3n ~ 0.100
_Bytes-4 629.0n 630.8n ~ 1.000
_MetaBytes-4 569.9n 561.7n ~ 0.300

Threshold: >10% with p < 0.05 | Generated: 2026-05-12 14:23 UTC

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Bug is real — verified that dequeueBatch(0) advances q.head before the time check, so the boundary batch is gone by the time the caller inspects batch.time. The peek-before-dequeue fix is correct, the memory ordering is sound (the new method reads next.time after the same atomic load the existing dequeue relies on), and the new test pins the inclusive-until contract.

A few minor suggestions, none blocking:

  1. Add a Reset-level regression test. The new test guards dequeueBatchUntil's contract at the queue level, but doesn't exercise the actual reset() drain loop. A test that installs a fixedClock on stp/stp.queue, enqueues pre- and post-snapshot batches, drives the Reset path, and asserts the post-snapshot batch survives in stp.queue would guard against someone "simplifying" the drain back to dequeueBatch(0). The queue-level guard is the more fundamental contract, so this is belt-and-braces.

  2. Millisecond-boundary precision in the PR description. With inclusive-until semantics, a batch enqueued in the same millisecond as the snapshot (but after validUntilMillis is captured) will still be drained, since batch.time == validUntilMillis admits. So "batches arriving concurrently after this anchor are intentionally left in the queue" is accurate at the next millisecond, not at the snapshot millisecond itself. 1ms window vs. the original full-drain window — orders of magnitude smaller, not worth changing the code, but worth being precise.

  3. API asymmetry around 0. dequeueBatch(0) means "no filter, drain all"; dequeueBatchUntil(0) means "reject everything with batch.time > 0" (i.e. drain nothing in practice). The docs handle it, but a brief cross-reference between the two methods would help the next caller.

  4. Verification. Recommend running golangci-lint run and staticcheck ./... on services/blockassembly/subtreeprocessor before merge per AGENTS.md — not called out in the test plan.

@ordishs ordishs requested a review from oskarszoon May 12, 2026 14:48
@liam liam merged commit 5ed3da0 into bsv-blockchain:main May 12, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants