fix(blockassembly/subtreeprocessor): stop losing the boundary batch in Reset drain by liam · Pull Request #851 · bsv-blockchain/teranode

liam · 2026-05-12T14:08:27Z

Summary

The Reset drain loop at SubtreeProcessor.go:1355-1361 silently drops one boundary batch per Reset call. Fix: introduce LockFreeQueue.dequeueBatchUntil (peek-before-dequeue) and rewrite the drain loop to use it.

The bug

validUntilMillis := time.Now().UnixMilli()
for {
    batch, found := stp.queue.dequeueBatch(0)
    if !found || batch.time > validUntilMillis {
        // we are done
        break
    }
}

dequeueBatch(0) bypasses the queue filter and removes the head batch unconditionally. Then batch.time is checked. If the batch is too new, the loop breaks - but the batch is already gone from the queue and nothing puts it back. Net effect: one batch worth of fresh txs lost per Reset call.

Reset runs during reorgs, FSM transitions, and recovery operations. Under heavy concurrent enqueue (the documented use case for this codebase), each Reset statistically catches a batch in the window between time.Now() snapshot and the drain reaching the boundary. Those txs never get mined.

Fix

New dequeueBatchUntil on LockFreeQueue with inclusive-until semantics (admit iff batch.time <= maxTimeMillis). Peeks at batch.time before removing the batch, so the boundary batch is never lost.

The Reset drain becomes:

validUntilMillis := stp.clock.Now().UnixMilli()
for {
    if _, found := stp.queue.dequeueBatchUntil(validUntilMillis); !found {
        break
    }
}

Also switches the time anchor from time.Now() to stp.clock.Now() so the Reset path runs through the existing clock seam - consistent with the other validFromMillis call sites and necessary for deterministic future testing of Reset under reorg pressure.

Test changes

New Test_dequeueBatchUntilPreservesPostBoundaryBatch - regression guard. Enqueues a pre-snapshot and a post-snapshot batch via the clock seam, drains up to the snapshot, asserts the post-snapshot batch survives in the queue. Also covers the inclusive boundary (batch.time == maxTimeMillis admits) and the empty-queue case.

Test plan

go vet ./services/blockassembly/subtreeprocessor/ - clean
go test -race -count=1 ./services/blockassembly/subtreeprocessor/ - pass (155s). TestResetMarksAssemblyTxsAsNotOnLongestChainBeforeClearing still passes - Reset behavior is preserved for the documented invariants.
go test -race -count=1 ./services/blockassembly/ - pass (101s).

Series context

Fourth in a series from the same investigation:

test(blockassembly/subtreeprocessor): add clock seam for deterministic tests #841 (merged) - clock seam: package-level clock interface enabling deterministic time-dependent tests.
fix(blockassembly/subtreeprocessor): zero-guard validFromMillis in dequeueDuringBlockMovement #846 (merged) - zero-guard validFromMillis in dequeueDuringBlockMovement.
test(blockassembly/subtreeprocessor): rapid property tests for validFromMillis admit logic #848 (open) - rapid property tests for the validFromMillis admit logic.
this PR - Reset drain boundary fix.

Independent of #848 - depends only on #841's clock seam (already on main).

…n Reset drain The Reset drain loop at SubtreeProcessor.go:1355-1361 used a dequeue-then-time-check pattern that silently drops one boundary batch per Reset call: validUntilMillis := time.Now().UnixMilli() for { batch, found := stp.queue.dequeueBatch(0) if !found || batch.time > validUntilMillis { // we are done break } } dequeueBatch(0) bypasses the queue filter and removes the head batch unconditionally. Only THEN does the loop check batch.time. If the batch is too new, the loop breaks - but the batch has already been removed from the queue and nothing puts it back, so it is dropped on the floor. Under heavy concurrent enqueue (Reset happens during reorgs / FSM transitions / recovery), this drops one batch worth of in-flight txs each call. The lost txs never get mined. Fix introduces dequeueBatchUntil on LockFreeQueue with inclusive-until semantics (admit iff batch.time <= maxTimeMillis). Unlike dequeueBatch it peeks at batch.time BEFORE removing the batch, so the boundary batch is never lost. The Reset drain loop becomes: validUntilMillis := stp.clock.Now().UnixMilli() for { if _, found := stp.queue.dequeueBatchUntil(validUntilMillis); !found { break } } Also switches the time anchor from time.Now() to stp.clock.Now() so the Reset path runs through the existing clock seam, consistent with the other validFromMillis call sites. Test changes: * New Test_dequeueBatchUntilPreservesPostBoundaryBatch: regression guard. Enqueues a pre-snapshot and a post-snapshot batch via the clock seam, drains up to the snapshot, asserts the post-snapshot batch survives. Also covers the inclusive boundary (batch.time == maxTimeMillis admits) and empty-queue case.

github-actions · 2026-05-12T14:09:35Z

🤖 Claude Code Review

Status: Complete

Summary: No issues found. The PR correctly fixes a boundary-batch loss bug in the Reset drain loop through a well-designed peek-before-dequeue method.

Analysis:

Bug fix correctness: The old code dequeued first then checked time, losing the boundary batch. The new dequeueBatchUntil method peeks before removing, solving the issue.
Implementation quality: Clean, follows existing patterns, maintains thread-safety constraints, properly updates queue state.
Documentation accuracy: Godoc, inline comments, and test descriptions accurately match the implementation. The inclusive-until semantics (batch.time <= maxTimeMillis) are correctly documented and tested.
Test coverage: Comprehensive regression guard covers the exact bug scenario, boundary conditions, and empty-queue case.
Consistency: Uses the clock seam consistently with PR fix(blockassembly/subtreeprocessor): zero-guard validFromMillis in dequeueDuringBlockMovement #846, enabling deterministic testing without wall-time dependencies.

The change is minimal (31 lines added across queue.go and tests), well-scoped to the bug, and builds correctly on the existing clock abstraction from #841.

sonarqubecloud · 2026-05-12T14:20:08Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-05-12T14:23:31Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-851 (c7ae8db)

Summary

Regressions: 0
Improvements: 0
Unchanged: 142
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.699µ	1.674µ	~	0.200
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	61.45n	61.55n	~	0.600
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	61.57n	61.52n	~	0.300
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	61.63n	61.69n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	31.07n	31.55n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	53.70n	54.21n	~	0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	108.4n	109.8n	~	0.100
MiningCandidate_Stringify_Short-4	265.3n	264.6n	~	1.000
MiningCandidate_Stringify_Long-4	1.888µ	1.883µ	~	0.400
MiningSolution_Stringify-4	996.7n	981.3n	~	0.100
BlockInfo_MarshalJSON-4	1.781µ	1.742µ	~	0.100
NewFromBytes-4	127.1n	145.8n	~	0.200
Mine_EasyDifficulty-4	67.14µ	67.01µ	~	0.400
Mine_WithAddress-4	6.999µ	6.943µ	~	0.200
BlockAssembler_AddTx-4	0.03020n	0.02836n	~	1.000
AddNode-4	10.65	11.23	~	0.700
AddNodeWithMap-4	11.03	11.33	~	0.400
DirectSubtreeAdd/4_per_subtree-4	62.06n	63.63n	~	0.700
DirectSubtreeAdd/64_per_subtree-4	29.21n	32.46n	~	0.100
DirectSubtreeAdd/256_per_subtree-4	28.42n	30.87n	~	0.100
DirectSubtreeAdd/1024_per_subtree-4	26.97n	29.79n	~	0.100
DirectSubtreeAdd/2048_per_subtree-4	26.73n	29.43n	~	0.100
SubtreeProcessorAdd/4_per_subtree-4	296.1n	301.3n	~	0.400
SubtreeProcessorAdd/64_per_subtree-4	282.6n	297.9n	~	0.200
SubtreeProcessorAdd/256_per_subtree-4	291.1n	309.6n	~	0.200
SubtreeProcessorAdd/1024_per_subtree-4	283.4n	317.1n	~	0.200
SubtreeProcessorAdd/2048_per_subtree-4	290.4n	316.8n	~	0.400
SubtreeProcessorRotate/4_per_subtree-4	288.9n	291.3n	~	0.700
SubtreeProcessorRotate/64_per_subtree-4	292.5n	292.3n	~	1.000
SubtreeProcessorRotate/256_per_subtree-4	284.1n	305.6n	~	0.100
SubtreeProcessorRotate/1024_per_subtree-4	283.4n	296.4n	~	0.700
SubtreeNodeAddOnly/4_per_subtree-4	55.79n	55.82n	~	0.700
SubtreeNodeAddOnly/64_per_subtree-4	35.04n	35.32n	~	0.300
SubtreeNodeAddOnly/256_per_subtree-4	33.99n	34.82n	~	0.100
SubtreeNodeAddOnly/1024_per_subtree-4	33.75n	33.85n	~	0.700
SubtreeCreationOnly/4_per_subtree-4	117.5n	123.8n	~	0.200
SubtreeCreationOnly/64_per_subtree-4	448.4n	445.9n	~	1.000
SubtreeCreationOnly/256_per_subtree-4	1.549µ	1.569µ	~	1.000
SubtreeCreationOnly/1024_per_subtree-4	5.017µ	5.444µ	~	0.100
SubtreeCreationOnly/2048_per_subtree-4	9.060µ	9.381µ	~	1.000
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	282.1n	323.1n	~	0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	283.6n	321.2n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	627.0µ	622.1µ	~	0.400
ParallelGetAndSetIfNotExists/10k_nodes-4	1.430m	1.392m	~	0.400
ParallelGetAndSetIfNotExists/50k_nodes-4	7.172m	6.927m	~	0.100
ParallelGetAndSetIfNotExists/100k_nodes-4	14.08m	14.21m	~	0.700
SequentialGetAndSetIfNotExists/1k_nodes-4	688.9µ	702.9µ	~	0.400
SequentialGetAndSetIfNotExists/10k_nodes-4	3.091m	4.206m	~	0.100
SequentialGetAndSetIfNotExists/50k_nodes-4	11.32m	13.72m	~	0.200
SequentialGetAndSetIfNotExists/100k_nodes-4	23.20m	23.93m	~	0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	661.4µ	655.7µ	~	0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	4.540m	4.610m	~	0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	17.83m	17.48m	~	0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	721.7µ	703.0µ	~	0.200
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	6.184m	7.935m	~	0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	46.33m	45.76m	~	1.000
DiskTxMap_SetIfNotExists-4	3.845µ	4.050µ	~	1.000
DiskTxMap_SetIfNotExists_Parallel-4	3.707µ	3.966µ	~	0.400
DiskTxMap_ExistenceOnly-4	334.1n	370.7n	~	0.700
Queue-4	158.8n	155.7n	~	0.400
AtomicPointer-4	2.515n	2.841n	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	692.6µ	712.4µ	~	0.700
ReorgOptimizations/DedupFilterPipeline/New/10K-4	672.1µ	659.7µ	~	0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4	91.29µ	97.30µ	~	0.100
ReorgOptimizations/AllMarkFalse/New/10K-4	49.92µ	49.81µ	~	0.700
ReorgOptimizations/HashSlicePool/Old/10K-4	55.01µ	53.02µ	~	1.000
ReorgOptimizations/HashSlicePool/New/10K-4	8.703µ	8.594µ	~	0.100
ReorgOptimizations/NodeFlags/Old/10K-4	4.290µ	3.968µ	~	0.100
ReorgOptimizations/NodeFlags/New/10K-4	1.753µ	1.353µ	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	8.310m	10.083m	~	0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4	8.838m	8.950m	~	0.400
ReorgOptimizations/AllMarkFalse/Old/100K-4	948.5µ	941.6µ	~	1.000
ReorgOptimizations/AllMarkFalse/New/100K-4	551.1µ	547.4µ	~	0.200
ReorgOptimizations/HashSlicePool/Old/100K-4	452.7µ	490.3µ	~	0.100
ReorgOptimizations/HashSlicePool/New/100K-4	200.6µ	195.1µ	~	0.400
ReorgOptimizations/NodeFlags/Old/100K-4	41.96µ	41.35µ	~	0.200
ReorgOptimizations/NodeFlags/New/100K-4	14.31µ	13.93µ	~	0.400
TxMapSetIfNotExists-4	35.81n	35.56n	~	0.400
TxMapSetIfNotExistsDuplicate-4	29.74n	29.90n	~	0.100
ChannelSendReceive-4	452.5n	425.0n	~	0.100
CalcBlockWork-4	471.7n	481.2n	~	0.200
CalculateWork-4	643.7n	630.9n	~	0.700
BuildBlockLocatorString_Helpers/Size_10-4	1.328µ	1.305µ	~	0.400
BuildBlockLocatorString_Helpers/Size_100-4	16.17µ	14.54µ	~	0.400
BuildBlockLocatorString_Helpers/Size_1000-4	125.6µ	125.3µ	~	1.000
CatchupWithHeaderCache-4	104.4m	104.7m	~	0.100
SubtreeSizes/10k_tx_4_per_subtree-4	1.376m	1.365m	~	1.000
SubtreeSizes/10k_tx_16_per_subtree-4	328.0µ	328.2µ	~	1.000
SubtreeSizes/10k_tx_64_per_subtree-4	79.18µ	77.80µ	~	0.100
SubtreeSizes/10k_tx_256_per_subtree-4	19.48µ	19.50µ	~	1.000
SubtreeSizes/10k_tx_512_per_subtree-4	9.783µ	9.580µ	~	0.400
SubtreeSizes/10k_tx_1024_per_subtree-4	4.791µ	4.795µ	~	0.400
SubtreeSizes/10k_tx_2k_per_subtree-4	2.406µ	2.386µ	~	0.100
BlockSizeScaling/10k_tx_64_per_subtree-4	77.03µ	76.05µ	~	0.700
BlockSizeScaling/10k_tx_256_per_subtree-4	19.26µ	19.16µ	~	0.700
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.796µ	4.813µ	~	0.700
BlockSizeScaling/50k_tx_64_per_subtree-4	403.6µ	406.3µ	~	0.100
BlockSizeScaling/50k_tx_256_per_subtree-4	97.99µ	96.08µ	~	0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4	23.90µ	23.62µ	~	1.000
SubtreeAllocations/small_subtrees_exists_check-4	157.2µ	159.7µ	~	0.100
SubtreeAllocations/small_subtrees_data_fetch-4	173.3µ	171.2µ	~	0.400
SubtreeAllocations/small_subtrees_full_validation-4	331.0µ	332.3µ	~	0.400
SubtreeAllocations/medium_subtrees_exists_check-4	9.515µ	9.438µ	~	0.400
SubtreeAllocations/medium_subtrees_data_fetch-4	10.20µ	10.25µ	~	1.000
SubtreeAllocations/medium_subtrees_full_validation-4	19.65µ	19.44µ	~	0.400
SubtreeAllocations/large_subtrees_exists_check-4	2.256µ	2.279µ	~	0.100
SubtreeAllocations/large_subtrees_data_fetch-4	2.483µ	2.489µ	~	1.000
SubtreeAllocations/large_subtrees_full_validation-4	4.864µ	4.851µ	~	1.000
_BufferPoolAllocation/16KB-4	3.432µ	3.400µ	~	1.000
_BufferPoolAllocation/32KB-4	6.424µ	7.880µ	~	0.200
_BufferPoolAllocation/64KB-4	15.76µ	16.55µ	~	0.400
_BufferPoolAllocation/128KB-4	28.74µ	32.01µ	~	0.100
_BufferPoolAllocation/512KB-4	111.6µ	111.6µ	~	1.000
_BufferPoolConcurrent/32KB-4	18.41µ	17.70µ	~	0.700
_BufferPoolConcurrent/64KB-4	28.92µ	27.99µ	~	0.700
_BufferPoolConcurrent/512KB-4	139.3µ	143.2µ	~	0.700
_SubtreeDeserializationWithBufferSizes/16KB-4	620.7µ	607.6µ	~	1.000
_SubtreeDeserializationWithBufferSizes/32KB-4	612.6µ	624.3µ	~	0.100
_SubtreeDeserializationWithBufferSizes/64KB-4	608.5µ	630.6µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	609.0µ	622.3µ	~	0.100
_SubtreeDeserializationWithBufferSizes/512KB-4	628.6µ	621.6µ	~	0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4	36.48m	35.89m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4	36.57m	35.87m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4	35.98m	35.32m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4	35.79m	35.46m	~	0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4	35.39m	35.01m	~	0.200
_PooledVsNonPooled/Pooled-4	840.9n	828.9n	~	0.200
_PooledVsNonPooled/NonPooled-4	6.740µ	6.786µ	~	1.000
_MemoryFootprint/Current_512KB_32concurrent-4	7.577µ	7.668µ	~	0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4	9.974µ	9.717µ	~	0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4	9.724µ	9.362µ	~	0.100
_prepareTxsPerLevel-4	397.7m	389.6m	~	1.000
_prepareTxsPerLevelOrdered-4	4.432m	4.071m	~	0.100
_prepareTxsPerLevel_Comparison/Original-4	408.4m	390.2m	~	0.200
_prepareTxsPerLevel_Comparison/Optimized-4	4.545m	4.294m	~	0.100
StoreBlock_Sequential/BelowCSVHeight-4	318.7µ	330.4µ	~	0.200
StoreBlock_Sequential/AboveCSVHeight-4	326.5µ	326.3µ	~	0.700
GetUtxoHashes-4	258.5n	253.8n	~	0.400
GetUtxoHashes_ManyOutputs-4	44.03µ	43.29µ	~	0.200
_NewMetaDataFromBytes-4	240.5n	239.3n	~	0.100
_Bytes-4	629.0n	630.8n	~	1.000
_MetaBytes-4	569.9n	561.7n	~	0.300

Threshold: >10% with p < 0.05 | Generated: 2026-05-12 14:23 UTC

ordishs

Approve. Bug is real — verified that dequeueBatch(0) advances q.head before the time check, so the boundary batch is gone by the time the caller inspects batch.time. The peek-before-dequeue fix is correct, the memory ordering is sound (the new method reads next.time after the same atomic load the existing dequeue relies on), and the new test pins the inclusive-until contract.

A few minor suggestions, none blocking:

Add a Reset-level regression test. The new test guards dequeueBatchUntil's contract at the queue level, but doesn't exercise the actual reset() drain loop. A test that installs a fixedClock on stp/stp.queue, enqueues pre- and post-snapshot batches, drives the Reset path, and asserts the post-snapshot batch survives in stp.queue would guard against someone "simplifying" the drain back to dequeueBatch(0). The queue-level guard is the more fundamental contract, so this is belt-and-braces.
Millisecond-boundary precision in the PR description. With inclusive-until semantics, a batch enqueued in the same millisecond as the snapshot (but after validUntilMillis is captured) will still be drained, since batch.time == validUntilMillis admits. So "batches arriving concurrently after this anchor are intentionally left in the queue" is accurate at the next millisecond, not at the snapshot millisecond itself. 1ms window vs. the original full-drain window — orders of magnitude smaller, not worth changing the code, but worth being precise.
API asymmetry around 0. dequeueBatch(0) means "no filter, drain all"; dequeueBatchUntil(0) means "reject everything with batch.time > 0" (i.e. drain nothing in practice). The docs handle it, but a brief cross-reference between the two methods would help the next caller.
Verification. Recommend running golangci-lint run and staticcheck ./... on services/blockassembly/subtreeprocessor before merge per AGENTS.md — not called out in the test plan.

liam requested review from icellan and ordishs May 12, 2026 14:16

ordishs approved these changes May 12, 2026

View reviewed changes

ordishs requested a review from oskarszoon May 12, 2026 14:48

icellan approved these changes May 12, 2026

View reviewed changes

liam merged commit 5ed3da0 into bsv-blockchain:main May 12, 2026
25 checks passed

liam mentioned this pull request May 12, 2026

Incomplete rollback in moveForwardBlock can lose in-flight tx batches #852

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(blockassembly/subtreeprocessor): stop losing the boundary batch in Reset drain#851

fix(blockassembly/subtreeprocessor): stop losing the boundary batch in Reset drain#851
liam merged 1 commit into
bsv-blockchain:mainfrom
liam:liam/blockassembly-reset-drain-fix

liam commented May 12, 2026

Uh oh!

github-actions Bot commented May 12, 2026 •

edited

Loading

Uh oh!

sonarqubecloud Bot commented May 12, 2026

Uh oh!

github-actions Bot commented May 12, 2026

Uh oh!

ordishs left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

liam commented May 12, 2026

Summary

The bug

Fix

Test changes

Test plan

Series context

Uh oh!

github-actions Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sonarqubecloud Bot commented May 12, 2026

Quality Gate passed

Uh oh!

github-actions Bot commented May 12, 2026

Benchmark Comparison Report

Summary

Uh oh!

ordishs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented May 12, 2026 •

edited

Loading