Skip to content

fix(blockassembly): guard subtree worker ErrChan send against shutdown deadlock#1028

Merged
liam merged 1 commit into
bsv-blockchain:mainfrom
liam:liam/blockassembly-errchan-shutdown
Jun 3, 2026
Merged

fix(blockassembly): guard subtree worker ErrChan send against shutdown deadlock#1028
liam merged 1 commit into
bsv-blockchain:mainfrom
liam:liam/blockassembly-errchan-shutdown

Conversation

@liam

@liam liam commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Bug (shutdown deadlock)

The subtree storage worker (subtreeStorageWorker) sends its result back to the caller on work.request.ErrChan with a bare, unbuffered send:

if work.request.ErrChan != nil {
    work.request.ErrChan <- result.err   // no ctx escape
}

But the caller — the SubtreeProcessor — abandons the matching receive on its own context cancellation:

select {
case <-send.ErrChan:
    ...
case <-processorCtx.Done():
    return   // receiver gone
}

So on shutdown the worker's send has no receiver and blocks forever. That keeps the worker's for work := range workChan loop from returning, so runNewSubtreeListener's wg.Wait() (in the ctx.Done() shutdown path) never completes — block-assembly shutdown deadlocks. The resultChan send immediately above is already select-guarded on ctx.Done(); this ErrChan send was the one that wasn't.

Fix

Extract the send into sendCallerErr, which selects on ctx.Done() and so releases on shutdown. Dropping the result once ctx is cancelled is safe — the caller has already abandoned the receive and isn't waiting for it. This mirrors the existing ctx-guarded resultChan send.

Tests

send_caller_err_test.go:

  • AbandonsOnCancel — with an unbuffered ErrChan and no reader, sendCallerErr blocks until ctx is cancelled, then returns. Reverting the helper to a bare send makes this fail (reproduces the deadlock: "blocked after ctx cancel").
  • DeliversToReader — happy path still delivers.
  • NilChannelNoop — nil ErrChan is a no-op.

Verification

  • Negative check: bare-send variant fails AbandonsOnCancel (hangs → fatal); guarded version passes.
  • go test -race on the new tests: pass. go build, go vet, gofmt, gci: clean.

Second fix from the concurrency/lifecycle audit (after #1027, the kafka final-drain). The audit's remaining items: the netsync Queue* shutdown hang and the DiskTxMap.bytesWritten race.

…n deadlock

The subtree storage worker sends its result on the caller's ErrChan with a
bare, unbuffered send. The caller (SubtreeProcessor) abandons the matching
receive on its own context cancellation (select with processorCtx.Done), so
on shutdown the worker's send blocks forever — which hangs
runNewSubtreeListener's wg.Wait() and deadlocks block-assembly shutdown. The
resultChan send just above it is already ctx-guarded; the ErrChan send was
not.

Extract the send into sendCallerErr, which selects on ctx.Done so it releases
on shutdown. Dropping the result once ctx is cancelled is safe — the caller is
no longer waiting for it.

Add tests: sendCallerErr releases on ctx cancel with no reader (reverting it
to a bare send fails this with the deadlock), delivers to a present reader,
and is a no-op for a nil channel.
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Current Review:

No issues found. This is a well-designed fix for a subtle shutdown deadlock.

Summary:

The PR correctly addresses a shutdown deadlock where the subtree storage worker blocked indefinitely on an unbuffered ErrChan send when the SubtreeProcessor receiver abandoned its receive due to context cancellation. The fix extracts the send into sendCallerErr, which uses a select statement to abandon the send when ctx is cancelled, mirroring the existing pattern for resultChan.

Strengths:

  • Correct analysis: The PR description accurately identifies the deadlock mechanism — worker sends on unbuffered channel while receiver has already exited on context cancellation
  • Minimal change: Extracts a 5-line helper function rather than rewriting surrounding code
  • Comprehensive tests: Three test cases cover the critical deadlock scenario (AbandonsOnCancel), happy path (DeliversToReader), and nil channel handling (NilChannelNoop)
  • Pattern consistency: The fix mirrors the existing ctx-guarded resultChan send at Server.go:418-422
  • Clear documentation: Both the function comment and test comment explain the deadlock mechanism and why the fix is necessary

@sonarqubecloud

sonarqubecloud Bot commented Jun 3, 2026

Copy link
Copy Markdown

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1028 (c4fb812)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.870µ 1.762µ ~ 0.700
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.72n 61.83n ~ 0.800
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.77n 61.74n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.73n 61.83n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 31.37n 31.22n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 53.45n 54.13n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 116.2n 138.3n ~ 0.100
MiningCandidate_Stringify_Short-4 261.6n 263.3n ~ 0.700
MiningCandidate_Stringify_Long-4 1.903µ 1.933µ ~ 0.100
MiningSolution_Stringify-4 990.2n 982.9n ~ 0.400
BlockInfo_MarshalJSON-4 1.822µ 1.816µ ~ 0.400
NewFromBytes-4 130.7n 130.6n ~ 1.000
AddTxBatchColumnar_Validation-4 2.564µ 2.492µ ~ 0.700
OffsetValidationLoop-4 639.6n 638.4n ~ 0.400
Mine_EasyDifficulty-4 67.33µ 67.45µ ~ 0.700
Mine_WithAddress-4 7.412µ 7.025µ ~ 0.100
BlockAssembler_AddTx-4 0.03057n 0.02615n ~ 0.700
AddNode-4 10.96 11.11 ~ 1.000
AddNodeWithMap-4 13.41 11.81 ~ 0.200
DirectSubtreeAdd/4_per_subtree-4 57.80n 58.80n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 31.61n 31.72n ~ 0.700
DirectSubtreeAdd/256_per_subtree-4 30.47n 30.74n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 29.37n 29.42n ~ 0.200
DirectSubtreeAdd/2048_per_subtree-4 28.93n 29.01n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 284.1n 283.7n ~ 0.800
SubtreeProcessorAdd/64_per_subtree-4 281.9n 279.3n ~ 0.200
SubtreeProcessorAdd/256_per_subtree-4 278.9n 278.2n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 269.9n 274.6n ~ 0.100
SubtreeProcessorAdd/2048_per_subtree-4 270.7n 271.4n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 275.7n 276.3n ~ 0.400
SubtreeProcessorRotate/64_per_subtree-4 271.6n 273.5n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 272.7n 273.2n ~ 0.400
SubtreeProcessorRotate/1024_per_subtree-4 273.6n 274.1n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.30n 54.18n ~ 0.700
SubtreeNodeAddOnly/64_per_subtree-4 34.19n 34.09n ~ 0.400
SubtreeNodeAddOnly/256_per_subtree-4 33.22n 33.33n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.55n 32.79n ~ 0.200
SubtreeCreationOnly/4_per_subtree-4 113.7n 114.8n ~ 0.700
SubtreeCreationOnly/64_per_subtree-4 401.6n 399.0n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.352µ 1.353µ ~ 1.000
SubtreeCreationOnly/1024_per_subtree-4 4.488µ 4.438µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 8.165µ 8.055µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 273.9n 274.5n ~ 0.700
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 274.6n 274.5n ~ 0.700
ParallelGetAndSetIfNotExists/1k_nodes-4 2.185m 2.191m ~ 1.000
ParallelGetAndSetIfNotExists/10k_nodes-4 5.332m 5.309m ~ 0.700
ParallelGetAndSetIfNotExists/50k_nodes-4 7.131m 7.249m ~ 1.000
ParallelGetAndSetIfNotExists/100k_nodes-4 9.767m 10.059m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 1.960m 1.955m ~ 0.700
SequentialGetAndSetIfNotExists/10k_nodes-4 4.388m 4.377m ~ 1.000
SequentialGetAndSetIfNotExists/50k_nodes-4 12.81m 12.47m ~ 0.200
SequentialGetAndSetIfNotExists/100k_nodes-4 22.70m 23.23m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.250m 2.262m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.146m 8.193m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 12.99m 12.93m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.973m 1.975m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.473m 7.641m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 39.52m 40.39m ~ 0.100
DiskTxMap_SetIfNotExists-4 4.021µ 4.035µ ~ 1.000
DiskTxMap_SetIfNotExists_Parallel-4 3.492µ 3.561µ ~ 0.700
DiskTxMap_ExistenceOnly-4 416.9n 593.8n ~ 0.400
Queue-4 196.0n 193.5n ~ 0.600
AtomicPointer-4 4.423n 4.625n ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 942.3µ 894.5µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 826.7µ 826.7µ ~ 1.000
ReorgOptimizations/AllMarkFalse/Old/10K-4 119.0µ 110.4µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 62.47µ 62.37µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 59.22µ 60.76µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.21µ 11.19µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/10K-4 4.992µ 5.336µ ~ 0.200
ReorgOptimizations/NodeFlags/New/10K-4 1.708µ 1.667µ ~ 0.200
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 10.50m 10.00m ~ 0.200
ReorgOptimizations/DedupFilterPipeline/New/100K-4 11.26m 10.74m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.191m 1.148m ~ 0.400
ReorgOptimizations/AllMarkFalse/New/100K-4 682.9µ 683.1µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 704.0µ 664.7µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 316.3µ 291.1µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/100K-4 47.22µ 52.99µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 17.63µ 18.46µ ~ 0.100
TxMapSetIfNotExists-4 53.05n 52.59n ~ 0.200
TxMapSetIfNotExistsDuplicate-4 40.05n 40.44n ~ 0.200
ChannelSendReceive-4 630.4n 606.8n ~ 0.100
CalcBlockWork-4 470.7n 471.2n ~ 0.700
CalculateWork-4 648.9n 661.1n ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 1.048µ 1.043µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_100-4 10.08µ 10.08µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 106.2µ 107.3µ ~ 1.000
CatchupWithHeaderCache-4 103.9m 103.9m ~ 0.400
_BufferPoolAllocation/16KB-4 4.058µ 4.034µ ~ 1.000
_BufferPoolAllocation/32KB-4 8.330µ 9.211µ ~ 0.400
_BufferPoolAllocation/64KB-4 18.11µ 17.57µ ~ 0.100
_BufferPoolAllocation/128KB-4 36.42µ 35.98µ ~ 0.700
_BufferPoolAllocation/512KB-4 122.9µ 134.8µ ~ 0.700
_BufferPoolConcurrent/32KB-4 20.04µ 19.73µ ~ 0.400
_BufferPoolConcurrent/64KB-4 32.35µ 30.79µ ~ 0.100
_BufferPoolConcurrent/512KB-4 152.9µ 149.7µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 622.9µ 686.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 705.7µ 674.0µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/64KB-4 711.1µ 621.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 707.6µ 646.8µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/512KB-4 604.4µ 601.5µ ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/16KB-4 36.84m 36.83m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.42m 36.38m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.22m 36.76m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.71m 36.57m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.97m 36.26m ~ 0.700
_PooledVsNonPooled/Pooled-4 741.9n 738.6n ~ 0.700
_PooledVsNonPooled/NonPooled-4 9.020µ 8.560µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 6.559µ 6.553µ ~ 0.400
_MemoryFootprint/Proposed_32KB_32concurrent-4 9.706µ 9.668µ ~ 1.000
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.293µ 9.263µ ~ 1.000
_prepareTxsPerLevel-4 419.6m 412.3m ~ 0.200
_prepareTxsPerLevelOrdered-4 3.508m 3.635m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 415.3m 419.3m ~ 0.200
_prepareTxsPerLevel_Comparison/Optimized-4 3.606m 3.577m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.331m 1.327m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 319.3µ 316.8µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 75.55µ 75.73µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 18.74µ 19.06µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.398µ 9.361µ ~ 0.400
SubtreeSizes/10k_tx_1024_per_subtree-4 4.654µ 4.683µ ~ 0.400
SubtreeSizes/10k_tx_2k_per_subtree-4 2.321µ 2.330µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 74.44µ 75.02µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 18.73µ 18.79µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.638µ 4.677µ ~ 0.200
BlockSizeScaling/50k_tx_64_per_subtree-4 393.1µ 391.7µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 93.32µ 93.78µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.20µ 23.27µ ~ 0.400
SubtreeAllocations/small_subtrees_exists_check-4 155.1µ 154.2µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 167.5µ 164.7µ ~ 0.200
SubtreeAllocations/small_subtrees_full_validation-4 322.4µ 323.0µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 9.240µ 9.124µ ~ 0.700
SubtreeAllocations/medium_subtrees_data_fetch-4 9.842µ 9.837µ ~ 0.800
SubtreeAllocations/medium_subtrees_full_validation-4 18.78µ 18.90µ ~ 0.700
SubtreeAllocations/large_subtrees_exists_check-4 2.182µ 2.212µ ~ 1.000
SubtreeAllocations/large_subtrees_data_fetch-4 2.404µ 2.424µ ~ 0.400
SubtreeAllocations/large_subtrees_full_validation-4 4.707µ 4.742µ ~ 0.400
StoreBlock_Sequential/BelowCSVHeight-4 254.3µ 253.6µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 254.3µ 252.3µ ~ 0.400
GetUtxoHashes-4 268.0n 267.2n ~ 1.000
GetUtxoHashes_ManyOutputs-4 45.32µ 43.42µ ~ 0.100
_NewMetaDataFromBytes-4 231.3n 229.1n ~ 0.100
_Bytes-4 410.2n 404.9n ~ 0.400
_MetaBytes-4 139.4n 140.0n ~ 1.000

Threshold: >10% with p < 0.05 | Generated: 2026-06-03 11:01 UTC

@liam liam requested a review from ordishs June 3, 2026 12:31

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Verified the full deadlock chain: the worker's bare unbuffered send on work.request.ErrChan had no receiver during shutdown (the SubtreeProcessor abandons the matching receive on processorCtx.Done()), hanging runNewSubtreeListener's wg.Wait(). The sendCallerErr ctx-guard faithfully mirrors the existing resultChan pattern. Minimal, well-documented, and the AbandonsOnCancel test genuinely reproduces the bug. go test -race passes locally.

@liam liam merged commit 367079a into bsv-blockchain:main Jun 3, 2026
46 of 47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants