Skip to content

blockassembly: prevent Start hanging on pre-ready gRPC failures#940

Merged
gokutheengineer merged 3 commits into
bsv-blockchain:mainfrom
gokutheengineer:gokhan/blockassembly-starthang
Jun 8, 2026
Merged

blockassembly: prevent Start hanging on pre-ready gRPC failures#940
gokutheengineer merged 3 commits into
bsv-blockchain:mainfrom
gokutheengineer:gokhan/blockassembly-starthang

Conversation

@gokutheengineer

Copy link
Copy Markdown
Collaborator

Summary

BlockAssembly.Start could block forever waiting on grpcReady if StartGRPCServer returned before invoking the registration callback (for example, invalid bind address or early startup failure).

This change makes startup failure-aware and cancellation-aware:

  • Added a startup error channel (grpcStartErr) in Start and wait on:
    • gRPC readiness
    • early gRPC startup error
    • ctx.Done()
  • Updated readiness signaling to be non-blocking-safe by closing grpcReady with sync.Once from the registration callback.
  • Preserved existing behavior of always closing readyCh via deferred close, including failure paths.
  • Added regression test:
    • TestStartReturnsErrorWhenGRPCFailsBeforeReady
    • forces gRPC listen failure and asserts Start returns an error instead of hanging
    • verifies readyCh is closed when Start exits on error

Why

A startup error should fail fast and surface a clear error. Previously, the service could hang indefinitely, which can wedge tests, startup orchestration, or supervisors waiting on readiness.

Test Plan

  • go test ./services/blockassembly -run 'TestStartReturnsErrorWhenGRPCFailsBeforeReady|TestGetBlockAssemblyTxsReturnsWhenContextCancelled'
  • Confirm no linter errors on touched files.

@github-actions

github-actions Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-940 (545fe86)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.830µ 1.635µ ~ 0.400
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.13n 72.10n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.30n 71.19n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.08n 71.27n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 33.10n 32.91n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 56.38n 55.45n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 127.4n 127.6n ~ 1.000
MiningCandidate_Stringify_Short-4 217.5n 221.0n ~ 0.100
MiningCandidate_Stringify_Long-4 1.646µ 1.646µ ~ 1.000
MiningSolution_Stringify-4 855.0n 846.9n ~ 0.700
BlockInfo_MarshalJSON-4 1.732µ 1.721µ ~ 0.100
NewFromBytes-4 131.6n 129.9n ~ 0.100
AddTxBatchColumnar_Validation-4 2.496µ 2.489µ ~ 0.700
OffsetValidationLoop-4 638.7n 644.4n ~ 0.100
Mine_EasyDifficulty-4 52.09µ 51.86µ ~ 0.700
Mine_WithAddress-4 5.424µ 5.415µ ~ 1.000
DirectSubtreeAdd/4_per_subtree-4 58.19n 58.09n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 30.01n 30.13n ~ 0.800
DirectSubtreeAdd/256_per_subtree-4 29.07n 28.85n ~ 0.200
DirectSubtreeAdd/1024_per_subtree-4 27.99n 27.96n ~ 0.700
DirectSubtreeAdd/2048_per_subtree-4 27.67n 27.61n ~ 0.400
SubtreeProcessorAdd/4_per_subtree-4 285.8n 293.2n ~ 0.100
SubtreeProcessorAdd/64_per_subtree-4 282.2n 282.2n ~ 0.800
SubtreeProcessorAdd/256_per_subtree-4 283.9n 283.7n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 273.4n 275.4n ~ 0.200
SubtreeProcessorAdd/2048_per_subtree-4 273.8n 280.1n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 279.9n 281.5n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 276.8n 276.9n ~ 1.000
SubtreeProcessorRotate/256_per_subtree-4 275.4n 301.2n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 278.5n 280.9n ~ 0.700
SubtreeNodeAddOnly/4_per_subtree-4 54.53n 54.25n ~ 1.000
SubtreeNodeAddOnly/64_per_subtree-4 34.31n 34.31n ~ 1.000
SubtreeNodeAddOnly/256_per_subtree-4 33.31n 33.35n ~ 0.700
SubtreeNodeAddOnly/1024_per_subtree-4 32.58n 32.81n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 114.3n 114.1n ~ 1.000
SubtreeCreationOnly/64_per_subtree-4 399.9n 405.1n ~ 0.400
SubtreeCreationOnly/256_per_subtree-4 1.346µ 1.364µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.400µ 4.403µ ~ 0.700
SubtreeCreationOnly/2048_per_subtree-4 8.009µ 8.183µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 277.4n 279.5n ~ 0.700
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 275.7n 282.7n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 2.226m 2.245m ~ 0.200
ParallelGetAndSetIfNotExists/10k_nodes-4 5.394m 5.404m ~ 0.700
ParallelGetAndSetIfNotExists/50k_nodes-4 7.367m 7.487m ~ 0.400
ParallelGetAndSetIfNotExists/100k_nodes-4 10.38m 10.61m ~ 0.200
SequentialGetAndSetIfNotExists/1k_nodes-4 1.956m 1.978m ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 4.521m 4.588m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 12.45m 12.67m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 22.61m 22.61m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.275m 2.267m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.263m 8.358m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.22m 13.65m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.985m 2.010m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.641m 7.739m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 40.67m 40.34m ~ 0.400
BlockAssembler_AddTx-4 0.02842n 0.02572n ~ 0.400
AddNode-4 10.76 10.97 ~ 0.700
AddNodeWithMap-4 11.25 11.06 ~ 1.000
DiskTxMap_SetIfNotExists-4 3.729µ 3.824µ ~ 0.100
DiskTxMap_SetIfNotExists_Parallel-4 3.528µ 3.704µ ~ 0.200
DiskTxMap_ExistenceOnly-4 323.5n 356.1n ~ 0.100
Queue-4 185.0n 187.4n ~ 0.200
AtomicPointer-4 3.640n 3.642n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 886.4µ 807.6µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 830.4µ 755.1µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 125.2µ 110.7µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 64.22µ 64.24µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 52.89µ 51.41µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.30µ 11.05µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/10K-4 4.643µ 4.384µ ~ 0.700
ReorgOptimizations/NodeFlags/New/10K-4 1.555µ 1.489µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.761m 9.379m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.922m 10.682m ~ 0.400
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.115m 1.110m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 705.9µ 708.5µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/100K-4 519.6µ 612.5µ ~ 0.200
ReorgOptimizations/HashSlicePool/New/100K-4 220.8µ 216.7µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/100K-4 47.40µ 48.29µ ~ 0.700
ReorgOptimizations/NodeFlags/New/100K-4 15.98µ 16.23µ ~ 1.000
TxMapSetIfNotExists-4 49.58n 50.23n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 41.48n 41.24n ~ 0.700
ChannelSendReceive-4 587.9n 560.4n ~ 0.100
CalcBlockWork-4 532.3n 534.0n ~ 1.000
CalculateWork-4 713.9n 716.0n ~ 0.400
BuildBlockLocatorString_Helpers/Size_10-4 1.356µ 1.440µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_100-4 13.46µ 12.85µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 126.4µ 127.1µ ~ 0.400
CatchupWithHeaderCache-4 104.4m 104.6m ~ 0.200
_BufferPoolAllocation/16KB-4 4.087µ 4.708µ ~ 0.100
_BufferPoolAllocation/32KB-4 10.378µ 9.090µ ~ 0.700
_BufferPoolAllocation/64KB-4 17.20µ 20.52µ ~ 0.200
_BufferPoolAllocation/128KB-4 36.63µ 35.83µ ~ 0.100
_BufferPoolAllocation/512KB-4 117.7µ 143.3µ ~ 0.100
_BufferPoolConcurrent/32KB-4 21.43µ 24.18µ ~ 0.100
_BufferPoolConcurrent/64KB-4 32.07µ 32.31µ ~ 0.700
_BufferPoolConcurrent/512KB-4 151.0µ 150.5µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 643.2µ 628.6µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/32KB-4 635.3µ 636.5µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/64KB-4 639.6µ 642.3µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/128KB-4 635.0µ 640.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 641.6µ 644.5µ ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.15m 36.86m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.99m 36.91m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.96m 36.85m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.92m 36.75m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.62m 36.63m ~ 1.000
_PooledVsNonPooled/Pooled-4 739.9n 738.8n ~ 1.000
_PooledVsNonPooled/NonPooled-4 8.718µ 8.266µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 6.889µ 7.062µ ~ 0.300
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.19µ 10.10µ ~ 0.400
_MemoryFootprint/Alternative_64KB_32concurrent-4 10.083µ 9.649µ ~ 0.200
SubtreeSizes/10k_tx_4_per_subtree-4 1.373m 1.363m ~ 0.200
SubtreeSizes/10k_tx_16_per_subtree-4 329.4µ 319.7µ ~ 0.200
SubtreeSizes/10k_tx_64_per_subtree-4 77.96µ 77.09µ ~ 0.400
SubtreeSizes/10k_tx_256_per_subtree-4 19.19µ 19.35µ ~ 0.200
SubtreeSizes/10k_tx_512_per_subtree-4 9.629µ 9.628µ ~ 1.000
SubtreeSizes/10k_tx_1024_per_subtree-4 4.783µ 4.793µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.362µ 2.412µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 76.36µ 77.73µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 19.06µ 19.24µ ~ 0.200
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.848µ 4.844µ ~ 1.000
BlockSizeScaling/50k_tx_64_per_subtree-4 396.0µ 405.3µ ~ 0.200
BlockSizeScaling/50k_tx_256_per_subtree-4 96.20µ 95.87µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.50µ 24.03µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 156.8µ 157.2µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 171.3µ 168.7µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 330.6µ 332.4µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 9.308µ 9.355µ ~ 0.200
SubtreeAllocations/medium_subtrees_data_fetch-4 10.14µ 10.26µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 19.36µ 19.40µ ~ 1.000
SubtreeAllocations/large_subtrees_exists_check-4 2.223µ 2.226µ ~ 0.500
SubtreeAllocations/large_subtrees_data_fetch-4 2.471µ 2.502µ ~ 0.200
SubtreeAllocations/large_subtrees_full_validation-4 4.821µ 4.850µ ~ 1.000
_prepareTxsPerLevel-4 388.6m 397.0m ~ 0.700
_prepareTxsPerLevelOrdered-4 4.318m 4.484m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 389.7m 396.5m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 4.569m 4.475m ~ 1.000
StoreBlock_Sequential/BelowCSVHeight-4 335.4µ 334.6µ ~ 1.000
StoreBlock_Sequential/AboveCSVHeight-4 339.8µ 337.9µ ~ 0.700
GetUtxoHashes-4 285.4n 279.6n ~ 0.700
GetUtxoHashes_ManyOutputs-4 46.48µ 46.28µ ~ 0.200
_NewMetaDataFromBytes-4 213.0n 215.5n ~ 0.400
_Bytes-4 398.7n 405.5n ~ 0.400
_MetaBytes-4 139.1n 140.0n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-06-04 12:23 UTC

@gokutheengineer gokutheengineer marked this pull request as ready for review June 2, 2026 13:04
@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

No issues found. The implementation correctly addresses the startup hang scenario with proper error handling and comprehensive test coverage.

Summary:

  • Fixes potential deadlock when gRPC server fails before readiness callback
  • Uses buffered error channel with non-blocking send (good defensive programming)
  • Adds context cancellation awareness during startup phase
  • Test coverage validates both error path and readyCh closure guarantee
  • Implementation follows existing errgroup patterns in the codebase

@sonarqubecloud

sonarqubecloud Bot commented Jun 4, 2026

Copy link
Copy Markdown

@freemans13 freemans13 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Genuine bug fix: Start could hang forever on <-grpcReady when StartGRPCServer returned via any of its pre-callback error paths (bad listen address, missing cert/key, server-creation failure). The three-way select on readiness / early-startup-error / ctx.Done() resolves this comprehensively.

Strengths:

  • Switching to close(grpcReady) via sync.Once is broadcast-safe and won't block the callback even after the select returns.
  • grpcStartErr is buffered (cap 1) with a non-blocking send — no goroutine leak.
  • Race-clean; new regression test passes under -race and fails loudly (2s timeout) rather than hanging on regression. It also verifies readyCh is still closed on the error path.

Minor (non-blocking):

  • Error/cancel return paths no longer join the errgroup via g.Wait() as the original single exit did. No real leak (errgroup cancels gCtx on error return; ctx.Done propagates to the child gCtx), but a one-line comment noting this is intentional would help future readers.
  • Test mixes assert.Contains with require (testing.md prefers require), though consistent with the file's existing style.

LGTM.

@gokutheengineer gokutheengineer merged commit 4e4b723 into bsv-blockchain:main Jun 8, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants