Skip to content

fix(legacy): stop flaky legacy-sync — nil-guard Server.Stop + ephemeral 942 port (#1032)#1034

Merged
oskarszoon merged 2 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/legacy-sync-ci
Jun 4, 2026
Merged

fix(legacy): stop flaky legacy-sync — nil-guard Server.Stop + ephemeral 942 port (#1032)#1034
oskarszoon merged 2 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/legacy-sync-ci

Conversation

@oskarszoon

@oskarszoon oskarszoon commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes the intermittent legacy-sync CI failures on TestLegacyTxBroadcast942_TeranodeRPCToSVMempool. Two independent root causes, two fixes:

  • Nil guard in legacy.Server.Stop (services/legacy/Server.go). When newServer fails (e.g. no valid listen address), Server.Init leaves the inner s.server nil. The daemon error path (ServiceManager.WaitServer.Stop) then derefs the nil *server at peer_server.go:2914 → unrecovered SIGSEGV in a daemon goroutine → crashes the whole test binary, reporting (unknown) and discarding all other results in the package. The guard returns nil when there's no inner server, downgrading a process-killing panic to a clean per-test failure. This is the load-bearing fix: any future startup error on this service hit the same crash path.

  • Ephemeral listen port in the 942 test (test/e2e/daemon/ready/legacy_tx_broadcast_942_test.go). The test pinned the legacy listener to fixed 0.0.0.0:48444; under CI load that bind intermittently failed (TIME_WAIT/teardown of the preceding containers), producing the startup error above. Switched to 0.0.0.0:0 (OS-assigned). Safe because 942 only dials out (ConnectPeers=[sv]) and SV never connects back, so a fixed inbound port is unnecessary — verified the tx-broadcast assertion rides the established outbound connection.

Confirmed via ServiceManager that Stop is the only method invoked on a service whose Init errored, so the guard fully closes the crash path. Production initListeners retry/backoff was deliberately left out (YAGNI — would mask genuinely-occupied ports).

Test plan

  • New white-box regression test TestStop_NilInnerServer — fails (recovered nil-deref panic) before the guard, passes after.
  • go test -race ./services/legacy/ green.
  • go vet ./services/legacy/ ./test/e2e/daemon/ready/ clean.
  • legacy-sync CI job green (the authoritative signal for the e2e port change; Docker-gated locally).

Fixes #1032

@oskarszoon oskarszoon requested review from ordishs and sugh01 June 4, 2026 12:06
@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

No issues found. This is a solid fix for the intermittent CI failures.

Summary:

The PR correctly addresses two independent root causes of flaky test failures:

  1. Nil guard in Server.Stop() (services/legacy/Server.go:706-708) - Prevents SIGSEGV when Stop() is called after Init() failed. The ServiceManager correctly calls Stop() on all registered services during cleanup (servicemanager/service_manager.go:266-281), even if Init() failed, making this guard necessary.

  2. Ephemeral port binding (test/e2e/daemon/ready/legacy_tx_broadcast_942_test.go:67) - Changing from fixed port 48444 to OS-assigned port ":0" eliminates bind failures under CI load. The change is safe because the test only initiates outbound connections (ConnectPeers=[sv]) and doesn't require inbound connections.

Strengths:

  • Comprehensive test coverage with TestStop_NilInnerServer
  • Accurate and detailed documentation throughout
  • Minimal, targeted changes following the Small Diff Rule
  • Clear explanation of the fix rationale in both code comments and PR description

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Verified the root-cause analysis against the code:

  • s.server is assigned only at Server.go:286, so a failed Init (GetOutboundIP or newServer error) leaves it nil.
  • ServiceManager appends the wrapper before calling Init (service_manager.go:152 vs 156), so Wait() calls Stop() on the failed service in reverse order. The Start goroutine is scheduled after the Init check, so it never runs — Stop is genuinely the only method invoked on the failed service.
  • daemon.go:298-302 logs + ForceShutdown but does not return; execution falls through to sm.Wait() at line 371 — exactly the described crash path.
  • :0 is safe: teranodeLegacyListen is only used for ListenAddresses (line 93), never as a dial target; 942 dials out via ConnectPeers=[sv].

Clean, minimal diff with a faithful regression test and an honest test plan (CI box correctly left unchecked). One non-blocking follow-up: the Wait() loop calls Stop() on every failed-Init service, so other services may carry the same latent nil-deref pattern — worth a future sweep. Recommend confirming the legacy-sync CI job is green on the branch before merge.

@sonarqubecloud

sonarqubecloud Bot commented Jun 4, 2026

Copy link
Copy Markdown

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-1034 (8f67aa8)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.843µ 1.590µ ~ 0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.26n 71.16n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.35n 71.37n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.14n 71.16n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 34.06n 33.52n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 57.10n 56.08n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 170.2n 126.9n ~ 0.100
MiningCandidate_Stringify_Short-4 219.7n 217.3n ~ 0.100
MiningCandidate_Stringify_Long-4 1.674µ 1.652µ ~ 1.000
MiningSolution_Stringify-4 857.1n 850.5n ~ 0.100
BlockInfo_MarshalJSON-4 1.729µ 1.732µ ~ 0.500
NewFromBytes-4 151.5n 157.2n ~ 1.000
AddTxBatchColumnar_Validation-4 2.511µ 2.493µ ~ 0.400
OffsetValidationLoop-4 638.9n 641.6n ~ 1.000
Mine_EasyDifficulty-4 65.32µ 65.30µ ~ 0.700
Mine_WithAddress-4 6.910µ 6.957µ ~ 0.500
BlockAssembler_AddTx-4 0.02797n 0.02848n ~ 0.700
AddNode-4 11.00 10.57 ~ 0.200
AddNodeWithMap-4 11.78 11.45 ~ 0.700
DirectSubtreeAdd/4_per_subtree-4 57.98n 57.99n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 29.85n 30.39n ~ 0.300
DirectSubtreeAdd/256_per_subtree-4 28.97n 29.14n ~ 0.700
DirectSubtreeAdd/1024_per_subtree-4 27.99n 27.79n ~ 0.100
DirectSubtreeAdd/2048_per_subtree-4 27.54n 27.25n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 283.9n 286.8n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 281.2n 280.8n ~ 0.600
SubtreeProcessorAdd/256_per_subtree-4 279.9n 280.2n ~ 0.400
SubtreeProcessorAdd/1024_per_subtree-4 277.0n 271.7n ~ 0.300
SubtreeProcessorAdd/2048_per_subtree-4 275.6n 273.8n ~ 0.700
SubtreeProcessorRotate/4_per_subtree-4 281.9n 279.2n ~ 0.700
SubtreeProcessorRotate/64_per_subtree-4 275.1n 282.4n ~ 0.200
SubtreeProcessorRotate/256_per_subtree-4 275.9n 278.5n ~ 0.200
SubtreeProcessorRotate/1024_per_subtree-4 275.0n 279.0n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.25n 54.49n ~ 0.700
SubtreeNodeAddOnly/64_per_subtree-4 34.35n 34.45n ~ 0.700
SubtreeNodeAddOnly/256_per_subtree-4 33.48n 33.39n ~ 0.700
SubtreeNodeAddOnly/1024_per_subtree-4 32.57n 32.54n ~ 0.400
SubtreeCreationOnly/4_per_subtree-4 115.5n 114.6n ~ 0.600
SubtreeCreationOnly/64_per_subtree-4 399.1n 399.2n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.366µ 1.363µ ~ 0.700
SubtreeCreationOnly/1024_per_subtree-4 4.457µ 4.443µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 8.167µ 8.194µ ~ 0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 274.1n 276.9n ~ 0.400
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 276.2n 274.3n ~ 0.100
ParallelGetAndSetIfNotExists/1k_nodes-4 2.210m 2.217m ~ 0.400
ParallelGetAndSetIfNotExists/10k_nodes-4 5.404m 5.433m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.494m 7.450m ~ 1.000
ParallelGetAndSetIfNotExists/100k_nodes-4 10.46m 10.35m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 1.963m 1.953m ~ 0.700
SequentialGetAndSetIfNotExists/10k_nodes-4 4.524m 4.411m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 13.04m 12.37m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 24.37m 22.32m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.273m 2.249m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.297m 8.261m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 13.36m 13.52m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 2.001m 1.985m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 8.121m 7.720m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 40.87m 40.81m ~ 1.000
DiskTxMap_SetIfNotExists-4 3.439µ 3.363µ ~ 0.500
DiskTxMap_SetIfNotExists_Parallel-4 3.347µ 3.261µ ~ 1.000
DiskTxMap_ExistenceOnly-4 308.2n 304.3n ~ 0.100
Queue-4 186.4n 187.9n ~ 0.300
AtomicPointer-4 4.716n 4.503n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 895.9µ 874.1µ ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/10K-4 812.4µ 849.9µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 116.7µ 105.5µ ~ 0.400
ReorgOptimizations/AllMarkFalse/New/10K-4 62.09µ 61.87µ ~ 0.200
ReorgOptimizations/HashSlicePool/Old/10K-4 65.79µ 56.88µ ~ 0.200
ReorgOptimizations/HashSlicePool/New/10K-4 12.30µ 12.32µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/10K-4 5.269µ 5.101µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 2.126µ 1.550µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.422m 9.348m ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.447m 9.424m ~ 1.000
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.169m 1.064m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 680.9µ 686.9µ ~ 0.200
ReorgOptimizations/HashSlicePool/Old/100K-4 658.7µ 635.4µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 307.6µ 332.6µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 49.50µ 49.51µ ~ 1.000
ReorgOptimizations/NodeFlags/New/100K-4 17.70µ 17.17µ ~ 0.100
TxMapSetIfNotExists-4 52.10n 52.39n ~ 0.400
TxMapSetIfNotExistsDuplicate-4 40.28n 40.36n ~ 0.500
ChannelSendReceive-4 612.5n 602.9n ~ 0.100
CalcBlockWork-4 544.1n 542.4n ~ 0.400
CalculateWork-4 792.2n 733.8n ~ 1.000
BuildBlockLocatorString_Helpers/Size_10-4 1.347µ 1.341µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_100-4 14.69µ 15.94µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_1000-4 126.7µ 126.8µ ~ 0.700
CatchupWithHeaderCache-4 104.2m 104.1m ~ 0.200
SubtreeSizes/10k_tx_4_per_subtree-4 1.339m 1.337m ~ 1.000
SubtreeSizes/10k_tx_16_per_subtree-4 321.2µ 323.3µ ~ 0.700
SubtreeSizes/10k_tx_64_per_subtree-4 75.43µ 76.36µ ~ 0.200
SubtreeSizes/10k_tx_256_per_subtree-4 18.63µ 19.10µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.233µ 9.560µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.616µ 4.794µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.339µ 2.395µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 74.78µ 75.75µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 18.78µ 19.14µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.677µ 4.747µ ~ 0.200
BlockSizeScaling/50k_tx_64_per_subtree-4 397.6µ 393.8µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 93.59µ 95.44µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.01µ 23.72µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 156.8µ 156.2µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 166.5µ 168.1µ ~ 0.200
SubtreeAllocations/small_subtrees_full_validation-4 323.8µ 330.5µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 9.086µ 9.275µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 9.866µ 10.119µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 18.87µ 19.52µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.189µ 2.238µ ~ 0.200
SubtreeAllocations/large_subtrees_data_fetch-4 2.420µ 2.491µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.727µ 4.846µ ~ 0.100
_BufferPoolAllocation/16KB-4 3.820µ 3.837µ ~ 0.700
_BufferPoolAllocation/32KB-4 9.597µ 7.672µ ~ 1.000
_BufferPoolAllocation/64KB-4 22.21µ 16.21µ ~ 0.200
_BufferPoolAllocation/128KB-4 34.41µ 31.42µ ~ 0.100
_BufferPoolAllocation/512KB-4 122.1µ 114.3µ ~ 0.100
_BufferPoolConcurrent/32KB-4 18.37µ 18.49µ ~ 1.000
_BufferPoolConcurrent/64KB-4 29.35µ 29.56µ ~ 1.000
_BufferPoolConcurrent/512KB-4 148.1µ 144.4µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/16KB-4 635.5µ 650.0µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/32KB-4 642.2µ 631.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 644.0µ 622.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 648.8µ 639.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 607.1µ 592.3µ ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/16KB-4 37.38m 36.63m ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/32KB-4 37.42m 36.77m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 37.03m 36.48m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/128KB-4 36.93m 36.49m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 36.53m 36.52m ~ 1.000
_PooledVsNonPooled/Pooled-4 835.7n 843.6n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.582µ 8.011µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.564µ 6.957µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.969µ 9.688µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.626µ 9.284µ ~ 0.100
_prepareTxsPerLevel-4 401.3m 395.1m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.650m 3.721m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 398.2m 401.4m ~ 0.700
_prepareTxsPerLevel_Comparison/Optimized-4 3.624m 3.708m ~ 0.200
StoreBlock_Sequential/BelowCSVHeight-4 337.4µ 343.7µ ~ 0.400
StoreBlock_Sequential/AboveCSVHeight-4 345.5µ 337.3µ ~ 0.100
GetUtxoHashes-4 258.0n 258.7n ~ 1.000
GetUtxoHashes_ManyOutputs-4 49.98µ 49.80µ ~ 1.000
_NewMetaDataFromBytes-4 228.0n 228.4n ~ 0.300
_Bytes-4 424.5n 422.3n ~ 0.200
_MetaBytes-4 138.7n 138.1n ~ 0.700

Threshold: >10% with p < 0.05 | Generated: 2026-06-04 12:19 UTC

@oskarszoon oskarszoon merged commit e9e3f53 into bsv-blockchain:main Jun 4, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky legacy-sync: TestLegacyTxBroadcast942 fixed-port bind → Server.Stop nil-deref panics whole suite

3 participants