Skip to content

fix(legacy): remove duplicate peer registration that can wedge sync#946

Merged
freemans13 merged 5 commits into
bsv-blockchain:mainfrom
freemans13:stu/fix-netsync-zombie-peers
Jun 3, 2026
Merged

fix(legacy): remove duplicate peer registration that can wedge sync#946
freemans13 merged 5 commits into
bsv-blockchain:mainfrom
freemans13:stu/fix-netsync-zombie-peers

Conversation

@freemans13

@freemans13 freemans13 commented May 26, 2026

Copy link
Copy Markdown
Collaborator

What goes wrong

The legacy sync gets stuck and won't recover without a restart. We hit this on mainnet — sync sat on the same block for over 48 hours.

When the sync manager gets busy (we triggered it with a block that took 7 minutes to process under Postgres pressure), a race in the peer add/remove path can leave a disconnected peer sitting in the sync manager's peer list forever. The sync manager keeps picking that dead peer, asks it for headers, gets nothing back, gives up after 3 minutes, picks another dead peer, and so on. Forever.

What's actually happening

The legacy code was calling syncManager.NewPeer twice for every new peer — once from OnVersion (or OnProtoconf for multistream) and a second time from handleAddPeerMsg. Normally harmless — the second call just overwrites the first. But if the peer disconnects between the two calls (e.g. during handshake), three messages end up queued for that peer in this order:

  1. add peer
  2. remove peer (because it disconnected)
  3. add peer again (the duplicate)

The sync manager processes them in order: add, remove, then add the now-dead peer back. From then on, the dead peer sits in the peer map forever.

This is normally invisible because the sync manager drains messages quickly. The moment anything slows it down — a slow block, Postgres latency, GC pause — the window between calls widens and the race fires.

Direct evidence from the wedged log:

08:46:44  New valid peer 73                                  ← add
08:46:44  Lost peer 73 (removed from peerStates)             ← remove
08:46:44  New valid peer 73                                  ← add again ← stuck

Peer IDs are monotonic, so the same id 73 reappearing in sync-peer election 48 hours later proves the same dead struct is still mapped.

The fix

  1. Remove the duplicate registration. Each peer is now registered exactly once: in OnVersion for non-multistream peers, in OnProtoconf for multistream peers (matching the existing comments about stream-setup timing).

  2. Skip disconnected peers at two checkpoints, as defence-in-depth:

    • In startSync when choosing a sync peer
    • In handleNewPeerMsg when inserting into the peer map
  3. Fix a pre-existing nil panic in NewPeer / QueueTx / DonePeer — they wrote to a done reply channel on the shutdown path without checking for nil. Both production callers pass nil, so a shutdown call would panic. All three are the same one-line pattern; folded in here.

Behavior to be aware of

Multistream peers that never send Protoconf are now silently not registered with the sync manager (previously the duplicate call in handleAddPeerMsg acted as a backup). Practical risk is low — BSV nodes always send protoconf — but flagging it.

Test plan

  • go build ./services/legacy/...
  • go vet ./services/legacy/...
  • go test -race ./services/legacy/netsync/... ./services/legacy/ — 179 tests pass
  • New regression test for the disconnected-peer skip path
  • Deployed to a personal mainnet running this branch (cherry-picked onto local schema). Mainnet recovered from the stuck block and resumed sync immediately after the binary swap — no other intervention needed.

…ndidates

Every non-multistream outbound peer was being registered with the sync
manager twice: once from OnVersion (peer_server.go:648) and again from
handleAddPeerMsg (peer_server.go:1885). Multistream peers had the same
duplicate via OnProtoconf + handleAddPeerMsg.

In normal operation the two newPeerMsgs landed back-to-back on the sync
manager's msgChan and the second was a harmless no-op-overwrite. But if
ANY operation made blockHandler slow to drain msgChan (we saw this when
block 244024 took 7m23s under postgres pressure), a peer's lifecycle
events would queue out of phase: a Disconnect during handshake could put

  newPeerMsg, donePeerMsg, newPeerMsg

on msgChan for the same peer pointer. When blockHandler finally drained,
the resulting Set -> Delete -> Set sequence left a permanently
disconnected peer pointer in peerStates. startSync then re-elected that
zombie forever, pushed getheaders into a closed socket, waited the full
maxLastBlockTime (3 min) for the violation to fire, rotated to the next
zombie, and so on. Mainnet sat at height 244024 for over 48h cycling
between 5-6 dead peer structs while the connection manager held zero
live sockets to :8333.

Fix: remove the redundant NewPeer call from handleAddPeerMsg. Each peer
is now registered exactly once -- in OnVersion for non-multistream peers
and in OnProtoconf (after openRequiredStreams) for multistream peers --
matching the timing constraints already documented on those two sites.

As defence-in-depth, also skip peers whose socket has already been torn
down (peer.Connected() == false) when ranging over peerStates in
startSync. If a future regression slips a dead peer into the map again,
this stops it from being elected and stalling sync until the next
violation tick.

Tested:
- go build ./services/legacy/...
- go vet ./services/legacy/...
- go test -race ./services/legacy/netsync/... (93 tests pass)
@freemans13 freemans13 self-assigned this May 26, 2026
…hannels

Follow-up to the prior commit (eliminating the duplicate NewPeer that strands
zombies in peerStates). Two further hardening changes:

1. handleNewPeerMsg now early-returns when peer.Connected() is false. Pairs
   with the Connected() filter in startSync: dead pointers are now blocked
   at insertion rather than only being skipped at elect-time. Closes the
   remaining window during which a stale entry can sit in peerStates waiting
   for a donePeerMsg.

2. NewPeer/QueueTx/DonePeer all wrote unconditionally to the `done` reply
   channel on the shutdown path. Both production callsites of NewPeer and
   DonePeer pass nil, so a write during shutdown would panic. Add a nil
   check on each. Pre-existing latent bug noted during review; folded in
   because all three are the same one-line pattern.

Tests:
- TestHandleNewPeerMsg_NilFSMState updated to use a real connected peer
  (the prior zero-value stub now correctly fails the Connected() guard).
- New TestHandleNewPeerMsg_SkipsDisconnectedPeer covers the rejection path.
- go test -race ./services/legacy/netsync/... ./services/legacy/ (179 tests pass)
@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete

Current Review: No critical issues found. The changes correctly address the duplicate peer registration race condition.

Analysis:

The PR successfully fixes a production issue where the legacy sync could permanently wedge due to a subtle message-ordering race. The root cause was duplicate syncManager.NewPeer calls - one from OnVersion/OnProtoconf and another from handleAddPeerMsg. When a peer disconnected between these calls, the sequence add → remove → add left a dead peer in peerStates forever.

Changes verified:

  1. Duplicate registration removed (peer_server.go:1948): The call in handleAddPeerMsg is now correctly removed with a clear comment explaining why
  2. Defense-in-depth guards added (manager.go:482, 722): Connected() checks in both startSync and handleNewPeerMsg prevent any future regressions from re-introducing dead peers
  3. Nil channel fixes (manager.go:2088, 2102, 2195): Pre-existing nil pointer panics fixed in NewPeer, QueueTx, and DonePeer - all production callers pass nil
  4. Test coverage (manager_test.go:866-894): New regression test validates the disconnected-peer skip path

Notes:

  • The PR correctly identifies that multistream peers that never send Protoconf would no longer be registered. Practical risk is low since BSV nodes send Protoconf automatically 10ms after Verack (peer.go:2531)
  • The Connected() checks are inherently racy (TOCTOU), but this is acceptable as defense-in-depth - the real fix is removing the duplicate registration
  • Mainnet deployment validated the fix (sync recovered immediately after deploying the patched binary)

The implementation is sound, well-tested, and addresses the root cause correctly.

@github-actions

github-actions Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-946 (fb5fbd0)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.635µ 1.615µ ~ 0.700
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.24n 71.47n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.43n 71.18n ~ 1.000
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.10n 71.29n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 32.84n 33.46n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 58.53n 55.98n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 156.4n 129.8n ~ 0.700
MiningCandidate_Stringify_Short-4 219.4n 220.0n ~ 0.800
MiningCandidate_Stringify_Long-4 1.637µ 1.638µ ~ 0.400
MiningSolution_Stringify-4 849.0n 863.4n ~ 0.100
BlockInfo_MarshalJSON-4 1.728µ 1.757µ ~ 0.100
NewFromBytes-4 110.60n 95.37n ~ 0.700
AddTxBatchColumnar_Validation-4 1.958µ 1.977µ ~ 0.400
OffsetValidationLoop-4 420.3n 427.0n ~ 0.100
Mine_EasyDifficulty-4 60.21µ 60.67µ ~ 0.200
Mine_WithAddress-4 7.065µ 6.785µ ~ 0.200
DirectSubtreeAdd/4_per_subtree-4 56.87n 65.34n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 30.14n 30.07n ~ 0.700
DirectSubtreeAdd/256_per_subtree-4 28.93n 29.04n ~ 0.200
DirectSubtreeAdd/1024_per_subtree-4 27.82n 28.02n ~ 0.200
DirectSubtreeAdd/2048_per_subtree-4 27.53n 27.43n ~ 0.100
SubtreeProcessorAdd/4_per_subtree-4 285.5n 280.8n ~ 1.000
SubtreeProcessorAdd/64_per_subtree-4 280.7n 281.3n ~ 1.000
SubtreeProcessorAdd/256_per_subtree-4 287.2n 276.6n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 275.9n 273.3n ~ 0.700
SubtreeProcessorAdd/2048_per_subtree-4 276.6n 270.1n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 278.7n 276.4n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 272.6n 275.1n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 274.8n 271.8n ~ 0.200
SubtreeProcessorRotate/1024_per_subtree-4 274.9n 277.3n ~ 0.700
SubtreeNodeAddOnly/4_per_subtree-4 54.72n 54.42n ~ 0.700
SubtreeNodeAddOnly/64_per_subtree-4 34.42n 34.43n ~ 1.000
SubtreeNodeAddOnly/256_per_subtree-4 33.47n 33.32n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.63n 32.66n ~ 0.400
SubtreeCreationOnly/4_per_subtree-4 115.3n 114.5n ~ 0.700
SubtreeCreationOnly/64_per_subtree-4 413.7n 418.2n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.432µ 1.378µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.410µ 4.521µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 8.128µ 8.248µ ~ 0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 282.2n 276.4n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 275.9n 274.6n ~ 0.200
ParallelGetAndSetIfNotExists/1k_nodes-4 2.200m 2.209m ~ 1.000
ParallelGetAndSetIfNotExists/10k_nodes-4 5.334m 5.446m ~ 0.400
ParallelGetAndSetIfNotExists/50k_nodes-4 7.468m 7.211m ~ 0.200
ParallelGetAndSetIfNotExists/100k_nodes-4 10.29m 10.58m ~ 0.700
SequentialGetAndSetIfNotExists/1k_nodes-4 1.967m 1.955m ~ 0.700
SequentialGetAndSetIfNotExists/10k_nodes-4 4.508m 4.566m ~ 1.000
SequentialGetAndSetIfNotExists/50k_nodes-4 12.78m 12.16m ~ 0.100
SequentialGetAndSetIfNotExists/100k_nodes-4 23.40m 22.61m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.271m 2.250m ~ 1.000
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.297m 8.136m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 14.01m 12.84m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 2.087m 2.016m ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.723m 7.661m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 40.94m 42.83m ~ 1.000
BlockAssembler_AddTx-4 0.02594n 0.02765n ~ 0.400
AddNode-4 10.60 10.56 ~ 1.000
AddNodeWithMap-4 11.66 11.19 ~ 1.000
DiskTxMap_SetIfNotExists-4 3.812µ 3.774µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.481µ 3.427µ ~ 0.400
DiskTxMap_ExistenceOnly-4 337.7n 324.3n ~ 0.200
Queue-4 189.7n 190.6n ~ 1.000
AtomicPointer-4 3.628n 3.634n ~ 0.500
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 850.6µ 899.5µ ~ 0.200
ReorgOptimizations/DedupFilterPipeline/New/10K-4 771.9µ 835.3µ ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/10K-4 114.5µ 119.4µ ~ 0.200
ReorgOptimizations/AllMarkFalse/New/10K-4 64.12µ 64.16µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 55.09µ 51.51µ ~ 0.400
ReorgOptimizations/HashSlicePool/New/10K-4 11.22µ 11.10µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/10K-4 4.403µ 4.389µ ~ 0.400
ReorgOptimizations/NodeFlags/New/10K-4 1.535µ 1.514µ ~ 0.400
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.629m 10.335m ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.975m 10.340m ~ 0.400
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.162m 1.102m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 707.5µ 718.4µ ~ 0.400
ReorgOptimizations/HashSlicePool/Old/100K-4 639.9µ 465.8µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 201.4µ 197.4µ ~ 0.400
ReorgOptimizations/NodeFlags/Old/100K-4 46.37µ 48.17µ ~ 0.400
ReorgOptimizations/NodeFlags/New/100K-4 16.55µ 17.09µ ~ 0.100
TxMapSetIfNotExists-4 49.51n 49.57n ~ 1.000
TxMapSetIfNotExistsDuplicate-4 41.41n 41.63n ~ 0.400
ChannelSendReceive-4 595.7n 613.6n ~ 0.100
CalcBlockWork-4 510.0n 508.9n ~ 1.000
CalculateWork-4 696.7n 697.3n ~ 1.000
BuildBlockLocatorString_Helpers/Size_10-4 1.389µ 1.572µ ~ 1.000
BuildBlockLocatorString_Helpers/Size_100-4 12.90µ 13.05µ ~ 0.700
BuildBlockLocatorString_Helpers/Size_1000-4 126.5µ 127.2µ ~ 0.200
CatchupWithHeaderCache-4 104.5m 104.4m ~ 0.200
_prepareTxsPerLevel-4 411.6m 413.7m ~ 1.000
_prepareTxsPerLevelOrdered-4 3.627m 3.853m ~ 0.700
_prepareTxsPerLevel_Comparison/Original-4 416.0m 422.5m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.634m 4.030m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.350m 1.346m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 320.1µ 317.2µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 74.94µ 76.57µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 18.63µ 19.12µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.338µ 9.539µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.667µ 4.708µ ~ 0.200
SubtreeSizes/10k_tx_2k_per_subtree-4 2.312µ 2.339µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 74.75µ 75.31µ ~ 0.400
BlockSizeScaling/10k_tx_256_per_subtree-4 18.80µ 18.96µ ~ 0.400
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.696µ 4.700µ ~ 0.700
BlockSizeScaling/50k_tx_64_per_subtree-4 397.7µ 399.3µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 95.10µ 94.86µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.64µ 23.57µ ~ 1.000
SubtreeAllocations/small_subtrees_exists_check-4 158.5µ 162.5µ ~ 1.000
SubtreeAllocations/small_subtrees_data_fetch-4 165.8µ 164.8µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 330.3µ 329.4µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 9.404µ 9.426µ ~ 1.000
SubtreeAllocations/medium_subtrees_data_fetch-4 9.805µ 9.911µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 19.13µ 19.68µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.259µ 2.284µ ~ 0.200
SubtreeAllocations/large_subtrees_data_fetch-4 2.415µ 2.418µ ~ 0.500
SubtreeAllocations/large_subtrees_full_validation-4 4.831µ 4.899µ ~ 0.100
_BufferPoolAllocation/16KB-4 4.309µ 4.347µ ~ 0.400
_BufferPoolAllocation/32KB-4 11.60µ 10.69µ ~ 1.000
_BufferPoolAllocation/64KB-4 22.50µ 18.27µ ~ 0.100
_BufferPoolAllocation/128KB-4 38.27µ 37.16µ ~ 0.100
_BufferPoolAllocation/512KB-4 151.4µ 146.0µ ~ 0.700
_BufferPoolConcurrent/32KB-4 21.79µ 22.18µ ~ 0.100
_BufferPoolConcurrent/64KB-4 35.68µ 34.13µ ~ 0.200
_BufferPoolConcurrent/512KB-4 198.0µ 196.0µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/16KB-4 801.6µ 762.5µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/32KB-4 763.9µ 734.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 769.0µ 735.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 761.0µ 746.0µ ~ 0.400
_SubtreeDeserializationWithBufferSizes/512KB-4 793.7µ 758.1µ ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4 40.35m 39.57m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 40.88m 39.90m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 40.68m 40.29m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 40.74m 40.54m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/512KB-4 40.96m 40.50m ~ 0.100
_PooledVsNonPooled/Pooled-4 841.3n 832.1n ~ 0.400
_PooledVsNonPooled/NonPooled-4 9.134µ 9.217µ ~ 0.200
_MemoryFootprint/Current_512KB_32concurrent-4 9.291µ 9.925µ ~ 0.400
_MemoryFootprint/Proposed_32KB_32concurrent-4 12.90µ 14.11µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 13.31µ 12.55µ ~ 0.400
StoreBlock_Sequential/BelowCSVHeight-4 325.8µ 326.4µ ~ 1.000
StoreBlock_Sequential/AboveCSVHeight-4 329.7µ 328.7µ ~ 0.700
GetUtxoHashes-4 271.7n 274.7n ~ 0.700
GetUtxoHashes_ManyOutputs-4 46.58µ 45.66µ ~ 0.700
_NewMetaDataFromBytes-4 214.5n 212.7n ~ 0.100
_Bytes-4 398.9n 394.9n ~ 0.700
_MetaBytes-4 140.2n 138.1n ~ 0.100

Threshold: >10% with p < 0.05 | Generated: 2026-06-02 09:05 UTC

@freemans13 freemans13 changed the title fix(legacy): eliminate duplicate NewPeer that produces zombie sync candidates fix(legacy): remove duplicate peer registration that can wedge sync May 26, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@freemans13 freemans13 requested review from liam and oskarszoon May 26, 2026 17:08

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Real prod bug (48h wedged mainnet sync), clean scoped fix.

NewPeer removal correct — OnVersion:648 + OnProtoconf:698 cover non-multistream + multistream paths. Post-fix msgChan is clean Set→Delete; fast-reconnect uses fresh *peerpkg.Peer so no ghost-entry aliasing.

0f43413038 defensive add-ons both sound:

  • Connected() guard at insertion — prunes peers whose socket closed mid-channel-drain.
  • nil FSM state guard at :736state != nil check before deref handles transient GetFSMCurrentState failure.

TestHandleNewPeerMsg_SkipsDisconnectedPeer + TestHandleNewPeerMsg_NilFSMState cover both. Channel-serialisation model preserved, no new locks. VersionKnown() correctly gates DonePeer. Tombstone comment in handleAddPeerMsg worth keeping.

@sonarqubecloud

sonarqubecloud Bot commented Jun 2, 2026

Copy link
Copy Markdown

@freemans13 freemans13 requested a review from ordishs June 2, 2026 18:17

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve — the fix is correct, minimal, and well-reasoned, with real-world mainnet validation.

Verified statically:

  • Root cause confirmed: NewPeer is now called exactly once (OnVersion for non-multistream, OnProtoconf for multistream); the removed handleAddPeerMsg call was the genuine duplicate, eliminating the Set→Delete→Set race at the source.
  • Connected() (connected != 0 && disconnect == 0) is the right predicate; the startSync guard is well-placed and the handleNewPeerMsg skip is safe — a later donePeerMsg for a skipped peer hits the graceful "unknown peer" return in handleDonePeerMsg.
  • Nil-channel guard is correct: both production callers pass nil, so the shutdown path would have panicked.

Non-blocking follow-ups:

  1. Add a test asserting handleAddPeerMsg no longer enqueues a newPeerMsg — this is the primary fix, but current regression tests only cover the defence-in-depth guards, not the duplicate-registration removal itself.
  2. Drop the unused .Maybe() mock in TestHandleNewPeerMsg_SkipsDisconnectedPeerhandleNewPeerMsg returns at the Connected() check before reaching the FSM fetch, so GetFSMCurrentState is never invoked.

One behavioral change worth keeping on the radar (already flagged in the PR description): multistream peers that never send Protoconf are no longer registered as sync candidates. Risk is low since BSV nodes always send protoconf, and the peer is still added to the server peer list via AddPeer.

@freemans13 freemans13 merged commit abe2140 into bsv-blockchain:main Jun 3, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants