fix(legacy/netsync): guard nil FSM state in handleNewPeerMsg#767
Conversation
|
🤖 Claude Code Review Status: Complete No issues found. This PR correctly fixes a critical nil-pointer dereference that caused process crashes. The implementation is solid: Core fix (manager.go:716): Nil guard prevents panic while preserving peer registration flow. Promotes error log from Debug to Error for visibility, consistent with other FSM call sites. Test fix (handle_block_test.go:728): Removes eviction limit from test fixture, eliminating non-deterministic map eviction that caused test flakes. Regression test (manager_test.go:826-854): Validates nil-state handling with proper assertions for both peer registration and fee-filter side-effect. All changes align with the minimal-diff principle in AGENTS.md and include appropriate test coverage. |
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-21 17:53 UTC |
oskarszoon
left a comment
There was a problem hiding this comment.
Re-approval. Three small things worth tightening:
Debugffor a CRITICAL-tagged failure is operator-silent in prod. OtherGetFSMCurrentStatecall sites inmanager.gouseErrorf(lines 1769, 1968) or return a wrapped error (line 1157).Warnfat minimum here.- Test asserts no-panic + peer-registered but not the fee-filter skip — add
sm.currentFeeFilter.Load() == 0to confirm the nil-state branch actually runs (not just that the function didn't crash). - Blast radius is full-process crash, not a dropped peer connection —
handleNewPeerMsgruns inblockHandler's dispatch loop, norecover(). Worth making explicit in the PR description.
Review follow-ups for bsv-blockchain#767: - Errorf (was Debugf) so the FSM read failure is visible at default log levels and matches the convention of the file's other GetFSMCurrentState call sites (manager.go:1157 wraps in error, :1769 and :1968 use Errorf). - Test now asserts currentFeeFilter stays 0 to prove the nil-state guard short-circuits the fee-filter branch, not just that the function did not panic.
…limit buildInRangeFixture constructed the txMap with limit=2. The TestSyncManager_ExtendTransaction_NilParentTx and TestSyncManager_extendFromTxMap_NilParentTx tests then call txMap.Set(*parentHash, ...) a third time to swap the parent wrapper for one with a nil Tx. SyncedMap.setUnlocked treats len >= limit as 'evict a random entry' (random Go map iteration order), so the child entry was being dropped non-deterministically, surfacing as PROCESSING 'tx not found in txMap' instead of the TX_INVALID expected by the test. The fixture has no reason to cap its size; drop the limit and the flake disappears (verified 50x with -race).
|



Closes #4563.
Summary
handleNewPeerMsgcallsblockchainClient.GetFSMCurrentState; on error it logged a debug message but did not return, then unconditionally dereferenced*state. A(nil, err)return — common during blockchain-service transients (gRPC timeout, brief network blip, restart) — caused a panic on the very next peer connection or reconnect.Blast radius
Severity is CRITICAL because the panic propagates through
blockHandler's dispatch loop (manager.go:1985), which has nodefer recover(). The result is a full-process crash on the next peer connection — not a dropped peer, not a degraded sync, the whole node goes down.Fix
Guard the dereference:
if state != nil && *state == teranodeblockchain.FSMStateLEGACYSYNCING && .... The fee-filter side-effect is skipped when state can't be read; peer registration andstartSynccontinue unchanged. Deviation from the audit's literal suggestion (whichreturn-ed early): early return would silently drop peer registration on every transient FSM error.Also promote the error log from
DebugftoErrorfso the failure is visible at default log levels — matches the convention used by the other threeGetFSMCurrentStatecall sites in this file (manager.go:1157returns a wrapped error;:1769and:1968useErrorf).Test plan
GetFSMCurrentStateto return(nil, err); callshandleNewPeerMsg; asserts no panic, peer is registered, andcurrentFeeFilteris not set.go test -race ./services/legacy/netsync/...passes.