Skip to content

fix(blockchain): skip FSM RUN gate when transitioning from IDLE#855

Merged
oskarszoon merged 1 commit into
bsv-blockchain:mainfrom
oskarszoon:fix/fsm-state-start
May 13, 2026
Merged

fix(blockchain): skip FSM RUN gate when transitioning from IDLE#855
oskarszoon merged 1 commit into
bsv-blockchain:mainfrom
oskarszoon:fix/fsm-state-start

Conversation

@oskarszoon

Copy link
Copy Markdown
Contributor

Problem

PR #844 added a checkpoint gate to SendFSMEvent that refuses RUN while the local chain tip is below the network's highest hard-coded checkpoint. On a fresh node the gate also blocks the boot path (IDLERUNNING), because tip height is 0 and the highest mainnet checkpoint is 945000:

STATE_ERROR (109): refusing RUN: chain tip height 0 is below highest checkpoint 945000 for mainnet

Repro: ./cli.sh setfsmstate --fsmstate RUNNING on a freshly-initialised node.

Fix

Skip the gate when the prior FSM state is IDLE. The dangerous transitions the gate exists to block are LEGACYSYNCINGRUNNING and CATCHINGBLOCKSRUNNING, both of which claim "I'm caught up". IDLERUNNING is the boot path: no tip exists yet, downstream services (legacy, p2p) need the FSM in RUNNING to start syncing, and the legacy service already suppresses tx relay while FSM ≠ RUNNING (#843), so the original bad-txns-vout-p2sh ban concern from #844 does not regress.

Test

TestSendFSMEvent_RunGate_SourceState exercises SendFSMEvent end-to-end with mainnet params:

  • IDLE + tip 0 → RUN succeeds (regression for this bug)
  • IDLE + tip below checkpoint → RUN succeeds
  • LEGACYSYNCING + tip below checkpoint → RUN rejected
  • LEGACYSYNCING + tip at checkpoint → RUN succeeds
  • CATCHINGBLOCKS + tip below checkpoint → RUN rejected
  • CATCHINGBLOCKS + tip above checkpoint → RUN succeeds

Verification

go test -race -count=1 ./services/blockchain/   # 616 passed
go test -race -count=1 ./services/blockvalidation/  # 563 passed
go vet ./services/blockchain/...                # clean
go build ./...                                  # clean

PR bsv-blockchain#844 added a checkpoint gate to SendFSMEvent that refuses RUN while
the local chain tip is below the network's highest hard-coded checkpoint.
On a fresh node the gate also blocks the boot path (IDLE -> RUNNING),
because tip height is 0 and the highest mainnet checkpoint is 945000:

  STATE_ERROR (109): refusing RUN: chain tip height 0 is below highest
  checkpoint 945000 for mainnet

Make the gate consider the source state. The dangerous transitions the
gate exists to block are LEGACYSYNCING -> RUNNING and CATCHINGBLOCKS ->
RUNNING, both of which claim "I'm caught up". IDLE -> RUNNING is the
boot path: no tip exists yet, downstream services (legacy, p2p) need
the FSM in RUNNING to start syncing, and the legacy service already
suppresses tx relay while FSM != RUNNING (PR bsv-blockchain#843) so the original
bad-txns-vout-p2sh ban concern does not regress.

Regression test exercises SendFSMEvent end-to-end with mainnet params
across all RUN-permitted source states.
@github-actions

github-actions Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Review Summary:

No issues found. The fix correctly addresses the boot-path regression while preserving the original safety guarantee.

Analysis:

The change adds a state-aware guard to the RUN gate introduced in PR #844:

  • IDLE → RUNNING (boot path): Gate skipped - fresh nodes need to reach RUNNING before sync services can operate
  • LEGACYSYNCING → RUNNING and CATCHINGBLOCKS → RUNNING: Gate enforced - these claim "caught up" and must be above checkpoint height

The implementation is correct:

  1. Logic: The priorState != IDLE check properly exempts only the boot path from the checkpoint gate
  2. Test coverage: TestSendFSMEvent_RunGate_SourceState exercises all three RUN source states (IDLE, LEGACYSYNCING, CATCHINGBLOCKS) with heights at, below, and above the checkpoint
  3. Safety: The guard remains active for the dangerous transitions that motivated fix(blockchain): reject FSM RUN below highest network checkpoint #844, preventing pre-Genesis mempool/relay issues
  4. FSM structure: Verified against fsm.go:47-56 - RUN can only originate from IDLE, LEGACYSYNCING, or CATCHINGBLOCKS, all covered by tests

The PR description accurately reflects the fix and includes verification results.

@sonarqubecloud

Copy link
Copy Markdown

@github-actions

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-855 (fd89dd4)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 142
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.545µ 1.572µ ~ 0.200
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 71.47n 71.18n ~ 0.100
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 71.39n 71.68n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 71.06n 71.73n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 32.94n 33.61n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 56.25n 57.80n ~ 0.300
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 127.0n 127.2n ~ 1.000
MiningCandidate_Stringify_Short-4 219.9n 220.0n ~ 1.000
MiningCandidate_Stringify_Long-4 1.627µ 1.626µ ~ 0.700
MiningSolution_Stringify-4 847.3n 852.6n ~ 1.000
BlockInfo_MarshalJSON-4 1.738µ 1.737µ ~ 1.000
NewFromBytes-4 145.1n 126.1n ~ 0.100
Mine_EasyDifficulty-4 65.85µ 65.64µ ~ 0.700
Mine_WithAddress-4 7.253µ 7.964µ ~ 0.100
BlockAssembler_AddTx-4 0.02628n 0.02724n ~ 1.000
AddNode-4 10.80 10.66 ~ 0.100
AddNodeWithMap-4 10.95 10.52 ~ 0.200
DirectSubtreeAdd/4_per_subtree-4 60.37n 62.71n ~ 0.100
DirectSubtreeAdd/64_per_subtree-4 31.57n 31.54n ~ 0.700
DirectSubtreeAdd/256_per_subtree-4 30.49n 30.49n ~ 1.000
DirectSubtreeAdd/1024_per_subtree-4 29.20n 29.28n ~ 0.200
DirectSubtreeAdd/2048_per_subtree-4 28.82n 28.90n ~ 0.300
SubtreeProcessorAdd/4_per_subtree-4 281.7n 282.7n ~ 0.400
SubtreeProcessorAdd/64_per_subtree-4 275.9n 277.6n ~ 0.400
SubtreeProcessorAdd/256_per_subtree-4 277.4n 279.9n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 269.8n 270.8n ~ 0.700
SubtreeProcessorAdd/2048_per_subtree-4 270.4n 269.6n ~ 0.400
SubtreeProcessorRotate/4_per_subtree-4 276.4n 276.4n ~ 1.000
SubtreeProcessorRotate/64_per_subtree-4 272.5n 272.4n ~ 0.700
SubtreeProcessorRotate/256_per_subtree-4 274.9n 273.7n ~ 1.000
SubtreeProcessorRotate/1024_per_subtree-4 272.8n 273.8n ~ 0.400
SubtreeNodeAddOnly/4_per_subtree-4 54.94n 54.24n ~ 0.100
SubtreeNodeAddOnly/64_per_subtree-4 34.65n 34.52n ~ 0.100
SubtreeNodeAddOnly/256_per_subtree-4 33.78n 33.54n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 32.99n 32.71n ~ 0.100
SubtreeCreationOnly/4_per_subtree-4 114.6n 114.3n ~ 1.000
SubtreeCreationOnly/64_per_subtree-4 410.3n 408.2n ~ 0.700
SubtreeCreationOnly/256_per_subtree-4 1.372µ 1.356µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.452µ 4.431µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 8.412µ 8.365µ ~ 0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 271.4n 273.5n ~ 0.100
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 272.2n 270.7n ~ 0.700
ParallelGetAndSetIfNotExists/1k_nodes-4 597.0µ 618.8µ ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 1.332m 1.450m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 6.684m 6.766m ~ 0.700
ParallelGetAndSetIfNotExists/100k_nodes-4 13.17m 13.58m ~ 0.200
SequentialGetAndSetIfNotExists/1k_nodes-4 667.0µ 660.1µ ~ 0.200
SequentialGetAndSetIfNotExists/10k_nodes-4 2.798m 2.780m ~ 0.400
SequentialGetAndSetIfNotExists/50k_nodes-4 10.47m 10.44m ~ 1.000
SequentialGetAndSetIfNotExists/100k_nodes-4 20.20m 20.00m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 640.2µ 660.4µ ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.203m 4.267m ~ 0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 16.44m 16.48m ~ 0.200
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 708.9µ 697.1µ ~ 0.100
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 5.750m 5.742m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 37.71m 37.73m ~ 0.700
DiskTxMap_SetIfNotExists-4 3.462µ 3.567µ ~ 0.100
DiskTxMap_SetIfNotExists_Parallel-4 3.387µ 3.364µ ~ 1.000
DiskTxMap_ExistenceOnly-4 291.7n 295.7n ~ 0.200
Queue-4 199.7n 193.7n ~ 0.100
AtomicPointer-4 4.805n 4.991n ~ 0.700
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 847.3µ 871.8µ ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/10K-4 817.5µ 819.3µ ~ 0.400
ReorgOptimizations/AllMarkFalse/Old/10K-4 119.4µ 106.2µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 62.01µ 61.87µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/10K-4 58.49µ 68.66µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 11.86µ 11.22µ ~ 0.200
ReorgOptimizations/NodeFlags/Old/10K-4 5.171µ 6.360µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 2.208µ 1.820µ ~ 0.400
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.512m 9.383m ~ 0.700
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.126m 9.195m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.067m 1.147m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 682.8µ 685.7µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/100K-4 624.2µ 644.2µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/100K-4 314.6µ 338.2µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 54.13µ 57.60µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 19.38µ 20.13µ ~ 0.100
TxMapSetIfNotExists-4 51.43n 51.35n ~ 0.700
TxMapSetIfNotExistsDuplicate-4 38.23n 38.06n ~ 1.000
ChannelSendReceive-4 603.7n 635.5n ~ 0.100
CalcBlockWork-4 493.9n 493.9n ~ 0.800
CalculateWork-4 689.2n 673.1n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.350µ 1.332µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_100-4 12.86µ 12.61µ ~ 0.100
BuildBlockLocatorString_Helpers/Size_1000-4 135.3µ 127.5µ ~ 0.700
CatchupWithHeaderCache-4 104.5m 104.6m ~ 0.700
_BufferPoolAllocation/16KB-4 3.588µ 3.486µ ~ 0.700
_BufferPoolAllocation/32KB-4 7.751µ 9.480µ ~ 0.200
_BufferPoolAllocation/64KB-4 14.68µ 17.21µ ~ 0.700
_BufferPoolAllocation/128KB-4 29.04µ 31.93µ ~ 0.100
_BufferPoolAllocation/512KB-4 111.6µ 119.5µ ~ 0.100
_BufferPoolConcurrent/32KB-4 18.71µ 18.09µ ~ 0.700
_BufferPoolConcurrent/64KB-4 29.62µ 28.86µ ~ 0.100
_BufferPoolConcurrent/512KB-4 153.3µ 152.7µ ~ 1.000
_SubtreeDeserializationWithBufferSizes/16KB-4 630.7µ 634.5µ ~ 0.700
_SubtreeDeserializationWithBufferSizes/32KB-4 638.4µ 679.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 630.0µ 672.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 631.1µ 671.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 641.8µ 684.9µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 35.93m 36.51m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/32KB-4 36.04m 36.10m ~ 1.000
_SubtreeDataDeserializationWithBufferSizes/64KB-4 36.11m 36.07m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4 35.60m 35.73m ~ 0.700
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.57m 35.86m ~ 0.400
_PooledVsNonPooled/Pooled-4 744.2n 743.3n ~ 0.700
_PooledVsNonPooled/NonPooled-4 6.548µ 7.743µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 6.885µ 7.109µ ~ 0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.08µ 10.42µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.586µ 10.576µ ~ 0.100
_prepareTxsPerLevel-4 419.7m 420.4m ~ 1.000
_prepareTxsPerLevelOrdered-4 3.995m 3.783m ~ 1.000
_prepareTxsPerLevel_Comparison/Original-4 430.7m 421.9m ~ 0.200
_prepareTxsPerLevel_Comparison/Optimized-4 3.687m 4.319m ~ 0.100
SubtreeSizes/10k_tx_4_per_subtree-4 1.381m 1.402m ~ 0.200
SubtreeSizes/10k_tx_16_per_subtree-4 327.2µ 332.1µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 79.09µ 79.23µ ~ 0.700
SubtreeSizes/10k_tx_256_per_subtree-4 19.76µ 19.66µ ~ 0.400
SubtreeSizes/10k_tx_512_per_subtree-4 9.776µ 9.794µ ~ 1.000
SubtreeSizes/10k_tx_1024_per_subtree-4 4.908µ 4.820µ ~ 0.200
SubtreeSizes/10k_tx_2k_per_subtree-4 2.424µ 2.433µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 77.18µ 78.22µ ~ 0.200
BlockSizeScaling/10k_tx_256_per_subtree-4 19.55µ 19.46µ ~ 0.700
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.847µ 4.819µ ~ 1.000
BlockSizeScaling/50k_tx_64_per_subtree-4 401.6µ 403.8µ ~ 1.000
BlockSizeScaling/50k_tx_256_per_subtree-4 98.31µ 97.31µ ~ 1.000
BlockSizeScaling/50k_tx_1024_per_subtree-4 23.94µ 23.93µ ~ 0.700
SubtreeAllocations/small_subtrees_exists_check-4 165.1µ 165.1µ ~ 0.700
SubtreeAllocations/small_subtrees_data_fetch-4 171.3µ 172.3µ ~ 0.400
SubtreeAllocations/small_subtrees_full_validation-4 335.3µ 336.3µ ~ 0.700
SubtreeAllocations/medium_subtrees_exists_check-4 9.741µ 9.706µ ~ 0.700
SubtreeAllocations/medium_subtrees_data_fetch-4 10.15µ 10.16µ ~ 1.000
SubtreeAllocations/medium_subtrees_full_validation-4 19.73µ 19.63µ ~ 0.700
SubtreeAllocations/large_subtrees_exists_check-4 2.347µ 2.358µ ~ 0.400
SubtreeAllocations/large_subtrees_data_fetch-4 2.470µ 2.515µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.941µ 4.934µ ~ 1.000
StoreBlock_Sequential/BelowCSVHeight-4 323.7µ 323.5µ ~ 0.700
StoreBlock_Sequential/AboveCSVHeight-4 326.3µ 325.3µ ~ 0.700
GetUtxoHashes-4 270.2n 272.1n ~ 0.400
GetUtxoHashes_ManyOutputs-4 46.40µ 46.45µ ~ 1.000
_NewMetaDataFromBytes-4 230.1n 231.7n ~ 0.400
_Bytes-4 616.0n 604.1n ~ 0.100
_MetaBytes-4 564.2n 558.8n ~ 0.400

Threshold: >10% with p < 0.05 | Generated: 2026-05-13 10:26 UTC

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve

Correct, minimal, well-tested fix for the boot-path regression from #844. The gate-skip is keyed on the right discriminator (priorState == IDLE), preserving protection on the two source states (LEGACYSYNCING, CATCHINGBLOCKS) that actually carry an "I'm caught up" claim.

Strengths

  • Surgical change. 8 lines of fix, ~120 lines of test — the right ratio for a state-machine gate.
  • Test coverage hits the right matrix. All three RUN-permitted source states × (below-checkpoint, at-or-above-checkpoint), plus an assertion that the FSM remains in the source state on rejection — that last one would catch a future regression if someone flipped the order of gate vs. Event().
  • Test plumbing is correct. newTestBlockchainForGate now provides notifications, stats (needed for SendNotification's tracing.WithParentStat(b.stats)), initPrometheusMetrics() (needed for the enter_state gauge write), and a buffered-channel drainer so the FSM callback can publish without blocking. .Maybe() on the store mock is the right call — IDLE sources never reach GetBestBlockHeader.

Optional follow-ups (non-blocking)

1. Comment misrepresents the safety mechanism. The new comment on Server.go:2598-2604 says:

…the legacy service already suppresses tx relay while FSM != RUNNING (#843) so the original bad-txns-vout-p2sh ban concern does not regress.

But after this fix the FSM does reach RUNNING, at which point peer_server.go:2612's gate (IsFSMCurrentState(RUNNING)) permits relay. The real safety argument is structural: while FSM was IDLE, #843 prevented mempool accumulation, so the mempool is empty at the moment of transition — there is nothing pre-Genesis to relay. Worth rewording so future readers don't think the cited mechanism is what protects the new path.

2. Operator-bypass surface worth acknowledging. Idle() emits STOP, which moves RUNNING/LEGACYSYNCING → IDLE while preserving the tip. An operator calling setfsmstate idle then setfsmstate running on a partially-synced node (e.g., tip 600000) bypasses the gate — explicitly accepted by the "IDLE with tip 100 still below checkpoint succeeds" case. This is a deliberate design choice (and #843's mempool clearing during IDLE makes it tolerable), but the test name doesn't flag it as such. Consider renaming to something like "IDLE bypasses gate even with non-zero tip (operator override path)" or expanding the test-block comment.

3. Minor: drainer goroutine lifetime. t.Cleanup(func() { close(done) }) signals shutdown but doesn't wait. Harmless today (each subtest gets its own Blockchain), but a future refactor that shares state across subtests would inherit cross-talk. Not a blocker.

Verdict: ship it. The two comment-level suggestions can land in a follow-up if convenient.

@oskarszoon oskarszoon merged commit adf9e66 into bsv-blockchain:main May 13, 2026
25 checks passed
@oskarszoon oskarszoon mentioned this pull request May 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants