fix(blockchain): reject FSM RUN below highest network checkpoint by oskarszoon · Pull Request #844 · bsv-blockchain/teranode

oskarszoon · 2026-05-11T18:49:20Z

Summary

Block the FSMEventType_RUN transition in Blockchain.SendFSMEvent whenever the local chain tip is below the network's highest hard-coded checkpoint. Catchup callers continue to log the rejection and re-enter CATCHINGBLOCKS, so the node stops bouncing between states while mid-IBD.
Remove the unconditional Init-time RUN flip in blockvalidation.Server.Init (added in #3672). It bypassed the gate and was the load-bearing reason the node could lie about being RUNNING during IBD. Crash recovery from a stuck CATCHINGBLOCKS state is already covered by legacy startSync and restoreFSMState (both now subject to the gate).
Networks with no checkpoints (regtest, brand-new chains) bypass the gate. Mainnet's highest is currently 938000, testnet3's is 1700000 (go-chaincfg.MainNetParams.Checkpoints / TestNet3Params.Checkpoints).

Why

Follow-up to #843. While investigating that ban incident on bsva-ovh-teranode-eu-3 (chain tip ~293k, mainnet highest checkpoint 938000), Coralogix showed 30 Sending Run event blockvalidation/Server (CATCHINGBLOCKS => Run) log lines in 2.5 hours, all from services/blockvalidation/catchup.go:973 restoreFSMState. The defer u.restoreFSMState(ctx, catchupCtx) at catchup.go:790 flips state back to RUN unconditionally on every catchup chunk exit, regardless of whether the chain is actually caught up to network tip. Each RUN window let the legacy service emit at least one P2SH-bearing tx inv to seed-mainnet-eu-1 and earn bad-txns-vout-p2sh BAN THRESHOLD EXCEEDED (the re-ban at 18:01:38 lines up exactly with the same-second RUN event).

Investigation also surfaced a second path: blockvalidation.Server.Init (added by PR #3672) checks at startup whether FSM is in CATCHINGBLOCKS and, if so, sends RUN with no consultation of chain height. Comment in the original PR: "this is probably a remnant of a crash". Intent was reasonable (don't pin the node in CATCHINGBLOCKS if the prior process crashed mid-catchup), but the implementation lies about progress whenever the chain genuinely is mid-IBD on restart. The right invariant is the one this PR enforces in SendFSMEvent.

Implementation

services/blockchain/Server.go:

New guardRunBelowHighestCheckpoint(ctx) helper. Reads b.store.GetBestBlockHeader(ctx); compares meta height against blockchain.HighestCheckpointHeight(b.settings.ChainCfgParams.Checkpoints). Returns errors.NewStateError(...) when below.
New exported HighestCheckpointHeight([]chaincfg.Checkpoint) uint32 helper. Replaces the private duplicate previously in services/blockvalidation/catchup.go.
SendFSMEvent calls the guard for FSMEventType_RUN. The existing CATCHINGBLOCKS-cannot-be-manually-exited gate is preserved.

services/blockvalidation/Server.go:

Remove the Init-time RUN flip (lines 527-541 before this PR). With the gate in place it was actively dangerous; without the gate it was the source of the eu-3 ban incident. Crash recovery for a stuck CATCHINGBLOCKS state still works through:
- services/legacy/netsync/manager.go:546 startSync — issues RUN when a peer reports the same height as the local tip.
- services/blockvalidation/catchup.go:973 restoreFSMState — defers RUN at the end of every catchup chunk.
  Both now go through the gate, so neither can fire while the local tip is below the highest checkpoint.

services/blockvalidation/catchup.go, services/blockvalidation/BlockValidation.go:

Drop the private getHighestCheckpointHeight; rewire the three callsites (catchup.go:682, BlockValidation.go:1339, BlockValidation.go:1764) to the shared blockchain.HighestCheckpointHeight.

Caller behaviour for Run rejection:

services/legacy/netsync/manager.go:546,1345,1973 — already log+continue.
services/blockvalidation/catchup.go:973 — already log+continue.
services/blockvalidation/Server.go — Init flip removed entirely.

No caller crashes on the new rejection; they just stay in CATCHINGBLOCKS until catchup genuinely passes the highest checkpoint.

Edge cases

Networks with no checkpoints (regtest, brand-new chains): HighestCheckpointHeight returns 0 → guard returns nil → RUN allowed.
Missing settings.ChainCfgParams: guard returns nil (defensive; production always populates).
Store read error or nil meta: fails closed (StateError) so the node retries later rather than slipping into RUN with unknown chain state.

Test plan

services/blockchain/server_fsm_run_gate_test.go:

TestHighestCheckpointHeight — nil/empty/unordered + MainNet (938000) + RegTest (0).
TestGuardRunBelowHighestCheckpoint — table-driven: below/at/above checkpoint, no-checkpoint network permits, store error fails closed, nil meta fails closed.
TestGuardRunBelowHighestCheckpoint_NilSettings — defensive paths.
go build ./...
go test -count=1 -race ./services/blockchain/ → 609 pass
go test -count=1 -race ./services/blockvalidation/ → 562 pass
go test -count=1 -race ./services/blockvalidation/ ./services/legacy/netsync/ → 605 pass (downstream callers unaffected)
go vet ./services/blockchain/...
Manual: starting testnet daemon from a mid-IBD chain state (height 1481322) no longer crashes Init; node comes up in CATCHINGBLOCKS and catchup runs.

Follow-ups

The deeper issue (restoreFSMState issues RUN even when more catchup is needed) is mitigated by the gate but worth a separate change to make restoreFSMState consult "are we actually at tip?" before transitioning. Avoids the constant rejection log spam during IBD.
Mainnet checkpoint list in go-chaincfg is currently stale (highest = 938000, real tip ~948677). Update the checkpoints periodically so the gate threshold tracks reality.
Pair with fix(legacy): suppress tx relay while FSM != RUNNING #843 to fully close the SV-node ban vector: fix(legacy): suppress tx relay while FSM != RUNNING #843 stops outbound tx relay while not RUNNING; this PR stops bogus RUN transitions while below checkpoint.

Refuse `FSMEventType_RUN` whenever the local chain tip is below the network's highest hard-coded checkpoint. Pre-checkpoint heights are guaranteed to be deep history (mainnet's highest is 938000), so a node sitting below them is mid-IBD even if a catchup worker thinks it has finished its current chunk. Going to RUNNING in that state lets the mempool/validator operate under pre-Genesis output rules and the legacy service relay tx invs that current peers ban on sight (`bad-txns-vout-p2sh BAN THRESHOLD EXCEEDED`). Observed on mainnet eu-3 (chain tip ~293k, mainnet highest checkpoint 938000): 30 RUN events fired in 2.5 hours from `services/blockvalidation/catchup.go:973 restoreFSMState`. The unconditional `defer u.restoreFSMState(ctx, catchupCtx)` flips state back to RUN on every catchup chunk exit, regardless of whether the chain is anywhere near the network tip. Each RUN window let the legacy service emit at least one P2SH-bearing tx inv to seed-mainnet-eu-1 and collected an instant ban. The gate sits in `SendFSMEvent` so it covers every caller of `blockchainClient.Run(...)` (`legacy/netsync/manager.go` startSync, handleBlockMsg, blockHandler; `blockvalidation/Server.go` Start; `blockvalidation/catchup.go` restoreFSMState). Existing callers log the rejection and continue; FSM stays in CATCHINGBLOCKS until catchup genuinely passes the checkpoint. Fails closed: store read errors or missing meta also reject RUN. Networks with no checkpoints (regtest, brand-new nets) bypass the gate. The helper has no production-config dependency beyond `settings.ChainCfgParams.Checkpoints`.

github-actions · 2026-05-11T18:51:00Z

🤖 Claude Code Review

Status: Complete

Current Review:
No issues found. The PR correctly implements a safety gate for FSM RUN transitions.

Key Changes Verified:

Gate blocks RUN when chain tip < highest checkpoint (services/blockchain/Server.go:2601-2606)
Removed Init-time RUN flip that bypassed the gate (services/blockvalidation/Server.go)
Consolidated duplicate HighestCheckpointHeight implementations into shared function
Comprehensive test coverage including edge cases (nil settings, store errors, regtest)

History:

✅ Fixed: Code duplication issue resolved - all call sites now use shared blockchain.HighestCheckpointHeight

The implementation follows defensive patterns (fail-closed on errors), preserves regtest compatibility, and includes proper test coverage per AGENTS.md requirements.

github-actions · 2026-05-11T19:05:03Z

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-844 (855b349)

Summary

Regressions: 0
Improvements: 0
Unchanged: 142
Significance level: p < 0.05

All benchmark results (sec/op)

Benchmark	Baseline	Current	Change	p-value
_NewBlockFromBytes-4	1.546µ	1.564µ	~	0.100
SplitSyncedParentMap_SetIfNotExists/256_buckets-4	71.25n	71.16n	~	0.400
SplitSyncedParentMap_SetIfNotExists/16_buckets-4	71.18n	71.75n	~	0.400
SplitSyncedParentMap_SetIfNotExists/1_bucket-4	71.15n	71.25n	~	0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets...	32.62n	33.01n	~	0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_...	57.14n	55.07n	~	0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa...	127.6n	126.4n	~	0.100
MiningCandidate_Stringify_Short-4	218.4n	218.7n	~	0.700
MiningCandidate_Stringify_Long-4	1.637µ	1.646µ	~	0.100
MiningSolution_Stringify-4	847.4n	853.5n	~	0.100
BlockInfo_MarshalJSON-4	1.725µ	1.721µ	~	0.300
NewFromBytes-4	131.5n	149.2n	~	0.100
Mine_EasyDifficulty-4	60.88µ	61.46µ	~	0.400
Mine_WithAddress-4	6.954µ	6.879µ	~	0.700
BlockAssembler_AddTx-4	0.03373n	0.02895n	~	0.400
AddNode-4	10.53	10.63	~	1.000
AddNodeWithMap-4	11.31	10.68	~	0.200
DirectSubtreeAdd/4_per_subtree-4	61.58n	65.79n	~	0.200
DirectSubtreeAdd/64_per_subtree-4	31.35n	31.62n	~	0.400
DirectSubtreeAdd/256_per_subtree-4	30.63n	30.37n	~	0.400
DirectSubtreeAdd/1024_per_subtree-4	29.07n	29.23n	~	0.200
DirectSubtreeAdd/2048_per_subtree-4	28.58n	28.67n	~	0.100
SubtreeProcessorAdd/4_per_subtree-4	278.6n	281.3n	~	1.000
SubtreeProcessorAdd/64_per_subtree-4	276.2n	273.3n	~	0.200
SubtreeProcessorAdd/256_per_subtree-4	276.5n	274.6n	~	0.700
SubtreeProcessorAdd/1024_per_subtree-4	269.0n	266.4n	~	0.100
SubtreeProcessorAdd/2048_per_subtree-4	269.6n	265.8n	~	0.100
SubtreeProcessorRotate/4_per_subtree-4	274.5n	272.2n	~	0.400
SubtreeProcessorRotate/64_per_subtree-4	272.9n	270.7n	~	0.100
SubtreeProcessorRotate/256_per_subtree-4	273.9n	271.4n	~	0.200
SubtreeProcessorRotate/1024_per_subtree-4	270.8n	269.0n	~	0.400
SubtreeNodeAddOnly/4_per_subtree-4	54.04n	54.70n	~	0.200
SubtreeNodeAddOnly/64_per_subtree-4	34.39n	34.27n	~	0.700
SubtreeNodeAddOnly/256_per_subtree-4	33.28n	33.56n	~	0.100
SubtreeNodeAddOnly/1024_per_subtree-4	32.80n	32.71n	~	1.000
SubtreeCreationOnly/4_per_subtree-4	112.7n	112.8n	~	0.700
SubtreeCreationOnly/64_per_subtree-4	400.2n	395.3n	~	0.400
SubtreeCreationOnly/256_per_subtree-4	1.325µ	1.419µ	~	0.100
SubtreeCreationOnly/1024_per_subtree-4	4.377µ	4.546µ	~	0.200
SubtreeCreationOnly/2048_per_subtree-4	8.055µ	8.152µ	~	0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4	267.9n	268.2n	~	0.800
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4	270.6n	268.2n	~	0.100
ParallelGetAndSetIfNotExists/1k_nodes-4	586.4µ	588.9µ	~	0.400
ParallelGetAndSetIfNotExists/10k_nodes-4	1.327m	1.405m	~	0.100
ParallelGetAndSetIfNotExists/50k_nodes-4	6.623m	6.753m	~	0.100
ParallelGetAndSetIfNotExists/100k_nodes-4	13.22m	13.38m	~	0.100
SequentialGetAndSetIfNotExists/1k_nodes-4	658.4µ	671.2µ	~	0.100
SequentialGetAndSetIfNotExists/10k_nodes-4	2.751m	2.920m	~	0.200
SequentialGetAndSetIfNotExists/50k_nodes-4	10.32m	10.42m	~	0.200
SequentialGetAndSetIfNotExists/100k_nodes-4	19.90m	19.91m	~	0.400
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4	636.1µ	647.9µ	~	0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4	4.253m	4.202m	~	0.400
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4	16.48m	16.41m	~	1.000
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4	710.3µ	702.4µ	~	0.700
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4	5.906m	5.891m	~	0.700
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4	37.97m	39.55m	~	0.100
DiskTxMap_SetIfNotExists-4	3.581µ	3.482µ	~	0.100
DiskTxMap_SetIfNotExists_Parallel-4	3.293µ	3.367µ	~	0.200
DiskTxMap_ExistenceOnly-4	312.4n	296.6n	~	0.100
Queue-4	199.1n	195.7n	~	0.100
AtomicPointer-4	4.803n	5.073n	~	0.100
ReorgOptimizations/DedupFilterPipeline/Old/10K-4	898.6µ	869.2µ	~	0.700
ReorgOptimizations/DedupFilterPipeline/New/10K-4	909.8µ	898.6µ	~	0.200
ReorgOptimizations/AllMarkFalse/Old/10K-4	115.3µ	129.8µ	~	0.100
ReorgOptimizations/AllMarkFalse/New/10K-4	62.60µ	61.88µ	~	0.400
ReorgOptimizations/HashSlicePool/Old/10K-4	64.92µ	59.88µ	~	0.100
ReorgOptimizations/HashSlicePool/New/10K-4	11.33µ	11.96µ	~	0.100
ReorgOptimizations/NodeFlags/Old/10K-4	5.446µ	5.316µ	~	0.400
ReorgOptimizations/NodeFlags/New/10K-4	1.829µ	1.910µ	~	1.000
ReorgOptimizations/DedupFilterPipeline/Old/100K-4	9.559m	10.222m	~	0.100
ReorgOptimizations/DedupFilterPipeline/New/100K-4	9.739m	9.723m	~	1.000
ReorgOptimizations/AllMarkFalse/Old/100K-4	1.154m	1.136m	~	1.000
ReorgOptimizations/AllMarkFalse/New/100K-4	683.1µ	689.3µ	~	0.100
ReorgOptimizations/HashSlicePool/Old/100K-4	630.2µ	721.7µ	~	0.700
ReorgOptimizations/HashSlicePool/New/100K-4	320.0µ	291.7µ	~	0.700
ReorgOptimizations/NodeFlags/Old/100K-4	57.58µ	54.94µ	~	0.100
ReorgOptimizations/NodeFlags/New/100K-4	19.79µ	19.65µ	~	1.000
TxMapSetIfNotExists-4	51.41n	51.57n	~	0.400
TxMapSetIfNotExistsDuplicate-4	38.08n	38.14n	~	1.000
ChannelSendReceive-4	615.3n	620.9n	~	0.100
CalcBlockWork-4	517.4n	507.7n	~	0.100
CalculateWork-4	706.7n	761.6n	~	0.700
BuildBlockLocatorString_Helpers/Size_10-4	1.287µ	1.306µ	~	0.700
BuildBlockLocatorString_Helpers/Size_100-4	14.59µ	15.53µ	~	0.400
BuildBlockLocatorString_Helpers/Size_1000-4	122.4µ	124.6µ	~	0.700
CatchupWithHeaderCache-4	104.1m	104.2m	~	0.200
_prepareTxsPerLevel-4	411.6m	412.4m	~	1.000
_prepareTxsPerLevelOrdered-4	3.564m	3.495m	~	1.000
_prepareTxsPerLevel_Comparison/Original-4	418.5m	414.6m	~	1.000
_prepareTxsPerLevel_Comparison/Optimized-4	3.548m	3.557m	~	1.000
_BufferPoolAllocation/16KB-4	3.299µ	3.221µ	~	0.400
_BufferPoolAllocation/32KB-4	6.746µ	8.789µ	~	0.100
_BufferPoolAllocation/64KB-4	16.63µ	16.58µ	~	1.000
_BufferPoolAllocation/128KB-4	30.43µ	28.46µ	~	0.700
_BufferPoolAllocation/512KB-4	106.9µ	107.6µ	~	1.000
_BufferPoolConcurrent/32KB-4	18.80µ	18.13µ	~	0.100
_BufferPoolConcurrent/64KB-4	26.14µ	29.32µ	~	0.100
_BufferPoolConcurrent/512KB-4	138.0µ	138.2µ	~	0.700
_SubtreeDeserializationWithBufferSizes/16KB-4	622.0µ	608.3µ	~	0.700
_SubtreeDeserializationWithBufferSizes/32KB-4	637.1µ	634.7µ	~	1.000
_SubtreeDeserializationWithBufferSizes/64KB-4	636.8µ	619.9µ	~	0.100
_SubtreeDeserializationWithBufferSizes/128KB-4	631.8µ	632.0µ	~	1.000
_SubtreeDeserializationWithBufferSizes/512KB-4	629.2µ	631.5µ	~	0.700
_SubtreeDataDeserializationWithBufferSizes/16KB-4	35.14m	35.69m	~	0.400
_SubtreeDataDeserializationWithBufferSizes/32KB-4	35.29m	35.14m	~	1.000
_SubtreeDataDeserializationWithBufferSizes/64KB-4	35.24m	35.03m	~	0.700
_SubtreeDataDeserializationWithBufferSizes/128KB-4	34.93m	35.13m	~	0.200
_SubtreeDataDeserializationWithBufferSizes/512KB-4	34.65m	34.76m	~	1.000
_PooledVsNonPooled/Pooled-4	828.7n	828.9n	~	1.000
_PooledVsNonPooled/NonPooled-4	7.147µ	6.413µ	~	0.100
_MemoryFootprint/Current_512KB_32concurrent-4	6.938µ	6.735µ	~	0.100
_MemoryFootprint/Proposed_32KB_32concurrent-4	11.156µ	9.245µ	~	0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4	9.836µ	8.665µ	~	0.100
SubtreeSizes/10k_tx_4_per_subtree-4	1.286m	1.271m	~	0.400
SubtreeSizes/10k_tx_16_per_subtree-4	305.3µ	304.2µ	~	0.700
SubtreeSizes/10k_tx_64_per_subtree-4	71.05µ	70.90µ	~	0.400
SubtreeSizes/10k_tx_256_per_subtree-4	17.90µ	17.91µ	~	0.700
SubtreeSizes/10k_tx_512_per_subtree-4	8.844µ	8.973µ	~	0.400
SubtreeSizes/10k_tx_1024_per_subtree-4	4.356µ	4.429µ	~	0.100
SubtreeSizes/10k_tx_2k_per_subtree-4	2.173µ	2.177µ	~	1.000
BlockSizeScaling/10k_tx_64_per_subtree-4	69.67µ	69.25µ	~	0.700
BlockSizeScaling/10k_tx_256_per_subtree-4	17.51µ	18.04µ	~	0.200
BlockSizeScaling/10k_tx_1024_per_subtree-4	4.358µ	4.417µ	~	0.100
BlockSizeScaling/50k_tx_64_per_subtree-4	381.0µ	369.8µ	~	0.700
BlockSizeScaling/50k_tx_256_per_subtree-4	87.54µ	88.88µ	~	0.700
BlockSizeScaling/50k_tx_1024_per_subtree-4	21.63µ	21.88µ	~	1.000
SubtreeAllocations/small_subtrees_exists_check-4	152.2µ	151.7µ	~	0.700
SubtreeAllocations/small_subtrees_data_fetch-4	160.7µ	158.1µ	~	0.400
SubtreeAllocations/small_subtrees_full_validation-4	311.8µ	308.4µ	~	0.700
SubtreeAllocations/medium_subtrees_exists_check-4	8.891µ	8.851µ	~	0.400
SubtreeAllocations/medium_subtrees_data_fetch-4	9.486µ	9.357µ	~	0.700
SubtreeAllocations/medium_subtrees_full_validation-4	17.75µ	17.78µ	~	1.000
SubtreeAllocations/large_subtrees_exists_check-4	2.117µ	2.067µ	~	0.100
SubtreeAllocations/large_subtrees_data_fetch-4	2.224µ	2.233µ	~	0.200
SubtreeAllocations/large_subtrees_full_validation-4	4.358µ	4.347µ	~	0.400
StoreBlock_Sequential/BelowCSVHeight-4	319.5µ	330.6µ	~	0.200
StoreBlock_Sequential/AboveCSVHeight-4	332.2µ	333.3µ	~	1.000
GetUtxoHashes-4	254.2n	254.5n	~	1.000
GetUtxoHashes_ManyOutputs-4	49.42µ	50.19µ	~	0.700
_NewMetaDataFromBytes-4	238.1n	238.1n	~	1.000
_Bytes-4	632.9n	634.9n	~	0.400
_MetaBytes-4	590.5n	576.1n	~	0.100

Threshold: >10% with p < 0.05 | Generated: 2026-05-12 08:25 UTC

Address review feedback on bsv-blockchain#844: the new helper duplicates `getHighestCheckpointHeight` in services/blockvalidation/catchup.go. - Export the helper as `blockchain.HighestCheckpointHeight` and keep the negative-height guard from the new copy (cp.Height is signed). - Delete the private duplicate from blockvalidation/catchup.go. - Update the three callsites in blockvalidation (BlockValidation.go:1339, :1764, catchup.go:682) to use the shared function. - Rename Server_fsm_run_gate_test.go → server_fsm_run_gate_test.go to satisfy SonarQube's snake_case.go rule (per PR feedback).

Block the FSMEventType_RUN transition in Blockchain.SendFSMEvent whenever the local chain tip is below the network's highest hard-coded checkpoint. Catchup callers continue to log the rejection and re-enter CATCHINGBLOCKS, so the node stops bouncing between states while mid-IBD. Networks with no checkpoints (regtest, brand-new chains) bypass the gate. Mainnet's highest is currently 938000. Follow-up to #843. The ban incident on bsva-ovh-teranode-eu-3 (chain tip ~293k, mainnet highest checkpoint 938000) showed 30 RUN events in 2.5 hours from blockvalidation/catchup's restoreFSMState. The defer unconditionally transitions to RUN on every catchup chunk exit, regardless of whether the chain is caught up. Each RUN window allowed the legacy service to emit at least one P2SH-bearing tx inv to seed- mainnet-eu-1, earning bad-txns-vout-p2sh BAN THRESHOLD EXCEEDED. The right surgical fix is at the FSM chokepoint rather than rewriting the catchup state machine. Pair with #843: 843 stops outbound tx relay while not RUNNING; this stops bogus RUN transitions while below checkpoint. Includes a refactor pulling HighestCheckpointHeight out of blockchain and blockvalidation into a single shared helper.

blockvalidation's Init crash-recovery path (services/blockvalidation/ Server.go:528, added in #3672) unconditionally flips FSM from CATCHINGBLOCKS to RUNNING on startup, on the assumption that a stuck CATCHINGBLOCKS state is a remnant of a prior crash. With the new SendFSMEvent gate (bsv-blockchain#844) the blockchain server legitimately refuses RUN when the local chain tip is below the network's highest checkpoint, which currently happens whenever a node is doing IBD on testnet (highest checkpoint 1700000) or mainnet (highest checkpoint 938000) from a low height. Previously a refused RUN returned ServiceError which the daemon treated as a fatal Init failure and shut the node down: refusing RUN: chain tip height 1481322 is below highest checkpoint 1700000 for testnet Match the behaviour of the other blockchainClient.Run callers (legacy/netsync/manager.go:546,1345,1973; blockvalidation/catchup.go:973): log the rejection and continue. The node stays in CATCHINGBLOCKS, blocks keep flowing via the Kafka consumer (unconditional Start at Server.go:1053) and the legacy/p2p sync paths, catchup runs as normal, and once the chain crosses the checkpoint the next restoreFSMState will get a permit. The Init flip is a workaround for a stuck FSM after a crash, not a declaration that we are caught up. The gate is the real invariant.

The crash-recovery path added in #3672 unconditionally transitioned FSM from CATCHINGBLOCKS to RUNNING at blockvalidation startup, on the assumption that a persisted CATCHINGBLOCKS state was a remnant of a prior crash. With the SendFSMEvent checkpoint gate in this PR the flip can't actually transition when the local chain is mid-IBD, so the previous commit just downgraded its error from fatal to a warning. That's still wrong in spirit: the flip claimed catchup was complete, when it wasn't. Stuck-after-crash recovery is covered by existing runtime triggers that issue RUN under conditions the gate accepts: services/legacy/netsync/manager.go:546 startSync, when a peer reports the same height as the local tip services/blockvalidation/catchup.go:973 restoreFSMState, deferred at the end of every catchup chunk Both run on the normal block-arrival cadence, so a crashed-at-tip node recovers as soon as its first peer handshake completes, and a crashed-mid-IBD node continues catching up.

Block the FSMEventType_RUN transition in Blockchain.SendFSMEvent whenever the local chain tip is below the network's highest hard-coded checkpoint. Catchup callers continue to log the rejection and re-enter CATCHINGBLOCKS, so the node stops bouncing between states while mid-IBD. Networks with no checkpoints (regtest, brand-new chains) bypass the gate. Mainnet's highest is currently 938000, testnet3's is 1700000. Also removes the unconditional Init-time RUN flip in blockvalidation.Server.Init (added in #3672). It bypassed the gate and was the load-bearing reason the node could lie about being RUNNING during IBD. Crash recovery from a stuck CATCHINGBLOCKS state is already covered by legacy startSync and restoreFSMState (both now subject to the gate). Follow-up to #843. The ban incident on bsva-ovh-teranode-eu-3 (chain tip ~293k, mainnet highest checkpoint 938000) showed 30 RUN events in 2.5 hours from blockvalidation/catchup's restoreFSMState. Each RUN window allowed the legacy service to emit at least one P2SH-bearing tx inv to seed-mainnet-eu-1, earning bad-txns-vout-p2sh BAN THRESHOLD EXCEEDED. The right surgical fix is at the FSM chokepoint plus removing the known-bad Init flip, rather than rewriting the catchup state machine. Pair with #843: 843 stops outbound tx relay while not RUNNING; this stops bogus RUN transitions while below checkpoint. Includes a refactor pulling HighestCheckpointHeight out of blockchain and blockvalidation into a single shared blockchain.HighestCheckpointHeight helper.

sonarqubecloud · 2026-05-12T08:21:38Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
90.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions Bot reviewed May 11, 2026

View reviewed changes

Comment thread services/blockchain/Server.go Outdated

oskarszoon added 2 commits May 12, 2026 09:58

icellan approved these changes May 12, 2026

View reviewed changes

freemans13 approved these changes May 12, 2026

View reviewed changes

oskarszoon merged commit f8903b7 into bsv-blockchain:main May 12, 2026
25 checks passed

oskarszoon mentioned this pull request May 12, 2026

fix(legacy): gate quickValidation on checkpoints, not FSM state #847

Merged

5 tasks

oskarszoon self-assigned this May 12, 2026

oskarszoon mentioned this pull request May 13, 2026

fix(blockchain): skip FSM RUN gate when transitioning from IDLE #855

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(blockchain): reject FSM RUN below highest network checkpoint#844

fix(blockchain): reject FSM RUN below highest network checkpoint#844
oskarszoon merged 4 commits into
bsv-blockchain:mainfrom
oskarszoon:fix/fsm-no-running-below-checkpoint

oskarszoon commented May 11, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 11, 2026 •

edited

Loading

Uh oh!

Uh oh!

github-actions Bot commented May 11, 2026 •

edited

Loading

Uh oh!

sonarqubecloud Bot commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

oskarszoon commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Implementation

Edge cases

Test plan

Follow-ups

Uh oh!

github-actions Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Comparison Report

Summary

Uh oh!

sonarqubecloud Bot commented May 12, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

oskarszoon commented May 11, 2026 •

edited

Loading

github-actions Bot commented May 11, 2026 •

edited

Loading

github-actions Bot commented May 11, 2026 •

edited

Loading