fix(legacy): suppress tx relay while FSM != RUNNING#843
Merged
oskarszoon merged 1 commit intoMay 12, 2026
Conversation
The legacy server's outbound transaction-relay paths (AnnounceNewTransactions, RelayInventory for tx invs) were ungated by FSM state. When the node is in LEGACYSYNCING / CATCHINGBLOCKS and the local chain tip sits below the Genesis activation height, the validator accepts pre-Genesis-only outputs such as P2SH. Re-broadcasting those tx to post-Genesis peers triggers an instant ban (`bad-txns-vout-p2sh BAN THRESHOLD EXCEEDED`) on the BSV legacy network. Observed on mainnet eu-3 (height ~293k, Genesis at 620538): two ban events from a seed SV node within a single afternoon, the second re-triggered immediately after a clearban. Mirrors the inbound INV gate that already exists (manager.go handleInvMsg + the kafka listener at manager.go:2342). Adds a canRelayTx() helper backed by blockchain.Client's atomic FSM cache, so the check costs an atomic load per tx. Fails closed on FSM read error. Block invs continue to be relayed in all states (block sync needs them during legacy/catchup).
Contributor
|
🤖 Claude Code Review Status: Complete Current Review: Summary:
Code Quality:
|
|
Contributor
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-11 18:38 UTC |
6 tasks
icellan
approved these changes
May 11, 2026
oskarszoon
added a commit
that referenced
this pull request
May 12, 2026
The legacy server's outbound transaction-relay paths (AnnounceNewTransactions, RelayInventory for tx invs) were not gated by FSM state. While the node is in LEGACYSYNCING / CATCHINGBLOCKS and its local chain tip is below the Genesis activation height, validator.checkOutputs accepts pre-Genesis-only outputs (notably P2SH). Re-broadcasting those tx to post-Genesis peers earns an instant ban (bad-txns-vout-p2sh BAN THRESHOLD EXCEEDED) on the BSV legacy network. Observed on bsva-ovh-teranode-eu-3 (mainnet, height ~293k, Genesis at 620538): peer 57.129.99.214 banned 0->100 twice in two hours, second ban immediately after a manual clearbanned. Adds an FSM gate (canRelayTx) to both outbound entry points. Block invs are still relayed in every state — only tx invs are suppressed. The check is cheap: blockchain.Client serves FSM state from a locally- cached atomic.Pointer, so this is ~5 ns on the hot path with no per-tx RPC.
oskarszoon
added a commit
that referenced
this pull request
May 12, 2026
Block the FSMEventType_RUN transition in Blockchain.SendFSMEvent whenever the local chain tip is below the network's highest hard-coded checkpoint. Catchup callers continue to log the rejection and re-enter CATCHINGBLOCKS, so the node stops bouncing between states while mid-IBD. Networks with no checkpoints (regtest, brand-new chains) bypass the gate. Mainnet's highest is currently 938000. Follow-up to #843. The ban incident on bsva-ovh-teranode-eu-3 (chain tip ~293k, mainnet highest checkpoint 938000) showed 30 RUN events in 2.5 hours from blockvalidation/catchup's restoreFSMState. The defer unconditionally transitions to RUN on every catchup chunk exit, regardless of whether the chain is caught up. Each RUN window allowed the legacy service to emit at least one P2SH-bearing tx inv to seed- mainnet-eu-1, earning bad-txns-vout-p2sh BAN THRESHOLD EXCEEDED. The right surgical fix is at the FSM chokepoint rather than rewriting the catchup state machine. Pair with #843: 843 stops outbound tx relay while not RUNNING; this stops bogus RUN transitions while below checkpoint. Includes a refactor pulling HighestCheckpointHeight out of blockchain and blockvalidation into a single shared helper.
freemans13
approved these changes
May 12, 2026
oskarszoon
added a commit
that referenced
this pull request
May 12, 2026
Block the FSMEventType_RUN transition in Blockchain.SendFSMEvent whenever the local chain tip is below the network's highest hard-coded checkpoint. Catchup callers continue to log the rejection and re-enter CATCHINGBLOCKS, so the node stops bouncing between states while mid-IBD. Networks with no checkpoints (regtest, brand-new chains) bypass the gate. Mainnet's highest is currently 938000, testnet3's is 1700000. Also removes the unconditional Init-time RUN flip in blockvalidation.Server.Init (added in #3672). It bypassed the gate and was the load-bearing reason the node could lie about being RUNNING during IBD. Crash recovery from a stuck CATCHINGBLOCKS state is already covered by legacy startSync and restoreFSMState (both now subject to the gate). Follow-up to #843. The ban incident on bsva-ovh-teranode-eu-3 (chain tip ~293k, mainnet highest checkpoint 938000) showed 30 RUN events in 2.5 hours from blockvalidation/catchup's restoreFSMState. Each RUN window allowed the legacy service to emit at least one P2SH-bearing tx inv to seed-mainnet-eu-1, earning bad-txns-vout-p2sh BAN THRESHOLD EXCEEDED. The right surgical fix is at the FSM chokepoint plus removing the known-bad Init flip, rather than rewriting the catchup state machine. Pair with #843: 843 stops outbound tx relay while not RUNNING; this stops bogus RUN transitions while below checkpoint. Includes a refactor pulling HighestCheckpointHeight out of blockchain and blockvalidation into a single shared blockchain.HighestCheckpointHeight helper.
This was referenced May 13, 2026
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
AnnounceNewTransactions,RelayInventoryfor tx invs) are not gated by FSM state. While the node is inLEGACYSYNCING/CATCHINGBLOCKSand its local chain tip is below the Genesis activation height,validator.checkOutputsaccepts pre-Genesis-only outputs (notably P2SH). Re-broadcasting those tx to post-Genesis peers earns an instant ban (bad-txns-vout-p2sh BAN THRESHOLD EXCEEDED) on the BSV legacy network.canRelayTx) to both outbound entry points. Block invs are still relayed in every state — only tx invs are suppressed.Background incident
On
bsva-ovh-teranode-eu-3(mainnet, height ~293k; Genesis at 620538):Second ban triggered immediately after a manual
clearbanned. eu-3 chain state at the time of the ban was ~block 293030 (well below Genesis), so the validator treated P2SH as valid and the unguarded relay paths broadcast the tx to seed-mainnet-eu-1.Trigger paths (both unguarded prior to this PR)
services/legacy/netsync/handle_block.go:185-195— orphan-pool drain after each accepted block callspeerNotifier.AnnounceNewTransactions(acceptedTxs).services/legacy/netsync/manager.go:1004— direct tx accept afterhandleTxMsg.Both funnel into
server.AnnounceNewTransactions/server.RelayInventoryinservices/legacy/peer_server.go. Those two functions checked onlyP2P.ListenMode(added in #589 for the silent-mode kill switch) — no FSM gate.Why this wasn't a regression
History check (
git log -S 'AnnounceNewTransactions'and-S 'relayTransactions'onservices/legacy/peer_server.go) shows the functions were introduced in the original WIP legacy commit (c33490517) and never gated. The symmetric inbound INV path did get an FSM gate in Sept 2024 (dc2971de8,977071daa— "process transactions when in RUNNING mode"). The outbound side was simply missed. Bug stayed latent because eu-3-class nodes normally run at tip — it only fires when chain < Genesis, which is uncommon outside of fresh IBD from low height.Implementation
(s *server) canRelayTx() boolinpeer_server.gonext toBanPeer/RelayInventory. CallsblockchainClient.IsFSMCurrentState(ctx, FSMStateRUNNING). Fails closed on error.AnnounceNewTransactions: early-return if!canRelayTx().RelayInventory: same gate, but only wheninvVect.Type == wire.InvTypeTx. Block invs continue unconditionally so block sync still works during legacy/catch-up.Performance
The check is cheap.
blockchain.Client.GetFSMCurrentState(Client.go:1670) serves from a locally-cachedatomic.Pointer[FSMStateType](c.fmsState), updated by the FSM subscription handler (Client.go:188-190 onNotificationType_FSMState). Hot path = atomic load + dereference + bool compare (~5 ns). Cold path = one gRPC fallback before the first subscription event, then cached for the rest of the process lifetime. No per-tx RPC.Test plan
services/legacy/peer_server_test.go:TestCanRelayTx_FSMStates— table-driven across RUNNING / LEGACYSYNCING / CATCHINGBLOCKS / IDLE plus the FSM-error case.TestCanRelayTx_NilBlockchainClient— defensive path returnstruewithout panicking.TestRelayInventory_SuppressesTxWhenNotRunning— regression test for the ban incident: tx inv must not hitrelayInvwhen FSM = CATCHINGBLOCKS.TestRelayInventory_RelaysTxWhenRunning— positive case: tx inv flows when RUNNING.TestRelayInventory_AlwaysRelaysBlockInvs— guards against over-broad fix; block invs must flow in every state.TestAnnounceNewTransactions_SuppressedWhenNotRunning— covers the public entry called from the orphan-pool drain.go build ./services/legacy/...go test -count=1 -race ./services/legacy/...→ 330 passedgo vet ./services/legacy/...Follow-ups (out of scope)
checkOutputsgating P2SH onlocalChainHeightrather than ontipHeight or max(localHeight, ActivationHeight)is the deeper issue: even with this PR, if the node receives a tx via direct gossip while in RUNNING but its chain tip drifts below Genesis (e.g. after a deep reorg), the same broadcast leak can occur. Worth a follow-up to make Genesis enforcement tip-aware (or gate the validator's mempool-accept path on FSM state as well).