Skip to content

fix(legacy): wire AddRebroadcastInventory + drop dev listen_mode override#955

Merged
icellan merged 1 commit into
bsv-blockchain:mainfrom
icellan:fix/legacy-tx-broadcast-942
May 28, 2026
Merged

fix(legacy): wire AddRebroadcastInventory + drop dev listen_mode override#955
icellan merged 1 commit into
bsv-blockchain:mainfrom
icellan:fix/legacy-tx-broadcast-942

Conversation

@icellan

@icellan icellan commented May 27, 2026

Copy link
Copy Markdown
Contributor

Summary

Two related fixes for tx propagation over legacy P2P (closes #942):

  • Wire up the previously-dead rebroadcast feature. AddRebroadcastInventory had no caller, rebroadcastHandler was never started, and the channel was unbuffered. Now: relayTransactions enqueues every announced tx, Start() spawns the handler, and the channel is buffered with non-blocking sends. Bounded by maxRebroadcastInventory=4096 entries and maxRebroadcastAttempts=6 per entry. Drop counters (droppedRebroadcastAdds, droppedRebroadcastCapHits) are exposed via RebroadcastDropCounts() so operators can see when the queue saturates.
  • Drop the listen_mode.dev = listen_only setting.conf override. The legacy server gates its tx announce on the modern-P2P ListenMode, so this dev-only override silently disabled legacy P2P tx announce for any node running with SETTINGS_CONTEXT=dev. Dev now inherits the documented default of full; the legacy P2P path works in dev runs without a manual override.

Why this fixes #942

The reporter saw txs accepted by Teranode via sendrawtransaction never reach connected SV-Node mempools. Two failure modes were in play:

  1. Transient relay miss with no retry. The immediate RelayInventory dispatch from AnnounceNewTransactions is best-effort — a peer mid-version-handshake or briefly disconnected will silently drop the inv. With the rebroadcast feature dead, a tx that missed its first relay had no retry path and sat in the local mempool indefinitely. Wiring up the rebroadcast loop closes that gap.
  2. Surprise gate in dev context. Reproducing locally with SETTINGS_CONTEXT=dev hit the listen_mode.dev = listen_only override, which the legacy AnnounceNewTransactions honoured by silently dropping every announce. Removing the override avoids the foot-gun.

How it was diagnosed

Per the systematic-debugging approach: tagged [diag-942] log lines were added at each component boundary (validator publish → Kafka consume → batcher flush → AnnounceNewTransactions → handleRelayInvMsg → QueueInventory → trickle flush), then driven via a docker-backed e2e test against a real bitcoinsv/bitcoin-sv:1.2.0 container. The trail terminated at AnnounceNewTransactions DROPPED listenMode=listen_only, which led to both findings above. The diagnostic instrumentation is not part of this PR — only the fixes and the regression test.

Test plan

  • go test -race ./services/legacy/... — passes (12 sub-packages)
  • New unit tests in services/legacy/peer_server_test.go:
    • TestAnnounceNewTransactions_EnqueuesForRebroadcast — wire-up
    • TestAnnounceNewTransactions_DoesNotEnqueueWhenRelayGated — respects FSM + listen_mode gates
    • TestTryAddRebroadcast_PreservesAttemptsOnReadd — duplicate adds don't reset retry budget
    • TestTryAddRebroadcast_DropsAtCap — bounded queue
    • TestProcessRebroadcastTick_AgesOutAfterMaxAttempts — per-entry retry budget honoured
    • TestProcessRebroadcastTick_KeepsEntriesUnderBudget — off-by-one guard
    • TestAddRebroadcastInventory_BumpsDropCounterOnFullChannel — observability counter
  • New e2e regression TestLegacyTxBroadcast942_TeranodeRPCToSVMempool in test/e2e/daemon/ready/ — spins up a real SV-Node docker container, submits a tx via Teranode RPC, asserts it reaches SV's mempool. Lives behind the legacy-sync Makefile target (Docker required).
  • go vet ./services/legacy/... ./settings/... clean
  • golangci-lint ./services/legacy 0 issues

Risks / Notes

  • The attempts-preserved-on-readd contract is the correct invariant — without it, a duplicate add could refresh the retry budget indefinitely — but it is a behaviour change at the duplicate-add boundary.
  • RebroadcastDropCounts() returns raw uint64s, not yet wired into Prometheus. The existing services/legacy/metrics.go pattern makes that a small follow-up.
  • Section 4.2.1 of docs/topics/services/legacy.md still describes the old subtree-notification path that was removed in fix(legacy): remove subtree notification listener causing tx re-broadcasts #735 — out of scope here; the new §4.2.2 stands on its own and documents the rebroadcast retry semantics.
  • The latent dead-code TransactionConfirmed hook is called out in the new docs as a follow-up — wiring block inclusion to RemoveRebroadcastInventory would let the queue self-purge instead of waiting for aging.

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

🤖 Claude Code Review

Status: Complete


Current Review:

No issues found. This PR successfully addresses issue #942 with two clean fixes:

  1. Rebroadcast feature wiring — Previously dead code is now properly activated with bounded retry queue, drop counters, and comprehensive unit tests
  2. Legacy P2P decoupling — Removed incorrect listen_mode gates from legacy announce path with clear documentation of the rationale

The implementation is careful, well-tested, and includes excellent documentation. Code quality aligns with project standards in AGENTS.md.

Previous inline comments:

  • ✅ Fixed: Test now correctly included in Makefile legacy-sync target
  • ✅ Fixed: Test header rewritten to remove stale diagnostic references

Review completed on 2026-05-28

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-955 (710c902)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 144
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.823µ 1.979µ ~ 1.000
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 59.14n 59.29n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 59.30n 59.35n ~ 0.700
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 59.23n 59.26n ~ 1.000
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 35.54n 34.20n ~ 0.200
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 64.60n 57.96n ~ 0.100
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 165.6n 157.4n ~ 0.200
MiningCandidate_Stringify_Short-4 253.4n 252.3n ~ 0.400
MiningCandidate_Stringify_Long-4 1.771µ 1.768µ ~ 1.000
MiningSolution_Stringify-4 913.3n 914.1n ~ 1.000
BlockInfo_MarshalJSON-4 1.771µ 1.765µ ~ 1.000
NewFromBytes-4 124.6n 125.8n ~ 0.300
AddTxBatchColumnar_Validation-4 2.528µ 2.436µ ~ 1.000
OffsetValidationLoop-4 726.1n 722.3n ~ 1.000
Mine_EasyDifficulty-4 67.18µ 67.62µ ~ 0.100
Mine_WithAddress-4 7.598µ 7.070µ ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 59.28n 60.28n ~ 0.700
DirectSubtreeAdd/64_per_subtree-4 32.08n 31.68n ~ 0.300
DirectSubtreeAdd/256_per_subtree-4 30.57n 30.76n ~ 0.100
DirectSubtreeAdd/1024_per_subtree-4 29.27n 29.44n ~ 0.300
DirectSubtreeAdd/2048_per_subtree-4 29.04n 28.97n ~ 0.700
SubtreeProcessorAdd/4_per_subtree-4 280.7n 282.0n ~ 0.700
SubtreeProcessorAdd/64_per_subtree-4 277.1n 277.7n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 279.8n 281.5n ~ 0.700
SubtreeProcessorAdd/1024_per_subtree-4 269.3n 270.8n ~ 0.400
SubtreeProcessorAdd/2048_per_subtree-4 270.8n 269.1n ~ 0.600
SubtreeProcessorRotate/4_per_subtree-4 272.4n 278.7n ~ 0.100
SubtreeProcessorRotate/64_per_subtree-4 272.3n 276.3n ~ 0.100
SubtreeProcessorRotate/256_per_subtree-4 272.8n 276.2n ~ 0.100
SubtreeProcessorRotate/1024_per_subtree-4 274.6n 276.5n ~ 0.100
SubtreeNodeAddOnly/4_per_subtree-4 54.30n 54.44n ~ 0.400
SubtreeNodeAddOnly/64_per_subtree-4 34.10n 34.09n ~ 1.000
SubtreeNodeAddOnly/256_per_subtree-4 33.29n 33.37n ~ 1.000
SubtreeNodeAddOnly/1024_per_subtree-4 32.52n 32.58n ~ 0.400
SubtreeCreationOnly/4_per_subtree-4 113.7n 116.5n ~ 0.100
SubtreeCreationOnly/64_per_subtree-4 400.7n 404.7n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 1.357µ 1.361µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 4.384µ 4.443µ ~ 0.400
SubtreeCreationOnly/2048_per_subtree-4 8.110µ 8.230µ ~ 0.100
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 269.8n 270.2n ~ 0.700
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 270.4n 269.6n ~ 0.700
ParallelGetAndSetIfNotExists/1k_nodes-4 2.179m 2.211m ~ 0.100
ParallelGetAndSetIfNotExists/10k_nodes-4 5.285m 5.409m ~ 0.100
ParallelGetAndSetIfNotExists/50k_nodes-4 7.165m 7.497m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 9.735m 10.130m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 1.941m 1.940m ~ 1.000
SequentialGetAndSetIfNotExists/10k_nodes-4 4.302m 4.345m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 12.20m 12.09m ~ 0.700
SequentialGetAndSetIfNotExists/100k_nodes-4 22.43m 21.93m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 2.231m 2.232m ~ 0.700
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 8.181m 8.125m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 12.91m 12.98m ~ 0.400
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 1.961m 1.965m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 7.354m 7.350m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 39.11m 39.57m ~ 0.200
BlockAssembler_AddTx-4 0.02584n 0.02560n ~ 1.000
AddNode-4 11.72 10.64 ~ 0.100
AddNodeWithMap-4 11.06 11.35 ~ 1.000
DiskTxMap_SetIfNotExists-4 3.828µ 3.930µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.461µ 3.531µ ~ 0.100
DiskTxMap_ExistenceOnly-4 320.1n 318.7n ~ 0.400
Queue-4 186.8n 184.4n ~ 0.200
AtomicPointer-4 3.235n 3.242n ~ 1.000
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 785.5µ 824.6µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/New/10K-4 748.6µ 753.1µ ~ 0.700
ReorgOptimizations/AllMarkFalse/Old/10K-4 103.1µ 105.1µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 64.96µ 64.37µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/10K-4 62.18µ 50.69µ ~ 0.100
ReorgOptimizations/HashSlicePool/New/10K-4 11.40µ 11.01µ ~ 0.100
ReorgOptimizations/NodeFlags/Old/10K-4 5.107µ 4.684µ ~ 0.100
ReorgOptimizations/NodeFlags/New/10K-4 1.651µ 1.600µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 9.102m 9.271m ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4 9.524m 9.398m ~ 0.400
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.071m 1.073m ~ 0.700
ReorgOptimizations/AllMarkFalse/New/100K-4 703.3µ 704.1µ ~ 1.000
ReorgOptimizations/HashSlicePool/Old/100K-4 608.9µ 622.5µ ~ 0.400
ReorgOptimizations/HashSlicePool/New/100K-4 215.8µ 215.2µ ~ 1.000
ReorgOptimizations/NodeFlags/Old/100K-4 42.38µ 50.20µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 15.93µ 17.19µ ~ 0.100
TxMapSetIfNotExists-4 50.66n 49.99n ~ 0.700
TxMapSetIfNotExistsDuplicate-4 41.57n 41.41n ~ 0.700
ChannelSendReceive-4 598.6n 574.5n ~ 0.100
CalcBlockWork-4 482.7n 480.5n ~ 1.000
CalculateWork-4 645.3n 638.1n ~ 0.200
BuildBlockLocatorString_Helpers/Size_10-4 1.324µ 1.328µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_100-4 15.23µ 12.80µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_1000-4 126.1µ 127.3µ ~ 0.100
CatchupWithHeaderCache-4 104.0m 103.9m ~ 0.200
_BufferPoolAllocation/16KB-4 4.452µ 4.898µ ~ 0.200
_BufferPoolAllocation/32KB-4 11.84µ 11.44µ ~ 1.000
_BufferPoolAllocation/64KB-4 22.17µ 19.19µ ~ 0.100
_BufferPoolAllocation/128KB-4 41.75µ 37.22µ ~ 0.100
_BufferPoolAllocation/512KB-4 142.1µ 139.6µ ~ 0.700
_BufferPoolConcurrent/32KB-4 23.70µ 23.71µ ~ 0.700
_BufferPoolConcurrent/64KB-4 34.64µ 35.90µ ~ 0.700
_BufferPoolConcurrent/512KB-4 174.5µ 187.7µ ~ 0.200
_SubtreeDeserializationWithBufferSizes/16KB-4 696.8µ 741.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 689.2µ 829.6µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 705.6µ 832.8µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 696.1µ 810.0µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 693.3µ 733.9µ ~ 0.200
_SubtreeDataDeserializationWithBufferSizes/16KB-4 42.52m 43.32m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 42.39m 43.13m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 42.30m 43.31m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 41.98m 43.06m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/512KB-4 42.47m 43.07m ~ 0.100
_PooledVsNonPooled/Pooled-4 804.0n 811.9n ~ 0.100
_PooledVsNonPooled/NonPooled-4 8.353µ 9.842µ ~ 0.100
_MemoryFootprint/Current_512KB_32concurrent-4 8.212µ 8.142µ ~ 0.700
_MemoryFootprint/Proposed_32KB_32concurrent-4 10.98µ 11.03µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 10.34µ 11.21µ ~ 0.100
_prepareTxsPerLevel-4 409.6m 413.2m ~ 1.000
_prepareTxsPerLevelOrdered-4 3.867m 3.408m ~ 0.100
_prepareTxsPerLevel_Comparison/Original-4 415.1m 412.9m ~ 1.000
_prepareTxsPerLevel_Comparison/Optimized-4 3.668m 3.418m ~ 0.200
SubtreeSizes/10k_tx_4_per_subtree-4 1.391m 1.409m ~ 0.700
SubtreeSizes/10k_tx_16_per_subtree-4 324.9µ 329.1µ ~ 1.000
SubtreeSizes/10k_tx_64_per_subtree-4 78.79µ 79.00µ ~ 1.000
SubtreeSizes/10k_tx_256_per_subtree-4 19.72µ 19.69µ ~ 1.000
SubtreeSizes/10k_tx_512_per_subtree-4 9.690µ 9.759µ ~ 1.000
SubtreeSizes/10k_tx_1024_per_subtree-4 4.827µ 4.878µ ~ 1.000
SubtreeSizes/10k_tx_2k_per_subtree-4 2.443µ 2.453µ ~ 1.000
BlockSizeScaling/10k_tx_64_per_subtree-4 78.99µ 76.81µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 19.64µ 19.56µ ~ 1.000
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.939µ 4.776µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 404.3µ 394.5µ ~ 0.700
BlockSizeScaling/50k_tx_256_per_subtree-4 99.38µ 97.14µ ~ 0.700
BlockSizeScaling/50k_tx_1024_per_subtree-4 24.39µ 24.16µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 159.0µ 158.0µ ~ 0.200
SubtreeAllocations/small_subtrees_data_fetch-4 167.6µ 165.3µ ~ 0.200
SubtreeAllocations/small_subtrees_full_validation-4 322.7µ 318.8µ ~ 0.400
SubtreeAllocations/medium_subtrees_exists_check-4 9.612µ 9.571µ ~ 0.700
SubtreeAllocations/medium_subtrees_data_fetch-4 10.41µ 10.38µ ~ 1.000
SubtreeAllocations/medium_subtrees_full_validation-4 20.08µ 19.78µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.409µ 2.352µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.589µ 2.563µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 5.000µ 4.967µ ~ 0.700
StoreBlock_Sequential/BelowCSVHeight-4 243.9µ 244.0µ ~ 1.000
StoreBlock_Sequential/AboveCSVHeight-4 243.0µ 247.4µ ~ 0.200
GetUtxoHashes-4 253.5n 252.2n ~ 1.000
GetUtxoHashes_ManyOutputs-4 49.52µ 49.47µ ~ 0.700
_NewMetaDataFromBytes-4 229.7n 227.1n ~ 0.100
_Bytes-4 421.0n 417.0n ~ 0.700
_MetaBytes-4 139.3n 137.1n ~ 0.200

Threshold: >10% with p < 0.05 | Generated: 2026-05-28 10:02 UTC

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes #942 (Teranode→SV legacy P2P tx broadcasts never reach SV mempools) with two changes: wiring up the previously-dead rebroadcast retry path in the legacy server, and removing the listen_mode.dev = listen_only override in settings.conf that silently disabled legacy tx announce in dev runs. Adds bounded-memory/retry semantics (4096 entries × 6 attempts, buffered modify channel with non-blocking sends and drop counters) plus unit and e2e regression coverage.

Changes:

  • Wire relayTransactionsAddRebroadcastInventory, start rebroadcastHandler from Start(), buffer modifyRebroadcastInv, and add per-entry attempts with cap/aging plus RebroadcastDropCounts() for observability.
  • Remove listen_mode.dev = listen_only from settings.conf so dev inherits the documented full default.
  • Add unit tests in peer_server_test.go, an e2e regression test, and document the rebroadcast retry behavior in docs/topics/services/legacy.md.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
services/legacy/peer_server.go Adds rebroadcast constants, drop counters, non-blocking add/remove sends, tryAddRebroadcast/processRebroadcastTick helpers, bounded handler, and wires it from relayTransactions + Start.
services/legacy/peer_server_test.go Adds unit tests for enqueue-on-announce, gating, attempt preservation, cap, aging, off-by-one, and drop counter; provisions modifyRebroadcastInv on the relay test server.
settings.conf Removes the dev listen_only override and documents why dev stays on full.
docs/topics/services/legacy.md New §4.2.2 documenting rebroadcast retry semantics, caps, and operator counters.
test/e2e/daemon/ready/legacy_tx_broadcast_942_test.go New e2e regression that spins up SV-Node, submits a tx via Teranode RPC, and asserts it reaches SV's mempool.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/e2e/daemon/ready/legacy_tx_broadcast_942_test.go
Comment thread test/e2e/daemon/ready/legacy_tx_broadcast_942_test.go Outdated
@icellan icellan force-pushed the fix/legacy-tx-broadcast-942 branch from 3c54b36 to f1e7fda Compare May 27, 2026 10:33

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve. Channel wiring + handler lifecycle clean (Start/Stop with wg.Done() and drain loop); cap behavior bounded at maxRebroadcastInventory=4096 / maxRebroadcastAttempts=6 with non-blocking sends and droppedRebroadcastAdds / droppedRebroadcastCapHits atomics; relayTransactions enqueue + 5-minute tick handler does the retry. Double-relay window is harmless — per-peer inv dedup filters it. listen_mode.dev = listen_only removal is clean; SETTINGS_CONTEXT=dev now inherits the documented full default.

legacy_tx_broadcast_942_test.go e2e exercises the immediate-relay path (30s mempool poll) against a real SV-Node peer in the test harness — reproduces the bug pre-fix.

Follow-ups (not blocking)

  • RebroadcastDropCounts() is exported but not wired to Prometheus yet — doc calls it out as a follow-up.
  • E2e covers immediate-relay only; the 5-minute rebroadcast retry path is unit-tested (TestProcessRebroadcastTick_*) but not exercised end-to-end.
  • TransactionConfirmed is still a no-op (pre-existing); entries age out via attempt count, not block inclusion. Documented in legacy.md.

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this after looking at the ListenMode gate that necessitated the dev-context override. The dev override removal is the right immediate unblock for #942, but the gate itself is in the wrong layer and stays wired after this PR.

The gate

services/legacy/peer_server.go:1568, 2686, 2706AnnounceNewTransactions, RelayInventory, BroadcastMessage all check s.settings.P2P.ListenMode == ListenModeListenOnly || ListenModeSilent and return early.

P2P.ListenMode (settings/p2p_settings.go:15) is documented as the gate for modern P2P advertisement — listen_only says "no DataHub URL advertisement, cannot be selected as sync source". Its purpose is to stop dev nodes with unreachable asset endpoints from advertising subtrees the network can't download.

Legacy SV-Node tx announce is a different concern: bog-standard Bitcoin INV → GETDATA over the legacy wire. No asset URL involved. The risk that motivated P2P.ListenMode doesn't apply to forwarding txs to SV-Node mempools.

The listen_mode.dev = listen_only override was a symptom of that conflation — the legacy path was being silently disabled in dev runs because it inherited a gate that was never meant for it. Removing the override unblocks dev but leaves any operator running with listen_mode=listen_only (a documented + supported configuration for monitoring / analytics nodes) silently unable to relay txs to legacy peers, while modern P2P listen-only behaviour works as documented.

Recommended fix

Drop the three checks entirely. Legacy P2P relay is already opt-in via enabling the legacy service — a separate gate is redundant. Or, if there's a legitimate use case for relaying-but-not-announcing on legacy, introduce a dedicated Legacy.RelayTxs (or similar) setting whose semantics actually match what the legacy path does.

Either way, P2P.ListenMode shouldn't be the gate on the legacy tx-announce path. Happy to land the dev-override removal + rebroadcast wiring as-is if those land alongside dropping (or replacing) the three legacy-side checks.

@icellan icellan force-pushed the fix/legacy-tx-broadcast-942 branch from f1e7fda to 95af08f Compare May 27, 2026 15:47
@icellan

icellan commented May 27, 2026

Copy link
Copy Markdown
Contributor Author

Addressed in 95af08f. Dropped the three P2P.ListenMode gates from AnnounceNewTransactions, RelayInventory, and BroadcastMessage — went with your Option A since the legacy service is already opt-in via being enabled, no separate Legacy.RelayTxs setting needed.

Changes from the previous revision:

  • services/legacy/peer_server.go: removed the three listen_mode early-returns; doc-comments on each now explicitly note the decoupling.
  • services/legacy/peer_server_test.go: new table-driven tests TestAnnounceNewTransactions_RelaysRegardlessOfListenMode, TestRelayInventory_RelaysTxRegardlessOfListenMode, TestBroadcastMessage_BroadcastsRegardlessOfListenMode — each runs against full/listen_only/silent and asserts the relay/queue/broadcast still happens. The old ListenMode=listen_only subtest (which asserted the now-removed behavior) is gone; the FSM-not-RUNNING gate test stays under its own name.
  • settings.conf: comment rewritten to explain the decoupling and that the .dev override is now dead weight once the legacy gate is gone.
  • docs/topics/services/legacy.md §4.2.2: the announce-gate bullet no longer mentions listen_mode; cross-references the rationale in the code.
  • Commit subject + body updated to reflect the broader scope.

Verified:

  • go test -race ./services/legacy/... — all 12 packages pass
  • golangci-lint — clean
  • E2E TestLegacyTxBroadcast942_TeranodeRPCToSVMempool — passes against a real SV-Node docker container (18.7s)

@ordishs ordishs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The fix is real, well-bounded, and the test coverage is solid (race detector clean locally; the docker-backed e2e is the right reproducer for #942).

A few non-blocking follow-ups worth tracking:

  1. CHANGELOG / release-notes entry for the listen_mode behaviour change. Dropping the three gates inside services/legacy/peer_server.go is architecturally correct — listen_mode is documented for the modern P2P service — but it is a real capability change for any operator who was setting listen_mode=listen_only on a node with legacy enabled and relying on that combination to suppress outgoing legacy tx announce. After this PR, the only way to silence legacy tx announce is to disable the legacy service. Worth surfacing so it does not surprise anyone on upgrade.

  2. Wire RemoveRebroadcastInventory via TransactionConfirmed. The PR docs already flag this. Right now TransactionConfirmed is a no-op stub, so confirmed txs sit in the queue burning retry ticks until they age out instead of being purged on block inclusion. Bounded by the cap, so not urgent, but wasteful.

  3. Surface RebroadcastDropCounts as Prometheus gauges through services/legacy/metrics.go (also flagged in the PR docs). Without that, the counters are only reachable via the gRPC (*legacy.Server) surface.

  4. Watch for per-tick goroutine pile-up under load. processRebroadcastTick calls s.RelayInventory per entry, and each call spawns a goroutine that blocks on the cfg.MaxPeers-buffered relayInv channel. A full 4096-entry tick against a briefly slow peerHandler could leave ~3,900 goroutines blocked on send. Pre-existing pattern, but the rebroadcast tick is the first caller that fans out N relay-spawns at once — worth keeping an eye on if production data shows the pile-up materialise, possibly with batching or rate-limited re-emission.

Minor cosmetic: PR body lists a test named TestAnnounceNewTransactions_DoesNotEnqueueWhenRelayGated, the actual test is TestAnnounceNewTransactions_DoesNotEnqueueWhenFSMNotRunning. Logic matches the new name.

Nice diagnosis and write-up — particularly the AnnounceNewTransactions doc comment explaining why listen_mode no longer gates it, and legacy.md §4.2.2 documenting the retry semantics for operators.

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: one change requested, otherwise clean.

Restore listen_mode.dev = listen_only in settings.conf.

Decoupling the three legacy-side gates is exactly right (Option A from the earlier review). With those gates gone, the dev override no longer silently disables legacy tx broadcast — that conflation is what the CHANGES_REQUESTED was about. The override itself was never about legacy: it stops dev nodes with unreachable asset_httpPublicAddress endpoints from advertising DataHub URLs on the modern P2P service and being selected as a sync source. That use case still applies after this PR.

The settings.conf comment ("once the legacy gate was dropped the override became dead weight") is only true for the legacy path. For the modern P2P service it remains the documented mechanism per settings/p2p_settings.go:15, and removing it flips dev-context modern P2P from listen_onlyfull. That's a behaviour change for dev runs that's orthogonal to fixing #942.

Concretely:

-listen_mode = full
+listen_mode     = full
+listen_mode.dev = listen_only

Plus tweak the comment to reflect what actually happened: legacy gates dropped, modern P2P listen_mode unchanged in semantics, dev override restored because the legacy conflation no longer applies.

Everything else in the PR is clean. Rebroadcast wiring (4096-cap queue, 6-attempt budget, 1024 non-blocking channel, RebroadcastDropCounts() observability) matches SV Node ResendWalletTransactions cadence (src/wallet/wallet.cpp:2126-2156). FSM RUNNING gate preserved as the legitimate suppression on the legacy path. Test coverage exercises all five intended scopes (announce, retry loop, cap, channel-full, listen_mode invariance). E2E docker test is race-safe (uses legacySyncTestLock, isolated ports, context.Background() for cleanup).

Non-blocking follow-ups (already documented in PR or worth filing separately):

  • Wire RebroadcastDropCounts to Prometheus via services/legacy/metrics.go.
  • Wire TransactionConfirmedRemoveRebroadcastInventory so confirmed txs free queue slots on block inclusion.
  • Spot-check compose/settings_test.conf and other *settings*.conf for residual listen_mode.dev = listen_only (after this PR restores it in the base settings.conf).

Fixes bsv-blockchain#942 (Teranode→SV legacy P2P tx broadcasts never reach SV
mempools) by addressing two distinct problems in the legacy server:

1. AddRebroadcastInventory / rebroadcastHandler were dead code end-to-
   end: no caller, the goroutine was never started, and the channel was
   unbuffered. Wire AddRebroadcastInventory in from relayTransactions
   and start the handler from Start() so txs that hit a transient relay
   miss (peer mid-handshake, brief disconnect) get periodic retries
   instead of rotting in the local mempool. Bounded by
   maxRebroadcastInventory=4096 entries and maxRebroadcastAttempts=6
   per entry; AddRebroadcastInventory sends are non-blocking with a
   1024-entry buffer to keep the hot relay path uncontended. Drops bump
   droppedRebroadcastAdds / droppedRebroadcastCapHits counters, exposed
   via (*legacy.Server).RebroadcastDropCounts() so operators can
   observe queue saturation.

2. Decouple legacy P2P relay from the modern-P2P `listen_mode` setting.
   AnnounceNewTransactions, RelayInventory, and BroadcastMessage all
   used to early-return when listen_mode was listen_only or silent.
   `listen_mode` is documented for DataHub URL advertisement on the
   modern P2P service — gating bog-standard Bitcoin INV→GETDATA on it
   conflated two unrelated concerns and silently broke legacy tx relay
   for any operator running with listen_mode=listen_only (a supported
   configuration for monitoring / analytics nodes, and a foot-gun in
   dev runs via the .dev override). The legacy service is opt-in via
   enabling it; that opt-in is the only gate it needs. The
   listen_mode.dev = listen_only override stays in settings.conf — it
   was always serving the modern-P2P intent (dev nodes shouldn't
   advertise DataHub URLs), and with the legacy gate gone it does only
   what its docs say it does.

Tests:
- New unit tests assert AnnounceNewTransactions / RelayInventory /
  BroadcastMessage relay regardless of listen_mode (full / listen_only /
  silent).
- Helper-level tests for tryAddRebroadcast cap + attempts-preserved-on-
  readd, processRebroadcastTick aging, AddRebroadcastInventory drop
  counter.
- E2E regression at test/e2e/daemon/ready that submits a tx via
  Teranode RPC against a real SV-Node docker container and asserts it
  reaches SV's mempool. The e2e runs in the .dev context (listen_mode=
  listen_only), exercising the post-PR behaviour end-to-end. Wired into
  the legacy-sync Makefile target and excluded from smoketest.

Closes bsv-blockchain#942
@icellan icellan force-pushed the fix/legacy-tx-broadcast-942 branch from 9cdfa0e to f745ceb Compare May 28, 2026 09:47
@icellan

icellan commented May 28, 2026

Copy link
Copy Markdown
Contributor Author

Addressed in f745ceb. Restored listen_mode.dev = listen_only in settings.conf and rewrote the comment to reflect what actually changed.

Concrete changes from the previous revision:

  • settings.conflisten_mode.dev = listen_only back, exactly as requested. Comment now explains: modern-P2P semantics unchanged (this is what the .dev override has always been about); legacy gate dropped in this PR; the override no longer doubles as a foot-gun for legacy because legacy doesn't read it anymore.
  • test/e2e/daemon/ready/legacy_tx_broadcast_942_test.go — updated the stale comment in SettingsOverrideFunc. The e2e runs in the .dev context, so listen_mode=listen_only is now active during the test. Test passes (18.4s) — proves the decoupling works end-to-end with the override in place. This is now meaningful coverage rather than a no-op.
  • Commit body — dropped the line about "removing the override"; the override removal was never the intent now that the legacy gate is gone.
  • Spot-checked the other *.conf files per your follow-up: only settings.conf references listen_mode at all. No residual .dev overrides to clean up elsewhere.

Verified:

  • go test -race ./services/legacy/... — clean (12 pkgs)
  • golangci-lint ./services/legacy ./test/e2e/daemon/ready — only pre-existing prealloc warnings in utxo_test.go (not in diff)
  • E2E TestLegacyTxBroadcast942_TeranodeRPCToSVMempool against real SV-Node docker — PASS, 18.45s, with listen_mode.dev = listen_only active

The two non-blocking follow-ups (Prometheus wiring for the drop counters, TransactionConfirmedRemoveRebroadcastInventory for queue self-purge) — happy to file as separate issues / follow-up PRs once this lands.

@oskarszoon oskarszoon left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: addressed. Approving.

  • listen_mode.dev = listen_only restored with comment explaining the modern-P2P-only scope and why the override stays for dev.
  • E2E test comment updated to match — the test now exercises the dev listen_only context with the legacy gate decoupled, which is actually a stronger regression check than the previous version.

Everything else from the prior re-review still holds: rebroadcast wiring matches SV Node ResendWalletTransactions cadence, FSM RUNNING remains the legitimate gate, coverage solid, no abuse vector.

@sonarqubecloud

Copy link
Copy Markdown

@icellan icellan merged commit 3e1ed38 into bsv-blockchain:main May 28, 2026
25 checks passed
@icellan icellan self-assigned this May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Teranode→SV legacy P2P tx broadcast appears broken: txs accepted by Teranode never reach connected SV-Node mempools

4 participants