fix(legacy): wire AddRebroadcastInventory + drop dev listen_mode override#955
Conversation
|
🤖 Claude Code Review Status: Complete Current Review: No issues found. This PR successfully addresses issue #942 with two clean fixes:
The implementation is careful, well-tested, and includes excellent documentation. Code quality aligns with project standards in AGENTS.md. Previous inline comments:
Review completed on 2026-05-28 |
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-28 10:02 UTC |
There was a problem hiding this comment.
Pull request overview
Fixes #942 (Teranode→SV legacy P2P tx broadcasts never reach SV mempools) with two changes: wiring up the previously-dead rebroadcast retry path in the legacy server, and removing the listen_mode.dev = listen_only override in settings.conf that silently disabled legacy tx announce in dev runs. Adds bounded-memory/retry semantics (4096 entries × 6 attempts, buffered modify channel with non-blocking sends and drop counters) plus unit and e2e regression coverage.
Changes:
- Wire
relayTransactions→AddRebroadcastInventory, startrebroadcastHandlerfromStart(), buffermodifyRebroadcastInv, and add per-entryattemptswith cap/aging plusRebroadcastDropCounts()for observability. - Remove
listen_mode.dev = listen_onlyfromsettings.confso dev inherits the documentedfulldefault. - Add unit tests in
peer_server_test.go, an e2e regression test, and document the rebroadcast retry behavior indocs/topics/services/legacy.md.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| services/legacy/peer_server.go | Adds rebroadcast constants, drop counters, non-blocking add/remove sends, tryAddRebroadcast/processRebroadcastTick helpers, bounded handler, and wires it from relayTransactions + Start. |
| services/legacy/peer_server_test.go | Adds unit tests for enqueue-on-announce, gating, attempt preservation, cap, aging, off-by-one, and drop counter; provisions modifyRebroadcastInv on the relay test server. |
| settings.conf | Removes the dev listen_only override and documents why dev stays on full. |
| docs/topics/services/legacy.md | New §4.2.2 documenting rebroadcast retry semantics, caps, and operator counters. |
| test/e2e/daemon/ready/legacy_tx_broadcast_942_test.go | New e2e regression that spins up SV-Node, submits a tx via Teranode RPC, and asserts it reaches SV's mempool. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
3c54b36 to
f1e7fda
Compare
oskarszoon
left a comment
There was a problem hiding this comment.
Approve. Channel wiring + handler lifecycle clean (Start/Stop with wg.Done() and drain loop); cap behavior bounded at maxRebroadcastInventory=4096 / maxRebroadcastAttempts=6 with non-blocking sends and droppedRebroadcastAdds / droppedRebroadcastCapHits atomics; relayTransactions enqueue + 5-minute tick handler does the retry. Double-relay window is harmless — per-peer inv dedup filters it. listen_mode.dev = listen_only removal is clean; SETTINGS_CONTEXT=dev now inherits the documented full default.
legacy_tx_broadcast_942_test.go e2e exercises the immediate-relay path (30s mempool poll) against a real SV-Node peer in the test harness — reproduces the bug pre-fix.
Follow-ups (not blocking)
RebroadcastDropCounts()is exported but not wired to Prometheus yet — doc calls it out as a follow-up.- E2e covers immediate-relay only; the 5-minute rebroadcast retry path is unit-tested (
TestProcessRebroadcastTick_*) but not exercised end-to-end. TransactionConfirmedis still a no-op (pre-existing); entries age out via attempt count, not block inclusion. Documented inlegacy.md.
oskarszoon
left a comment
There was a problem hiding this comment.
Coming back to this after looking at the ListenMode gate that necessitated the dev-context override. The dev override removal is the right immediate unblock for #942, but the gate itself is in the wrong layer and stays wired after this PR.
The gate
services/legacy/peer_server.go:1568, 2686, 2706 — AnnounceNewTransactions, RelayInventory, BroadcastMessage all check s.settings.P2P.ListenMode == ListenModeListenOnly || ListenModeSilent and return early.
P2P.ListenMode (settings/p2p_settings.go:15) is documented as the gate for modern P2P advertisement — listen_only says "no DataHub URL advertisement, cannot be selected as sync source". Its purpose is to stop dev nodes with unreachable asset endpoints from advertising subtrees the network can't download.
Legacy SV-Node tx announce is a different concern: bog-standard Bitcoin INV → GETDATA over the legacy wire. No asset URL involved. The risk that motivated P2P.ListenMode doesn't apply to forwarding txs to SV-Node mempools.
The listen_mode.dev = listen_only override was a symptom of that conflation — the legacy path was being silently disabled in dev runs because it inherited a gate that was never meant for it. Removing the override unblocks dev but leaves any operator running with listen_mode=listen_only (a documented + supported configuration for monitoring / analytics nodes) silently unable to relay txs to legacy peers, while modern P2P listen-only behaviour works as documented.
Recommended fix
Drop the three checks entirely. Legacy P2P relay is already opt-in via enabling the legacy service — a separate gate is redundant. Or, if there's a legitimate use case for relaying-but-not-announcing on legacy, introduce a dedicated Legacy.RelayTxs (or similar) setting whose semantics actually match what the legacy path does.
Either way, P2P.ListenMode shouldn't be the gate on the legacy tx-announce path. Happy to land the dev-override removal + rebroadcast wiring as-is if those land alongside dropping (or replacing) the three legacy-side checks.
f1e7fda to
95af08f
Compare
|
Addressed in 95af08f. Dropped the three Changes from the previous revision:
Verified:
|
ordishs
left a comment
There was a problem hiding this comment.
LGTM. The fix is real, well-bounded, and the test coverage is solid (race detector clean locally; the docker-backed e2e is the right reproducer for #942).
A few non-blocking follow-ups worth tracking:
-
CHANGELOG / release-notes entry for the
listen_modebehaviour change. Dropping the three gates insideservices/legacy/peer_server.gois architecturally correct —listen_modeis documented for the modern P2P service — but it is a real capability change for any operator who was settinglisten_mode=listen_onlyon a node with legacy enabled and relying on that combination to suppress outgoing legacy tx announce. After this PR, the only way to silence legacy tx announce is to disable the legacy service. Worth surfacing so it does not surprise anyone on upgrade. -
Wire
RemoveRebroadcastInventoryviaTransactionConfirmed. The PR docs already flag this. Right nowTransactionConfirmedis a no-op stub, so confirmed txs sit in the queue burning retry ticks until they age out instead of being purged on block inclusion. Bounded by the cap, so not urgent, but wasteful. -
Surface
RebroadcastDropCountsas Prometheus gauges throughservices/legacy/metrics.go(also flagged in the PR docs). Without that, the counters are only reachable via the gRPC(*legacy.Server)surface. -
Watch for per-tick goroutine pile-up under load.
processRebroadcastTickcallss.RelayInventoryper entry, and each call spawns a goroutine that blocks on thecfg.MaxPeers-bufferedrelayInvchannel. A full 4096-entry tick against a briefly slowpeerHandlercould leave ~3,900 goroutines blocked on send. Pre-existing pattern, but the rebroadcast tick is the first caller that fans out N relay-spawns at once — worth keeping an eye on if production data shows the pile-up materialise, possibly with batching or rate-limited re-emission.
Minor cosmetic: PR body lists a test named TestAnnounceNewTransactions_DoesNotEnqueueWhenRelayGated, the actual test is TestAnnounceNewTransactions_DoesNotEnqueueWhenFSMNotRunning. Logic matches the new name.
Nice diagnosis and write-up — particularly the AnnounceNewTransactions doc comment explaining why listen_mode no longer gates it, and legacy.md §4.2.2 documenting the retry semantics for operators.
oskarszoon
left a comment
There was a problem hiding this comment.
Re-review: one change requested, otherwise clean.
Restore listen_mode.dev = listen_only in settings.conf.
Decoupling the three legacy-side gates is exactly right (Option A from the earlier review). With those gates gone, the dev override no longer silently disables legacy tx broadcast — that conflation is what the CHANGES_REQUESTED was about. The override itself was never about legacy: it stops dev nodes with unreachable asset_httpPublicAddress endpoints from advertising DataHub URLs on the modern P2P service and being selected as a sync source. That use case still applies after this PR.
The settings.conf comment ("once the legacy gate was dropped the override became dead weight") is only true for the legacy path. For the modern P2P service it remains the documented mechanism per settings/p2p_settings.go:15, and removing it flips dev-context modern P2P from listen_only → full. That's a behaviour change for dev runs that's orthogonal to fixing #942.
Concretely:
-listen_mode = full
+listen_mode = full
+listen_mode.dev = listen_onlyPlus tweak the comment to reflect what actually happened: legacy gates dropped, modern P2P listen_mode unchanged in semantics, dev override restored because the legacy conflation no longer applies.
Everything else in the PR is clean. Rebroadcast wiring (4096-cap queue, 6-attempt budget, 1024 non-blocking channel, RebroadcastDropCounts() observability) matches SV Node ResendWalletTransactions cadence (src/wallet/wallet.cpp:2126-2156). FSM RUNNING gate preserved as the legitimate suppression on the legacy path. Test coverage exercises all five intended scopes (announce, retry loop, cap, channel-full, listen_mode invariance). E2E docker test is race-safe (uses legacySyncTestLock, isolated ports, context.Background() for cleanup).
Non-blocking follow-ups (already documented in PR or worth filing separately):
- Wire
RebroadcastDropCountsto Prometheus viaservices/legacy/metrics.go. - Wire
TransactionConfirmed→RemoveRebroadcastInventoryso confirmed txs free queue slots on block inclusion. - Spot-check
compose/settings_test.confand other*settings*.conffor residuallisten_mode.dev = listen_only(after this PR restores it in the base settings.conf).
Fixes bsv-blockchain#942 (Teranode→SV legacy P2P tx broadcasts never reach SV mempools) by addressing two distinct problems in the legacy server: 1. AddRebroadcastInventory / rebroadcastHandler were dead code end-to- end: no caller, the goroutine was never started, and the channel was unbuffered. Wire AddRebroadcastInventory in from relayTransactions and start the handler from Start() so txs that hit a transient relay miss (peer mid-handshake, brief disconnect) get periodic retries instead of rotting in the local mempool. Bounded by maxRebroadcastInventory=4096 entries and maxRebroadcastAttempts=6 per entry; AddRebroadcastInventory sends are non-blocking with a 1024-entry buffer to keep the hot relay path uncontended. Drops bump droppedRebroadcastAdds / droppedRebroadcastCapHits counters, exposed via (*legacy.Server).RebroadcastDropCounts() so operators can observe queue saturation. 2. Decouple legacy P2P relay from the modern-P2P `listen_mode` setting. AnnounceNewTransactions, RelayInventory, and BroadcastMessage all used to early-return when listen_mode was listen_only or silent. `listen_mode` is documented for DataHub URL advertisement on the modern P2P service — gating bog-standard Bitcoin INV→GETDATA on it conflated two unrelated concerns and silently broke legacy tx relay for any operator running with listen_mode=listen_only (a supported configuration for monitoring / analytics nodes, and a foot-gun in dev runs via the .dev override). The legacy service is opt-in via enabling it; that opt-in is the only gate it needs. The listen_mode.dev = listen_only override stays in settings.conf — it was always serving the modern-P2P intent (dev nodes shouldn't advertise DataHub URLs), and with the legacy gate gone it does only what its docs say it does. Tests: - New unit tests assert AnnounceNewTransactions / RelayInventory / BroadcastMessage relay regardless of listen_mode (full / listen_only / silent). - Helper-level tests for tryAddRebroadcast cap + attempts-preserved-on- readd, processRebroadcastTick aging, AddRebroadcastInventory drop counter. - E2E regression at test/e2e/daemon/ready that submits a tx via Teranode RPC against a real SV-Node docker container and asserts it reaches SV's mempool. The e2e runs in the .dev context (listen_mode= listen_only), exercising the post-PR behaviour end-to-end. Wired into the legacy-sync Makefile target and excluded from smoketest. Closes bsv-blockchain#942
9cdfa0e to
f745ceb
Compare
|
Addressed in f745ceb. Restored Concrete changes from the previous revision:
Verified:
The two non-blocking follow-ups (Prometheus wiring for the drop counters, |
oskarszoon
left a comment
There was a problem hiding this comment.
Re-review: addressed. Approving.
listen_mode.dev = listen_onlyrestored with comment explaining the modern-P2P-only scope and why the override stays for dev.- E2E test comment updated to match — the test now exercises the dev
listen_onlycontext with the legacy gate decoupled, which is actually a stronger regression check than the previous version.
Everything else from the prior re-review still holds: rebroadcast wiring matches SV Node ResendWalletTransactions cadence, FSM RUNNING remains the legitimate gate, coverage solid, no abuse vector.
|



Summary
Two related fixes for tx propagation over legacy P2P (closes #942):
AddRebroadcastInventoryhad no caller,rebroadcastHandlerwas never started, and the channel was unbuffered. Now:relayTransactionsenqueues every announced tx,Start()spawns the handler, and the channel is buffered with non-blocking sends. Bounded bymaxRebroadcastInventory=4096entries andmaxRebroadcastAttempts=6per entry. Drop counters (droppedRebroadcastAdds,droppedRebroadcastCapHits) are exposed viaRebroadcastDropCounts()so operators can see when the queue saturates.listen_mode.dev = listen_onlysetting.conf override. The legacy server gates its tx announce on the modern-P2PListenMode, so this dev-only override silently disabled legacy P2P tx announce for any node running withSETTINGS_CONTEXT=dev. Dev now inherits the documented default offull; the legacy P2P path works in dev runs without a manual override.Why this fixes #942
The reporter saw txs accepted by Teranode via
sendrawtransactionnever reach connected SV-Node mempools. Two failure modes were in play:RelayInventorydispatch fromAnnounceNewTransactionsis best-effort — a peer mid-version-handshake or briefly disconnected will silently drop the inv. With the rebroadcast feature dead, a tx that missed its first relay had no retry path and sat in the local mempool indefinitely. Wiring up the rebroadcast loop closes that gap.SETTINGS_CONTEXT=devhit thelisten_mode.dev = listen_onlyoverride, which the legacyAnnounceNewTransactionshonoured by silently dropping every announce. Removing the override avoids the foot-gun.How it was diagnosed
Per the systematic-debugging approach: tagged
[diag-942]log lines were added at each component boundary (validator publish → Kafka consume → batcher flush → AnnounceNewTransactions → handleRelayInvMsg → QueueInventory → trickle flush), then driven via a docker-backed e2e test against a realbitcoinsv/bitcoin-sv:1.2.0container. The trail terminated atAnnounceNewTransactions DROPPED listenMode=listen_only, which led to both findings above. The diagnostic instrumentation is not part of this PR — only the fixes and the regression test.Test plan
go test -race ./services/legacy/...— passes (12 sub-packages)services/legacy/peer_server_test.go:TestAnnounceNewTransactions_EnqueuesForRebroadcast— wire-upTestAnnounceNewTransactions_DoesNotEnqueueWhenRelayGated— respects FSM + listen_mode gatesTestTryAddRebroadcast_PreservesAttemptsOnReadd— duplicate adds don't reset retry budgetTestTryAddRebroadcast_DropsAtCap— bounded queueTestProcessRebroadcastTick_AgesOutAfterMaxAttempts— per-entry retry budget honouredTestProcessRebroadcastTick_KeepsEntriesUnderBudget— off-by-one guardTestAddRebroadcastInventory_BumpsDropCounterOnFullChannel— observability counterTestLegacyTxBroadcast942_TeranodeRPCToSVMempoolintest/e2e/daemon/ready/— spins up a real SV-Node docker container, submits a tx via Teranode RPC, asserts it reaches SV's mempool. Lives behind thelegacy-syncMakefile target (Docker required).go vet ./services/legacy/... ./settings/...cleangolangci-lint ./services/legacy0 issuesRisks / Notes
attempts-preserved-on-readd contract is the correct invariant — without it, a duplicate add could refresh the retry budget indefinitely — but it is a behaviour change at the duplicate-add boundary.RebroadcastDropCounts()returns rawuint64s, not yet wired into Prometheus. The existingservices/legacy/metrics.gopattern makes that a small follow-up.docs/topics/services/legacy.mdstill describes the old subtree-notification path that was removed in fix(legacy): remove subtree notification listener causing tx re-broadcasts #735 — out of scope here; the new §4.2.2 stands on its own and documents the rebroadcast retry semantics.TransactionConfirmedhook is called out in the new docs as a follow-up — wiring block inclusion toRemoveRebroadcastInventorywould let the queue self-purge instead of waiting for aging.