Skip to content

p2p: protect manual evictions#34743

Open
willcl-ark wants to merge 8 commits intobitcoin:masterfrom
willcl-ark:protect-manual-evictions
Open

p2p: protect manual evictions#34743
willcl-ark wants to merge 8 commits intobitcoin:masterfrom
willcl-ark:protect-manual-evictions

Conversation

@willcl-ark
Copy link
Member

Closes: #5097

Manual peers (-addnode, -connect + RPC) currently get disconnected by IBD timeout logic for block stalling, block download timeout and headers sync timeout. This can conflict with explicit user intent to keep these peers connected (especially in privacy-focused -connect setups), and causes unnecessary disconnect/reconnect churn.

This PR changes that behavior: manual peers are not disconnected by those timeout paths and we try to re-assign their work to other peers to maintain progress.


Disconnecting manual peers automatically causes reconnect loops and other behavior that can violate user expectation for manually configured peers.

Simply “not disconnecting” (for block stalling) is not sufficient by itself: a manual peer can retain in-flight block assignments and delay overall IBD progress.

I considered taking an approach more like that from #32051, simply protecting peers from some disconnects. However without the reassignment the stalling effects are simply prolonged (which is worse, IMO).

This PR uses a different appraoch:

  • For manual peers in block-stalling/download-timeout paths:
    • release all in-flight blocks from that peer,
    • set a one-round recovery flag (m_stall_recovery) so scheduler ordering doesn’t immediately reassign those same blocks back to the same peer.

This attempts to both preserve manual connections and maintain download progress. It actively reassigns stalled work so IBD can continue, rather than just waiting longer. And, the one-round skip prevents immediate re-capture of freed blocks by the same manual peer due to per-peer send ordering.

Compared with simpler “don’t disconnect” approaches, this is more robust both in tests, and real-world slow-peer conditions.

I believe the changes do not increase remote attacker control because it only applies to manual peers (-addnode, -connect, addnode), which are explicitly chosen by the local operator. You were and still are free to add a malicious peer manually.

A malicious manually-added peer can now remain connected longer, but that peer was already granted privileged trust by operator configuration. And in fact, for block stalling/download timeout this PR reduces potential impact of a slow/malicious manual peer by releasing its in-flight blocks so other peers can fetch them.

Of course, if an operator relies only on bad manual peers, sync can still be delayed.

@willcl-ark willcl-ark changed the title Protect manual evictions p2p: protect manual evictions Mar 5, 2026
@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 5, 2026

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK w0xlt, sedited, kevkevinpal, brunoerg

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #34588 (refactor: decompose Peer struct into focused sub-components by w0xlt)
  • #34565 (refactor: extract BlockDownloadManager from PeerManagerImpl by w0xlt)
  • #34389 (net/log: standardize peer+addr log formatting via LogPeer by l0rinc)
  • #33854 (fix assumevalid is ignored during reindex by Eunovo)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • nodes added using addnode line -> "are not required to be full nodes or support SegWit" [clarifies the intended meaning; "full nodes/support SegWit" is ambiguous/grammatically incorrect — should use "or" between the two requirements]

2026-03-11 19:30:05

@DrahtBot DrahtBot added the P2P label Mar 5, 2026
@w0xlt
Copy link
Contributor

w0xlt commented Mar 5, 2026

Concept ACK

@jonatack
Copy link
Member

jonatack commented Mar 6, 2026

Thanks for picking this up, Will.

@sedited
Copy link
Contributor

sedited commented Mar 6, 2026

Concept ACK

1 similar comment
@kevkevinpal
Copy link
Contributor

Concept ACK

@brunoerg
Copy link
Contributor

brunoerg commented Mar 9, 2026

Concept ACK

Manual connections (-addnode, -connect) represent explicit user intent
to maintain the connection, and are important for privacy in
-connect-only setups. Yet they get disconnected during IBD when the
block download window stalls.

Instead of disconnecting, release all in-flight blocks from the manual
peer so other peers can request them. Set m_stall_recovery to skip one
round of block download for this peer. Without this, the message
handler's random peer ordering can cause the manual peer's next
SendMessages call to re-request the same blocks via
FindNextBlocksToDownload before any other peer has a chance to claim
them, permanently stalling IBD when mocktime is frozen (and adding
unnecessary delay with real time).

This flakiness was observed first in functional tests with few peers
(intermittent failures), but would also occasionalyl happen with more
peers in a non-test scenario.
Add a helper for creating manual (addnode onetry) p2p connections,
mirroring the existing add_outbound_p2p_connection method. This will
be used by tests that verify manual peer behavior during IBD.
test that a manual (addnode) peer which stalls on a block, is not
disconnected when the stalling timeout fires.
Manual connections (-addnode, -connect) represent explicit user intent
to maintain the connection, and are important for privacy in
-connect-only setups. Yet they get disconnected during IBD when a
block download times out.

Apply the same release-and-skip strategy as the stalling path: release
all in-flight blocks and set m_stall_recovery so other peers get a
chance to pick them up.
Test the download timeout path in isolation by using only a manual
peer initially (no outbound peers). Without outbound peers, no
staller is identified, so only the download timeout fires (after
600s of mocktime). Verify non-disconnection, then add outbound
peers and advance time past the stalling timeout to verify IBD
completes.
During IBD, a slow headers sync peer gets disconnected if other
preferred peers are available. The existing code already protects
NoBan peers by resetting their sync state instead of disconnecting.

Extend this protection to manual peers, which represent explicit
user intent to maintain the connection. Like NoBan peers, reset
their sync state so headers can be fetched from another peer
without dropping the manual connection.
test that a manually added peer which is used as initial sync peer is
not disconnected when the headers download timeout fires.
@willcl-ark willcl-ark force-pushed the protect-manual-evictions branch from 7beac82 to accb5e4 Compare March 11, 2026 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Headers-first is disconnecting nodes added via config.

7 participants