Open
Conversation
Contributor
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
2026-03-11 19:30:05 |
Contributor
|
Concept ACK |
kevkevinpal
reviewed
Mar 5, 2026
Member
|
Thanks for picking this up, Will. |
Contributor
|
Concept ACK |
1 similar comment
Contributor
|
Concept ACK |
547f119 to
7beac82
Compare
This was referenced Mar 8, 2026
Contributor
|
Concept ACK |
Manual connections (-addnode, -connect) represent explicit user intent to maintain the connection, and are important for privacy in -connect-only setups. Yet they get disconnected during IBD when the block download window stalls. Instead of disconnecting, release all in-flight blocks from the manual peer so other peers can request them. Set m_stall_recovery to skip one round of block download for this peer. Without this, the message handler's random peer ordering can cause the manual peer's next SendMessages call to re-request the same blocks via FindNextBlocksToDownload before any other peer has a chance to claim them, permanently stalling IBD when mocktime is frozen (and adding unnecessary delay with real time). This flakiness was observed first in functional tests with few peers (intermittent failures), but would also occasionalyl happen with more peers in a non-test scenario.
Add a helper for creating manual (addnode onetry) p2p connections, mirroring the existing add_outbound_p2p_connection method. This will be used by tests that verify manual peer behavior during IBD.
test that a manual (addnode) peer which stalls on a block, is not disconnected when the stalling timeout fires.
Manual connections (-addnode, -connect) represent explicit user intent to maintain the connection, and are important for privacy in -connect-only setups. Yet they get disconnected during IBD when a block download times out. Apply the same release-and-skip strategy as the stalling path: release all in-flight blocks and set m_stall_recovery so other peers get a chance to pick them up.
Test the download timeout path in isolation by using only a manual peer initially (no outbound peers). Without outbound peers, no staller is identified, so only the download timeout fires (after 600s of mocktime). Verify non-disconnection, then add outbound peers and advance time past the stalling timeout to verify IBD completes.
During IBD, a slow headers sync peer gets disconnected if other preferred peers are available. The existing code already protects NoBan peers by resetting their sync state instead of disconnecting. Extend this protection to manual peers, which represent explicit user intent to maintain the connection. Like NoBan peers, reset their sync state so headers can be fetched from another peer without dropping the manual connection.
test that a manually added peer which is used as initial sync peer is not disconnected when the headers download timeout fires.
7beac82 to
accb5e4
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes: #5097
Manual peers (
-addnode,-connect+ RPC) currently get disconnected by IBD timeout logic for block stalling, block download timeout and headers sync timeout. This can conflict with explicit user intent to keep these peers connected (especially in privacy-focused -connect setups), and causes unnecessary disconnect/reconnect churn.This PR changes that behavior: manual peers are not disconnected by those timeout paths and we try to re-assign their work to other peers to maintain progress.
Disconnecting manual peers automatically causes reconnect loops and other behavior that can violate user expectation for manually configured peers.
Simply “not disconnecting” (for block stalling) is not sufficient by itself: a manual peer can retain in-flight block assignments and delay overall IBD progress.
I considered taking an approach more like that from #32051, simply protecting peers from some disconnects. However without the reassignment the stalling effects are simply prolonged (which is worse, IMO).
This PR uses a different appraoch:
m_stall_recovery) so scheduler ordering doesn’t immediately reassign those same blocks back to the same peer.This attempts to both preserve manual connections and maintain download progress. It actively reassigns stalled work so IBD can continue, rather than just waiting longer. And, the one-round skip prevents immediate re-capture of freed blocks by the same manual peer due to per-peer send ordering.
Compared with simpler “don’t disconnect” approaches, this is more robust both in tests, and real-world slow-peer conditions.
I believe the changes do not increase remote attacker control because it only applies to manual peers (-addnode, -connect, addnode), which are explicitly chosen by the local operator. You were and still are free to add a malicious peer manually.
A malicious manually-added peer can now remain connected longer, but that peer was already granted privileged trust by operator configuration. And in fact, for block stalling/download timeout this PR reduces potential impact of a slow/malicious manual peer by releasing its in-flight blocks so other peers can fetch them.
Of course, if an operator relies only on bad manual peers, sync can still be delayed.