R18 again#638
Merged
Merged
Conversation
Updates to BGP unnumbered peers that change the Router Lifetime field of the NDP Router Advertisement (via mgadm --act-as-default-router arg) would propagate through the DB and into the NDP interface state, but the tx thread had no way of picking up the new value without restarting... and there was no logic to restart the tx thread. This swaps out the u16 holding the Router Lifetime for an AtomicU16 and adds logic to do an atomic store so the tx thread can use the new value without requiring a restart.
recv_header handles WouldBlock from the socket read timeout by retrying, but recv_msg uses read_exact for the message body, which propagates WouldBlock as a fatal IO error. With large BGP UPDATE messages the body may not arrive within a single timeout window, causing a spurious TcpConnectionFails. Replace read_exact with an incremental retry loop. Introduce RecvError::Shutdown to cleanly separate dropped-flag exits from real IO errors, removing the defensive dropped check from the IO error path. Fixes: #634
Header::from_wire does not check that the length field is between 19 and 4096. This means recv_msg accepts oversized headers and attempts to allocate and read arbitrarily large message bodies. Undersized lengths would underflow the body size calculation. Add length bounds checks in recv_msg after header parsing. Out-of-range lengths produce a HeaderParseError that the FSM translates into a NOTIFICATION with Bad Message Length, per RFC 4271 section 4.1. Validation is intentionally in the connection layer rather than from_wire so the session can send the NOTIFICATION before closing.
rcgoodfellow
approved these changes
Feb 11, 2026
| direction: ConnectionDirection, | ||
| config: &SessionInfo, | ||
| ) -> Result<Self, Error> { | ||
| conn.set_nodelay(true)?; |
Contributor
Author
There was a problem hiding this comment.
Nagle came to mind when thinking through potential causes of latency and I noticed we didn't disable it. There isn't a specific issue motivating this, just that in modern deployments there's not much point in leaving it enabled.
Contributor
Author
There was a problem hiding this comment.
Reverting for this PR. Will look at re-adding after R18 is cut
This reverts commit 70732ff.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Couple important fixes for R18 plus a couple mgadm quality of life fixes.
Important fixes:
ErrorKind::WouldBlockin recv_msg() to avoid tearing down TCP session when a read returnsEWOULDBLOCK/EAGAINfor a non-blocking socket (this fixes large update message regression #634) -- when moving the TCP socket to non-blocking, this read call was missed when adding error handling for WouldBlock.mgadm:
Other:
- set TCP_NODELAY on BGP socketsdetails for all changes are in the individual commit messages