We've noticed an issue where a CPU gets pegged per-peer in bgp_recv.
This does not reproduce everywhere. Some systems in the lab do not exhibit this behavior.
Reading through the code I've noticed that we set up our TCP socket differently depending on whether we have initiated the connection or not.
When we set up the BGP listener, se set up the socket as non-blocking
|
impl BgpListener<BgpConnectionTcp> for BgpListenerTcp { |
|
fn bind<A: ToSocketAddrs>( |
|
addr: A, |
|
unnumbered_manager: Option<Arc<dyn UnnumberedManager>>, |
|
) -> Result<Self, Error> |
|
where |
|
Self: Sized, |
|
{ |
|
let addr = addr |
|
.to_socket_addrs() |
|
.map_err(|e| Error::InvalidAddress(e.to_string()))? |
|
.next() |
|
.ok_or(Error::InvalidAddress( |
|
"at least one address required".into(), |
|
))?; |
|
let listener = TcpListener::bind(addr)?; |
|
listener.set_nonblocking(true)?; |
|
Ok(Self { |
|
listener, |
|
unnumbered_manager, |
|
}) |
|
} |
If we accept a connection on this socket, it will continue to be non-blocking for the lifetime of the session. That overrides the set_read_timeout(IO_TIMEOUT) that we do in spawn_recv_loop.
On the other hand if we are the initiator of the connection, non-blocking is not set
|
// Establish the connection (THIS IS THE BLOCKING CALL) |
|
let sa: socket2::SockAddr = peer.into(); |
|
let new_conn: TcpStream = match s.connect_timeout(&sa, timeout) { |
|
Ok(()) => s.into(), |
|
Err(e) => { |
|
connection_log_lite!(log, |
|
warn, |
|
"connection attempt to {peer} failed: {e}"; |
|
"direction" => ConnectionDirection::Outbound, |
|
"peer" => format!("{peer}"), |
|
"error" => format!("{e}") |
|
); |
|
return; |
|
} |
|
}; |
In the recv_header function we effectively have a hot spin-loop in the WouldBlock case
|
loop { |
|
if dropped.load(Ordering::Relaxed) { |
|
return Err(RecvError::Shutdown); |
|
} |
|
match stream.read(&mut buf[i..]) { |
|
Ok(0) => { |
|
return Err(RecvError::Io(std::io::Error::new( |
|
std::io::ErrorKind::UnexpectedEof, |
|
"peer closed connection", |
|
))); |
|
} |
|
Ok(n) => i += n, |
|
Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => { |
|
// This condition happens due to the read timeout that |
|
// is set on the TcpStream object on connect being hit. |
|
// This is a normal condition and we just jump back to |
|
// the beginning of the loop, check the shutdown flag |
|
// and carry on reading if there is no shutdown. |
|
continue; |
|
} |
We should be setting non-blocking to false in spawn_recv_loop so that WouldBlock just happens on the IO timeout we intend. Something like this
diff --git a/bgp/src/connection_tcp.rs b/bgp/src/connection_tcp.rs
index 13edd95..99d7c78 100644
--- a/bgp/src/connection_tcp.rs
+++ b/bgp/src/connection_tcp.rs
@@ -599,6 +599,20 @@ impl BgpConnectionTcp {
.spawn(move || {
let mut conn = conn;
+ if let Err(e) = conn.set_nonblocking(false) {
+ connection_log_lite!(log,
+ error,
+ "failed to set connection to blocking for {peer} (conn_id: {}): {e}",
+ conn_id.short();
+ "direction" => direction,
+ "connection" => format!("{conn:?}"),
+ "connection_peer" => format!("{peer}"),
+ "connection_id" => conn_id.short(),
+ "error" => format!("{e}")
+ );
+ return;
+ }
+
if !timeout.is_zero()
&& let Err(e) = conn.set_read_timeout(Some(timeout))
{
We've noticed an issue where a CPU gets pegged per-peer in
bgp_recv.This does not reproduce everywhere. Some systems in the lab do not exhibit this behavior.
Reading through the code I've noticed that we set up our TCP socket differently depending on whether we have initiated the connection or not.
When we set up the BGP listener, se set up the socket as non-blocking
maghemite/bgp/src/connection_tcp.rs
Lines 103 to 124 in 0e2fea5
If we accept a connection on this socket, it will continue to be non-blocking for the lifetime of the session. That overrides the
set_read_timeout(IO_TIMEOUT)that we do inspawn_recv_loop.On the other hand if we are the initiator of the connection, non-blocking is not set
maghemite/bgp/src/connection_tcp.rs
Lines 315 to 329 in 0e2fea5
In the
recv_headerfunction we effectively have a hot spin-loop in theWouldBlockcasemaghemite/bgp/src/connection_tcp.rs
Lines 732 to 751 in 0e2fea5
We should be setting non-blocking to false in
spawn_recv_loopso thatWouldBlockjust happens on the IO timeout we intend. Something like this