Skip to content

CPU pegged in bgp_recv for each peer #657

@rcgoodfellow

Description

@rcgoodfellow

We've noticed an issue where a CPU gets pegged per-peer in bgp_recv.

This does not reproduce everywhere. Some systems in the lab do not exhibit this behavior.

Reading through the code I've noticed that we set up our TCP socket differently depending on whether we have initiated the connection or not.

When we set up the BGP listener, se set up the socket as non-blocking

impl BgpListener<BgpConnectionTcp> for BgpListenerTcp {
fn bind<A: ToSocketAddrs>(
addr: A,
unnumbered_manager: Option<Arc<dyn UnnumberedManager>>,
) -> Result<Self, Error>
where
Self: Sized,
{
let addr = addr
.to_socket_addrs()
.map_err(|e| Error::InvalidAddress(e.to_string()))?
.next()
.ok_or(Error::InvalidAddress(
"at least one address required".into(),
))?;
let listener = TcpListener::bind(addr)?;
listener.set_nonblocking(true)?;
Ok(Self {
listener,
unnumbered_manager,
})
}

If we accept a connection on this socket, it will continue to be non-blocking for the lifetime of the session. That overrides the set_read_timeout(IO_TIMEOUT) that we do in spawn_recv_loop.

On the other hand if we are the initiator of the connection, non-blocking is not set

// Establish the connection (THIS IS THE BLOCKING CALL)
let sa: socket2::SockAddr = peer.into();
let new_conn: TcpStream = match s.connect_timeout(&sa, timeout) {
Ok(()) => s.into(),
Err(e) => {
connection_log_lite!(log,
warn,
"connection attempt to {peer} failed: {e}";
"direction" => ConnectionDirection::Outbound,
"peer" => format!("{peer}"),
"error" => format!("{e}")
);
return;
}
};

In the recv_header function we effectively have a hot spin-loop in the WouldBlock case

loop {
if dropped.load(Ordering::Relaxed) {
return Err(RecvError::Shutdown);
}
match stream.read(&mut buf[i..]) {
Ok(0) => {
return Err(RecvError::Io(std::io::Error::new(
std::io::ErrorKind::UnexpectedEof,
"peer closed connection",
)));
}
Ok(n) => i += n,
Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => {
// This condition happens due to the read timeout that
// is set on the TcpStream object on connect being hit.
// This is a normal condition and we just jump back to
// the beginning of the loop, check the shutdown flag
// and carry on reading if there is no shutdown.
continue;
}

We should be setting non-blocking to false in spawn_recv_loop so that WouldBlock just happens on the IO timeout we intend. Something like this

diff --git a/bgp/src/connection_tcp.rs b/bgp/src/connection_tcp.rs
index 13edd95..99d7c78 100644
--- a/bgp/src/connection_tcp.rs
+++ b/bgp/src/connection_tcp.rs
@@ -599,6 +599,20 @@ impl BgpConnectionTcp {
             .spawn(move || {
                 let mut conn = conn;

+                if let Err(e) = conn.set_nonblocking(false) {
+                    connection_log_lite!(log,
+                        error,
+                        "failed to set connection to blocking for {peer} (conn_id: {}): {e}",
+                        conn_id.short();
+                        "direction" => direction,
+                        "connection" => format!("{conn:?}"),
+                        "connection_peer" => format!("{peer}"),
+                        "connection_id" => conn_id.short(),
+                        "error" => format!("{e}")
+                    );
+                    return;
+                }
+
                 if !timeout.is_zero()
                     && let Err(e) = conn.set_read_timeout(Some(timeout))
                 {

Metadata

Metadata

Assignees

Labels

BugbgpBorder Gateway ProtocolcustomerFor any bug reports or feature requests tied to customer requests

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions