As an experienced Linux system administrator, I routinely rely on SSH to securely access both servers and infrastructure devices. However, one issue that occasionally disrupts my SSH sessions is the "broken pipe" error, which abruptly terminates the connection. In this comprehensive guide, I will leverage my expertise to demonstrate proven methods for avoiding this pesky error.
Understanding the SSH Architecture and Broken Pipes
To understand broken SSH pipes, we first need a brief overview of how SSH manages sessions under the hood using sockets and TCP:
Socket Pairs
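An SSH session runs over a single TCP connection, identified by one socket pair (client address and port, server address and port). The SSH transport layer encrypts and integrity-protects everything on that connection, and individual channels such as shells, port forwards, and file transfers are multiplexed inside it [[1]].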
TCP Streams
SSH writes its encrypted packets into this bidirectional TCP stream. Keepalive packets generate periodic traffic on the stream so that intermediate network devices like firewalls and NAT gateways do not expire the connection as idle.
Detecting Broken Pipes
If the TCP stream breaks without a proper SSH shutdown, the next write to the dead socket fails with EPIPE (and raises SIGPIPE unless that signal is ignored), which SSH reports as the "broken pipe" error [[2]]. The session then ends abruptly.
Why Pipes Break
Common root causes of broken SSH pipes include:
- Temporary network outages
- Severe latency or packet loss that exhausts TCP retransmissions
- Inactivity timeouts on stateful firewalls and NAT gateways
- Server-initiated session termination without client notification
Adjusting the ClientAlive Settings
The primary defense against broken SSH pipes is configuring keepalives so that intermediate network devices don't prematurely time out the session. On the server, this is controlled by two sshd_config parameters [[3]]:
ClientAliveInterval
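Sets the number of seconds of client silence after which sshd sends a keepalive message through the encrypted channel and waits for a reply. Lower values mean more frequent keepalives. The default of 0 disables them.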
ClientAliveCountMax
Sets how many consecutive keepalive messages may go unanswered before sshd terminates the session. The default is 3.
For example:
ClientAliveInterval 120
ClientAliveCountMax 12
With these settings, sshd sends a keepalive after every 120 seconds of client silence and disconnects an unresponsive client after 12 unanswered keepalives, roughly 24 minutes (12 * 120 sec = 1440 sec).
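The arithmetic above can be checked with a small helper. This is a minimal sketch: `clientalive_timeout` is a hypothetical function name, and it assumes the input consists of sshd_config-style "Keyword value" lines on stdin.

```shell
# Hypothetical helper: read sshd_config-style lines on stdin and print
# ClientAliveInterval * ClientAliveCountMax in seconds.
# Keywords are matched case-insensitively; commented-out lines are ignored
# because their first field starts with '#'.
clientalive_timeout() {
  awk '
    { key = tolower($1) }
    key == "clientaliveinterval" { interval = $2 }
    key == "clientalivecountmax" { count = $2 }
    END { print interval * count }
  '
}

# The example values from the text: 120 * 12
printf 'ClientAliveInterval 120\nClientAliveCountMax 12\n' | clientalive_timeout
```

On a server, the effective configuration printed by `sshd -T` (run as root) could be piped into the same function instead of the sample lines.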
Here are some recommended ClientAlive settings based on use case:
| Use Case | ClientAliveInterval | ClientAliveCountMax |
| --- | --- | --- |
| Interactive shell use | 180 sec | 3 |
| Background file transfers | 300 sec | 6 |
| Persistent VPN tunnel | 90 sec | 12 |
Adjusting these thresholds lets you match SSH's tolerance for network disruptions to the application.
Client-Side Keepalive Configuration
In addition to tuning the SSH server, keepalive behavior can also be configured in the client:
1. Per-Host SSH Config
Specify ServerAliveInterval for particular hosts in ~/.ssh/config:
Host tunnel-host
Hostname 1.2.3.4
ServerAliveInterval 180
2. SSH Command Line
The -o flag can set options like ServerAliveInterval on a one-off basis:
ssh -o ServerAliveInterval=180 user@host
3. System-Wide SSH Config
Global defaults defined in /etc/ssh/ssh_config will apply to all SSH sessions from the client unless explicitly overridden.
Client-side keepalives add another layer of protection regardless of server configuration. By default, ssh gives up after three unanswered probes (ServerAliveCountMax 3).
Verifying Alive Packets with tcpdump
To check if keepalives are being properly exchanged during an active SSH session, use the tcpdump utility to inspect packets on the wire.
The following will capture traffic sent to or from port 22 (SSH) and print the TCP payload in ASCII:
# tcpdump -i any -A 'port 22'
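SSH keepalives travel inside the encrypted channel, so their payload is not readable and -A shows only ciphertext. On an otherwise idle session, they appear as small packets sent at the configured interval, each answered by the peer.
If no such packets show up at regular intervals, the keepalive configuration is not taking effect.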
Inspecting SSH Session State
Besides verifying keepalives, we can also inspect detailed SSH connection state using ss or netstat:
# ss -neot '( dport = :ssh or sport = :ssh )'
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.1.20:56818 100.64.0.2:ssh
After a broken pipe, expect to see TIME-WAIT, CLOSE-WAIT, FIN-WAIT-1, or related states indicating that the connection is winding down.
Under normal operation, the state remains ESTAB (established) for as long as the stream is intact.
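When many sessions are open, a per-state tally is easier to read than the raw listing. The sketch below assumes ss-style rows on stdin with the state in the first column and no header line; `count_states` is a hypothetical helper name, and the sample rows are made up for illustration.

```shell
# Hypothetical helper: tally connections by TCP state (first column).
count_states() {
  awk '{ count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Illustrative sample rows; real input would come from ss.
printf '%s\n' \
  'ESTAB 0 0 192.168.1.20:56818 100.64.0.2:22' \
  'ESTAB 0 0 192.168.1.20:56820 100.64.0.3:22' \
  'CLOSE-WAIT 1 0 192.168.1.20:56822 100.64.0.4:22' | count_states
```

On a live system, the input could come from `ss -Htn '( dport = :ssh or sport = :ssh )'`, where -H suppresses the header row.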
Enable TCP Keepalives for Persistence
While SSH handles session keepalives itself, enabling TCP keepalive probes provides additional resilience:
# sysctl -w net.ipv4.tcp_keepalive_time=120
# sysctl -w net.ipv4.tcp_keepalive_intvl=30
# sysctl -w net.ipv4.tcp_keepalive_probes=6
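These settings tell the TCP stack to start probing after 120 seconds of idle time, send a probe every 30 seconds, and declare the connection dead after 6 unanswered probes, roughly 300 seconds in total (120 + 6 * 30) [[4]]. They apply only to sockets that enable SO_KEEPALIVE, which OpenSSH does by default through its TCPKeepAlive option.
TCP keepalives keep idle entries alive in NAT and firewall state tables and let the kernel detect a dead peer. They cannot rescue a connection whose route or NAT mapping has changed; in that case the probes go unanswered and the connection is closed.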
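The worst-case time for the kernel to detect a dead peer on an idle connection follows from the three sysctl values: the idle time before the first probe, plus the probe interval multiplied by the number of probes. A minimal sketch, with `tcp_keepalive_detection_time` as a hypothetical helper name:

```shell
# Hypothetical helper: seconds of idle time until the kernel declares an
# unresponsive peer dead.
# Arguments: tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes
tcp_keepalive_detection_time() {
  idle=$1
  intvl=$2
  probes=$3
  echo $((idle + intvl * probes))
}

# The values set by the sysctl commands above: 120 + 30 * 6
tcp_keepalive_detection_time 120 30 6
```

On Linux, the live values can be read from /proc/sys/net/ipv4/tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes and passed in as the three arguments.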
OS-Specific TCP Keepalive Configuration
Each major operating system exposes its TCP keepalive controls in a different place:
- Linux: /proc/sys/net/ipv4/tcp_keepalive_* (the same values that sysctl sets) [[5]]
- Windows: Registry keys like KeepAliveInterval [[6]]
- macOS: sysctl net.inet.tcp.* e.g. net.inet.tcp.always_keepalive [[7]]
Consult your OS documentation for details on advanced socket-level tuning.
Renegotiating SSH Sessions with Rekeying
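SSH periodically renegotiates its session keys. By default, OpenSSH rekeys after a cipher-dependent amount of data has been transferred (between 1 GB and 4 GB) and applies no time-based limit unless one is configured through RekeyLimit.
Rekeying runs over the existing connection, so it cannot resurrect a session whose TCP stream is already broken. What a lower threshold does do is force a key exchange, and therefore a round trip, more often during data transfer, which can surface a dead connection sooner: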
# ssh -oRekeyLimit=5M user@host
This lowers the rekey threshold to 5 MB, so a rekey happens more often.
However, excessive rekeying adds overhead, so tune the limit to your reliability needs.
The Risk of Undetectable Broken Pipes
While properly configured keepalives prevent most broken pipes, a network failure that outlasts ClientAliveInterval * ClientAliveCountMax still causes the server to drop the session, and the client may see no error at that moment.
The first sign is often a command that hangs or a transfer that stalls, because the broken pipe only surfaces on the next write.
Additionally, intermittent connectivity loss can trick TCP's error detection, abruptly breaking the underlying socket without notification [[8]].
So always architect SSH usage expecting potential unannounced disconnections, despite applying all keepalive best practices.
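One way to plan for unannounced disconnections is to wrap non-interactive SSH commands in a retry loop. The following is a minimal sketch, not a complete solution: `retry_with_backoff` is a hypothetical helper, and it assumes the wrapped command is safe to run again after a failure.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out,
# doubling the delay between attempts.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Success ends the loop immediately.
    "$@" && return 0
    # Give up once the attempt budget is spent.
    [ "$attempt" -ge "$max_attempts" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_with_backoff 5 2 ssh -o ServerAliveInterval=60 user@host uptime` would try the remote command up to five times, waiting 2, 4, 8, and 16 seconds between attempts. Note that ssh also returns the remote command's exit status, so a remote failure triggers a retry just as a dropped connection does.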
No keepalive setting can compensate for an unreliable path. The more reliable the network between client and server, the fewer disconnections remain to handle.
Conclusion
Broken pipes represent a lurking reliability threat, poised to sabotage SSH connectivity. Protect mission critical sessions by:
- Tuning ClientAliveInterval and ClientAliveCountMax appropriately
- Enabling multilayered TCP + SSH keepalives
- Designing infrastructure and software for intermittent failures
With vigilance and regular monitoring, the sneaky broken pipe phenomenon can be prevented most of the time. But also plan for handling the unexpected disconnections that will inevitably slip through.
By leveraging SSH's keepalive capabilities complemented by TCP-layer persistence, you can maximize remote access resilience and productivity.


