Problem
When using TCP or TLS transport (-nt tcp or -nt tls), heplify-proxy fails to reconnect to heplify-server after the server restarts. The proxy continues attempting to send data through a broken connection, resulting in continuous "broken pipe" errors without establishing a new connection.
Environment
- Transport: TLS (
-nt tls) - also reproduces with TCP
- SendRetries: 0 (default)
- Scenario: heplify-server restart while heplify-proxy is running
Expected Behavior
After heplify-server restarts, heplify-proxy should detect the broken connection and automatically establish a new connection to resume data transmission.
Actual Behavior
Heplify-proxy gets stuck in a loop sending data to the same broken connection, never establishing a new one.
Error Logs
heplify-proxy | 2025/07/17 13:17:51.270814 hep.go:186: ERR Failed to send message: error writing to socket during sending message: write tcp 172.20.0.7:53862->172.20.0.3:9060: write: broken pipe
heplify-proxy | 2025/07/17 13:17:55.271236 hep.go:186: ERR Failed to send message: error writing to socket during sending message: write tcp 172.20.0.7:53862->172.20.0.3:9060: write: broken pipe
Key observation: The same connection (172.20.0.7:53862->172.20.0.3:9060) appears repeatedly in logs, indicating no reconnection is occurring.
Root Cause
When a write error occurs, the connection object remains in a broken state but is not cleared. The connection checking logic if h.client[n].conn == nil in the Send function fails because the connection object exists but is broken, causing the same broken connection to be reused indefinitely instead of triggering the reconnection logic.
Testing
We tested locally by adding connection cleanup immediately after write errors in the Send function:
_, err = writeAndFlush(&h.client[n], msg, "sending message")
if err != nil {
logp.Err("Failed to send message: %s", err.Error())
// Force connection cleanup on write errors
h.client[n].conn.Close()
h.client[n].conn = nil
h.client[n].writer = nil
}
With this change, the reconnection issue no longer occurs and the proxy successfully establishes new connections after server restarts.
This forces the connection object to nil and allows the existing reconnection logic to work properly.
Reproduction Steps
- Start heplify-proxy with TCP/TLS transport
- Start heplify-server
- Restart heplify-server while proxy is running
- Observe continuous "broken pipe" errors with same connection
Problem
When using TCP or TLS transport (
-nt tcpor-nt tls), heplify-proxy fails to reconnect to heplify-server after the server restarts. The proxy continues attempting to send data through a broken connection, resulting in continuous "broken pipe" errors without establishing a new connection.Environment
-nt tls) - also reproduces with TCPExpected Behavior
After heplify-server restarts, heplify-proxy should detect the broken connection and automatically establish a new connection to resume data transmission.
Actual Behavior
Heplify-proxy gets stuck in a loop sending data to the same broken connection, never establishing a new one.
Error Logs
heplify-proxy | 2025/07/17 13:17:51.270814 hep.go:186: ERR Failed to send message: error writing to socket during sending message: write tcp 172.20.0.7:53862->172.20.0.3:9060: write: broken pipe
heplify-proxy | 2025/07/17 13:17:55.271236 hep.go:186: ERR Failed to send message: error writing to socket during sending message: write tcp 172.20.0.7:53862->172.20.0.3:9060: write: broken pipe
Key observation: The same connection (
172.20.0.7:53862->172.20.0.3:9060) appears repeatedly in logs, indicating no reconnection is occurring.Root Cause
When a write error occurs, the connection object remains in a broken state but is not cleared. The connection checking logic
if h.client[n].conn == nilin theSendfunction fails because the connection object exists but is broken, causing the same broken connection to be reused indefinitely instead of triggering the reconnection logic.Testing
We tested locally by adding connection cleanup immediately after write errors in the
Sendfunction:With this change, the reconnection issue no longer occurs and the proxy successfully establishes new connections after server restarts.
This forces the connection object to nil and allows the existing reconnection logic to work properly.
Reproduction Steps