Description
When running openclaw gateway behind NAT/cloud firewall, Telegram long-polling periodically stalls and cannot recover without a full process restart. The gateway correctly detects the stall ("no getUpdates response for 90s") and attempts to restart polling, but the restart reuses the same undici HTTP dispatcher, which still holds dead keep-alive connections in its pool.
Environment
- openclaw 2026.3.13
- Node.js 22 (uses undici as default HTTP client)
- VPS behind cloud NAT (idle TCP connections silently dropped after ~5-10 min)
- Telegram channel with getUpdates long-polling (30s timeout)
Steps to Reproduce
- Run openclaw gateway with Telegram channel on a VPS behind NAT/firewall
- Wait for an idle period where no Telegram messages arrive for 5-10+ minutes
- NAT silently drops the idle TCP connection
- Next getUpdates request hangs indefinitely on the dead connection
- Gateway detects stall after 90s and restarts polling
- Restart reuses the same undici dispatcher → new requests go through the same dead connection pool
- Gateway enters infinite failure loop: Network request for 'sendChatAction' failed! / Network request for 'getUpdates' failed!
Log Pattern
[WARN] No getUpdates response for 90 seconds. Restarting polling...
[ERROR] Network request for 'getUpdates' failed!
[ERROR] Network request for 'sendChatAction' failed!
[ERROR] Network request for 'getUpdates' failed!
... (repeats indefinitely until process kill)
Root Cause Analysis
- Node.js 22's undici maintains a keep-alive connection pool per dispatcher
- When NAT drops the underlying TCP connection, undici doesn't detect it (no SO_KEEPALIVE, no application-level ping)
- polling-session.ts restarts polling but reuses the same dispatcher instance, so new requests are routed through the same pool of dead connections
- Discord doesn't have this problem because it uses WebSocket with heartbeat frames, which detect dead connections and trigger automatic reconnection
Suggested Fix
When restarting Telegram polling after a stall, create a new undici dispatcher (or explicitly close/drain the old one) so the connection pool is clean:
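// In polling-session.ts restart logic:
if (this.dispatcher) {
  // destroy() aborts in-flight requests and drops the pool immediately;
  // close() waits for pending requests and could block on the hung getUpdates.
  await this.dispatcher.destroy();
}
this.dispatcher = new undici.Agent({ /* fresh pool */ });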
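A slightly fuller sketch of the swap follows. It is an illustration under assumptions, not openclaw's actual code: `PollingTransport`, `DispatcherLike`, and the injected `createDispatcher` factory are hypothetical names. In the gateway the factory would presumably return a new undici `Agent`; here it is injected so the swap logic can be exercised without the network. The old dispatcher is torn down with `destroy()` rather than `close()`, since `close()` waits for in-flight requests and the hung getUpdates call may never finish.

```typescript
// Minimal shape needed from a dispatcher for this sketch.
interface DispatcherLike {
  destroy(): Promise<void>;
}

// Hypothetical holder for the dispatcher used by Telegram requests.
class PollingTransport {
  private readonly createDispatcher: () => DispatcherLike;
  private current: DispatcherLike;

  constructor(createDispatcher: () => DispatcherLike) {
    this.createDispatcher = createDispatcher;
    this.current = createDispatcher();
  }

  get dispatcher(): DispatcherLike {
    return this.current;
  }

  // Call from the stall handler before polling is restarted.
  resetDispatcher(): DispatcherLike {
    const stale = this.current;
    this.current = this.createDispatcher();
    // Tear down in the background so a wedged socket cannot delay the restart.
    stale.destroy().catch(() => {
      /* the old pool is being discarded; teardown errors are not actionable */
    });
    return this.current;
  }
}
```

Requests issued after `resetDispatcher()` would need to be given the new instance (for example through the `dispatcher` option that undici's `fetch` accepts), otherwise they keep using the stale pool.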
Alternatively, configure the undici pool with shorter keepAliveTimeout or keepAliveMaxTimeout to proactively evict idle connections before NAT drops them.
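As a rough sketch of that alternative, the helper below derives keep-alive limits from an assumed NAT idle timeout. `keepAliveTimeout` and `keepAliveMaxTimeout` are undici client options (in milliseconds); the helper name `keepAliveOptionsFor`, the `safetyFactor` parameter, and the 5-minute figure (the lower bound from the Environment section) are assumptions for illustration.

```typescript
interface KeepAliveOptions {
  keepAliveTimeout: number;    // idle timeout when the server sends no keep-alive hint (ms)
  keepAliveMaxTimeout: number; // upper bound applied to a server-provided hint (ms)
}

// Hypothetical helper: keep idle sockets for only a fraction of the NAT idle window.
function keepAliveOptionsFor(natIdleTimeoutMs: number, safetyFactor = 0.5): KeepAliveOptions {
  if (!(natIdleTimeoutMs > 0)) {
    throw new RangeError("natIdleTimeoutMs must be positive");
  }
  if (!(safetyFactor > 0 && safetyFactor < 1)) {
    throw new RangeError("safetyFactor must be between 0 and 1");
  }
  const cap = Math.floor(natIdleTimeoutMs * safetyFactor);
  return { keepAliveTimeout: cap, keepAliveMaxTimeout: cap };
}

// Assuming the NAT drops idle connections after 5 minutes:
const opts = keepAliveOptionsFor(5 * 60 * 1000);
console.log(opts); // { keepAliveTimeout: 150000, keepAliveMaxTimeout: 150000 }

// The result could then be passed to the dispatcher, e.g. new Agent(opts).
```

This only evicts idle sockets earlier. It does not help a request that is already in flight on a dropped connection, so it complements the dispatcher reset rather than replacing it.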
Current Workaround
Adding retry config to the Telegram channel helps (as mentioned in #7526), but it's a band-aid — the connection pool still holds dead connections, and each retry attempt may hit the same dead socket before undici eventually opens a new one.
"retry": {
"attempts": 5,
"minDelayMs": 1000,
"maxDelayMs": 10000,
"jitter": 0.3
}
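To get a feel for how short the retry window is, the sketch below computes the delays this config could produce. It assumes exponential backoff that doubles from minDelayMs, is capped at maxDelayMs, and applies jitter as a plus/minus fraction of each delay; the gateway's real schedule may differ, and `backoffDelays` is a hypothetical helper, not an openclaw function.

```typescript
interface RetryConfig {
  attempts: number;
  minDelayMs: number;
  maxDelayMs: number;
  jitter: number; // fraction of the delay, e.g. 0.3 means up to +/-30%
}

// Delays between consecutive attempts (attempts - 1 entries), under the assumed scheme.
function backoffDelays(cfg: RetryConfig, random: () => number = Math.random): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < cfg.attempts - 1; retry++) {
    const base = Math.min(cfg.maxDelayMs, cfg.minDelayMs * 2 ** retry);
    const factor = 1 + cfg.jitter * (2 * random() - 1);
    delays.push(Math.round(base * factor));
  }
  return delays;
}

const config: RetryConfig = { attempts: 5, minDelayMs: 1000, maxDelayMs: 10000, jitter: 0.3 };
// random() fixed at 0.5 gives the nominal (jitter-free) schedule.
const nominal = backoffDelays(config, () => 0.5);
const totalMs = nominal.reduce((sum, d) => sum + d, 0);
console.log(nominal, totalMs); // [ 1000, 2000, 4000, 8000 ] 15000
```

Under these assumptions all five attempts fall within roughly 15 seconds of waiting, so if each one is routed through the same pool, the whole budget can be spent on dead sockets.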
A process-level watchdog that force-restarts the gateway is another workaround, but neither addresses the root cause.