Bug Description
When the Feishu WebSocket connection drops due to a network disruption, the connection is never re-established. The gateway silently loses the ability to receive Feishu messages until a manual openclaw gateway restart.
Steps to Reproduce
- Start OpenClaw with the Feishu channel enabled (connectionMode: "websocket")
- Wait for the WebSocket to connect ([ws] ws client ready)
- Cause a network interruption (e.g., router restart, ISP hiccup) lasting ~10 minutes
- Observe gateway logs:
[ws] unable to connect to the server after trying 1 times
[ws] unable to connect to the server after trying 2 times
...
[ws] unable to connect to the server after trying 7 times ← ECONNRESET
- After network recovers, Feishu WebSocket does not reconnect
- Messages sent to the bot are silently dropped
Expected Behavior
After the Lark SDK exhausts its internal retries, OpenClaw should implement a supervisor loop that periodically attempts to re-establish the WebSocket connection with exponential backoff, similar to how Slack and Telegram channels handle reconnection.
Root Cause Analysis
In extensions/feishu/src/monitor.transport.ts (lines 84-127), monitorWebSocket() calls wsClient.start() once inside a Promise that resolves only when the abort signal fires. There is:
- No reconnection loop after SDK retry exhaustion
- No stall/disconnect detection
- No backoff or supervisor logic
Comparison with other channels:
| Channel | Reconnection | Backoff | Stall Detection |
| --- | --- | --- | --- |
| Slack | Explicit while loop with SLACK_SOCKET_RECONNECT_POLICY | Exponential (2s→30s, 1.8x, 25% jitter) | No |
| Telegram | grammY runner (maxRetryTime: 60min) + explicit loop | Exponential (2s→30s, 1.8x, 25% jitter) | Yes (90s watchdog) |
| Discord | Full lifecycle controller with createArmableStallWatchdog | Exponential via computeBackoff | Yes (5min reconnect stall) |
| Feishu | None: delegates entirely to the Lark SDK (7 retries, then dead) | None | None |
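For reference, the backoff parameters listed for Slack and Telegram (2s initial, 30s cap, 1.8x growth, 25% jitter) could be computed roughly as follows. This is only a sketch: BackoffPolicy, SKETCH_POLICY, and sketchBackoffMs are hypothetical names, not OpenClaw's actual computeBackoff or SLACK_SOCKET_RECONNECT_POLICY, and applying the jitter symmetrically (plus or minus 25%) is an assumption.

```typescript
// Hypothetical policy shape; the field names are assumptions, not OpenClaw's.
interface BackoffPolicy {
  initialMs: number; // delay before the first retry
  maxMs: number;     // upper bound on any delay
  factor: number;    // multiplier applied per attempt
  jitter: number;    // fraction of the delay, e.g. 0.25 for +/-25%
}

// Values taken from the comparison table (2s -> 30s, 1.8x, 25% jitter).
const SKETCH_POLICY: BackoffPolicy = {
  initialMs: 2000,
  maxMs: 30000,
  factor: 1.8,
  jitter: 0.25,
};

// attempt is 1-based. random is injectable so the result can be made deterministic.
function sketchBackoffMs(
  policy: BackoffPolicy,
  attempt: number,
  random: () => number = Math.random,
): number {
  const exponent = Math.max(0, attempt - 1);
  const base = Math.min(policy.maxMs, policy.initialMs * Math.pow(policy.factor, exponent));
  const spread = base * policy.jitter;
  // random() in [0, 1) maps to an offset in [-spread, +spread).
  const jittered = base + (random() * 2 - 1) * spread;
  // Clamp after jitter so the delay never exceeds the cap or goes negative.
  return Math.round(Math.min(policy.maxMs, Math.max(0, jittered)));
}
```

Ignoring jitter, this yields delays of roughly 2s, 3.6s, 6.5s, 11.7s, and 21s before settling at the 30s cap, so a 10-minute outage would be retried about every 30 seconds once the cap is reached.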
Proposed Fix
Wrap the existing monitorWebSocket() in a supervisor loop that:
- Catches connection failures after Lark SDK retry exhaustion
- Recreates the WSClient and retries with exponential backoff
- Reuses OpenClaw's existing createArmableStallWatchdog for disconnect detection
- Follows the Slack reconnection pattern (while (!aborted) { try/catch + backoff })
- Fails fast on non-recoverable errors (invalid credentials, app disabled)
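The loop described above might look something like the following sketch. It is deliberately independent of the Lark SDK and of OpenClaw internals: runOnce stands in for "create a fresh WSClient and run monitorWebSocket() until the connection ends", and isFatal, delayMs, superviseConnection, and abortableSleep are all hypothetical names introduced for illustration. Stall detection via createArmableStallWatchdog is not shown.

```typescript
// Hypothetical: one connection attempt. Assumed to reject when the SDK gives up
// and to resolve when the connection ends or the signal aborts.
type RunOnce = (signal: AbortSignal) => Promise<void>;

interface SupervisorOptions {
  runOnce: RunOnce;
  isFatal: (err: unknown) => boolean;   // e.g. invalid credentials, app disabled
  delayMs: (attempt: number) => number; // backoff schedule, 1-based attempt
  signal: AbortSignal;
  sleep?: (ms: number, signal: AbortSignal) => Promise<void>;
}

// Resolves after ms, or immediately once the signal aborts.
function abortableSleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve) => {
    if (signal.aborted) {
      resolve();
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      resolve();
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Returns the number of attempts made; rethrows fatal errors to fail fast.
async function superviseConnection(opts: SupervisorOptions): Promise<number> {
  const sleep = opts.sleep ?? abortableSleep;
  let attempt = 0;
  while (!opts.signal.aborted) {
    attempt += 1;
    try {
      await opts.runOnce(opts.signal);
      // A clean return without abort is treated as a dropped connection.
    } catch (err) {
      if (opts.isFatal(err)) throw err; // non-recoverable: stop retrying
    }
    if (opts.signal.aborted) break;
    await sleep(opts.delayMs(attempt), opts.signal);
  }
  return attempt;
}
```

A real implementation would probably also reset the attempt counter after a connection has stayed healthy for some time, so that a later outage starts again from the shortest delay. That detail is omitted here to keep the sketch short.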
Environment
- OpenClaw: 2026.3.13
- macOS (Mac Mini M4), Feishu channel via WebSocket
- @larksuiteoapi/node-sdk: ^1.59.0
- Home network (occasional ISP disruptions)