Problem
When the Feishu WebSocket connection suffers a keepalive ping timeout, the SDK's message loop exits, but the main Hermes agent process does not terminate or reconnect. This leaves the Gateway in a zombie state where it appears "running" to systemd but accepts no messages.
Log output:
[Lark] [ERROR] receive message loop exit, err: sent 1011 (internal error) keepalive ping timeout; no close frame received
[Lark] [WARNING] ping failed, err: sent 1011 (internal error) keepalive ping timeout
Expected behavior: Under a crash-only architecture, the feishu.py integration thread should raise a SystemExit(1) so systemd-level Restart=always can respawn a healthy stack.
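One caveat when sketching this: in CPython, a SystemExit raised inside a non-main thread only ends that thread and is silently ignored by the threading machinery, so it does not stop the process. A minimal sketch of one way around this, assuming the integration thread can be handed a shared object, is to let the worker flag the failure and have the main thread raise SystemExit(1). The names FatalSignal and exit_when_tripped are hypothetical and not part of Hermes or the Feishu SDK.

```python
import sys
import threading


class FatalSignal:
    """Lets a worker thread ask the main thread to exit (hypothetical helper)."""

    def __init__(self):
        self._event = threading.Event()
        self.reason = None

    def trip(self, reason):
        # Called from the worker thread when it sees an unrecoverable error.
        self.reason = reason
        self._event.set()

    def wait(self, timeout=None):
        return self._event.wait(timeout)


def exit_when_tripped(fatal):
    """Block the main thread until a worker trips the signal, then exit 1."""
    fatal.wait()
    print(f"fatal: {fatal.reason}", file=sys.stderr)
    # Raised in the main thread, so the interpreter exits with status 1
    # and a supervisor such as systemd sees a failed process.
    raise SystemExit(1)
```

If the main thread cannot be made to wait like this, calling os._exit(1) from the worker is a blunter alternative that skips cleanup handlers, which is usually acceptable under a crash-only design.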
Reference
Upstream: NousResearch#10616
Design (from OpenClaw)
OpenClaw's monitor.ts has a complete lifecycle management system:
- monitorSingleAccount() with abort signal support
- Health check probe via fetchBotIdentityForMonitor()
- State management in monitor.state.ts
- Webhook anomaly tracking
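A Python analogue of the health check probe might look like the sketch below. It assumes a coroutine function, here called fetch_identity, that returns the bot's identity or raises on failure. That name and the probe_identity helper are placeholders for illustration, not existing Hermes or SDK functions.

```python
import asyncio


async def probe_identity(fetch_identity, timeout=5.0):
    """Return True if `fetch_identity` yields a non-empty result in time.

    `fetch_identity` is a hypothetical zero-argument coroutine function
    standing in for whatever call retrieves the bot identity.
    """
    try:
        identity = await asyncio.wait_for(fetch_identity(), timeout)
    except Exception:
        # Timeouts and transport errors both count as an unhealthy probe.
        return False
    return bool(identity)
```

Treating a timeout and an error the same way keeps the caller's decision simple: a False result at startup means the connection should not be considered healthy.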
Implementation Plan
- Add a watchdog thread/task in FeishuAdapter.connect() that monitors the SDK's message loop
- Detect ping timeout conditions and raise SystemExit(1) to trigger systemd restart
- Add botIdentity pre-fetch to validate connection health on startup
- Reference: gateway/platforms/feishu.py line ~1042 FeishuAdapter class
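The watchdog from the plan above could be sketched roughly as follows, assuming the SDK's message loop is reachable as an asyncio task and that the adapter can call a hook whenever a message or pong arrives. Neither assumption is confirmed by the SDK. LoopWatchdog, beat, and on_fatal are hypothetical names.

```python
import asyncio
import time


class LoopWatchdog:
    """Reports when a watched loop exits or goes quiet (hypothetical helper)."""

    def __init__(self, stale_after, on_fatal, clock=time.monotonic):
        self._stale_after = stale_after  # seconds of silence tolerated
        self._on_fatal = on_fatal        # callback taking a reason string
        self._clock = clock
        self._last_seen = clock()

    def beat(self):
        """Call whenever the connection shows activity (message or pong)."""
        self._last_seen = self._clock()

    def is_stale(self):
        return self._clock() - self._last_seen > self._stale_after

    async def watch(self, loop_task, interval=1.0):
        """Poll until the loop task finishes or activity goes stale."""
        while True:
            if loop_task.done():
                self._on_fatal("message loop exited")
                return
            if self.is_stale():
                self._on_fatal("no activity within stale_after seconds")
                return
            await asyncio.sleep(interval)
```

In this sketch on_fatal is where the crash-only exit would happen, for example by logging the reason and then ending the process so that Restart=always can respawn it. Checking both conditions covers the case in the logs (the loop exits) as well as a connection that stays open but stops delivering anything.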