Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
When the Feishu tenant_access_token refresh fails due to a transient timeout (e.g., open.feishu.cn responds slowly during off-peak hours), the Feishu WebSocket connection drops and does not automatically recover.
The current reconnection logic attempts only one retry after the initial failure. If that retry also fails (which is likely when the upstream issue is still ongoing), the plugin gives up entirely and stops receiving Feishu events. The connection remains dead until the gateway is manually restarted.
This means a single transient API hiccup can cause hours of silent message loss with no visible error to the user.
Steps to reproduce
- Configure the Feishu plugin with connectionMode: "websocket".
- Wait for a transient tenant_access_token timeout (or simulate one by temporarily blocking open.feishu.cn for ~60s).
- Observe gateway logs:
2026-04-19T01:19:40 [error]: AxiosError: timeout of 30000ms exceeded
url: 'https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal'
code: 'ECONNABORTED'
2026-04-19T01:20:39 [info]: [ '[ws]', 'reconnect' ]
2026-04-19T01:21:40 [error]: AxiosError: timeout of 30000ms exceeded
url: 'https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal'
code: 'ECONNABORTED'
# No further reconnection attempts after this. Next Feishu event only after manual gateway restart at 09:48.
- After the transient issue resolves, Feishu events are still not received; no further reconnection attempts are made.
- openclaw doctor still reports Feishu: ok (it does not detect the dead ws connection).
Expected behavior
- The Feishu WebSocket plugin should implement exponential backoff with persistent retries (e.g., 1s → 2s → 4s → ... → max 5min, retrying indefinitely until reconnected).
- After successful reconnection, the plugin should log a clear [feishu] reconnected message.
- Optionally: openclaw doctor could check whether the Feishu ws connection is actually alive (not just configured).
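The backoff behavior requested above could look roughly like the following. This is only a minimal sketch, not the plugin's actual code: `backoffDelayMs`, `reconnectForever`, and the `connect` callback are hypothetical names, and the 1s base and 5min cap are taken from the example values in this report. The `connect` callback is assumed to perform the token refresh and reopen the WebSocket, rejecting on failure.

```typescript
// Hypothetical helpers for illustration; not part of OpenClaw or the Feishu SDK.

/** Delay before retry `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 300000): number {
  // Clamp the exponent so a very long outage cannot overflow the multiplication.
  const exp = Math.min(Math.max(attempt, 0), 30);
  return Math.min(baseMs * 2 ** exp, maxMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Calls `connect` until it resolves, waiting with exponential backoff
 * between failures. Never gives up; returns the number of failed attempts
 * that preceded the successful one.
 */
async function reconnectForever(
  connect: () => Promise<void>,
  log: (msg: string) => void = console.log,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      log("[feishu] reconnected");
      return attempt;
    } catch (err) {
      const delay = backoffDelayMs(attempt);
      log(`[feishu] reconnect attempt ${attempt + 1} failed (${String(err)}), retrying in ${delay}ms`);
      await wait(delay);
    }
  }
}
```

With these defaults the delays run 1s, 2s, 4s, and so on, reaching the 5min cap at the tenth attempt. A real implementation would probably also add random jitter so that multiple gateways do not retry in lockstep.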
Actual behavior
In our case, the token refresh timed out at 01:19 AM. The connection was not restored until a manual gateway restart at 09:48 AM, which meant roughly 8.5 hours of silent message loss.
OpenClaw version
2026.4.15
Operating system
Ubuntu 20.04
Install method
No response
Model
opus-4-6
Provider / routing chain
openclaw -> anthropic-API
Additional provider/model setup details
No response
Logs, screenshots, and evidence
Impact and severity
No response
Additional information
No response