[Bug]: Feishu WebSocket connection does not recover after transient token refresh failure #68766

@jw8957

Description

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

When the Feishu tenant_access_token refresh fails due to a transient timeout (e.g., open.feishu.cn responds slowly during off-peak hours), the Feishu WebSocket connection drops and does not automatically recover.

The current reconnection logic attempts only one retry after the initial failure. If that retry also fails (likely while the upstream issue persists), the plugin gives up and stops receiving Feishu events. The connection stays dead until the gateway is manually restarted.

This means a single transient API hiccup can cause hours of silent message loss with no visible error to the user.

Steps to reproduce

  1. Configure Feishu plugin with connectionMode: "websocket".
  2. Wait for a transient tenant_access_token timeout (or simulate by temporarily blocking open.feishu.cn for ~60s).
  3. Observe gateway logs:
      2026-04-19T01:19:40 [error]: AxiosError: timeout of 30000ms exceeded
         url: 'https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal'
         code: 'ECONNABORTED'
      2026-04-19T01:20:39 [info]: [ '[ws]', 'reconnect' ]
      2026-04-19T01:21:40 [error]: AxiosError: timeout of 30000ms exceeded
         url: 'https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal'
         code: 'ECONNABORTED'
      # No further reconnection attempts after this. Next Feishu event only after manual gateway restart at 09:48.
    
  4. After the transient issue resolves, no further reconnection attempts are made, so Feishu events are never received again.
  5. openclaw doctor still reports Feishu: ok (does not detect the dead ws connection).
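
The gap in step 5 is the difference between "configured" and "alive". As a rough illustration only, a liveness check could look at when the socket last received any frame. All names below (`WsStatus`, `checkWsHealth`, `maxIdleMs`) are hypothetical and not taken from the openclaw doctor code, and the 2-minute idle threshold is an assumption that presumes the connection carries periodic heartbeat frames.

```typescript
// Sketch only: hypothetical types and names, not the real openclaw doctor API.
type WsHealth = "ok" | "stale" | "disconnected";

interface WsStatus {
  connected: boolean;      // whether the socket is currently open
  lastActivityMs: number;  // epoch ms of the last frame received (event or heartbeat)
}

/**
 * Classifies a WebSocket connection as ok, stale, or disconnected.
 * `maxIdleMs` is an assumed threshold; it should exceed the heartbeat interval.
 */
function checkWsHealth(
  status: WsStatus,
  nowMs: number,
  maxIdleMs: number = 120_000,
): WsHealth {
  if (!status.connected) return "disconnected";
  // An open socket that has been silent for too long is treated as dead.
  return nowMs - status.lastActivityMs > maxIdleMs ? "stale" : "ok";
}
```

With a check like this, the state in the logs above (socket closed, no reconnect pending) would be reported as `disconnected` rather than `ok`.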

Expected behavior

  • The Feishu WebSocket plugin should implement exponential backoff with persistent retries (e.g., 1s → 2s → 4s → ... → max 5min, retrying indefinitely until reconnected).
  • After successful reconnection, the plugin should log a clear [feishu] reconnected message.
  • Optionally: openclaw doctor could check whether the Feishu ws connection is actually alive (not just configured).
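
To make the first two points concrete, here is a minimal sketch of the requested retry behavior. It is not the plugin's actual code: `backoffDelayMs`, `reconnectWithBackoff`, and the `connect` callback are hypothetical names, and `connect` stands in for whatever the plugin does today to refresh the token and reopen the socket. The 1s base and 5min cap come from the example above.

```typescript
// Sketch only: all names here are hypothetical and not taken from the
// OpenClaw plugin or the Feishu SDK.
const BASE_DELAY_MS = 1_000;      // first retry after 1s
const MAX_DELAY_MS = 5 * 60_000;  // cap at 5 minutes

/** Delay before retry number `attempt` (0-based): 1s, 2s, 4s, ... capped at 5min. */
function backoffDelayMs(attempt: number): number {
  // Clamp the exponent so the intermediate value stays small during long outages.
  const exp = Math.min(attempt, 30);
  return Math.min(BASE_DELAY_MS * 2 ** exp, MAX_DELAY_MS);
}

type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Calls `connect` until it succeeds, sleeping backoffDelayMs(attempt) after each failure.
 * Never gives up; resolves with the number of failed attempts before success.
 */
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  sleep: Sleep = realSleep,
  log: (msg: string) => void = console.log,
): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      log(`[feishu] reconnected after ${attempt} failed attempt(s)`);
      return attempt;
    } catch (err) {
      const delay = backoffDelayMs(attempt);
      log(`[feishu] reconnect failed (${String(err)}); retrying in ${delay}ms`);
      await sleep(delay);
    }
  }
}
```

The `sleep` and `log` parameters are injectable only so the loop can be tested without real timers. A production version would probably also add random jitter to each delay so that many gateways do not retry in lockstep.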

Actual behavior

The plugin stops reconnecting after a single failed retry. In our case, the token refresh timed out at 01:19 and the connection was not restored until a manual gateway restart at 09:48, roughly 8.5 hours of silent message loss.

OpenClaw version

2026.4.15

Operating system

Ubuntu 20.04

Install method

No response

Model

opus-4-6

Provider / routing chain

openclaw -> anthropic-API

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

Labels

bug (Something isn't working), bug:behavior (Incorrect behavior without a crash)