fix(feishu): WebSocket connection not recovered after network disruption

## Bug Description

When the Feishu WebSocket connection drops due to a network disruption, the connection is never re-established. The gateway silently loses the ability to receive Feishu messages until it is manually restarted with `openclaw gateway restart`.

## Steps to Reproduce

1. Start OpenClaw with the Feishu channel enabled (`connectionMode: "websocket"`)
2. Wait for the WebSocket to connect (`[ws] ws client ready`)
3. Cause a network interruption (e.g., router restart, ISP hiccup) lasting ~10 minutes
4. Observe gateway logs:

```
[ws] unable to connect to the server after trying 1 times
[ws] unable to connect to the server after trying 2 times
...
[ws] unable to connect to the server after trying 7 times  ← ECONNRESET
```

5. After network recovers, Feishu WebSocket **does not reconnect**
6. Messages sent to the bot are silently dropped

## Expected Behavior

After the Lark SDK exhausts its internal retries, OpenClaw should implement a supervisor loop that periodically attempts to re-establish the WebSocket connection with exponential backoff, similar to how Slack and Telegram channels handle reconnection.

## Root Cause Analysis

In `extensions/feishu/src/monitor.transport.ts` (lines 84-127), `monitorWebSocket()` calls `wsClient.start()` exactly once, inside a Promise that resolves only when the abort signal fires. There is:

- No reconnection loop after SDK retry exhaustion
- No stall/disconnect detection
- No backoff or supervisor logic
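The missing stall detection boils down to a timer that is armed while a connection is expected to be live, pushed back on every sign of activity, and fires a callback if nothing arrives in time. The sketch below is a hypothetical minimal version (`createStallWatchdog`, `arm`, `touch`, and `disarm` are illustrative names); it is not OpenClaw's `createArmableStallWatchdog`, whose actual signature is not shown in this issue.

```typescript
// Hypothetical "armable" stall watchdog. NOT OpenClaw's createArmableStallWatchdog;
// the names and shape here are illustrative only.
type StallWatchdog = {
  arm(): void; // start (or restart) the countdown
  touch(): void; // record activity; restarts the countdown only if armed
  disarm(): void; // stop the countdown without firing
  readonly armed: boolean;
};

function createStallWatchdog(timeoutMs: number, onStall: () => void): StallWatchdog {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const clear = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
  };

  const start = (): void => {
    clear();
    timer = setTimeout(() => {
      timer = undefined; // a fired watchdog is no longer armed
      onStall();
    }, timeoutMs);
  };

  return {
    arm: start,
    touch() {
      if (timer !== undefined) start();
    },
    disarm: clear,
    get armed() {
      return timer !== undefined;
    },
  };
}
```

In a Feishu monitor one would arm it once the client starts, call `touch()` on each observable sign of life, and treat `onStall` as a disconnect that hands control back to a reconnect loop. Which signals are actually observable (inbound events, ping/pong frames) depends on what the Lark SDK exposes and is an open question here; relying on inbound messages alone would misfire on a bot that simply receives no traffic.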

**Comparison with other channels:**

| Channel | Reconnection | Backoff | Stall Detection |
|---------|-------------|---------|-----------------|
| **Slack** | Explicit while loop with `SLACK_SOCKET_RECONNECT_POLICY` | Exponential (2s→30s, 1.8x, 25% jitter) | No |
| **Telegram** | grammY runner (`maxRetryTime: 60min`) + explicit loop | Exponential (2s→30s, 1.8x, 25% jitter) | Yes (90s watchdog) |
| **Discord** | Full lifecycle controller with `createArmableStallWatchdog` | Exponential via `computeBackoff` | Yes (5min reconnect stall) |
| **Feishu** | **None** — delegates entirely to Lark SDK (7 retries, then dead) | None | None |
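The backoff column quotes the same parameters for Slack and Telegram. As a hypothetical sketch, assuming "25% jitter" means a random offset of up to ±25% of the capped base delay (the table does not say which jitter scheme is used), a Feishu policy with the same numbers could be computed as follows. `BackoffPolicy`, `FEISHU_RECONNECT_POLICY`, and `computeReconnectDelay` are illustrative names, not existing OpenClaw exports, and the field names are not taken from `SLACK_SOCKET_RECONNECT_POLICY`.

```typescript
// Hypothetical reconnect policy mirroring the numbers quoted for Slack/Telegram.
interface BackoffPolicy {
  initialMs: number; // delay before the first retry
  maxMs: number; // cap on the base (pre-jitter) delay
  factor: number; // geometric growth per attempt
  jitter: number; // fraction of the base delay, e.g. 0.25 = ±25%
}

const FEISHU_RECONNECT_POLICY: BackoffPolicy = {
  initialMs: 2_000,
  maxMs: 30_000,
  factor: 1.8,
  jitter: 0.25,
};

/**
 * Delay in ms before reconnect attempt `attempt` (1-based).
 * The base delay grows by `factor` per attempt and is capped at `maxMs`;
 * jitter then shifts it by a random offset in [-jitter, +jitter) of the base.
 */
function computeReconnectDelay(
  policy: BackoffPolicy,
  attempt: number,
  random: () => number = Math.random,
): number {
  const exponent = Math.max(0, attempt - 1);
  const base = Math.min(policy.maxMs, policy.initialMs * policy.factor ** exponent);
  const spread = base * policy.jitter;
  // random() is in [0, 1), so (random() * 2 - 1) is in [-1, 1).
  const offset = (random() * 2 - 1) * spread;
  return Math.max(0, Math.round(base + offset));
}
```

With a mid-point random value (`() => 0.5`, i.e. zero offset) the sequence is 2000, 3600, 6480, 11664, 20995 ms, then 30000 ms for every later attempt. Because jitter is applied after the cap, delays at the cap can reach up to 37.5 s under this interpretation; clamping after jitter is an alternative if 30 s must be a hard ceiling. Injecting `random` keeps the function deterministic in tests.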

## Proposed Fix

Wrap the existing `monitorWebSocket()` in a supervisor loop that:

1. Catches connection failures after Lark SDK retry exhaustion
2. Recreates the WSClient and retries with exponential backoff
3. Reuses OpenClaw's existing `createArmableStallWatchdog` for disconnect detection
4. Follows the Slack reconnection pattern (`while (!aborted) { try/catch + backoff }`)
5. Fails fast on non-recoverable errors (invalid credentials, app disabled)
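A minimal sketch of the proposed `while (!aborted) { try/catch + backoff }` supervisor, under stated assumptions: `connectOnce` is a hypothetical adapter that would create a fresh `WSClient`, start it, and settle only once that connection is gone (or the signal aborts). How it learns that the Lark SDK has given up (an error callback, a stall watchdog firing) is left open, since that SDK surface is not established in this issue. `superviseConnection`, `NonRecoverableError`, and `healthyResetMs` are illustrative names, not existing OpenClaw code.

```typescript
// Hypothetical adapter: start a fresh client, settle when the connection is lost.
type ConnectOnce = (signal: AbortSignal) => Promise<void>;

// Thrown by connectOnce for errors retrying cannot fix (invalid credentials, app disabled).
class NonRecoverableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NonRecoverableError";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on ES5 targets
  }
}

interface SupervisorOptions {
  connectOnce: ConnectOnce;
  signal: AbortSignal;
  delayForAttempt: (attempt: number) => number; // any exponential backoff function
  healthyResetMs?: number; // a connection that lived this long resets the backoff
  log?: (message: string) => void;
}

// Resolves after `ms`, or as soon as the signal aborts.
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve) => {
    if (signal.aborted) return resolve();
    const timer = setTimeout(done, ms);
    function done(): void {
      clearTimeout(timer);
      signal.removeEventListener("abort", done);
      resolve();
    }
    signal.addEventListener("abort", done, { once: true });
  });
}

async function superviseConnection(opts: SupervisorOptions): Promise<void> {
  const { connectOnce, signal, delayForAttempt } = opts;
  const healthyResetMs = opts.healthyResetMs ?? 60_000;
  const log = opts.log ?? (() => {});
  let attempt = 0;

  while (!signal.aborted) {
    const startedAt = Date.now();
    try {
      await connectOnce(signal);
      if (!signal.aborted) log("connection closed; will reconnect");
    } catch (err) {
      // Fail fast: retrying cannot fix bad credentials or a disabled app.
      if (err instanceof NonRecoverableError) throw err;
      log(`connection failed: ${err instanceof Error ? err.message : String(err)}`);
    }
    if (signal.aborted) break;

    // Reset the backoff if the previous connection stayed up long enough.
    if (Date.now() - startedAt >= healthyResetMs) attempt = 0;
    attempt += 1;
    const delay = delayForAttempt(attempt);
    log(`reconnecting in ${delay}ms (attempt ${attempt})`);
    await sleep(delay, signal); // returns early on abort, ending the loop
  }
}
```

`delayForAttempt` can be any exponential backoff function, for example one using the 2s→30s, 1.8x, 25% jitter policy listed in the comparison table. Treating a clean return from `connectOnce` as a disconnect (rather than success) matters here: without it, a connection that dies quietly would end the supervisor just as the current code does.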

## Environment

- OpenClaw: 2026.3.13
- macOS (Mac Mini M4), Feishu channel via WebSocket
- `@larksuiteoapi/node-sdk: ^1.59.0`
- Home network (occasional ISP disruptions)
