fix(feishu): reconcile WebSocket reconnect backoff#73998
fix(feishu): reconcile WebSocket reconnect backoff#73998openclaw-clownfish[bot] wants to merge 1 commit intomainfrom
Conversation
Greptile SummaryThis PR replaces the Feishu WebSocket client's static Confidence Score: 4/5Safe to merge with one behavior change to verify: PingTimeout: 3 from the old static wsConfig is no longer set anywhere. All changes are P2. The only noteworthy gap is that the old PingTimeout: 3 constructor-level guard is dropped and not replicated — if the Lark SDK default is much larger, dead-connection detection may be slower. The rest of the implementation (backoff reset, error-triggered retry, ClientConfig sanitization) is logically sound and well-tested. extensions/feishu/src/client.ts — confirm whether PingTimeout needs to be set explicitly on the WSClient constructor alongside the new httpInstance approach. Prompt To Fix All With AIThis is a comment left during a code review.
Path: extensions/feishu/src/client.ts
Line: 18-23
Comment:
**`PingTimeout` guard no longer applied**
The old `FEISHU_WS_CONFIG` included `PingTimeout: 3`, passed as a constructor-level `wsConfig` override that limited how long the SDK would wait for a pong before treating the connection as dead. The new `FEISHU_WS_CLIENT_CONFIG_DEFAULTS` and `sanitizeFeishuWsEndpointResponse` only sanitize server-returned fields (`PingInterval`, `ReconnectCount`, `ReconnectInterval`, `ReconnectNonce`). `PingTimeout` is a static SDK constructor option and does not come from the API response, so it is silently dropped. If the SDK's default `PingTimeout` is much larger than 3 s, a dead connection may stall for longer before the heartbeat timeout triggers — which was presumably the original motivation for the 3 s override.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: extensions/feishu/src/monitor.transport.ts
Line: 163-175
Comment:
**`handleAbort` redundantly calls `removeEventListener` when registered with `{ once: true }`**
The listener is added with `{ once: true }` (line 171), which auto-removes it on first fire. `handleAbort` then also calls `removeEventListener` manually (line 165), making that call a no-op. Both branches of cleanup (`handleAbort` itself and `removeAbortListener` in `.finally()`) attempt the same removal. The code is correct but the manual call inside `handleAbort` is dead. Consider removing it to avoid confusion about which mechanism actually deregisters the listener.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(feishu): reconcile WebSocket reconne..." | Re-trigger Greptile |
| const FEISHU_WS_CLIENT_CONFIG_DEFAULTS = { | ||
| PingInterval: 30, | ||
| PingTimeout: 3, | ||
| ReconnectCount: -1, | ||
| ReconnectInterval: 120, | ||
| ReconnectNonce: 30, | ||
| } as const; |
There was a problem hiding this comment.
PingTimeout guard no longer applied
The old FEISHU_WS_CONFIG included PingTimeout: 3, passed as a constructor-level wsConfig override that limited how long the SDK would wait for a pong before treating the connection as dead. The new FEISHU_WS_CLIENT_CONFIG_DEFAULTS and sanitizeFeishuWsEndpointResponse only sanitize server-returned fields (PingInterval, ReconnectCount, ReconnectInterval, ReconnectNonce). PingTimeout is a static SDK constructor option and does not come from the API response, so it is silently dropped. If the SDK's default PingTimeout is much larger than 3 s, a dead connection may stall for longer before the heartbeat timeout triggers — which was presumably the original motivation for the 3 s override.
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/feishu/src/client.ts
Line: 18-23
Comment:
**`PingTimeout` guard no longer applied**
The old `FEISHU_WS_CONFIG` included `PingTimeout: 3`, passed as a constructor-level `wsConfig` override that limited how long the SDK would wait for a pong before treating the connection as dead. The new `FEISHU_WS_CLIENT_CONFIG_DEFAULTS` and `sanitizeFeishuWsEndpointResponse` only sanitize server-returned fields (`PingInterval`, `ReconnectCount`, `ReconnectInterval`, `ReconnectNonce`). `PingTimeout` is a static SDK constructor option and does not come from the API response, so it is silently dropped. If the SDK's default `PingTimeout` is much larger than 3 s, a dead connection may stall for longer before the heartbeat timeout triggers — which was presumably the original motivation for the 3 s override.
How can I resolve this? If you propose a fix, please make it concise.| const abortResult = new Promise<FeishuWsStopReason>((resolve) => { | ||
| const handleAbort = () => { | ||
| abortSignal.removeEventListener("abort", handleAbort); | ||
| resolve(); | ||
| resolve({ status: "abort" }); | ||
| }; | ||
| removeAbortListener = () => { | ||
| abortSignal.removeEventListener("abort", handleAbort); | ||
| }; | ||
| abortSignal.addEventListener("abort", handleAbort, { once: true }); | ||
| if (abortSignal.aborted) { | ||
| handleAbort(); | ||
| } | ||
| }); |
There was a problem hiding this comment.
handleAbort redundantly calls removeEventListener when registered with { once: true }
The listener is added with { once: true } (line 171), which auto-removes it on first fire. handleAbort then also calls removeEventListener manually (line 165), making that call a no-op. Both branches of cleanup (handleAbort itself and removeAbortListener in .finally()) attempt the same removal. The code is correct but the manual call inside handleAbort is dead. Consider removing it to avoid confusion about which mechanism actually deregisters the listener.
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/feishu/src/monitor.transport.ts
Line: 163-175
Comment:
**`handleAbort` redundantly calls `removeEventListener` when registered with `{ once: true }`**
The listener is added with `{ once: true }` (line 171), which auto-removes it on first fire. `handleAbort` then also calls `removeEventListener` manually (line 165), making that call a no-op. Both branches of cleanup (`handleAbort` itself and `removeAbortListener` in `.finally()`) attempt the same removal. The code is correct but the manual call inside `handleAbort` is dead. Consider removing it to avoid confusion about which mechanism actually deregisters the listener.
How can I resolve this? If you propose a fix, please make it concise.|
Codex review: needs changes before merge. Summary Reproducibility: yes. Mock the WS endpoint response as Next step before merge Security Review findings
Review detailsBest possible solution: Keep the Feishu-owned SDK lifecycle/httpInstance composition, preserve the current-main supervisor cleanup contract, and make missing or non-OK WS endpoint config deterministically reach a finite retry path with focused tests. Do we have a high-confidence way to reproduce the issue? Yes. Mock the WS endpoint response as Is this the best way to solve the issue? No. The direction is maintainable, but the current branch is incomplete; the safer fix is a rebased Feishu-local recovery policy that handles malformed endpoint responses while carrying forward current-main identity-preserving cleanup. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against f7549079ceb3. |
Summary
Repair the remaining Feishu WebSocket reconnect/backoff and PingInterval guard path for #55532 without process-wide Lark.WSClient prototype mutation. Keep the patch narrow to the Feishu client/monitor surfaces, building on current main after #72411.
Credit
Carries forward @sirfengyu's investigation from #55619 and the closed ProjectClownfish replacement attempt in #73945. Preserve @alex-xuweilong's heartbeat attribution from #45674 if that constructor path is touched.
Review blockers to address
Validation
Fixes #55532. Related to #42354. Security-routed refs #46472, #68865, and #72411 are not modified by this PR.
ProjectClownfish replacement details: