fix(feishu): reconcile WebSocket reconnect backoff#73945
fix(feishu): reconcile WebSocket reconnect backoff#73945openclaw-clownfish[bot] wants to merge 2 commits intomainfrom
Conversation
|
Closing this PR because the author has more than 10 active PRs in this repo. Please reduce the active PR queue and reopen or resubmit once it is back under the limit. You can close your own PRs to get back under the limit. |
Greptile SummaryThis PR replaces the previous global Confidence Score: 5/5Safe to merge; no new P0/P1 findings beyond those already in open review threads. All identified concerns are P2 or were already raised in prior review threads. The core logic — managed client wrapper, separated backoff counters, reset after successful start, per-instance PingInterval guard — is sound. raceWithTimeoutAndAbort propagates start() rejections correctly via Promise.race semantics, and waitForTerminalError only ever resolves, so status-handling paths are correct. extensions/feishu/src/client.ts (scheduleReconnectClose / double-close interaction) and extensions/feishu/src/monitor.transport.ts (recoveringFromDisconnect classification for factory-level failures) — both already tracked in open threads. Reviews (3): Last reviewed commit: "fix(feishu): reconcile WebSocket reconne..." | Re-trigger Greptile |
| const isReconnectFailure = | ||
| recoveringFromDisconnect || isFeishuWebSocketReconnectRequiredError(err); | ||
| if (isReconnectFailure) { | ||
| reconnectAttempt += 1; | ||
| recoveringFromDisconnect = true; |
There was a problem hiding this comment.
Start-failure during reconnect recovery uses full reconnect backoff
When recoveringFromDisconnect is true and createFeishuWSClient(account) itself throws (e.g., transient DNS failure, auth rejection), the failure is classified as a reconnect failure (isReconnectFailure = true) and uses getFeishuWsReconnectDelayMs (starting at 120 s) instead of the startup retry delay (starting at 1 s). If the intent is to stay aggressive only for SDK-initiated reconnects and apply normal start-retry delays for factory-level failures, consider checking isFeishuWebSocketReconnectRequiredError(err) solely rather than short-circuiting on recoveringFromDisconnect.
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/feishu/src/monitor.transport.ts
Line: 220-224
Comment:
**Start-failure during reconnect recovery uses full reconnect backoff**
When `recoveringFromDisconnect` is `true` and `createFeishuWSClient(account)` itself throws (e.g., transient DNS failure, auth rejection), the failure is classified as a reconnect failure (`isReconnectFailure = true`) and uses `getFeishuWsReconnectDelayMs` (starting at 120 s) instead of the startup retry delay (starting at 1 s). If the intent is to stay aggressive only for SDK-initiated reconnects and apply normal start-retry delays for factory-level failures, consider checking `isFeishuWebSocketReconnectRequiredError(err)` solely rather than short-circuiting on `recoveringFromDisconnect`.
How can I resolve this? If you propose a fix, please make it concise.|
Thanks for the context here. I swept through the related work, and this is now duplicate or superseded. Close as superseded. The same Feishu reconnect/backoff repair is now tracked by #73998, which explicitly carries forward this branch and its source work while addressing the review blockers; current main still lacks the complete repair, so the remaining work should continue on the newer PR rather than this older branch. So I’m closing this here and keeping the remaining discussion on the canonical linked item. Review detailsBest possible solution: Close this older PR and review the newer #73998 branch as the canonical Feishu reconnect/backoff repair, preserving the contributor credit from #55619 and the heartbeat attribution while keeping #55532 open until a replacement lands. Do we have a high-confidence way to reproduce the issue? Yes. Source-level reproduction is high-confidence: current main waits only for abort after Is this the best way to solve the issue? No. This branch is no longer the best fix path because #73998 supersedes it with a narrower replacement plan that explicitly carries forward this work and addresses this branch's residual review blockers. Security review: Security review cleared: The diff is limited to Feishu plugin runtime, tests, and changelog; it does not change dependencies, workflows, package resolution, permissions, or secret handling surfaces. What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 7969f1f07ccc. |
871970c to
bc33d7f
Compare
bc33d7f to
5a42828
Compare
Summary
Repair the Feishu WebSocket reconnect path from #55619 without process-wide Lark.WSClient prototype mutation. Rebase the contributor branch onto current main, verify the pinned Lark SDK start/onReady/onError/reConnect behavior, and keep the fix inside the Feishu app-layer monitor/client code.
Credit
Carries forward @sirfengyu's reconnect/backoff investigation from #55619. Preserve existing heartbeat attribution from #45674 if that path is touched.
Review blockers to address
Validation
Fixes #55532. Related to #42354.
ProjectClownfish replacement details:
! [remote rejected] HEAD -> pr-55546 (refusing to allow a GitHub App to create or update workflow
.github/workflows/auto-response.ymlwithoutworkflowspermission)error: failed to push some refs to 'https://github.com/sirfengyu/openclaw.git'