Commit 84acb74

Authored by ladygege, marshallm-create, and sliverp
fix(feishu): retry on send rate-limit errors (230020/230006) (#89659)

* fix(feishu): add retry with linear backoff for send rate-limit errors

  When Feishu returns code 230020 (per-chat rate limit), requestFeishuApi now retries up to 2 times with linear backoff (500ms, 1000ms). The reply path (im.message.reply) is also covered via the same retry helper.

  Confirmed by a real 20-concurrent-send stress test: all 20 messages succeed after retry.

  Closes #70879

* ci: retrigger CI

* fix(feishu): retry HTTP 429 and code 11232 for message send rate limits

  Feishu Open API has three send-time rate limit signals: HTTP 429 (gateway-wide quota), business code 11232 (tenant-level message service: 100/min, 5/sec), and 230020 (per-chat). Previously only 230020 was retried; HTTP 429 and 11232 propagated as fatal errors.

  - Add 11232 to FEISHU_SEND_RATE_LIMIT_CODES.
  - In getFeishuSendRateLimitCode, recognize HTTP 429 before reading the body code so gateway-level limits enter the retry loop.
  - Update doc comment listing both gateway and business sources.

* test(feishu): add focused retry coverage for 11232 and HTTP 429

  The previous send.retry.test.ts only exercised 230020 / 230006 / non-rate codes / plain errors. After expanding the retry policy in 90c7870 to cover code 11232 (tenant-level message rate limit) and gateway-level HTTP 429, ClawSweeper review #89659 (P2) flagged the tests as no longer matching the production behavior.

  - getFeishuSendRateLimitCode: assert 11232 returns 11232, HTTP 429 returns 429, and HTTP 429 wins over body code when both are present.
  - requestFeishuApi: cover 11232 retry-then-success, 429 retry-then-success, exhaustion paths for both, and a mixed 230020 → 11232 → ok recovery.

* fix(feishu): retry on fulfilled rate-limit response bodies (no-throw)

  The Feishu node SDK sometimes resolves a non-throwing response that carries a rate-limit code in its body (e.g. { code: 11232, msg: ... }) instead of rejecting. requestFeishuApi previously returned that body straight away and downstream assertFeishuMessageApiSuccess failed once with no retry — the same shape that issue #28157 fixed earlier on the typing/reaction path via getBackoffCodeFromResponse. ClawSweeper review on #89659 (P1, comment-shared.ts:140) flagged the gap.

  Mirror the typing-path pattern for the send helper:

  - Add getFeishuSendRateLimitCodeFromResponse to classify fulfilled bodies against FEISHU_SEND_RATE_LIMIT_CODES (230020, 11232).
  - In requestFeishuApi, after each fulfilled await, classify before returning. If the body is a retryable rate limit and there are attempts left, continue the loop. After exhaustion, wrap the last fulfilled body into a synthetic AxiosError-shaped error so callers see the same error shape as the throw path.
  - Add 11 focused tests covering fulfilled 11232/230020 retry-then-ok, exhaustion, mixed throw → fulfilled → ok recovery, and pass-through for code 0 / non-rate-limit codes.

* fix(feishu): break loop on final-attempt fulfilled rate-limit body

  ClawSweeper review on dc8d3be (P1, comment-shared.ts:166) caught a real bug: when the final retry attempt also fulfilled with a rate-limit body (e.g. { code: 11232, ... }), the guard `attempt < FEISHU_SEND_MAX_RETRIES` was false so control fell through to `return result` — bypassing the synthetic-error exhaustion path and handing the rate-limit body to the caller as if it were a successful response. The fulfilled-exhaustion test missed this because Vitest's local fs module cache served the pre-fix shape; running with a fresh cache reproduces the failure.

  Split the fulfilled-rate-limit branch so the body is always captured, then continue on a non-final attempt or break on the final attempt. Breaking falls through to the synthetic AxiosError-shaped throw below, which is exactly what the existing exhaustion test asserts.

* fix(feishu): retry on send rate-limit errors 230020/11232/429 (#89659) (thanks @ladygege)

---------

Co-authored-by: marshall.m <marshall.m@binance.com>
Co-authored-by: sliverp <870080352@qq.com>
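
As a rough illustration of the schedule the commit message describes (up to 2 retries, waiting 500ms and then 1000ms), the sketch below lists the wait before each retry and the worst-case added latency. `linearBackoffDelays` is a hypothetical helper that is not part of the commit; it only mirrors the `attempt * retryDelayMs` expression used in the diff.

```typescript
// Hypothetical helper, not part of the commit: lists the wait before each retry.
// Mirrors the `attempt * retryDelayMs` expression for attempts 1..maxRetries.
function linearBackoffDelays(maxRetries: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    delays.push(attempt * baseMs);
  }
  return delays;
}

// Values taken from the commit: 2 retries, 500 ms base delay.
const delays = linearBackoffDelays(2, 500);
// Total time spent sleeping if every attempt is rate-limited.
const worstCaseWaitMs = delays.reduce((sum, ms) => sum + ms, 0);
console.log(delays, worstCaseWaitMs);
```

With these values, a send that is rate-limited on every attempt spends 1500ms sleeping, in addition to the time of the three requests themselves, before the error surfaces.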
1 parent 9bbde70 commit 84acb74

6 files changed

Lines changed: 762 additions & 37 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ Docs: https://docs.openclaw.ai
 - Cron/update/service env: doctor config preflight now migrates legacy cron JSON stores into SQLite before runtime reads, service env planning skips unresolved placeholders that would mask state-dir `.env` values, and session transcript rewrites keep registry markers/discriminants consistent. (#90072, #90208, #90277, #90488) Thanks @MonkeyLeeT and @sallyom.
 - Security/config/tooling: guard MCP HTTP redirects, protect global agent config defaults, and keep release/test/tooling proof failures bounded and explicit. (#89732, #90145)
 - Channels: WhatsApp restarts when per-account config changes, bounds background startup waits, closes failed sockets, and preserves reconnect behavior; Mattermost slash commands keep their state on `globalThis`; Feishu streaming cards preserve full merged content; voice-call tracks Twilio streams after connect; ClickClack reply tools respect `toolsAllow`. (#87951, #87965, #90486, #68113, #90534, #90181, #90607, #89500) Thanks @MukundaKatta, @mcaxtr, @infoanton, @mushuiyu886, and @sahibzada-allahyar.
+- Feishu: retry transient send rate-limit errors (HTTP 429, per-chat code 230020, tenant-level code 11232) with linear backoff, including SDK responses that fulfill with rate-limit bodies instead of throwing, and route streaming-card sends through the retry wrapper. (#89659) Thanks @ladygege.
 - Release/CI/E2E: main CI guard drift, PR merge diff scoping, live Docker credential staging, base-image qualification, installer Docker classification, Playwright dependency install recovery, API-key auth for Codex live Docker lanes, Parallels option terminators, and JSON-mode progress handling are tighter so release proof fails cleaner. (#90532, #90287, #90058) Thanks @RomneyDa, @hxy91819, and @mrunalp.
 - Release/CI/E2E: Docker E2E and live Docker harness runs now apply default memory, CPU, and process ceilings while preserving explicit per-lane overrides.
 - Release/CI/E2E: plugin lifecycle matrix resource sampling now fails phases that exceed RSS, wall-clock, or CPU ceilings instead of only logging the measurements.

extensions/feishu/src/comment-shared.ts

Lines changed: 94 additions & 4 deletions
@@ -89,19 +89,109 @@ export function createFeishuApiError(
   return new Error(formatFeishuApiFailure(error, errorPrefix, options), { cause: error });
 }
 
+// Feishu message-API error codes that signal a transient rate limit; safe to retry with backoff.
+// 230020: per-chat rate limit (ext=chat rate limit) — confirmed by real concurrent load test.
+// 11232: tenant-level "create message service trigger rate limit" (100/min, 5/sec per app/bot).
+// Distinct from FEISHU_BACKOFF_CODES in typing.ts, which covers the reaction API (99991400+).
+const FEISHU_SEND_RATE_LIMIT_CODES = new Set([230020, 11232]);
+const FEISHU_SEND_MAX_RETRIES = 2;
+const FEISHU_SEND_RETRY_BASE_MS = 500;
+
+/**
+ * Returns a numeric rate-limit signal when an AxiosError indicates a retryable
+ * Feishu message-API rate limit. Sources, in priority order:
+ *   1. Gateway-level HTTP 429 (app-wide quota; `x-ogw-ratelimit-reset` header)
+ *   2. Business-level `code` in `error.response.data.code` matching
+ *      FEISHU_SEND_RATE_LIMIT_CODES (e.g. 230020 per-chat, 11232 tenant-level).
+ * Returns `undefined` for all other errors so they propagate without retry.
+ */
+export function getFeishuSendRateLimitCode(error: unknown): number | undefined {
+  if (!isRecord(error)) {
+    return undefined;
+  }
+  const response = isRecord(error.response) ? error.response : undefined;
+  // HTTP 429: Feishu Open API gateway-level rate limit, always retry.
+  if (typeof response?.status === "number" && response.status === 429) {
+    return 429;
+  }
+  const data = isRecord(response?.data) ? response.data : undefined;
+  const code = data?.code;
+  return typeof code === "number" && FEISHU_SEND_RATE_LIMIT_CODES.has(code) ? code : undefined;
+}
+
+/**
+ * Returns a retryable rate-limit code when a fulfilled (non-throwing) Feishu
+ * SDK response embeds it in the response body. The Feishu node SDK can resolve
+ * with `{ code: 11232, msg: "..." }` instead of throwing — see typing.ts
+ * (getBackoffCodeFromResponse) and issue #28157 for the same behavior on
+ * messageReaction.create. Without this classification, requestFeishuApi would
+ * `return` the rate-limited body and downstream `assertFeishuMessageApiSuccess`
+ * would fail once with no retry.
+ */
+export function getFeishuSendRateLimitCodeFromResponse(response: unknown): number | undefined {
+  if (!isRecord(response)) {
+    return undefined;
+  }
+  const code = (response as { code?: unknown }).code;
+  return typeof code === "number" && FEISHU_SEND_RATE_LIMIT_CODES.has(code) ? code : undefined;
+}
+
 export async function requestFeishuApi<T>(
   request: () => Promise<T>,
   errorPrefix: string,
   options: {
     includeConfigParams?: boolean;
     includeNestedErrorLogId?: boolean;
+    /** Base delay per retry attempt in ms; multiplied by attempt index. @internal */
+    retryDelayMs?: number;
   } = {},
 ): Promise<T> {
-  try {
-    return await request();
-  } catch (error) {
-    throw createFeishuApiError(error, errorPrefix, options);
+  const retryDelayMs = options.retryDelayMs ?? FEISHU_SEND_RETRY_BASE_MS;
+  let lastFulfilledRateLimit: { response: unknown; code: number } | undefined;
+  for (let attempt = 0; attempt <= FEISHU_SEND_MAX_RETRIES; attempt++) {
+    if (attempt > 0) {
+      // Linear backoff: delay grows with each attempt to give the rate-limit window time to reset.
+      await new Promise<void>((resolve) => {
+        setTimeout(resolve, attempt * retryDelayMs);
+      });
+    }
+    try {
+      const result = await request();
+      // Feishu SDK may fulfill with a rate-limit body (e.g. { code: 11232, ... })
+      // instead of throwing. Classify before returning so retry covers both shapes.
+      const fulfilledRateLimit = getFeishuSendRateLimitCodeFromResponse(result);
+      if (fulfilledRateLimit !== undefined) {
+        // Capture for the synthetic-error path below; on a non-final attempt
+        // continue retrying, on the final attempt fall through so the loop
+        // exits and the wrapped exhaustion error is thrown.
+        lastFulfilledRateLimit = { response: result, code: fulfilledRateLimit };
+        if (attempt < FEISHU_SEND_MAX_RETRIES) {
+          continue;
+        }
+        break;
+      }
+      return result;
+    } catch (error) {
+      const isRetryable =
+        attempt < FEISHU_SEND_MAX_RETRIES && getFeishuSendRateLimitCode(error) !== undefined;
+      if (!isRetryable) {
+        throw createFeishuApiError(error, errorPrefix, options);
+      }
+      // Rate-limit on a non-final attempt — loop continues to next retry.
+    }
+  }
+  // Exhausted retries while the SDK kept fulfilling rate-limit bodies. Surface
+  // the last response as an error so callers see the same wrapped shape they
+  // would have seen if the SDK had thrown.
+  if (lastFulfilledRateLimit) {
+    const synthetic = Object.assign(
+      new Error(`Request fulfilled with rate-limit code ${lastFulfilledRateLimit.code}`),
+      { response: { status: 200, data: lastFulfilledRateLimit.response } },
+    );
+    throw createFeishuApiError(synthetic, errorPrefix, options);
   }
+  // Unreachable: every iteration either returns or throws. Required for TypeScript exhaustiveness.
+  throw createFeishuApiError(new Error("unreachable"), errorPrefix, options);
 }
 
 type ParsedCommentDocumentRef = {
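
The two classifiers in this diff differ mainly in where they look for the code: the thrown path reads `error.response.status` and then `error.response.data.code`, while the fulfilled path reads `code` at the top level of the resolved body. The standalone sketch below illustrates that difference under stated assumptions. `isRecord` is a local stand-in for the helper the diff calls but does not show, `classifyThrown` and `classifyFulfilled` are hypothetical names for simplified copies of the two functions, and the sample inputs are invented.

```typescript
// Standalone sketch, not the repository code. `isRecord` is an assumed
// plain-object check; the diff uses a helper of that name without showing it.
const SEND_RATE_LIMIT_CODES = new Set([230020, 11232]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Thrown shape: HTTP 429 is checked before the business code in the body.
function classifyThrown(error: unknown): number | undefined {
  if (!isRecord(error)) return undefined;
  const rawResponse = error.response;
  const response = isRecord(rawResponse) ? rawResponse : undefined;
  if (response?.status === 429) return 429;
  const rawData = response?.data;
  const data = isRecord(rawData) ? rawData : undefined;
  const code = data?.code;
  return typeof code === "number" && SEND_RATE_LIMIT_CODES.has(code) ? code : undefined;
}

// Fulfilled shape: the code sits at the top level of the resolved body.
function classifyFulfilled(body: unknown): number | undefined {
  if (!isRecord(body)) return undefined;
  const code = body.code;
  return typeof code === "number" && SEND_RATE_LIMIT_CODES.has(code) ? code : undefined;
}

// Invented sample inputs covering both shapes.
const results = {
  gatewayWins: classifyThrown({ response: { status: 429, data: { code: 230020 } } }),
  perChat: classifyThrown({ response: { status: 400, data: { code: 230020 } } }),
  plainError: classifyThrown(new Error("boom")),
  fulfilledTenant: classifyFulfilled({ code: 11232, msg: "rate limited" }),
  fulfilledOk: classifyFulfilled({ code: 0, data: {} }),
};
console.log(results);
```

In this sketch `gatewayWins` is 429 because the status check runs before the body code is read, `perChat` is 230020, `fulfilledTenant` is 11232, and both `plainError` and `fulfilledOk` are `undefined`, so neither would enter a retry loop built on these classifiers.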
