fix(feishu): re-throw rate-limit errors from typing indicator to trip circuit breaker by kollaikal-rupesh · Pull Request #28086 · openclaw/openclaw

kollaikal-rupesh · 2026-02-27T01:11:09Z

addTypingIndicator and removeTypingIndicator caught all errors silently, preventing the circuit breaker from tripping on Feishu API rate-limit / quota errors. This caused the keepalive loop to retry indefinitely (~4,800 failed calls/hour).

Fixes #28062

Summary

Problem: Feishu typing indicator catch block swallows all errors including HTTP 429 and quota error 99991403. The circuit breaker in typing-start-guard.ts relies on errors being thrown to count consecutive failures and trip — but it never sees them.
Why it matters: After quota is hit, generates ~76,000+ failed API calls in 8 hours. Worsens quota exhaustion and floods logs.
What changed: Rate-limit errors (HTTP 429 / Feishu code 99991403) are now re-thrown so the circuit breaker can trip and stop the keepalive loop. Other non-critical errors (e.g. message deleted) remain silently handled.
What did NOT change: No changes to the circuit breaker itself, the keepalive loop, or any other extension. Only the Feishu typing indicator error handling is affected.

Change Type (select all)

Bug fix

Scope (select all touched areas)

Integrations

Linked Issue/PR

Closes Feishu typing indicator ignores 429/quota errors, causing infinite retry loop #28062

User-visible / Behavior Changes

When Feishu API quota is exceeded, the typing indicator stops retrying after 2 consecutive failures (circuit breaker default) instead of retrying indefinitely.

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

Integration/channel: Feishu (Lark)

Steps

Exhaust Feishu API monthly quota (or simulate 429 response)
Send a message that triggers agent reply with typing indicator
Observe keepalive loop behavior

Expected

Circuit breaker trips after 2 consecutive failures, keepalive loop stops.

Actual (before fix)

Keepalive loop retries every 3 seconds indefinitely, generating ~4,800 failed calls/hour.

Evidence

Trace/log snippets

From the issue: 76,000+ total 429 errors accumulated in ~8 hours before manual intervention.

The fix ensures these errors propagate to the circuit breaker (typing-start-guard.ts line 41: consecutiveFailures += 1) which trips at maxConsecutiveFailures: 2 and calls keepaliveLoop.stop().

Human Verification (required)

Verified scenarios: Traced error flow from addTypingIndicator → typing-start-guard.run() → circuit breaker. Confirmed the guard expects thrown errors to count failures.
Edge cases checked: Non-rate-limit errors (e.g. message deleted, network timeout) still silently handled. removeTypingIndicator gets the same treatment.
What I did not verify: Live Feishu 429 reproduction (no Feishu API access).

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Failure Recovery (if this breaks)

Revert this single commit
The only behavioral change: rate-limit errors now throw instead of being swallowed

Risks and Mitigations

Risk: Some other caller of addTypingIndicator might not expect thrown errors.
- Mitigation: All callers go through createTypingCallbacks → typing-start-guard.run(), which already has a try-catch that handles thrown errors (counting failures + optional rethrow). This is the intended contract.

… circuit breaker The addTypingIndicator and removeTypingIndicator functions caught all errors silently, preventing the circuit breaker in typing-start-guard from detecting failures. When the Feishu API returns 429 or quota error 99991403, the keepalive loop retried indefinitely (~4,800 failed calls per hour). Re-throw rate-limit and quota errors so the circuit breaker can count consecutive failures and trip, stopping the keepalive loop. Other non-critical errors (e.g. message deleted) remain silently handled. Fixes openclaw#28062

kollaikal-rupesh · 2026-02-27T01:11:18Z

Hey @steipete @vincentkoc — fix for #28062 (Feishu typing indicator retries indefinitely on 429). Small, targeted change — only re-throws rate-limit errors so the existing circuit breaker can do its job. Would appreciate a review!

greptile-apps · 2026-02-27T01:14:42Z

Greptile Summary

This PR fixes a real and impactful bug: addTypingIndicator was silently swallowing all errors (including HTTP 429 / Feishu quota code 99991403), which prevented the typing-start-guard circuit breaker from ever seeing failures — causing the keepalive loop to retry indefinitely (~4,800 failed API calls/hour).

The fix is correct and well-targeted for addTypingIndicator: errors now propagate through startGuard.run() → consecutiveFailures counter → circuit breaker trip + keepaliveLoop.stop().

Key observations:

The addTypingIndicator re-throw correctly reaches the circuit breaker via startGuard.run() in src/channels/typing.ts.
The removeTypingIndicator re-throw is harmless but its comment ("so the circuit breaker can trip") is inaccurate — errors from the stop callback are caught by void stop().catch(...) and forwarded to onStopError/logTypingFailure, never touching the circuit breaker's failure counter.
The new isRateLimitError helper follows the existing Axios error shape used elsewhere in the Feishu extension (e.g. bot.ts) and the detection logic is sound.
No unit tests are included for the new isRateLimitError function; a colocated typing.test.ts would improve confidence and match repo testing conventions.

Confidence Score: 4/5

Safe to merge — the core fix for addTypingIndicator is correct, no other code paths are changed, and the only behavioral risk is a slightly misleading comment in removeTypingIndicator.
The fix correctly threads rate-limit errors from addTypingIndicator through startGuard.run() to the circuit breaker, directly solving the root cause. The isRateLimitError shape is consistent with how other Feishu SDK errors are handled in the codebase. The removeTypingIndicator re-throw is harmless. Minor deductions for a misleading comment and the absence of unit tests for the new helper.
No files require special attention beyond the inline notes on extensions/feishu/src/typing.ts.

_{Last reviewed commit: 077db0a}

greptile-apps

_{1 file reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-27T01:14:46Z

extensions/feishu/src/typing.ts

+    // Re-throw rate-limit / quota errors so the circuit breaker can trip (#28062).
+    if (isRateLimitError(err)) {
+      throw err;
+    }


Misleading comment — removeTypingIndicator re-throw does not reach the circuit breaker

The comment says "so the circuit breaker can trip", but the circuit breaker only counts failures from the start callback, which is wrapped in startGuard.run(). The stop callback is invoked as void stop().catch(...) in typing.ts (line 95 of src/channels/typing.ts):

void stop().catch((err) => (params.onStopError ?? params.onStartError)(err));

Any error thrown here is caught by .catch() and forwarded to onStopError / logTypingFailure — it never increments consecutiveFailures in the guard. The re-throw is harmless (rate-limit errors will still be logged via logTypingFailure instead of console.log), but the comment overstates what it achieves. Consider updating it to reflect the actual behavior, e.g.:

Suggested change

// Re-throw rate-limit / quota errors so the circuit breaker can trip (#28062).

if (isRateLimitError(err)) {

throw err;

}

// Re-throw rate-limit / quota errors so callers (onStopError) can observe them (#28062).

if (isRateLimitError(err)) {

throw err;

}

Prompt To Fix With AI

This is a comment left during a code review. Path: extensions/feishu/src/typing.ts Line: 82-85 Comment: **Misleading comment — `removeTypingIndicator` re-throw does not reach the circuit breaker** The comment says "so the circuit breaker can trip", but the circuit breaker only counts failures from the **`start`** callback, which is wrapped in `startGuard.run()`. The `stop` callback is invoked as `void stop().catch(...)` in `typing.ts` (line 95 of `src/channels/typing.ts`): ```ts void stop().catch((err) => (params.onStopError ?? params.onStartError)(err)); ``` Any error thrown here is caught by `.catch()` and forwarded to `onStopError` / `logTypingFailure` — it never increments `consecutiveFailures` in the guard. The re-throw is harmless (rate-limit errors will still be logged via `logTypingFailure` instead of `console.log`), but the comment overstates what it achieves. Consider updating it to reflect the actual behavior, e.g.: ```suggestion // Re-throw rate-limit / quota errors so callers (onStopError) can observe them (#28062). if (isRateLimitError(err)) { throw err; } ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-27T01:14:47Z

extensions/feishu/src/typing.ts

+function isRateLimitError(err: unknown): boolean {
+  if (typeof err !== "object" || err === null) {
+    return false;
+  }
+  const response = (err as Record<string, unknown>).response as Record<string, unknown> | undefined;
+  if (!response) {
+    return false;
+  }
+  if (response.status === 429) {
+    return true;
+  }
+  const data = response.data as Record<string, unknown> | undefined;
+  return data?.code === 99991403;
+}


Missing unit tests for isRateLimitError

The new isRateLimitError function has specific detection logic (HTTP 429 status and Feishu error code 99991403) that would benefit from a colocated typing.test.ts. Currently there is no test file for extensions/feishu/src/typing.ts at all.

Scenarios worth covering:

An Axios error with response.status === 429 → true

An Axios error with response.data.code === 99991403 → true

An Axios error with response.data.code === 99991672 (permission, not rate-limit) → false

A plain Error with no response property → false

null / non-object errors → false

This aligns with the repo's testing guidelines (Vitest, colocated *.test.ts, pnpm test before pushing logic changes).

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: extensions/feishu/src/typing.ts Line: 95-108 Comment: **Missing unit tests for `isRateLimitError`** The new `isRateLimitError` function has specific detection logic (HTTP `429` status and Feishu error code `99991403`) that would benefit from a colocated `typing.test.ts`. Currently there is no test file for `extensions/feishu/src/typing.ts` at all. Scenarios worth covering: - An Axios error with `response.status === 429` → `true` - An Axios error with `response.data.code === 99991403` → `true` - An Axios error with `response.data.code === 99991672` (permission, not rate-limit) → `false` - A plain `Error` with no `response` property → `false` - `null` / non-object errors → `false` This aligns with the repo's testing guidelines (Vitest, colocated `*.test.ts`, `pnpm test` before pushing logic changes). <sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub> How can I resolve this? If you propose a fix, please make it concise.

- Updated removeTypingIndicator comment to accurately reflect that re-thrown errors reach onStopError, not the circuit breaker directly. - Added unit tests for isRateLimitError covering HTTP 429, Feishu quota code 99991403, non-rate-limit codes, and edge cases. - Exported isRateLimitError for testability.

kollaikal-rupesh · 2026-02-27T03:03:35Z

Addressed both review comments in 1e3e6f1:

Misleading comment — Updated the removeTypingIndicator comment to accurately reflect that re-thrown errors reach onStopError, not the circuit breaker directly. Good catch.
Missing tests — Added typing.test.ts with 6 tests for isRateLimitError:
- HTTP 429 → true
- Feishu quota code 99991403 → true
- Non-rate-limit code (99991672) → false
- Plain Error with no response → false
- null → false
- Non-object values → false

All tests pass. Also exported isRateLimitError for testability.

Takhoffman · 2026-02-28T00:42:26Z

PR #28086 is superseded by commit 32ee2f0:
32ee2f0

Closing as superseded to keep one canonical fix path.

kollaikal-rupesh mentioned this pull request Feb 27, 2026

Feishu typing indicator ignores 429/quota errors, causing infinite retry loop #28062

Closed

openclaw-barnacle bot added channel: feishu Channel integration: feishu size: XS labels Feb 27, 2026

greptile-apps bot reviewed Feb 27, 2026

View reviewed changes

guoqunabc mentioned this pull request Feb 27, 2026

fix(feishu): propagate rate-limit errors from typing indicator to circuit breaker #28157

Closed

openclaw-barnacle bot added size: S and removed size: XS labels Feb 27, 2026

guoqunabc mentioned this pull request Feb 27, 2026

fix(feishu): break infinite typing-indicator retry loop on rate-limit / quota errors #28494

Merged

Takhoffman added the close:superseded PR close reason label Feb 28, 2026

Takhoffman closed this Feb 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(feishu): re-throw rate-limit errors from typing indicator to trip circuit breaker#28086

fix(feishu): re-throw rate-limit errors from typing indicator to trip circuit breaker#28086
kollaikal-rupesh wants to merge 2 commits intoopenclaw:mainfrom
kollaikal-rupesh:fix/feishu-typing-indicator-infinite-retry

kollaikal-rupesh commented Feb 27, 2026

Uh oh!

kollaikal-rupesh commented Feb 27, 2026

Uh oh!

greptile-apps bot commented Feb 27, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 27, 2026

Uh oh!

greptile-apps bot Feb 27, 2026

Uh oh!

kollaikal-rupesh commented Feb 27, 2026

Uh oh!

Takhoffman commented Feb 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

kollaikal-rupesh commented Feb 27, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

User-visible / Behavior Changes

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual (before fix)

Evidence

Human Verification (required)

Compatibility / Migration

Failure Recovery (if this breaks)

Risks and Mitigations

Uh oh!

kollaikal-rupesh commented Feb 27, 2026

Uh oh!

greptile-apps bot commented Feb 27, 2026

Greptile Summary

Confidence Score: 4/5

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

kollaikal-rupesh commented Feb 27, 2026

Uh oh!

Takhoffman commented Feb 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants