Skip to content

fix(feishu): propagate rate-limit errors from typing indicator to circuit breaker#28157

Closed
guoqunabc wants to merge 7 commits intoopenclaw:mainfrom
guoqunabc:fix/feishu-typing-rate-limit-circuit-breaker
Closed

fix(feishu): propagate rate-limit errors from typing indicator to circuit breaker#28157
guoqunabc wants to merge 7 commits intoopenclaw:mainfrom
guoqunabc:fix/feishu-typing-rate-limit-circuit-breaker

Conversation

@guoqunabc
Copy link
Contributor

Summary / 概述

Fixes #28062

The typing indicator silently swallowed all errors, preventing the circuit breaker (typing-start-guard, maxConsecutiveFailures: 2) from tripping on rate-limit failures. This caused the keepalive loop to retry every 3 seconds indefinitely when the Feishu API quota was exceeded.

飞书 typing indicator 静默吞掉了所有异常,导致断路器无法在 429/配额超限时触发熔断。keepalive 循环每3秒无限重试,实测8小时内产生 76,000+ 次无效 API 调用。

Changes / 改动

extensions/feishu/src/typing.ts

  • Extract isFeishuBackoffError() helper — detects rate-limit conditions in both AxiosError and Feishu SDK error shapes:
    • HTTP 429 (standard rate limit)
    • Feishu code 99991403 (monthly quota exceeded)
    • Feishu code 99991400 (per-second rate limit)
    • Uses a Set for error codes, making it easy to add future codes
  • Re-throw backoff errors from both addTypingIndicator and removeTypingIndicator so the circuit breaker can count failures and trip
  • Log with distinct message ("hit rate-limit/quota, stopping keepalive") to differentiate from non-critical failures in debug logs
  • Other non-critical errors (message deleted, permission issues) remain silently handled

提取 isFeishuBackoffError() 辅助函数,支持 AxiosError 和飞书 SDK 两种错误形态。429/配额错误向上抛出,让断路器正常熔断。使用 Set 存储错误码,方便未来扩展。

extensions/feishu/src/typing.test.ts (new)

  • 11 unit tests covering: HTTP 429, quota code, rate-limit code, SDK error shape, non-matching errors, edge cases (null, undefined, string, Error)

新增 11 个单元测试,覆盖各种错误形态和边界条件。

Comparison with #28086 / 与 #28086 的区别

Aspect #28086 This PR
Error detection AxiosError only AxiosError + SDK error shape
Error codes 99991403 only 99991403 + 99991400 (rate limit)
Extensibility Hardcoded checks Set-based, easy to add codes
Debug logging No log on re-throw Distinct log message for backoff
Unit tests None 11 tests
Helper exported No Yes (isFeishuBackoffError)

Testing / 测试

  • Verified locally: after the fix, the keepalive loop stops within 2 failures (6 seconds) instead of retrying indefinitely
  • Unit tests pass for all error detection paths

…cuit breaker

The typing indicator's addTypingIndicator/removeTypingIndicator silently
caught all errors, preventing the circuit breaker in createTypingCallbacks
(typing-start-guard) from tripping on rate-limit failures. This caused
the keepalive loop to retry every 3s indefinitely when the Feishu API
quota was exceeded, generating 76,000+ failed calls in ~8 hours.

Changes:
- Extract isFeishuBackoffError() helper that detects HTTP 429, Feishu
  error codes 99991403 (quota exceeded) and 99991400 (rate limit),
  in both AxiosError and SDK error shapes
- Re-throw backoff errors from both add and remove functions so the
  circuit breaker can count consecutive failures and trip
- Add comprehensive unit tests for the detection helper
- Other non-critical errors (message deleted, permissions) remain
  silently handled

Fixes #28062
@openclaw-barnacle openclaw-barnacle bot added channel: feishu Channel integration: feishu size: S labels Feb 27, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

Fixed critical rate-limit handling bug in Feishu typing indicator that caused infinite API retries when quota was exceeded (reported as 76,000+ calls in 8 hours).

Key changes:

  • Extracted isFeishuBackoffError() helper that detects rate-limit conditions in both AxiosError and Feishu SDK error shapes (HTTP 429, codes 99991403/99991400)
  • Modified addTypingIndicator() and removeTypingIndicator() to re-throw backoff errors instead of silently swallowing them, allowing the circuit breaker to trip after 2 failures
  • Used Set-based approach for error codes, making it easy to add future codes
  • Added distinct logging for rate-limit errors vs other non-critical failures
  • Comprehensive test coverage with 11 unit tests covering various error shapes and edge cases

The implementation is clean, well-structured, and follows repository guidelines. No issues found.

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • The fix is well-targeted, solving a specific documented problem (Feishu typing indicator ignores 429/quota errors, causing infinite retry loop #28062) with a clean implementation. Comprehensive test coverage (11 tests) validates all error detection paths. The change is backwards compatible, maintains existing behavior for non-critical errors, and introduces no breaking changes. Code follows repository style guidelines and is properly documented.
  • No files require special attention

Last reviewed commit: 559526d

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87dd4c6cd2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Address Codex review feedback on #28157: the Feishu SDK may return a
normal response object with an API-level error code (e.g. 99991403)
instead of throwing. Previously this path treated the call as success
with reactionId: null, so the circuit breaker never counted a failure.

Changes:
- Add hasBackoffCodeInResponse() helper to detect backoff codes in
  normal SDK response objects
- Check response codes after both messageReaction.create and .delete
  before treating the result as success
- Throw to trip the circuit breaker when backoff codes are detected
- Add 7 new unit tests for hasBackoffCodeInResponse() (total: 18)
@guoqunabc
Copy link
Contributor Author

Thanks for the excellent catch @chatgpt-codex-connector! 🎯

You're right — the Feishu SDK can return a normal response with an API-level error code instead of throwing. This would bypass the catch block entirely and the circuit breaker would never trip.

Addressed in 4f201ae:

  • Added hasBackoffCodeInResponse() helper to detect backoff codes in non-throwing SDK responses
  • Both addTypingIndicator and removeTypingIndicator now check response codes before treating the result as success
  • Added 7 new unit tests (total: 18)

@chatgpt-codex-connector
Copy link

To use Codex here, create a Codex account and connect to github.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f201ae6fe

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Address Codex P2 review: when hasBackoffCodeInResponse() detects a
backoff code and throws, the error is caught by the surrounding
try/catch. A plain Error has no .code property, so isFeishuBackoffError()
cannot match it and the error is silently swallowed.

This affects BOTH addTypingIndicator and removeTypingIndicator (Codex
only flagged removeTypingIndicator).

Fix: introduce FeishuBackoffError class that extends Error with a
numeric .code property. isFeishuBackoffError() already checks err.code,
so these errors are now correctly detected and re-thrown to trip the
circuit breaker.

Added 5 new tests for FeishuBackoffError including a catch-rethrow
simulation that proves the fix works (total: 23 tests).
@guoqunabc
Copy link
Contributor Author

Addressed Codex P2 review in 6d1adec — and found the issue also affects addTypingIndicator, not just removeTypingIndicator.

Root cause: throw new Error(...) inside try is caught by the surrounding catch. Since a plain Error has no .code property, isFeishuBackoffError() doesn't match it → silently swallowed → circuit breaker never trips.

Fix: Introduced FeishuBackoffError class extending Error with a numeric .code property. Since isFeishuBackoffError() already checks err.code, these errors are now correctly detected and re-thrown.

Added 5 new tests (total: 23), including a catch-rethrow simulation that proves the fix works end-to-end. 🎯

…h getBackoffCodeFromResponse

The boolean helper left `code` typed as `number | undefined`, causing
TS2345 when passed to FeishuBackoffError(number). Instead of using a
non-null assertion, refactored to getBackoffCodeFromResponse() which
returns the numeric code directly (or undefined), eliminating the type
mismatch cleanly.
@guoqunabc guoqunabc closed this by deleting the head repository Feb 27, 2026
Takhoffman pushed a commit to guoqunabc/openclaw that referenced this pull request Feb 28, 2026
Address Codex review feedback on openclaw#28157: the Feishu SDK may return a
normal response object with an API-level error code (e.g. 99991403)
instead of throwing. Previously this path treated the call as success
with reactionId: null, so the circuit breaker never counted a failure.

Changes:
- Add hasBackoffCodeInResponse() helper to detect backoff codes in
  normal SDK response objects
- Check response codes after both messageReaction.create and .delete
  before treating the result as success
- Throw to trip the circuit breaker when backoff codes are detected
- Add 7 new unit tests for hasBackoffCodeInResponse() (total: 18)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: feishu Channel integration: feishu size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feishu typing indicator ignores 429/quota errors, causing infinite retry loop

1 participant