fix(failover): classify Moonshot balance 429 as billing#83079

Merged
altaywtf merged 3 commits into
openclaw:mainfrom
leno23:hermes-auto/issue-43447-moonshot-429-billing
May 17, 2026

Conversation

@leno23

@leno23 leno23 commented May 17, 2026

Contributor

Summary

  • Stops Moonshot/Kimi exhausted-balance responses that arrive as HTTP 429 from being classified as generic rate_limit.
  • Keeps the override narrow by checking high-confidence billing text only for Moonshot/Kimi before the generic HTTP 429 status rule returns rate_limit.
  • Keeps the existing regression coverage for Moonshot/Kimi balance-shaped 429 payloads, ordinary Moonshot 429 rate limits, and non-Moonshot providers.

Test Plan

  • node --import tsx ./tmp-openclaw-pr-83079-green-probe.mjs
  • git diff --check
  • Attempted: OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=180000 node scripts/run-vitest.mjs run src/agents/failover-error.test.ts -t "lets Moonshot/Kimi billing-shaped 429 payloads win over generic rate limit status" (local runner stalled before test output; see Real behavior proof caveat)

Real behavior proof

Behavior addressed: a Moonshot/Kimi response with HTTP 429 and an Insufficient account balance / Please recharge payload should surface as billing, not rate_limit, so OpenClaw can show billing/recharge guidance instead of rate-limit behavior.

Real environment tested: local OpenClaw source checkout using Node.js v22.18.0 after applying commit 14b33d7160f964a2f5606d5e37a8973bef216577 on this PR branch. No live Moonshot credentials were used.

Exact steps or command run after this patch: created a temporary repo-root probe that imports src/agents/failover-error.ts and src/agents/pi-embedded-helpers/errors.ts, then ran node --import tsx ./tmp-openclaw-pr-83079-green-probe.mjs; also ran git diff --check.

Evidence after fix:

$ node --import tsx ./tmp-openclaw-pr-83079-green-probe.mjs
{"cases":[["moonshot balance 429","billing","billing"],["kimi provider hint balance 429","billing","billing"],["moonshot ordinary 429","rate_limit","rate_limit"],["openai balance-shaped 429","rate_limit","rate_limit"]],"signal":{"kind":"reason","reason":"billing"}}

$ git diff --check
(passed with no output)

Observed result after fix: Moonshot and Kimi billing-shaped 429 payloads resolve to billing; an ordinary Moonshot 429 remains rate_limit; the same balance-shaped 429 payload from openai remains rate_limit; classifyFailoverSignal() returns { kind: "reason", reason: "billing" } for the Moonshot balance-shaped 429 signal.

What was not tested: live Moonshot/Kimi depleted-account behavior was not tested because this run has no provider credentials. The focused Vitest command stalled locally before producing test output (no output for 180000ms; terminating stalled Vitest process group), so CI is expected to provide the official runner result.

Risk / Notes

  • Scope is intentionally narrow: only Moonshot/Kimi billing-shaped 429 responses are allowed to override the generic 429 → rate_limit rule.
  • The fix addresses the matcher-order review finding by checking isBillingErrorMessage(message) directly inside the provider-specific 429 branch, instead of depending on the earlier message classification.

Fixes #43447

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S proof: supplied External PR includes structured after-fix real behavior proof. labels May 17, 2026
@clawsweeper

clawsweeper Bot commented May 17, 2026

Contributor

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by maintainer comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors can comment @clawsweeper re-review or @clawsweeper re-run on their own open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR threads provider context into failover HTTP-status classification, classifies Moonshot/Kimi billing-shaped HTTP 429 payloads as billing before the generic 429 rate-limit rule, adds regression coverage, and updates the changelog.

Reproducibility: yes, at source level. Current main returns rate_limit for explicit HTTP 429 before the Moonshot/Kimi exhausted-balance payload can be preserved as billing, and the PR adds focused cases for that path.

Real behavior proof
Sufficient (terminal): The PR body includes after-fix terminal output from a local source-checkout probe exercising the exact classifier path and showing the expected Moonshot/Kimi billing results.

Next step before merge
No repair lane is needed because the PR already contains the focused fix; the remaining action is maintainer landing review after the pending checks complete.

Security
Cleared: The diff only changes failover classification logic, focused tests, and changelog text; I found no concrete security or supply-chain concern.

Review details

Best possible solution:

Land the narrow provider-aware 429 classifier override with the regression test once the remaining checks finish green, then let the linked source issue close from the merge.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main returns rate_limit for explicit HTTP 429 before the Moonshot/Kimi exhausted-balance payload can be preserved as billing, and the PR adds focused cases for that path.

Is this the best way to solve the issue?

Yes. The PR uses a narrow provider-scoped 429 exception and avoids reordering the shared message classifier, which is the lower-risk fix for this bug.

Acceptance criteria:

  • Wait for the latest GitHub check runs on f061e7d to complete.
  • Optional focused proof: node scripts/run-vitest.mjs run src/agents/failover-error.test.ts -t "lets Moonshot/Kimi billing-shaped 429 payloads win over generic rate limit status"

What I checked:

  • Current main generic 429 behavior: Current main maps explicit HTTP 429 directly to rate_limit, before any later billing text can win for the Moonshot/Kimi exhausted-balance payload. (src/agents/pi-embedded-helpers/errors.ts:646, 543518bd43d7)
  • Current main matcher ordering: The message classifier checks rate-limit patterns before billing patterns, and failover-matches includes rate[_ ]limit as a rate-limit signal while insufficient balance is a billing signal. (src/agents/pi-embedded-helpers/errors.ts:827, 543518bd43d7)
  • PR fix shape: The PR passes provider into classifyFailoverClassificationFromHttpStatus() and returns billing for Moonshot/Kimi HTTP 429 only when isBillingErrorMessage(message) matches, then keeps the generic 429 to rate_limit fallback. (src/agents/pi-embedded-helpers/errors.ts:653, f061e7d08c97)
  • Regression coverage: The added test covers Moonshot billing 429, Kimi provider-hint billing 429, ordinary Moonshot 429 rate limits, non-Moonshot balance-shaped 429 payloads, and classifyFailoverSignal(). (src/agents/failover-error.test.ts:299, f061e7d08c97)
  • Review discussion incorporated: The PR discussion identified that messageReason could not become billing because rate_limit_reached matched first; the current diff uses the recommended direct isBillingErrorMessage(message) check in the provider-specific 429 branch. (src/agents/pi-embedded-helpers/errors.ts:653, f061e7d08c97)
  • Latest head and changed files: The live PR head is f061e7d, changing only CHANGELOG.md, src/agents/failover-error.test.ts, and src/agents/pi-embedded-helpers/errors.ts. (f061e7d08c97)

Likely related people:

  • steipete: Local blame and GitHub file history show recent work on the central failover classifier and related failover handling paths. (role: recent area contributor; confidence: medium; commits: 4ccd07718d2f, 936c02e22c98; files: src/agents/pi-embedded-helpers/errors.ts, src/agents/failover-error.test.ts, src/agents/pi-embedded-helpers/failover-matches.ts)
  • vincentkoc: Recent main history for errors.ts includes auth/failover classification work, adjacent to the status and message-classification surface this PR changes. (role: recent adjacent contributor; confidence: medium; commits: 3485a907d13f; files: src/agents/pi-embedded-helpers/errors.ts)
  • hclsys: Recent main history for failover-matches.ts includes failover matcher changes and verification on related agent failover behavior. (role: recent adjacent contributor; confidence: medium; commits: 398dd6e0b091; files: src/agents/pi-embedded-helpers/failover-matches.ts)
  • altaywtf: The live PR is assigned to altaywtf, and the latest head commit only adds the changelog entry after the focused classifier fix. (role: current PR handler; confidence: medium; commits: f061e7d08c97; files: CHANGELOG.md)

Remaining risk / open question:

  • The latest head still had queued or in-progress check runs during review, including network-runtime-boundary and Socket Security checks.
  • The proof uses the issue-backed payload against the classifier path; no live depleted Moonshot/Kimi account was exercised.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 543518bd43d7.

@clawsweeper clawsweeper Bot added P1 High-priority user-facing bug, regression, or broken workflow. impact:session-state Session, memory, transcript, context, or agent state can drift or corrupt. impact:auth-provider Auth, provider routing, model choice, or SecretRef resolution may break. labels May 17, 2026
@MantisCartography


Verified ClawSweeper's P1 finding locally — the fix as written cannot work because of matcher ordering.

What I verified:

I inspected the branch source at 701a3a3d3f53 and traced the full classification path:

  1. classifyFailoverClassificationFromMessage() in errors.ts checks isRateLimitErrorMessage() before isBillingErrorMessage() (lines ~848 vs ~855+).
  2. The Moonshot fixture MOONSHOT_INSUFFICIENT_BALANCE_429_PAYLOAD contains "rate_limit_reached", which matches the rate-limit pattern /rate[_ ]limit/ in failover-matches.ts.
  3. So messageClassification resolves to "rate_limit", not "billing".
  4. The new 429 guard at errors.ts:651 checks messageReason === "billing", which is never true for this payload.
  5. The generic return toReasonClassification("rate_limit") at line ~657 fires instead.

Result: The test "lets Moonshot/Kimi billing-shaped 429 payloads win over generic rate limit status" should fail because resolveFailoverReasonFromError returns "rate_limit" for the Moonshot fixture.
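
The ordering problem traced in the steps above can be reproduced in isolation. The sketch below is not the project's code: the payload shape, the billing regex, and the function name classifyRateLimitFirst are assumptions made for illustration. Only the /rate[_ ]limit/ pattern and the rate-limit-before-billing order come from the discussion.

```typescript
type MessageReason = "rate_limit" | "billing" | null;

// Pattern quoted in the discussion above.
const RATE_LIMIT_PATTERN = /rate[_ ]limit/;
// Assumed shape of a billing pattern; the real one lives in failover-matches.ts.
const BILLING_PATTERN = /insufficient (account )?balance/i;

// Assumed payload: it carries both the rate-limit token and the billing wording.
const moonshotPayload =
  '{"error":{"type":"rate_limit_reached","message":"Insufficient account balance. Please recharge"}}';

// Hypothetical classifier that checks rate-limit patterns before billing patterns.
function classifyRateLimitFirst(message: string): MessageReason {
  if (RATE_LIMIT_PATTERN.test(message)) return "rate_limit";
  if (BILLING_PATTERN.test(message)) return "billing";
  return null;
}

// The ordered classifier stops at the rate-limit token...
const messageReason = classifyRateLimitFirst(moonshotPayload);
// ...while a direct billing check on the same text still matches.
const directBillingMatch = BILLING_PATTERN.test(moonshotPayload);

console.log(messageReason, directBillingMatch); // prints: rate_limit true
```

Under these assumptions, a guard written as messageReason === "billing" can never fire for this payload, whereas a direct billing-text check can.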

Suggested fix approaches:

  • Option A: In the 429 handler, check isBillingErrorMessage(message) for known billing-sensitive providers (Moonshot/Kimi) directly, without relying on messageReason. E.g.:
    if (status === 429) {
      if (
        (isProvider(provider, "moonshot") || isProvider(provider, "kimi")) &&
        isBillingErrorMessage(message)
      ) {
        return toReasonClassification("billing");
      }
      return toReasonClassification("rate_limit");
    }
  • Option B: Reorder classifyFailoverClassificationFromMessage() to check billing patterns before rate-limit patterns when a provider hint indicates a billing-sensitive provider, though this is broader and riskier.

Option A is narrower and avoids changing the shared message classifier ordering. It also means the Moonshot payload's "Insufficient account balance. Please recharge" text will match the existing insufficient_balance billing pattern in failover-matches.ts, correctly classifying as billing even though "rate_limit_reached" also appears in the payload.
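
For readers who want to try Option A outside the repository, here is a minimal self-contained sketch. The helper bodies for isProvider and isBillingErrorMessage are stand-ins written for this example (the real implementations may normalize provider names or match more patterns), and classifyHttp429, the payload strings, and the case labels are hypothetical.

```typescript
type FailoverReason = "billing" | "rate_limit";

// Hypothetical stand-in: the real isProvider() may handle aliases or hints differently.
function isProvider(provider: string | undefined, name: string): boolean {
  return (provider ?? "").trim().toLowerCase() === name;
}

// Hypothetical stand-in for the billing matcher backed by failover-matches.ts.
function isBillingErrorMessage(message: string): boolean {
  return /insufficient (account )?balance/i.test(message);
}

// The 429 branch only: provider-scoped billing override, then the generic fallback.
function classifyHttp429(provider: string | undefined, message: string): FailoverReason {
  const billingSensitive = isProvider(provider, "moonshot") || isProvider(provider, "kimi");
  if (billingSensitive && isBillingErrorMessage(message)) {
    return "billing";
  }
  return "rate_limit";
}

// Assumed payload text; both strings include the rate-limit token.
const balance429 = "rate_limit_reached: Insufficient account balance. Please recharge";
const ordinary429 = "rate_limit_reached: too many requests";

const probeResults: Array<[string, FailoverReason]> = [
  ["moonshot balance 429", classifyHttp429("moonshot", balance429)],
  ["kimi balance 429", classifyHttp429("kimi", balance429)],
  ["moonshot ordinary 429", classifyHttp429("moonshot", ordinary429)],
  ["openai balance-shaped 429", classifyHttp429("openai", balance429)],
];

probeResults.forEach(([label, reason]) => {
  console.log(`${label} -> ${reason}`);
});
```

With these stand-ins, the two Moonshot/Kimi balance cases resolve to billing and the other two stay rate_limit, because the override sits ahead of the generic fallback and is gated on both the provider and the billing text.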

Has anyone run the test suite on this branch? I expect src/agents/failover-error.test.ts to fail on the Moonshot/Kimi 429 billing test case.

@leno23

leno23 commented May 17, 2026

Contributor Author

Addressed the matcher-order finding in 14b33d7160f964a2f5606d5e37a8973bef216577: the 429 branch now checks Moonshot/Kimi billing text directly with isBillingErrorMessage(message) before falling back to generic rate_limit, so the rate_limit_reached token no longer prevents exhausted-balance payloads from reaching billing.

Verification added to the PR body: source-level RED/GREEN probe for Moonshot, Kimi provider hint, ordinary Moonshot 429, non-Moonshot 429, plus git diff --check. The local Vitest runner stalled before output, so I left the PR as draft and documented that CI still needs to provide the official runner result.

@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. and removed proof: supplied External PR includes structured after-fix real behavior proof. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 17, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 17, 2026
@altaywtf altaywtf self-assigned this May 17, 2026
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation channel: discord Channel integration: discord channel: matrix Channel integration: matrix channel: mattermost Channel integration: mattermost channel: msteams Channel integration: msteams channel: slack Channel integration: slack channel: telegram Channel integration: telegram app: android App: android app: macos App: macos gateway Gateway runtime extensions: memory-core Extension: memory-core cli CLI command changes scripts Repository scripts commands Command implementations labels May 17, 2026
@openclaw-barnacle openclaw-barnacle Bot added size: S and removed cli CLI command changes scripts Repository scripts commands Command implementations channel: feishu Channel integration: feishu channel: qa-channel Channel integration: qa-channel extensions: qa-lab extensions: codex extensions: lmstudio size: XL labels May 17, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 17, 2026
@altaywtf altaywtf marked this pull request as ready for review May 17, 2026 14:37
Copilot AI review requested due to automatic review settings May 17, 2026 14:37

Copilot AI left a comment

Contributor

Pull request overview

Narrow fix to ensure Moonshot/Kimi providers that return HTTP 429 with an "Insufficient account balance" payload are classified as billing rather than the generic rate_limit, so the UI can show recharge guidance and failover behavior is correct (fixes #43447).

Changes:

  • Thread provider into classifyFailoverClassificationFromHttpStatus and, in the 429 branch, return billing for Moonshot/Kimi when the message is a billing-shaped error.
  • Add a regression test covering Moonshot/Kimi billing 429s, ordinary Moonshot 429s, and non-Moonshot 429 payloads.
  • Add a CHANGELOG entry.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File | Description
src/agents/pi-embedded-helpers/errors.ts | Adds provider-aware billing override before the 429 → rate_limit fallback.
src/agents/failover-error.test.ts | Adds coverage for the new Moonshot/Kimi billing-429 behavior.
CHANGELOG.md | Documents the fix.

@clawsweeper clawsweeper Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 17, 2026
@altaywtf altaywtf force-pushed the hermes-auto/issue-43447-moonshot-429-billing branch from 8377b4b to dd05301 Compare May 17, 2026 14:51
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 17, 2026
@altaywtf altaywtf force-pushed the hermes-auto/issue-43447-moonshot-429-billing branch from dd05301 to f061e7d Compare May 17, 2026 14:58
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 17, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 17, 2026
@altaywtf altaywtf force-pushed the hermes-auto/issue-43447-moonshot-429-billing branch from f061e7d to 5856a6a Compare May 17, 2026 15:05
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 17, 2026
leno23 and others added 3 commits May 17, 2026 18:05
Preserve billing classification for Moonshot/Kimi HTTP 429 payloads that
report insufficient account balance, while leaving ordinary 429 responses as
rate_limit.
@altaywtf altaywtf force-pushed the hermes-auto/issue-43447-moonshot-429-billing branch from 5856a6a to 9f70bf5 Compare May 17, 2026 15:05
@altaywtf altaywtf merged commit 019dbcc into openclaw:main May 17, 2026
14 checks passed
@altaywtf

Member

Merged via squash.

Thanks @leno23!

Labels

agents Agent runtime and tooling impact:auth-provider Auth, provider routing, model choice, or SecretRef resolution may break. impact:session-state Session, memory, transcript, context, or agent state can drift or corrupt. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. size: S

Development

Successfully merging this pull request may close these issues.

Kimi/Moonshot 'Rate Limit' error masks insufficient funds, causes UI lockout

4 participants