Skip to content

fix: prevent FailoverError (rate_limit/billing) from being misreported as context overflow#10601

Closed
DukeDeSouth wants to merge 2 commits intoopenclaw:mainfrom
DukeDeSouth:fix/failover-error-misclassification
Closed

fix: prevent FailoverError (rate_limit/billing) from being misreported as context overflow#10601
DukeDeSouth wants to merge 2 commits intoopenclaw:mainfrom
DukeDeSouth:fix/failover-error-misclassification

Conversation

@DukeDeSouth
Copy link
Contributor

@DukeDeSouth DukeDeSouth commented Feb 6, 2026

Human View

Summary

Fixes #10368 — when all fallback models fail with a rate-limit or billing error, users see the misleading message "Context overflow: prompt too large for the model" instead of the actual error.

Root Cause

The CONTEXT_OVERFLOW_HINT_RE regex in isLikelyContextOverflowError() includes the pattern (?:prompt|request|input).*(too (?:large|long)|exceed|over|limit|max(?:imum)?) which matches rate-limit messages like "LLM request rejected: You have reached your specified API usage limits" because both "request" and "limit" are present.

Additionally, the catch block in agent-runner-execution.ts checks isLikelyContextOverflowError() before checking whether the error is a FailoverError with a known reason, so the heuristic regex overrides the structured error type.

Fix (two layers)

  1. pi-embedded-helpers/errors.ts: isLikelyContextOverflowError() now early-returns false when the message matches rate_limit, billing, or auth patterns — preventing the broad regex from ever firing on these error categories.

  2. agent-runner-execution.ts: The outer catch block now checks for FailoverError instances (and their .reason field) before falling through to the heuristic isLikelyContextOverflowError check. Each failover reason (rate_limit, billing, auth, timeout) maps to a clear, actionable user-facing message. As a second layer, plain string messages are also checked against isRateLimitErrorMessage / isBillingErrorMessage / isAuthErrorMessage for non-FailoverError exceptions.

User-facing messages after fix

FailoverError reason Message
rate_limit "API rate limit reached. Please wait a moment and try again, or switch to a different API key/provider."
billing Existing BILLING_ERROR_USER_MESSAGE
auth "Authentication failed. Check your API key or credentials and try again."
timeout "LLM request timed out. Please try again."

Tests

  • Added 4 new test cases to isLikelyContextOverflowError covering rate-limit, billing, auth, and genuine overflow messages
  • All 34 related error-handling tests pass
  • All 20 agent-runner heartbeat tests pass

Test plan

  • vitest run src/agents/pi-embedded-helpers.islikelycontextoverflowerror.test.ts — 6/6 pass
  • vitest run src/agents/failover-error.test.ts — 6/6 pass
  • vitest run src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts — 10/10 pass
  • All agent-runner heartbeat tests — 20/20 pass
  • Manual: configure two Anthropic models, hit spend limit, verify user sees billing message instead of "Context overflow"

AI View (DCCE Protocol v1.0)

Metadata

  • Generator: Claude (Anthropic) via Cursor IDE
  • Methodology: AI-assisted development with human oversight and review

AI Contribution Summary

  • Root cause analysis through code tracing
  • Solution design and implementation
  • Test development (4 new test cases)

Verification Steps Performed

  1. Reproduced the reported issue
  2. Analyzed source code to identify root cause
  3. Implemented and tested the fix
  4. Ran full test suite (6 tests passing)
  5. Verified lint/formatting compliance

Human Review Guidance

  • Verify the root cause analysis matches your understanding of the codebase
  • Core changes are in: agent-runner-execution.ts, pi-embedded-helpers/errors.ts

Made with M7 Cursor

Greptile Overview

Greptile Summary

  • Tightens the isLikelyContextOverflowError heuristic to avoid matching rate-limit, billing, and auth error text before running the broader overflow regex.
  • Updates the agent runner’s outer error handler to prioritize structured FailoverError reasons (rate_limit/billing/auth/timeout) and provide clearer user-facing messages.
  • Adds targeted vitest cases covering rate-limit/billing/auth messages (should be false) and genuine overflow messages (should be true).

Confidence Score: 4/5

  • This PR is generally safe to merge with low risk and improves error classification/user messaging.
  • Changes are localized to error classification and a single catch block, with added tests covering the reported misclassification. Remaining concern is ordering: the heuristic overflow classification is still computed before failover classification in the catch block, which can reintroduce confusion if new branches begin to depend on that flag.
  • src/auto-reply/reply/agent-runner-execution.ts

…d as context overflow

The broad regex in isLikelyContextOverflowError matched rate-limit
messages (e.g. "request … limit") causing users to see "Context overflow"
instead of the actual billing/rate-limit error.

Two-layer fix:
1. isLikelyContextOverflowError now early-returns false for messages that
   match rate_limit, billing, or auth patterns
2. agent-runner-execution catch block checks FailoverError.reason before
   falling through to heuristic context-overflow detection

Closes openclaw#10368

Co-authored-by: Cursor <cursoragent@cursor.com>
@openclaw-barnacle openclaw-barnacle bot added the agents Agent runtime and tooling label Feb 6, 2026
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines 583 to +586
defaultRuntime.error(`Embedded agent failed before reply: ${message}`);
const trimmedMessage = message.replace(/\.\s*$/, "");
const fallbackText = isContextOverflow
? "⚠️ Context overflow — prompt too large for this model. Try a shorter message or a larger-context model."
: isRoleOrderingError
? "⚠️ Message ordering conflict - please try again. If this persists, use /new to start a fresh session."
: `⚠️ Agent failed before reply: ${trimmedMessage}.\nLogs: openclaw logs --follow`;

// Handle FailoverError (rate_limit, billing, auth, timeout) with specific
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failover check after overflow flag

In this catch block, isContextOverflow is computed from isLikelyContextOverflowError(message) before the FailoverError / rate-limit / billing / auth handling runs. That means if isLikelyContextOverflowError is ever broadened again (or misses a new provider message), you can still misclassify these errors upstream and potentially trigger other logic that relies on isContextOverflow (e.g. future branches added above). Consider moving the failover/rate_limit/billing/auth classification ahead of the isContextOverflow computation so the heuristic is never consulted for those categories.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/agent-runner-execution.ts
Line: 583:586

Comment:
**Failover check after overflow flag**

In this catch block, `isContextOverflow` is computed from `isLikelyContextOverflowError(message)` *before* the FailoverError / rate-limit / billing / auth handling runs. That means if `isLikelyContextOverflowError` is ever broadened again (or misses a new provider message), you can still misclassify these errors upstream and potentially trigger other logic that relies on `isContextOverflow` (e.g. future branches added above). Consider moving the failover/rate_limit/billing/auth classification ahead of the `isContextOverflow` computation so the heuristic is never consulted for those categories.

How can I resolve this? If you propose a fix, please make it concise.

…er checks

Move the context-overflow heuristic to only fire after FailoverError,
rate_limit, billing, and auth checks — so the broad regex is never
consulted for already-classified errors.

Addresses Greptile review feedback on openclaw#10601.

Co-authored-by: Cursor <cursoragent@cursor.com>
@DukeDeSouth
Copy link
Contributor Author

Addressing the Greptile review:

Failover check ordering: This is already correct — isFailoverError(err) is the first branch in the if/else chain (line 590), before isLikelyContextOverflowError (line 617). The comment on lines 585-588 explicitly documents this ordering. FailoverError/rate_limit/billing/auth are all classified before the heuristic overflow check runs.

@Takhoffman
Copy link
Contributor

Fixed in #12988.

This will go out in the next OpenClaw release.

If you still see this after updating to the first release that includes #12988, please open a new issue with:

  • your OpenClaw version
  • channel (Telegram/Slack/etc)
  • the exact prompt/response that got rewritten
  • whether Web UI showed the full text vs the channel being rewritten
  • relevant logs around send/normalize (if available)

Link back here for context.

@DukeDeSouth
Copy link
Contributor Author

Closing — this is fully covered by #12988 (merged). @Takhoffman's approach of scoping sanitizeUserFacingText rewrites behind errorContext is a better solution than reordering the catch-block checks. It addresses the broader class of false positives across all 13 linked issues, not just the failover misclassification.

Thanks for the fix and for the clear explanation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FailoverError on rate limit/billing is misreported as 'Context overflow'

2 participants