Skip to content

Anthropic "prompt is too long" 400 error not detected as context length error — aborts instead of compressing #813

@DaAwesomeRazor

Description

@DaAwesomeRazor

Bug Description

When the Anthropic API returns a 400 error with the message "prompt is too long: 233153 tokens > 200000 maximum", the agent treats it as a non-retryable client error and immediately aborts — instead of triggering context compression.

This happens because "prompt is too long" is not in the is_context_length_error phrase list in run_agent.py (line ~3685). The error falls through to the generic 400 handler which gives up immediately.

Steps to Reproduce

  1. Use an Anthropic model (via OpenRouter or direct API) with a 200k context limit
  2. Have a long-running gateway session (Discord, Telegram, etc.) that accumulates large conversation history
  3. Send a message when the context exceeds 200k tokens
  4. Actual: Agent returns "Non-retryable client error detected. Aborting immediately."
  5. Expected: Agent triggers context compression, summarizes middle turns, and retries

Root Cause

The is_context_length_error check in run_agent.py (~line 3685) checks for these phrases:

'context length', 'context size', 'maximum context',
'token limit', 'too many tokens', 'reduce the length',
'exceeds the limit', 'context window',
'request entity too large',

Anthropic's error format "prompt is too long: N tokens > M maximum" doesn't match any of these. It falls through to the generic 4xx handler at line ~3755 which treats all unrecognized 400 errors as non-retryable and aborts.

Contributing Factors

Token estimation undercounts JSON-heavy tool messages

The preflight compression uses estimate_messages_tokens_rough() which divides total chars by 4. Tool call messages with JSON payloads tokenize at closer to 2-3 chars/token, so the rough estimate can significantly undercount — causing preflight compression to not trigger when it should.

Gateway session hygiene only estimates simple messages

The gateway's session hygiene auto-compression (~line 1002 in gateway/run.py) filters to only user/assistant messages for its token estimate, but the actual API call includes full tool_calls and tool results. A session can pass the hygiene check while still being over-limit with the full message payload.

Proposed Fix

Primary — run_agent.py

Add 'prompt is too long' to the is_context_length_error detection list.

Secondary — agent/model_metadata.py

Make estimate_messages_tokens_rough() more conservative (e.g. 3.2 chars/token + per-message overhead) so preflight compression triggers earlier for JSON-heavy conversations.

Additional Notes

  • Other providers may have their own unique error messages for context overflow that aren't currently detected — might want a more robust regex-based approach rather than exact phrase matching
  • The parse_context_limit_from_error() function already correctly parses the 200000 limit from this error format, so once detection works the step-down logic is fine

Environment

  • Hermes Agent main branch
  • Model: anthropic/claude-opus-4.6 via OpenRouter
  • Platform: Discord gateway
  • Error: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 233153 tokens > 200000 maximum'}}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions