[Bug]: parse_available_output_tokens_from_error() misses OpenRouter/Nous "in the output" format — causes infinite auto-reset loop #38652

@Xeron2000

Bug Description

When using OpenRouter-compatible providers (Nous Research inference, OpenRouter itself), the parse_available_output_tokens_from_error() function in agent/model_metadata.py fails to detect "output cap too large" errors, causing an infinite loop:

  1. User configures max_tokens larger than context_length - input_tokens
  2. Provider returns a 400 error with the output cap clearly stated
  3. Hermes classifies it as context_overflow (correct)
  4. But parse_available_output_tokens_from_error() returns None — it only recognizes Anthropic's "available_tokens" keyword, not OpenRouter's "X in the output" format
  5. Hermes falls through to the input-too-large recovery path (compression)
  6. On a fresh session with 1 message, compression cannot reduce anything
  7. Returns "Context length exceeded (4,882 tokens). Cannot compress further."
  8. Gateway auto-resets the session
  9. Next message → same error → infinite loop

The user sees "Session auto-reset" repeatedly, and /new has no effect because the root cause is the max_tokens config value, not the session history.

Steps to Reproduce

  1. Configure Hermes to use an OpenRouter/Nous provider with a model that has a 256K context window (e.g., stepfun/step-3.7-flash:free)
  2. Set max_tokens: 262000 in config.yaml (exceeding context_length - input_tokens)
  3. Send ANY message — even "hi" in a brand-new session
  4. Observe: API returns 400, Hermes attempts compression, fails, auto-resets the session, loops forever

Actual API Error (from Nous Research)

Error code: 400 - {
  "status": 400,
  "message": "This request is not valid. Check the model name and other parameters. 
    Additional info: This endpoint's maximum context length is 256000 tokens. 
    However, you requested about 281093 tokens 
    (5683 of text input, 13410 of tool input, 262000 in the output). 
    Please reduce the length of either one, or use the context-compression plugin 
    to compress your prompt automatically."
}

Breakdown:

Component                                Tokens
Text input (system prompt + user msg)     5,683
Tool schemas                             13,410
max_tokens (output)                     262,000
Total requested                         281,093
Model context window                    256,000
Exceeds by                               25,093

Note: The input (5683 + 13410 = 19093) is NOT the problem. The output cap (262000) is.
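The breakdown can be double-checked with a few lines of Python. The variable names are illustrative only; the numbers are the ones reported in the error message above.

```python
# Figures taken from the Nous Research error message quoted above.
context_length = 256_000
text_input = 5_683
tool_input = 13_410
requested_output = 262_000  # max_tokens from config.yaml

input_total = text_input + tool_input              # tokens used by the prompt
total_requested = input_total + requested_output   # what the provider counts
excess = total_requested - context_length          # how far over the window
available_output = context_length - input_total    # room actually left for output

print(input_total, total_requested, excess, available_output)
```

The input uses well under a tenth of the window, so any recovery that only shrinks the input cannot bring the request under the limit.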

Expected Behavior

Hermes should:

  1. Parse the OpenRouter/Nous error format: extract maximum context length (256000), text input (5683), tool input (13410), and output (262000)
  2. Detect this is an output-cap error (max_tokens is the issue, not input)
  3. Calculate available_output = context_length - text_input - tool_input
  4. Auto-reduce max_tokens to available_output - safety_margin and retry
  5. Warn the user: "Output cap (262000) too large for remaining context. Auto-reduced to ~236907."
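A minimal sketch of steps 3 to 5, assuming the four numbers have already been parsed from the error. `RetryPlan`, `plan_output_cap_retry` and the 1024-token `safety_margin` are hypothetical names and values chosen for illustration; they are not existing Hermes APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPlan:
    new_max_tokens: int
    warning: str


def plan_output_cap_retry(
    context_length: int,
    text_input: int,
    tool_input: int,
    requested_output: int,
    safety_margin: int = 1024,  # assumed value, not taken from Hermes
) -> Optional[RetryPlan]:
    """Return a reduced max_tokens if the output cap is the cause, else None."""
    available = context_length - text_input - tool_input
    if available <= safety_margin:
        # The input itself (nearly) fills the window: not an output-cap error.
        return None
    if requested_output <= available:
        # The output cap fits; something else caused the overflow.
        return None
    new_max = available - safety_margin
    warning = (
        f"Output cap ({requested_output}) too large for remaining context. "
        f"Auto-reduced to {new_max}."
    )
    return RetryPlan(new_max_tokens=new_max, warning=warning)


plan = plan_output_cap_retry(256_000, 5_683, 13_410, 262_000)
print(plan)
```

Returning None in both non-output-cap cases would let the caller fall back to the existing compression path unchanged.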

Affected Component

  • Agent Core (conversation loop, context compression, memory)
  • Gateway (Telegram/Discord/Slack/WhatsApp) — the auto-reset loop occurs here

Root Cause Analysis

Primary: Missing error format in parse_available_output_tokens_from_error()

File: agent/model_metadata.py, line ~958

The current is_output_cap_error guard:

is_output_cap_error = (
    "max_tokens" in error_lower
    and ("available_tokens" in error_lower or "available tokens" in error_lower)
)

This only matches Anthropic-style errors ("... = available_tokens: 10000"). OpenRouter/Nous use the format:

(N of text input, M of tool input, K in the output)

This format appears in errors from at least:

  • Nous Research inference API (inference-api.nousresearch.com)
  • OpenRouter (openrouter.ai) — same proxy software
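The mismatch can be reproduced in isolation. The snippet below applies the guard exactly as quoted above to the error text from this report, assuming the line breaks in the quoted message are display wrapping and the text is otherwise verbatim.

```python
# Error text from the "Actual API Error" section, joined into one string.
error_msg = (
    "This request is not valid. Check the model name and other parameters. "
    "Additional info: This endpoint's maximum context length is 256000 tokens. "
    "However, you requested about 281093 tokens "
    "(5683 of text input, 13410 of tool input, 262000 in the output). "
    "Please reduce the length of either one, or use the context-compression "
    "plugin to compress your prompt automatically."
)
error_lower = error_msg.lower()

# Guard copied from the issue text.
is_output_cap_error = (
    "max_tokens" in error_lower
    and ("available_tokens" in error_lower or "available tokens" in error_lower)
)

# The message never mentions "max_tokens" at all.
has_max_tokens = "max_tokens" in error_lower

print(is_output_cap_error, has_max_tokens)
```

Both values come out False, so neither half of the conjunction can match this format: the message contains neither "max_tokens" nor "available_tokens".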

Secondary: Bad UX — error message is misleading

The final error message seen by the user is:

Context length exceeded (4,882 tokens). Cannot compress further.
💡 The conversation has accumulated too much content. 
Try /new to start fresh, or /compress to manually trigger compression.

This message implies the conversation history is too long, even though 4,882 tokens is tiny relative to a 256,000-token window. The real fix is to reduce max_tokens, but the message gives the user no way to know this. /new doesn't reset max_tokens, so the loop continues.
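One way the hint could distinguish the two cases is sketched below. `overflow_hint` is a hypothetical helper, not an existing Hermes function, and the wording of the output-cap hint is only a suggestion.

```python
def overflow_hint(input_tokens: int, max_tokens: int, context_length: int) -> str:
    """Pick a user-facing hint based on which side of the budget overflowed."""
    if input_tokens >= context_length:
        # The history alone does not fit: the existing advice applies.
        return (
            "The conversation has accumulated too much content. "
            "Try /new to start fresh, or /compress to manually trigger compression."
        )
    if input_tokens + max_tokens > context_length:
        # The input fits; the configured output cap is what overflows.
        available = context_length - input_tokens
        return (
            f"max_tokens ({max_tokens}) exceeds the {available} tokens left "
            "after the input. Lower max_tokens in config.yaml; /new will not help."
        )
    return "Context length exceeded for an unknown reason."


print(overflow_hint(input_tokens=19_093, max_tokens=262_000, context_length=256_000))
```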

Relevant Architecture Issue

Issue #9181 ("Architecture: separate base vs effective context in overflow recovery") tracks the broader design problem of conflating input overflow with output-cap overflow. This bug is a concrete, reproducible instance of that conflation causing an infinite loop.

Proposed Fix

Add OpenRouter/Nous error format support to parse_available_output_tokens_from_error():

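# In the is_output_cap_error check.
# The OpenRouter/Nous message quoted above never contains "max_tokens",
# so its pattern is OR'd outside the Anthropic-style conjunction:
is_output_cap_error = (
    (
        "max_tokens" in error_lower
        and ("available_tokens" in error_lower or "available tokens" in error_lower)
    )
    or re.search(r'\d+\s+in\s+the\s+output', error_lower) is not None
)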

And extract the available output tokens from the OpenRouter format:

# New extraction for OpenRouter/Nous format (needs `import re`):
# "5683 of text input, 13410 of tool input, 262000 in the output"
# + "maximum context length is 256000"
or_match = re.search(
    r'maximum\s+context\s+length\s+is\s+(\d+).*?'
    r'(\d+)\s+of\s+text\s+input.*?'
    r'(\d+)\s+of\s+tool\s+input',
    error_lower,
    re.DOTALL,  # let ".*?" span line breaks if the message contains any
)
if or_match:
    max_ctx = int(or_match.group(1))
    text_input = int(or_match.group(2))
    tool_input = int(or_match.group(3))
    # Whatever the input does not use is the room left for output.
    available = max_ctx - text_input - tool_input
    if available >= 1:
        return available
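For testing outside the Hermes codebase, the extraction can be written as a standalone function. This is a sketch, not the final patch: `parse_openrouter_available_output` is a hypothetical name, and treating the "of tool input" part as optional is an assumption (for requests sent without tools), not something confirmed by the error in this report.

```python
import re
from typing import Optional

# The tool-input group is optional (assumption: it may be absent when the
# request carries no tool schemas). re.DOTALL lets ".*?" cross line breaks.
_OR_PATTERN = re.compile(
    r"maximum\s+context\s+length\s+is\s+(\d+)"
    r".*?(\d+)\s+of\s+text\s+input"
    r"(?:.*?(\d+)\s+of\s+tool\s+input)?"
    r".*?(\d+)\s+in\s+the\s+output",
    re.DOTALL,
)


def parse_openrouter_available_output(error_msg: str) -> Optional[int]:
    """Return the tokens left for output, or None if not parseable or none left."""
    match = _OR_PATTERN.search(error_msg.lower())
    if match is None:
        return None
    max_ctx = int(match.group(1))
    text_input = int(match.group(2))
    tool_input = int(match.group(3) or 0)  # group(3) is None when absent
    available = max_ctx - text_input - tool_input
    return available if available >= 1 else None


sample = (
    "This endpoint's maximum context length is 256000 tokens. "
    "However, you requested about 281093 tokens "
    "(5683 of text input, 13410 of tool input, 262000 in the output)."
)
print(parse_openrouter_available_output(sample))
```

When the input alone exceeds the window the function returns None, so the caller can still fall through to the compression path.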

Additional Notes

A configuration workaround is to set max_tokens in config.yaml to at most context_length minus ~20K (system prompt + tools overhead), or to remove it entirely so the provider default applies.

Environment

  • OS: Ubuntu 24.04 (server), Telegram gateway
  • Hermes version: 0.15.x (installed from source at /usr/local/lib/hermes-agent)
  • Model: stepfun/step-3.7-flash:free via Nous Research
  • Provider: nous (OpenRouter-compatible protocol)

Are you willing to submit a PR for this?

  • Yes, I can submit a PR with the regex fix above

Labels

  • P1 (High: major feature broken, no workaround)
  • comp/agent (Core agent loop, run_agent.py, prompt builder)
  • provider/nous (Nous Research API (OAuth))
  • provider/openrouter (OpenRouter aggregator)
  • type/bug (Something isn't working)
