[Bug]: parse_available_output_tokens_from_error() misses OpenRouter/Nous "in the output" format — causes infinite auto-reset loop #38652

## Bug Description

When using OpenRouter-compatible providers (Nous Research inference, OpenRouter itself), the `parse_available_output_tokens_from_error()` function in `agent/model_metadata.py` fails to detect **"output cap too large" errors**, which causes an infinite loop:

1. User configures `max_tokens` larger than `context_length - input_tokens`
2. Provider returns a 400 error with the **output cap** clearly stated
3. Hermes classifies it as `context_overflow` (correct)
4. But `parse_available_output_tokens_from_error()` returns `None` — it only recognizes Anthropic's `"available_tokens"` keyword, not OpenRouter's `"X in the output"` format
5. Hermes falls through to the **input-too-large recovery path** (compression)
6. On a fresh session with 1 message, compression cannot reduce anything
7. Returns "Context length exceeded (4,882 tokens). Cannot compress further."
8. Gateway auto-resets the session
9. Next message → same error → infinite loop

The user sees "Session auto-reset" repeatedly, and `/new` has no effect because the root cause is the `max_tokens` config value, not the session history.

## Steps to Reproduce

1. Configure Hermes to use an OpenRouter/Nous provider with a model that has a 256K context window (e.g., `stepfun/step-3.7-flash:free`)
2. Set `max_tokens: 262000` in config.yaml (exceeding `context_length - input_tokens`)
3. Send ANY message — even "hi" in a brand-new session
4. Observe: API returns 400, Hermes attempts compression, fails, auto-resets the session, loops forever

### Actual API Error (from Nous Research)

```
Error code: 400 - {
  "status": 400,
  "message": "This request is not valid. Check the model name and other parameters. 
    Additional info: This endpoint's maximum context length is 256000 tokens. 
    However, you requested about 281093 tokens 
    (5683 of text input, 13410 of tool input, 262000 in the output). 
    Please reduce the length of either one, or use the context-compression plugin 
    to compress your prompt automatically."
}
```

Breakdown:
| Component | Tokens |
|-----------|--------|
| Text input (system prompt + user msg) | 5,683 |
| Tool schemas | 13,410 |
| **max_tokens (output)** | **262,000** |
| Total requested | 281,093 |
| Model context window | 256,000 |
| **Exceeds by** | **25,093** |

Note: The input (`5683 + 13410 = 19093`) is NOT the problem. The output cap (`262000`) is.
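
The arithmetic behind the table can be checked with a few lines of Python. This is only a worked example using the numbers from the error above; the variable names are illustrative and not taken from Hermes.

```python
# Numbers copied from the Nous Research 400 response.
context_length = 256_000    # "maximum context length is 256000 tokens"
text_input = 5_683          # "5683 of text input"
tool_input = 13_410         # "13410 of tool input"
requested_output = 262_000  # "262000 in the output" (the configured max_tokens)

input_total = text_input + tool_input
total_requested = input_total + requested_output
overflow = total_requested - context_length
available_output = context_length - input_total

print(input_total)       # 19093
print(total_requested)   # 281093
print(overflow)          # 25093
print(available_output)  # 236907
```

The input uses under 20K of the 256K window, so any `max_tokens` up to 236,907 would have fit.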

## Expected Behavior

Hermes should:
1. Parse the OpenRouter/Nous error format: extract `maximum context length (256000)`, `text input (5683)`, `tool input (13410)`, and `output (262000)`
2. Detect this is an **output-cap error** (`max_tokens` is the issue, not input)
3. Calculate `available_output = context_length - text_input - tool_input`
4. Auto-reduce `max_tokens` to `available_output - safety_margin` and retry
5. Warn the user: "Output cap (262000) too large for remaining context. Auto-reduced to ~236907."
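
Steps 2 to 5 can be sketched as a small helper. This is a minimal illustration, not Hermes code: the function name `reduce_output_cap` and the `safety_margin` default of 1024 are assumptions made for the example.

```python
from typing import Optional


def reduce_output_cap(
    context_length: int,
    text_input: int,
    tool_input: int,
    requested_output: int,
    safety_margin: int = 1024,  # assumed value, not taken from Hermes
) -> Optional[int]:
    """Return a reduced max_tokens, or None if this is not an output-cap error."""
    available_output = context_length - text_input - tool_input
    if available_output <= safety_margin:
        # The input alone (nearly) fills the window: compression is the right path.
        return None
    if requested_output <= available_output:
        # The output cap already fits, so the overflow has some other cause.
        return None
    return available_output - safety_margin


new_cap = reduce_output_cap(256_000, 5_683, 13_410, 262_000)
if new_cap is not None:
    # prints: Output cap (262000) too large for remaining context. Auto-reduced to 235883.
    print(f"Output cap (262000) too large for remaining context. Auto-reduced to {new_cap}.")
```

Returning `None` in the other two cases keeps the existing compression path in charge of genuine input overflows.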

## Affected Component

- [x] Agent Core (conversation loop, context compression, memory)  
- [x] Gateway (Telegram/Discord/Slack/WhatsApp) — the auto-reset loop occurs here

## Root Cause Analysis

### Primary: Missing error format in `parse_available_output_tokens_from_error()`

**File:** `agent/model_metadata.py`, line ~958

The current `is_output_cap_error` guard:

```python
is_output_cap_error = (
    "max_tokens" in error_lower
    and ("available_tokens" in error_lower or "available tokens" in error_lower)
)
```

This only matches Anthropic-style errors (`"... = available_tokens: 10000"`). OpenRouter/Nous use the format:

```
(N of text input, M of tool input, K in the output)
```

This format appears in errors from at least:
- **Nous Research** inference API (`inference-api.nousresearch.com`)
- **OpenRouter** (`openrouter.ai`) — same proxy software
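
The mismatch is easy to confirm by running the guard expression against the Nous error text from this report. In the sketch below, `current_guard` is a hypothetical wrapper around the expression shown above (not the real function), and the Anthropic-style string is an illustrative example of the wording the guard targets, not a captured response.

```python
# Error text copied from the Nous Research 400 response in this report.
NOUS_ERROR = (
    "This request is not valid. Check the model name and other parameters. "
    "Additional info: This endpoint's maximum context length is 256000 tokens. "
    "However, you requested about 281093 tokens "
    "(5683 of text input, 13410 of tool input, 262000 in the output). "
    "Please reduce the length of either one, or use the context-compression "
    "plugin to compress your prompt automatically."
)

# Illustrative Anthropic-style wording (assumed for the example).
ANTHROPIC_STYLE_ERROR = (
    "max_tokens: 32768 > context_window: 200000 - input_tokens: 190000 "
    "= available_tokens: 10000"
)


def current_guard(error_msg: str) -> bool:
    """Hypothetical wrapper around the existing is_output_cap_error expression."""
    error_lower = error_msg.lower()
    return "max_tokens" in error_lower and (
        "available_tokens" in error_lower or "available tokens" in error_lower
    )


print(current_guard(ANTHROPIC_STYLE_ERROR))  # True
print(current_guard(NOUS_ERROR))             # False
```

Note that the Nous text contains neither `max_tokens` nor `available_tokens`, so both halves of the guard fail for it.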

### Secondary: Bad UX — error message is misleading

The final error message seen by the user is:

```
Context length exceeded (4,882 tokens). Cannot compress further.
💡 The conversation has accumulated too much content. 
Try /new to start fresh, or /compress to manually trigger compression.
```

This message implies that the **conversation history** is too long, even though 4,882 tokens is tiny compared with the 256,000-token window. The real fix is to reduce `max_tokens`, but nothing in the message tells the user that. `/new` does not reset `max_tokens`, so the loop continues.

### Relevant Architecture Issue

Issue #9181 ("Architecture: separate base vs effective context in overflow recovery") tracks the broader design problem of conflating input overflow with output-cap overflow. This bug is a concrete, reproducible instance of that conflation causing an infinite loop.

## Proposed Fix

Add OpenRouter/Nous error format support to `parse_available_output_tokens_from_error()`:

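```python
import re

# In the is_output_cap_error check (error_lower is the lowercased error text).
# The OpenRouter/Nous message never contains the literal "max_tokens", so its
# pattern must be an independent alternative rather than nested under that test.
is_anthropic_output_cap = (
    "max_tokens" in error_lower
    and ("available_tokens" in error_lower or "available tokens" in error_lower)
)
is_openrouter_output_cap = (
    re.search(r'\d+\s+in\s+the\s+output', error_lower) is not None
)
is_output_cap_error = is_anthropic_output_cap or is_openrouter_output_cap
```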

And extract the available output tokens from the OpenRouter format:

```python
# New extraction for OpenRouter/Nous format:
# "5683 of text input, 13410 of tool input, 262000 in the output"
# + "maximum context length is 256000"
or_match = re.search(
    r'maximum\s+context\s+length\s+is\s+(\d+).*?'
    r'(\d+)\s+of\s+text\s+input.*?'
    r'(\d+)\s+of\s+tool\s+input',
    error_lower,
    re.DOTALL,  # the provider message can span several lines
)
if or_match:
    max_ctx = int(or_match.group(1))
    text_input = int(or_match.group(2))
    tool_input = int(or_match.group(3))
    # Tokens left for output once both input components are subtracted.
    available = max_ctx - text_input - tool_input
    if available >= 1:
        return available
```
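
A regression test for the PR could look roughly like the following. It is a self-contained sketch: `parse_openrouter_available_output` is a hypothetical stand-in for the new branch inside `parse_available_output_tokens_from_error()`, and the test names are made up for illustration. The pattern is compiled with `re.DOTALL` so that a message wrapped across lines still matches.

```python
import re
from typing import Optional

_OR_PATTERN = re.compile(
    r"maximum\s+context\s+length\s+is\s+(\d+).*?"
    r"(\d+)\s+of\s+text\s+input.*?"
    r"(\d+)\s+of\s+tool\s+input",
    re.DOTALL,
)


def parse_openrouter_available_output(error_msg: str) -> Optional[int]:
    """Hypothetical helper: available output tokens, or None if no match."""
    match = _OR_PATTERN.search(error_msg.lower())
    if match is None:
        return None
    max_ctx, text_input, tool_input = (int(g) for g in match.groups())
    available = max_ctx - text_input - tool_input
    return available if available >= 1 else None


NOUS_ERROR = (
    "This endpoint's maximum context length is 256000 tokens. "
    "However, you requested about 281093 tokens "
    "(5683 of text input, 13410 of tool input, 262000 in the output)."
)


def test_nous_error_single_line():
    assert parse_openrouter_available_output(NOUS_ERROR) == 236907


def test_nous_error_wrapped_across_lines():
    wrapped = NOUS_ERROR.replace(". ", ".\n    ").replace("tokens (", "tokens\n    (")
    assert "\n" in wrapped
    assert parse_openrouter_available_output(wrapped) == 236907


def test_unrelated_error_returns_none():
    assert parse_openrouter_available_output("Rate limit exceeded") is None


def test_input_larger_than_window_returns_none():
    msg = (
        "maximum context length is 1000 tokens. However, you requested about "
        "1700 tokens (900 of text input, 300 of tool input, 500 in the output)"
    )
    assert parse_openrouter_available_output(msg) is None
```

The last case matters because a `None` result there should leave the existing compression path in control.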

## Additional Notes

As a workaround until the parser is fixed, set `max_tokens` in `config.yaml` to at most `context_length` minus roughly 20K (system prompt plus tool schema overhead), or remove it entirely so that the provider default applies.

## Environment

- **OS:** Ubuntu 24.04 (server), Telegram gateway
- **Hermes version:** 0.15.x (installed from source at `/usr/local/lib/hermes-agent`)
- **Model:** `stepfun/step-3.7-flash:free` via Nous Research
- **Provider:** `nous` (OpenRouter-compatible protocol)

## Are you willing to submit a PR for this?

- [x] Yes, I can submit a PR with the regex fix above
