Anthropic "prompt is too long" 400 error not detected as context length error — aborts instead of compressing

## Bug Description

When the Anthropic API returns a 400 error with the message `"prompt is too long: 233153 tokens > 200000 maximum"`, the agent treats it as a **non-retryable client error** and immediately aborts — instead of triggering context compression.

This happens because `"prompt is too long"` is not in the `is_context_length_error` phrase list in `run_agent.py` (line ~3685). The error falls through to the generic 400 handler which gives up immediately.

## Steps to Reproduce

1. Use an Anthropic model (via OpenRouter or direct API) with a 200k context limit
2. Have a long-running gateway session (Discord, Telegram, etc.) that accumulates large conversation history
3. Send a message when the context exceeds 200k tokens
4. **Actual:** Agent returns `"Non-retryable client error detected. Aborting immediately."`
5. **Expected:** Agent triggers context compression, summarizes middle turns, and retries

## Root Cause

The `is_context_length_error` check in `run_agent.py` (~line 3685) checks for these phrases:
```python
'context length', 'context size', 'maximum context',
'token limit', 'too many tokens', 'reduce the length',
'exceeds the limit', 'context window',
'request entity too large',
```

Anthropic's error format `"prompt is too long: N tokens > M maximum"` doesn't match any of these. It falls through to the generic 4xx handler at line ~3755 which treats all unrecognized 400 errors as non-retryable and aborts.

## Contributing Factors

### Token estimation undercounts JSON-heavy tool messages
The preflight compression uses `estimate_messages_tokens_rough()` which divides total chars by 4. Tool call messages with JSON payloads tokenize at closer to 2-3 chars/token, so the rough estimate can significantly undercount — causing preflight compression to not trigger when it should.

### Gateway session hygiene only estimates simple messages
The gateway's session hygiene auto-compression (~line 1002 in `gateway/run.py`) filters to only `user`/`assistant` messages for its token estimate, but the actual API call includes full `tool_calls` and `tool` results. A session can pass the hygiene check while still being over-limit with the full message payload.

## Proposed Fix

### Primary — `run_agent.py`
Add `'prompt is too long'` to the `is_context_length_error` detection list.

### Secondary — `agent/model_metadata.py`
Make `estimate_messages_tokens_rough()` more conservative (e.g. 3.2 chars/token + per-message overhead) so preflight compression triggers earlier for JSON-heavy conversations.

## Additional Notes

- Other providers may have their own unique error messages for context overflow that aren't currently detected — might want a more robust regex-based approach rather than exact phrase matching
- The `parse_context_limit_from_error()` function already correctly parses the `200000` limit from this error format, so once detection works the step-down logic is fine

## Environment

- Hermes Agent main branch
- Model: anthropic/claude-opus-4.6 via OpenRouter
- Platform: Discord gateway
- Error: `Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 233153 tokens > 200000 maximum'}}`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Anthropic "prompt is too long" 400 error not detected as context length error — aborts instead of compressing #813

Bug Description

Steps to Reproduce

Root Cause

Contributing Factors

Token estimation undercounts JSON-heavy tool messages

Gateway session hygiene only estimates simple messages

Proposed Fix

Primary — `run_agent.py`

Secondary — `agent/model_metadata.py`

Additional Notes

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Anthropic "prompt is too long" 400 error not detected as context length error — aborts instead of compressing #813

Description

Bug Description

Steps to Reproduce

Root Cause

Contributing Factors

Token estimation undercounts JSON-heavy tool messages

Gateway session hygiene only estimates simple messages

Proposed Fix

Primary — run_agent.py

Secondary — agent/model_metadata.py

Additional Notes

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Primary — `run_agent.py`

Secondary — `agent/model_metadata.py`