# DeepSeek V4 Pro via OpenRouter causes gateway crash loop and Telegram bot failure

## Bug Description

When `deepseek/deepseek-v4-pro` is the default model via OpenRouter, the Hermes Agent gateway enters a crash loop that leaves the Telegram bot (and any other messaging integration) unresponsive. The problem recurred over two days (April 26-27, 2026) in three distinct failure modes.

## Steps to Reproduce

1. Configure `model.default: deepseek/deepseek-v4-pro` with `model.provider: openrouter`
2. Connect a Telegram bot via the Hermes gateway
3. Wait for OpenRouter upstream rate limits or provider outages to trigger
4. Gateway crashes with `status=75/TEMPFAIL`, systemd auto-restarts, and the cycle repeats

## Failure Modes Observed

### Failure Mode 1: HTTP 429 Rate Limit Crash Loop (April 26)

The OpenRouter upstream provider "Io Net" applies aggressive rate limits to `deepseek/deepseek-v4-pro`. When the rate limit is hit:

- The gateway receives a Telegram message
- Attempts to call the model via OpenRouter
- Receives HTTP 429 ("deepseek/deepseek-v4-pro is temporarily rate-limited upstream")
- Retries 3 times, all fail
- Gateway process exits with `status=75/TEMPFAIL`
- systemd auto-restarts the gateway
- The cycle repeats indefinitely, making the bot completely unavailable

**Log excerpt:**
```
ERROR: HTTP 429 - deepseek/deepseek-v4-pro is temporarily rate-limited upstream (provider: Io Net)
WARNING: resolve_provider_client: openrouter requested but OpenRouter credential pool has no usable entries (credentials may be exhausted)
systemd[1]: hermes-gateway.service: Main process exited, code=exited, status=75/TEMPFAIL
systemd[1]: hermes-gateway.service: Failed with result 'exit-code'.
```
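As a rough illustration of how the retry step could avoid hammering a rate-limited upstream, here is a minimal sketch of exponential backoff with jitter. `RateLimitError` and `call_with_backoff` are hypothetical names, not part of Hermes Agent; the sketch assumes the model client raises a dedicated exception on HTTP 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 raised by the model client."""


def call_with_backoff(call, max_attempts=4, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `call` on RateLimitError with exponential backoff and full jitter.

    Re-raises the last RateLimitError once attempts are exhausted, so the
    caller can switch to a fallback instead of letting the process exit.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # The delay ceiling doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random fraction of the ceiling.
            sleep(rng() * cap)
```

`sleep` and `rng` are injectable only to make the behaviour testable; the defaults use the standard library.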

### Failure Mode 2: Auxiliary Vision Provider Context Window ValueError (April 27)

When `auxiliary.vision.provider` is set to `auto`, the auto-detect mechanism resolves to `deepseek/deepseek-chat-v3-0324` — a model with only 16,384 tokens of context. Hermes Agent requires a minimum of 64,000 tokens for auxiliary operations, causing a hard crash:

```
ValueError: Model deepseek/deepseek-chat-v3-0324 has a context window of 16,384 tokens, which is below the minimum 64,000 required by Hermes Agent.
```

The exception is raised **before** the agent can respond to any Telegram message, so the bot appears completely dead even though the gateway process stays running.
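One way the `auto` resolution could avoid this is to filter candidates by context window before accepting one. The sketch below is an assumption about how such a check might look; `Candidate` and `pick_auxiliary_model` are hypothetical names, and only the 64,000-token minimum comes from the error message above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MIN_CONTEXT_TOKENS = 64_000  # minimum quoted in the ValueError above


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for a model that auto-detection could pick."""
    model: str
    context_window: int


def pick_auxiliary_model(candidates: Iterable[Candidate],
                         min_context: int = MIN_CONTEXT_TOKENS) -> Optional[Candidate]:
    """Return the first candidate that meets the minimum context window.

    Returns None when no candidate qualifies, so the caller can disable the
    auxiliary feature or report a config error instead of raising mid-request.
    """
    for candidate in candidates:
        if candidate.context_window >= min_context:
            return candidate
    return None
```

Returning `None` rather than raising keeps the decision about what to do next with the caller.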

### Failure Mode 3: HTTP 401 Authentication Failure (April 27)

On a separate occasion, `deepseek/deepseek-v4-pro` started returning HTTP 401 errors with message "User not found":

```
ERROR: HTTP 401 - User not found (deepseek/deepseek-v4-pro via OpenRouter)
```

A 401 cannot be fixed by retrying, yet it does not trigger an immediate fallback, so the fallback mechanism does not work correctly in this case.
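The distinction between transient and non-retryable failures can be sketched as a small classifier. The names `Action` and `classify_status` are hypothetical, and sending every remaining status straight to fallback is a simplification for illustration, not a statement about how Hermes Agent classifies errors.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"        # transient: retry with backoff
    FALLBACK = "fallback"  # retrying cannot help: switch provider now


def classify_status(status: int) -> Action:
    """Map an HTTP status from the model provider to a handling strategy."""
    # Rate limits and server-side errors are usually transient.
    if status == 429 or 500 <= status <= 599:
        return Action.RETRY
    # 401/403 and any other status will not succeed on retry.
    return Action.FALLBACK
```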

## Expected Behavior

1. **Rate limit resilience**: When the primary model hits rate limits, the gateway should gracefully fall back to a configured fallback model without crashing the entire process.
2. **Auxiliary model validation**: The `auto` provider resolution should validate that the resolved model meets the minimum context window requirement (64K tokens) before accepting it, and fall back to another model if it doesn't.
3. **Non-retryable error handling**: HTTP 401/403 errors should be treated as non-retryable and immediately trigger fallback rather than retrying.
4. **Gateway stability**: A model API failure should never crash the gateway process. The gateway should remain running and responsive even when all model calls fail.

## Actual Behavior

- Gateway crashes completely on model failures
- No graceful degradation — the Telegram bot goes 100% offline
- Auxiliary vision provider `auto` mode can select models with insufficient context
- systemd restart loop consumes resources without resolving the issue
- Orphaned gateway processes and stale PID files accumulate

## Root Cause Analysis

1. **No circuit breaker**: The gateway treats every model call failure as fatal. There is no circuit breaker pattern that would temporarily stop trying the failing model and switch to a fallback.

2. **Fallback timing**: The `fallback_providers` configuration exists but the fallback is only attempted after the primary model exhausts all retries. If the retries themselves cause the process to crash (as with the 429 rate limit), the fallback is never reached.

3. **Auto provider resolution lacks constraints**: The `auxiliary.vision.provider: auto` setting resolves to any available model without checking if it meets the 64K minimum context window requirement.

4. **Process-level crash on API errors**: Model API errors (429, 401) propagate up to the process level instead of being caught and handled at the session/conversation level.
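To illustrate what handling errors at the session level could mean, the sketch below wraps a single message in a handler that never lets a model exception escape. `handle_message` and `call_model` are hypothetical names; the real gateway's handler signature is not shown in this report.

```python
import logging

logger = logging.getLogger("gateway_sketch")

UNAVAILABLE_REPLY = "Model temporarily unavailable. Please try again later."


def handle_message(text, call_model):
    """Return a reply for one incoming message without raising on model errors."""
    try:
        return call_model(text)
    except Exception:
        # Log the traceback, answer the user, and keep the process alive.
        logger.exception("model call failed; gateway stays up")
        return UNAVAILABLE_REPLY
```

Catching `Exception` this broadly is deliberate at this boundary only: it is the last point where a failure can be turned into a user-facing message before it reaches the process level.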

## Suggested Fixes

1. **Implement a circuit breaker**: After N consecutive failures with a specific model, stop attempting that model for a cooldown period and route all requests to fallback providers.

2. **Validate auxiliary model selection**: When `provider: auto` resolves a model, validate that it meets the minimum context window requirement. If not, skip it and try the next available model.

3. **Separate gateway health from model health**: The gateway process should never crash due to a model API error. Failed model calls should return an error message to the user (e.g., "Model temporarily unavailable") while keeping the gateway running.

4. **Immediate fallback on non-retryable errors**: Authentication and authorization errors (HTTP 401, 403) should skip retries entirely and immediately fall back.

5. **Rate limit backoff**: On HTTP 429, implement exponential backoff at the gateway level rather than retrying immediately and crashing.

6. **Official DeepSeek V4 Pro parser**: The Hermes parser currently only supports DeepSeek v3/v3.1. Adding a dedicated v4 parser would improve tool calling and response formatting reliability (related to #14902).
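The circuit breaker from fix 1 can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions: `CircuitBreaker` and `call_with_fallback` are hypothetical names, and the threshold and cooldown values are placeholders, not recommendations.

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; retry after `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if the primary model may be tried right now."""
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let a trial call through (half-open).
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            # (Re)open the circuit and restart the cooldown window.
            self.opened_at = self.clock()


def call_with_fallback(breaker, primary, fallback, *args):
    """Try `primary` while the breaker allows it; otherwise use `fallback`."""
    if breaker.allow():
        try:
            result = primary(*args)
        except Exception:
            breaker.record_failure()
        else:
            breaker.record_success()
            return result
    return fallback(*args)
```

While the circuit is open, requests go to the fallback without touching the failing model at all, which is what prevents the repeated 429s from being retried on every incoming message.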

## Workaround (Current)

The following configuration changes mitigate the issues:

```yaml
# Use a stable fallback model
fallback_providers:
  - provider: openrouter
    model: z-ai/glm-5.1

# Force auxiliary vision to use a high-context model
auxiliary:
  vision:
    provider: openrouter  # NOT "auto"
    model: google/gemini-2.5-flash
```

Additionally, adding a personal DeepSeek API key at https://openrouter.ai/settings/integrations provides individual rate limits instead of relying on the shared OpenRouter pool.

## Environment

- **Hermes Agent version**: latest (as of April 27, 2026)
- **OS**: Linux (Ubuntu, systemd)
- **Provider**: OpenRouter
- **Model**: deepseek/deepseek-v4-pro (published 2026-04-24)
- **Integration**: Telegram bot (polling mode)
- **Gateway**: hermes-gateway.service (systemd user service)

## Related

- #14902 — Request for official DeepSeek V4 Pro parser support

