
DeepSeek V4 Pro via OpenRouter causes gateway crash loop and Telegram bot failure #16677

@LineckerN


Bug Description

When using deepseek/deepseek-v4-pro as the default model via OpenRouter, the Hermes Agent gateway enters a crash loop that renders the Telegram bot (and any other messaging integration) completely unresponsive. This has been a recurring issue over the past 2 days (April 26-27, 2026) with multiple distinct failure modes.

Steps to Reproduce

  1. Configure model.default: deepseek/deepseek-v4-pro with model.provider: openrouter
  2. Connect a Telegram bot via the Hermes gateway
  3. Wait for an OpenRouter upstream rate limit or provider outage to occur
  4. Gateway crashes with status=75/TEMPFAIL, systemd auto-restarts, and the cycle repeats

Failure Modes Observed

Failure Mode 1: HTTP 429 Rate Limit Crash Loop (April 26)

The OpenRouter upstream provider "Io Net" applies aggressive rate limits to deepseek/deepseek-v4-pro. When the rate limit is hit:

  • The gateway receives a Telegram message
  • Attempts to call the model via OpenRouter
  • Receives HTTP 429 ("deepseek/deepseek-v4-pro is temporarily rate-limited upstream")
  • Retries 3 times, all fail
  • Gateway process exits with status=75/TEMPFAIL
  • systemd auto-restarts the gateway
  • The cycle repeats indefinitely, making the bot completely unavailable

Log excerpt:

```
ERROR: HTTP 429 - deepseek/deepseek-v4-pro is temporarily rate-limited upstream (provider: Io Net)
WARNING: resolve_provider_client: openrouter requested but OpenRouter credential pool has no usable entries (credentials may be exhausted)
systemd[1]: hermes-gateway.service: Main process exited, code=exited, status=75/TEMPFAIL
systemd[1]: hermes-gateway.service: Failed with result 'exit-code'.
```
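The exit status in the log is the conventional EX_TEMPFAIL code from sysexits.h (75), which Python exposes as os.EX_TEMPFAIL on Unix. The sketch below shows how a retry loop that re-raises after its last attempt turns a single upstream 429 into a process exit once nothing above it catches the error. It is an illustration of the failure pattern only, not the gateway's actual code; call_model, RateLimitError, call_with_retries, and gateway_main are hypothetical names.

```python
import os

# Conventional "temporary failure" exit code from sysexits.h (75 on Linux).
EX_TEMPFAIL = getattr(os, "EX_TEMPFAIL", 75)


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 from the provider."""


def call_model(prompt: str) -> str:
    # Hypothetical model call that is always rate-limited upstream.
    raise RateLimitError("HTTP 429 - temporarily rate-limited upstream")


def call_with_retries(prompt: str, attempts: int = 3) -> str:
    last_error = None
    for _ in range(attempts):
        try:
            return call_model(prompt)
        except RateLimitError as exc:
            last_error = exc
    # After the final attempt the error is re-raised to the caller.
    raise last_error


def gateway_main() -> int:
    # Anti-pattern: the model error escapes the message handler and is mapped
    # to a process-level exit code, so the supervisor restarts the process.
    try:
        call_with_retries("hello")
    except RateLimitError:
        return EX_TEMPFAIL
    return 0
```

Passing gateway_main()'s return value to sys.exit in a real entry point would produce the same status=75/TEMPFAIL line systemd logs above, and a Restart= policy would then start the cycle again.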

Failure Mode 2: Auxiliary Vision Provider Context Window ValueError (April 27)

When auxiliary.vision.provider is set to auto, the auto-detect mechanism resolves to deepseek/deepseek-chat-v3-0324 — a model with only 16,384 tokens of context. Hermes Agent requires a minimum of 64,000 tokens for auxiliary operations, causing a hard crash:

ValueError: Model deepseek/deepseek-chat-v3-0324 has a context window of 16,384 tokens, which is below the minimum 64,000 required by Hermes Agent.

This crash happens before the agent can respond to any Telegram message, making the bot appear completely dead even though the gateway process stays running.
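The missing check is a filter on context length applied before a candidate is accepted, rather than a ValueError raised after it has been chosen. A minimal sketch, assuming the auto resolver has an ordered list of candidates with known context lengths; the Candidate type and resolve_auxiliary_model function are hypothetical, not Hermes Agent's API, and the Gemini context length is an assumed illustrative value.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Minimum context window quoted in the ValueError above.
MIN_CONTEXT_TOKENS = 64_000


@dataclass(frozen=True)
class Candidate:
    model: str
    context_length: int  # tokens, as reported by the provider's model list


def resolve_auxiliary_model(
    candidates: Iterable[Candidate],
    min_context: int = MIN_CONTEXT_TOKENS,
) -> Optional[Candidate]:
    """Return the first candidate whose context window meets the minimum.

    Undersized models are skipped instead of being accepted and later
    failing with a ValueError.
    """
    for candidate in candidates:
        if candidate.context_length >= min_context:
            return candidate
    return None  # no usable model: report it instead of crashing


# Illustrative candidate order; the Gemini context length is an assumption.
CANDIDATES = [
    Candidate("deepseek/deepseek-chat-v3-0324", 16_384),
    Candidate("google/gemini-2.5-flash", 1_048_576),
]

chosen = resolve_auxiliary_model(CANDIDATES)
```

Returning None when nothing qualifies lets the caller disable the auxiliary feature with a warning instead of taking the whole message path down.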

Failure Mode 3: HTTP 401 Authentication Failure (April 27)

On a separate occasion, deepseek/deepseek-v4-pro started returning HTTP 401 errors with message "User not found":

ERROR: HTTP 401 - User not found (deepseek/deepseek-v4-pro via OpenRouter)

This is a non-retryable error that prevents the fallback mechanism from working correctly.
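The distinction this report relies on can be expressed as a small status classifier. A minimal sketch, assuming the gateway sees the raw HTTP status from OpenRouter; next_action and the exact set of retryable 5xx codes are assumptions, not something the report or Hermes Agent specifies.

```python
# Statuses where waiting and retrying can plausibly succeed (assumed set).
RETRYABLE_STATUSES = frozenset({408, 429, 500, 502, 503, 504})


def next_action(status: int) -> str:
    """Map an HTTP status from the model provider to a gateway action."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE_STATUSES:
        # e.g. 429 "temporarily rate-limited upstream": back off, then retry
        return "retry"
    # e.g. 401 "User not found" or 403: the same request will keep failing,
    # so skip retries and go straight to the fallback model.
    return "fallback"
```

Under this mapping the 401 from Failure Mode 3 never consumes retries; it moves to the fallback model on the first response.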

Expected Behavior

  1. Rate limit resilience: When the primary model hits rate limits, the gateway should gracefully fall back to a configured fallback model without crashing the entire process.
  2. Auxiliary model validation: The auto provider resolution should validate that the resolved model meets the minimum context window requirement (64K tokens) before accepting it, and fall back to another model if it doesn't.
  3. Non-retryable error handling: HTTP 401/403 errors should be treated as non-retryable and immediately trigger fallback rather than retrying.
  4. Gateway stability: A model API failure should never crash the gateway process. The gateway should remain running and responsive even when all model calls fail.
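Expected behavior 4 can be met by placing the error boundary around each message rather than around the process. A minimal sketch, with hypothetical handle_message and process_batch names and a generate callable standing in for whatever model client the gateway uses:

```python
import logging
from typing import Callable, List

logger = logging.getLogger("gateway")

FALLBACK_REPLY = "Model temporarily unavailable, please try again shortly."


def handle_message(text: str, generate: Callable[[str], str]) -> str:
    """Answer one chat message without letting model errors escape."""
    try:
        return generate(text)
    except Exception:  # any model/API failure is contained to this message
        logger.exception("model call failed; replying with fallback text")
        return FALLBACK_REPLY


def process_batch(messages: List[str], generate: Callable[[str], str]) -> List[str]:
    # A failed message produces the fallback reply; the loop (and the
    # process) keeps running for the messages that follow.
    return [handle_message(m, generate) for m in messages]
```

In an asyncio-based Telegram adapter the same try/except would wrap the per-update coroutine, so one failing update cannot stop the polling loop.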

Actual Behavior

  • Gateway crashes completely on model failures
  • No graceful degradation — the Telegram bot goes 100% offline
  • Auxiliary vision provider auto mode can select models with insufficient context
  • systemd restart loop consumes resources without resolving the issue
  • Orphaned gateway processes and stale PID files accumulate

Root Cause Analysis

  1. No circuit breaker: The gateway treats every model call failure as fatal. There is no circuit breaker pattern that would temporarily stop trying the failing model and switch to a fallback.

  2. Fallback timing: The fallback_providers configuration exists but the fallback is only attempted after the primary model exhausts all retries. If the retries themselves cause the process to crash (as with the 429 rate limit), the fallback is never reached.

  3. Auto provider resolution lacks constraints: The auxiliary.vision.provider: auto setting resolves to any available model without checking if it meets the 64K minimum context window requirement.

  4. Process-level crash on API errors: Model API errors (429, 401) propagate up to the process level instead of being caught and handled at the session/conversation level.
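Root causes 2 and 4 both come down to where retries and fallback live. The sketch below retries 429s with capped exponential backoff (full jitter), moves to the next model immediately on 401/403, and returns None instead of raising when every model fails. It assumes a hypothetical ModelHTTPError that carries the HTTP status and treats the model chain as an ordered list of callables; complete_with_fallback, backoff_delay, and the retryable status set are illustrative, and the model names are taken from this report.

```python
import logging
import random
import time
from typing import Callable, Optional, Sequence, Tuple

logger = logging.getLogger("gateway.fallback")

# Statuses worth retrying on the same model (assumed set; 429 is the case above).
RETRYABLE = frozenset({429, 500, 502, 503, 504})


class ModelHTTPError(Exception):
    """Hypothetical error type carrying the provider's HTTP status."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status} {message}".strip())
        self.status = status


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Capped exponential backoff with full jitter."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def complete_with_fallback(
    prompt: str,
    chain: Sequence[Tuple[str, Callable[[str], str]]],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each (model, call) pair in order; return the first reply or None."""
    for model, call in chain:
        for attempt in range(max_retries):
            try:
                return call(prompt)
            except ModelHTTPError as exc:
                logger.warning("%s failed: %s", model, exc)
                if exc.status not in RETRYABLE:
                    break  # 401/403 and similar: go to the next model at once
                if attempt + 1 < max_retries:
                    sleep(backoff_delay(attempt))
        # Retries exhausted or non-retryable error: try the next model.
    return None  # caller sends a "temporarily unavailable" reply; nothing is raised


# Usage with stand-in callables (no network calls, no real sleeping).
calls = []


def rate_limited(prompt: str) -> str:
    calls.append("primary")
    raise ModelHTTPError(429, "temporarily rate-limited upstream")


def unauthorized(prompt: str) -> str:
    calls.append("unauthorized")
    raise ModelHTTPError(401, "User not found")


def fallback_model(prompt: str) -> str:
    calls.append("fallback")
    return f"fallback reply to {prompt!r}"


no_sleep = lambda seconds: None
reply = complete_with_fallback(
    "hello",
    [("deepseek/deepseek-v4-pro", rate_limited), ("z-ai/glm-5.1", fallback_model)],
    sleep=no_sleep,
)
reply_401 = complete_with_fallback(
    "hi",
    [("deepseek/deepseek-v4-pro", unauthorized), ("z-ai/glm-5.1", fallback_model)],
    sleep=no_sleep,
)
```

Because the fallback is reached inside the same call rather than after the process exits, exhausting retries on the primary can no longer prevent the fallback from running.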

Suggested Fixes

  1. Implement a circuit breaker: After N consecutive failures with a specific model, stop attempting that model for a cooldown period and route all requests to fallback providers.

  2. Validate auxiliary model selection: When provider: auto resolves a model, validate that it meets the minimum context window requirement. If not, skip it and try the next available model.

  3. Separate gateway health from model health: The gateway process should never crash due to a model API error. Failed model calls should return an error message to the user (e.g., "Model temporarily unavailable") while keeping the gateway running.

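  4. Immediate fallback on non-retryable errors: Non-retryable HTTP 4xx errors such as 401 and 403 should skip retries entirely and fall back immediately. HTTP 429 is the exception and should instead be handled with backoff (fix 5).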

  5. Rate limit backoff: On HTTP 429, implement exponential backoff at the gateway level rather than retrying immediately and crashing.

  6. Official DeepSeek V4 Pro parser: The Hermes parser currently only supports DeepSeek v3/v3.1. Adding a dedicated v4 parser would improve tool calling and response formatting reliability (related to [Feature]: support for deepseek-v4-pro model #14902).
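Suggested fix 1 describes a standard circuit breaker. A minimal in-process sketch keyed by model name; the CircuitBreaker class and its threshold and cooldown defaults are hypothetical. A gateway using it would check allow() before choosing a model and call record_success or record_failure after each attempt.

```python
import time
from typing import Callable, Dict


class CircuitBreaker:
    """Per-model breaker: open after N consecutive failures.

    While open, allow() returns False until the cooldown expires. After that,
    calls are allowed again as a trial: one more failure re-opens the circuit
    immediately, and a success closes it.
    """

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures: Dict[str, int] = {}
        self.opened_at: Dict[str, float] = {}

    def allow(self, model: str) -> bool:
        opened = self.opened_at.get(model)
        if opened is None:
            return True  # closed: calls allowed
        return self.clock() - opened >= self.cooldown_s  # trial after cooldown

    def record_success(self, model: str) -> None:
        self.failures[model] = 0
        self.opened_at.pop(model, None)  # close the circuit

    def record_failure(self, model: str) -> None:
        count = self.failures.get(model, 0) + 1
        self.failures[model] = count
        if count >= self.failure_threshold:
            self.opened_at[model] = self.clock()  # open (or re-open) the circuit


# Usage with a fake clock so the cooldown can be simulated.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=3, cooldown_s=60.0, clock=lambda: now[0])
for _ in range(3):
    breaker.record_failure("deepseek/deepseek-v4-pro")
skip_primary = not breaker.allow("deepseek/deepseek-v4-pro")  # route to fallback
now[0] = 61.0
trial_allowed = breaker.allow("deepseek/deepseek-v4-pro")     # cooldown expired
breaker.record_failure("deepseek/deepseek-v4-pro")            # trial failed
reopened = not breaker.allow("deepseek/deepseek-v4-pro")
```

Keeping the breaker state in memory means it resets on restart, which is acceptable once model errors no longer cause restarts; a shared store would only matter with several gateway processes.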

Workaround (Current)

The following configuration changes mitigate the issues:

```yaml
# Use a stable fallback model
fallback_providers:
  - provider: openrouter
    model: z-ai/glm-5.1

# Force auxiliary vision to use a high-context model
auxiliary:
  vision:
    provider: openrouter  # NOT "auto"
    model: google/gemini-2.5-flash
```

Additionally, adding a personal DeepSeek API key at https://openrouter.ai/settings/integrations provides individual rate limits instead of relying on the shared OpenRouter pool.

Environment

  • Hermes Agent version: latest (as of April 27, 2026)
  • OS: Linux (Ubuntu, systemd)
  • Provider: OpenRouter
  • Model: deepseek/deepseek-v4-pro (published 2026-04-24)
  • Integration: Telegram bot (polling mode)
  • Gateway: hermes-gateway.service (systemd user service)

Labels

  • P1: High, major feature broken, no workaround
  • comp/gateway: Gateway runner, session dispatch, delivery
  • platform/telegram: Telegram bot adapter
  • provider/deepseek: DeepSeek API
  • provider/openrouter: OpenRouter aggregator
  • type/bug: Something isn't working