DeepSeek V4 Pro via OpenRouter causes gateway crash loop and Telegram bot failure
Bug Description
When using deepseek/deepseek-v4-pro as the default model via OpenRouter, the Hermes Agent gateway enters a crash loop that renders the Telegram bot (and any other messaging integration) completely unresponsive. This has been a recurring issue over the past 2 days (April 26-27, 2026) with multiple distinct failure modes.
Steps to Reproduce
Configure model.default: deepseek/deepseek-v4-pro with model.provider: openrouter
Connect a Telegram bot via the Hermes gateway
Wait for an OpenRouter upstream rate limit or provider outage to occur
Gateway crashes with status=75/TEMPFAIL, systemd auto-restarts, and the cycle repeats
Failure Modes Observed
Failure Mode 1: HTTP 429 Rate Limit Crash Loop (April 26)
The OpenRouter upstream provider "Io Net" applies aggressive rate limits to deepseek/deepseek-v4-pro. When the rate limit is hit:
The gateway receives a Telegram message
Attempts to call the model via OpenRouter
Receives HTTP 429 ("deepseek/deepseek-v4-pro is temporarily rate-limited upstream")
Retries three times; all attempts fail
Gateway process exits with status=75/TEMPFAIL
systemd auto-restarts the gateway
The cycle repeats indefinitely, making the bot completely unavailable
Log excerpt:
ERROR: HTTP 429 - deepseek/deepseek-v4-pro is temporarily rate-limited upstream (provider: Io Net)
WARNING: resolve_provider_client: openrouter requested but OpenRouter credential pool has no usable entries (credentials may be exhausted)
systemd[1]: hermes-gateway.service: Main process exited, code=exited, status=75/TEMPFAIL
systemd[1]: hermes-gateway.service: Failed with result 'exit-code'.
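The log shows retries fired back to back against an upstream that is still rate-limiting. A minimal sketch of exponential backoff with "full jitter" follows; `backoff_delays` and its `base`/`cap` parameters are hypothetical names, not part of Hermes Agent, and the defaults are illustrative assumptions rather than values tuned for OpenRouter.

```python
import random
from typing import Iterator, Optional


def backoff_delays(
    attempts: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield one sleep duration, in seconds, for each retry attempt.

    Full jitter: the delay for attempt n is drawn uniformly from
    [0, min(cap, base * 2**n)], so clients that hit the rate limit at the
    same moment spread their retries out instead of retrying in lockstep.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)
```

A caller would sleep for each yielded delay (`time.sleep(delay)`, or `await asyncio.sleep(delay)` in an async gateway) between attempts and hand off to a fallback model once the generator is exhausted, rather than exiting the process.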
Failure Mode 2: Auxiliary Vision Provider Context Window ValueError (April 27)
When auxiliary.vision.provider is set to auto, the auto-detect mechanism resolves to deepseek/deepseek-chat-v3-0324, a model with only 16,384 tokens of context. Hermes Agent requires a minimum of 64,000 tokens for auxiliary operations, so the agent fails with a hard error:
ValueError: Model deepseek/deepseek-chat-v3-0324 has a context window of 16,384 tokens, which is below the minimum 64,000 required by Hermes Agent.
This error is raised before the agent can respond to any Telegram message, so the bot appears completely dead even though the gateway process stays running.
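One way to avoid this is to apply the 64,000-token minimum while auto-resolving, skipping undersized candidates instead of returning them. The sketch below assumes the resolver has an ordered list of candidates with known context lengths; `Candidate`, `resolve_auto`, and `example/high-context-model` are hypothetical and do not reflect how Hermes Agent actually discovers models.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Threshold taken from the ValueError quoted above.
MIN_CONTEXT_TOKENS = 64_000


@dataclass(frozen=True)
class Candidate:
    model: str
    context_length: int


def resolve_auto(
    candidates: Iterable[Candidate],
    min_context: int = MIN_CONTEXT_TOKENS,
) -> Optional[Candidate]:
    """Return the first candidate whose context window meets the minimum.

    Candidates below the threshold are skipped here, instead of being
    selected and failing later with a ValueError.
    """
    for candidate in candidates:
        if candidate.context_length >= min_context:
            return candidate
    return None  # caller should report "no suitable auxiliary model"
```

Given `[Candidate("deepseek/deepseek-chat-v3-0324", 16_384), Candidate("example/high-context-model", 128_000)]`, the resolver skips the 16K model and returns the second entry.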
Failure Mode 3: HTTP 401 Authentication Failure (April 27)
On a separate occasion, deepseek/deepseek-v4-pro started returning HTTP 401 errors with the message "User not found":
ERROR: HTTP 401 - User not found (deepseek/deepseek-v4-pro via OpenRouter)
This is a non-retryable error that prevents the fallback mechanism from working correctly.
Expected Behavior
Rate limit resilience: When the primary model hits rate limits, the gateway should gracefully fall back to a configured fallback model without crashing the entire process.
Auxiliary model validation: The auto provider resolution should validate that the resolved model meets the minimum context window requirement (64K tokens) before accepting it, and fall back to another model if it doesn't.
Non-retryable error handling: HTTP 401/403 errors should be treated as non-retryable and immediately trigger fallback rather than retrying.
Gateway stability: A model API failure should never crash the gateway process. The gateway should remain running and responsive even when all model calls fail.
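The expected behavior above can be sketched as a provider chain that distinguishes retryable from non-retryable statuses and never lets a model API error escape. This is a minimal sketch, not Hermes Agent code: `ModelAPIError`, `complete_with_fallback`, treating 429 and 5xx as the retryable set, and modeling providers as plain callables are all assumptions for illustration.

```python
from typing import Callable, Sequence


class ModelAPIError(Exception):
    """Hypothetical error type carrying the upstream HTTP status."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


RETRYABLE = {429, 500, 502, 503, 504}
FALLBACK_MESSAGE = "Model temporarily unavailable, please try again later."


def is_retryable(status: int) -> bool:
    # 401/403 will not fix themselves on retry; 429 and 5xx might.
    return status in RETRYABLE


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    attempts_per_provider: int = 3,
) -> str:
    """Try providers in order; return a user-facing message if all fail."""
    for call in providers:
        for _ in range(attempts_per_provider):
            try:
                return call(prompt)
            except ModelAPIError as err:
                if not is_retryable(err.status):
                    break  # non-retryable: go to the next provider now
                # retryable: try again (a real client would back off here)
    return FALLBACK_MESSAGE
```

With a primary that raises `ModelAPIError(401, "User not found")`, the chain moves to the next provider after a single attempt instead of spending retries on an error that cannot succeed.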
Actual Behavior
Gateway crashes completely on model failures
No graceful degradation — the Telegram bot goes 100% offline
Auxiliary vision provider auto mode can select models with insufficient context
systemd restart loop consumes resources without resolving the issue
Orphaned gateway processes and stale PID files accumulate
Root Cause Analysis
No circuit breaker: The gateway treats every model call failure as fatal. There is no circuit breaker pattern that would temporarily stop trying the failing model and switch to a fallback.
Fallback timing: The fallback_providers configuration exists, but the fallback is only attempted once the primary model has exhausted all retries. If the retries themselves cause the process to crash (as with the 429 rate limit), the fallback is never reached.
Auto provider resolution lacks constraints: The auxiliary.vision.provider: auto setting resolves to any available model without checking whether it meets the 64K minimum context window requirement.
Process-level crash on API errors: Model API errors (429, 401) propagate up to the process level instead of being caught and handled at the session/conversation level.
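Catching failures at the per-message boundary is a common way to keep errors at the session level. The minimal asyncio sketch below assumes the gateway dispatches each incoming message to an async handler that returns reply text; the `isolate` wrapper, the `gateway` logger name, and the `UNAVAILABLE` reply are hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("gateway")  # hypothetical logger name

Handler = Callable[[str], Awaitable[str]]
UNAVAILABLE = "Model temporarily unavailable."


def isolate(handler: Handler) -> Handler:
    """Wrap a per-message handler so a failed model call cannot end the process."""

    async def wrapped(text: str) -> str:
        try:
            return await handler(text)
        except Exception:
            # `except Exception` does not catch SystemExit, KeyboardInterrupt,
            # or (on Python 3.8+) asyncio.CancelledError, so shutdown still works.
            logger.exception("model call failed; gateway keeps running")
            return UNAVAILABLE

    return wrapped
```

With this wrapper, a 429 or 401 surfaced as an exception becomes a reply to the Telegram user, and the process has no reason to exit and trigger a systemd restart.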
Suggested Fixes
Implement a circuit breaker: After N consecutive failures with a specific model, stop attempting that model for a cooldown period and route all requests to fallback providers.
Validate auxiliary model selection: When provider: auto resolves a model, validate that it meets the minimum context window requirement. If not, skip it and try the next available model.
Separate gateway health from model health: The gateway process should never crash due to a model API error. Failed model calls should return an error message to the user (e.g., "Model temporarily unavailable") while keeping the gateway running.
Immediate fallback on non-retryable errors: HTTP 401 and 403 errors should skip retries entirely and immediately fall back (429 is handled separately, with backoff).
Rate limit backoff: On HTTP 429, implement exponential backoff at the gateway level rather than retrying immediately and crashing.
Official DeepSeek V4 Pro parser: The Hermes parser currently only supports DeepSeek v3/v3.1. Adding a dedicated v4 parser would improve tool calling and response formatting reliability (related to [Feature]: support for deepseek-v4-pro model #14902).
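The circuit breaker suggested in the first fix could look roughly like the following. This is a minimal sketch under stated assumptions: `CircuitBreaker`, its `threshold`/`cooldown` defaults, and `first_available` are hypothetical, and the model names in the usage note are the ones from this issue's configuration.

```python
import time
from typing import Callable, Dict, List, Optional


class CircuitBreaker:
    """Per-model circuit breaker (hypothetical sketch, not a Hermes Agent API).

    A model's circuit opens after `threshold` consecutive failures. While it is
    open, `allow()` returns False. Once `cooldown` seconds have passed, requests
    are allowed again (half-open); a success resets the model, while another
    failure reopens the circuit immediately because the count is still at or
    above the threshold.
    """

    def __init__(self, threshold: int = 3, cooldown: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures: Dict[str, int] = {}
        self.opened_at: Dict[str, float] = {}

    def allow(self, model: str) -> bool:
        opened = self.opened_at.get(model)
        if opened is None:
            return True  # circuit closed
        return self.clock() - opened >= self.cooldown  # half-open after cooldown

    def record_success(self, model: str) -> None:
        self.failures.pop(model, None)
        self.opened_at.pop(model, None)

    def record_failure(self, model: str) -> None:
        self.failures[model] = self.failures.get(model, 0) + 1
        if self.failures[model] >= self.threshold:
            self.opened_at[model] = self.clock()  # (re)open and restart cooldown

    def first_available(self, models: List[str]) -> Optional[str]:
        """Return the first model, in priority order, whose circuit allows a call."""
        for model in models:
            if self.allow(model):
                return model
        return None
```

With this setup, `first_available(["deepseek/deepseek-v4-pro", "z-ai/glm-5.1"])` keeps returning the fallback model while the primary's circuit is open, instead of repeatedly hitting the rate-limited upstream. Injecting `clock` keeps the cooldown logic testable without real sleeps.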
Workaround (Current)
The following configuration changes mitigate the issues:
# Use a stable fallback model
fallback_providers:
  - provider: openrouter
    model: z-ai/glm-5.1

# Force auxiliary vision to use a high-context model
auxiliary:
  vision:
    provider: openrouter  # NOT "auto"
    model: google/gemini-2.5-flash
Additionally, adding a personal DeepSeek API key at https://openrouter.ai/settings/integrations provides individual rate limits instead of relying on the shared OpenRouter pool.
Environment
Hermes Agent version: latest (as of April 27, 2026)