[Bug] Eager fallback on HTTP 402 (FailoverReason.billing) does not activate; cron job loops on dead primary until output-length crash
Summary
When the primary provider returns HTTP 402 (`Insufficient Balance`), the eager-fallback path at `run_agent.py:13503-13527` is not activated, and the retry loop never advances past attempt 1/3. Instead, downstream code paths produce truncated or empty tool-call arguments (`Unrepairable tool_call arguments — replaced with empty object`). The model then hallucinates oversized completions and hits `finish_reason: length`, and the cron job dies with `RuntimeError: Response truncated due to output length limit`.
End-user impact: a cron job (`Email Mirror — Agent Briefings`, schedule `*/15 * * * *`) failed on every run from the moment the DeepSeek balance ran out until the user manually topped it up. `fallback_providers` was configured the whole time and was never used.
Environment
- Hermes Agent v0.13.0 (2026.5.7) · upstream commit `44cdf555a83c1d8d605d095442e11efd58089533`
- Python 3.11.15
- OpenAI SDK 2.32.0
- macOS 26 (Darwin 25.4.0)
Config (~/.hermes/config.yaml, relevant excerpt)
```yaml
model:
  provider: deepseek
  model: deepseek-v4-pro
  base_url: ''
providers: {}
fallback_providers:
  - provider: openrouter
    model: openai/gpt-oss-120b:free
  - provider: openrouter
    model: z-ai/glm-4.5-air:free
```
Failing job
```json
{
  "id": "b7fdbe31fc65",
  "name": "Email Mirror — Agent Briefings",
  "model": "deepseek-v4-pro",
  "provider": "deepseek",
  "schedule": {"kind": "cron", "expr": "*/15 * * * *"}
}
```
The job pins provider+model explicitly. It runs an agent with multiple tool calls (Gmail MCP, execute_code, memory).
Reproduction
- Configure DeepSeek as primary and at least one OpenRouter free model in `fallback_providers` (as above).
- Drain the DeepSeek balance to 0 USD (or block the API key).
- Run any cron job that uses `provider: deepseek` and performs multiple tool calls.
- Observe: every API call fails with HTTP 402, no fallback switch occurs, and the agent eventually crashes with `Response truncated due to output length limit` instead of `Insufficient balance`.
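If draining a real balance is impractical, a local stub that always answers 402 may exercise the same classification path. This sketch assumes Hermes honors `model.base_url` for the `deepseek` provider (the config above leaves it empty). The stub, the helper names `start_stub` and `probe`, and the JSON error body are illustrative only. The body approximates DeepSeek's real 402 payload and may not match it.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative error body; DeepSeek's real 402 payload may differ.
ERROR_BODY = {"error": {"message": "Insufficient Balance"}}


class Always402Handler(BaseHTTPRequestHandler):
    """Answers every GET/POST with HTTP 402 and a JSON error body."""

    def _reply_402(self) -> None:
        payload = json.dumps(ERROR_BODY).encode("utf-8")
        self.send_response(402)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_POST(self) -> None:
        # Consume the request body (e.g. a chat.completions payload) first.
        length = int(self.headers.get("Content-Length", 0))
        if length:
            self.rfile.read(length)
        self._reply_402()

    def do_GET(self) -> None:
        self._reply_402()

    def log_message(self, format: str, *args) -> None:
        # Silence the default per-request stderr logging.
        pass


def start_stub(port: int = 0) -> HTTPServer:
    """Serve the stub on 127.0.0.1 in a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), Always402Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(server: HTTPServer) -> tuple[int, dict]:
    """POST an empty JSON body to the stub and return (status, parsed body)."""
    url = f"http://127.0.0.1:{server.server_port}/v1/chat/completions"
    req = urllib.request.Request(
        url, data=b"{}", headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as exc:
        return exc.code, json.loads(exc.read())
```

To use it, call `start_stub(8402)` in a long-running Python session and set `base_url: http://127.0.0.1:8402/v1` under `model:`. The OpenAI SDK has no dedicated exception class for 402, so it should raise a plain `APIStatusError`, matching the `error_type` in the log below.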
Observed behavior
~/.hermes/logs/gateway.error.log (timestamps trimmed for brevity):
```
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=asyncio_2:… provider=deepseek base_url=https://api.deepseek.com/v1 model=deepseek-v4-pro summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=asyncio_2:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-1_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-3_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-5_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-7_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-9_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: Tool execute_code returned error (3.41s): {"status": "error", "output": "…stderr: Traceback…"}
WARNING run_agent: Tool execute_code returned error (3.31s): …
WARNING run_agent: Tool execute_code returned error (3.27s): …
WARNING run_agent: Unrepairable tool_call arguments for execute_code — replaced with empty object (was: import json, os, pathlib, re, datetime, subprocess, sys, glob, textwrap, traceba)
WARNING run_agent: Unrepairable tool_call arguments for mcp_gmail_send_email — replaced with empty object (was: {"body":"Quelle: Intelligence-Briefing Daily\nJob-ID: d1824b7c405c\nOutput-Datei)
ERROR cron.scheduler: Job 'Email Mirror — Agent Briefings' failed: RuntimeError: Response truncated due to output length limit
Traceback (most recent call last):
  File "/…/cron/scheduler.py", line 1565, in run_job
    raise RuntimeError(_err_text)
```
Things that are notably absent from the log:
- No `attempt 2/3` or `attempt 3/3` line; every attempt is `1/3`.
- No `⚠️ Rate limited — switching to fallback provider...` (`_emit_status` at `run_agent.py:13522`).
- No `Credential … (billing) — rotated to pool entry …` log (`_recover_with_credential_pool` at `run_agent.py:7075`).
- No `Fallback skip` warning from `_try_activate_fallback`.
So the code never reached the eager-fallback branch for any of the 7 logged 402 failures. Those failures came from 6 distinct thread names; `asyncio_2` appears twice.
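Counting attempts and threads by hand is error-prone, and a small scanner makes the check repeatable. The regular expressions are assumptions based only on the log lines quoted above, not a documented log format. `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this issue (assumed format).
ATTEMPT_RE = re.compile(
    r"API call failed \(attempt (?P<n>\d+)/\d+\).*?thread=(?P<thread>[^:\s]+)"
)
# Substrings of the log messages that would prove the fallback path ran.
FALLBACK_MARKERS = {
    "fallback_switch": "switching to fallback provider",
    "credential_rotation": "rotated to pool entry",
    "fallback_skip": "Fallback skip",
}


def summarize(log_text: str) -> dict:
    """Count failed-call attempts per attempt number and per thread name."""
    attempts: Counter = Counter()
    threads: set[str] = set()
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match:
            attempts[int(match.group("n"))] += 1
            threads.add(match.group("thread"))
    return {
        "failed_calls": sum(attempts.values()),
        "attempt_counts": dict(attempts),
        "distinct_threads": sorted(threads),
        "markers_seen": {name: text in log_text for name, text in FALLBACK_MARKERS.items()},
    }
```

To run it, pass the file contents: `summarize(Path("~/.hermes/logs/gateway.error.log").expanduser().read_text())`. On the excerpt above it reports `attempt_counts == {1: 7}`, 6 distinct threads, and no fallback markers.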
Expected behavior
Per error_classifier.py:712-738 (_classify_402):
```python
return result_fn(
    FailoverReason.billing,
    retryable=False,
    should_rotate_credential=True,
    should_fallback=True,
)
```
…and run_agent.py:13503-13527:
```python
is_rate_limited = classified.reason in (
    FailoverReason.rate_limit,
    FailoverReason.billing,
)
if is_rate_limited and self._fallback_index < len(self._fallback_chain):
    pool_may_recover = _pool_may_recover_from_rate_limit(
        self._credential_pool, provider=self.provider, base_url=…)
    if not pool_may_recover:
        self._emit_status("⚠️ Rate limited — switching to fallback provider...")
        if self._try_activate_fallback(reason=classified.reason):
            retry_count = 0
            …
            continue
```
The first 402 should be classified as `FailoverReason.billing`. `_pool_may_recover_from_rate_limit` should then return `False`, because no credential pool is configured for DeepSeek (`pool is None`). The agent should switch to `openai/gpt-oss-120b:free` on OpenRouter without further retries.
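To make the expectation concrete, here is a toy model of the decision the quoted branch should make for the first 402. `classify_status` and `expected_fallback` are hypothetical stand-ins, not Hermes functions. They only restate the logic of `_classify_402` and the eager-fallback condition quoted above. They also assume `_try_activate_fallback` picks the chain entry at `_fallback_index`.

```python
from enum import Enum
from typing import Optional


class FailoverReason(Enum):
    rate_limit = "rate_limit"
    billing = "billing"
    other = "other"


def classify_status(status: int) -> FailoverReason:
    """Hypothetical classifier: only HTTP 402 is modeled, as in _classify_402."""
    if status == 402:
        return FailoverReason.billing
    return FailoverReason.other  # everything else is out of scope for this toy


def expected_fallback(
    status: int,
    chain: list[tuple[str, str]],
    fallback_index: int,
    pool_may_recover: bool,
) -> Optional[tuple[str, str]]:
    """Return the (provider, model) the eager branch should switch to, or None."""
    reason = classify_status(status)
    is_rate_limited = reason in (FailoverReason.rate_limit, FailoverReason.billing)
    if is_rate_limited and fallback_index < len(chain) and not pool_may_recover:
        return chain[fallback_index]
    return None


# The fallback_providers entries from the config excerpt above.
FALLBACK_CHAIN = [
    ("openrouter", "openai/gpt-oss-120b:free"),
    ("openrouter", "z-ai/glm-4.5-air:free"),
]
```

With `pool_may_recover=False` and `fallback_index=0`, a 402 should map to `("openrouter", "openai/gpt-oss-120b:free")` on the very first failure. That is the transition the log never shows.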
Hypothesis
I have not pinpointed the exact branch that swallows the 402. Three plausible candidates:
- `_pool_may_recover_from_rate_limit` returns True. DeepSeek has no explicit credential pool, but `load_pool("deepseek")` may auto-load the `.env` key as a single-entry pool. In that case `pool.has_available()` is True, but the `len(pool.entries()) > 1` check fails and the function returns False, so this path should be safe. Worth confirming whether the loaded pool has 0 or 1 entries in this scenario.
- A sub-agent or tool worker loses `_fallback_chain`. The `ThreadPoolExecutor-N_0` threads in the log indicate parallel tool execution (`run_agent.py:10584`). If any of those workers spins up a separate `AIAgent` (e.g. via `model_tools.py` `use_model`), the new agent may not inherit `_fallback_chain`, even though `delegate_tool.py:1067` passes it through. A grep across tool modules for `AIAgent(` calls that do not pass `fallback_model=` would catch this.
- The 402 escapes before the retry loop can catch it. `asyncio_2` is the gateway/cron entrypoint thread, and the 402 may be raised once and propagate out of `run_conversation` before retry/fallback logic runs. One example is a streaming first-token call, where the non-stream retry loop is not yet active. If the API call uses `chat_completions` in non-stream mode, retries should happen at `run_agent.py:13360+`. Please confirm which code path the cron scheduler exercises.
Whichever branch is responsible, the user-visible failure mode is the same: the configured fallback chain stays unused, the retry counter never increments past 1, and the job dies with a confusing `Response truncated` error that does not mention billing.
Suggested fix directions
- Tighten `_pool_may_recover_from_rate_limit` so that it always returns False for `FailoverReason.billing`. Rotating credentials cannot recover an exhausted account-level balance, even with a 2-entry pool, because both keys typically draw on the same account.
- Audit every `AIAgent(` constructor call outside `run_agent.py` to ensure `fallback_model=` is always plumbed through (similar to the recent fix in `delegate_tool.py:1102`).
- Surface a clearer terminal error. When an agent dies after an upstream 402 without a single successful API call, the cron `last_error` should say "billing exhausted on `<provider>`, fallback chain rejected with `<reason>`" instead of `Response truncated due to output length limit`. The current message looks like a model bug, not an account problem.
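The first and third fix directions could look roughly like the sketch below. The signatures are hypothetical. The real `_pool_may_recover_from_rate_limit` takes the pool plus `provider`/`base_url` arguments. Here, the non-billing branch only approximates the behavior described in the first hypothesis (a missing or single-entry pool means no recovery). `StaticPool` is a stand-in used only to exercise the function.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FailoverReason(Enum):
    rate_limit = "rate_limit"
    billing = "billing"


@dataclass
class StaticPool:
    """Minimal stand-in for a credential pool, used only to exercise the sketch."""
    keys: list = field(default_factory=list)

    def has_available(self) -> bool:
        return bool(self.keys)

    def entries(self) -> list:
        return list(self.keys)


def pool_may_recover(pool: Optional[StaticPool], reason: FailoverReason) -> bool:
    """Hypothetical billing-aware version of _pool_may_recover_from_rate_limit."""
    if reason is FailoverReason.billing:
        # Rotating keys does not refill an exhausted account balance,
        # so go straight to the fallback chain.
        return False
    if pool is None or not pool.has_available():
        return False
    # Approximates the existing rule: only a multi-entry pool can rotate.
    return len(pool.entries()) > 1


def billing_error_message(provider: str, reason: str) -> str:
    """Terminal error text for cron last_error, following the suggested wording."""
    return f"billing exhausted on {provider}, fallback chain rejected with {reason}"
```

Passing the failover reason into the helper keeps the billing short-circuit next to the pool logic. An equally valid option is to check `classified.reason` at the call site before consulting the pool.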
Workaround in use
A separate balance check runs as a macOS LaunchAgent. It polls https://api.deepseek.com/user/balance every 6 hours and sends a Telegram warning when the balance drops below a configured threshold. This sidesteps the bug but does not fix it.
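A minimal version of that check might look like this. The response fields (`is_available`, `balance_infos`, `currency`, `total_balance`) follow DeepSeek's balance endpoint as documented at the time of writing, but treat them as assumptions to verify. The Telegram call uses the Bot API `sendMessage` method. The function names are hypothetical, and the default currency of USD is an assumption based on the "0 USD" in the reproduction steps.

```python
import json
import urllib.parse
import urllib.request
from decimal import Decimal

BALANCE_URL = "https://api.deepseek.com/user/balance"


def fetch_balance(api_key: str) -> dict:
    """GET the balance endpoint with a bearer token and return the parsed JSON."""
    req = urllib.request.Request(
        BALANCE_URL,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.load(resp)


def low_balance_warnings(payload: dict, threshold: Decimal, currency: str = "USD") -> list[str]:
    """Return one warning per problem found in the (assumed) balance payload."""
    warnings = []
    if not payload.get("is_available", False):
        warnings.append("DeepSeek reports is_available=false")
    for info in payload.get("balance_infos", []):
        if info.get("currency") != currency:
            continue
        # Balances are parsed as Decimal to avoid float rounding on money values.
        total = Decimal(str(info.get("total_balance", "0")))
        if total < threshold:
            warnings.append(f"DeepSeek balance {total} {currency} is below {threshold} {currency}")
    return warnings


def send_telegram(token: str, chat_id: str, text: str) -> None:
    """POST a message via the Telegram Bot API sendMessage method."""
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    with urllib.request.urlopen(url, data=data, timeout=15):
        pass
```

A LaunchAgent with `StartInterval` set to `21600` seconds would run the script every 6 hours. The script would call `fetch_balance`, pass the result to `low_balance_warnings`, and send each warning with `send_telegram`.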