
[Bug] Eager fallback on HTTP 402 (FailoverReason.billing) does not activate; cron job loops on dead primary until output-length crash #23138

@johnny6699g-jpg


Summary

When the primary provider returns HTTP 402 (Insufficient Balance), the eager-fallback path at run_agent.py:13503-13527 is not activated. The retry loop also does not advance past attempt 1/3. Instead, downstream code paths produce truncated/empty tool-call arguments (Unrepairable tool_call arguments — replaced with empty object), the model hallucinates oversized completions, hits finish_reason: length, and the cron job dies with RuntimeError: Response truncated due to output length limit.

End user impact: a cron job (Email Mirror — Agent Briefings, schedule */15 * * * *) failed continuously from the moment the DeepSeek balance ran out until the user manually topped up — fallback_providers was configured the whole time and was never used.

Environment

  • Hermes Agent v0.13.0 (2026.5.7) · upstream commit 44cdf555a83c1d8d605d095442e11efd58089533
  • Python 3.11.15
  • OpenAI SDK 2.32.0
  • macOS 14 (Darwin 25.4.0)

Config (~/.hermes/config.yaml, relevant excerpt)

model:
  provider: deepseek
  model: deepseek-v4-pro
  base_url: ''
providers: {}
fallback_providers:
- provider: openrouter
  model: openai/gpt-oss-120b:free
- provider: openrouter
  model: z-ai/glm-4.5-air:free

Failing job

{
  "id": "b7fdbe31fc65",
  "name": "Email Mirror — Agent Briefings",
  "model": "deepseek-v4-pro",
  "provider": "deepseek",
  "schedule": {"kind": "cron", "expr": "*/15 * * * *"}
}

The job pins provider+model explicitly. It runs an agent with multiple tool calls (Gmail MCP, execute_code, memory).

Reproduction

  1. Configure DeepSeek as primary and at least one OpenRouter free model in fallback_providers (as above).
  2. Drain the DeepSeek balance to 0 USD (or block the API key).
  3. Run any cron job that uses provider: deepseek and performs multiple tool calls.
  4. Observe: every API call fails with HTTP 402, no fallback switch occurs, and the agent eventually crashes with Response truncated due to output length limit instead of Insufficient balance.

Observed behavior

~/.hermes/logs/gateway.error.log (timestamps trimmed for brevity):

WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=asyncio_2:… provider=deepseek base_url=https://api.deepseek.com/v1 model=deepseek-v4-pro summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=asyncio_2:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-1_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-3_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-5_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-7_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-9_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance

WARNING run_agent: Tool execute_code returned error (3.41s): {"status": "error", "output": "…stderr: Traceback…"}
WARNING run_agent: Tool execute_code returned error (3.31s): …
WARNING run_agent: Tool execute_code returned error (3.27s): …
WARNING run_agent: Unrepairable tool_call arguments for execute_code — replaced with empty object (was: import json, os, pathlib, re, datetime, subprocess, sys, glob, textwrap, traceba)
WARNING run_agent: Unrepairable tool_call arguments for mcp_gmail_send_email — replaced with empty object (was: {"body":"Quelle: Intelligence-Briefing Daily\nJob-ID: d1824b7c405c\nOutput-Datei)

ERROR cron.scheduler: Job 'Email Mirror — Agent Briefings' failed: RuntimeError: Response truncated due to output length limit
Traceback (most recent call last):
  File "/…/cron/scheduler.py", line 1565, in run_job
    raise RuntimeError(_err_text)

Things that are notably absent from the log:

  • No attempt 2/3 or attempt 3/3 line — every attempt is 1/3
  • No ⚠️ Rate limited — switching to fallback provider... (_emit_status at run_agent.py:13522)
  • No "Credential … (billing) — rotated to pool entry …" log (_recover_with_credential_pool at run_agent.py:7075)
  • No "Fallback skip" warning from _try_activate_fallback

So the code never reached the eager-fallback branch on any of the 7 failed API calls logged above.
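To make the "notably absent" check repeatable, a small log scan along the following lines could be used. This is only a sketch: `summarize_log` and `FALLBACK_MARKERS` are hypothetical names, and the marker substrings are copied from the messages listed above rather than from the Hermes source.

```python
import re
from collections import Counter

# Matches the "API call failed (attempt N/M)" lines shown in the log excerpt.
ATTEMPT_RE = re.compile(r"API call failed \(attempt (\d+)/(\d+)\)")

# Substrings taken from the messages listed above (assumption: the real log
# lines contain them verbatim). If none appear, fallback/rotation never ran.
FALLBACK_MARKERS = (
    "switching to fallback provider",
    "rotated to pool entry",
    "Fallback skip",
)


def summarize_log(lines):
    """Count attempt numbers and fallback-related markers in log lines."""
    attempts = Counter()
    markers = Counter()
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            attempts[int(match.group(1))] += 1
        for marker in FALLBACK_MARKERS:
            if marker in line:
                markers[marker] += 1
    return attempts, markers
```

Passing the lines of ~/.hermes/logs/gateway.error.log to `summarize_log` should, for the failure described here, report only attempt number 1 and no fallback markers.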

Expected behavior

Per error_classifier.py:712-738 (_classify_402):

return result_fn(
    FailoverReason.billing,
    retryable=False,
    should_rotate_credential=True,
    should_fallback=True,
)

…and run_agent.py:13503-13527:

is_rate_limited = classified.reason in (
    FailoverReason.rate_limit,
    FailoverReason.billing,
)
if is_rate_limited and self._fallback_index < len(self._fallback_chain):
    pool_may_recover = _pool_may_recover_from_rate_limit(
        self._credential_pool, provider=self.provider, base_url=…)
    if not pool_may_recover:
        self._emit_status("⚠️ Rate limited — switching to fallback provider...")
        if self._try_activate_fallback(reason=classified.reason):
            retry_count = 0
            continue

The first 402 should trip FailoverReason.billing, _pool_may_recover_from_rate_limit should return False (no credential pool configured for DeepSeek; pool is None), and the agent should switch to openai/gpt-oss-120b:free on OpenRouter without further retries.
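The expected control flow can be illustrated with a self-contained toy model. Everything here is a stand-in, not Hermes code: `Reason`, `ProviderError`, `classify`, and `run_with_fallback` are hypothetical names, and the sketch assumes no credential pool exists (so the pool check is always False).

```python
from enum import Enum, auto


class Reason(Enum):
    """Stand-in for FailoverReason (hypothetical, not the real enum)."""
    rate_limit = auto()
    billing = auto()
    other = auto()


class ProviderError(Exception):
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def classify(exc):
    """Toy classifier: 402 -> billing, 429 -> rate_limit, else other."""
    return {402: Reason.billing, 429: Reason.rate_limit}.get(exc.status, Reason.other)


def run_with_fallback(providers, call, max_retries=3):
    """Call providers in order; billing/rate-limit errors skip to the next one at once."""
    index = 0
    retry_count = 0
    trace = []  # (provider, attempt number, reason) for each failure
    while index < len(providers):
        provider = providers[index]
        try:
            return call(provider), trace
        except ProviderError as exc:
            reason = classify(exc)
            trace.append((provider, retry_count + 1, reason.name))
            if reason in (Reason.rate_limit, Reason.billing):
                # Eager fallback: no further retries on the dead provider.
                index += 1
                retry_count = 0
                continue
            retry_count += 1
            if retry_count >= max_retries:
                raise RuntimeError(f"{provider} failed after {max_retries} attempts") from exc
    raise RuntimeError(f"all providers failed: {trace}")
```

With a primary that always raises `ProviderError(402)`, the trace holds exactly one entry for it (attempt 1) before the next provider is tried, which is the behavior this issue expects from the real loop.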

Hypothesis

I have not pinpointed the exact branch that swallows the 402. Three plausible candidates:

  1. _pool_may_recover_from_rate_limit returns True: DeepSeek has no explicit credential pool. If load_pool("deepseek") returns a one-entry pool (the .env key auto-loaded as its only entry), pool.has_available() is True, but the len(pool.entries()) > 1 check then fails and the function returns False, so this path should be safe. Worth confirming whether the loaded pool has 0 or 1 entries in this scenario.

  2. Sub-agent / tool worker loses _fallback_chain: The ThreadPoolExecutor-N_0 threads in the log indicate parallel tool execution (run_agent.py:10584). If any of those workers spin up a separate AIAgent (e.g. via model_tools.py use_model), the new agent may not inherit _fallback_chain even though delegate_tool.py:1067 does. A grep across tool modules for AIAgent( not passing fallback_model= would catch this.

  3. asyncio_2 thread is the gateway/cron entrypoint and the 402 surfaces before the retry loop catches it: The 402 is raised once and propagates out of run_conversation before retry/fallback can run, e.g. during a streaming first-token call where the non-stream retry loop is not yet active. If the API call is in chat_completions non-stream mode, retries should be in run_agent.py:13360+; please confirm which code path the cron scheduler exercises.

Whichever branch is responsible, the user-visible failure mode is the same: a configured fallback chain stays unused, retry counter never increments past 1, and the job dies with a confusing Response truncated error that doesn't mention billing.
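For hypothesis 2, a plain grep misses calls that span several lines, so an AST-based scan may be more reliable. The helper below is a hypothetical sketch (`find_calls_missing_kwarg` is not part of Hermes); it assumes the constructor is always referenced as `AIAgent` or `<module>.AIAgent`.

```python
import ast


def find_calls_missing_kwarg(source, func_name="AIAgent", kwarg="fallback_model"):
    """Return line numbers of func_name(...) calls that do not pass kwarg."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both AIAgent(...) and module.AIAgent(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != func_name:
            continue
        keywords = {kw.arg for kw in node.keywords}
        # kw.arg is None for **kwargs; treat that as "might be passed".
        if kwarg not in keywords and None not in keywords:
            missing.append(node.lineno)
    return sorted(missing)
```

Running this over the text of each tool module (for example via `pathlib.Path.rglob("*.py")` and `read_text()`) would list the candidate call sites to inspect by hand.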

Suggested fix directions

  • Tighten _pool_may_recover_from_rate_limit so that for FailoverReason.billing it always returns False (rotating credentials cannot recover an exhausted account-level balance, even with a 2-entry pool — both keys hit the same account).
  • Audit every AIAgent( constructor call outside run_agent.py to ensure fallback_model= is always plumbed through (similar to the recent fix in delegate_tool.py:1102).
  • Surface a clearer terminal error: when an agent dies due to an upstream 402 with no successful API call, the cron last_error should say "billing exhausted on <provider>, fallback chain rejected with <reason>" instead of Response truncated due to output length limit. The current message looks like a model bug, not an account bug.
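The first suggestion could look roughly like the sketch below. It follows the checks described under Hypothesis (pool is None, has_available(), len(entries()) > 1) but is not the actual implementation: the `reason` parameter, the `Reason` enum, and `FakePool` are hypothetical stand-ins, and the real function takes provider= and base_url= arguments that are omitted here.

```python
from enum import Enum, auto


class Reason(Enum):
    """Stand-in for FailoverReason (hypothetical)."""
    rate_limit = auto()
    billing = auto()


class FakePool:
    """Minimal pool exposing only the two methods named in this issue."""

    def __init__(self, entries):
        self._entries = list(entries)

    def has_available(self):
        return bool(self._entries)

    def entries(self):
        return self._entries


def pool_may_recover(pool, reason):
    """Return True only if rotating credentials could plausibly help."""
    if reason is Reason.billing:
        # Assumption from this issue: all keys share one account balance,
        # so rotation cannot recover from an exhausted balance.
        return False
    if pool is None or not pool.has_available():
        return False
    return len(pool.entries()) > 1
```

With this shape, a two-entry pool still allows rotation on rate limits, while any 402 goes straight to the fallback chain.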

Related issues

Workaround in use

Pre-run balance check via LaunchAgent that polls https://api.deepseek.com/user/balance every 6h and sends a Telegram warning when balance drops below a configured threshold. This sidesteps the bug but does not fix it.
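A minimal sketch of such a balance check follows. The response shape (an `is_available` flag plus a `balance_infos` list with `currency` and `total_balance` fields) is an assumption about the DeepSeek balance endpoint and should be verified against its documentation; `fetch_balance` and `is_low` are hypothetical names, and the Telegram notification is left out.

```python
import json
import urllib.request

BALANCE_URL = "https://api.deepseek.com/user/balance"


def fetch_balance(api_key, timeout=10):
    """GET the balance endpoint and return the decoded JSON payload."""
    request = urllib.request.Request(
        BALANCE_URL,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def is_low(payload, threshold, currency="USD"):
    """True if the balance for `currency` is below `threshold` or cannot be found."""
    # Assumed fields: is_available (bool), balance_infos (list of dicts).
    if not payload.get("is_available", False):
        return True
    for info in payload.get("balance_infos", []):
        if info.get("currency") == currency:
            return float(info.get("total_balance", 0)) < threshold
    # Unknown currency: treat as low so the operator gets warned.
    return True
```

A scheduled wrapper would call `is_low(fetch_balance(key), threshold)` and send the warning when it returns True.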

Labels

  • P1 (High: major feature broken, no workaround)
  • comp/agent (Core agent loop, run_agent.py, prompt builder)
  • comp/cron (Cron scheduler and job management)
  • sweeper:implemented-on-main (Sweeper: behavior already present on current main)
  • type/bug (Something isn't working)
