# [Bug] Eager fallback on HTTP 402 (`FailoverReason.billing`) does not activate; cron job loops on dead primary until output-length crash #23138

## Summary

When the primary provider returns HTTP 402 (`Insufficient Balance`), the eager-fallback path at `run_agent.py:13503-13527` is never activated, and the retry loop never advances past `attempt 1/3`. Instead, downstream code paths produce truncated or empty tool-call arguments (`Unrepairable tool_call arguments — replaced with empty object`). The model then hallucinates oversized completions, hits `finish_reason: length`, and the cron job dies with `RuntimeError: Response truncated due to output length limit`.

End-user impact: a cron job (`Email Mirror — Agent Briefings`, schedule `*/15 * * * *`) failed on every run from the moment the DeepSeek balance ran out until the user manually topped it up. `fallback_providers` was configured the whole time and was never used.

## Environment

- Hermes Agent **v0.13.0 (2026.5.7)** · upstream commit `44cdf555a83c1d8d605d095442e11efd58089533`
- Python 3.11.15
- OpenAI SDK 2.32.0
- macOS 14 (Darwin 25.4.0)

### Config (`~/.hermes/config.yaml`, relevant excerpt)

```yaml
model:
  provider: deepseek
  model: deepseek-v4-pro
  base_url: ''
providers: {}
fallback_providers:
- provider: openrouter
  model: openai/gpt-oss-120b:free
- provider: openrouter
  model: z-ai/glm-4.5-air:free
```

### Failing job

```json
{
  "id": "b7fdbe31fc65",
  "name": "Email Mirror — Agent Briefings",
  "model": "deepseek-v4-pro",
  "provider": "deepseek",
  "schedule": {"kind": "cron", "expr": "*/15 * * * *"}
}
```

The job pins provider+model explicitly. It runs an agent with multiple tool calls (Gmail MCP, `execute_code`, `memory`).

## Reproduction

1. Configure DeepSeek as primary and at least one OpenRouter free model in `fallback_providers` (as above).
2. Drain the DeepSeek balance to 0 USD (or block the API key).
3. Run any cron job that uses `provider: deepseek` and performs multiple tool calls.
4. Observe: every API call fails with HTTP 402, **no** fallback switch occurs, and the agent eventually crashes with `Response truncated due to output length limit` instead of `Insufficient balance`.
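If draining the balance or revoking a key is inconvenient, a local stub that answers every request with HTTP 402 may make step 2 repeatable. This is a minimal sketch, not DeepSeek's real API: the JSON error body is an assumption, and whether Hermes honors a custom `base_url` for `provider: deepseek` has not been verified.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Always402(BaseHTTPRequestHandler):
    """Answers every POST (e.g. /v1/chat/completions) with HTTP 402."""

    def do_POST(self) -> None:
        # Drain the request body so the client does not see a reset connection.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        # Assumed error shape; the real DeepSeek body may contain more fields.
        body = json.dumps({"error": {"message": "Insufficient Balance"}}).encode()
        self.send_response(402)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the console quiet


def start_stub(port: int = 0) -> ThreadingHTTPServer:
    """Start the stub on 127.0.0.1; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), Always402)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, call `server = start_stub()` and point `model.base_url` at `http://127.0.0.1:<port>/v1`, where `<port>` is `server.server_address[1]`.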

## Observed behavior

`~/.hermes/logs/gateway.error.log` (timestamps trimmed for brevity):

```
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=asyncio_2:… provider=deepseek base_url=https://api.deepseek.com/v1 model=deepseek-v4-pro summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=asyncio_2:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-1_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-3_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-5_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-7_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-9_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance

WARNING run_agent: Tool execute_code returned error (3.41s): {"status": "error", "output": "…stderr: Traceback…"}
WARNING run_agent: Tool execute_code returned error (3.31s): …
WARNING run_agent: Tool execute_code returned error (3.27s): …
WARNING run_agent: Unrepairable tool_call arguments for execute_code — replaced with empty object (was: import json, os, pathlib, re, datetime, subprocess, sys, glob, textwrap, traceba)
WARNING run_agent: Unrepairable tool_call arguments for mcp_gmail_send_email — replaced with empty object (was: {"body":"Quelle: Intelligence-Briefing Daily\nJob-ID: d1824b7c405c\nOutput-Datei)

ERROR cron.scheduler: Job 'Email Mirror — Agent Briefings' failed: RuntimeError: Response truncated due to output length limit
Traceback (most recent call last):
  File "/…/cron/scheduler.py", line 1565, in run_job
    raise RuntimeError(_err_text)
```

Things that are notably **absent** from the log:
- No `attempt 2/3` or `attempt 3/3` line — every attempt is `1/3`
- No `⚠️ Rate limited — switching to fallback provider...` (`_emit_status` at `run_agent.py:13522`)
- No "Credential … (billing) — rotated to pool entry …" log (`_recover_with_credential_pool` at `run_agent.py:7075`)
- No "Fallback skip" warning from `_try_activate_fallback`

So the code never reached the eager-fallback branch for any of the seven 402 failures logged above, across both the `asyncio_2` and `ThreadPoolExecutor-N_0` threads.
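To check these absences mechanically rather than by eye, a scan like the following could be run over `gateway.error.log`. It is a hypothetical helper: the regex assumes the `API call failed (attempt N/M) … thread=NAME:…` line format seen in the excerpt above, and the marker strings are copied from the list of missing messages.

```python
import re
from collections import Counter
from typing import Set, Tuple

# Matches the attempt number and the thread name (up to the first ':').
ATTEMPT_RE = re.compile(r"API call failed \(attempt (\d+)/(\d+)\).*?thread=([^:\s]+)")

# Substrings of the messages that would prove fallback/rotation was attempted.
FALLBACK_MARKERS = (
    "switching to fallback provider",
    "rotated to pool entry",
    "Fallback skip",
)


def summarize(log_text: str) -> Tuple[Counter, Set[str], Counter]:
    """Count failed attempts by attempt number, threads seen, and fallback markers."""
    attempts: Counter = Counter()
    threads: Set[str] = set()
    markers: Counter = Counter()
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match:
            attempts[int(match.group(1))] += 1
            threads.add(match.group(3))
        for marker in FALLBACK_MARKERS:
            if marker in line:
                markers[marker] += 1
    return attempts, threads, markers
```

A typical call would be `summarize(Path("~/.hermes/logs/gateway.error.log").expanduser().read_text())`; on the log above, one would expect only attempt `1` and an empty marker counter.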

## Expected behavior

Per `error_classifier.py:712-738` (`_classify_402`):

```python
return result_fn(
    FailoverReason.billing,
    retryable=False,
    should_rotate_credential=True,
    should_fallback=True,
)
```

…and `run_agent.py:13503-13527`:

```python
is_rate_limited = classified.reason in (
    FailoverReason.rate_limit,
    FailoverReason.billing,
)
if is_rate_limited and self._fallback_index < len(self._fallback_chain):
    pool_may_recover = _pool_may_recover_from_rate_limit(
        self._credential_pool, provider=self.provider, base_url=…)
    if not pool_may_recover:
        self._emit_status("⚠️ Rate limited — switching to fallback provider...")
        if self._try_activate_fallback(reason=classified.reason):
            retry_count = 0
            …
            continue
```

The first 402 should trip `FailoverReason.billing`, `_pool_may_recover_from_rate_limit` should return `False` (no credential pool configured for DeepSeek; `pool is None`), and the agent should switch to `openai/gpt-oss-120b:free` on OpenRouter without further retries.
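The expected control flow can be modeled in isolation to make this claim testable. The sketch below is a simplified, hypothetical model, not the code in `run_agent.py`: `FailoverReason` is a local stand-in, `ProviderError`, `classify_status` and `run_with_fallback` are invented for illustration, and the credential-pool check is a plain callable that defaults to False (matching the `pool is None` case).

```python
from enum import Enum
from typing import Callable, List, Tuple


class FailoverReason(Enum):
    # Local stand-in for the project's enum; only the members needed here.
    rate_limit = "rate_limit"
    billing = "billing"
    other = "other"


class ProviderError(Exception):
    # Hypothetical error type carrying the upstream HTTP status.
    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def classify_status(status: int) -> FailoverReason:
    # Greatly simplified: mirrors only the status-code part of the classifier.
    if status == 402:
        return FailoverReason.billing
    if status == 429:
        return FailoverReason.rate_limit
    return FailoverReason.other


def run_with_fallback(
    call: Callable[[str], str],
    chain: List[str],
    max_retries: int = 3,
    pool_may_recover: Callable[[], bool] = lambda: False,
) -> Tuple[str, List[str]]:
    """Try chain[0] (the primary) and switch eagerly on rate_limit/billing."""
    index, retry_count, events = 0, 0, []
    while True:
        provider = chain[index]
        try:
            return call(provider), events
        except ProviderError as exc:
            reason = classify_status(exc.status)
            events.append(f"{provider}: HTTP {exc.status} (attempt {retry_count + 1}/{max_retries})")
            eager = reason in (FailoverReason.rate_limit, FailoverReason.billing)
            if eager and index + 1 < len(chain) and not pool_may_recover():
                index += 1
                retry_count = 0  # fresh retry budget for the fallback provider
                events.append(f"switching to fallback {chain[index]}")
                continue
            retry_count += 1
            # billing is non-retryable: stop at once when no fallback is left
            if reason is FailoverReason.billing or retry_count >= max_retries:
                raise
```

In this model a dead primary produces exactly one `attempt 1/3` event followed by a switch, which is the sequence missing from the log above.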

## Hypothesis

I have not pinpointed the exact branch that swallows the 402. Three plausible candidates:

1. **`_pool_may_recover_from_rate_limit` returns True**: DeepSeek has no explicit credential pool, but `load_pool("deepseek")` might still return a pool containing the `.env` key as a single auto-loaded entry. In that case `pool.has_available()` is True, the function reaches the `len(pool.entries()) > 1` check, and it returns False, so this path should be safe. Worth confirming whether the loaded pool has 0 or 1 entries in this scenario.

2. **Sub-agent / tool worker loses `_fallback_chain`**: The `ThreadPoolExecutor-N_0` threads in the log indicate parallel tool execution (`run_agent.py:10584`). If any of those workers spins up a separate `AIAgent` (e.g. via `model_tools.py` `use_model`), the new agent may not inherit `_fallback_chain`, even though the agent constructed in `delegate_tool.py:1067` does. A grep across tool modules for `AIAgent(` calls that do not pass `fallback_model=` would catch this.

3. **The `asyncio_2` thread is the gateway/cron entrypoint and the 402 surfaces before the retry loop catches it**: The 402 may be raised once and propagate out of `run_conversation` before retry/fallback can run, e.g. during a streaming first-token call where the non-stream retry loop is not yet active. If the API call uses `chat_completions` in non-stream mode, retries should happen in `run_agent.py:13360+`; please confirm which code path the cron scheduler exercises.
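For hypothesis 2, an AST walk may be more reliable than a plain grep, because constructor calls often span several lines. This sketch assumes the class is called `AIAgent` and the keyword is literally `fallback_model`, as named in this issue; `find_missing_fallback` and `audit_tree` are hypothetical helpers. Calls that forward `**kwargs` are skipped, since they might carry the keyword implicitly.

```python
import ast
from pathlib import Path
from typing import Iterator, List, Tuple


def _callee_name(node: ast.Call) -> str:
    # Handles both `AIAgent(...)` and `module.AIAgent(...)`.
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def find_missing_fallback(source: str, filename: str = "<string>") -> List[Tuple[str, int]]:
    """Return (filename, lineno) for AIAgent(...) calls that lack fallback_model=."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call) and _callee_name(node) == "AIAgent":
            keywords = {kw.arg for kw in node.keywords}  # kw.arg is None for **kwargs
            if "fallback_model" not in keywords and None not in keywords:
                hits.append((filename, node.lineno))
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(hits, key=lambda hit: hit[1])


def audit_tree(root: str) -> Iterator[Tuple[str, int]]:
    """Scan every .py file under root."""
    for path in sorted(Path(root).rglob("*.py")):
        yield from find_missing_fallback(path.read_text(encoding="utf-8"), str(path))
```

Running `list(audit_tree("."))` from the repository root would list candidate call sites; each hit still needs a manual check, because the constructor may fill in a default fallback on its own.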

Whichever branch is responsible, the user-visible failure mode is the same: the configured fallback chain stays unused, the retry counter never increments past 1, and the job dies with a confusing `Response truncated` error that doesn't mention billing.

## Suggested fix directions

- Tighten `_pool_may_recover_from_rate_limit` so that it always returns False for `FailoverReason.billing`: rotating credentials cannot recover an exhausted account-level balance, even with a 2-entry pool, because both keys bill the same account.
- Audit every `AIAgent(` constructor call outside `run_agent.py` to ensure `fallback_model=` is always plumbed through (similar to the recent fix in `delegate_tool.py:1102`).
- Surface a clearer terminal error: when an agent dies due to an upstream 402 with no successful API call, the cron `last_error` should say "billing exhausted on `<provider>`, fallback chain rejected with `<reason>`" instead of `Response truncated due to output length limit`. The current message looks like a model bug, not an account bug.
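A minimal sketch of the first fix direction, assuming the current function behaves as described in hypothesis 1 (False when `pool is None`, otherwise True only when more than one entry is available). `pool_may_recover`, its `reason` parameter and the `CredentialPool` protocol are hypothetical; the real signature in `run_agent.py` may differ.

```python
from enum import Enum
from typing import List, Optional, Protocol


class FailoverReason(Enum):
    # Local stand-in for the project's enum.
    rate_limit = "rate_limit"
    billing = "billing"


class CredentialPool(Protocol):
    # Assumed shape, based only on the two methods named in this issue.
    def has_available(self) -> bool: ...
    def entries(self) -> List[str]: ...


def pool_may_recover(pool: Optional[CredentialPool], reason: FailoverReason) -> bool:
    """Hypothetical billing-aware variant of _pool_may_recover_from_rate_limit."""
    if reason is FailoverReason.billing:
        # Every key in the pool bills the same account, so rotating keys
        # cannot fix an exhausted balance: go straight to the fallback chain.
        return False
    if pool is None or not pool.has_available():
        return False
    # Rotation only helps when there is another key to rotate to.
    return len(pool.entries()) > 1
```

Checking the reason first keeps the billing decision independent of how many keys happen to be loaded, which is exactly the case hypothesis 1 could not rule out.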

## Related issues

- #21165 — "401 authentication errors do not trigger fallback provider in auxiliary_client.py" (same shape, different status code)
- #19411 — "Gateway fallback provider keeps primary model instead of fallback model" (related)
- #13887 (closed) — "Auxiliary auto fallback fails on OpenRouter 403 credit/key-limit errors" (closed but same pattern)
- #5220 (closed) — "Provider-side HTTP 402 can kill entire gateway service" (closed but the failure-mode cousin)
- #11737 — "Multi-provider credential pools for cross-provider failover and rotation" (related feature)

## Workaround in use

Pre-run balance check via LaunchAgent that polls `https://api.deepseek.com/user/balance` every 6h and sends a Telegram warning when balance drops below a configured threshold. This sidesteps the bug but does not fix it.
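The balance check could look roughly like this. It assumes DeepSeek's documented `GET /user/balance` response shape (`is_available` plus a `balance_infos` list with string `total_balance` values per `currency`); verify against the current API docs. The Telegram alert and the LaunchAgent plist are left out, and `fetch_balance_json` / `balance_below` are hypothetical names.

```python
import json
import urllib.request
from decimal import Decimal

BALANCE_URL = "https://api.deepseek.com/user/balance"


def fetch_balance_json(api_key: str, timeout: float = 10.0) -> str:
    """GET the balance endpoint with a Bearer token and return the raw JSON text."""
    req = urllib.request.Request(
        BALANCE_URL,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def balance_below(payload: str, threshold: Decimal, currency: str = "USD") -> bool:
    """True if the account is unavailable or its balance in `currency` is under threshold."""
    data = json.loads(payload)
    if not data.get("is_available", False):
        return True
    for info in data.get("balance_infos", []):
        if info.get("currency") == currency:
            # Balances are assumed to be decimal strings, e.g. "4.20".
            return Decimal(info["total_balance"]) < threshold
    # No entry in the requested currency: treat as low so someone looks at it.
    return True
```

A scheduled run might call `balance_below(fetch_balance_json(os.environ["DEEPSEEK_API_KEY"]), Decimal("2"))`, where the environment variable name and threshold are assumptions. Using `Decimal` avoids float rounding on the string amounts.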
