[Bug] Interactive CLI session does not auto-fallback on Codex 429 'usage_limit_reached', while cron jobs with the same fallback chain do

# [Bug] Interactive CLI session does not auto-fallback on Codex 429 `usage_limit_reached`, while cron jobs with the same fallback chain do

## Summary

When the primary provider is `openai-codex/gpt-5.5` and Codex returns HTTP 429 `usage_limit_reached` (the periodic 5-hour quota wall, not billing), an interactive CLI session exhausts its 3 retries against Codex and surfaces `API call failed after 3 retries: HTTP 429: The usage limit has been reached` to the user. The configured `fallback_providers` chain is **never activated** in this path, even though `hermes fallback list` confirms it is loaded.

The exact same `fallback_providers` chain **does activate successfully** for cron jobs running concurrently against the same Codex quota.

## Environment

- Hermes Agent: tip of `main` at commit `0d41e94ca` (`feat(i18n): add French (fr) locale support`, 2026-05-05) — also reproduced on the v0.12.0 release (`87b113c2e`).
- Install: WSL2 Ubuntu 24.04 on Windows 11.
- Primary: `openai-codex` / `gpt-5.5` (ChatGPT Plus subscription, OAuth via `hermes auth`).
- Fallback chain (verified via `hermes fallback list`): one entry, tested with three configurations — all show identical broken behavior in interactive sessions:
  1. `provider: custom` + `base_url: http://host.docker.internal:11434/v1` + `api_key: ollama` + `model: qwen3.6:35b-a3b-q4_K_M` (local Ollama on Windows host)
  2. `provider: ollama-local` (named provider defined in `providers:` block) + `model: qwen3.6:35b-a3b-q4_K_M`
  3. `provider: openrouter` + `model: openai/gpt-5.5`

## Reproduction

1. Configure `model.provider: openai-codex` / `model.default: gpt-5.5` and any working `fallback_providers` chain (verified loaded by `hermes fallback list`).
2. Use Hermes interactively until the Codex 5-hour quota window is hit. Error returned by Codex:
   ```json
   {"error": {"type": "usage_limit_reached", "message": "The usage limit has been reached", "plan_type": "plus", "resets_at": <epoch>, "resets_in_seconds": <seconds>}}
   ```
3. Send another message in the interactive chat. Hermes retries 3 times against Codex, then surfaces:
   ```
   API call failed after 3 retries: HTTP 429: The usage limit has been reached
   ```
4. The fallback model is never tried; `provider=openai-codex` is reported in the error log.
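
To make the trigger in step 2 concrete, here is a minimal sketch of how a client could tell this quota wall apart from an ordinary rate limit using only the fields in the payload above. The function name `classify_429` and the sample `resets_at` / `resets_in_seconds` values are placeholders of mine, not Hermes code or real values from my logs.

```python
import json
from datetime import datetime, timezone


def classify_429(body: str) -> dict:
    """Report whether a 429 body is the Codex quota wall and when it resets."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return {"quota_wall": False, "resets_at": None}

    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return {"quota_wall": False, "resets_at": None}

    # The quota wall is identified by its error type, not by the status code.
    quota_wall = error.get("type") == "usage_limit_reached"
    resets_at = error.get("resets_at")
    when = None
    if quota_wall and isinstance(resets_at, (int, float)):
        when = datetime.fromtimestamp(resets_at, tz=timezone.utc)
    return {"quota_wall": quota_wall, "resets_at": when}


# Placeholder payload shaped like the error in step 2 (numeric values invented).
sample = json.dumps({
    "error": {
        "type": "usage_limit_reached",
        "message": "The usage limit has been reached",
        "plan_type": "plus",
        "resets_at": 1778040000,
        "resets_in_seconds": 3600,
    }
})

result = classify_429(sample)
print(result["quota_wall"], result["resets_at"])
```

Retrying the same provider cannot succeed before `resets_at`, which is why I would expect this error type to go to the fallback chain.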

## Evidence — divergence between cron and interactive paths

Same chain, same Codex 429, same wall-clock window:

```
# Cron jobs — fallback activates correctly
2026-05-06 00:30:31 INFO [cron_8bc7a04c68f9_20260506_003030] root: Fallback activated: gpt-5.5 → nemotron-3-nano:30b (custom)
2026-05-06 01:00:51 INFO [cron_8bc7a04c68f9_20260506_010049] root: Fallback activated: gpt-5.5 → openai/gpt-5.5 (openrouter)
2026-05-06 01:30:50 INFO [cron_8bc7a04c68f9_20260506_013049] root: Fallback activated: gpt-5.5 → qwen3.6:35b-a3b-q4_K_M (ollama-local)
2026-05-06 01:45:44 INFO [cron_8bc7a04c68f9_20260506_014544] root: Fallback activated: gpt-5.5 → qwen3.6:35b-a3b-q4_K_M (ollama-local)

# Interactive sessions — fallback never fires (same chain, same time window)
2026-05-06 01:42:12 ERROR [20260506_014139_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487
2026-05-06 01:44:27 ERROR [20260506_014356_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487
2026-05-06 01:48:09 ERROR [20260506_014753_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487
[…repeats every few minutes for the duration of the quota window…]
```
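
For anyone who wants to check their own logs for the same split, a small script along these lines tallies the two outcomes per session type. It assumes the line layout shown in the excerpt above (`timestamp LEVEL [session] logger: message`) and treats any session id that does not start with `cron_` as interactive. That second assumption is mine and may not hold for gateway sessions.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above; other log lines are skipped.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<session>[^\]]+)\] "
    r"(?P<logger>[^:]+): (?P<msg>.*)$"
)


def tally(lines):
    """Count fallback activations and exhausted 429 retries per session kind."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        kind = "cron" if m["session"].startswith("cron_") else "interactive"
        msg = m["msg"]
        if msg.startswith("Fallback activated"):
            counts[(kind, "fallback_activated")] += 1
        elif msg.startswith("API call failed after") and "HTTP 429" in msg:
            counts[(kind, "exhausted_429")] += 1
    return counts


sample_lines = [
    "2026-05-06 00:30:31 INFO [cron_8bc7a04c68f9_20260506_003030] root: Fallback activated: gpt-5.5 → nemotron-3-nano:30b (custom)",
    "2026-05-06 01:00:51 INFO [cron_8bc7a04c68f9_20260506_010049] root: Fallback activated: gpt-5.5 → openai/gpt-5.5 (openrouter)",
    "2026-05-06 01:42:12 ERROR [20260506_014139_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487",
    "2026-05-06 01:44:27 ERROR [20260506_014356_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487",
]

for key, count in sorted(tally(sample_lines).items()):
    print(key, count)
```

A healthy setup should show `fallback_activated` under both kinds. In my logs, the interactive kind only ever shows `exhausted_429`.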

Every interactive failure reports `provider=openai-codex`, which indicates that `_try_activate_fallback()` was either never called or returned `False` silently. None of the three log lines I would expect from that function appears: `Fallback activated`, `Fallback to <provider> failed: provider not configured`, or `Failed to activate fallback`.

The auxiliary client did log a successful fallback for one `title_generation` call earlier in the same session window:

```
2026-05-06 00:28:14 INFO agent.auxiliary_client: Auxiliary title_generation: rate limit on auto (Error code: 429 - {'error': {'type': 'usage_limit_reached', ...}}), trying fallback
```

…so aux-side fallback (post-PR #20294) appears to work. Only the main-agent interactive retry-then-fallback handoff, at `run_agent.py` around line 12879, produces no "trying fallback" / "Fallback activated" / "Failed to activate fallback" output for this 429 type.
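
To spell out the behavior I expected from that handoff, here is a deliberately simplified sketch. It is not Hermes code: `call_with_fallback`, `ProviderError`, and `fake_send` are hypothetical names, and the real retry loop in `run_agent.py` is certainly more involved.

```python
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


def call_with_fallback(
    providers: Sequence[str],
    send: Callable[[str], str],
    max_retries: int = 3,
) -> Tuple[str, str]:
    """Retry each provider up to max_retries times, then move to the next one."""
    last_error = None
    for provider in providers:
        for _attempt in range(max_retries):
            try:
                return provider, send(provider)
            except ProviderError as exc:
                last_error = exc
        # Retries for this provider are exhausted: hand off to the next entry.
    raise RuntimeError(
        f"all providers failed after {max_retries} retries each"
    ) from last_error


def fake_send(provider: str) -> str:
    """Stand-in transport: the primary always hits the quota wall."""
    if provider == "openai-codex":
        raise ProviderError("HTTP 429: The usage limit has been reached")
    return "pong"


print(call_with_fallback(["openai-codex", "ollama-local"], fake_send))
```

In the interactive path the observed behavior matches only the inner loop: three attempts against `openai-codex`, then the error is surfaced without moving to the next entry.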

## What I tried

- Verified `hermes fallback list` shows the chain loaded.
- Migrated config from legacy `fallback_model:` (single dict) to `fallback_providers:` (list) — no change.
- Tried three different fallback provider configs (above) — no change.
- Restarted Hermes from a fresh shell (no `/continue`) — no change.
- Ran `hermes update -y` to pull current `main` (`0d41e94ca`) — no change.
- `hermes -z "say pong" -m qwen3.6:35b-a3b-q4_K_M --provider ollama-local` returns `pong` (confirms the fallback target itself is reachable and configured correctly).

## Possibly related open issues

- #19839 — apply fallback cooldown for all failover reasons (cooldown gating may be related, but doesn't explain "fallback never fires at all").
- #17446 — Fallback announced but never sent (similar symptom shape, different trigger).
- #19411 — Gateway fallback keeps primary model (gateway-specific; this report is CLI/interactive).
- #15714 — Aux compression ignores `fallback_providers` (separate bug).

This report appears distinct: the *main agent* in *interactive* mode never produces any fallback-attempt log line for Codex `usage_limit_reached` 429s, while the *cron* path under the same agent code on the same chain consistently does.

## Suggested diagnostic

A debug log line at the entry of `_try_activate_fallback()` and at every `return False` site in `run_agent.py` would let users with this symptom confirm in one repro which branch is failing. If the function isn't being called at all on Codex 429 in interactive mode, the divergence is upstream — likely in how `codex_responses` API errors propagate out of `_run_codex_stream` versus how the chat-completions retry loop catches them.
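
Roughly what I have in mind, as a sketch only. The function below is a stand-in, not the real `_try_activate_fallback()`: its parameters and its three `return False` branches are hypothetical, since I have not traced which conditions the real function checks. The point is the pattern of one debug line on entry and one per early return.

```python
import logging

logger = logging.getLogger("fallback_debug")


def try_activate_fallback(chain, index):
    """Stand-in for the real function; every branch condition is hypothetical."""
    logger.debug(
        "try_activate_fallback: entered (index=%d, chain_len=%d)",
        index, len(chain),
    )
    if not chain:
        logger.debug("try_activate_fallback: return False (empty chain)")
        return False
    if index >= len(chain):
        logger.debug("try_activate_fallback: return False (chain exhausted)")
        return False
    entry = chain[index]
    if not entry.get("provider") or not entry.get("model"):
        logger.debug("try_activate_fallback: return False (incomplete entry)")
        return False
    logger.info("Fallback activated: %s (%s)", entry["model"], entry["provider"])
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    chain = [{"provider": "ollama-local", "model": "qwen3.6:35b-a3b-q4_K_M"}]
    try_activate_fallback(chain, 0)  # logs entry, then the activation line
    try_activate_fallback(chain, 1)  # logs entry, then "chain exhausted"
```

With lines like these, a log containing no "entered" message at all would point at the caller, and a log with "entered" followed by a specific `return False` reason would point at the branch.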

## Severity

Effectively breaks the documented use case of "primary subscription provider with local Ollama as cost-free fallback". Users hit a 5-hour wall on every Codex quota cycle with no automatic recovery, despite the fallback chain being correctly configured per `hermes fallback list`.

