[Bug] Interactive CLI session does not auto-fallback on Codex 429 'usage_limit_reached', while cron jobs with the same fallback chain do #20465

@ddoKx

Summary

When the primary provider is openai-codex/gpt-5.5 and Codex returns HTTP 429 usage_limit_reached (the periodic 5-hour quota wall, not a billing error), an interactive CLI session exhausts its 3 retries against Codex and then shows the user "API call failed after 3 retries: HTTP 429: The usage limit has been reached". The configured fallback_providers chain is never activated in this path, even though hermes fallback list confirms it is loaded.

The exact same fallback_providers chain does activate successfully for cron jobs running concurrently against the same Codex quota.
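For reference, the behavior this report expects from the interactive path can be sketched as follows. This is a minimal illustration only, not Hermes code: ProviderError, call_with_fallback, and the two stub providers are hypothetical names, and a real client would also back off between retries.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error type standing in for a failed provider call."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    max_retries: int = 3,
) -> str:
    """Try each provider in order, retrying each up to max_retries times."""
    last_error: Optional[ProviderError] = None
    for call in providers:
        for _ in range(max_retries):
            try:
                return call(prompt)
            except ProviderError as exc:
                last_error = exc  # remember the failure, then retry or move on
    raise RuntimeError(f"all providers failed: {last_error}")


attempts = {"primary": 0}


def primary(prompt: str) -> str:
    # Simulates the quota wall: every call fails with the 429 from this report.
    attempts["primary"] += 1
    raise ProviderError(429, "The usage limit has been reached")


def fallback(prompt: str) -> str:
    # Simulates a healthy fallback target.
    return "pong"


result = call_with_fallback([primary, fallback], "say pong")
print(result, attempts["primary"])
```

In this sketch the primary is tried three times and the request is then handed to the next entry in the chain, which is the handoff the interactive session appears to skip.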

Environment

  • Hermes Agent: tip of main at commit 0d41e94ca (feat(i18n): add French (fr) locale support, 2026-05-05) — also reproduced on the v0.12.0 release (87b113c2e).
  • Install: WSL2 Ubuntu 24.04 on Windows 11.
  • Primary: openai-codex / gpt-5.5 (ChatGPT Plus subscription, OAuth via hermes auth).
  • Fallback chain (verified via hermes fallback list): one entry, tested with three configurations — all show identical broken behavior in interactive sessions:
    1. provider: custom + base_url: http://host.docker.internal:11434/v1 + api_key: ollama + model: qwen3.6:35b-a3b-q4_K_M (local Ollama on Windows host)
    2. provider: ollama-local (named provider defined in providers: block) + model: qwen3.6:35b-a3b-q4_K_M
    3. provider: openrouter + model: openai/gpt-5.5

Reproduction

  1. Configure model.provider: openai-codex / model.default: gpt-5.5 and any working fallback_providers chain (verified loaded by hermes fallback list).
  2. Use Hermes interactively until the Codex 5-hour quota window is hit. Error returned by Codex:
    {"error": {"type": "usage_limit_reached", "message": "The usage limit has been reached", "plan_type": "plus", "resets_at": <epoch>, "resets_in_seconds": <seconds>}}
  3. Send another message in the interactive chat. Hermes retries 3 times against Codex, then surfaces:
    API call failed after 3 retries: HTTP 429: The usage limit has been reached
    
  4. The fallback model is never tried; provider=openai-codex is reported in the error log.
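The error body in step 2 carries enough information to tell the quota wall apart from a transient rate limit. A minimal sketch of such a check, assuming the payload shape shown above; QuotaWall and parse_quota_wall are hypothetical names, and the numeric values in the sample are placeholders for the elided <epoch> and <seconds>:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaWall:
    """Parsed details of a usage_limit_reached response."""
    plan_type: Optional[str]
    resets_in_seconds: Optional[int]


def parse_quota_wall(status_code: int, body: str) -> Optional[QuotaWall]:
    """Return a QuotaWall for a 429 usage_limit_reached body, else None."""
    if status_code != 429:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or error.get("type") != "usage_limit_reached":
        return None
    return QuotaWall(
        plan_type=error.get("plan_type"),
        resets_in_seconds=error.get("resets_in_seconds"),
    )


# Placeholder values: the report elides the real resets_at / resets_in_seconds.
sample = json.dumps({
    "error": {
        "type": "usage_limit_reached",
        "message": "The usage limit has been reached",
        "plan_type": "plus",
        "resets_at": 1778000000,
        "resets_in_seconds": 9000,
    }
})
wall = parse_quota_wall(429, sample)
print(wall)
```

When resets_in_seconds is large, retrying the same provider cannot succeed, so a check like this could justify moving to the fallback chain without spending the retries first.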

Evidence — divergence between cron and interactive paths

Same chain, same Codex 429, same wall-clock window:

# Cron jobs — fallback activates correctly
2026-05-06 00:30:31 INFO [cron_8bc7a04c68f9_20260506_003030] root: Fallback activated: gpt-5.5 → nemotron-3-nano:30b (custom)
2026-05-06 01:00:51 INFO [cron_8bc7a04c68f9_20260506_010049] root: Fallback activated: gpt-5.5 → openai/gpt-5.5 (openrouter)
2026-05-06 01:30:50 INFO [cron_8bc7a04c68f9_20260506_013049] root: Fallback activated: gpt-5.5 → qwen3.6:35b-a3b-q4_K_M (ollama-local)
2026-05-06 01:45:44 INFO [cron_8bc7a04c68f9_20260506_014544] root: Fallback activated: gpt-5.5 → qwen3.6:35b-a3b-q4_K_M (ollama-local)

# Interactive sessions — fallback never fires (same chain, same time window)
2026-05-06 01:42:12 ERROR [20260506_014139_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487
2026-05-06 01:44:27 ERROR [20260506_014356_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487
2026-05-06 01:48:09 ERROR [20260506_014753_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487
[…repeats every few minutes for the duration of the quota window…]

Every interactive failure reports provider=openai-codex, and none of the three log lines I'd expect from a fallback attempt appears: "Fallback activated", "Fallback to <provider> failed: provider not configured", or "Failed to activate fallback". That suggests _try_activate_fallback() was either never called or returned False without logging anything.
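Anyone wanting to check their own logs for the same divergence could tally outcomes per session type with a short script. This is a sketch that assumes the line format shown in the excerpts above and that cron session IDs start with cron_; classify and summarize are hypothetical helper names, and the sample lines are copied from this report.

```python
import re
from collections import Counter

# Matches lines like: "<date> <time> LEVEL [session] logger: message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<session>[^\]]+)\] "
    r"(?P<logger>[^:]+): (?P<msg>.*)$"
)


def classify(line):
    """Return (session_kind, outcome) for a session log line, else None."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    kind = "cron" if m["session"].startswith("cron_") else "interactive"
    msg = m["msg"]
    if msg.startswith("Fallback activated"):
        outcome = "fallback_activated"
    elif msg.startswith("API call failed after"):
        outcome = "retries_exhausted"
    else:
        outcome = "other"
    return kind, outcome


def summarize(lines):
    """Count (session_kind, outcome) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result] += 1
    return counts


SAMPLE = [
    "2026-05-06 00:30:31 INFO [cron_8bc7a04c68f9_20260506_003030] root: Fallback activated: gpt-5.5 → nemotron-3-nano:30b (custom)",
    "2026-05-06 01:00:51 INFO [cron_8bc7a04c68f9_20260506_010049] root: Fallback activated: gpt-5.5 → openai/gpt-5.5 (openrouter)",
    "2026-05-06 01:42:12 ERROR [20260506_014139_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487",
    "2026-05-06 01:44:27 ERROR [20260506_014356_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487",
]
summary = summarize(SAMPLE)
for key, count in sorted(summary.items()):
    print(key, count)
```

On the sample, fallback activations show up only under cron sessions and exhausted retries only under interactive ones, which is the pattern described in this section.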

The auxiliary client did log a successful fallback for one title_generation call earlier in the same session window:

2026-05-06 00:28:14 INFO agent.auxiliary_client: Auxiliary title_generation: rate limit on auto (Error code: 429 - {'error': {'type': 'usage_limit_reached', ...}}), trying fallback

So aux-side fallback (post-PR #20294) appears to work. Only the main-agent interactive retry-then-fallback handoff, around line 12879 of run_agent.py, produces no "trying fallback", "Fallback activated", or "Failed to activate fallback" output for this 429 type.

What I tried

  • Verified hermes fallback list shows the chain loaded.
  • Migrated config from legacy fallback_model: (single dict) to fallback_providers: (list) — no change.
  • Tried three different fallback provider configs (above) — no change.
  • Restarted Hermes from a fresh shell (no /continue) — no change.
  • Ran hermes update -y to pull current main (0d41e94ca) — no change.
  • hermes -z "say pong" -m qwen3.6:35b-a3b-q4_K_M --provider ollama-local returns pong (confirms the fallback target itself is reachable and configured correctly).

Possibly related open issues

This report appears distinct: the main agent in interactive mode never produces any fallback-attempt log line for Codex usage_limit_reached 429s, while the cron path under the same agent code on the same chain consistently does.

Suggested diagnostic

A debug log line at the entry of _try_activate_fallback() and at every return False site in run_agent.py would let users with this symptom confirm in one repro which branch is failing. If the function isn't being called at all on Codex 429 in interactive mode, the divergence is upstream — likely in how codex_responses API errors propagate out of _run_codex_stream versus how the chat-completions retry loop catches them.
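As a rough illustration of the entry/exit half of that suggestion, a decorator can log every call and its return value without touching the function body. This is only a sketch: the logger name, log_entry_and_result, and try_activate_fallback_stub are hypothetical, and the real signature of _try_activate_fallback() is not shown in this report.

```python
import functools
import logging

logger = logging.getLogger("fallback_debug")  # hypothetical logger name


def log_entry_and_result(func):
    """Log entry, return value, and exceptions of the wrapped function at DEBUG."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            logger.debug("%s raised", func.__name__, exc_info=True)
            raise
        logger.debug("exit %s -> %r", func.__name__, result)
        return result
    return wrapper


@log_entry_and_result
def try_activate_fallback_stub(chain):
    """Hypothetical stand-in: succeeds only when the chain is non-empty."""
    return bool(chain)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    try_activate_fallback_stub([])
    try_activate_fallback_stub(["ollama-local"])
```

A wrapper like this answers whether the function is called at all and what it returned. Telling apart the individual return False branches would still need a log line at each of those sites.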

Severity

This effectively breaks the documented use case of "primary subscription provider with local Ollama as cost-free fallback". Interactive users hit the 5-hour wall on every Codex quota cycle with no automatic recovery, even though hermes fallback list shows the fallback chain as correctly configured.

Labels

  • P1 (High: major feature broken, no workaround)
  • comp/agent (Core agent loop, run_agent.py, prompt builder)
  • comp/cli (CLI entry point, hermes_cli/, setup wizard)
  • provider/openai (OpenAI / Codex Responses API)
  • type/bug (Something isn't working)
