Summary
After openai-codex failed and Hermes activated a cross-provider fallback, the fallback provider's 429 was recorded against the primary openai-codex OAuth credential pool.
This made a valid Codex OAuth session appear rate-limited/exhausted even though the quota/billing error came from the fallback provider.
Related context: #32903 / #32963 fixed the original openai-codex null-output streaming crash. This issue is about the secondary provider-state contamination after fallback activation, not the null-output parser crash itself.
Environment
- Hermes Agent:
v0.14.0, local checkout 2517917de before pulling main
- OpenAI SDK in Hermes venv:
2.24.0
- Primary provider/model:
openai-codex / gpt-5.5
- Primary base URL:
https://chatgpt.com/backend-api/codex
- Fallback provider/model:
zai / glm-5.1
Observed sequence
Sanitized local log sequence:
2026-05-27 14:28:42 CST Fallback activated: gpt-5.5 -> glm-5.1 (zai)
2026-05-27 14:28:43 CST zai returned 429 code=1113 "Insufficient balance or no resource package"
2026-05-27 14:28:48 CST credential pool: marking openai-codex-oauth-2 exhausted
After this, hermes auth list showed the openai-codex OAuth credentials as rate-limited/exhausted with the Z.AI 1113 error. That sent debugging in the wrong direction: it looked like a Codex OAuth/quota problem, while the actual 429 came from the fallback provider.
Resetting the openai-codex auth state cleared the stale exhausted status and the same Codex OAuth credentials worked again.
Expected behavior
When fallback activation switches the active provider from openai-codex to zai, credential-pool recovery should not mutate the stale openai-codex pool based on a zai response.
Either:
- the active credential pool should be reloaded for the fallback provider, or
- the stale primary pool should be cleared before fallback calls continue, and
- recovery code should skip pool mutation if the pool provider does not match the current agent provider.
Local patch shape that fixed the misattribution
I tested a narrow local patch with two guardrails:
-
In fallback activation, after assigning agent.provider = fb_provider, reload or clear agent._credential_pool for the fallback provider.
-
In recover_with_credential_pool(), before calling mark_exhausted_and_rotate(), skip credential-pool recovery when the loaded pool provider differs from the agent's current provider.
The important defensive check is:
if pool_provider and current_provider and pool_provider != current_provider:
return False, has_retried_429
Audit notes
- The change is intentionally narrow: it only realigns the in-memory pool after a provider switch and adds a defensive provider-match check before any pool mutation.
- If the fallback provider has no credential pool, clearing
agent._credential_pool is safer than retaining and mutating the stale primary pool.
- One caveat: if Hermes supports provider aliases for credential pools, this guard should probably use the same provider-name normalization used elsewhere. For the observed
openai-codex -> zai fallback, the mismatch is unambiguous.
Local validation
pytest tests/run_agent/test_provider_fallback.py tests/agent/test_credential_pool_routing.py -q
# 34 passed
hermes auth reset openai-codex
hermes -z "只回复 OK" --provider openai-codex -m gpt-5.5 --ignore-rules
# OK
Summary
After
openai-codexfailed and Hermes activated a cross-provider fallback, the fallback provider's429was recorded against the primaryopenai-codexOAuth credential pool.This made a valid Codex OAuth session appear rate-limited/exhausted even though the quota/billing error came from the fallback provider.
Related context: #32903 / #32963 fixed the original
openai-codexnull-output streaming crash. This issue is about the secondary provider-state contamination after fallback activation, not the null-output parser crash itself.Environment
v0.14.0, local checkout2517917debefore pulling main2.24.0openai-codex/gpt-5.5https://chatgpt.com/backend-api/codexzai/glm-5.1Observed sequence
Sanitized local log sequence:
After this,
hermes auth listshowed theopenai-codexOAuth credentials as rate-limited/exhausted with the Z.AI1113error. That sent debugging in the wrong direction: it looked like a Codex OAuth/quota problem, while the actual429came from the fallback provider.Resetting the
openai-codexauth state cleared the stale exhausted status and the same Codex OAuth credentials worked again.Expected behavior
When fallback activation switches the active provider from
openai-codextozai, credential-pool recovery should not mutate the staleopenai-codexpool based on azairesponse.Either:
Local patch shape that fixed the misattribution
I tested a narrow local patch with two guardrails:
In fallback activation, after assigning
agent.provider = fb_provider, reload or clearagent._credential_poolfor the fallback provider.In
recover_with_credential_pool(), before callingmark_exhausted_and_rotate(), skip credential-pool recovery when the loaded pool provider differs from the agent's current provider.The important defensive check is:
Audit notes
agent._credential_poolis safer than retaining and mutating the stale primary pool.openai-codex->zaifallback, the mismatch is unambiguous.Local validation