fix: 24h cooldown for 401/403 auth failures + user notification #10058

Closed
teknium1 wants to merge 1 commit into main from hermes/hermes-050c727e

@teknium1

Contributor

Summary

Addresses a user-reported UX issue where invalid credentials (Copilot 401, Codex 429) caused silent fallback with no recovery path visible to the user.

Two changes:

1. 401/403 auth failures now have a 24-hour cooldown instead of 1 hour

Previously, credentials exhausted due to 401 (invalid token) or 403 (forbidden) used the same 1-hour cooldown as 429 rate limits. The system would retry the same dead token every hour, fail immediately, and re-exhaust — an infinite cycle. Now:

| Error code | Cooldown | Rationale |
| --- | --- | --- |
| 429 (rate limit) | 1 hour | Transient, likely resets soon |
| 401/403 (auth failure) | 24 hours | Token permanently invalid until user re-authenticates |
| Provider reset_at | Exact time | Always overrides defaults |
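
The cooldown rule in the table above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: only `EXHAUSTED_TTL_AUTH_SECONDS` and the idea of an `_exhausted_ttl()` helper come from the description. The name `EXHAUSTED_TTL_DEFAULT_SECONDS`, the function signatures, and the use of epoch seconds for `reset_at` are assumptions made for the example.

```python
from typing import Optional

# Hypothetical name for the existing 1-hour cooldown (used for 429 rate limits).
EXHAUSTED_TTL_DEFAULT_SECONDS = 60 * 60
# Constant named in this PR: 24-hour cooldown for 401/403 auth failures.
EXHAUSTED_TTL_AUTH_SECONDS = 24 * 60 * 60


def exhausted_ttl(status_code: Optional[int]) -> int:
    """Return the cooldown in seconds for a credential that failed with status_code."""
    if status_code in (401, 403):
        # Auth failures are assumed to persist until the user re-authenticates.
        return EXHAUSTED_TTL_AUTH_SECONDS
    # Rate limits (429) and anything else keep the shorter default.
    return EXHAUSTED_TTL_DEFAULT_SECONDS


def exhausted_until(
    status_code: Optional[int],
    marked_at: float,
    reset_at: Optional[float] = None,
) -> float:
    """Return the epoch time at which the credential may be retried."""
    if reset_at is not None:
        # A provider-supplied reset time always overrides the defaults.
        return reset_at
    return marked_at + exhausted_ttl(status_code)


def is_available(
    status_code: Optional[int],
    marked_at: float,
    now: float,
    reset_at: Optional[float] = None,
) -> bool:
    """True once the cooldown for an exhausted credential has elapsed."""
    return now >= exhausted_until(status_code, marked_at, reset_at)
```

Under these assumptions, a credential marked exhausted with a 401 is still unavailable one hour later and becomes eligible again only after 24 hours, while a 429 becomes eligible after one hour.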

2. User-facing notification when all credentials are rejected

When all pool credentials for a provider get 401'd and the system falls back, it now emits an actionable message via _emit_status():

🔐 All copilot credentials rejected (HTTP 401). Run `hermes auth reset copilot` to clear, or `hermes model` to re-authenticate.

This propagates to both CLI (force-printed regardless of quiet mode) and gateway (Telegram, Discord, etc. via status_callback).
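
A minimal sketch of the notification decision, assuming the behavior described here: the message is emitted only when rotation within the provider's pool has failed on an auth error, and stays silent when another key in the pool takes over. The helper names and the `rotated` flag are hypothetical. In the PR the message goes through `_emit_status()`, which is represented here by a plain callback.

```python
from typing import Callable


def build_auth_exhausted_message(provider: str, status_code: int) -> str:
    """Format the actionable message quoted in the PR description."""
    return (
        f"🔐 All {provider} credentials rejected (HTTP {status_code}). "
        f"Run `hermes auth reset {provider}` to clear, "
        "or `hermes model` to re-authenticate."
    )


def notify_on_auth_exhaustion(
    rotated: bool,
    provider: str,
    status_code: int,
    emit_status: Callable[[str], None],
) -> bool:
    """Emit the message only when no credential in the pool could take over.

    `rotated` is a hypothetical flag: True if another credential for the same
    provider was selected. Returns True if a message was emitted.
    """
    if rotated:
        # Same-provider key rotation succeeded, so stay silent.
        return False
    if status_code not in (401, 403):
        # Only auth rejections produce this notification.
        return False
    emit_status(build_auth_exhausted_message(provider, status_code))
    return True
```

Passing `print` as `emit_status` approximates the CLI path. A gateway would pass its own callback, which corresponds to the `status_callback` mentioned above.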

Files changed

  • agent/credential_pool.py — new EXHAUSTED_TTL_AUTH_SECONDS constant, updated _exhausted_ttl()
  • run_agent.py — notification in _recover_with_credential_pool() auth path
  • tests/agent/test_credential_pool.py — 3 new TTL tests (401 stays exhausted at 1h, 401 resets at 24h, 403 stays exhausted at 1h)
  • tests/agent/test_credential_pool_routing.py — 2 new notification tests (emits on pool exhaustion, silent on rotation)

Test results

All 5 new tests pass. 0 new failures introduced (6 pre-existing failures confirmed on unmodified main).

Previously, credentials exhausted due to 401 (invalid token) or 403
(forbidden) used the same 1-hour cooldown as 429 rate limits. This meant
the system would retry an invalid token every hour forever — burning API
calls and confusing users who had no idea why their primary provider
wasn't being used.

Changes:
- credential_pool: EXHAUSTED_TTL_AUTH_SECONDS = 24h for 401/403 errors
  (rate limits keep 1h cooldown, provider reset_at timestamps still
  override both)
- run_agent: emit actionable status message via _emit_status() when all
  pool credentials are rejected — tells the user to run
  `hermes auth reset <provider>` or `hermes model` to re-authenticate.
  Message propagates to both CLI (force-printed) and gateway (Telegram,
  Discord, etc.)
- Tests for all three TTL cases (401 stays exhausted at 1h, 401 resets
  at 24h, 403 stays exhausted at 1h) and auth exhaustion notification
  (emits when pool exhausted, silent when rotation succeeds)

Addresses user report: Copilot 401 + Codex 429 caused silent fallback
with no recovery path visible to the user.
@teknium1

Contributor Author

Closing — keeping the existing 1h cooldown as-is.

@teknium1 teknium1 closed this Apr 15, 2026
aangelinsf pushed a commit to aangelinsf/hermes-agent that referenced this pull request Apr 15, 2026
…th-exhausted

When all credentials for a provider are exhausted due to 401/403 failures,
emit a plain-language _emit_status() notification so gateway users (Telegram,
Discord, etc.) know their primary AI has become unavailable and what to do.

Same-provider key rotation remains silent — the message only fires when
rotation itself fails and Hermes is forced to fall back.

This is distinct from the cooldown duration change in PR NousResearch#10058 (which was
closed). The notification half of that fix stands on its own: the configured
fallback_model path already calls _emit_status() on provider switch, so this
makes the credential pool exhaustion path consistent with that behavior.

Closes NousResearch#10476