
Credential pool: token_invalidated / token_revoked should be terminal failures, not 1-hour cooldowns #32849

@markschonfeld

Description


Summary

When an OpenAI-Codex OAuth credential is revoked or invalidated upstream, Hermes marks it exhausted with a 1-hour TTL cooldown. After the TTL expires, the broken credential re-enters the rotation pool and fails again — usually on context compression in long sessions, where the failure surfaces as Failed to generate context summary and breaks the session's compaction.

Cooldown semantics are correct for transient errors (429, 5xx, quota throttles). They are wrong for permanent OAuth states like token_invalidated and token_revoked.

Repro

Today (2026-05-26), on a fresh install, I observed 7 separate 401 token_invalidated failures from the same revoked Codex OAuth credential between 10:27 and 17:50 UTC:

Failed to generate context summary:
Error code: 401 - {'error': {'message': 'Your authentication token has been invalidated. Please try signing in again.',
                              'type': 'invalid_request_error',
                              'code': 'token_invalidated',
                              'param': None},
                   'status': 401}
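The log embeds the provider's error body as a Python-style dict literal (single quotes, None). This means a classifier has to read both the HTTP status and the nested code field. Below is a minimal sketch that extracts both from a line shaped like the one above. It assumes the "Error code: <status> - {...}" prefix is stable. parse_provider_error is a hypothetical helper and is not part of Hermes.

```python
from __future__ import annotations

import ast
import re

# Captures the HTTP status and the dict literal that follows "Error code: N - ".
_ERROR_RE = re.compile(r"Error code: (\d+) - (\{.*\})", re.DOTALL)


def parse_provider_error(line: str) -> tuple[int, str | None]:
    """Return (http_status, error_code) from a logged provider error.

    Hypothetical helper: the body is logged as a Python literal (single
    quotes, None), so ast.literal_eval is used instead of json.loads.
    """
    match = _ERROR_RE.search(line)
    if match is None:
        raise ValueError("not a provider error line")
    status = int(match.group(1))
    body = ast.literal_eval(match.group(2))
    code = body.get("error", {}).get("code")
    return status, code


line = (
    "Failed to generate context summary: Error code: 401 - "
    "{'error': {'message': 'Your authentication token has been invalidated. "
    "Please try signing in again.', 'type': 'invalid_request_error', "
    "'code': 'token_invalidated', 'param': None}, 'status': 401}"
)
print(parse_provider_error(line))  # (401, 'token_invalidated')
```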

Removing the credential manually via hermes auth → option 2 → remove openai-codex #1 silenced the failures temporarily. Later, however, a fresh credential under the same label (Hermes Agent Codex) reappeared in the pool. The cause is unconfirmed: it may be a separate re-auth flow, or the cooldown re-rotating a stale auth.json entry. This needs upstream confirmation.

Root cause

In the hermes-agent credential pool logic:

EXHAUSTED_TTL_DEFAULT_SECONDS = 60 * 60

A 401 token_invalidated from the model provider takes the same exhausted code path as a 429 rate_limit. Both get a 1-hour TTL, and once it expires the credential becomes eligible again. As a result, a permanently revoked OAuth token keeps getting picked back up.
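The following simplified model shows the failure mode. It assumes eligibility reduces to now >= exhausted_at + EXHAUSTED_TTL_DEFAULT_SECONDS. Only the constant comes from the Hermes source. PooledCredential, mark_exhausted, and is_eligible are illustrative names, and the real pool is likely more involved.

```python
from __future__ import annotations

from dataclasses import dataclass

EXHAUSTED_TTL_DEFAULT_SECONDS = 60 * 60  # value quoted from hermes-agent


@dataclass
class PooledCredential:
    # Hypothetical, simplified record; Hermes' real structure may differ.
    label: str
    exhausted_at: float | None = None
    last_error_reason: str | None = None


def mark_exhausted(cred: PooledCredential, reason: str, now: float) -> None:
    # Current behavior as described: every failure takes the same path,
    # whether the reason is a 429 rate_limit or a 401 token_invalidated.
    cred.exhausted_at = now
    cred.last_error_reason = reason


def is_eligible(cred: PooledCredential, now: float) -> bool:
    # A single TTL check: nothing distinguishes transient from permanent.
    if cred.exhausted_at is None:
        return True
    return now >= cred.exhausted_at + EXHAUSTED_TTL_DEFAULT_SECONDS


revoked = PooledCredential("openai-codex #1")
mark_exhausted(revoked, "token_invalidated", now=0.0)
print(is_eligible(revoked, now=1800.0))  # False: still cooling down
print(is_eligible(revoked, now=3600.0))  # True: back in rotation, will 401 again
```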

Expected behavior

token_invalidated and token_revoked should move the credential into a terminal dead state. A dead credential never re-enters rotation until the operator explicitly re-adds or refreshes it. Other 401 codes (e.g. token_expired, when Hermes can refresh the token) should keep cooldown semantics. The _invalidated and _revoked states, however, cannot be recovered automatically.
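The rule above comes down to a small classification step. This sketch assumes the pool sees the HTTP status (None for network errors with no response) and the provider's code field. classify_failure and Outcome are hypothetical names. The set of terminal codes should be confirmed against the provider's documented error codes.

```python
from __future__ import annotations

from enum import Enum


class Outcome(Enum):
    COOLDOWN = "exhausted"  # transient: retry after the TTL
    TERMINAL = "dead"       # permanent: needs operator re-auth


# Codes this issue identifies as unrecoverable without operator action.
TERMINAL_401_CODES = frozenset({"token_invalidated", "token_revoked"})


def classify_failure(status: int | None, code: str | None) -> Outcome:
    """Map a provider failure to a pool outcome (hypothetical helper)."""
    if status == 401 and code in TERMINAL_401_CODES:
        return Outcome.TERMINAL
    # 429, 5xx, network errors, and other 401s (e.g. token_expired, which a
    # refresh may fix) keep the existing cooldown semantics.
    return Outcome.COOLDOWN


print(classify_failure(401, "token_invalidated"))  # Outcome.TERMINAL
print(classify_failure(429, "rate_limit"))         # Outcome.COOLDOWN
```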

Suggested fix

Extend the credential pool state machine to include a dead state alongside exhausted:

  • 429 / 503 / network errors → exhausted with TTL cooldown (current behavior, correct)
  • 401 with code == 'token_invalidated' or code == 'token_revoked' → dead, no auto-recovery
  • Successful OAuth refresh on a dead credential → transition back to ok
  • dead credentials excluded from pick_for_provider() unconditionally
  • hermes auth UI surfaces dead separately so the operator can see why it's offline
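Below is a minimal sketch of the proposed three-state pool. It assumes a credential only needs a state, a cooldown deadline, and a last-error reason. CredentialPool, record_failure, record_refresh_success, and dead are hypothetical. Only pick_for_provider() and the state names ok / exhausted / dead come from the list above, and the real signature of pick_for_provider() in Hermes may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

EXHAUSTED_TTL_DEFAULT_SECONDS = 60 * 60
TERMINAL_401_CODES = frozenset({"token_invalidated", "token_revoked"})


class State(Enum):
    OK = "ok"
    EXHAUSTED = "exhausted"
    DEAD = "dead"


@dataclass
class Credential:
    provider: str
    label: str
    state: State = State.OK
    retry_at: float = 0.0
    last_error_reason: str | None = None


class CredentialPool:
    def __init__(self, credentials: list[Credential]) -> None:
        self.credentials = credentials

    def record_failure(self, cred: Credential, status: int | None,
                       code: str | None, now: float) -> None:
        cred.last_error_reason = code
        if status == 401 and code in TERMINAL_401_CODES:
            cred.state = State.DEAD  # no TTL: never auto-recovers
        else:
            cred.state = State.EXHAUSTED
            cred.retry_at = now + EXHAUSTED_TTL_DEFAULT_SECONDS

    def record_refresh_success(self, cred: Credential) -> None:
        # A successful OAuth refresh (or operator re-add) is the only exit from DEAD.
        cred.state = State.OK
        cred.retry_at = 0.0
        cred.last_error_reason = None

    def pick_for_provider(self, provider: str, now: float) -> Credential | None:
        for cred in self.credentials:
            if cred.provider != provider or cred.state is State.DEAD:
                continue  # dead credentials are skipped unconditionally
            if cred.state is State.EXHAUSTED and now < cred.retry_at:
                continue  # still cooling down
            if cred.state is State.EXHAUSTED:
                cred.state = State.OK  # TTL elapsed: eligible again
            return cred
        return None

    def dead(self) -> list[Credential]:
        # What the hermes auth view could list separately.
        return [c for c in self.credentials if c.state is State.DEAD]


pool = CredentialPool([Credential("openai-codex", "#1"), Credential("openai-codex", "#2")])
revoked, throttled = pool.credentials
pool.record_failure(revoked, 401, "token_invalidated", now=0.0)
pool.record_failure(throttled, 429, "rate_limit", now=0.0)
print(pool.pick_for_provider("openai-codex", now=7200.0).label)  # #2
print([c.label for c in pool.dead()])                            # ['#1']
```

Keeping dead as a separate state, rather than giving it an effectively infinite TTL, lets the operator-facing view report why a credential is offline instead of just showing it as exhausted.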

This mirrors the pattern already implemented in some open-source job-queue libraries (e.g., Sidekiq's dead set vs. retry set).

Workaround for affected users

Manual: hermes auth → option 2 → remove the openai-codex credential showing last_status: exhausted and last_error_reason: token_invalidated. Do NOT work around this by switching auxiliary.compression.provider to a different model provider if that provider bills per token (e.g., Anthropic API). At agentic-run query volume, the bill grows quickly.
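Before removing anything, it helps to list which entries match. This sketch assumes the pool entries can be loaded as dictionaries carrying the last_status and last_error_reason fields named above. The provider and label keys, and the list-of-dicts layout, are assumptions rather than the documented auth.json schema, so adapt them after inspecting the file. stale_codex_credentials is a hypothetical helper.

```python
from __future__ import annotations

TERMINAL_REASONS = {"token_invalidated", "token_revoked"}


def stale_codex_credentials(entries: list[dict]) -> list[dict]:
    """Return pool entries that look permanently revoked.

    Assumes each entry is a dict with 'provider', 'last_status', and
    'last_error_reason' keys (hypothetical layout).
    """
    return [
        e for e in entries
        if e.get("provider") == "openai-codex"
        and e.get("last_status") == "exhausted"
        and e.get("last_error_reason") in TERMINAL_REASONS
    ]


entries = [
    {"provider": "openai-codex", "label": "#1", "last_status": "exhausted",
     "last_error_reason": "token_invalidated"},
    {"provider": "openai-codex", "label": "#2", "last_status": "ok",
     "last_error_reason": None},
]
print([e["label"] for e in stale_codex_credentials(entries)])  # ['#1']
```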

Environment

  • Hermes config: model.provider = openai-codex, model.default = gpt-5.5, auxiliary.compression.provider = auto
  • Pool: 3 openai-codex credentials, 1 revoked, 2 healthy
  • Failure surfaces in ~/.hermes/logs/errors.log as Failed to generate context summary: 401 token_invalidated

Metadata


Assignees

No one assigned

Labels

P2 (Medium: degraded but workaround exists), area/auth (Authentication, OAuth, credential pools), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/copilot (GitHub Copilot (ACP + Chat)), type/bug (Something isn't working)
