fix(xai-oauth): quarantine terminal refresh errors to prevent dead-token replay across sessions by EloquentBrush0x · Pull Request #27898 · NousResearch/hermes-agent

EloquentBrush0x · 2026-05-18T09:30:28Z

Problem

When refresh_xai_oauth_pure returns HTTP 400/401/403 (revoked token, invalid_grant, or a reused refresh token not caught by the pre-refresh race-recovery sync), _refresh_entry fell through to _mark_exhausted without clearing providers["xai-oauth"]["tokens"] from auth.json.

On the next Hermes startup _seed_from_singletons reads auth.json and re-seeds the pool with the same revoked credentials. The next refresh attempt fails identically, and the cycle repeats on every session restart until the user manually re-authenticates.

Root cause

c90556262 added a full terminal-error quarantine for Nous OAuth (_is_terminal_nous_refresh_error + _quarantine_nous_oauth_state + pool entry removal), but the parallel xAI OAuth error-handler path was not updated at the same time.

Fix

hermes_cli/auth.py

Add _is_terminal_xai_oauth_refresh_error(exc) — returns True for xai_refresh_failed (HTTP 400/401/403) and xai_auth_missing_refresh_token with relogin_required=True; transient errors (429, 5xx) carry relogin_required=False and are not matched

agent/credential_pool.py

After the existing auth.json race-recovery block in _refresh_entry, add a quarantine path for xAI OAuth:
- Re-checks that auth.json still holds the same (dead) refresh token — if another process has already rotated it, no action is taken
- Clears access_token and refresh_token from providers["xai-oauth"]["tokens"]
- Writes last_auth_error for hermes doctor / hermes auth status diagnostics
- Removes all loopback_pkce-sourced entries from the in-memory pool
- Persists the updated pool; returns None

Tests

test_is_terminal_xai_oauth_refresh_error — unit-tests the predicate for terminal vs transient vs wrong-provider errors
test_xai_oauth_terminal_refresh_clears_auth_json_and_removes_pool_entries — end-to-end: terminal error clears tokens, sets last_auth_error, removes loopback entries, keeps manual entries; second try_refresh_current() does not call refresh_xai_oauth_pure again
test_xai_oauth_nonterminal_refresh_does_not_quarantine — 429/5xx error leaves auth.json tokens untouched

49/49 tests pass.

Checklist

Mirrors the Nous quarantine pattern from c90556262 exactly
Does not affect the existing race-recovery path (another process rotated tokens → adopt and return)
Does not quarantine on transient errors (rate limits, server errors)
Manual/API-key pool entries survive the quarantine
All existing credential pool tests pass

… not replayed across sessions When refresh_xai_oauth_pure raises a terminal error (HTTP 400/401/403, i.e. revoked or reused refresh token), _refresh_entry's existing race- recovery path re-syncs from auth.json and returns if another process has already rotated the tokens. If auth.json still holds the same stale token pair, the function fell through to _mark_exhausted — leaving the dead credentials in auth.json. On the next Hermes startup _seed_from_singletons re-seeded the pool from those stale tokens, causing the same failure loop on every session. Fix: after the auth.json re-sync check in the xAI-oauth error handler, detect terminal errors with the new _is_terminal_xai_oauth_refresh_error helper and apply a quarantine: - Clear access_token and refresh_token from providers["xai-oauth"]["tokens"] in auth.json so they are not re-seeded. - Write a last_auth_error entry for hermes doctor / auth status diagnostics. - Remove all loopback_pkce entries from the in-memory pool so the current session stops retrying with the dead credentials. Mirrors the identical quarantine already in place for Nous OAuth (c905562). Closes the parity gap introduced when c905562 added Nous-only terminal error handling without a corresponding xAI-oauth path.

BoardJames-Bot · 2026-05-18T10:22:45Z

Board James triage pass: check-attribution and e2e are green. The only red check is test, and the logs show cancellation late in the suite (~95% progress) rather than a branch-local assertion/error. I opened #27931 for the shared main-line CI drift found while triaging this batch; after that lands, please rebase/rerun checks.

teknium1 · 2026-05-18T17:28:14Z

Merged via #28116. Cherry-picked onto current main with your authorship preserved. Thanks!

alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder area/auth Authentication, OAuth, credential pools provider/xai xAI (Grok) P3 Low — cosmetic, nice to have labels May 18, 2026

EloquentBrush0x mentioned this pull request May 18, 2026

fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions #27911

Closed

2 tasks

BoardJames-Bot mentioned this pull request May 18, 2026

fix: unblock current PR CI drift #27931

Open

EloquentBrush0x mentioned this pull request May 18, 2026

fix(minimax-oauth): quarantine dead tokens on terminal refresh failure #28003

Closed

3 tasks

teknium1 mentioned this pull request May 18, 2026

fix(xai-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions #28116

Merged

teknium1 closed this in #28116 May 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(xai-oauth): quarantine terminal refresh errors to prevent dead-token replay across sessions#27898

fix(xai-oauth): quarantine terminal refresh errors to prevent dead-token replay across sessions#27898
EloquentBrush0x wants to merge 1 commit into
NousResearch:mainfrom
EloquentBrush0x:fix/xai-oauth-terminal-error-quarantine

EloquentBrush0x commented May 18, 2026

Uh oh!

BoardJames-Bot commented May 18, 2026

Uh oh!

teknium1 commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

EloquentBrush0x commented May 18, 2026

Problem

Root cause

Fix

Tests

Checklist

Uh oh!

BoardJames-Bot commented May 18, 2026

Uh oh!

teknium1 commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants