fix(xai-oauth): quarantine terminal refresh errors to prevent dead-token replay across sessions #27898
Closed
EloquentBrush0x wants to merge 1 commit into
… not replayed across sessions

When refresh_xai_oauth_pure raises a terminal error (HTTP 400/401/403, i.e. revoked or reused refresh token), _refresh_entry's existing race-recovery path re-syncs from auth.json and returns if another process has already rotated the tokens. If auth.json still holds the same stale token pair, the function fell through to _mark_exhausted, leaving the dead credentials in auth.json. On the next Hermes startup _seed_from_singletons re-seeded the pool from those stale tokens, causing the same failure loop on every session.

Fix: after the auth.json re-sync check in the xAI-oauth error handler, detect terminal errors with the new _is_terminal_xai_oauth_refresh_error helper and apply a quarantine:

- Clear access_token and refresh_token from providers["xai-oauth"]["tokens"] in auth.json so they are not re-seeded.
- Write a last_auth_error entry for hermes doctor / auth status diagnostics.
- Remove all loopback_pkce entries from the in-memory pool so the current session stops retrying with the dead credentials.

Mirrors the identical quarantine already in place for Nous OAuth (c905562). Closes the parity gap introduced when c905562 added Nous-only terminal error handling without a corresponding xAI-oauth path.
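The quarantine steps listed in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `quarantine_xai_oauth` and `drop_loopback_entries`, the shape of the `last_auth_error` record, its placement under the provider entry, and the `source` key on pool entries are all assumptions made for the example. Only the `providers["xai-oauth"]["tokens"]` path and the `access_token` / `refresh_token` / `last_auth_error` / `loopback_pkce` names come from the description.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path

PROVIDER = "xai-oauth"


def quarantine_xai_oauth(auth_path: Path, dead_refresh_token: str, reason: str) -> bool:
    """Clear dead xAI OAuth tokens from auth.json (hypothetical helper).

    Returns True if the tokens were cleared, False if nothing was changed.
    """
    data = json.loads(auth_path.read_text(encoding="utf-8"))
    provider = data.get("providers", {}).get(PROVIDER)
    if not isinstance(provider, dict):
        return False

    tokens = provider.get("tokens") or {}
    # If another process already rotated the refresh token, the stored pair
    # is no longer the dead one, so leave it untouched.
    if tokens.get("refresh_token") != dead_refresh_token:
        return False

    # Drop only the credentials, so the next startup has nothing to re-seed.
    tokens.pop("access_token", None)
    tokens.pop("refresh_token", None)
    provider["tokens"] = tokens

    # Assumed record shape; the real last_auth_error layout is not shown in the PR.
    provider["last_auth_error"] = {
        "message": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    auth_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return True


def drop_loopback_entries(pool: list[dict]) -> list[dict]:
    """Keep every pool entry except those sourced from loopback_pkce."""
    return [entry for entry in pool if entry.get("source") != "loopback_pkce"]
```

Comparing the stored refresh token against the one that just failed is what keeps the quarantine from wiping credentials that a concurrent process has already rotated.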
Board James triage pass:
Merged via #28116. Cherry-picked onto current main with your authorship preserved. Thanks!
Problem
When refresh_xai_oauth_pure returns HTTP 400/401/403 (revoked token, invalid_grant, or a reused refresh token not caught by the pre-refresh race-recovery sync), _refresh_entry fell through to _mark_exhausted without clearing providers["xai-oauth"]["tokens"] from auth.json.

On the next Hermes startup _seed_from_singletons reads auth.json and re-seeds the pool with the same revoked credentials. The next refresh attempt fails identically, and the cycle repeats on every session restart until the user manually re-authenticates.

Root cause

c90556262 added a full terminal-error quarantine for Nous OAuth (_is_terminal_nous_refresh_error + _quarantine_nous_oauth_state + pool entry removal), but the parallel xAI OAuth error-handler path was not updated at the same time.

Fix

hermes_cli/auth.py

- _is_terminal_xai_oauth_refresh_error(exc): returns True for xai_refresh_failed (HTTP 400/401/403) and xai_auth_missing_refresh_token with relogin_required=True; transient errors (429, 5xx) carry relogin_required=False and are not matched

agent/credential_pool.py

- In _refresh_entry, add a quarantine path for xAI OAuth:
  - Confirm auth.json still holds the same (dead) refresh token; if another process has already rotated it, no action is taken
  - Clear access_token and refresh_token from providers["xai-oauth"]["tokens"]
  - Write last_auth_error for hermes doctor / hermes auth status diagnostics
  - Remove loopback_pkce-sourced entries from the in-memory pool
  - Return None

Tests

- test_is_terminal_xai_oauth_refresh_error: unit-tests the predicate for terminal vs transient vs wrong-provider errors
- test_xai_oauth_terminal_refresh_clears_auth_json_and_removes_pool_entries: end-to-end; a terminal error clears tokens, sets last_auth_error, removes loopback entries, and keeps manual entries; a second try_refresh_current() does not call refresh_xai_oauth_pure again
- test_xai_oauth_nonterminal_refresh_does_not_quarantine: a 429/5xx error leaves auth.json tokens untouched

49/49 tests pass.
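To make the terminal / transient / wrong-provider split concrete, here is a minimal sketch of a predicate with the behavior the description attributes to _is_terminal_xai_oauth_refresh_error. The `AuthError` class, its `provider`, `code`, and `relogin_required` attributes, and the provider check are hypothetical stand-ins; the real exception type in hermes_cli/auth.py is not shown in this PR. Only the two error codes and the relogin_required flag come from the description.

```python
class AuthError(Exception):
    """Hypothetical stand-in for the project's auth exception type."""

    def __init__(self, message, *, provider, code, relogin_required=False):
        super().__init__(message)
        self.provider = provider
        self.code = code
        self.relogin_required = relogin_required


# Error codes named in the PR description as terminal when relogin is required.
TERMINAL_XAI_CODES = {"xai_refresh_failed", "xai_auth_missing_refresh_token"}


def is_terminal_xai_oauth_refresh_error(exc):
    """Return True only for xAI OAuth errors that a retry cannot fix."""
    if not isinstance(exc, AuthError):
        return False
    if exc.provider != "xai-oauth":
        # Errors from other providers (e.g. Nous) are handled by their own path.
        return False
    # Transient failures (429, 5xx) carry relogin_required=False and fall through.
    return exc.code in TERMINAL_XAI_CODES and bool(exc.relogin_required)


# One case per category covered by the predicate's unit test.
revoked = AuthError("invalid_grant", provider="xai-oauth",
                    code="xai_refresh_failed", relogin_required=True)
rate_limited = AuthError("HTTP 429", provider="xai-oauth",
                         code="xai_refresh_failed", relogin_required=False)
other_provider = AuthError("invalid_grant", provider="nous",
                           code="xai_refresh_failed", relogin_required=True)
```

Under these assumptions only `revoked` is classified as terminal; `rate_limited` stays retryable and `other_provider` is ignored by the xAI path.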
Checklist
c90556262 exactly