fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions #27911
Closed
EloquentBrush0x wants to merge 1 commit into
…re not replayed across sessions When a Codex OAuth refresh token is permanently invalidated (HTTP 400/401/403, token revoked or reused), _mark_exhausted is called but auth.json is left with the dead credentials. On the next session, _seed_from_singletons re-reads auth.json and re-seeds the pool with the same revoked token, triggering the same terminal failure in a loop. Add _is_terminal_codex_oauth_refresh_error to auth.py and a matching quarantine block in _refresh_entry: when a terminal error is detected and auth.json holds no newer tokens, clear access_token/refresh_token from auth.json and remove all device_code-sourced pool entries from memory. Mirrors the Nous quarantine added in c905562 and the xAI quarantine in NousResearch#27898. Also add a pre-refresh sync from auth.json before calling refresh_codex_oauth_pure, matching the xAI and Nous patterns, to avoid refresh_token_reused races when multiple Hermes processes share the same auth.json singleton.
Board James triage pass:
teknium1
pushed a commit
that referenced
this pull request
May 18, 2026
…re not replayed across sessions When a Codex OAuth refresh token is permanently invalidated (HTTP 400/401/403, token revoked or reused), _mark_exhausted was called but auth.json was left with the dead credentials. On the next session, _seed_from_singletons re-read auth.json and re-seeded the pool with the same revoked token, triggering the same terminal failure in a loop. Add _is_terminal_codex_oauth_refresh_error to auth.py and a matching quarantine block in _refresh_entry: when a terminal error is detected and auth.json holds no newer tokens, clear access_token/refresh_token from auth.json and remove all device_code-sourced pool entries from memory. Mirrors the Nous quarantine added in c905562 and the xAI quarantine in #28116. Also add a pre-refresh sync from auth.json before calling refresh_codex_oauth_pure, matching the xAI and Nous patterns, to avoid refresh_token_reused races when multiple Hermes processes share the same auth.json singleton. Salvaged from #27911 by @EloquentBrush0x — contributor's branch was severely stale (would have reverted ~5000 LOC across azure/kanban/i18n subsystems); fix re-applied surgically on current main with their predicate and tests preserved.
Contributor
Salvaged via #28118. Your branch was unfortunately very stale (it would have reverted ~5000 LOC of unrelated subsystems on cherry-pick), so the fix was re-applied surgically on current main. Your predicate and tests are preserved verbatim, and you remain the commit author. Thanks for catching this parity gap.
Lillard01
pushed a commit
to Lillard01/hermes-agent
that referenced
this pull request
May 21, 2026
Mucky010
pushed a commit
to Mucky010/hermes-agent
that referenced
this pull request
May 24, 2026
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
Problem
When a Codex OAuth refresh token is permanently invalidated (HTTP 400/401/403: token revoked, invalid_grant, or refresh_token_reused), _mark_exhausted is called but auth.json is left unchanged. On the next Hermes session, _seed_from_singletons re-reads auth.json and re-seeds the pool with the same revoked token, which triggers the same terminal failure in a loop across restarts.

The same gap was fixed for Nous in c905562 and is pending for xAI in #27898. This PR closes the identical gap for openai-codex.

Changes

hermes_cli/auth.py
- _is_terminal_codex_oauth_refresh_error(exc): returns True for AuthError instances with provider="openai-codex", a terminal error code (codex_refresh_failed, codex_auth_missing_refresh_token, invalid_grant, invalid_token, refresh_token_reused), and relogin_required=True. Transient failures (429, 5xx) carry relogin_required=False and are not matched.

agent/credential_pool.py
- Sync from auth.json before calling refresh_codex_oauth_pure, matching the xAI and Nous patterns, to avoid refresh_token_reused races when multiple Hermes processes share the same auth.json singleton.
- If auth.json has a newer refresh_token, adopt it and return the recovered entry (same pattern as xAI at line 892 and Nous at line 910).
- When _is_terminal_codex_oauth_refresh_error is true and auth.json holds no newer tokens, clear access_token/refresh_token from auth.json, write last_auth_error, and remove all device_code-sourced entries from the in-memory pool. Mirrors the Nous quarantine path exactly.

tests/agent/test_credential_pool.py
- test_is_terminal_codex_oauth_refresh_error: predicate unit test (terminal vs transient, wrong provider, generic exception).
- test_codex_oauth_terminal_refresh_clears_auth_json_and_removes_pool_entries: integration test. The pool is seeded from auth.json; a terminal failure quarantines auth.json and removes device_code entries while manual entries survive; a second try_refresh_current does not call the refresh function again.
- test_codex_oauth_nonterminal_refresh_does_not_quarantine: a transient failure leaves the auth.json tokens intact.

Test plan
- uv run pytest tests/agent/test_credential_pool.py -x -q → 49 passed
- uv run python -c "from hermes_cli.auth import _is_terminal_codex_oauth_refresh_error; print('OK')" → OK
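
For reviewers who want the shape of the change without opening the diff, the predicate and the quarantine step described under Changes might look roughly like the sketch below. It is not the code from this PR. The AuthError class is a hypothetical stand-in whose fields (provider, code, relogin_required) are taken from the description above. The flat auth_state dict, the pool entries as dicts with a "source" key, the failed_refresh_token parameter, and the function name quarantine_codex_tokens are all assumptions made for illustration; the real auth.json layout and pool types may differ.

```python
from __future__ import annotations


class AuthError(Exception):
    """Hypothetical stand-in for the project's AuthError (fields per the PR text)."""

    def __init__(self, message: str, *, provider: str, code: str,
                 relogin_required: bool = False) -> None:
        super().__init__(message)
        self.provider = provider
        self.code = code
        self.relogin_required = relogin_required


# Error codes the PR description lists as terminal for openai-codex.
TERMINAL_CODES = frozenset({
    "codex_refresh_failed",
    "codex_auth_missing_refresh_token",
    "invalid_grant",
    "invalid_token",
    "refresh_token_reused",
})


def is_terminal_codex_oauth_refresh_error(exc: Exception) -> bool:
    """True only for openai-codex AuthErrors that are terminal AND need a relogin."""
    return (
        isinstance(exc, AuthError)
        and exc.provider == "openai-codex"
        and exc.code in TERMINAL_CODES
        and exc.relogin_required is True
    )


def quarantine_codex_tokens(auth_state: dict, pool: list[dict],
                            failed_refresh_token: str,
                            exc: Exception) -> list[dict]:
    """Assumed quarantine flow: clear dead tokens and drop device_code entries.

    auth_state models the relevant slice of auth.json as a flat dict (assumption).
    Returns the pool entries that survive; auth_state is mutated in place.
    """
    if not is_terminal_codex_oauth_refresh_error(exc):
        return pool  # transient failure: leave tokens and pool untouched

    stored = auth_state.get("refresh_token")
    if stored and stored != failed_refresh_token:
        # Another process already wrote a newer token; the caller should
        # adopt it rather than quarantine (recovery path not shown here).
        return pool

    auth_state.pop("access_token", None)
    auth_state.pop("refresh_token", None)
    auth_state["last_auth_error"] = str(exc)
    return [entry for entry in pool if entry.get("source") != "device_code"]
```

Requiring relogin_required=True in addition to a terminal code is what keeps 429 and 5xx failures from wiping credentials. The "newer token on disk" check is what keeps one process from quarantining a token that a sibling process has just rotated.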