Skip to content

fix(minimax-oauth): quarantine dead tokens on terminal refresh failure#28003

Closed
EloquentBrush0x wants to merge 1 commit into
NousResearch:mainfrom
EloquentBrush0x:fix/minimax-oauth-terminal-quarantine
Closed

fix(minimax-oauth): quarantine dead tokens on terminal refresh failure#28003
EloquentBrush0x wants to merge 1 commit into
NousResearch:mainfrom
EloquentBrush0x:fix/minimax-oauth-terminal-quarantine

Conversation

@EloquentBrush0x

Copy link
Copy Markdown
Contributor

Summary

  • resolve_minimax_oauth_runtime_credentials called _refresh_minimax_oauth_state without catching its AuthError. On a terminal failure (invalid_grant, refresh_token_reused, invalid_refresh_token), the error propagated but the dead refresh_token remained in auth.json — so every subsequent API call retried the same token via a network round-trip, failing identically each time.
  • Fix: wrap the refresh call; when exc.relogin_required is True and a refresh_token is still present, clear the dead OAuth fields (access_token, refresh_token, expires_*) and write a last_auth_error quarantine marker to auth.json before re-raising. The next call sees no access_token and fails fast with "not_logged_in" — no network retry.
  • Mirrors the established quarantine pattern for Nous (_quarantine_nous_oauth_state), xAI-OAuth (PR fix(xai-oauth): quarantine terminal refresh errors to prevent dead-token replay across sessions #27898), and Codex-OAuth (PR fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions #27911). Persist failure is best-effort (logged at DEBUG; original error still re-raised).

Test plan

  • Unit: mock _refresh_minimax_oauth_state to raise AuthError(relogin_required=True, code="refresh_failed"); assert auth.json loses refresh_token and gains last_auth_error.relogin_required = True.
  • Unit: mock non-terminal AuthError(relogin_required=False); assert auth.json is unchanged.
  • Integration: after simulated terminal failure, second call to resolve_minimax_oauth_runtime_credentials raises "not_logged_in" without a network request.

resolve_minimax_oauth_runtime_credentials called _refresh_minimax_oauth_state
without a try/except, so a terminal failure (invalid_grant,
refresh_token_reused, invalid_refresh_token) raised AuthError but left
the dead refresh_token in auth.json. Every subsequent API call retried
the same token via a network round-trip, failing identically each time.

Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_*) and write a last_auth_error quarantine marker
to auth.json before re-raising. The next call sees no access_token and
fails fast with "not_logged_in" — no network retry — and the user is
prompted to re-authenticate.

Mirrors the existing quarantine pattern for Nous (_quarantine_nous_oauth_state),
xAI-OAuth (PR NousResearch#27898), and Codex-OAuth (PR NousResearch#27911). Persist failure is
best-effort (logged at DEBUG, error still re-raised).
@alt-glitch alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have area/auth Authentication, OAuth, credential pools provider/minimax MiniMax (Anthropic transport) labels May 18, 2026
@BoardJames-Bot

Copy link
Copy Markdown

BoardJames triage: this also looks shared/systemic rather than branch-local. All non-test checks are green (attribution, common-ancestor, ruff/ty, ruff enforcement, Windows footguns, nix ubuntu/macos, supply-chain, e2e, and both Docker builds). The full Tests / test job reached about 95% progress and then ended with The operation was canceled, which matches the broader full-suite timeout/cancellation wave across unrelated PRs. No branch-local Minimax OAuth failure is visible from CI.

teknium1 pushed a commit that referenced this pull request May 18, 2026
resolve_minimax_oauth_runtime_credentials called _refresh_minimax_oauth_state
without a try/except, so a terminal failure (invalid_grant,
refresh_token_reused, invalid_refresh_token) raised AuthError but left
the dead refresh_token in auth.json. Every subsequent API call retried
the same token via a network round-trip, failing identically each time.

Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_*) and write a last_auth_error quarantine marker
to auth.json before re-raising. The next call sees no access_token and
fails fast with 'not_logged_in' — no network retry — and the user is
prompted to re-authenticate.

Mirrors the existing quarantine pattern for Nous (_quarantine_nous_oauth_state),
xAI-OAuth (#28116), and Codex-OAuth (#28118). Persist failure is
best-effort (logged at DEBUG, error still re-raised).

Salvaged from #28003 by @EloquentBrush0x — contributor's branch was
severely stale (would have reverted ~5000 LOC across azure/kanban/i18n
subsystems); fix re-applied surgically with their pattern preserved and
added two regression tests (terminal-quarantines + transient-does-not-quarantine).
@teknium1

Copy link
Copy Markdown
Contributor

Salvaged via #28119 — same severely-stale-branch situation as #27911, fix re-applied surgically with your pattern preserved + added the two regression tests the verdict noted. You remain the commit author. Thanks for the parity sweep across all four providers.

Lillard01 pushed a commit to Lillard01/hermes-agent that referenced this pull request May 21, 2026
resolve_minimax_oauth_runtime_credentials called _refresh_minimax_oauth_state
without a try/except, so a terminal failure (invalid_grant,
refresh_token_reused, invalid_refresh_token) raised AuthError but left
the dead refresh_token in auth.json. Every subsequent API call retried
the same token via a network round-trip, failing identically each time.

Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_*) and write a last_auth_error quarantine marker
to auth.json before re-raising. The next call sees no access_token and
fails fast with 'not_logged_in' — no network retry — and the user is
prompted to re-authenticate.

Mirrors the existing quarantine pattern for Nous (_quarantine_nous_oauth_state),
xAI-OAuth (NousResearch#28116), and Codex-OAuth (NousResearch#28118). Persist failure is
best-effort (logged at DEBUG, error still re-raised).

Salvaged from NousResearch#28003 by @EloquentBrush0x — contributor's branch was
severely stale (would have reverted ~5000 LOC across azure/kanban/i18n
subsystems); fix re-applied surgically with their pattern preserved and
added two regression tests (terminal-quarantines + transient-does-not-quarantine).
Mucky010 pushed a commit to Mucky010/hermes-agent that referenced this pull request May 24, 2026
resolve_minimax_oauth_runtime_credentials called _refresh_minimax_oauth_state
without a try/except, so a terminal failure (invalid_grant,
refresh_token_reused, invalid_refresh_token) raised AuthError but left
the dead refresh_token in auth.json. Every subsequent API call retried
the same token via a network round-trip, failing identically each time.

Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_*) and write a last_auth_error quarantine marker
to auth.json before re-raising. The next call sees no access_token and
fails fast with 'not_logged_in' — no network retry — and the user is
prompted to re-authenticate.

Mirrors the existing quarantine pattern for Nous (_quarantine_nous_oauth_state),
xAI-OAuth (NousResearch#28116), and Codex-OAuth (NousResearch#28118). Persist failure is
best-effort (logged at DEBUG, error still re-raised).

Salvaged from NousResearch#28003 by @EloquentBrush0x — contributor's branch was
severely stale (would have reverted ~5000 LOC across azure/kanban/i18n
subsystems); fix re-applied surgically with their pattern preserved and
added two regression tests (terminal-quarantines + transient-does-not-quarantine).
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
resolve_minimax_oauth_runtime_credentials called _refresh_minimax_oauth_state
without a try/except, so a terminal failure (invalid_grant,
refresh_token_reused, invalid_refresh_token) raised AuthError but left
the dead refresh_token in auth.json. Every subsequent API call retried
the same token via a network round-trip, failing identically each time.

Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_*) and write a last_auth_error quarantine marker
to auth.json before re-raising. The next call sees no access_token and
fails fast with 'not_logged_in' — no network retry — and the user is
prompted to re-authenticate.

Mirrors the existing quarantine pattern for Nous (_quarantine_nous_oauth_state),
xAI-OAuth (NousResearch#28116), and Codex-OAuth (NousResearch#28118). Persist failure is
best-effort (logged at DEBUG, error still re-raised).

Salvaged from NousResearch#28003 by @EloquentBrush0x — contributor's branch was
severely stale (would have reverted ~5000 LOC across azure/kanban/i18n
subsystems); fix re-applied surgically with their pattern preserved and
added two regression tests (terminal-quarantines + transient-does-not-quarantine).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/auth Authentication, OAuth, credential pools P3 Low — cosmetic, nice to have provider/minimax MiniMax (Anthropic transport) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants