fix(auxiliary): detect xAI OAuth 403 bad-credentials as auth error#31527

Closed
Moikapy wants to merge 1 commit into NousResearch:main from Moikapy:fix/xai-oauth-403-recovery

Conversation

@Moikapy (Contributor) commented May 24, 2026

What

xAI returns HTTP 403 (not 401) with unauthenticated:bad-credentials when an OAuth2 access token has expired or is invalid. The existing _is_auth_error() only matched status 401, so these tokens were never refreshed and the 403 propagated as a generic "Auxiliary title_generation failed" / "context summary failed" error.

Why

Three gaps in the auxiliary client error recovery path for xAI OAuth:

  1. _is_auth_error() — only checked HTTP 401. xAI uses 403 for expired/invalid OAuth tokens, which is semantically a 401 auth failure. The recovery system never recognized these as auth errors, so it never triggered token refresh or pool rotation.

  2. _refresh_provider_credentials() — had branches for openai-codex, nous, and anthropic, but not xai-oauth. Even if the 403 was detected as auth, the refresh function returned False without attempting any refresh.

  3. _recoverable_pool_provider() — mapped hostnames for chatgpt.com, openrouter.ai, api.anthropic.com, etc. but not api.x.ai, so auto-resolved providers couldn't find the xAI credential pool for recovery.
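The first gap can be illustrated with a minimal sketch. This is not the project's actual `_is_auth_error()`; the `FakeAPIError` class and the `status_code` attribute are assumptions standing in for whatever exception type the auxiliary client really receives. It only shows how a 403 carrying the `bad-credentials` marker could be classified alongside a plain 401.

```python
from typing import Optional


class FakeAPIError(Exception):
    """Hypothetical stand-in for a provider SDK error that carries a status code."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


def is_auth_error(exc: Exception) -> bool:
    """Return True when the error looks like an expired or invalid credential."""
    status = getattr(exc, "status_code", None)  # assumed attribute name
    text = str(exc).lower()
    if status == 401:
        return True
    # xAI reports bad OAuth tokens as 403 with a bad-credentials marker.
    if status == 403 and "bad-credentials" in text:
        return True
    # Fall back to the string pattern when no status code is attached.
    return "unauthenticated" in text and "bad-credentials" in text
```

A 403 without the marker (for example a genuine permission error) is deliberately left unclassified, so it would not trigger a token refresh.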

Changes

  1. _is_auth_error() — Detect xAI's 403 + "bad-credentials" pattern and "unauthenticated" + "bad-credentials" string pattern as auth failures.

  2. _refresh_provider_credentials() — Add xai-oauth branch: pool-level refresh via try_refresh_current() (with select() to ensure a current entry), then fall back to resolve_xai_oauth_runtime_credentials(force_refresh=True).

  3. _recoverable_pool_provider() — Map api.x.ai host to "xai-oauth" pool provider.
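The refresh order described in change 2 might look roughly like the sketch below. Only the names `select()`, `try_refresh_current()`, and `force_refresh=True` come from this PR; the `FakePool` class, the `resolve` callable, and the `"api_key"` key in its return value are hypothetical placeholders, not the real hermes-agent interfaces.

```python
from typing import Callable, Optional


class FakePool:
    """Hypothetical credential pool; only the method names are taken from the PR."""

    def __init__(self, refresh_ok: bool):
        self.refresh_ok = refresh_ok
        self.current: Optional[str] = None

    def select(self) -> Optional[str]:
        # Ensure there is a current entry before a refresh is attempted.
        self.current = "entry-0"
        return self.current

    def try_refresh_current(self) -> bool:
        return self.current is not None and self.refresh_ok


def refresh_xai_oauth(pool: Optional[FakePool], resolve: Callable[..., dict]) -> bool:
    """Try the pool first, then fall back to a forced singleton refresh."""
    if pool is not None:
        try:
            pool.select()
            if pool.try_refresh_current():
                return True
        except Exception:
            pass  # pool failure is not fatal; fall through to the resolver
    try:
        creds = resolve(force_refresh=True)
    except Exception:
        return False  # graceful fallback: report failure instead of raising
    return bool(creds.get("api_key"))  # assumed shape of the resolver result
```

Returning `False` rather than raising keeps the caller's behavior the same as for providers without a refresh branch.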

How to test

  1. Authenticate with xAI OAuth: hermes model → select xAI Grok OAuth
  2. Configure an auxiliary task to use xai-oauth (e.g. auxiliary.title_generation.model: grok-4.3)
  3. Wait for the OAuth token to expire (or force-expire by replacing the access token in auth.json with garbage)
  4. Trigger an auxiliary call (new conversation → title generation fires)
  5. Before fix: 403 error logged, title generation fails silently
  6. After fix: 403 detected as auth error, token refreshed, call retried with fresh token
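The before/after behavior in steps 5 and 6 amounts to a retry-once pattern, sketched below under stated assumptions. The function name `call_with_auth_recovery` and its three callables are hypothetical and do not describe how the auxiliary client is actually structured.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_auth_recovery(
    call: Callable[[], T],
    is_auth_error: Callable[[Exception], bool],
    refresh: Callable[[], bool],
) -> T:
    """Run call(); on an auth error, refresh credentials once and retry."""
    try:
        return call()
    except Exception as exc:
        # Re-raise anything that is not an auth failure, or that cannot be refreshed.
        if not is_auth_error(exc) or not refresh():
            raise
    # Credentials were refreshed, so retry exactly once with the fresh token.
    return call()
```

Before the fix, the 403 would take the `raise` path because it was not recognized as an auth error; after the fix it reaches the retry.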

Unit tests:

pytest tests/agent/test_auxiliary_client_xai_oauth_recovery.py -v

14 tests covering detection, host mapping, and graceful fallback.

Tested on

  • Raspberry Pi 5 (Linux aarch64), Python 3.13.5
  • xAI OAuth with Grok 4.3
  • All 14 new tests pass, 164 existing sync tests pass (7 pre-existing async test failures from missing pytest-asyncio, unrelated)

@Moikapy Moikapy force-pushed the fix/xai-oauth-403-recovery branch from affca71 to e301d40 Compare May 24, 2026 14:59
@alt-glitch added labels on May 24, 2026: type/bug (Something isn't working); P3 (Low: cosmetic, nice to have); comp/agent (Core agent loop, run_agent.py, prompt builder); provider/xai (xAI Grok)
xAI returns HTTP 403 (not 401) with unauthenticated:bad-credentials
when an OAuth2 access token has expired or is invalid. The existing
_is_auth_error() only checked for 401 status codes, so these tokens
were never refreshed and the 403 propagated as a generic permission
denied error.

Three fixes:

1. _is_auth_error: Recognize xAI's 403+bad-credentials pattern as
   an auth failure, triggering token refresh instead of silent failure.

2. _refresh_provider_credentials: Add xai-oauth branch with
   pool-level refresh (try_refresh_current with select to ensure
   current entry) then fallback to singleton resolver with
   force_refresh=True.

3. _recoverable_pool_provider: Map api.x.ai host to xai-oauth
   pool for auto-resolved providers, matching existing pattern for
   openai-codex/openrouter/nous/anthropic.

Includes 14 tests covering the new detection logic, host mapping,
and graceful fallback behavior.

Signed-off-by: moikapy <moikapy@devmoi.com>
@teknium1 (Contributor) commented:

Closing: merged via salvage PR #34431, with your commit 3860872d6 cherry-picked onto current main and authorship preserved.

One merge conflict needed resolving: _recoverable_pool_provider grew a main_runtime-based PROVIDER_REGISTRY fallback block between your branch point and now (for opencode-go and similar api_key providers). I resolved it by keeping both: your api.x.ai → xai-oauth host check fires first, then falls through to the registry fallback for unknown providers.
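The resolved ordering can be sketched as follows. Only the `api.x.ai` → `xai-oauth` mapping and the `opencode-go` provider name come from this thread; the function signature, the contents of `HOST_TO_POOL`, and the shape of `PROVIDER_REGISTRY` are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Host checks run first; only the api.x.ai entry is confirmed by this PR.
HOST_TO_POOL = {"api.x.ai": "xai-oauth"}

# Hypothetical stand-in for the registry of api_key providers.
PROVIDER_REGISTRY = {"opencode-go": object()}


def recoverable_pool_provider(
    base_url: str, main_runtime_provider: Optional[str] = None
) -> Optional[str]:
    """Map a base URL to a pool provider, then fall back to the registry."""
    host = (urlparse(base_url).hostname or "").lower()
    if host in HOST_TO_POOL:
        return HOST_TO_POOL[host]
    # Unknown host: defer to the main runtime's provider if it is registered.
    if main_runtime_provider in PROVIDER_REGISTRY:
        return main_runtime_provider
    return None
```

Checking the host first means an xAI endpoint resolves to its OAuth pool even when the main runtime is configured for a different provider.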

This is distinct from the previously merged WKE disambiguator fix (#30872), which covered the main agent loop's recovery path in run_agent.py. The auxiliary client has its own parallel recovery system, which you correctly identified, and the three gaps you fixed (_is_auth_error, _refresh_provider_credentials, _recoverable_pool_provider) weren't touched by that earlier PR. Both code surfaces needed coverage.

Your 14 tests covered all three functions cleanly — no follow-up test work was needed. Thanks!

@teknium1 teknium1 closed this May 29, 2026
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026