Skip to content

fix(xai-oauth): honor [WKE=unauthenticated] disambiguator in entitlement classifier (#29344)#29348

Closed
xxxigm wants to merge 2 commits into
NousResearch:mainfrom
xxxigm:fix/29344-xai-bad-credentials-misclassified
Closed

fix(xai-oauth): honor [WKE=unauthenticated] disambiguator in entitlement classifier (#29344)#29348
xxxigm wants to merge 2 commits into
NousResearch:mainfrom
xxxigm:fix/29344-xai-bad-credentials-misclassified

Conversation

@xxxigm

@xxxigm xxxigm commented May 20, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Closes the "long-running TUI sessions can't auto-refresh stale OAuth tokens against provider/xai-oauth" bug in #29344. The reporter (SmelterLabs) did the root-cause work and even drafted the test cases — this PR implements their option 1 (tightest) recommendation and adds the suggested coverage.

xAI returns the same permission-denied code text for two distinct conditions on a 403:

Condition code (identical) error (the disambiguator)
Unsubscribed account (entitlement) The caller does not have permission to execute the specified operation ... active Grok subscription. Manage at https://grok.com
Stale OAuth access token (auth) The caller does not have permission to execute the specified operation The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]

_is_entitlement_failure in run_agent.py was added in #26847 to stop the credential-pool refresh loop on unsubscribed-account 403s — refreshing an OAuth token cannot fix an account that lacks a Grok subscription, so the classifier returns True and _recover_with_credential_pool short-circuits before try_refresh_current. But the heuristic over-matched: it returned True for the bad-credentials variant too, because that body also carries does not have permission and (via _extract_api_error_context's fallback to str(error)) grok somewhere in the wire text. Result: long-running TUI sessions stuck on a stale token surfaced as a non-retryable client error, and the only workaround was exit + reopen the TUI so the startup-resolve path (which doesn't go through the classifier) refreshed the token cleanly.

xAI's [WKE=unauthenticated:...] suffix on the error field is the explicit, authoritative disambiguator. This PR honors it.

Fix shape (run_agent.py, _is_entitlement_failure):

  1. Broaden the haystack to cover both extraction shapes — the production _extract_api_error_context path normalises to {message, reason}, but the classifier is also reachable directly with raw bodies that carry {code, error}. Build the haystack from all four fields so the disambiguator fires regardless of entry point. Backwards compatible: missing keys default to empty strings, the empty-haystack short-circuit still triggers when nothing is set.
  2. Add two disambiguator early-returns before the existing entitlement keyword checks:
    • [WKE=unauthenticated: anywhere in the haystack → return False (xAI's authoritative auth signal — bad-credentials, expired-token, revoked, and any future reason code all route to refresh).
    • oauth2 access token could not be validatedreturn False (belt-and-braces guard if xAI ever drops the WKE suffix but keeps the human-readable phrasing).
  3. Order matters: the disambiguators run BEFORE the entitlement keyword checks. If a future xAI body ever carries both signals at once, auth wins — auth is recoverable, entitlement isn't, and a refreshed token will resurface the entitlement message on the next request anyway. The original [Bug]: xAI OAuth (xai-oauth) returns HTTP 403 for standard SuperGrok subscribers — backend enforcing Heavy-only despite docs claiming all tiers #26847 loop-protection still kicks in on the second 403 if the refresh itself failed.

The unsubscribed-account path, the generic auth-error path, the refresh-on-401 path, and the _recover_with_credential_pool shape are all left intact — the patch is purely additive between the haystack build and the entitlement keyword checks.

Related Issue

Fixes #29344

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

Backwards compatible: no schema changes, no new public functions, no new config knobs. Existing 22 tests in this file continue to pass unchanged.

How to Test

# Full xai-oauth recovery suite (22 existing + 11 new = 33 tests)
./scripts/run_tests.sh tests/run_agent/test_codex_xai_oauth_recovery.py
# expected: 33 passed

Behavioural verification against a stale OAuth token, post-fix:

INFO  run_agent: Credential 403 — auth failure from xai-oauth; refreshing pool entry…
INFO  run_agent: Credential auth failure — refreshed pool entry oauth_xxx
INFO  run_agent: <retry succeeds with the refreshed token>

Pre-fix on the same body:

INFO  run_agent: Credential 403 — entitlement-shaped 403 from xai-oauth; skipping pool refresh
              (account lacks subscription, not a transient auth failure).
ERROR run_agent: Non-retryable client error: 403 ...
# user has to exit + reopen the TUI to recover

Checklist

xxxigm added 2 commits May 20, 2026 21:46
…tlement classifier (NousResearch#29344)

``_is_entitlement_failure`` over-matched on xAI 403s.  xAI returns the
same permission-denied ``code`` text for two distinct conditions:

  1. Unsubscribed account ("active Grok subscription. Manage at
     https://grok.com" in the ``error`` field).
  2. Stale OAuth access token ("OAuth2 access token could not be
     validated. [WKE=unauthenticated:bad-credentials]" in the ``error``
     field).

The classifier's "does not have permission + grok" substring heuristic
treated both identically, so the credential-pool refresh path was
short-circuited for case (2) — long-running TUI sessions stuck on a
stale OAuth token surfaced a non-retryable client error and the user
had to exit + reopen the TUI to recover (the startup-resolve path
bypasses the classifier entirely, which is why bridge adapters with
proactive refresh cadences didn't see this in practice).

This patch adopts the reporter's recommended fix (option 1, tightest):
honor xAI's explicit ``[WKE=unauthenticated:...]`` suffix and the
``OAuth2 access token could not be validated`` phrasing as
authoritative "this is auth, not entitlement" signals.  When either
appears anywhere in the body's text fields, the classifier returns
False eagerly — *before* the entitlement keyword checks run — so the
refresh-on-401 path takes over and the existing loop-protection still
guards against runaway refresh storms if the refresh itself fails.

Two small adjustments fall out of this:

* The haystack now also covers ``code`` and ``error`` keys directly,
  not just the ``message``/``reason`` shape ``_extract_api_error_context``
  produces.  Real runtime paths use the normalised shape, but the test
  suite and any future call sites that pass raw bodies get the same
  treatment.  Backwards compatible: missing keys default to empty
  strings, the haystack still skips when everything is blank.

* Both disambiguator checks fire BEFORE the entitlement keyword
  checks.  If a future xAI body somehow lands with both an entitlement
  message AND the WKE suffix, the WKE suffix wins (correct — auth is
  recoverable; entitlement is not, and a refreshed token will surface
  the entitlement message on the next request anyway).

Existing tests (``test_is_entitlement_failure_matches_real_xai_bodies``,
``test_is_entitlement_failure_false_for_unrelated_auth_errors``,
``test_recover_with_credential_pool_skips_refresh_on_entitlement_403``,
``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``)
continue to pass unchanged — the unsubscribed-account path, the
generic auth-error path, and the refresh-on-401 path are all left
intact.
…uator (NousResearch#29344)

Eleven new tests pinning the NousResearch#29344 fix.  Layout mirrors the existing
"Fix D" entitlement section so the bad-credentials disambiguator
sits alongside the entitlement-block tests it complements.

Classifier-level coverage:

* ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix``
  — verbatim shape from the reporter's wire capture
  (``{code: 'caller does not have permission', error: 'OAuth2 access
  token could not be validated. [WKE=unauthenticated:bad-credentials]'}``)
  ↦ classifier must return False so the refresh path runs.
* ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape``
  — same body after ``_extract_api_error_context`` has rewritten it
  to ``{reason, message}``.  The disambiguator must fire in BOTH
  shapes; without this guard the production call site at
  ``_recover_with_credential_pool`` (which goes through the
  normalised extractor) would still misclassify.
* ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant``
  — parametrised forward-compat: ``bad-credentials``,
  ``expired-token``, ``revoked``, ``some-future-reason``.  xAI
  documents the prefix as stable, the suffix after the colon as a
  reason code that can grow; every variant under
  ``unauthenticated:`` must route to refresh.
* ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone``
  — belt-and-braces guard: if a future API revision drops the WKE
  suffix but keeps "OAuth2 access token could not be validated", we
  still classify correctly.
* ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords``
  — defensive: if a body ever carries BOTH the WKE suffix and
  entitlement language, the WKE signal wins.  Auth is recoverable;
  entitlement isn't, and a refreshed token will resurface the
  entitlement message on the next request.
* ``test_is_entitlement_failure_case_insensitive_wke_match`` —
  pins that the classifier lowercases the haystack so a future xAI
  build that uppercases the prefix doesn't reintroduce the bug.

Recovery-path coverage (end-to-end through
``_recover_with_credential_pool``):

* ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403``
  — the headline test the reporter requested: a bad-credentials 403
  with the exact wire body must call ``try_refresh_current()``
  exactly once and ``_swap_credential`` once.  Pre-fix this returned
  ``(False, _)`` because the entitlement classifier over-matched and
  short-circuited the refresh path.
* ``test_recover_with_credential_pool_still_blocks_real_entitlement``
  — companion regression guard for NousResearch#26847: a pure unsubscribed-
  account body (no WKE suffix, no OAuth2-validation phrase) must
  still surface as entitlement and skip refresh.  The new
  disambiguator must not weaken the original loop-protection it
  was added to preserve.

The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the
existing ``MagicMock`` patterns from the surrounding tests so the
new section reads as a natural extension of "Fix D" rather than a
separate test file.
@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder provider/xai xAI (Grok) P3 Low — cosmetic, nice to have labels May 20, 2026
@MKI13

MKI13 commented May 20, 2026

Copy link
Copy Markdown

Autofix follow-up for the failing checks on this PR:

  • Root cause of Supply Chain Audit: the fork branch was stale, and the workflow scans base..head, so it saw old install-hook files unrelated to this PR.
  • Tests also failed on the xAI bad-credentials recovery regression, plus a stale bundled web-provider expectation after the xAI web provider was present on the branch.
  • I prepared a fix branch that merges current main, preserves the xAI OAuth bad-credentials refresh behavior against the current code, and updates the web-provider plugin test.
  • Direct push to xxxigm/hermes-agent was denied for MKI13, so I opened a PR against your fork branch instead: fix(xai-oauth): refresh bad-credentials 403s xxxigm/hermes-agent#3

Prepared branch/commit:

  • MKI13:autofix/29348-xai-bad-credentials
  • ec77c4664e90e2bafb940fe0c071631fc4d239ca

Local verification on that branch:

  • python -m pytest tests/run_agent/test_codex_xai_oauth_recovery.py tests/plugins/web/test_web_search_provider_plugins.py tests/hermes_cli/test_update_hangup_protection.py::TestInstallHangupProtection::test_wraps_stdout_and_stderr_with_mirror -q -o 'addopts=' → 87 passed
  • python -m py_compile run_agent.py agent/agent_runtime_helpers.py
  • git diff --check
  • Supply-chain scanner patterns against origin/main..HEAD → no findings

@teknium1

Copy link
Copy Markdown
Contributor

Merged via PR #30872 (#30872) — your two commits cherry-picked onto current main with authorship preserved, plus a follow-up commit that closes the gap your end-to-end test exposed.

There's a second classification site in agent/agent_runtime_helpers.py:_recover_with_credential_pool — a defense-in-depth catch-all from #26847 that blanket-treats any 403 against xai-oauth as entitlement, regardless of _is_entitlement_failure's verdict. That override defeated your fix at the classifier level, which is why test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403 was failing locally for me. The follow-up applies the same WKE / OAuth2-phrase guard at the override site so your disambiguator wins there too; the #26847 protection still triggers for genuine entitlement bodies (no WKE, no OAuth2 phrase).

Thank you for the careful root-cause work and the comprehensive test coverage — the parametrised forward-compat for future unauthenticated:* reason codes was a nice touch.

@teknium1 teknium1 closed this May 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P3 Low — cosmetic, nice to have provider/xai xAI (Grok) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: _is_entitlement_failure over-matches xAI 'bad-credentials' 403 — long-running TUI sessions can't auto-refresh stale OAuth tokens

4 participants