fix(xai-oauth): honor [WKE=unauthenticated] disambiguator in entitlement classifier (#29344)#29348
Conversation
…tlement classifier (NousResearch#29344) ``_is_entitlement_failure`` over-matched on xAI 403s. xAI returns the same permission-denied ``code`` text for two distinct conditions: 1. Unsubscribed account ("active Grok subscription. Manage at https://grok.com" in the ``error`` field). 2. Stale OAuth access token ("OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]" in the ``error`` field). The classifier's "does not have permission + grok" substring heuristic treated both identically, so the credential-pool refresh path was short-circuited for case (2) — long-running TUI sessions stuck on a stale OAuth token surfaced a non-retryable client error and the user had to exit + reopen the TUI to recover (the startup-resolve path bypasses the classifier entirely, which is why bridge adapters with proactive refresh cadences didn't see this in practice). This patch adopts the reporter's recommended fix (option 1, tightest): honor xAI's explicit ``[WKE=unauthenticated:...]`` suffix and the ``OAuth2 access token could not be validated`` phrasing as authoritative "this is auth, not entitlement" signals. When either appears anywhere in the body's text fields, the classifier returns False eagerly — *before* the entitlement keyword checks run — so the refresh-on-401 path takes over and the existing loop-protection still guards against runaway refresh storms if the refresh itself fails. Two small adjustments fall out of this: * The haystack now also covers ``code`` and ``error`` keys directly, not just the ``message``/``reason`` shape ``_extract_api_error_context`` produces. Real runtime paths use the normalised shape, but the test suite and any future call sites that pass raw bodies get the same treatment. Backwards compatible: missing keys default to empty strings, the haystack still skips when everything is blank. * Both disambiguator checks fire BEFORE the entitlement keyword checks. If a future xAI body somehow lands with both an entitlement message AND the WKE suffix, the WKE suffix wins (correct — auth is recoverable; entitlement is not, and a refreshed token will surface the entitlement message on the next request anyway). Existing tests (``test_is_entitlement_failure_matches_real_xai_bodies``, ``test_is_entitlement_failure_false_for_unrelated_auth_errors``, ``test_recover_with_credential_pool_skips_refresh_on_entitlement_403``, ``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``) continue to pass unchanged — the unsubscribed-account path, the generic auth-error path, and the refresh-on-401 path are all left intact.
…uator (NousResearch#29344) Eleven new tests pinning the NousResearch#29344 fix. Layout mirrors the existing "Fix D" entitlement section so the bad-credentials disambiguator sits alongside the entitlement-block tests it complements. Classifier-level coverage: * ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix`` — verbatim shape from the reporter's wire capture (``{code: 'caller does not have permission', error: 'OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]'}``) ↦ classifier must return False so the refresh path runs. * ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape`` — same body after ``_extract_api_error_context`` has rewritten it to ``{reason, message}``. The disambiguator must fire in BOTH shapes; without this guard the production call site at ``_recover_with_credential_pool`` (which goes through the normalised extractor) would still misclassify. * ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant`` — parametrised forward-compat: ``bad-credentials``, ``expired-token``, ``revoked``, ``some-future-reason``. xAI documents the prefix as stable, the suffix after the colon as a reason code that can grow; every variant under ``unauthenticated:`` must route to refresh. * ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone`` — belt-and-braces guard: if a future API revision drops the WKE suffix but keeps "OAuth2 access token could not be validated", we still classify correctly. * ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords`` — defensive: if a body ever carries BOTH the WKE suffix and entitlement language, the WKE signal wins. Auth is recoverable; entitlement isn't, and a refreshed token will resurface the entitlement message on the next request. * ``test_is_entitlement_failure_case_insensitive_wke_match`` — pins that the classifier lowercases the haystack so a future xAI build that uppercases the prefix doesn't reintroduce the bug. Recovery-path coverage (end-to-end through ``_recover_with_credential_pool``): * ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403`` — the headline test the reporter requested: a bad-credentials 403 with the exact wire body must call ``try_refresh_current()`` exactly once and ``_swap_credential`` once. Pre-fix this returned ``(False, _)`` because the entitlement classifier over-matched and short-circuited the refresh path. * ``test_recover_with_credential_pool_still_blocks_real_entitlement`` — companion regression guard for NousResearch#26847: a pure unsubscribed- account body (no WKE suffix, no OAuth2-validation phrase) must still surface as entitlement and skip refresh. The new disambiguator must not weaken the original loop-protection it was added to preserve. The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the existing ``MagicMock`` patterns from the surrounding tests so the new section reads as a natural extension of "Fix D" rather than a separate test file.
|
Autofix follow-up for the failing checks on this PR:
Prepared branch/commit:
Local verification on that branch:
|
|
Merged via PR #30872 (#30872) — your two commits cherry-picked onto current main with authorship preserved, plus a follow-up commit that closes the gap your end-to-end test exposed. There's a second classification site in Thank you for the careful root-cause work and the comprehensive test coverage — the parametrised forward-compat for future |
What does this PR do?
Closes the "long-running TUI sessions can't auto-refresh stale OAuth tokens against
provider/xai-oauth" bug in #29344. The reporter (SmelterLabs) did the root-cause work and even drafted the test cases — this PR implements their option 1 (tightest) recommendation and adds the suggested coverage.xAI returns the same permission-denied
codetext for two distinct conditions on a 403:code(identical)error(the disambiguator)The caller does not have permission to execute the specified operation... active Grok subscription. Manage at https://grok.comThe caller does not have permission to execute the specified operationThe OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]_is_entitlement_failureinrun_agent.pywas added in #26847 to stop the credential-pool refresh loop on unsubscribed-account 403s — refreshing an OAuth token cannot fix an account that lacks a Grok subscription, so the classifier returns True and_recover_with_credential_poolshort-circuits beforetry_refresh_current. But the heuristic over-matched: it returned True for the bad-credentials variant too, because that body also carriesdoes not have permissionand (via_extract_api_error_context's fallback tostr(error))groksomewhere in the wire text. Result: long-running TUI sessions stuck on a stale token surfaced as a non-retryable client error, and the only workaround was exit + reopen the TUI so the startup-resolve path (which doesn't go through the classifier) refreshed the token cleanly.xAI's
[WKE=unauthenticated:...]suffix on theerrorfield is the explicit, authoritative disambiguator. This PR honors it.Fix shape (run_agent.py,
_is_entitlement_failure):_extract_api_error_contextpath normalises to{message, reason}, but the classifier is also reachable directly with raw bodies that carry{code, error}. Build the haystack from all four fields so the disambiguator fires regardless of entry point. Backwards compatible: missing keys default to empty strings, the empty-haystack short-circuit still triggers when nothing is set.[WKE=unauthenticated:anywhere in the haystack →return False(xAI's authoritative auth signal —bad-credentials,expired-token,revoked, and any future reason code all route to refresh).oauth2 access token could not be validated→return False(belt-and-braces guard if xAI ever drops the WKE suffix but keeps the human-readable phrasing).The unsubscribed-account path, the generic auth-error path, the refresh-on-401 path, and the
_recover_with_credential_poolshape are all left intact — the patch is purely additive between the haystack build and the entitlement keyword checks.Related Issue
Fixes #29344
Type of Change
Changes Made
run_agent.py—_is_entitlement_failurehaystack now also coverscodeanderrorkeys; two disambiguator checks ([wke=unauthenticated:+oauth2 access token could not be validated) fire before the entitlement keyword block. Docstring updated to explain the WKE contract and link back to [Bug]: _is_entitlement_failure over-matches xAI 'bad-credentials' 403 — long-running TUI sessions can't auto-refresh stale OAuth tokens #29344.tests/run_agent/test_codex_xai_oauth_recovery.py— 11 new tests in the "Fix D-bis" section, mirroring the layout of the existing entitlement tests:_extract_api_error_context, parametrised forward-compat acrossunauthenticated:reason codes (bad-credentials,expired-token,revoked,some-future-reason), the OAuth2-validation-phrase fallback, the "WKE wins over entitlement keywords" defensive case, and case-insensitive matching.try_refresh_current()exactly once and_swap_credentialonce (the headline test the reporter requested); plus a companion guard that a pure unsubscribed-account body still surfaces as entitlement and skips refresh — the new disambiguator must not weaken the [Bug]: xAI OAuth (xai-oauth) returns HTTP 403 for standard SuperGrok subscribers — backend enforcing Heavy-only despite docs claiming all tiers #26847 protection.Backwards compatible: no schema changes, no new public functions, no new config knobs. Existing 22 tests in this file continue to pass unchanged.
How to Test
Behavioural verification against a stale OAuth token, post-fix:
Pre-fix on the same body:
Checklist
fix(xai-oauth):,test(xai-oauth):)