fix(xai-oauth): honor WKE=unauthenticated disambiguator at both classifier sites (#29344) by teknium1 · Pull Request #30872 · NousResearch/hermes-agent

teknium1 · 2026-05-23T09:45:51Z

Summary

Salvages PR #29348 (@xxxigm) onto current main and closes the end-to-end gap that left their headline recovery test failing.

xAI returns the same permission-denied code text for two distinct conditions on a 403: unsubscribed account vs. stale OAuth token. The error field's [WKE=unauthenticated:...] suffix is xAI's authoritative disambiguator. Pre-fix, _is_entitlement_failure over-matched on the code text and classified stale-token 403s as entitlement, so long-running TUI sessions against provider/xai-oauth couldn't auto-refresh and the user had to exit + reopen to recover.

Closes #29344. Closes #28250 (same root cause — token going stale at ~24h with no auto-refresh).

Changes

- `run_agent.py` (@xxxigm): the `_is_entitlement_failure` haystack now also covers the `code` and `error` keys; two disambiguator early-returns (`[wke=unauthenticated:` and `oauth2 access token could not be validated`) fire before the entitlement keyword block.
- `tests/run_agent/test_codex_xai_oauth_recovery.py` (@xxxigm): 11 new tests in the "Fix D-bis" section, split into classifier-level (raw body + normalised shape + parametrised forward-compat across reason codes + case-insensitive + WKE-wins-over-entitlement-keywords + OAuth2-phrase-only fallback) and end-to-end (bad-credentials body triggers `try_refresh_current` exactly once; pure entitlement body still skips refresh as a regression guard).
- `agent/agent_runtime_helpers.py` (follow-up): `_recover_with_credential_pool` had a second classification site that blanket-treated any 403 against xai-oauth as entitlement (a defense-in-depth catch-all from #26847, "[Bug]: xAI OAuth (xai-oauth) returns HTTP 403 for standard SuperGrok subscribers, backend enforcing Heavy-only despite docs claiming all tiers"). The catch-all defeated @xxxigm's classifier-level fix, which is why their end-to-end recovery test was failing locally. Applied the same WKE / OAuth2-phrase guard at the override site so the disambiguator wins there too. Genuine entitlement bodies (no WKE, no OAuth2 phrase) still hit the catch-all and skip refresh.

Validation

scripts/run_tests.sh tests/run_agent/test_codex_xai_oauth_recovery.py
=== 37 tests passed, 0 failed in 3.1s ===

| | Before | After |
|---|---|---|
| Stale-token 403 with `[WKE=unauthenticated:bad-credentials]` | Misclassified as entitlement → non-retryable error → user re-auths manually | Routes through credential pool → `try_refresh_current()` → silent recovery |
| Genuine unsubscribed-account 403 (no WKE, no OAuth2 phrase) | Classified as entitlement, refresh skipped | Unchanged: still classified as entitlement, refresh skipped (#26847 protection preserved) |

Credit

@xxxigm did the classifier-level fix and wrote 11 of the 12 tests. They also clearly documented xAI's WKE contract for forward-compat. The follow-up commit just extends the same disambiguator to the second classification site they didn't know about.

…tlement classifier (#29344)

``_is_entitlement_failure`` over-matched on xAI 403s. xAI returns the same permission-denied ``code`` text for two distinct conditions:

1. Unsubscribed account ("active Grok subscription. Manage at https://grok.com" in the ``error`` field).
2. Stale OAuth access token ("OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]" in the ``error`` field).

The classifier's "does not have permission + grok" substring heuristic treated both identically, so the credential-pool refresh path was short-circuited for case (2): long-running TUI sessions stuck on a stale OAuth token surfaced a non-retryable client error and the user had to exit + reopen the TUI to recover. (The startup-resolve path bypasses the classifier entirely, which is why bridge adapters with proactive refresh cadences didn't see this in practice.)

This patch adopts the reporter's recommended fix (option 1, tightest): honor xAI's explicit ``[WKE=unauthenticated:...]`` suffix and the ``OAuth2 access token could not be validated`` phrasing as authoritative "this is auth, not entitlement" signals. When either appears anywhere in the body's text fields, the classifier returns False eagerly, *before* the entitlement keyword checks run, so the refresh-on-401 path takes over and the existing loop-protection still guards against runaway refresh storms if the refresh itself fails.

Two small adjustments fall out of this:

* The haystack now also covers ``code`` and ``error`` keys directly, not just the ``message``/``reason`` shape ``_extract_api_error_context`` produces. Real runtime paths use the normalised shape, but the test suite and any future call sites that pass raw bodies get the same treatment. Backwards compatible: missing keys default to empty strings, and the haystack still skips when everything is blank.
* Both disambiguator checks fire BEFORE the entitlement keyword checks. If a future xAI body somehow lands with both an entitlement message AND the WKE suffix, the WKE suffix wins. This is correct: auth is recoverable; entitlement is not, and a refreshed token will surface the entitlement message on the next request anyway.

Existing tests (``test_is_entitlement_failure_matches_real_xai_bodies``, ``test_is_entitlement_failure_false_for_unrelated_auth_errors``, ``test_recover_with_credential_pool_skips_refresh_on_entitlement_403``, ``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``) continue to pass unchanged: the unsubscribed-account path, the generic auth-error path, and the refresh-on-401 path are all left intact.
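The ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from `run_agent.py`: the function name `is_entitlement_failure_sketch`, the `_AUTH_MARKERS` constant, and the simplified entitlement heuristic are all hypothetical, and the two example bodies are abbreviated from the fragments quoted in the commit message.

```python
from typing import Mapping, Optional

# Hypothetical constant: lowercase forms of the two disambiguators
# named in the commit message.
_AUTH_MARKERS = (
    "[wke=unauthenticated:",
    "oauth2 access token could not be validated",
)


def is_entitlement_failure_sketch(body: Optional[Mapping[str, object]]) -> bool:
    """Return True only when the body looks like a genuine entitlement failure."""
    if not body:
        return False
    # Cover both the normalised shape (message/reason) and the raw shape
    # (code/error); missing keys default to empty strings.
    parts = [str(body.get(key) or "") for key in ("message", "reason", "code", "error")]
    haystack = " ".join(parts).strip().lower()
    if not haystack:
        return False
    # Disambiguators run first: an explicit auth signal wins over any
    # entitlement keywords that may also be present.
    if any(marker in haystack for marker in _AUTH_MARKERS):
        return False
    # Simplified stand-in for the entitlement keyword block (assumption).
    return "does not have permission" in haystack and "grok" in haystack


# Example bodies, abbreviated from the fragments quoted above.
UNSUBSCRIBED = {
    "code": "caller does not have permission",
    "error": "active Grok subscription. Manage at https://grok.com",
}
STALE_TOKEN = {
    "code": "caller does not have permission",
    "error": "OAuth2 access token could not be validated. "
             "[WKE=unauthenticated:bad-credentials]",
}
```

With these bodies, `is_entitlement_failure_sketch(UNSUBSCRIBED)` returns `True` and `is_entitlement_failure_sketch(STALE_TOKEN)` returns `False`, because the marker check short-circuits before the keyword heuristic is consulted.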

…uator (#29344)

Eleven new tests pinning the #29344 fix. Layout mirrors the existing "Fix D" entitlement section so the bad-credentials disambiguator sits alongside the entitlement-block tests it complements.

Classifier-level coverage:

* ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix``: verbatim shape from the reporter's wire capture (``{code: 'caller does not have permission', error: 'OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]'}``) ↦ classifier must return False so the refresh path runs.
* ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape``: same body after ``_extract_api_error_context`` has rewritten it to ``{reason, message}``. The disambiguator must fire in BOTH shapes; without this guard the production call site at ``_recover_with_credential_pool`` (which goes through the normalised extractor) would still misclassify.
* ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant``: parametrised forward-compat over ``bad-credentials``, ``expired-token``, ``revoked``, ``some-future-reason``. xAI documents the prefix as stable, the suffix after the colon as a reason code that can grow; every variant under ``unauthenticated:`` must route to refresh.
* ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone``: belt-and-braces guard. If a future API revision drops the WKE suffix but keeps "OAuth2 access token could not be validated", we still classify correctly.
* ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords``: defensive. If a body ever carries BOTH the WKE suffix and entitlement language, the WKE signal wins. Auth is recoverable; entitlement isn't, and a refreshed token will resurface the entitlement message on the next request.
* ``test_is_entitlement_failure_case_insensitive_wke_match``: pins that the classifier lowercases the haystack so a future xAI build that uppercases the prefix doesn't reintroduce the bug.

Recovery-path coverage (end-to-end through ``_recover_with_credential_pool``):

* ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403``: the headline test the reporter requested. A bad-credentials 403 with the exact wire body must call ``try_refresh_current()`` exactly once and ``_swap_credential`` once. Pre-fix this returned ``(False, _)`` because the entitlement classifier over-matched and short-circuited the refresh path.
* ``test_recover_with_credential_pool_still_blocks_real_entitlement``: companion regression guard for #26847. A pure unsubscribed-account body (no WKE suffix, no OAuth2-validation phrase) must still surface as entitlement and skip refresh. The new disambiguator must not weaken the original loop-protection it was added to preserve.

The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the existing ``MagicMock`` patterns from the surrounding tests so the new section reads as a natural extension of "Fix D" rather than a separate test file.
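The classifier-level cases listed above might look roughly like the sketch below. It is not taken from `tests/run_agent/test_codex_xai_oauth_recovery.py`: `has_auth_disambiguator` is a hypothetical stand-in for the classifier's early-return check, and the sketch uses the standard library's `unittest.subTest` in place of whatever parametrisation mechanism the real suite uses.

```python
import unittest


def has_auth_disambiguator(text: str) -> bool:
    """Hypothetical stand-in for the classifier's early-return check."""
    lowered = text.lower()
    return (
        "[wke=unauthenticated:" in lowered
        or "oauth2 access token could not be validated" in lowered
    )


# Reason codes named in the commit message; the last one is deliberately
# unknown to exercise forward compatibility.
REASON_CODES = ("bad-credentials", "expired-token", "revoked", "some-future-reason")


class WkeForwardCompatSketch(unittest.TestCase):
    def test_every_unauthenticated_variant_is_auth(self):
        # Any suffix after the stable prefix must be treated as auth.
        for reason in REASON_CODES:
            with self.subTest(reason=reason):
                body = f"permission denied [WKE=unauthenticated:{reason}]"
                self.assertTrue(has_auth_disambiguator(body))

    def test_uppercase_prefix_still_matches(self):
        self.assertTrue(has_auth_disambiguator("[WKE=UNAUTHENTICATED:bad-credentials]"))

    def test_phrase_alone_is_enough(self):
        self.assertTrue(
            has_auth_disambiguator("OAuth2 access token could not be validated.")
        )

    def test_pure_entitlement_text_has_no_disambiguator(self):
        self.assertFalse(
            has_auth_disambiguator("active Grok subscription. Manage at https://grok.com")
        )
```

Looping over the reason codes keeps the forward-compat intent visible: the test asserts on the stable prefix only, so a new reason code added by xAI would not require a test change.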

…29344)

_recover_with_credential_pool had a second classification site that blanket-treated any 403 against xai-oauth as entitlement (defense-in-depth for #26847). That override defeated the new _is_entitlement_failure disambiguator from the parent commit: bad-credentials 403s still short-circuited the refresh path.

Apply the same WKE-unauthenticated / OAuth2-validation-phrase guard at the override site so xAI's authoritative 'this is auth, not entitlement' signal wins there too. The #26847 catch-all still triggers for genuine entitlement bodies that don't carry the disambiguator.

Closes the end-to-end gap exposed by test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403.
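A rough sketch of the override-site guard, under stated assumptions: `should_skip_refresh` and `recover_sketch` are hypothetical names, the real `_recover_with_credential_pool` returns a tuple and also swaps the credential, and the pool here is a `MagicMock` exposing only `try_refresh_current()`.

```python
from unittest.mock import MagicMock

# Hypothetical constant mirroring the two disambiguators from the commit message.
_AUTH_MARKERS = (
    "[wke=unauthenticated:",
    "oauth2 access token could not be validated",
)


def should_skip_refresh(provider: str, status: int, error_text: str) -> bool:
    """True when a 403 should be treated as entitlement and refresh skipped."""
    if provider != "xai-oauth" or status != 403:
        return False
    lowered = error_text.lower()
    # The explicit auth signal wins over the blanket 403 catch-all.
    if any(marker in lowered for marker in _AUTH_MARKERS):
        return False
    # Catch-all: any other 403 against xai-oauth is treated as entitlement.
    return True


def recover_sketch(pool, provider: str, status: int, error_text: str) -> bool:
    """Simplified recovery: refresh unless the failure is classified as entitlement."""
    if should_skip_refresh(provider, status, error_text):
        return False
    return bool(pool.try_refresh_current())


# Stale-token 403: the guard lets the refresh run.
stale_pool = MagicMock()
stale_pool.try_refresh_current.return_value = True
stale_recovered = recover_sketch(
    stale_pool,
    "xai-oauth",
    403,
    "OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]",
)

# Genuine entitlement 403: the catch-all still blocks the refresh.
entitlement_pool = MagicMock()
entitlement_recovered = recover_sketch(
    entitlement_pool,
    "xai-oauth",
    403,
    "active Grok subscription. Manage at https://grok.com",
)
```

Checking the markers inside the catch-all rather than removing the catch-all keeps the #26847 protection for bodies that carry no disambiguator at all.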

github-actions · 2026-05-23T09:46:33Z

🔎 Lint report: `hermes/hermes-a58f808b` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9027 on HEAD, 9027 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4804 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

xxxigm and others added 3 commits May 23, 2026 02:42

teknium1 merged commit cc93053 into main May 23, 2026
21 of 23 checks passed

teknium1 deleted the hermes/hermes-a58f808b branch May 23, 2026 09:48

teknium1 mentioned this pull request May 23, 2026

fix(xai-oauth): honor [WKE=unauthenticated] disambiguator in entitlement classifier (#29344) #29348

Closed

7 tasks

alt-glitch added labels type/bug (Something isn't working), P3 (Low: cosmetic, nice to have), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/xai (xAI (Grok)) May 23, 2026

BrewTestBot mentioned this pull request May 28, 2026

hermes-agent 2026.5.28 Homebrew/homebrew-core#285115

Merged

1 task

This was referenced May 29, 2026

fix(auxiliary): detect xAI OAuth 403 bad-credentials as auth error (#31527) #34431

Merged

fix(auxiliary): detect xAI OAuth 403 bad-credentials as auth error #31527

Closed

teknium1 merged 3 commits into main from hermes/hermes-a58f808b
