fix(xai-oauth): honor [WKE=unauthenticated] disambiguator in entitlement classifier (#29344) by xxxigm · Pull Request #29348 · NousResearch/hermes-agent

xxxigm · 2026-05-20T14:51:51Z

What does this PR do?

Closes the "long-running TUI sessions can't auto-refresh stale OAuth tokens against provider/xai-oauth" bug in #29344. The reporter (SmelterLabs) did the root-cause work and even drafted the test cases — this PR implements their option 1 (tightest) recommendation and adds the suggested coverage.

xAI returns the same permission-denied code text for two distinct conditions on a 403:

Condition	`code` (identical)	`error` (the disambiguator)
Unsubscribed account (entitlement)	`The caller does not have permission to execute the specified operation`	`... active Grok subscription. Manage at https://grok.com`
Stale OAuth access token (auth)	`The caller does not have permission to execute the specified operation`	`The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]`

_is_entitlement_failure in run_agent.py was added in #26847 to stop the credential-pool refresh loop on unsubscribed-account 403s — refreshing an OAuth token cannot fix an account that lacks a Grok subscription, so the classifier returns True and _recover_with_credential_pool short-circuits before try_refresh_current. But the heuristic over-matched: it returned True for the bad-credentials variant too, because that body also carries does not have permission and (via _extract_api_error_context's fallback to str(error)) grok somewhere in the wire text. Result: long-running TUI sessions stuck on a stale token surfaced as a non-retryable client error, and the only workaround was exit + reopen the TUI so the startup-resolve path (which doesn't go through the classifier) refreshed the token cleanly.

xAI's [WKE=unauthenticated:...] suffix on the error field is the explicit, authoritative disambiguator. This PR honors it.

Fix shape (run_agent.py, _is_entitlement_failure):

Broaden the haystack to cover both extraction shapes — the production _extract_api_error_context path normalises to {message, reason}, but the classifier is also reachable directly with raw bodies that carry {code, error}. Build the haystack from all four fields so the disambiguator fires regardless of entry point. Backwards compatible: missing keys default to empty strings, the empty-haystack short-circuit still triggers when nothing is set.
Add two disambiguator early-returns before the existing entitlement keyword checks:
- [WKE=unauthenticated: anywhere in the haystack → return False (xAI's authoritative auth signal — bad-credentials, expired-token, revoked, and any future reason code all route to refresh).
- oauth2 access token could not be validated → return False (belt-and-braces guard if xAI ever drops the WKE suffix but keeps the human-readable phrasing).
Order matters: the disambiguators run BEFORE the entitlement keyword checks. If a future xAI body ever carries both signals at once, auth wins — auth is recoverable, entitlement isn't, and a refreshed token will resurface the entitlement message on the next request anyway. The original [Bug]: xAI OAuth (xai-oauth) returns HTTP 403 for standard SuperGrok subscribers — backend enforcing Heavy-only despite docs claiming all tiers #26847 loop-protection still kicks in on the second 403 if the refresh itself failed.

The unsubscribed-account path, the generic auth-error path, the refresh-on-401 path, and the _recover_with_credential_pool shape are all left intact — the patch is purely additive between the haystack build and the entitlement keyword checks.

Related Issue

Fixes #29344

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

run_agent.py — _is_entitlement_failure haystack now also covers code and error keys; two disambiguator checks ([wke=unauthenticated: + oauth2 access token could not be validated) fire before the entitlement keyword block. Docstring updated to explain the WKE contract and link back to [Bug]: _is_entitlement_failure over-matches xAI 'bad-credentials' 403 — long-running TUI sessions can't auto-refresh stale OAuth tokens #29344.
tests/run_agent/test_codex_xai_oauth_recovery.py — 11 new tests in the "Fix D-bis" section, mirroring the layout of the existing entitlement tests:
- Classifier-level: verbatim reporter body, normalised shape after _extract_api_error_context, parametrised forward-compat across unauthenticated: reason codes (bad-credentials, expired-token, revoked, some-future-reason), the OAuth2-validation-phrase fallback, the "WKE wins over entitlement keywords" defensive case, and case-insensitive matching.
- Recovery-path end-to-end: a bad-credentials 403 with the exact wire body must call try_refresh_current() exactly once and _swap_credential once (the headline test the reporter requested); plus a companion guard that a pure unsubscribed-account body still surfaces as entitlement and skips refresh — the new disambiguator must not weaken the [Bug]: xAI OAuth (xai-oauth) returns HTTP 403 for standard SuperGrok subscribers — backend enforcing Heavy-only despite docs claiming all tiers #26847 protection.

Backwards compatible: no schema changes, no new public functions, no new config knobs. Existing 22 tests in this file continue to pass unchanged.

How to Test

# Full xai-oauth recovery suite (22 existing + 11 new = 33 tests)
./scripts/run_tests.sh tests/run_agent/test_codex_xai_oauth_recovery.py
# expected: 33 passed

Behavioural verification against a stale OAuth token, post-fix:

INFO  run_agent: Credential 403 — auth failure from xai-oauth; refreshing pool entry…
INFO  run_agent: Credential auth failure — refreshed pool entry oauth_xxx
INFO  run_agent: <retry succeeds with the refreshed token>

Pre-fix on the same body:

INFO  run_agent: Credential 403 — entitlement-shaped 403 from xai-oauth; skipping pool refresh
              (account lacks subscription, not a transient auth failure).
ERROR run_agent: Non-retryable client error: 403 ...
# user has to exit + reopen the TUI to recover

Checklist

Conventional Commits (fix(xai-oauth):, test(xai-oauth):)
2 focused commits, single author
11 new tests pass; 22 existing tests in the same file pass; 33 total
Tested on macOS 15.6 (darwin 24.6.0)
No new config keys, no schema changes; backwards compatible
The [Bug]: xAI OAuth (xai-oauth) returns HTTP 403 for standard SuperGrok subscribers — backend enforcing Heavy-only despite docs claiming all tiers #26847 entitlement loop-protection is preserved by the companion regression guard

…tlement classifier (NousResearch#29344) ``_is_entitlement_failure`` over-matched on xAI 403s. xAI returns the same permission-denied ``code`` text for two distinct conditions: 1. Unsubscribed account ("active Grok subscription. Manage at https://grok.com" in the ``error`` field). 2. Stale OAuth access token ("OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]" in the ``error`` field). The classifier's "does not have permission + grok" substring heuristic treated both identically, so the credential-pool refresh path was short-circuited for case (2) — long-running TUI sessions stuck on a stale OAuth token surfaced a non-retryable client error and the user had to exit + reopen the TUI to recover (the startup-resolve path bypasses the classifier entirely, which is why bridge adapters with proactive refresh cadences didn't see this in practice). This patch adopts the reporter's recommended fix (option 1, tightest): honor xAI's explicit ``[WKE=unauthenticated:...]`` suffix and the ``OAuth2 access token could not be validated`` phrasing as authoritative "this is auth, not entitlement" signals. When either appears anywhere in the body's text fields, the classifier returns False eagerly — *before* the entitlement keyword checks run — so the refresh-on-401 path takes over and the existing loop-protection still guards against runaway refresh storms if the refresh itself fails. Two small adjustments fall out of this: * The haystack now also covers ``code`` and ``error`` keys directly, not just the ``message``/``reason`` shape ``_extract_api_error_context`` produces. Real runtime paths use the normalised shape, but the test suite and any future call sites that pass raw bodies get the same treatment. Backwards compatible: missing keys default to empty strings, the haystack still skips when everything is blank. * Both disambiguator checks fire BEFORE the entitlement keyword checks. If a future xAI body somehow lands with both an entitlement message AND the WKE suffix, the WKE suffix wins (correct — auth is recoverable; entitlement is not, and a refreshed token will surface the entitlement message on the next request anyway). Existing tests (``test_is_entitlement_failure_matches_real_xai_bodies``, ``test_is_entitlement_failure_false_for_unrelated_auth_errors``, ``test_recover_with_credential_pool_skips_refresh_on_entitlement_403``, ``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``) continue to pass unchanged — the unsubscribed-account path, the generic auth-error path, and the refresh-on-401 path are all left intact.

…uator (NousResearch#29344) Eleven new tests pinning the NousResearch#29344 fix. Layout mirrors the existing "Fix D" entitlement section so the bad-credentials disambiguator sits alongside the entitlement-block tests it complements. Classifier-level coverage: * ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix`` — verbatim shape from the reporter's wire capture (``{code: 'caller does not have permission', error: 'OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]'}``) ↦ classifier must return False so the refresh path runs. * ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape`` — same body after ``_extract_api_error_context`` has rewritten it to ``{reason, message}``. The disambiguator must fire in BOTH shapes; without this guard the production call site at ``_recover_with_credential_pool`` (which goes through the normalised extractor) would still misclassify. * ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant`` — parametrised forward-compat: ``bad-credentials``, ``expired-token``, ``revoked``, ``some-future-reason``. xAI documents the prefix as stable, the suffix after the colon as a reason code that can grow; every variant under ``unauthenticated:`` must route to refresh. * ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone`` — belt-and-braces guard: if a future API revision drops the WKE suffix but keeps "OAuth2 access token could not be validated", we still classify correctly. * ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords`` — defensive: if a body ever carries BOTH the WKE suffix and entitlement language, the WKE signal wins. Auth is recoverable; entitlement isn't, and a refreshed token will resurface the entitlement message on the next request. * ``test_is_entitlement_failure_case_insensitive_wke_match`` — pins that the classifier lowercases the haystack so a future xAI build that uppercases the prefix doesn't reintroduce the bug. Recovery-path coverage (end-to-end through ``_recover_with_credential_pool``): * ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403`` — the headline test the reporter requested: a bad-credentials 403 with the exact wire body must call ``try_refresh_current()`` exactly once and ``_swap_credential`` once. Pre-fix this returned ``(False, _)`` because the entitlement classifier over-matched and short-circuited the refresh path. * ``test_recover_with_credential_pool_still_blocks_real_entitlement`` — companion regression guard for NousResearch#26847: a pure unsubscribed- account body (no WKE suffix, no OAuth2-validation phrase) must still surface as entitlement and skip refresh. The new disambiguator must not weaken the original loop-protection it was added to preserve. The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the existing ``MagicMock`` patterns from the surrounding tests so the new section reads as a natural extension of "Fix D" rather than a separate test file.

MKI13 · 2026-05-20T15:09:02Z

Autofix follow-up for the failing checks on this PR:

Root cause of Supply Chain Audit: the fork branch was stale, and the workflow scans base..head, so it saw old install-hook files unrelated to this PR.
Tests also failed on the xAI bad-credentials recovery regression, plus a stale bundled web-provider expectation after the xAI web provider was present on the branch.
I prepared a fix branch that merges current main, preserves the xAI OAuth bad-credentials refresh behavior against the current code, and updates the web-provider plugin test.
Direct push to xxxigm/hermes-agent was denied for MKI13, so I opened a PR against your fork branch instead: fix(xai-oauth): refresh bad-credentials 403s xxxigm/hermes-agent#3

Prepared branch/commit:

MKI13:autofix/29348-xai-bad-credentials
ec77c4664e90e2bafb940fe0c071631fc4d239ca

Local verification on that branch:

python -m pytest tests/run_agent/test_codex_xai_oauth_recovery.py tests/plugins/web/test_web_search_provider_plugins.py tests/hermes_cli/test_update_hangup_protection.py::TestInstallHangupProtection::test_wraps_stdout_and_stderr_with_mirror -q -o 'addopts=' → 87 passed
python -m py_compile run_agent.py agent/agent_runtime_helpers.py
git diff --check
Supply-chain scanner patterns against origin/main..HEAD → no findings

teknium1 · 2026-05-23T09:48:24Z

Merged via PR #30872 (#30872) — your two commits cherry-picked onto current main with authorship preserved, plus a follow-up commit that closes the gap your end-to-end test exposed.

There's a second classification site in agent/agent_runtime_helpers.py:_recover_with_credential_pool — a defense-in-depth catch-all from #26847 that blanket-treats any 403 against xai-oauth as entitlement, regardless of _is_entitlement_failure's verdict. That override defeated your fix at the classifier level, which is why test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403 was failing locally for me. The follow-up applies the same WKE / OAuth2-phrase guard at the override site so your disambiguator wins there too; the #26847 protection still triggers for genuine entitlement bodies (no WKE, no OAuth2 phrase).

Thank you for the careful root-cause work and the comprehensive test coverage — the parametrised forward-compat for future unauthenticated:* reason codes was a nice touch.

xxxigm added 2 commits May 20, 2026 21:46

alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder provider/xai xAI (Grok) P3 Low — cosmetic, nice to have labels May 20, 2026

MKI13 mentioned this pull request May 20, 2026

fix(xai-oauth): refresh bad-credentials 403s xxxigm/hermes-agent#3

Open

teknium1 mentioned this pull request May 23, 2026

fix(xai-oauth): honor WKE=unauthenticated disambiguator at both classifier sites (#29344) #30872

Merged

teknium1 closed this May 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(xai-oauth): honor [WKE=unauthenticated] disambiguator in entitlement classifier (#29344)#29348

fix(xai-oauth): honor [WKE=unauthenticated] disambiguator in entitlement classifier (#29344)#29348
xxxigm wants to merge 2 commits into
NousResearch:mainfrom
xxxigm:fix/29344-xai-bad-credentials-misclassified

xxxigm commented May 20, 2026 •

edited

Loading

Uh oh!

MKI13 commented May 20, 2026 •

edited

Loading

Uh oh!

teknium1 commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

xxxigm commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Uh oh!

MKI13 commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

teknium1 commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

xxxigm commented May 20, 2026 •

edited

Loading

MKI13 commented May 20, 2026 •

edited

Loading