[Bug]: _is_entitlement_failure over-matches xAI 'bad-credentials' 403 — long-running TUI sessions can't auto-refresh stale OAuth tokens

## Summary

`_is_entitlement_failure` in `run_agent.py` over-matches on xAI Grok 403 responses, so legitimate "OAuth access token failed validation" errors are misclassified as unsubscribed-account entitlement failures. The defensive guard against entitlement refresh loops (the existing test references issue #26847) then suppresses the refresh-on-401 path for both conditions, the genuine entitlement failure and the bad-credentials failure, leaving long-running TUI sessions stuck on a stale token with no recovery.

Workaround: exit and reopen the TUI — the startup refresh path bypasses the broken classifier.

## Repro

1. Open a Hermes TUI session against `provider/xai-oauth` (SuperGrok).
2. Let it sit idle long enough that the access token goes stale by xAI's server-side criteria (in my case, ~22 hours; it can happen sooner if xAI rotates the session on its side).
3. Send a request.
4. xAI returns HTTP 403 with this body:

```json
{
  "code": "The caller does not have permission to execute the specified operation",
  "error": "The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]"
}
```

5. Hermes logs `Non-retryable client error` and surfaces it to the user. No refresh attempt happens, even though the credential pool's `_refresh_entry` for this provider works fine (proven by opening a new TUI session — the startup-resolve path refreshes successfully).

## Expected

The `[WKE=unauthenticated:bad-credentials]` suffix unambiguously indicates this is a credential-validation failure, not an entitlement failure. Hermes should:
- Call `_recover_with_credential_pool` → `try_refresh_current()` → `_swap_credential`
- Retry the request with the refreshed token
- Either succeed (the typical case after a stale token) or, if the refresh itself fails terminally, fall through to the existing terminal-quarantine path
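The expected flow above can be sketched as a refresh-once-then-retry wrapper. This is a minimal illustration, not Hermes' actual recovery code: `FakePool` and `send_with_refresh_once` are hypothetical names, `send` is any callable returning `(status, body)`, and the real `try_refresh_current()` may have a different return type.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, Any]]


@dataclass
class FakePool:
    """Hypothetical stand-in for the credential pool (not the real Hermes class)."""
    fresh_token: Optional[str] = "fresh-token"
    refresh_calls: int = 0

    def try_refresh_current(self) -> Optional[str]:
        # Assumption: returns a new token on success, None on terminal failure.
        self.refresh_calls += 1
        return self.fresh_token


def send_with_refresh_once(
    send: Callable[[str], Response], pool: FakePool, token: str
) -> Response:
    """Send a request; on a bad-credentials 403, refresh once and retry."""
    status, body = send(token)
    error_text = str(body.get("error", ""))
    if status == 403 and "[WKE=unauthenticated:" in error_text:
        new_token = pool.try_refresh_current()
        if new_token is None:
            # Refresh failed terminally: hand the original 403 to the caller,
            # where the existing terminal-quarantine path would take over.
            return status, body
        # Exactly one retry, so a second 403 cannot start a refresh loop.
        status, body = send(new_token)
    return status, body
```

Retrying only once keeps the loop protection described in this issue intact: if the refreshed token is also rejected, the second 403 is returned unchanged.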

## Actual

`_is_entitlement_failure` returns `True` because the response body matches its substring heuristic on `"caller does not have permission"`. The recovery path short-circuits and returns `False`, and the error surfaces as non-retryable.

## Root cause

xAI's API returns the *same* `code` field text for two distinct conditions:

| Condition | `code` (same) | `error` field (the disambiguator) |
|---|---|---|
| Entitlement (account isn't SuperGrok-subscribed) | `"The caller does not have permission to execute the specified operation"` | `"... active Grok subscription. Manage at https://grok.com"` (or similar entitlement language) |
| Bad credentials (access token failed validation) | `"The caller does not have permission to execute the specified operation"` | `"The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]"` |

The existing tests in `tests/run_agent/test_codex_xai_oauth_recovery.py` cover the entitlement case correctly (`test_is_entitlement_failure_matches_real_xai_bodies`), but there's no test case for the bad-credentials variant — so the classifier treats both identically.

The `[WKE=unauthenticated:bad-credentials]` suffix is xAI's authoritative disambiguator. Hermes currently ignores it.
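For illustration, the suffix can be pulled out of the `error` field with a small regular expression. The pattern is inferred only from the single 403 body quoted in this issue; the general `[WKE=<category>:<reason>]` shape is an assumption rather than documented xAI behaviour, and `parse_wke` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Assumed shape: "[WKE=<category>:<reason>]", based on the one observed body.
_WKE_RE = re.compile(r"\[WKE=([A-Za-z_-]+):([A-Za-z_-]+)\]")


def parse_wke(error_text: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (category, reason) from a WKE suffix, or None if absent."""
    match = _WKE_RE.search(error_text or "")
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()
```

On the bad-credentials body this yields `("unauthenticated", "bad-credentials")`; on an entitlement message with no suffix it yields `None`.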

## Proposed fixes (escalating, pick one)

1. **Tightest** — In `_is_entitlement_failure`, check the body's `error` field first: if it contains `[WKE=unauthenticated:` (or specifically `[WKE=unauthenticated:bad-credentials]`), return `False` immediately. Refresh path then handles it.

2. **Pragmatic** — Require BOTH the entitlement keyword AND the absence of `"OAuth2 access token could not be validated"` before classifying as entitlement.

3. **Safest** — When the WKE suffix says `unauthenticated`, attempt refresh-once before classifying. The existing loop-protection still kicks in on the second 403 if refresh didn't actually help.

Fix #1 is mechanical and matches the explicit disambiguator xAI sends. Recommended.
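A minimal sketch of fix #1, written as a standalone function so it can be read without the rest of `run_agent.py`. The real `_is_entitlement_failure` is not reproduced here: the 403-only check, the dict-shaped body, and the single entitlement hint are assumptions taken from the descriptions in this issue.

```python
from typing import Any, Mapping

# Assumption: the existing heuristic matches on this substring (see "Actual").
_ENTITLEMENT_HINT = "caller does not have permission"
# xAI's disambiguator for credential-validation failures, lowercased for matching.
_UNAUTHENTICATED_MARKER = "[wke=unauthenticated:"


def is_entitlement_failure(body: Mapping[str, Any], status_code: int) -> bool:
    """Classify a 403 body, letting the WKE suffix veto the entitlement match."""
    if status_code != 403:
        return False
    error_text = str(body.get("error", "")).lower()
    # Guard first: an unauthenticated WKE suffix means the token is bad,
    # so the refresh path should handle it rather than the entitlement path.
    if _UNAUTHENTICATED_MARKER in error_text:
        return False
    haystack = f"{body.get('code', '')} {error_text}".lower()
    return _ENTITLEMENT_HINT in haystack
```

Checking the marker before the keyword match leaves the existing entitlement behaviour untouched for bodies that carry no WKE suffix.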

## Test additions

Suggested cases for `tests/run_agent/test_codex_xai_oauth_recovery.py`:

```python
def test_is_entitlement_failure_false_for_bad_credentials_wke_suffix():
    """403 with WKE=unauthenticated:bad-credentials is auth failure, not entitlement."""
    from run_agent import AIAgent
    assert not AIAgent._is_entitlement_failure(
        {
            "code": "The caller does not have permission to execute the specified operation",
            "error": "The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]",
        },
        403,
    )

def test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403():
    """A bad-credentials 403 from xai-oauth must trigger refresh."""
    # Same scaffolding as test_recover_with_credential_pool_still_refreshes_genuine_auth_failure,
    # but with status_code=403 and the bad-credentials error body. Should call try_refresh_current().
```

## Impact

- Any long-running TUI / chat session against `provider/xai-oauth` will eventually 403 once the token goes stale, and the user has to exit/reopen to recover.
- Bridge adapters (Discord, Telegram, etc.) appear unaffected in practice because their process lifecycle / proactive refresh cadence keeps tokens fresh enough that the reactive-recovery path is rarely exercised. But they're vulnerable to the same bug under the right timing.
- Reproduced on two independent installations of Hermes against two separate SuperGrok-active xAI OAuth accounts — same exact symptom, same exact 403 body.

## Environment

- Hermes — recent v0.14.x snapshot (cloned source, current main)
- Python 3.11.15 on Linux
- `provider/xai-oauth` source `manual:xai_pkce` (not `loopback_pkce`, but the bug is upstream of the loopback-vs-manual distinction)
- xAI Grok backend, `grok-4.3` model, `https://api.x.ai/v1`


[Bug]: _is_entitlement_failure over-matches xAI 'bad-credentials' 403 — long-running TUI sessions can't auto-refresh stale OAuth tokens #29344
