[Bug]: OAuth token refresh fails on headless gateway — wrong endpoint + no recovery, causing persistent 401s with Anthropic Max #2962

@nrubioru

Summary

When running Hermes with Anthropic Max (OAuth sk-ant-oat tokens) on a headless server via the gateway, the token expires after ~8 hours and auto-refresh fails silently, causing persistent 401 errors with no recovery path. The gateway continues serving error responses to all connected chat platforms until manually restarted with a fresh token.

This does not happen with other OAuth-based agent frameworks (e.g., OpenClaw) using the same Anthropic Max account and token structure.

Environment

  • Hermes Agent (installed via git, latest main branch as of Mar 25, 2026)
  • Provider: anthropic (native Messages API)
  • Auth: Anthropic Max subscription via OAuth token (sk-ant-oat...)
  • Platform: Ubuntu 22.04 headless server, gateway mode (Telegram)
  • No browser available on server

Steps to Reproduce

  1. Set up Hermes gateway on a headless server with model.provider: anthropic
  2. Authenticate with Anthropic Max (OAuth token in ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN in ~/.hermes/.env)
  3. Run claude auth login to populate ~/.claude/.credentials.json with a refresh token (requires manual browser interaction, which is already painful on a headless server)
  4. Wait ~8 hours for the access token to expire
  5. Send a message to any connected platform

Actual Behavior

  • Gateway returns 401 errors: {'type': 'authentication_error', 'message': 'invalid x-api-key'}
  • Error response sent to user for every message: "Sorry, I encountered an error (AuthenticationError)... Check your API key or run claude /login"
  • _refresh_oauth_token() in agent/anthropic_adapter.py fails because:
    • It uses https://console.anthropic.com/v1/oauth/token with application/x-www-form-urlencoded (line 231)
    • The actual working endpoint is https://platform.claude.com/v1/oauth/token with application/json content type (confirmed by reading Claude Code's own cli.js source)
    • The refresh call returns HTTP 500, so no auto-refresh ever succeeds
  • Gateway continues failing on every subsequent message with no self-healing
  • Requires full manual re-authentication (browser-based OAuth flow) + gateway restart

Expected Behavior

  • Token refresh should work automatically using the correct endpoint and format
  • Gateway should detect persistent auth failures and attempt credential refresh proactively
  • On a headless server, the OAuth flow should be manageable without requiring repeated browser interaction

Root Cause Analysis

1. Wrong token endpoint for refresh (bug)

_refresh_oauth_token() in agent/anthropic_adapter.py line 231:

req = urllib.request.Request(
    "https://console.anthropic.com/v1/oauth/token",  # WRONG
    data=data,
    headers={
        "Content-Type": "application/x-www-form-urlencoded",  # WRONG for auth code exchange
    },
)

Claude Code's own source (cli.js) uses:

TOKEN_URL: "https://platform.claude.com/v1/oauth/token"
// Uses application/json for token exchange

The refresh token grant may work with form-urlencoded on console.anthropic.com for some token types, but returns HTTP 500 for tokens obtained via the Claude Code OAuth flow.
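
A minimal sketch of what a corrected refresh request could look like. Only the URL and the JSON content type come from the analysis above. The payload field names follow the generic OAuth 2.0 refresh grant, and the `client_id` argument and the `build_refresh_request` helper are hypothetical, so treat them as assumptions rather than what cli.js actually sends. The function only builds the request and does not send it:

```python
import json
import urllib.parse
import urllib.request

# URL and JSON content type are taken from the analysis in this report.
TOKEN_URL = "https://platform.claude.com/v1/oauth/token"


def build_refresh_request(refresh_token, client_id, use_json=True):
    """Build (but do not send) a token refresh request.

    The payload fields follow the generic OAuth 2.0 refresh grant
    (RFC 6749, section 6). Whether this endpoint expects exactly
    these fields is an assumption.
    """
    payload = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,  # hypothetical: the real client id is not in this report
    }
    if use_json:
        data = json.dumps(payload).encode("utf-8")
        content_type = "application/json"
    else:
        # Fallback encoding, matching what the current code sends.
        data = urllib.parse.urlencode(payload).encode("utf-8")
        content_type = "application/x-www-form-urlencoded"
    return urllib.request.Request(
        TOKEN_URL,
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Keeping the encoding behind a flag would let the caller try JSON first and fall back to form encoding if the first attempt is rejected.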

2. No startup validation (design gap)

When ANTHROPIC_API_KEY is an OAuth token (sk-ant-oat...) with no ~/.claude/.credentials.json present, the gateway starts normally and only fails hours later when the token expires. There should be a startup warning: "OAuth token detected without refresh credentials — token will expire and cannot be auto-renewed."
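
The check described above could be sketched roughly as follows. The `oauth_startup_warning` helper is hypothetical, and it only tests whether the credentials file exists. It does not inspect the file's contents, since the file's internal layout is not described in this report:

```python
import os

OAUTH_PREFIX = "sk-ant-oat"  # OAuth token prefix quoted in this report


def oauth_startup_warning(token, credentials_path="~/.claude/.credentials.json"):
    """Return a warning string if an OAuth token has no refresh source, else None.

    Hypothetical helper: it checks file presence only, not whether the
    file actually holds a usable refresh token.
    """
    if not token or not token.startswith(OAUTH_PREFIX):
        return None  # regular API key or nothing set: nothing to check
    path = os.path.expanduser(credentials_path)
    if os.path.isfile(path):
        return None
    return (
        "OAuth token detected without refresh credentials: "
        "token will expire and cannot be auto-renewed."
    )
```

A gateway could call this once at startup and log the returned string at warning level when it is not None.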

3. Gateway doesn't recover from persistent 401s (design gap)

When auth fails, the gateway logs the error and sends an error message to the user, but makes no attempt to:

  • Re-read credentials from disk
  • Attempt token refresh
  • Alert the operator via a different channel
  • Enter a degraded mode that retries periodically

Each subsequent message hits the same expired token and fails identically.
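
One possible shape for the missing recovery step, as a sketch only. `AuthError`, `AuthRecovery`, and the `refresh` callback are hypothetical names and do not exist in Hermes. The wrapper refreshes credentials once on an auth failure, retries once, and counts consecutive failures so a prominent warning can be logged:

```python
import logging


class AuthError(Exception):
    """Stand-in for the provider's 401 error type (hypothetical)."""


class AuthRecovery:
    """Refresh credentials once on an auth failure, then retry once."""

    def __init__(self, refresh, warn_after=3):
        self._refresh = refresh        # callable returning True if new credentials were loaded
        self._warn_after = warn_after  # consecutive failures before a loud warning
        self.consecutive_failures = 0

    def call(self, send):
        """Run send(); on AuthError attempt one refresh and one retry."""
        try:
            result = send()
        except AuthError:
            if not self._refresh():
                self._record_failure()
                raise
            try:
                result = send()
            except AuthError:
                self._record_failure()
                raise
        self.consecutive_failures = 0
        return result

    def _record_failure(self):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self._warn_after:
            logging.warning(
                "%d consecutive auth failures; manual re-authentication may be needed",
                self.consecutive_failures,
            )
```

Limiting the retry to a single attempt keeps a permanently invalid token from turning every user message into a refresh loop.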

4. claude auth login is impractical on headless servers

The only way to get a refresh token is claude auth login, which tries to open a browser. On a headless server, this requires a manual workaround (generate a PKCE challenge, open the URL on another machine, paste the code back). The token exchange itself requires knowing the correct endpoint (platform.claude.com, not console.anthropic.com) and format (application/json with a state parameter).

Suggested Fixes

  1. Fix the refresh token endpoint — Update _refresh_oauth_token() to use https://platform.claude.com/v1/oauth/token and try both application/json and application/x-www-form-urlencoded content types for resilience.

  2. Startup credential validation — At gateway start, if the resolved token is an OAuth token, check for a valid refresh token source. Warn loudly if none exists.

  3. Gateway auth failure recovery — On 401 errors, attempt to refresh credentials from ~/.claude/.credentials.json before returning the error to the user. If multiple consecutive 401s occur, log a prominent warning.

  4. Headless OAuth support — Provide a hermes auth login --headless command that handles the PKCE flow end-to-end, printing the URL and accepting the code via stdin, without requiring claude CLI's browser-opening behavior.

Logs

2026-03-25 03:47:47,853 ERROR root: Non-retryable client error: Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'invalid x-api-key'}}
2026-03-25 03:48:14,166 ERROR root: Non-retryable client error: Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'invalid x-api-key'}}
2026-03-25 03:48:57,657 ERROR root: Non-retryable client error: Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'invalid x-api-key'}}
2026-03-25 13:05:49,431 ERROR root: Non-retryable client error: Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'Invalid authentication credentials'}}
2026-03-25 13:20:50,585 ERROR root: Non-retryable client error: Error code: 401
2026-03-25 13:35:51,743 ERROR root: Non-retryable client error: Error code: 401
2026-03-25 13:50:52,931 ERROR root: Non-retryable client error: Error code: 401
2026-03-25 13:58:20,929 ERROR root: Non-retryable client error: Error code: 401
