Summary
When running Hermes with Anthropic Max (OAuth sk-ant-oat tokens) on a headless server via the gateway, the token expires after ~8 hours and auto-refresh fails silently, causing persistent 401 errors with no recovery path. The gateway continues serving error responses to all connected chat platforms until manually restarted with a fresh token.
This does not happen with other OAuth-based agent frameworks (e.g., OpenClaw) using the same Anthropic Max account and token structure.
Environment
- Hermes Agent (installed via git, latest main branch as of Mar 25, 2026)
- Provider: anthropic (native Messages API)
- Auth: Anthropic Max subscription via OAuth token (sk-ant-oat...)
- Platform: Ubuntu 22.04 headless server, gateway mode (Telegram)
- No browser available on server
Steps to Reproduce
- Set up Hermes gateway on a headless server with model.provider: anthropic
- Authenticate with Anthropic Max (OAuth token in ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN in ~/.hermes/.env)
- Run claude auth login to populate ~/.claude/.credentials.json with a refresh token (requires manual browser interaction, which is already painful on a headless server)
- Wait ~8 hours for the access token to expire
- Send a message to any connected platform
Actual Behavior
- Gateway returns 401 errors: {'type': 'authentication_error', 'message': 'invalid x-api-key'}
- Error response sent to user for every message: "Sorry, I encountered an error (AuthenticationError)... Check your API key or run claude /login"
- _refresh_oauth_token() in agent/anthropic_adapter.py fails because:
  - It uses https://console.anthropic.com/v1/oauth/token with application/x-www-form-urlencoded (line 231)
  - The actual working endpoint is https://platform.claude.com/v1/oauth/token with application/json content type (confirmed by reading Claude Code's own cli.js source)
- The refresh call returns HTTP 500, so no auto-refresh ever succeeds
- Gateway continues failing on every subsequent message with no self-healing
- Recovery requires full manual re-authentication (browser-based OAuth flow) plus a gateway restart
Expected Behavior
- Token refresh should work automatically using the correct endpoint and format
- Gateway should detect persistent auth failures and attempt credential refresh proactively
- On a headless server, the OAuth flow should be manageable without requiring repeated browser interaction
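The proactive refresh described above could be driven by a small expiry check that runs before each request. This is a minimal sketch, not Hermes code: the helper name needs_refresh, the 5-minute safety margin, and the assumption that the expiry is available as epoch seconds are all illustrative.

```python
import time
from typing import Optional

# Hypothetical safety margin: refresh this many seconds before actual expiry.
REFRESH_MARGIN_SECONDS = 300.0


def needs_refresh(expires_at: float,
                  now: Optional[float] = None,
                  margin: float = REFRESH_MARGIN_SECONDS) -> bool:
    """Return True when the access token expires within `margin` seconds.

    `expires_at` is assumed to be a Unix timestamp in seconds; convert first
    if the credential store records milliseconds.
    """
    current = time.time() if now is None else now
    return current >= expires_at - margin
```

Calling this before each outbound request would let the gateway refresh shortly before the ~8 hour expiry instead of discovering it through a 401.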
Root Cause Analysis
1. Wrong token endpoint for refresh (bug)
_refresh_oauth_token() in agent/anthropic_adapter.py line 231:

    req = urllib.request.Request(
        "https://console.anthropic.com/v1/oauth/token",  # WRONG
        data=data,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",  # WRONG for auth code exchange
        },
    )

Claude Code's own source (cli.js) uses:

    TOKEN_URL: "https://platform.claude.com/v1/oauth/token"
    // Uses application/json for token exchange
The refresh token grant may work with form-urlencoded on console.anthropic.com for some token types, but returns HTTP 500 for tokens obtained via the Claude Code OAuth flow.
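For comparison, a refresh request aimed at the endpoint reported above might be built as follows. This is a sketch under stated assumptions: the body field names (grant_type, refresh_token, client_id) follow the generic OAuth 2.0 refresh grant in RFC 6749 and have not been confirmed against this server, and build_refresh_request is a hypothetical helper, not the existing Hermes function. It only constructs the request; nothing is sent.

```python
import json
import urllib.parse
import urllib.request

# Endpoint named in this report (taken from Claude Code's cli.js).
TOKEN_URL = "https://platform.claude.com/v1/oauth/token"


def build_refresh_request(refresh_token: str, client_id: str,
                          content_type: str = "application/json") -> urllib.request.Request:
    """Build (but do not send) a refresh-token request.

    Field names are assumed from RFC 6749 section 6, not from server docs.
    """
    fields = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    }
    if content_type == "application/json":
        data = json.dumps(fields).encode("utf-8")
    elif content_type == "application/x-www-form-urlencoded":
        data = urllib.parse.urlencode(fields).encode("utf-8")
    else:
        raise ValueError(f"unsupported content type: {content_type}")
    return urllib.request.Request(
        TOKEN_URL,
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Keeping the encoding behind a parameter makes it cheap to try JSON first and fall back to form encoding if the server rejects it.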
2. No startup validation (design gap)
When ANTHROPIC_API_KEY is an OAuth token (sk-ant-oat...) with no ~/.claude/.credentials.json present, the gateway starts normally and only fails hours later when the token expires. There should be a startup warning: "OAuth token detected without refresh credentials — token will expire and cannot be auto-renewed."
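A startup check along these lines could look like the sketch below. The function name startup_oauth_warning is hypothetical, and the check only tests whether the credentials file exists; it does not verify that the file actually contains a usable refresh token.

```python
from pathlib import Path
from typing import Optional

OAUTH_PREFIX = "sk-ant-oat"  # prefix of OAuth tokens as described in this report
DEFAULT_CREDENTIALS = Path.home() / ".claude" / ".credentials.json"


def startup_oauth_warning(token: str,
                          credentials_path: Path = DEFAULT_CREDENTIALS) -> Optional[str]:
    """Return a warning string if an OAuth token has no refresh source, else None."""
    if not token.startswith(OAUTH_PREFIX):
        return None  # regular API key: no expiry concern
    if credentials_path.is_file():
        return None  # a refresh source appears to exist (contents not validated)
    return ("OAuth token detected without refresh credentials: "
            "token will expire and cannot be auto-renewed.")
```

The gateway could log the returned string at WARNING level during startup so the problem is visible hours before the token expires.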
3. Gateway doesn't recover from persistent 401s (design gap)
When auth fails, the gateway logs the error and sends an error message to the user, but makes no attempt to:
- Re-read credentials from disk
- Attempt token refresh
- Alert the operator via a different channel
- Enter a degraded mode that retries periodically
Each subsequent message hits the same expired token and fails identically.
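One possible shape for the missing recovery logic is a wrapper that reloads credentials once on an auth failure, retries, and records an operator alert after repeated failures. Everything here is illustrative: AuthError stands in for the provider's 401 exception, and AuthRecovery, reload_credentials, and alert_after are hypothetical names rather than Hermes APIs.

```python
from typing import Callable, List


class AuthError(Exception):
    """Stand-in for the provider's 401 exception (hypothetical)."""


class AuthRecovery:
    def __init__(self, reload_credentials: Callable[[], bool], alert_after: int = 3):
        # reload_credentials should re-read or refresh the token and
        # return True if a new credential is now in place.
        self.reload_credentials = reload_credentials
        self.alert_after = alert_after
        self.consecutive_failures = 0
        self.alerts: List[str] = []

    def call(self, send: Callable[[], str]) -> str:
        """Run `send`; on AuthError, reload credentials and retry once."""
        try:
            result = send()
        except AuthError:
            if not self.reload_credentials():
                self._record_failure()
                raise
            try:
                result = send()
            except AuthError:
                self._record_failure()
                raise
        self.consecutive_failures = 0
        return result

    def _record_failure(self) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.alert_after:
            # A real gateway might notify the operator on a separate channel.
            self.alerts.append(
                f"{self.consecutive_failures} consecutive auth failures")
```

A single retry per message keeps latency bounded, while the failure counter gives the operator a signal that manual re-authentication is needed.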
4. claude auth login is impractical on headless servers
The only way to get a refresh token is claude auth login, which tries to open a browser. On a headless server, this requires a manual workaround: generate a PKCE challenge, open the URL on another machine, and paste the code back. The token exchange itself requires knowledge of the correct endpoint (platform.claude.com, not console.anthropic.com) and format (application/json with state parameter).
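The PKCE challenge step of that workaround can be done with the standard library alone. The sketch below follows RFC 7636 (S256 method); the function names are illustrative, and the authorization URL and client ID needed for the rest of the flow are deliberately left out because this report does not state them.

```python
import base64
import hashlib
import secrets


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_verifier(num_bytes: int = 32) -> str:
    """Generate a random code verifier (32 random bytes give 43 characters)."""
    return _b64url(secrets.token_bytes(num_bytes))


def code_challenge_s256(verifier: str) -> str:
    """Derive the S256 code challenge: BASE64URL(SHA256(verifier))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())
```

The verifier stays on the server and is sent only during the token exchange; the challenge goes into the authorization URL that is opened on the other machine.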
Suggested Fixes
- Fix the refresh token endpoint: Update _refresh_oauth_token() to use https://platform.claude.com/v1/oauth/token and try both application/json and application/x-www-form-urlencoded content types for resilience.
- Startup credential validation: At gateway start, if the resolved token is an OAuth token, check for a valid refresh token source. Warn loudly if none exists.
- Gateway auth failure recovery: On 401 errors, attempt to refresh credentials from ~/.claude/.credentials.json before returning the error to the user. If multiple consecutive 401s occur, log a prominent warning.
- Headless OAuth support: Provide a hermes auth login --headless command that handles the PKCE flow end-to-end, printing the URL and accepting the code via stdin, without requiring the claude CLI's browser-opening behavior.
Logs
2026-03-25 03:47:47,853 ERROR root: Non-retryable client error: Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'invalid x-api-key'}}
2026-03-25 03:48:14,166 ERROR root: Non-retryable client error: Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'invalid x-api-key'}}
2026-03-25 03:48:57,657 ERROR root: Non-retryable client error: Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'invalid x-api-key'}}
2026-03-25 13:05:49,431 ERROR root: Non-retryable client error: Error code: 401 - {'type': 'error', 'error': {'type': 'authentication_error', 'message': 'Invalid authentication credentials'}}
2026-03-25 13:20:50,585 ERROR root: Non-retryable client error: Error code: 401
2026-03-25 13:35:51,743 ERROR root: Non-retryable client error: Error code: 401
2026-03-25 13:50:52,931 ERROR root: Non-retryable client error: Error code: 401
2026-03-25 13:58:20,929 ERROR root: Non-retryable client error: Error code: 401
Related
resolve_anthropic_token() code path)