Summary
As of 2026-04-28, requests to Sonnet 4.5/4.6 and Opus 4.6/4.7 from Hermes via the Claude Max OAuth credential return HTTP 429 {"type":"error","error":{"type":"rate_limit_error","message":"Error"}} with no anthropic-ratelimit-* response headers. Haiku 4.5 succeeds normally on the same token, same code path, same session.
The real claude CLI (v2.1.122) hits Sonnet successfully with the same OAuth token from the same machine in the same minute. So this is not a quota issue and not a token issue — it's something Anthropic is doing to distinguish real Claude Code from third-party clients on Sonnet/Opus.
Environment
- Hermes Agent: latest (post hermes update 2026-04-28)
- macOS arm64
- Anthropic provider, OAuth subscription token (sk-ant-oat01-…)
- Account: Claude Max 20x, organizationRateLimitTier default_claude_max_20x, hasExtraUsageEnabled: true
- Anthropic status page: All Systems Operational at time of testing
Reproduction
Same OAuth token, two requests run within seconds of each other.
Hermes-shaped request (curl) — 429:
TOKEN=$(security find-generic-password -s "Claude Code-credentials" -w | jq -r .claudeAiOauth.accessToken)
curl -sD - -X POST 'https://api.anthropic.com/v1/messages?beta=true' \
-H "Authorization: Bearer $TOKEN" \
-H "Accept: application/json" \
-H "anthropic-version: 2023-06-01" \
-H "anthropic-beta: claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,advisor-tool-2026-03-01,advanced-tool-use-2025-11-20,context-1m-2025-08-07,effort-2025-11-24,cache-diagnosis-2026-04-07" \
-H "anthropic-dangerous-direct-browser-access: true" \
-H "user-agent: claude-cli/2.1.122 (external, sdk-cli)" \
-H "x-app: cli" \
-H "x-claude-code-session-id: $(uuidgen)" \
-H "x-client-request-id: $(uuidgen)" \
-H "x-stainless-arch: arm64" \
-H "x-stainless-lang: js" \
-H "x-stainless-os: MacOS" \
-H "x-stainless-package-version: 0.81.0" \
-H "x-stainless-runtime: node" \
-H "x-stainless-runtime-version: v24.3.0" \
-d '{"model":"claude-sonnet-4-6","max_tokens":10,"system":"x","messages":[{"role":"user","content":"hi"}]}'
Result: HTTP 429, body {"type":"error","error":{"type":"rate_limit_error","message":"Error"},"request_id":"req_011CaWyhtmFS2ob5td4b1mmj"}. Response has no anthropic-ratelimit-* headers — only generic Cloudflare headers and x-should-retry: true.
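To make the distinction in the result above easy to check in logs, here is a minimal Python sketch that labels a response by whether any anthropic-ratelimit-* header is present. The function name classify_429 and the labels are hypothetical, and it assumes (as this report does) that a genuinely quota-driven 429 carries those headers while the gate described here does not.

```python
from typing import Mapping

# Header prefix taken from the responses quoted in this report.
RATELIMIT_PREFIX = "anthropic-ratelimit-"


def classify_429(status: int, headers: Mapping[str, str]) -> str:
    """Label a response as 'not_429', 'quota_429', or 'opaque_429'.

    Assumption (not confirmed by Anthropic): a 429 caused by real quota
    exhaustion includes anthropic-ratelimit-* headers, so a 429 without
    them points at some other gate.
    """
    if status != 429:
        return "not_429"
    # HTTP header names are case-insensitive, so compare in lowercase.
    has_ratelimit = any(
        name.lower().startswith(RATELIMIT_PREFIX) for name in headers
    )
    return "quota_429" if has_ratelimit else "opaque_429"
```

For the failing response above (only Cloudflare headers and x-should-retry: true), this returns "opaque_429".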
Real claude CLI — 200:
claude -p "hi" --model claude-sonnet-4-6
# → "<friendly assistant reply>"
Same token. Same model. Same machine. Within seconds.
The successful response shows the account has plenty of quota:
anthropic-ratelimit-unified-5h-utilization: 0.09
anthropic-ratelimit-unified-7d-utilization: 0.16
anthropic-ratelimit-unified-7d_sonnet-utilization: 0.20
anthropic-ratelimit-unified-overage-disabled-reason: org_level_disabled_until
anthropic-ratelimit-unified-overage-status: rejected
Overage is disabled at the org level, which should not matter here: the highest utilization shown is 0.20, so at least 80% of base quota remains. The underlying gate must be something else.
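The headroom arithmetic can be sketched as follows. This assumes each anthropic-ratelimit-unified-*-utilization value is the fraction of that window already used (my reading of the headers, not documented behavior); quota_headroom is a hypothetical helper name.

```python
from typing import Dict, Mapping

PREFIX = "anthropic-ratelimit-unified-"
SUFFIX = "-utilization"


def quota_headroom(headers: Mapping[str, str]) -> Dict[str, float]:
    """Return the remaining fraction per window, e.g. {'5h': 0.91, ...}.

    Assumption: utilization is a 0..1 fraction of the window consumed.
    Headers that are not utilization values (such as the overage ones)
    are ignored.
    """
    headroom: Dict[str, float] = {}
    for name, value in headers.items():
        key = name.lower()
        if key.startswith(PREFIX) and key.endswith(SUFFIX):
            window = key[len(PREFIX):-len(SUFFIX)]
            headroom[window] = round(1.0 - float(value), 2)
    return headroom


if __name__ == "__main__":
    # Values copied from the successful response quoted above.
    observed = {
        "anthropic-ratelimit-unified-5h-utilization": "0.09",
        "anthropic-ratelimit-unified-7d-utilization": "0.16",
        "anthropic-ratelimit-unified-7d_sonnet-utilization": "0.20",
        "anthropic-ratelimit-unified-overage-status": "rejected",
    }
    print(quota_headroom(observed))
```

The tightest window here is 7d_sonnet, with 0.8 of its quota left.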
What's identical between real-claude and Hermes-spoofed
- Same OAuth token
- Same ?beta=true URL
- All anthropic-beta values match
- anthropic-dangerous-direct-browser-access: true
- user-agent: claude-cli/2.1.122 (external, sdk-cli)
- All x-stainless-* values match (arch, lang, os, package-version 0.81.0, runtime, runtime-version v24.3.0)
- x-app: cli
- Synthetic x-client-request-id and x-claude-code-session-id UUIDs
What's different
Things real claude sends that Hermes doesn't:
- Body shape: real claude includes metadata, output_config, thinking, context_management, diagnostics top-level fields. Hermes sends only model, messages, system, tools, max_tokens.
- TLS fingerprint: curl/openssl vs Bun/Node — different JA3/JA4 likely.
- Streaming: real claude uses stream: true always.
Hypothesis
Anthropic added a new enforcement layer for Sonnet/Opus on subscription OAuth, separate from the existing prompt-text content filter. It probably keys on either TLS fingerprint or required body fields (most likely the structured metadata.user_id / output_config / context_management fields that real Claude Code adds).
Affected
- Hermes Anthropic native provider (agent/anthropic_adapter.py's build_anthropic_client)
- Any user on Claude Max / Claude Pro OAuth selecting Sonnet 4.5+ or Opus 4.6+ as primary or fallback
- Telegram, Discord, webui, api_server — all platforms
Workaround
Switch the primary model to claude-haiku-4-5-20251001 (Haiku is unaffected). Sonnet/Opus can stay in the fallback chain, but they will return 429 on every request until this is fixed.
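As a rough illustration of why the order of the chain matters, the sketch below walks a model chain and returns the first model that answers with HTTP 200. It is not Hermes's actual fallback logic: first_working_model and the injected send callable are hypothetical stand-ins for whatever the provider uses to issue a request.

```python
from typing import Callable, Mapping, Sequence, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


def first_working_model(
    chain: Sequence[str],
    send: Callable[[str], Response],
) -> str:
    """Return the first model in `chain` whose request succeeds.

    `send` is a hypothetical callable that issues one request for the
    given model. Any non-200 status (including the 429 described in
    this report) moves on to the next model in the chain.
    """
    for model in chain:
        status, _headers, _body = send(model)
        if status == 200:
            return model
    raise RuntimeError(f"no model in chain succeeded: {list(chain)}")
```

With Sonnet first in the chain, every call pays for one failed request before reaching Haiku, which is why putting claude-haiku-4-5-20251001 first is the cheaper arrangement for now.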
What might fix it
- Have build_anthropic_client always emit the body fields that real Claude Code emits: metadata: {user_id: <hashed-account-uuid>}, output_config: {...}, thinking: {type: "adaptive"}, context_management: {...}, plus stream: true by default.
- If the gate is TLS-level, the SDK already uses Node's https stack — should match. But if Anthropic is fingerprinting handshake details specific to Bun, that's harder.
Happy to provide gateway logs, full request dumps, or run additional diagnostics. The request_id above can be cross-referenced server-side.