Sonnet 4.6 / Opus 4.7 return generic 429 on Claude Max OAuth while Haiku 4.5 succeeds — appears to be new Anthropic-side enforcement beyond header inspection #17169

@sean808080

Summary

As of 2026-04-28, requests to Sonnet 4.5/4.6 and Opus 4.6/4.7 from Hermes via the Claude Max OAuth credential return HTTP 429 {"type":"error","error":{"type":"rate_limit_error","message":"Error"}} with no anthropic-ratelimit-* response headers. Haiku 4.5 succeeds normally on the same token, same code path, same session.

The real claude CLI (v2.1.122) reaches Sonnet successfully with the same OAuth token, from the same machine, in the same minute. That rules out quota and token problems; Anthropic appears to be distinguishing real Claude Code from third-party clients on Sonnet/Opus.

Environment

  • Hermes Agent: latest (post hermes update 2026-04-28)
  • macOS arm64
  • Anthropic provider, OAuth subscription token (sk-ant-oat01-…)
  • Account: Claude Max 20x, organizationRateLimitTier default_claude_max_20x, hasExtraUsageEnabled: true
  • Anthropic status page: All Systems Operational at time of testing

Reproduction

Same OAuth token, two requests run within seconds of each other.

Hermes-shaped request (curl) — 429:

TOKEN=$(security find-generic-password -s "Claude Code-credentials" -w | jq -r .claudeAiOauth.accessToken)
curl -sD - -X POST 'https://api.anthropic.com/v1/messages?beta=true' \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,advisor-tool-2026-03-01,advanced-tool-use-2025-11-20,context-1m-2025-08-07,effort-2025-11-24,cache-diagnosis-2026-04-07" \
  -H "anthropic-dangerous-direct-browser-access: true" \
  -H "user-agent: claude-cli/2.1.122 (external, sdk-cli)" \
  -H "x-app: cli" \
  -H "x-claude-code-session-id: $(uuidgen)" \
  -H "x-client-request-id: $(uuidgen)" \
  -H "x-stainless-arch: arm64" \
  -H "x-stainless-lang: js" \
  -H "x-stainless-os: MacOS" \
  -H "x-stainless-package-version: 0.81.0" \
  -H "x-stainless-runtime: node" \
  -H "x-stainless-runtime-version: v24.3.0" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":10,"system":"x","messages":[{"role":"user","content":"hi"}]}'

Result: HTTP 429, body {"type":"error","error":{"type":"rate_limit_error","message":"Error"},"request_id":"req_011CaWyhtmFS2ob5td4b1mmj"}. Response has no anthropic-ratelimit-* headers — only generic Cloudflare headers and x-should-retry: true.
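
A client could separate this generic 429 from a genuine quota 429 by checking whether any anthropic-ratelimit-* header is present. The sketch below assumes that a real quota rejection would carry those headers, which this issue implies but does not confirm. The function name classify_429 and its return labels are hypothetical and are not part of Hermes or the Anthropic SDK.

```python
def classify_429(status: int, headers: dict[str, str]) -> str:
    """Label a response as 'ok', 'quota_429', 'generic_429', or 'other_error'."""
    if status < 400:
        return "ok"
    if status != 429:
        return "other_error"
    # HTTP header names are case-insensitive, so normalize before matching.
    names = {name.lower() for name in headers}
    has_ratelimit = any(n.startswith("anthropic-ratelimit-") for n in names)
    # Assumption: only a genuine quota rejection carries ratelimit headers.
    return "quota_429" if has_ratelimit else "generic_429"


# Headers shaped like the failing response above: generic ones only.
print(classify_429(429, {"x-should-retry": "true"}))
# → generic_429
```

Logging the label next to the request_id would make it easier to tell the two failure modes apart in gateway logs.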

Real claude CLI — 200:

claude -p "hi" --model claude-sonnet-4-6
# → "<friendly assistant reply>"

Same token. Same model. Same machine. Within seconds.

The successful response shows the account has plenty of quota:

  • anthropic-ratelimit-unified-5h-utilization: 0.09
  • anthropic-ratelimit-unified-7d-utilization: 0.16
  • anthropic-ratelimit-unified-7d_sonnet-utilization: 0.20
  • anthropic-ratelimit-unified-overage-disabled-reason: org_level_disabled_until
  • anthropic-ratelimit-unified-overage-status: rejected

Overage is disabled at the org level, which is fine: at least 80% of base quota remains in every window. Quota therefore does not explain the 429, and the underlying gate must be something else.
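
The remaining quota follows directly from the utilization headers. This is a minimal sketch, assuming each *-utilization value is the fraction of that window already used (0.0 to 1.0); the helper name remaining_quota is hypothetical.

```python
def remaining_quota(headers: dict[str, str]) -> dict[str, float]:
    """Map each unified window (e.g. '5h', '7d') to its unused fraction."""
    prefix = "anthropic-ratelimit-unified-"
    suffix = "-utilization"
    remaining = {}
    for name, value in headers.items():
        key = name.lower()
        # Skip non-utilization headers such as ...-overage-status.
        if key.startswith(prefix) and key.endswith(suffix):
            window = key[len(prefix):-len(suffix)]
            remaining[window] = round(1.0 - float(value), 2)
    return remaining


# Values copied from the successful response above.
observed = {
    "anthropic-ratelimit-unified-5h-utilization": "0.09",
    "anthropic-ratelimit-unified-7d-utilization": "0.16",
    "anthropic-ratelimit-unified-7d_sonnet-utilization": "0.20",
    "anthropic-ratelimit-unified-overage-status": "rejected",
}
print(remaining_quota(observed))
# → {'5h': 0.91, '7d': 0.84, '7d_sonnet': 0.8}
```

The tightest window is the 7-day Sonnet one, and it still has 80% left.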

What's identical between real-claude and Hermes-spoofed

  • Same OAuth token
  • Same ?beta=true URL
  • All anthropic-beta values match
  • anthropic-dangerous-direct-browser-access: true
  • user-agent: claude-cli/2.1.122 (external, sdk-cli)
  • All x-stainless-* values match (arch, lang, os, package-version 0.81.0, runtime, runtime-version v24.3.0)
  • x-app: cli
  • x-client-request-id and x-claude-code-session-id both present (synthetic UUIDs on the Hermes side)

What's different

Ways the real claude request differs from the Hermes one:

  1. Body shape: real claude includes the top-level fields metadata, output_config, thinking, context_management, and diagnostics. Hermes sends only model, messages, system, tools, and max_tokens.
  2. TLS fingerprint: curl/openssl vs Bun/Node, so the JA3/JA4 fingerprints likely differ.
  3. Streaming: real claude always sends stream: true.

Hypothesis

Anthropic added a new enforcement layer for Sonnet/Opus on subscription OAuth, separate from the existing prompt-text content filter. It probably keys on either TLS fingerprint or required body fields (most likely the structured metadata.user_id / output_config / context_management fields that real Claude Code adds).

Affected

  • Hermes Anthropic native provider (agent/anthropic_adapter.py's build_anthropic_client)
  • Any user on Claude Max / Claude Pro OAuth selecting Sonnet 4.5+ or Opus 4.6+ as primary or fallback
  • Telegram, Discord, webui, api_server — all platforms

Workaround

Switch the primary model to claude-haiku-4-5-20251001 (Haiku is unaffected). Sonnet/Opus can stay in the fallback chain, but they will return 429 on every request until this is fixed.
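
The effect of model ordering can be sketched with a simple fallback loop. This is an illustration only, not Hermes's actual fallback code: first_working_model and fake_send are hypothetical names, and fake_send stands in for a real request by mimicking the behaviour reported in this issue.

```python
from typing import Callable, Optional


def first_working_model(
    models: list[str],
    send: Callable[[str], int],
) -> Optional[str]:
    """Return the first model whose request returns HTTP 200, else None."""
    for model in models:
        status = send(model)  # HTTP status code for a request to this model
        if status == 200:
            return model
        # Any other status (including 429): try the next model in the chain.
    return None


def fake_send(model: str) -> int:
    """Stand-in for a real request: Haiku succeeds, everything else 429s."""
    return 200 if "haiku" in model else 429


chain = ["claude-sonnet-4-6", "claude-haiku-4-5-20251001"]
print(first_working_model(chain, fake_send))
# → claude-haiku-4-5-20251001
```

With Sonnet first, every call pays for one failed round trip before reaching Haiku, which is why the workaround puts Haiku in the primary slot.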

What might fix it

  1. Have build_anthropic_client always emit the body fields that real Claude Code emits: metadata: {user_id: <hashed-account-uuid>}, output_config: {...}, thinking: {type: "adaptive"}, context_management: {...}, plus stream: true by default.
  2. If the gate is TLS-level, the SDK already uses Node's https stack, so it should match. If Anthropic is fingerprinting handshake details specific to Bun, that is harder to fix.

Happy to provide gateway logs, full request dumps, or run additional diagnostics. The request_id above can be cross-referenced server-side.

Labels

  • P2: Medium, degraded but workaround exists
  • area/auth: Authentication, OAuth, credential pools
  • provider/anthropic: Anthropic native Messages API
  • type/bug: Something isn't working
