fix(anthropic): sanitize oauth system prompt for Claude Max proxy #10576

Closed
tymrtn wants to merge 1 commit into NousResearch:main from tymrtn:fix/anthropic-oauth-prompt-sanitizer

Conversation

@tymrtn tymrtn commented Apr 16, 2026

Contributor

Summary

  • sanitize the Hermes system prompt text on the Anthropic OAuth / Claude Code path
  • avoid wording known to trigger false "out of extra usage" failures on Claude Max proxy/subscription routes
  • keep the existing Claude Code identity/tool prefix behavior, but route the remaining text through a dedicated sanitizer helper
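
A minimal sketch of the shape described in the summary, assuming the helper is an ordered list of plain string replacements applied only to the Hermes-assembled text. The names `sanitize_system_text`, `build_oauth_system_blocks`, and `EXAMPLE_REPLACEMENTS` are hypothetical, and the pairs are placeholders rather than the PR's actual list:

```python
from typing import Dict, Iterable, List, Tuple

Replacement = Tuple[str, str]

# Hypothetical placeholder pairs; the PR's real replacement list is not shown here.
# Longer phrases come first so a shorter pattern cannot split them.
EXAMPLE_REPLACEMENTS: Tuple[Replacement, ...] = (
    ("long brittle phrase", "neutral phrase"),
    ("brittle", "plain"),
)


def sanitize_system_text(text: str, replacements: Iterable[Replacement]) -> str:
    """Apply each (old, new) pair in order and return the rewritten text."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


def build_oauth_system_blocks(
    identity_prefix: str,
    hermes_prompt: str,
    replacements: Iterable[Replacement],
) -> List[Dict[str, str]]:
    """Leave the identity prefix untouched; sanitize only the Hermes text."""
    return [
        {"type": "text", "text": identity_prefix},
        {"type": "text", "text": sanitize_system_text(hermes_prompt, replacements)},
    ]
```

Keeping the rewrite in one helper means the identity prefix is never altered by accident, and the replacement list stays the single place that would need to change.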

Root cause

On Tyler's machine, the Claude Max proxy path was healthy and minimal Anthropic Messages requests succeeded, but the full Hermes-assembled system prompt triggered:

HTTP 400: You're out of extra usage. Add more at claude.ai/settings/usage and keep going.

The failure was request-shape sensitive:

  • minimal user-only request: works
  • Claude Code identity only: works
  • tools only: works
  • full Hermes system block: fails

Local reproduction showed the trigger lives in the assembled Hermes system prompt on the OAuth path, not in the proxy transport itself.
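
The isolation steps above can be sketched as a greedy reduction over named prompt blocks. This is an illustration under stated assumptions, not the procedure actually used in the reproduction: `minimal_failing_blocks` is a hypothetical helper, and `fake_request_ok` is a stand-in for sending a real request, "failing" whenever a made-up marker string is present.

```python
from typing import Callable, Dict, List


def minimal_failing_blocks(
    blocks: Dict[str, str],
    request_ok: Callable[[List[str]], bool],
) -> List[str]:
    """Return names of blocks that cannot be dropped without the failure vanishing."""
    names = list(blocks)
    if request_ok([blocks[n] for n in names]):
        return []  # the full prompt already succeeds; nothing to isolate
    for name in list(names):
        trial = [n for n in names if n != name]
        if not request_ok([blocks[n] for n in trial]):
            names = trial  # still fails without this block, so it is not the trigger
    return names


def fake_request_ok(system_blocks: List[str]) -> bool:
    """Stand-in for a real request; 'TRIGGER' is a hypothetical marker."""
    return not any("TRIGGER" in text for text in system_blocks)


blocks = {
    "identity": "You are Claude Code.",
    "tools": "Tool descriptions go here.",
    "hermes": "Hermes guidance with TRIGGER wording.",
}
culprits = minimal_failing_blocks(blocks, fake_request_ok)
```

With a real `request_ok` that sends each candidate prompt through the same proxy and auth, the surviving names point at the block whose wording is responsible, independent of the transport.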

Validation

  • restored local Claude Max proxy on 127.0.0.1:18801
  • verified minimal Anthropic SDK request through the same proxy/auth works
  • verified full Hermes/Skippy local turn succeeds again after this sanitizer change

Closes #10575
Relates to #6475

@ceo94HEHE


Heads up — the scrubber here covers session_search and skill_manage, which is enough for the bare CLI path. The gateway path (Telegram/Discord) injects an additional system block containing the literal string MEDIA: (file-delivery instructions), which independently trips the same misclassification. Worth adding MEDIA: to the replacement list so the patch fixes both surfaces. Full repro + verification in #10575 (comment).

@ShinonKagura


Applied this PR locally and verified via a post-sanitize dump that the
rewrites reach the wire — thanks for the clean fix!

One suggestion for the replacements list: on my account, the skill
catalogue rendered into the system prompt still contained red-team-
adjacent terms that appear to influence the same classifier. After
adding these additional rewrites locally, the dump came out clean of
obvious flag-triggers:

("Jailbreak", "Bypass"),
("jailbreak", "bypass"),
("G0DM0D3", "PRIVILEGE"),
("godmode:", "privilege-mode:"),
("Remove refusal behaviors", "adjust safety heuristics"),
("refusal behaviors", "safety heuristics"),
("obliteratus:", "adjust-tuning:"),
("OBLITERAT", "ADJUST"),
("red-teaming", "security-testing"),

These terms appear naturally in Hermes's <available_skills> block
(skill names under red-teaming/, skill descriptions mentioning
jailbreak/refusal-removal tooling) and would trigger content filters
even with otherwise neutral prompts.

Context: I'm debugging a related but orthogonal tools-parameter
trigger on the same classifier in #15080. Adding these to your
sanitizer would cover more of the system-prompt surface area
regardless of how that separate discussion goes.

@alt-glitch added labels on Apr 25, 2026: type/bug (Something isn't working); P1 (High: major feature broken, no workaround); provider/anthropic (Anthropic native Messages API); area/auth (Authentication, OAuth, credential pools)
@alt-glitch

Collaborator

Fix for #10575. Also addresses the same class of issue reported in #15080 (Claude Max 20x OAuth HTTP 400).

@teknium1

Contributor

Thanks for the careful diagnosis @tymrtn — the prompt-shape sensitivity finding is real and well-documented.

We're not going to ship this upstream though. This is the same class of issue as #6475, which we closed previously: the Claude Max / subscription OAuth path's "out of extra usage" response is Anthropic's server-side classification of non-Claude-Code-shaped traffic, not a Hermes bug. Shipping a scrubber that rewrites Hermes-specific tool/feature names (session_search, skill_manage, MEDIA:, etc.) so requests look more like Claude Code is something we don't want to maintain in tree — it's an adversarial treadmill against Anthropic's classifier (every new tool or skill name potentially needs a scrubber entry) and pushes the existing minimal product-name swap further than we're comfortable with on a policy basis.

If you're hitting this on the OAuth/Max path, the supported paths are:

Credit for the root-cause analysis stays in the issue thread for anyone landing here from search. Closing in favor of the policy stance set on #6475.

Labels

area/auth (Authentication, OAuth, credential pools); P1 (High: major feature broken, no workaround); provider/anthropic (Anthropic native Messages API); type/bug (Something isn't working)

Development

Successfully merging this pull request may close these issues.

[Bug] Anthropic OAuth/Claude Max proxy path can misclassify full Hermes system prompt as extra-usage exhausted

5 participants