fix(anthropic): sanitize oauth system prompt for Claude Max proxy by tymrtn · Pull Request #10576 · NousResearch/hermes-agent

tymrtn · 2026-04-16T00:01:27Z

Summary

sanitize Hermes system prompt text on the Anthropic OAuth / Claude Code path
avoid known brittle wording that can trigger false extra usage failures on Claude Max proxy/subscription routes
keep the existing Claude Code identity/tool prefix behavior, but route text through a dedicated sanitizer helper

Root cause

On Tyler's machine, the Claude Max proxy path was healthy and minimal Anthropic Messages requests succeeded, but the full Hermes-assembled system prompt triggered:

HTTP 400: You're out of extra usage. Add more at claude.ai/settings/usage and keep going.

The failure was request-shape sensitive:

minimal user-only request: works
Claude Code identity only: works
tools only: works
full Hermes system block: fails

Local reproduction showed the trigger lives in the assembled Hermes system prompt on the OAuth path, not in the proxy transport itself.

Validation

restored local Claude Max proxy on 127.0.0.1:18801
verified minimal Anthropic SDK request through the same proxy/auth works
verified full Hermes/Skippy local turn succeeds again after this sanitizer change

Closes #10575
Relates to #6475

ceo94HEHE · 2026-04-19T12:20:48Z

Heads up — the scrubber here covers session_search and skill_manage, which is enough for the bare CLI path. The gateway path (Telegram/Discord) injects an additional system block containing the literal string MEDIA: (file-delivery instructions), which independently trips the same misclassification. Worth adding MEDIA: to the replacement list so the patch fixes both surfaces. Full repro + verification in #10575 (comment).

ShinonKagura · 2026-04-24T11:02:43Z

Applied this PR locally and verified via a post-sanitize dump that the
rewrites reach the wire — thanks for the clean fix!

One suggestion for the replacements list: on my account, the skill
catalogue rendered into the system prompt still contained red-team-
adjacent terms that appear to influence the same classifier. After
adding these additional rewrites locally, the dump came out clean of
obvious flag-triggers:

("Jailbreak", "Bypass"),                                                                                                           
("jailbreak", "bypass"),                                                
("G0DM0D3", "PRIVILEGE"),                                                                                                          
("godmode:", "privilege-mode:"),                                        
("Remove refusal behaviors", "adjust safety heuristics"),
("refusal behaviors", "safety heuristics"),
("obliteratus:", "adjust-tuning:"),                                                                                                
("OBLITERAT", "ADJUST"),                
("red-teaming", "security-testing"),                                                                                               
                                                                                                                                   
These terms appear naturally in Hermes's <available_skills> block
(skill names under red-teaming/, skill descriptions mentioning                                                                     
jailbreak/refusal-removal tooling) and would trigger content filters    
even with otherwise neutral prompts.                                                                                               
                                        
Context: I'm debugging a related but orthogonal tools-parameter                                                                    
trigger on the same classifier in #15080 — adding these to your                                                                    
sanitizer would cover more of the system-prompt surface area
regardless of how that separate discussion goes.

alt-glitch · 2026-04-25T23:41:11Z

Fix for #10575. Also addresses the same class of issue reported in #15080 (Claude Max 20x OAuth HTTP 400).

teknium1 · 2026-05-16T08:52:42Z

Thanks for the careful diagnosis @tymrtn — the prompt-shape sensitivity finding is real and well-documented.

We're not going to ship this upstream though. This is the same class of issue as #6475, which we closed previously: the Claude Max / subscription OAuth path's "out of extra usage" response is Anthropic's server-side classification of non-Claude-Code-shaped traffic, not a Hermes bug. Shipping a scrubber that rewrites Hermes-specific tool/feature names (session_search, skill_manage, MEDIA:, etc.) so requests look more like Claude Code is something we don't want to maintain in tree — it's an adversarial treadmill against Anthropic's classifier (every new tool or skill name potentially needs a scrubber entry) and pushes the existing minimal product-name swap further than we're comfortable with on a policy basis.

If you're hitting this on the OAuth/Max path, the supported paths are:

use Anthropic API billing (extra usage credits, ~30% off API price per Teknium's note on Anthropic Claude subscription auth returns 'You're out of extra usage' in Hermes even after restart/re-login #6475), or
keep a local patch like this one in your own fork.

Credit for the root-cause analysis stays in the issue thread for anyone landing here from search. Closing in favor of the policy stance set on #6475.

fix(anthropic): sanitize oauth system prompt for Claude Max proxy

f8fcba9

tymrtn mentioned this pull request Apr 16, 2026

[Bug] Anthropic OAuth/Claude Max proxy path can misclassify full Hermes system prompt as extra-usage exhausted #10575

Closed

alt-glitch mentioned this pull request Apr 21, 2026

fix(anthropic): move oversized OAuth system prompt to user prefix #13611

Open

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround provider/anthropic Anthropic native Messages API area/auth Authentication, OAuth, credential pools labels Apr 25, 2026

alt-glitch mentioned this pull request Apr 27, 2026

fix(agent): support Claude Code OAuth subscription route #16692

Open

alt-glitch mentioned this pull request May 6, 2026

Error code: 400 | "You're out of extra usage. Add more at claude.ai/settings/usage and keep going." #20732

Closed

LeonSGP43 mentioned this pull request May 7, 2026

fix(anthropic): classify OAuth tool-use overage failures #21019

Open

teknium1 closed this May 16, 2026

alt-glitch mentioned this pull request May 19, 2026

OAuth Anthropic Max: <available_skills> system-prompt injection on every turn triggers "out of extra usage" 400 when skills/session_search toolsets are active #28902

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(anthropic): sanitize oauth system prompt for Claude Max proxy#10576

fix(anthropic): sanitize oauth system prompt for Claude Max proxy#10576
tymrtn wants to merge 1 commit into
NousResearch:mainfrom
tymrtn:fix/anthropic-oauth-prompt-sanitizer

tymrtn commented Apr 16, 2026

Uh oh!

ceo94HEHE commented Apr 19, 2026

Uh oh!

ShinonKagura commented Apr 24, 2026

Uh oh!

alt-glitch commented Apr 25, 2026

Uh oh!

teknium1 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

tymrtn commented Apr 16, 2026

Summary

Root cause

Validation

Uh oh!

ceo94HEHE commented Apr 19, 2026

Uh oh!

ShinonKagura commented Apr 24, 2026

Uh oh!

alt-glitch commented Apr 25, 2026

Uh oh!

teknium1 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants