fix(anthropic): sanitize oauth system prompt for Claude Max proxy#10576
tymrtn wants to merge 1 commit into
Heads up: the scrubber here covers …
Applied this PR locally and verified via a post-sanitize dump that the …

One suggestion for the …

```python
# Additional (find, replace) pairs suggested for the sanitizer.
# Longer phrases are listed before their substrings.
("Jailbreak", "Bypass"),
("jailbreak", "bypass"),
("G0DM0D3", "PRIVILEGE"),
("godmode:", "privilege-mode:"),
("Remove refusal behaviors", "adjust safety heuristics"),
("refusal behaviors", "safety heuristics"),
("obliteratus:", "adjust-tuning:"),
("OBLITERAT", "ADJUST"),
("red-teaming", "security-testing"),
```

These terms appear naturally in Hermes's <available_skills> block
(skill names under red-teaming/, skill descriptions mentioning
jailbreak/refusal-removal tooling) and would trigger content filters
even with otherwise neutral prompts.
Context: I'm debugging a related but orthogonal tools-parameter
trigger on the same classifier in #15080. Adding these to your
sanitizer would cover more of the system-prompt surface area
regardless of how that separate discussion goes.
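This is a minimal sketch of one way the suggested pairs could be applied. It assumes the sanitizer performs ordered, case-sensitive `str.replace` substitutions over the assembled system prompt. `REPLACEMENTS` and `sanitize_system_prompt` are hypothetical names and do not come from the PR's actual code. Order matters: the longer "Remove refusal behaviors" pair has to run before its "refusal behaviors" substring. Otherwise the longer phrase never matches.

```python
# Hypothetical sketch: names and the ordered str.replace strategy are
# assumptions for illustration, not the PR's implementation.

REPLACEMENTS: tuple[tuple[str, str], ...] = (
    ("Jailbreak", "Bypass"),
    ("jailbreak", "bypass"),
    ("G0DM0D3", "PRIVILEGE"),
    ("godmode:", "privilege-mode:"),
    # Must precede its substring below, or this pair would never match.
    ("Remove refusal behaviors", "adjust safety heuristics"),
    ("refusal behaviors", "safety heuristics"),
    ("obliteratus:", "adjust-tuning:"),
    ("OBLITERAT", "ADJUST"),
    ("red-teaming", "security-testing"),
)


def sanitize_system_prompt(
    prompt: str,
    pairs: tuple[tuple[str, str], ...] = REPLACEMENTS,
) -> str:
    """Replace every occurrence of each find-string, applying pairs in order."""
    for old, new in pairs:
        # str.replace is case-sensitive and replaces all occurrences.
        prompt = prompt.replace(old, new)
    return prompt


if __name__ == "__main__":
    skills = (
        "<available_skills>\n"
        "- red-teaming/godmode: Remove refusal behaviors (Jailbreak helper)\n"
        "</available_skills>"
    )
    print(sanitize_system_prompt(skills))
```

Because matching is case-sensitive, each casing variant needs its own pair. That is why the list carries both "Jailbreak" and "jailbreak".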
Thanks for the careful diagnosis @tymrtn: the prompt-shape sensitivity finding is real and well-documented. We're not going to ship this upstream, though. This is the same class of issue as #6475, which we closed previously: the Claude Max / subscription OAuth path's "out of extra usage" response is Anthropic's server-side classification of non-Claude-Code-shaped traffic, not a Hermes bug. Shipping a scrubber that rewrites Hermes-specific tool/feature names (…

If you're hitting this on the OAuth/Max path, the supported paths are:

Credit for the root-cause analysis stays in the issue thread for anyone landing here from search. Closing in favor of the policy stance set on #6475.
Summary
…`extra usage` failures on Claude Max proxy/subscription routes

Root cause
On Tyler's machine, the Claude Max proxy path was healthy and minimal Anthropic Messages requests succeeded, but the full Hermes-assembled system prompt triggered:
`HTTP 400: You're out of extra usage. Add more at claude.ai/settings/usage and keep going.`

The failure was request-shape sensitive:
Local reproduction showed the trigger lives in the assembled Hermes system prompt on the OAuth path, not in the proxy transport itself.
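One way to localize a request-shape-sensitive trigger like this is a binary search over the system prompt's sections. The sketch below rests on two assumptions. First, adding sections never makes a triggering prompt pass again (monotonicity). Second, `triggers` is a hypothetical callback that would send the joined prompt through the proxy and report whether the `HTTP 400` "out of extra usage" response came back. The function name and the way the prompt is split into sections are illustrative and are not taken from the PR.

```python
from typing import Callable, Optional, Sequence


def first_triggering_section(
    sections: Sequence[str],
    triggers: Callable[[str], bool],
    sep: str = "\n\n",
) -> Optional[int]:
    """Return the index of the section whose inclusion first trips the trigger.

    Hypothetical helper. Binary-searches the smallest prefix length k for which
    triggers(sep.join(sections[:k])) is True. This assumes monotonicity:
    once a prefix triggers, every longer prefix also triggers.
    Returns None if even the full prompt does not trigger.
    """
    if not triggers(sep.join(sections)):
        return None
    lo, hi = 1, len(sections)  # invariant: the prefix of length hi triggers
    while lo < hi:
        mid = (lo + hi) // 2
        if triggers(sep.join(sections[:mid])):
            hi = mid  # culprit is within the first mid sections
        else:
            lo = mid + 1  # first mid sections are clean
    return hi - 1


if __name__ == "__main__":
    # Stand-in predicate for illustration; a real probe would hit the proxy.
    fake_trigger = lambda prompt: "jailbreak" in prompt
    sections = [
        "You are Hermes.",
        "<available_skills>",
        "- red-teaming/jailbreak-tool",
        "</available_skills>",
    ]
    print(first_triggering_section(sections, fake_trigger))
```

With n sections, this needs about log2(n) probe requests plus one for the full prompt, compared with up to n when sections are added one at a time.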
Validation
…`127.0.0.1:18801`

Closes #10575
Relates to #6475