chore(skills): move red-team skills (godmode, obliteratus) to optional-skills - Anthropic classifier #43221
Merged
…dled catalog

Anthropic's output classifier on claude-fable-5 (and likely other Claude models served through it) intermittently returns empty content for sessions whose system prompt advertises these skills. The bundled skills-catalog block is injected into every session's system prompt, so the descriptions

- red-teaming/godmode 'Jailbreak LLMs: Parseltongue, GODMODE, ULTRAPLINIAN'
- mlops/inference/obliteratus 'OBLITERATUS: abliterate LLM refusals (diff-in-means)'

trip the classifier on EVERY session regardless of which skill is actually loaded, killing unrelated legitimate work (PR review, codebase audits, etc.).

Measured impact (controlled, interleaved A/B, claude-fable-5 via OpenRouter, prompts differing only by the ~204 chars of these catalog lines, N=20 each):

    catalog lines present -> 19/20 (95%) blocked
    catalog lines absent  ->  5/20 (25%) blocked

Removing them ~quartered the block rate. Rewording the descriptions was not enough; the skills must leave the bundled catalog.

- Delete skills/red-teaming/godmode and skills/mlops/inference/obliteratus
- Drop their generated doc pages + catalog/sidebar entries (EN + zh-Hans)
- Drop the godmode hand-written-page exception in generate-skill-docs.py
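The block-rate arithmetic in the commit message can be checked in a few lines. This is an illustrative sketch, not part of the PR: `block_rate` and `fisher_exact_one_sided` are hypothetical helpers, and Fisher's exact test is a significance check chosen here for illustration, not something the PR reports running.

```python
from math import comb


def block_rate(blocked: int, total: int) -> float:
    """Fraction of calls that came back blocked (empty content)."""
    return blocked / total


def fisher_exact_one_sided(a: int, n1: int, c: int, n2: int) -> float:
    """One-sided Fisher exact p-value for "arm 1 blocks more often than arm 2".

    a of n1 calls were blocked in arm 1, c of n2 in arm 2. With the row and
    column totals held fixed, sum the hypergeometric probability of every
    table whose arm-1 blocked count is at least the observed a.
    """
    k = a + c        # total blocked calls across both arms
    n = n1 + n2      # total calls
    # math.comb returns 0 when asked for more items than exist, so
    # impossible tables contribute nothing to the tail.
    tail = sum(comb(n1, x) * comb(n2, k - x) for x in range(a, min(n1, k) + 1))
    return tail / comb(n, k)


present = block_rate(19, 20)  # catalog lines present
absent = block_rate(5, 20)    # catalog lines absent
print(f"present={present:.0%} absent={absent:.0%} ratio={absent / present:.2f}")
print(f"one-sided Fisher p={fisher_exact_one_sided(19, 20, 5, 20):.1e}")
```

The ratio comes out near 0.26 (the "~quartered" above), and the p-value is about 5e-6, so at N=20 per arm the gap is very unlikely to be sampling noise. The same helper on the 7/20 vs 7/20 skill-inertness control from the PR description gives p above 0.5, i.e. no signal.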
🔎 Lint report:
| Rule | Count |
|---|---|
| unresolved-reference | 2 |
| call-non-callable | 2 |
| unresolved-import | 2 |
| unresolved-attribute | 1 |
| invalid-parameter-default | 1 |
| invalid-argument-type | 1 |
First entries
optional-skills/security/godmode/scripts/auto_jailbreak.py:527: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `list[str]` in union `list[str] | dict[str, str] | dict[Unknown, Unknown]`
optional-skills/security/godmode/scripts/auto_jailbreak.py:383: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `str`
optional-skills/security/godmode/scripts/parseltongue.py:475: [invalid-argument-type] invalid-argument-type: Argument to function `escape` is incorrect: Argument type `Sized` does not satisfy constraints (`str`, `bytes`) of type variable `AnyStr`
optional-skills/security/godmode/scripts/auto_jailbreak.py:620: [unresolved-reference] unresolved-reference: Name `score_response` used when not defined
optional-skills/security/godmode/scripts/parseltongue.py:520: [call-non-callable] call-non-callable: Object of type `str` is not callable
optional-skills/security/godmode/scripts/auto_jailbreak.py:25: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
optional-skills/security/godmode/scripts/parseltongue.py:476: [call-non-callable] call-non-callable: Object of type `int` is not callable
optional-skills/security/godmode/scripts/auto_jailbreak.py:540: [unresolved-reference] unresolved-reference: Name `escalate_encoding` used when not defined
optional-skills/security/godmode/scripts/godmode_race.py:27: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
✅ Fixed issues (9):
| Rule | Count |
|---|---|
| call-non-callable | 2 |
| unresolved-import | 2 |
| unresolved-reference | 2 |
| invalid-parameter-default | 1 |
| invalid-argument-type | 1 |
| unresolved-attribute | 1 |
First entries
skills/red-teaming/godmode/scripts/parseltongue.py:476: [call-non-callable] call-non-callable: Object of type `int` is not callable
skills/red-teaming/godmode/scripts/godmode_race.py:27: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:383: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `str`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:620: [unresolved-reference] unresolved-reference: Name `score_response` used when not defined
skills/red-teaming/godmode/scripts/parseltongue.py:475: [invalid-argument-type] invalid-argument-type: Argument to function `escape` is incorrect: Argument type `Sized` does not satisfy constraints (`str`, `bytes`) of type variable `AnyStr`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:25: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:527: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `list[str]` in union `list[str] | dict[str, str] | dict[Unknown, Unknown]`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:540: [unresolved-reference] unresolved-reference: Name `escalate_encoding` used when not defined
skills/red-teaming/godmode/scripts/parseltongue.py:520: [call-non-callable] call-non-callable: Object of type `str` is not callable
Unchanged: 5555 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
tonydwb approved these changes on Jun 10, 2026
tonydwb left a comment
Code Review Summary
Verdict: Approved
Looks Good
- Clean removal of red-team skills (godmode, obliteratus) that were tripping Anthropic policy.
- The 5506-line deletion across 25 files represents complete removal of the skill directories.
- The small +2 lines likely represent any remaining SKILL.md or metadata cleanup.
- No sensitive content left behind.
- This is a policy compliance cleanup.
Reviewed by Hermes Agent
Rather than deleting outright, move both into optional-skills/ so they remain installable via `hermes skills install` while taking them out of the always-injected bundled catalog (which is what tripped Anthropic's classifier).

- optional-skills/security/godmode (was skills/red-teaming/godmode)
- optional-skills/mlops/obliteratus (was skills/mlops/inference/obliteratus)
- regenerate optional-skills catalog + sidebar entries
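A minimal sketch of why the relocation fixes the trigger. The real catalog-rendering code is not shown in this PR, so `Skill`, `render_available_skills`, the `installed` set, and the line format below are all hypothetical; the point is only the filter, which skips optional skills until the user installs them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    path: str          # hypothetical catalog key, e.g. "security/godmode"
    description: str   # one-line summary shown to the model
    optional: bool     # True if the skill lives under optional-skills/


def render_available_skills(skills: list[Skill], installed: set[str]) -> str:
    """Build the catalog block that gets injected into every system prompt.

    Bundled skills are always listed; optional skills are listed only after
    the user has installed them.
    """
    lines = ["<available_skills>"]
    for skill in sorted(skills, key=lambda s: s.path):
        if skill.optional and skill.path not in installed:
            continue  # not advertised, so its description never reaches the model
        lines.append(f"- {skill.path}: {skill.description}")
    lines.append("</available_skills>")
    return "\n".join(lines)


# Hypothetical catalog: one ordinary bundled skill, one relocated optional skill.
catalog = [
    Skill("github/pr-review", "Review pull requests", optional=False),
    Skill("security/godmode", "(red-team skill)", optional=True),
]
print(render_available_skills(catalog, installed=set()))
```

With an empty `installed` set only the bundled entry is rendered, so a session doing ordinary work never carries the red-team descriptions in its system prompt; installing the skill opts that user back in.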
changman pushed a commit to changman/hermes-agent that referenced this pull request on Jun 10, 2026
…l-skills - Anthropic classifier (NousResearch#43221)

* chore(skills): remove red-team skills (godmode, obliteratus) from bundled catalog
* chore(skills): relocate godmode + obliteratus to optional-skills
Summary
Moves the two red-team skills (`godmode`, `obliteratus`) out of the bundled catalog and into `optional-skills/`, because their descriptions, which are injected into every session's system prompt via the bundled `<available_skills>` list, trip Anthropic's output classifier and intermittently kill unrelated work. They remain installable on demand.

Root cause
The bundled `<available_skills>` catalog is part of the system prompt of every session, regardless of which skill is actually loaded. Two entries read as jailbreak/abliteration tooling:

- `red-teaming/godmode`: "Jailbreak LLMs: Parseltongue, GODMODE, ULTRAPLINIAN"
- `mlops/inference/obliteratus`: "OBLITERATUS: abliterate LLM refusals (diff-in-means)"

On `claude-fable-5` (Anthropic, via OpenRouter), the output classifier sees these in context and returns empty content a large fraction of the time. The user-visible symptom is the agent dying with:

This blocks legitimate day-to-day work (PR review, codebase audits, optimization sweeps) that has nothing to do with red-teaming; the skills only need to be listed in the always-injected catalog to do the damage. `optional-skills/` entries are not in that catalog until a user explicitly installs them, so relocating them removes the trigger without removing the skills.

Measured impact
Controlled, interleaved A/B (calls alternated so server-side classifier drift hits both arms equally). Same live context, same task, prompts differing only by the ~204 chars of these two catalog lines, N=20 each:

| Catalog lines | Blocked |
|---|---|
| present | 19/20 (95%) |
| absent | 5/20 (25%) |
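The interleaving can be sketched as follows. This is an illustration under stated assumptions, not the harness used for these numbers: `run_interleaved_ab` and the `send` callback are hypothetical, `send` stands in for whatever client makes the model call, and counting an empty reply as "blocked" mirrors the empty-content symptom described under Root cause.

```python
import itertools
from typing import Callable


def run_interleaved_ab(
    send: Callable[[str], str],
    prompt_a: str,
    prompt_b: str,
    n_per_arm: int,
) -> dict[str, int]:
    """Count blocked replies per arm, alternating A, B, A, B, ...

    Alternating calls means any drift in the server-side classifier over
    the run affects both arms roughly equally. A reply that is empty or
    whitespace-only counts as blocked.
    """
    blocked = {"A": 0, "B": 0}
    schedule = itertools.cycle([("A", prompt_a), ("B", prompt_b)])
    for arm, prompt in itertools.islice(schedule, 2 * n_per_arm):
        if not send(prompt).strip():
            blocked[arm] += 1
    return blocked


# Stub standing in for a real model call (hypothetical): it "blocks"
# any prompt that contains the marker string.
fake_send = lambda prompt: "" if "<catalog-lines>" in prompt else "ok"
print(run_interleaved_ab(fake_send, "<catalog-lines> task", "task", n_per_arm=20))
```

Running the arms back to back (20 × A, then 20 × B) would let a classifier change mid-run masquerade as a treatment effect; alternating spreads any such drift evenly across both arms.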
Removing them from the bundled catalog roughly quartered the block rate. Rewording the descriptions to neutral phrasing did not help; the skills have to leave the always-injected catalog. (Confirmed separately that the loaded `hermes-agent-dev` skill itself is inert: full-skill vs no-skill measured 7/20 == 7/20.)

Changes
- `skills/red-teaming/godmode/` → `optional-skills/security/godmode/`
- `skills/mlops/inference/obliteratus/` → `optional-skills/mlops/obliteratus/`
- `godmode` hand-written-page exception in `generate-skill-docs.py` dropped (now an auto-generated optional page)

Both skills stay fully available via `hermes skills install official/security/godmode` and `official/mlops/obliteratus`.

Validation
- `generate-skill-docs.py` regenerated cleanly (170 skills; 2 moved bundled → optional)
- Two spots describing the `prefill_messages` config format (`cli.py`, `cron/scheduler.py`) mention "godmode-generated configs"; that is a config-format reference, not a skill dependency, so it was left intact.
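A small guard along these lines could keep the relocation from regressing. It is a hypothetical sketch, not something this PR adds: `MOVES` and `check_relocation` are illustrative names, and the paths are the ones listed under Changes.

```python
from pathlib import Path

# Old bundled location -> new optional location, as listed under "Changes".
MOVES = {
    "skills/red-teaming/godmode": "optional-skills/security/godmode",
    "skills/mlops/inference/obliteratus": "optional-skills/mlops/obliteratus",
}


def check_relocation(repo_root: Path) -> list[str]:
    """Return problems found; an empty list means both skills moved cleanly."""
    problems = []
    for old, new in MOVES.items():
        # A leftover bundled copy would put the description back in the catalog.
        if (repo_root / old).exists():
            problems.append(f"still bundled: {old}")
        # The optional copy must exist for `hermes skills install` to find it.
        if not (repo_root / new).is_dir():
            problems.append(f"missing optional skill: {new}")
    return problems


if __name__ == "__main__":
    issues = check_relocation(Path("."))
    print("\n".join(issues) or "relocation OK")
```

Checking both directions matters: deleting the skills outright would pass a "not bundled" check but break installation, while a half-finished move would leave the classifier trigger in place.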