chore(skills): move red-team skills (godmode, obliteratus) to optional-skills — Anthropic classifier by teknium1 · Pull Request #43221 · NousResearch/hermes-agent

teknium1 · 2026-06-10T01:51:52Z

Summary

Moves the two red-team skills (godmode, obliteratus) out of the bundled catalog and into optional-skills/, because their descriptions — injected into every session's system prompt via the bundled <available_skills> list — trip Anthropic's output classifier and intermittently kill unrelated work. They remain installable on demand.

Root cause

The bundled <available_skills> catalog is part of the system prompt of every session, regardless of which skill is actually loaded. Two entries read as jailbreak/abliteration tooling:

red-teaming/godmode — "Jailbreak LLMs: Parseltongue, GODMODE, ULTRAPLINIAN"
mlops/inference/obliteratus — "OBLITERATUS: abliterate LLM refusals (diff-in-means)"

On claude-fable-5 (Anthropic, via OpenRouter), the output classifier sees these in context and returns empty content a large fraction of the time. The user-visible symptom is the agent dying with:

⚠️ Empty response from model — retrying (1/3..3/3)
❌ Model returned no content after all retries.

This blocks legitimate day-to-day work (PR review, codebase audits, optimization sweeps) that has nothing to do with red-teaming — the skills just need to be listed in the always-injected catalog to do the damage. optional-skills/ entries are not in that catalog until a user explicitly installs them, so relocating fixes the trigger without removing the skills.
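To make the mechanism concrete, here is a minimal sketch of why relocating works. It assumes a hypothetical `Skill` record and a hypothetical `render_available_skills` helper; neither name is taken from the hermes-agent codebase, and the real catalog format may differ. The only behaviour it models is the one described above: bundled skills are always listed, optional skills are listed only once installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    path: str                # e.g. "red-teaming/godmode"
    description: str
    bundled: bool            # True: ships in skills/; False: lives in optional-skills/
    installed: bool = False  # only meaningful for optional skills


def render_available_skills(skills):
    """Render the catalog block that goes into every system prompt.

    Hypothetical format: one "path: description" line per visible skill.
    A skill is visible if it is bundled, or optional and installed.
    """
    lines = [f"{s.path}: {s.description}" for s in skills if s.bundled or s.installed]
    return "<available_skills>\n" + "\n".join(lines) + "\n</available_skills>"


GODMODE_DESC = "Jailbreak LLMs: Parseltongue, GODMODE, ULTRAPLINIAN"

# Before this PR: bundled, so every session's prompt carries the description.
before = render_available_skills([Skill("red-teaming/godmode", GODMODE_DESC, bundled=True)])

# After this PR: optional and not installed, so the prompt never mentions it.
after = render_available_skills([Skill("security/godmode", GODMODE_DESC, bundled=False)])

# After an explicit install: listed again, but only for that user.
opted_in = render_available_skills(
    [Skill("security/godmode", GODMODE_DESC, bundled=False, installed=True)]
)
```

The point of the sketch is that the trigger is the listing, not the loading: nothing about which skill a session actually uses appears in the filter.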

Measured impact

Controlled, interleaved A/B (calls alternated so server-side classifier drift hits both arms equally). Same live context, same task, prompts differing only by the ~204 chars of these two catalog lines, N=20 each:

| Catalog lines | Blocked |
|---|---|
| present | 19/20 (95%) |
| absent | 5/20 (25%) |

Removing them from the bundled catalog roughly quartered the block rate. Rewording the descriptions to neutral phrasing did not help — the skills have to leave the always-injected catalog. (Confirmed separately that the loaded hermes-agent-dev skill itself is inert: full-skill vs no-skill measured 7/20 == 7/20.)
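As a sanity check on the arithmetic, the sketch below recomputes the block rates and adds a one-sided Fisher exact test using only the standard library. The test is not part of the original measurement; it is an assumption-laden add-on to show that a 19/20 vs 5/20 split is unlikely to be noise even at N=20. The function name `fisher_exact_one_sided` is illustrative.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    top-left cell with all row and column totals held fixed.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    total = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return total / comb(n, col1)


# Rows: catalog lines present / absent. Columns: blocked / not blocked.
present_blocked, present_ok = 19, 1
absent_blocked, absent_ok = 5, 15

rate_present = present_blocked / (present_blocked + present_ok)  # 0.95
rate_absent = absent_blocked / (absent_blocked + absent_ok)      # 0.25
ratio = rate_present / rate_absent                # about 3.8, i.e. "roughly quartered"
absolute_drop = rate_present - rate_absent        # 70 percentage points

p_catalog = fisher_exact_one_sided(present_blocked, present_ok, absent_blocked, absent_ok)

# Control from the parenthetical above: full-skill vs no-skill, 7/20 each.
p_control = fisher_exact_one_sided(7, 13, 7, 13)
```

With these counts `p_catalog` comes out far below conventional thresholds while `p_control` is above 0.5, which matches the reading that the catalog lines matter and the loaded skill does not. The residual 25% block rate with the lines absent is not explained by this comparison.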

Changes

Relocate skills/red-teaming/godmode/ → optional-skills/security/godmode/
Relocate skills/mlops/inference/obliteratus/ → optional-skills/mlops/obliteratus/
Regenerate bundled + optional catalog pages, sidebars, and EN/zh-Hans entries
Drop the godmode hand-written-page exception in generate-skill-docs.py (now an auto-generated optional page)

Both skills stay fully available via hermes skills install official/security/godmode and official/mlops/obliteratus.

Validation

generate-skill-docs.py regenerated cleanly (170 skills; 2 moved bundled → optional)
Git tracks all skill files as renames (R100), preserving history
No remaining references in the bundled catalog; both now appear in the optional catalog
Two unrelated code comments about the legacy prefill_messages config format (cli.py, cron/scheduler.py) mention "godmode-generated configs" — that's a config-format reference, not a skill dependency, left intact.
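The R100 claim can be checked mechanically. The sketch below parses the text that `git diff --name-status -M` prints (tab-separated; renames appear as `R<score>`, old path, new path) and lists any skill file that is not a pure rename. The helper names and the sample text are illustrative; the sample reuses two paths that appear in this PR plus a bare `generate-skill-docs.py` entry whose real path in the repo is not shown here.

```python
def parse_name_status(output):
    """Parse `git diff --name-status -M` text into (status, old, new) tuples."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0]
        if status[0] in ("R", "C"):
            # Renames and copies carry two paths: source and destination.
            entries.append((status, fields[1], fields[2]))
        else:
            entries.append((status, fields[1], fields[1]))
    return entries


def impure_skill_moves(entries, roots=("skills/", "optional-skills/")):
    """Return skill-file entries that are anything other than a 100% rename."""
    return [
        e for e in entries
        if (e[1].startswith(roots) or e[2].startswith(roots)) and e[0] != "R100"
    ]


# Illustrative sample, not the actual diff of this PR.
SAMPLE = (
    "R100\tskills/red-teaming/godmode/scripts/parseltongue.py"
    "\toptional-skills/security/godmode/scripts/parseltongue.py\n"
    "R100\tskills/red-teaming/godmode/scripts/godmode_race.py"
    "\toptional-skills/security/godmode/scripts/godmode_race.py\n"
    "M\tgenerate-skill-docs.py\n"
)

entries = parse_name_status(SAMPLE)
problems = impure_skill_moves(entries)
```

An empty `problems` list means every skill file moved byte-for-byte, which is what keeps `git log --follow` history intact across the relocation.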

Infographic

…dled catalog

Anthropic's output classifier on claude-fable-5 (and likely other Claude models served through it) intermittently returns empty content for sessions whose system prompt advertises these skills. The bundled skills-catalog block is injected into every session's system prompt, so the descriptions

- red-teaming/godmode 'Jailbreak LLMs: Parseltongue, GODMODE, ULTRAPLINIAN'
- mlops/inference/obliteratus 'OBLITERATUS: abliterate LLM refusals (diff-in-means)'

trip the classifier on EVERY session regardless of which skill is actually loaded, killing unrelated legitimate work (PR review, codebase audits, etc.).

Measured impact (controlled, interleaved A/B, claude-fable-5 via OpenRouter, prompts differing only by the ~204 chars of these catalog lines, N=20 each):

catalog lines present -> 19/20 (95%) blocked
catalog lines absent -> 5/20 (25%) blocked

Removing them ~quartered the block rate. Rewording the descriptions was not enough; the skills must leave the bundled catalog.

- Delete skills/red-teaming/godmode and skills/mlops/inference/obliteratus
- Drop their generated doc pages + catalog/sidebar entries (EN + zh-Hans)
- Drop the godmode hand-written-page exception in generate-skill-docs.py

github-actions · 2026-06-10T01:52:39Z

🔎 Lint report: `chore/remove-redteam-skills-classifier` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 10626 on HEAD, 10626 on base (➖ 0)

🆕 New issues (9):

| Rule | Count |
|---|---|
| `unresolved-reference` | 2 |
| `call-non-callable` | 2 |
| `unresolved-import` | 2 |
| `unresolved-attribute` | 1 |
| `invalid-parameter-default` | 1 |
| `invalid-argument-type` | 1 |

First entries

optional-skills/security/godmode/scripts/auto_jailbreak.py:527: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `list[str]` in union `list[str] | dict[str, str] | dict[Unknown, Unknown]`
optional-skills/security/godmode/scripts/auto_jailbreak.py:383: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `str`
optional-skills/security/godmode/scripts/parseltongue.py:475: [invalid-argument-type] invalid-argument-type: Argument to function `escape` is incorrect: Argument type `Sized` does not satisfy constraints (`str`, `bytes`) of type variable `AnyStr`
optional-skills/security/godmode/scripts/auto_jailbreak.py:620: [unresolved-reference] unresolved-reference: Name `score_response` used when not defined
optional-skills/security/godmode/scripts/parseltongue.py:520: [call-non-callable] call-non-callable: Object of type `str` is not callable
optional-skills/security/godmode/scripts/auto_jailbreak.py:25: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
optional-skills/security/godmode/scripts/parseltongue.py:476: [call-non-callable] call-non-callable: Object of type `int` is not callable
optional-skills/security/godmode/scripts/auto_jailbreak.py:540: [unresolved-reference] unresolved-reference: Name `escalate_encoding` used when not defined
optional-skills/security/godmode/scripts/godmode_race.py:27: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`

✅ Fixed issues (9):

| Rule | Count |
|---|---|
| `call-non-callable` | 2 |
| `unresolved-import` | 2 |
| `unresolved-reference` | 2 |
| `invalid-parameter-default` | 1 |
| `invalid-argument-type` | 1 |
| `unresolved-attribute` | 1 |

First entries

skills/red-teaming/godmode/scripts/parseltongue.py:476: [call-non-callable] call-non-callable: Object of type `int` is not callable
skills/red-teaming/godmode/scripts/godmode_race.py:27: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:383: [invalid-parameter-default] invalid-parameter-default: Default value of type `None` is not assignable to annotated parameter type `str`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:620: [unresolved-reference] unresolved-reference: Name `score_response` used when not defined
skills/red-teaming/godmode/scripts/parseltongue.py:475: [invalid-argument-type] invalid-argument-type: Argument to function `escape` is incorrect: Argument type `Sized` does not satisfy constraints (`str`, `bytes`) of type variable `AnyStr`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:25: [unresolved-import] unresolved-import: Cannot resolve imported module `openai`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:527: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `list[str]` in union `list[str] | dict[str, str] | dict[Unknown, Unknown]`
skills/red-teaming/godmode/scripts/auto_jailbreak.py:540: [unresolved-reference] unresolved-reference: Name `escalate_encoding` used when not defined
skills/red-teaming/godmode/scripts/parseltongue.py:520: [call-non-callable] call-non-callable: Object of type `str` is not callable

Unchanged: 5555 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.
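The report does not say so explicitly, but the 9 "new" and 9 "fixed" ty diagnostics listed above are the same findings under the old and new godmode paths, so the move adds no type errors. A minimal sketch of that comparison follows; the entries are copied from the two "First entries" lists (messages trimmed to the rule tag), while the `key` helper and prefix constants are illustrative.

```python
import re

OLD_PREFIX = "skills/red-teaming/godmode/scripts/"
NEW_PREFIX = "optional-skills/security/godmode/scripts/"

NEW_ISSUES = [NEW_PREFIX + s for s in (
    "auto_jailbreak.py:527: [unresolved-attribute]",
    "auto_jailbreak.py:383: [invalid-parameter-default]",
    "parseltongue.py:475: [invalid-argument-type]",
    "auto_jailbreak.py:620: [unresolved-reference]",
    "parseltongue.py:520: [call-non-callable]",
    "auto_jailbreak.py:25: [unresolved-import]",
    "parseltongue.py:476: [call-non-callable]",
    "auto_jailbreak.py:540: [unresolved-reference]",
    "godmode_race.py:27: [unresolved-import]",
)]
FIXED_ISSUES = [OLD_PREFIX + s for s in (
    "parseltongue.py:476: [call-non-callable]",
    "godmode_race.py:27: [unresolved-import]",
    "auto_jailbreak.py:383: [invalid-parameter-default]",
    "auto_jailbreak.py:620: [unresolved-reference]",
    "parseltongue.py:475: [invalid-argument-type]",
    "auto_jailbreak.py:25: [unresolved-import]",
    "auto_jailbreak.py:527: [unresolved-attribute]",
    "auto_jailbreak.py:540: [unresolved-reference]",
    "parseltongue.py:520: [call-non-callable]",
)]

ENTRY = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+): \[(?P<rule>[^\]]+)\]")


def key(entry, old_prefix=None, new_prefix=None):
    """Reduce 'path:line: [rule] ...' to (path, line, rule), optionally re-rooting the path."""
    m = ENTRY.match(entry)
    if m is None:
        raise ValueError(f"unrecognised diagnostic line: {entry!r}")
    path = m["path"]
    if old_prefix is not None and path.startswith(old_prefix):
        path = new_prefix + path[len(old_prefix):]
    return path, int(m["line"]), m["rule"]


new_keys = {key(e) for e in NEW_ISSUES}
moved_keys = {key(e, OLD_PREFIX, NEW_PREFIX) for e in FIXED_ISSUES}

# Diagnostics that are new even after accounting for the relocation.
net_new = new_keys - moved_keys
```

`net_new` is empty for these lists, which is consistent with the totals line (10626 on HEAD, 10626 on base).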

tonydwb

Code Review Summary

Verdict: Approved

Looks Good

Clean removal of red-team skills (godmode, obliteratus) that were tripping Anthropic policy.
The 5506-line deletion across 25 files represents complete removal of the skill directories.
The small +2 lines likely represent any remaining SKILL.md or metadata cleanup.
No sensitive content left behind.
This is a policy compliance cleanup.

Reviewed by Hermes Agent

Rather than deleting outright, move both into optional-skills/ so they remain installable via `hermes skills install` while leaving the always-injected bundled catalog (which is what tripped Anthropic's classifier).

- optional-skills/security/godmode (was skills/red-teaming/godmode)
- optional-skills/mlops/obliteratus (was skills/mlops/inference/obliteratus)
- regenerate optional-skills catalog + sidebar entries


tonydwb approved these changes Jun 10, 2026


alt-glitch added labels type/bug (Something isn't working), tool/skills (Skills system: list, view, manage), and P2 (Medium: degraded but workaround exists) Jun 10, 2026

teknium1 changed the title ~~chore(skills): remove red-team skills (godmode, obliteratus) tripping Anthropic classifier~~ chore(skills): move red-team skills (godmode, obliteratus) to optional-skills — Anthropic classifier Jun 10, 2026

teknium1 merged commit fdc9034 into main Jun 10, 2026
24 checks passed

teknium1 deleted the chore/remove-redteam-skills-classifier branch June 10, 2026 04:41

teknium1 mentioned this pull request Jun 10, 2026

gate red-teaming/jailbreaking skills behind opt-in env flag #3714

Closed

f3rs3n mentioned this pull request Jun 11, 2026

fix(skills): report retired bundled-skill leftovers before manifest cleanup #38561

Open

teknium1 merged 2 commits into main from chore/remove-redteam-skills-classifier
