Summary
The skills.guard_agent_created config option controls whether the Skills Guard scanner (tools/skills_guard.py) scans agent-created skills for security concerns (prompt injection patterns, data exfiltration, destructive commands, etc.).
Currently this defaults to false (hermes_cli/config.py:837), meaning skills autonomously created by the agent's learning loop are saved to ~/.hermes/skills/ without any security scan.
Current Behavior
```python
# hermes_cli/config.py:837
"guard_agent_created": False,
```
When the agent creates a new skill after completing a complex task:
- Skill content is written to ~/.hermes/skills/<name>/SKILL.md
- skills_guard.scan_skill() is not invoked
- The skill is immediately loadable via the /<skill-name> slash command
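To make the last point concrete, here is a minimal sketch of a directory-based loader. It assumes (the source does not confirm this) that every ~/.hermes/skills/<name>/SKILL.md is exposed as /<name>. The function name discover_skills is hypothetical. The sketch shows why an unscanned file becomes a usable command as soon as it lands on disk: nothing on the load path inspects where the file came from.

```python
from pathlib import Path


def discover_skills(skills_dir: Path) -> dict[str, Path]:
    """Map each '/<skill-name>' command to its SKILL.md file.

    Hypothetical loader: any <skills_dir>/<name>/SKILL.md becomes a command,
    with no check of how or by whom the file was written.
    """
    commands = {}
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        # The directory name doubles as the slash-command name.
        commands["/" + skill_md.parent.name] = skill_md
    return commands
```

Against a real install this would be called as discover_skills(Path.home() / ".hermes" / "skills").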
Proposed Change
Change the default to true:
```python
"guard_agent_created": True,
```
When enabled, agent-created skills pass through skills_guard.py before being saved. The guard checks for patterns like:
- Prompt injection (role hijack, system prompt override)
- Data exfiltration (curl with env vars, base64 encoding pipelines)
- Destructive commands (rm, mkfs, shutdown patterns)
- Obfuscation (base64+exec, hex encoding)
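A guard of this kind can be approximated with line-by-line regular expressions. The sketch below is hypothetical: RULES, Finding and scan_text are illustrative names, and the patterns are far cruder than whatever tools/skills_guard.py actually ships. It only shows the general shape of a pattern-based scan that tags each match with a category and a severity.

```python
import re
from dataclasses import dataclass

# Hypothetical rule table: (category, severity, compiled regex).
# Illustrative only; these are NOT the rules in tools/skills_guard.py.
RULES = [
    ("prompt_injection", "high",
     re.compile(r"ignore (all )?(previous|prior) instructions"
                r"|disregard (the |your )?system prompt", re.I)),
    ("exfiltration", "high",
     re.compile(r"curl[^\n]*\$\{?[A-Z_][A-Z0-9_]*|base64[^\n]*\|\s*curl", re.I)),
    ("destructive", "high",
     re.compile(r"\brm\s+-rf\b|\bmkfs(\.\w+)?\b|\bshutdown\s+(-[hr]|now)\b", re.I)),
    ("obfuscation", "medium",
     re.compile(r"base64\s+(-d|--decode)[^\n]*\|\s*(sh|bash)"
                r"|exec\(.*b64decode|\\x[0-9a-f]{2}", re.I)),
]


@dataclass(frozen=True)
class Finding:
    category: str
    severity: str
    line_no: int
    line: str


def scan_text(content: str) -> list[Finding]:
    """Return one Finding for every (line, rule) pair that matches."""
    findings = []
    for line_no, line in enumerate(content.splitlines(), start=1):
        for category, severity, pattern in RULES:
            if pattern.search(line):
                findings.append(Finding(category, severity, line_no, line.strip()))
    return findings
```

For example, a SKILL.md whose second line is `rm -rf /tmp/x` yields a single high-severity "destructive" finding on line 2.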
Skills that trigger high-severity findings would require user review before being loadable.
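The gating step might look like the following sketch. It assumes the guard reports one severity per finding and that a held skill is parked in a directory the loader ignores. save_agent_skill, the .pending_review directory and the scan callback are all hypothetical. In the real code, this role would be filled by a call to skills_guard.scan_skill(), whose signature is not shown here.

```python
from pathlib import Path
from typing import Callable, Iterable


def save_agent_skill(
    name: str,
    content: str,
    guard_agent_created: bool,
    skills_dir: Path,
    scan: Callable[[str], Iterable[str]],
) -> Path:
    """Write SKILL.md, holding it for review if the guard reports a high finding.

    `scan` returns the severity of each finding; it stands in for the real scanner.
    """
    if guard_agent_created and "high" in set(scan(content)):
        # Hypothetical pending-review area that the skill loader does not read.
        target_dir = skills_dir / ".pending_review" / name
    else:
        # Flag off (today's default) or no high-severity finding: save as loadable.
        target_dir = skills_dir / name
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / "SKILL.md"
    path.write_text(content, encoding="utf-8")
    return path
```

With guard_agent_created set to False, the scan callback is never called. This matches the current behavior described above.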
Rationale
- Defense in depth: The learning loop is a powerful feature, but agent-generated content should be treated as untrusted — the agent may have been influenced by prompt injection during the session that created the skill.
- Low impact: The scanner is already implemented and tested. This change only flips the default from opt-in to opt-out.
- User override: Users who find the scanner too conservative can set skills.guard_agent_created: false in config.yaml.
References
- tools/skills_guard.py:48-50: agent-created source configuration
- hermes_cli/config.py:837: default value
- Commit ce089169: original implementation (explicitly chose default off for backward compatibility)
Impact
- No breaking changes — existing skills are not re-scanned
- Users who already set guard_agent_created: true see no change
- New installs get the safer default
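The three cases above can be checked with a small merge sketch. It assumes that user settings from config.yaml are deep-merged over in-code defaults; the source does not describe how hermes_cli loads its config. DEFAULT_CONFIG and merge_config are hypothetical names. The sketch does not model whether an existing install's config.yaml already spells out the key.

```python
import copy

# Hypothetical defaults reflecting the proposed change; the real defaults
# in hermes_cli/config.py contain many more keys.
DEFAULT_CONFIG = {"skills": {"guard_agent_created": True}}


def merge_config(defaults: dict, user: dict) -> dict:
    """Recursively overlay user settings (e.g. parsed config.yaml) onto defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

An empty user config inherits True. An explicit true stays True, and an explicit false (the documented opt-out) stays False.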