
feat(security): Default skills.guard_agent_created to true #16461

@fr33d3m0n

Summary

The skills.guard_agent_created config option controls whether the Skills Guard scanner (tools/skills_guard.py) scans agent-created skills for security concerns (prompt injection patterns, data exfiltration, destructive commands, etc.).

Currently this defaults to false (hermes_cli/config.py:837), meaning skills autonomously created by the agent's learning loop are saved to ~/.hermes/skills/ without any security scan.

Current Behavior

# hermes_cli/config.py:837
"guard_agent_created": False,

When the agent creates a new skill after completing a complex task:

  1. Skill content is written to ~/.hermes/skills/<name>/SKILL.md
  2. skills_guard.scan_skill() is not invoked
  3. The skill is immediately loadable via the /<skill-name> slash command

Proposed Change

Change the default to true:

"guard_agent_created": True,

When enabled, agent-created skills pass through skills_guard.py before being saved. The guard checks for patterns like:

  • Prompt injection (role hijack, system prompt override)
  • Data exfiltration (curl with env vars, base64 encoding pipelines)
  • Destructive commands (rm, mkfs, shutdown patterns)
  • Obfuscation (base64+exec, hex encoding)

Skills that trigger high-severity findings would require user review before they can be loaded.
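To make the kind of check described above concrete, here is a minimal sketch of a line-based pattern scan with a high-severity review gate. It is not the code in tools/skills_guard.py: the regexes, the severity labels, and the names PATTERNS, Finding, scan_text, and needs_review are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical, illustrative rules only; the real rules live in tools/skills_guard.py.
# Each entry is (category, severity, compiled pattern).
PATTERNS = [
    ("prompt_injection", "high", re.compile(r"ignore (all )?previous instructions", re.I)),
    ("exfiltration", "high", re.compile(r"curl\b.*\$[A-Za-z_{]")),
    ("destructive", "high", re.compile(r"\brm\s+-rf\b|\bmkfs\b|\bshutdown\b")),
    ("obfuscation", "medium", re.compile(r"base64\s+(-d|--decode).*\|\s*(sh|bash)\b")),
]


@dataclass(frozen=True)
class Finding:
    category: str
    severity: str
    line_no: int


def scan_text(content: str) -> list[Finding]:
    """Return one finding per (line, rule) match in the skill text."""
    findings = []
    for line_no, line in enumerate(content.splitlines(), start=1):
        for category, severity, pattern in PATTERNS:
            if pattern.search(line):
                findings.append(Finding(category, severity, line_no))
    return findings


def needs_review(content: str, guard_enabled: bool) -> bool:
    """True if the guard is on and at least one high-severity rule matched."""
    if not guard_enabled:
        return False
    return any(f.severity == "high" for f in scan_text(content))
```

With the guard disabled, needs_review always returns False, which mirrors the current default where agent-created skills are saved without a scan.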

Rationale

  • Defense in depth: The learning loop is a powerful feature, but agent-generated content should be treated as untrusted — the agent may have been influenced by prompt injection during the session that created the skill.
  • Low impact: The scanner is already implemented and tested. This change only flips the default from opt-in to opt-out.
  • User override: Users who find the scanner too conservative can set skills.guard_agent_created: false in config.yaml.
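The opt-out behaviour can be sketched as a defaults-plus-overrides merge. The issue does not show how Hermes merges config.yaml over its built-in defaults, so the recursive merge below and the names DEFAULTS and merge_config are assumptions made for illustration, not the project's actual loader.

```python
import copy

# Proposed default from this issue; the surrounding structure is assumed.
DEFAULTS = {"skills": {"guard_agent_created": True}}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict where user overrides win over defaults, key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so a partial "skills" section keeps the other defaults.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# No user setting: the new default applies.
no_override = merge_config(DEFAULTS, {})

# Explicit opt-out, as a user would express it in config.yaml.
opted_out = merge_config(DEFAULTS, {"skills": {"guard_agent_created": False}})
```

Under this assumed merge, a user who never set the key picks up the new default, while an explicit false in config.yaml continues to disable the scan.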

References

  • tools/skills_guard.py:48-50 — agent-created source configuration
  • hermes_cli/config.py:837 — default value
  • Commit ce089169 — original implementation (explicitly chose default off for backward compat)

Impact

  • No breaking changes — existing skills are not re-scanned
  • Users who already set guard_agent_created: true see no change
  • New installs get the safer default

Labels

  • P3: Low (cosmetic, nice to have)
  • area/config: Config system, migrations, profiles
  • tool/skills: Skills system (list, view, manage)
  • type/security: Security vulnerability or hardening
