Skip to content

fix(security): add approval gate for skill mutations in non-interacti…#43776

Draft
LifeJiggy wants to merge 3 commits into
NousResearch:mainfrom
LifeJiggy:fix/skills-approval-gate
Draft

fix(security): add approval gate for skill mutations in non-interacti…#43776
LifeJiggy wants to merge 3 commits into
NousResearch:mainfrom
LifeJiggy:fix/skills-approval-gate

Conversation

@LifeJiggy

@LifeJiggy LifeJiggy commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

fix/skills-approval-gate

What does this PR do?
Adds an approval gate for skill mutations (create, edit, delete, patch, write_file, remove_file) in non-interactive mode. Agent-created skills are persistent prompt-injection vectors loaded every session — without this gate, the agent can autonomously mutate skills in cron, delegation, or background contexts without user confirmation.

The gate uses _YOLO_MODE_FROZEN (frozen at import time to prevent runtime env mutation bypass) and is_current_session_yolo_enabled() for the two sanctioned bypass paths. In interactive mode, it routes through prompt_dangerous_approval for user confirmation. Blocked errors include actionable guidance (HERMES_YOLO_MODE=1).

Related Issue
Fixes #17251 (agent can autonomously create/modify skills in non-interactive sessions)

Type of Change

  • 🔒 Security fix

Changes Made

  • tools/skill_manager_tool.py: Added _require_approval_for_skill_action() with deferred _get_approval_callback import, _MUTATING_SKILL_ACTIONS frozenset, and gate call at top of skill_manage()
  • tests/tools/test_skill_manager_tool.py: 9 new tests covering blocked/approved/denied flows, schema sync, YOLO hint,
    live env bypass prevention, and skill_manage integration

How to Test

  1. python -m pytest tests/tools/test_skill_manager_tool.py -q — 99/99 pass
  2. Verify test_gate_covers_all_schema_actions ensures frozenset stays in sync with schema enum
  3. Verify test_live_yolo_env_does_not_bypass_gate confirms runtime env mutation is blocked

Checklist

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits
  • I searched for existing PRs
  • My PR contains only changes related to this fix
  • I've run pytest and all tests pass
  • I've added tests for my changes
  • I've tested on Windows 11

…ve mode

Agent-created skills are persistent prompt-injection vectors loaded every
session. Previously the agent could autonomously create/edit/delete skills
in non-interactive mode (cron, delegation, headless) with only a regex guard.

Added _require_approval_for_skill_action() that blocks mutating actions
(create/edit/delete/patch/write_file/remove_file) unless HERMES_YOLO_MODE
or HERMES_INTERACTIVE is set. Read-only actions (list/show) are unaffected.

Enhancements:
- Centralized _MUTATING_SKILL_ACTIONS frozenset for easy extension
- Warning-level audit log for every blocked attempt
- Actionable error message telling the agent to ask the user

Tests:
- Blocked without YOLO/INTERACTIVE env
- Allowed with HERMES_YOLO_MODE=1
- Allowed with HERMES_INTERACTIVE=1
- Read-only actions bypass gate
- All mutating actions gated
- skill_manage returns JSON approval error
@liuhao1024

Copy link
Copy Markdown
Contributor

✅ Verified — Approval gate for skill mutations in non-interactive mode

Reviewed the diff for the new _require_approval_for_skill_action gate and its integration into skill_manage().

  • Gate scope: The gate applies only to actions in _MUTATING_SKILL_ACTIONS (create, edit, delete, patch, write_file, remove_file). Read-only actions (list, show) bypass it — confirmed in test_readonly_actions_always_allowed.
  • Existing test safety: The _skill_dir fixture now patches HERMES_YOLO_MODE=1, so all pre-existing tests continue to exercise the full mutation path. This is the correct approach — it avoids touching every existing test.
  • New test coverage: TestApprovalGate covers blocked (no env), allowed (YOLO), allowed (INTERACTIVE), all mutating actions, and end-to-end skill_manage() returning a JSON error. 6 tests, good.
  • Placement: The gate runs early in skill_manage() before any file I/O, so a blocked action never touches the filesystem.

The gate is defense-in-depth against autonomous skill creation in cron/delegation contexts. No issues found.

@alt-glitch alt-glitch added type/security Security vulnerability or hardening tool/skills Skills system (list, view, manage) P2 Medium — degraded but workaround exists labels Jun 10, 2026
…sync test

- Defer tools.terminal_tool import into HERMES_INTERACTIVE branch to
  avoid import side effects (Path.home() crash in test env)
- Use _YOLO_MODE_FROZEN + is_current_session_yolo_enabled() for proper
  YOLO bypass (prevents runtime env mutation bypass)
- Use prompt_dangerous_approval for interactive approval flow
- Error message now includes HERMES_YOLO_MODE=1 for actionable guidance
- Add test_gate_covers_all_schema_actions: verify frozenset matches schema
- Add test_error_message_includes_yolo_hint: actionable error text
- Remove weak test_readonly_actions_always_allowed (list/show not in schema)
- Fix _skill_dir fixture: patch _YOLO_MODE_FROZEN instead of env var
- 99/99 tests pass
…ging system

The upstream NousResearch/hermes-agent added a write-approval gate
(_apply_skill_write_gate + write_approval.py) that stages skill
writes for user review when skills.write_approval is enabled in
config.yaml. This mechanism was completely untested.

Add TestWriteApprovalGate (6 tests):
- Gate off: writes flow directly (default behavior)
- Gate on: writes staged to pending/skills/ for review
- _skill_gate_bypass ContextVar: skips gate during replay
- apply_skill_pending replays staged creates through skill_manage
- All 6 mutating actions bypass gate when approval is off
- Schema enum lists exactly the 6 mutating actions

Uses upstream's skill_manager_tool.py and write_approval.py as-is.
Tests patch tools.write_approval.evaluate_gate/stage_write to
simulate gate decisions without filesystem side effects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P2 Medium — degraded but workaround exists tool/skills Skills system (list, view, manage) type/security Security vulnerability or hardening

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Context Compaction Demotes Memory to "Background Reference", Causing Memory Loss After Restart

3 participants