fix(security): add approval gate for skill mutations in non-interacti… #43776
Draft
LifeJiggy wants to merge 3 commits into
…ve mode

Agent-created skills are persistent prompt-injection vectors loaded every session. Previously the agent could autonomously create, edit, or delete skills in non-interactive mode (cron, delegation, headless) with only a regex guard.

Added _require_approval_for_skill_action(), which blocks mutating actions (create/edit/delete/patch/write_file/remove_file) unless HERMES_YOLO_MODE or HERMES_INTERACTIVE is set. Read-only actions (list/show) are unaffected.

Enhancements:
- Centralized _MUTATING_SKILL_ACTIONS frozenset for easy extension
- Warning-level audit log for every blocked attempt
- Actionable error message telling the agent to ask the user

Tests:
- Blocked without YOLO/INTERACTIVE env
- Allowed with HERMES_YOLO_MODE=1
- Allowed with HERMES_INTERACTIVE=1
- Read-only actions bypass gate
- All mutating actions gated
- skill_manage returns JSON approval error
✅ Verified: Approval gate for skill mutations in non-interactive mode
Reviewed the diff for the new
The gate is defense-in-depth against autonomous skill creation in cron/delegation contexts. No issues found.
…sync test
- Defer the tools.terminal_tool import into the HERMES_INTERACTIVE branch to avoid import side effects (Path.home() crash in the test env)
- Use _YOLO_MODE_FROZEN + is_current_session_yolo_enabled() for proper YOLO bypass (prevents runtime env mutation bypass)
- Use prompt_dangerous_approval for the interactive approval flow
- Error message now includes HERMES_YOLO_MODE=1 for actionable guidance
- Add test_gate_covers_all_schema_actions: verify frozenset matches schema
- Add test_error_message_includes_yolo_hint: actionable error text
- Remove weak test_readonly_actions_always_allowed (list/show not in schema)
- Fix _skill_dir fixture: patch _YOLO_MODE_FROZEN instead of env var
- 99/99 tests pass
…ging system

The upstream NousResearch/hermes-agent added a write-approval gate (_apply_skill_write_gate + write_approval.py) that stages skill writes for user review when skills.write_approval is enabled in config.yaml. This mechanism was completely untested.

Add TestWriteApprovalGate (6 tests):
- Gate off: writes flow directly (default behavior)
- Gate on: writes staged to pending/skills/ for review
- _skill_gate_bypass ContextVar: skips gate during replay
- apply_skill_pending replays staged creates through skill_manage
- All 6 mutating actions bypass gate when approval is off
- Schema enum lists exactly the 6 mutating actions

Uses upstream's skill_manager_tool.py and write_approval.py as-is. Tests patch tools.write_approval.evaluate_gate/stage_write to simulate gate decisions without filesystem side effects.
fix/skills-approval-gate
What does this PR do?
Adds an approval gate for skill mutations (create, edit, delete, patch, write_file, remove_file) in non-interactive mode. Agent-created skills are loaded into every session, which makes them persistent prompt-injection vectors. Without this gate, the agent can autonomously mutate skills in cron, delegation, or background contexts without user confirmation.
The gate uses _YOLO_MODE_FROZEN (frozen at import time, so mutating the environment at runtime cannot bypass it) and is_current_session_yolo_enabled() as the two sanctioned bypass paths. In interactive mode, it routes through prompt_dangerous_approval for user confirmation. Error messages for blocked actions include actionable guidance (HERMES_YOLO_MODE=1).
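The decision order described above (frozen YOLO flag, session YOLO, interactive prompt, otherwise block) can be sketched as one function. This is a hedged illustration, not the PR's implementation: check_skill_action is a hypothetical name, the boolean parameters stand in for _YOLO_MODE_FROZEN, is_current_session_yolo_enabled() and HERMES_INTERACTIVE, and the prompt callable stands in for prompt_dangerous_approval, whose real signature may differ. Injecting prompt also mirrors the PR's choice to import the terminal prompt only in the interactive branch.

```python
import json
import logging
from typing import Callable, Optional

logger = logging.getLogger("skills.approval")

_MUTATING_SKILL_ACTIONS = frozenset(
    {"create", "edit", "delete", "patch", "write_file", "remove_file"}
)


def check_skill_action(
    action: str,
    *,
    yolo_frozen: bool,
    session_yolo: bool,
    interactive: bool,
    prompt: Optional[Callable[[str], bool]] = None,
) -> Optional[str]:
    """Return None to allow the action, or a JSON error string to block it."""
    if action not in _MUTATING_SKILL_ACTIONS:
        return None  # names outside the gated set pass through
    # The two sanctioned bypass paths.
    if yolo_frozen or session_yolo:
        return None
    # Interactive sessions ask the user for confirmation.
    if interactive and prompt is not None:
        if prompt(f"Allow skill action '{action}'?"):
            return None
        message = f"User declined skill action '{action}'."
    else:
        message = (
            f"Skill action '{action}' requires user approval in non-interactive mode. "
            "Ask the user, or restart with HERMES_YOLO_MODE=1 to allow it."
        )
    # Block, log for auditing, and give the agent an actionable error.
    logger.warning("Blocked skill action %r", action)
    return json.dumps({"success": False, "error": message})
```

Usage: in a cron job with no YOLO flags, check_skill_action("patch", yolo_frozen=False, session_yolo=False, interactive=False) returns a JSON error whose text mentions HERMES_YOLO_MODE=1, while the same call with session_yolo=True returns None.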
Related Issue
Fixes #17251 (agent can autonomously create/modify skills in non-interactive sessions)
Type of Change
Changes Made
live env bypass prevention, and skill_manage integration
How to Test
Checklist