feat(approval): block agents from killing their own gateway/host process #128
Merged
…host process

Reproduced 2026-06-09: a session cleared stale bytecode by killing its own desktop dashboard PID. The host process died mid-turn, the session was orphaned (blank indicator, stop failed, session-not-found on the next prompt), and in-progress work was lost. Companion fixes already on main (graceful SIGTERM finalize in tui_gateway/entry.py, resume fallback to state.db, desktop ws rebind NousResearch#43004) make sessions recoverable after a gateway death; this guard removes the foot-gun that caused it.

- tools/approval.py: _check_self_host_kill extracts kill numeric targets and $$/$PPID self-tokens, compares against os.getpid()/os.getppid(), and blocks unconditionally (hardline-style: yolo / approvals.mode=off / cron approve cannot bypass; containers still skip, since their PID namespace is not the host's). Block message points at supervisor-driven restarts.
- Negative-pgid kill forms stay out of scope (kill -1 is already hardline); pkill name-matching is a separate concern.
- tests: 10 new cases in tests/tools/test_hardline_blocklist.py (own pid, parent pid, $$/$PPID, chained, foreign-pid allow, pgid/pkill allow, end-to-end hardline shape, yolo no-bypass, container bypass).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
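To illustrate the extraction step the commit message describes, here is a minimal sketch, not the code in tools/approval.py (which this page does not show). The names `find_self_host_kill`, `_KILL_AT_COMMAND_POSITION` and `_SELF_TOKENS` are hypothetical, and the regex is a simplified guess at "kill at command position": it ignores wrappers such as `sudo` and treats any token after `-s`/`-n` as a signal argument rather than a PID.

```python
import os
import re
from typing import Optional

# Hypothetical pattern: `kill` at the start of the command or right after a
# shell separator (; & | ( or newline). `pkill` and `killall` do not match.
_KILL_AT_COMMAND_POSITION = re.compile(r"(?:^|[;&|(\n])\s*kill\b([^;&|\n)]*)")

# Shell tokens that expand to the shell's own PID or its parent's PID.
_SELF_TOKENS = {"$$", "${$}", "$PPID", "${PPID}"}


def find_self_host_kill(
    command: str,
    self_pid: Optional[int] = None,
    parent_pid: Optional[int] = None,
) -> Optional[str]:
    """Return the offending kill target, or None if the command looks safe."""
    self_pid = os.getpid() if self_pid is None else self_pid
    parent_pid = os.getppid() if parent_pid is None else parent_pid

    for match in _KILL_AT_COMMAND_POSITION.finditer(command):
        skip_next = False
        for raw in match.group(1).split():
            token = raw.strip("'\"")
            if skip_next:  # argument of -s / -n is a signal, not a PID
                skip_next = False
                continue
            if token in ("-s", "-n"):
                skip_next = True
                continue
            if token in _SELF_TOKENS:
                return token
            # Options and negative-pgid forms start with "-" and are skipped
            # here, mirroring the scope stated in the commit message.
            if re.fullmatch(r"[0-9]+", token) and int(token) in (self_pid, parent_pid):
                return token
    return None
```

Passing `self_pid` and `parent_pid` explicitly keeps the sketch testable; left unset, they are read from `os.getpid()` and `os.getppid()` on each call.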
OmarB97 pushed a commit that referenced this pull request on Jun 10, 2026
… fork consolidation; finish fork-feature ports

Per-cluster restoration with the test suite as the oracle, after comparing the merged tree's failures against a pristine-upstream run in the same environment (14 file-level deltas, now zero):

- gateway/run.py: upstream wholesale (fork's monolith had undone the mixin decomposition; both real fork deltas re-applied: voice_ack_callback **kwargs; the custom-providers context-length fix exists upstream).
- agent/conversation_loop.py + turn_context.py: upstream structure with the fork features regrafted at their new homes: sender_device attribution (#131), preflight token-usage emission + compression-complete status and live-estimate snapshots (#126).
- agent/chat_completion_helpers.py: upstream wholesale (brings the second partial-stream-stub routing site and the NousResearch#6600 cancellation fix).
- agent/tool_executor.py: usage= kwarg on tool start/complete callbacks now falls back to the bare 3-arg form for legacy receivers.
- tools/approval.py: upstream's resolved-HERMES_HOME rewrite + normalize steps restored alongside the fork's self-host kill guard (#128).
- hermes_cli/main.py: desktop install-identity stale-build cluster and the post-subcommand global-flag hoister ported from fork main.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
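The `usage=` fallback mentioned for agent/tool_executor.py could look roughly like the sketch below. The commit message does not say how legacy receivers are detected, so signature inspection is an assumption here, and `fire_tool_callback` and its three positional arguments (`tool_call_id`, `tool_name`, `payload`) are hypothetical names.

```python
import inspect
from typing import Any, Callable


def _accepts_usage_kwarg(callback: Callable[..., Any]) -> bool:
    """True if the callback can be called with a `usage=` keyword argument."""
    try:
        params = inspect.signature(callback).parameters.values()
    except (TypeError, ValueError):
        # Signature not introspectable: treat as a legacy receiver.
        return False
    for param in params:
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        if param.name == "usage" and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


def fire_tool_callback(
    callback: Callable[..., Any],
    tool_call_id: str,
    tool_name: str,
    payload: Any,
    usage: Any = None,
) -> Any:
    """Pass usage= when the receiver accepts it, else use the bare 3-arg form."""
    if _accepts_usage_kwarg(callback):
        return callback(tool_call_id, tool_name, payload, usage=usage)
    return callback(tool_call_id, tool_name, payload)
```

Inspecting the signature up front, rather than catching `TypeError` after a failed call, avoids masking a `TypeError` raised inside the callback itself.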
Why
Reproduced 2026-06-09 (mesh record `hermes-dashboard-kill-orphans-hosted-session-20260609`): an agent session killed its own desktop dashboard PID to clear stale bytecode. The host process died mid-turn, the session indicator went blank, stop failed, and prompting returned session-not-found. The recovery half already exists on main (graceful SIGTERM finalize in `tui_gateway/entry.py`, `session.resume` fallback to `state.db`, desktop websocket rebind in NousResearch#43004), but nothing stopped an agent from pulling the rug out from under itself in the first place.

What changed
- `tools/approval.py`: new function guard `_check_self_host_kill`, run in `check_all_command_guards` right after the sudo-stdin guard and before any yolo/mode bypass. It extracts numeric `kill` targets plus the `$$`/`$PPID` shell self-tokens and blocks when they match `os.getpid()`/`os.getppid()`, i.e. the process hosting this very session (desktop dashboard with in-process tui_gateway, or the CLI). The block result is hardline-shaped (`hardline: true`) with a message pointing at supervisor restarts (`hermes gateway restart`, desktop Gateway menu).
- Containers (docker/singularity/modal/daytona) keep their existing early bypass: PIDs in a container namespace are not the host gateway's.
- `tests/tools/test_hardline_blocklist.py`: 10 new cases.

How to review
- `_check_self_host_kill` + `_self_host_kill_block_result` in `tools/approval.py`: the regex scopes to `kill` at command position; `pkill` and negative-pgid forms are deliberately out of scope (rationale in the module comment; `kill -1` is already hardline).
- `check_all_command_guards`: confirm the guard sits in the unconditional section (before the yolo/mode=off checks).
- The new cases in `test_hardline_blocklist.py`.

Evidence
`test_yolo_cannot_bypass_self_host_kill` and `test_check_all_command_guards_blocks_self_host_kill` pin the unconditional behavior; `test_self_host_kill_allows_foreign_pid` pins the non-overreach.

Verification
- `tests/tools/test_hardline_blocklist.py`: 110 passed (100 pre-existing + 10 new).
- `tests/tools/test_approval.py test_cron_approval_mode.py test_execute_code_approval_cluster.py`: 250 passed, 2 failed. Both failures (`TestHermesConfigWriteProtection` perl/ruby in-place) reproduce identically on clean `origin/main` with this diff stashed; they are pre-existing, environment-dependent, and untouched by this change.

Risks / gaps
- `pkill -f <pattern>` matching the host's cmdline can still self-kill. This is out of scope here because name-matching needs cmdline introspection; it is covered by the follow-up notes in mesh record `hermes-dashboard-kill-orphans-hosted-session-20260609`.
- Negative-pgid forms (`kill -- -<pgid>`) are not compared against our pgid. This is accepted scope; `kill -1` (all processes) is already hardline-blocked.
- The guard calls `os.getppid()` per invocation; if the host is re-parented mid-session (supervisor restart), the parent check follows reality. Low risk.
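The ordering described under What changed (container bypass first, then the unconditional self-host kill guard, then the yolo/mode bypass) can be sketched as follows. This is an illustration under stated assumptions, not the real `check_all_command_guards`: the function name `check_guards_sketch`, its parameters, the `approved` and `message` keys, and the final fall-through branch are all hypothetical. Only the `hardline: true` marker and the `hermes gateway restart` hint come from the PR text.

```python
from typing import Callable, Dict

# Environment names are taken from the PR text; modelling them as a set of
# env_type strings is an assumption.
CONTAINER_ENVS = {"docker", "singularity", "modal", "daytona"}


def check_guards_sketch(
    command: str,
    *,
    env_type: str,
    yolo: bool,
    approvals_mode: str,
    is_self_host_kill: Callable[[str], bool],
) -> Dict[str, object]:
    # 1. Containers keep their early bypass: their PID namespace is not the host's.
    if env_type in CONTAINER_ENVS:
        return {"approved": True}

    # 2. Unconditional section: runs before any yolo / mode=off bypass,
    #    so neither setting can wave a self-host kill through.
    if is_self_host_kill(command):
        return {
            "approved": False,
            "hardline": True,
            "message": (
                "Refusing to kill the process hosting this session; "
                "use a supervisor restart such as `hermes gateway restart`."
            ),
        }

    # 3. Only now are the bypass settings consulted.
    if yolo or approvals_mode == "off":
        return {"approved": True}

    # 4. Placeholder for the normal approval flow (not modelled here).
    return {"approved": False, "hardline": False, "message": "approval required"}
```

Injecting `is_self_host_kill` as a predicate keeps the ordering separate from how kill targets are detected, so each can be tested on its own.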