Skip to content

Shell Hooks#4

Draft
pefontana wants to merge 18 commits into
mainfrom
shell-hooks
Draft

Shell Hooks#4
pefontana wants to merge 18 commits into
mainfrom
shell-hooks

Conversation

@pefontana

@pefontana pefontana commented Apr 17, 2026

Copy link
Copy Markdown
Owner

Shell Hooks let you intercept agent lifecycle events with a shell script written in whatever language you want — bash, python, go, anything that reads stdin and writes stdout. Drop a script in ~/.hermes/agent-hooks/, list it in cli-config.yaml, and it fires on every matching event with a JSON payload on stdin.

Three things you can do now that were awkward or impossible before:

  • Block dangerous tool calls from outside Python. Return {"decision": "block", "reason": "..."} from a pre_tool_call hook and the agent refuses with your message. Matcher-regex-scoped, so you can gate just terminal or web_search without touching anything else.
  • Inject dynamic context into every LLM turn. A pre_llm_call hook returning {"context": "..."} prepends its output to the next user message. Great for current time, git status, pwd, MOTD, or anything else you want the model to know without typing it.
  • Observe delegation. New subagent_stop event fires once per delegate_task child with parent_session_id, child_role, child_summary, child_status, duration_ms. Log, audit, forward to a dashboard — without patching Hermes.

Example — block rm -rf in bash

# ~/.hermes/agent-hooks/block-rm-rf.sh
#!/usr/bin/env bash
payload=$(cat)
if echo "$payload" | grep -q "rm -rf"; then
  printf '{"decision": "block", "reason": "rm -rf is blocked by policy"}\n'
else
  printf '{}\n'
fi
# ~/.hermes/config.yaml
hooks:
  pre_tool_call:
    - matcher: "terminal"
      command: "~/.hermes/agent-hooks/block-rm-rf.sh"
      timeout: 5
  pre_llm_call:
    - command: "python3 ~/.hermes/agent-hooks/inject-context.py"
  subagent_stop:
    - command: "~/.hermes/agent-hooks/log-orchestration.sh"
$ hermes --accept-hooks chat -q "clean the repo with rm -rf"
Agent refused: rm -rf is blocked by policy

Supported events

pre_tool_call (can veto), post_tool_call, pre_llm_call (can inject context), post_llm_call, pre_api_request, post_api_request, on_session_start, on_session_end, on_session_finalize, on_session_reset, subagent_stop, transform_terminal_output, transform_tool_result — same set as Python plugin hooks, registered through the same dispatcher.

Security model

First-use consent prompt on the TTY; decisions persist to ~/.hermes/shell-hooks-allowlist.json at mode 0o600. Non-TTY runs (gateway, cron, CI) need one of three explicit opt-ins:

  1. --accept-hooks on the CLI (hermes --accept-hooks gateway run, hermes gateway run --accept-hooks, hermes --accept-hooks cron tick, ...).
  2. HERMES_ACCEPT_HOOKS=1 environment variable.
  3. hooks_auto_accept: true in cli-config.yaml.
  • Subprocess execution is shell=False + shlex.split — no shell-injection vector.
  • Allowlist writes are atomic (tempfile.mkstemp + os.replace) and cross-process-safe (fcntl.flock on a sibling .lock file).
  • Timeouts default to 60s, capped at 300s. Subprocess failures (missing binary, timeout, bad JSON) log a warning but never crash the agent loop.
  • Script mtime is captured at approval time; hermes hooks doctor flags drift so you can spot post-approval edits.
  • Registration is gated to subcommands that actually run the agent (chat, gateway run, mcp serve, acp, cron run, cron tick, rl). Management commands like hermes gateway status, cron list, or mcp add never trigger plugin discovery or consent prompts.

hermes hooks CLI

Command Purpose
hermes hooks list Configured hooks with matcher, timeout, and consent status
hermes hooks test <event> [--for-tool X] [--payload-file F] Fire every matching hook against a synthetic payload and print the parsed response
hermes hooks doctor For each configured hook: check exec bit, allowlist, mtime drift, JSON output validity, execution time
hermes hooks revoke <command> Drop every allowlist entry matching <command>

Wire protocol (for hook authors)

stdin (JSON, same top-level shape for every event):

{
  "hook_event_name": "pre_tool_call",
  "tool_name":       "terminal",
  "tool_input":      {"command": "rm -rf /"},
  "session_id":      "sess_abc123",
  "cwd":             "/home/user/project",
  "extra":           {...}
}

stdout (JSON, optional — anything else is ignored):

  • Block pre_tool_call: {"decision": "block", "reason": "..."} (Claude-Code style) or {"action": "block", "message": "..."} (Hermes canonical). Both shapes accepted and normalised.
  • Inject context on pre_llm_call: {"context": "..."}.
  • Silent no-op: empty.

Full spec + worked examples in website/docs/user-guide/features/hooks.md.

Interpreter-prefixed commands

python3 ~/hook.py, bash ~/hook.sh, and /usr/bin/env python3 ~/hook.py are all first-class — the bridge resolves the actual script path, so hermes hooks doctor reports the right executability and mtime drift instead of checking python3 on $PATH.

Type of Change

  • ✨ New feature (non-breaking change that adds functionality)
  • 📝 Documentation update
  • ✅ Tests

How to Test

  1. pytest tests/agent/test_shell_hooks.py tests/agent/test_shell_hooks_consent.py tests/agent/test_subagent_stop_hook.py tests/hermes_cli/test_hooks_cli.py -v — 80+ passing.
  2. Drop the block-rm-rf.sh + config snippet above into ~/.hermes/, then:
    $ hermes --accept-hooks hooks doctor
    All shell hooks look healthy.
    $ hermes --accept-hooks chat -q "clean temp with rm -rf /tmp/x"
    Agent should surface "rm -rf is blocked by policy" and refuse.
  3. hermes hooks list / hermes hooks revoke ~/.hermes/agent-hooks/block-rm-rf.sh for the round-trip.

Backward Compatibility

  • Nothing registers unless the user adds a hooks: block to cli-config.yaml — existing installs are unchanged.
  • Existing Python plugin hooks keep working; shell hooks are additive callbacks on the same invoke_hook() dispatcher, with Python plugins registered first so their block decisions win ties.
  • subagent_stop is a net-new entry in VALID_HOOKS; nothing else changed.

Checklist

  • Tests added (80+ passing)
  • Docs updated (website/docs/user-guide/features/hooks.md)
  • cli-config.yaml.example updated with a commented hooks: example
  • Conventional Commits
  • No unrelated changes (all commits are shell-hook scoped)
  • Tested on macOS 15 (Darwin 25.4.0)

New event fired once per child after delegate_task completes. This
commit reserves the event name in the dispatcher vocabulary; the
actual firing site in delegate_tool lands separately.
After every child agent completes, invoke_hook("subagent_stop", ...)
fires once with parent_session_id, child_role, child_summary,
child_status, and duration_ms.

Firing happens on the parent thread after as_completed() drains the
futures, not on the worker threads that ran the children. Hook
callbacks never see concurrent invocation.

child_role reads child._delegate_role via a defensive getattr so this
works without depending on the orchestrator-role work. The value is
stashed on each entry dict before child.close() runs in the finally
block, then stripped again before delegate_task serialises its result
to JSON.
Shell-script hooks declared in the hooks: block of cli-config.yaml.
Each entry becomes a plugin-hook callback via
agent.shell_hooks.register_from_config(), so every existing
invoke_hook() site dispatches to the configured script with no
additional wiring on the caller side.

Wire protocol: JSON payload on stdin, optional JSON response on
stdout. For pre_tool_call the Claude-Code-style
{"decision": "block", "reason": str} payload is translated to the
canonical Hermes {"action": "block", "message": str} shape so that
get_pre_tool_call_block_message() surfaces the block. This is the one
correctness invariant the test suite gates on.

First-use consent prompts on a TTY and persists approvals to
~/.hermes/shell-hooks-allowlist.json with the script mtime captured
at approval time (used later by `hermes hooks doctor` for drift
detection). Non-TTY runs need --accept-hooks, HERMES_ACCEPT_HOOKS=1,
or hooks_auto_accept: true.

Subprocess execution uses shlex.split + shell=False, default 60s
timeout, capped at 300s. Registration is idempotent per
(event, command) pair so CLI and gateway startup can both call
register_from_config() in the same process.
hermes_cli/main.py calls register_from_config() at startup, gated to
commands that actually run the agent (chat, gateway, acp, mcp, cron,
rl). Introspection-only commands skip registration so
`hermes hooks list` never triggers a consent prompt for a hook the
user is still inspecting.

gateway/run.py calls register_from_config() right before
self.hooks.discover_and_load() at the existing startup point.

Adds --accept-hooks on the top-level parser and the chat subparser,
following the --worktree pattern.

Adds the hermes hooks subcommand group:

- list          configured hooks with matcher, timeout, consent
- test <event>  fire matching hooks against a synthetic payload
- revoke <cmd>  drop allowlist entries
- doctor        check exec bit, allowlist, mtime drift, JSON output

Also adds a commented hooks: example block to
cli-config.yaml.example.
New Shell Hooks section covers the concept, a three-way comparison
table (shell vs plugin vs gateway), the JSON wire protocol with the
Claude-Code-to-canonical block translation, four worked examples
(auto-format, block-rm-rf, inject-cwd-context, log-orchestration),
the consent model, and the hermes hooks CLI.

New subagent_stop entry in the plugin hooks reference with the
callback signature, payload fields, and a logging example.

Also reconciles the two places that claimed pre_tool_call return
values are always ignored — get_pre_tool_call_block_message actually
acts on {"action": "block", "message": str}. Intro paragraph,
quick-reference table, and pre_tool_call detail section updated.
@github-actions

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

Unify the subprocess invocation between the live callback path and the
`hermes hooks test` diagnostic helper. Both now go through a single
_spawn() function that returns a consistent result dict (returncode,
stdout, stderr, timed_out, elapsed_seconds, error), cutting the two
copies of timeout / FileNotFoundError / PermissionError handling.

Drop _safe_extras and _safe_cwd. The outer json.dumps(default=str) call
already renders every unserialisable value as its str() form, so the
per-value try/except dance in _safe_extras was pure overhead.

Replace the ad-hoc _nearest_event/_similarity pair with
difflib.get_close_matches — same behaviour in one stdlib call instead
of 40 lines.

Trim _record_approval: reuse script_mtime_iso() for the mtime lookup
and extract _utc_now_iso() so both the approved_at and mtime fields
share a single formatter.

Net: shell_hooks.py drops from 807 to 718 lines with the same test
surface green (72 tests).
@github-actions

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…tomic allowlist

Five issues surfaced by reviewers of PR #4 (codex, claude):

Shell-hook precedence in the gateway startup path was reversed. CLI
called discover_plugins() before register_from_config(); gateway
relied on the discover_plugins() side-effect inside model_tools.py,
but gateway lazy-imports run_agent per-request so that side-effect
never fires at startup. Add an explicit discover_plugins() call before
register_from_config() in gateway/run.py.

Shell-hook dedupe was keyed on (event, command) alone, so reusing the
same script under one event with two matchers silently dropped the
second entry. Include matcher in the identity triple. Regression test
for the same-command-different-matcher case.

`hermes hooks test` and `hermes hooks doctor` built stdin directly
from _DEFAULT_PAYLOADS, bypassing _serialize_payload. The synthetic
shape was missing cwd and session_id top-level keys, and buried
parent_session_id under extra instead of lifting it to session_id
like production does. A script validated via `hermes hooks test` could
silently break in production. Rewrite _DEFAULT_PAYLOADS as kwargs dicts
and route run_once through _serialize_payload so the two paths produce
identical JSON. Regression test asserts the top-level key set.

discover_plugins() at CLI startup was running for every subcommand,
not just agent-running ones. Pre-PR behaviour was lazy discovery via
model_tools import — commands like hermes --help / doctor / tools /
hooks list never paid the cost and never triggered plugin register()
side effects. Move the discover_plugins() call inside the
_AGENT_COMMANDS gate so that stays true.

Allowlist writes were a naïve write_text, risking truncated JSON if
two CLI processes race or one dies mid-write. Use tmp + os.replace.

Harden tests/agent/test_subagent_stop_hook.py to run without openai
installed. delegate_task calls _build_child_agent which imports
run_agent → openai even when _run_single_child is patched. Add an
autouse fixture that replaces _build_child_agent with a MagicMock
factory.
@github-actions

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

The _AGENT_COMMANDS gate included "cron" for every cron subcommand,
so `hermes cron list`, `cron status`, `cron create`, and the rest
triggered discover_plugins() + register_from_config() even though
none of them run the agent loop. Users hit the first-use consent
prompt on a hook they were still inspecting, or paid plugin-discovery
cost on a read-only CRUD command.

Narrow the cron carve-out to the two subcommands that actually run
jobs: `cron run` and `cron tick`. Every other cron subcommand falls
through the gate and stays plugin-free, matching pre-PR behaviour
for introspection commands.
save_allowlist() wrote to a fixed `shell-hooks-allowlist.json.tmp`
path and _record_approval() did read-modify-write without any lock,
so two concurrent CLI / gateway processes calling the consent path
raced two ways: os.replace() hit ENOENT when one writer's temp file
got moved out from under another, and the read-modify-write clobbered
entries because whichever writer finished last overwrote the other's
snapshot. A 50-way stress run kept only 2/50 approvals before this
change.

save_allowlist() now uses tempfile.mkstemp() for a per-process unique
temp file inside ~/.hermes and cleans it up on failure, so concurrent
writers never collide on the staging path. _record_approval() and
revoke() now run their read-modify-write inside
_locked_update_approvals(), which holds an exclusive fcntl.flock on
~/.hermes/shell-hooks-allowlist.json.lock for the duration of the
update. Same 50-way stress run now preserves all 50 approvals.

The non-POSIX fallback uses a dedicated _allowlist_write_lock rather
than _registered_lock, because register_from_config() already holds
_registered_lock when it triggers _record_approval() via the consent
prompt path — reusing it would self-deadlock on threading.Lock
(non-reentrant). Regression test confirms: fails with a 3 s timeout
against the shared-lock variant, passes against the dedicated lock.

Three new tests under TestAllowlistConcurrency:

- test_parallel_record_approval_does_not_lose_entries — 32 threads
  race through _record_approval behind a Barrier, asserts all 32
  entries land.
- test_save_allowlist_uses_unique_tmp_paths — spies on
  tempfile.mkstemp and asserts successive save_allowlist() calls
  pick distinct temp paths.
- test_non_posix_fallback_does_not_self_deadlock — monkeypatches
  fcntl to None, runs _record_approval while holding
  _registered_lock in a daemon thread, bounds the wait via
  threading.Event.wait(timeout=3).

Side effect: tempfile.mkstemp defaults to 0o600, so the allowlist
file now lands with user-only permissions instead of whatever the
default umask would have given it.
`hermes hooks doctor` ran the synthetic-payload smoke test against
every configured hook regardless of allowlist status. That directly
undermined the workflow documented in hooks.md — "re-run
`hermes hooks doctor` after you pull a shared config to spot
newly-added hooks *before they register*" — because doctor itself
executed every listed script before reporting the "✗ not allowlisted"
status.

_doctor_one now skips check #4 (produces valid JSON on a synthetic
payload) for any entry without an allowlist record, and prints an
explicit "ℹ skipped JSON smoke test — not allowlisted yet" note
instead. Allowlisted entries still run the smoke test unchanged, so
the drift-detection workflow continues to work.

New test_unallowlisted_script_is_not_executed uses a sentinel-file
pattern: the hook script would touch a file on disk if spawned, and
the test asserts the file does not exist after running doctor.
`hermes hooks test pre_api_request` fed scripts a stdin carrying
only {"session_id": "test-session"}, while the real firing site at
run_agent.py passes task_id, platform, model, provider, base_url,
api_mode, api_call_count, message_count, tool_count,
approx_input_tokens, request_char_count, max_tokens — and
post_api_request additionally passes api_duration, finish_reason,
response_model, usage, assistant_content_chars, and
assistant_tool_call_count. A script validated under `hooks test`
could then break in production on a key that `hooks test` never
surfaced.

Replace the one-key synthetic dict for each of pre_api_request and
post_api_request in _DEFAULT_PAYLOADS with realistic kwargs mirroring
the invoke_hook() call sites. Uses claude-sonnet-4-6 for the model
field to match a typical Hermes provider rather than the previous
gpt-4 placeholder.
@github-actions

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…accept-hooks

Two issues around the CLI entrypoint gating:

`hermes gateway status`, `gateway setup`, `gateway install`, and
friends all triggered discover_plugins() + register_from_config() at
CLI startup — even though none of them run the agent in-process.
Same story for `hermes mcp list`, `mcp add`, `mcp test`, etc.  The
`_AGENT_COMMANDS` gate was too coarse: it listed the whole `gateway`
and `mcp` command groups regardless of subcommand.  Management
commands would hit a first-use consent prompt for a hook the user
was still inspecting, or pay plugin-discovery cost with no agent
loop to run.

Narrow the gate the same way we did for `cron`: keep the root
`gateway` / `mcp` strings out of `_AGENT_COMMANDS`, and fall through
to dedicated `_GATEWAY_AGENT_SUBCOMMANDS = {"run"}` and
`_MCP_AGENT_SUBCOMMANDS = {"serve"}` sets.  Only the subcommands
that actually invoke the agent loop in-process now register hooks.

Separately, `--accept-hooks` was only wired on the top-level parser
and the `chat` subparser, so `hermes gateway run --accept-hooks`
and `hermes gateway --accept-hooks run` both errored out with
`unrecognized arguments`.  Only `hermes --accept-hooks gateway run`
worked.  That blocked the documented headless opt-in path for
gateway/cron/mcp/acp launches.

Add `--accept-hooks` (with `default=argparse.SUPPRESS` so subparser
parsing doesn't clobber a truthy top-level value) to every agent
subparser: `gateway`, `gateway run`, `cron`, `cron run`, `cron tick`,
`mcp`, `mcp serve`, `acp`.  The flag now works at every position
along the command path.  Parametrised smoke test in
`test_argparse_flag_propagation.py` invokes each variant via
`python -m hermes_cli.main` and asserts exit 0 with no
`unrecognized arguments` in stderr.

Also drop the redundant `accept_hooks=bool(_cfg.get("hooks_auto_accept"))`
pass-through in `gateway/run.py`.  `register_from_config` already
resolves the effective accept flag via `_resolve_effective_accept`
from the same cfg plus `HERMES_ACCEPT_HOOKS`, so re-reading
`hooks_auto_accept` at the call site was pure duplication.  Pass
`accept_hooks=False` and let the internal resolver do the single
check.
A hook command like ``python3 /path/hook.py`` or ``bash hook.sh`` is
valid: ``_spawn`` runs it via ``shlex.split`` + ``subprocess.run(...,
shell=False)`` and it executes fine.  But the inspection helpers
``script_is_executable`` and ``script_mtime_iso`` ran
``_command_script_path`` which returned only the first shlex token,
so they ended up checking ``python3`` or ``bash`` against the on-disk
path rather than the script the user actually cares about.

Consequences visible to the user:

- ``hermes hooks doctor`` reported ``✗ script missing or not
  executable`` for every interpreter-prefixed hook, even when the
  script ran successfully at runtime.
- The "modified since approval" drift check tracked the mtime of the
  interpreter (which basically never changes) instead of the script,
  so real script edits after approval went undetected.

Rewrite ``_command_script_path`` with three-step resolution:
prefer tokens ending in a known script extension
(``.sh``/``.py``/``.rb``/...), fall back to the first token
containing a path separator or leading ``~``, fall back to the first
token for bare names.  That handles the common cases — plain script
paths, ``INTERP SCRIPT`` invocations, ``/usr/bin/env INTERP SCRIPT``
shebangs, and interpreter flags like ``python3 -u /path/hook.py`` —
while preserving the existing behaviour for bare executables like
``/bin/echo hi``.

Parametric regression test in ``TestAllowlistConcurrency`` covers 15
commands spanning every resolution branch.
``register_from_config`` acquired ``_registered_lock`` per spec and
then, inside that critical section, called ``_prompt_and_record``
which issues a blocking ``input()`` on the TTY.  Any other thread
that reached ``register_from_config`` parked on the lock until the
user answered the prompt — even if it was trying to register a
completely different spec that wouldn't have prompted at all.

Single-threaded startup callers (CLI ``main()`` and
``GatewayRunner.start()``) make this benign today, but the shape of
the code defeats the very reason the lock is there.  Flagged in
three consecutive review rounds.

Split the critical section: the lock now wraps only the idempotence
check (+ allowlist read) at the top of the loop and the mutation
(appending the callback, updating ``_registered``) at the bottom.
The TTY prompt runs outside the lock.  A defensive re-check on the
idempotence key inside the second lock section keeps double
registration from landing if a future caller parallelises this path
— two threads that both clear the prompt for the same spec would
otherwise each append a duplicate callback.

Also tightens the "not allowlisted" warning message: the previous
``Run hermes hooks list`` recommendation doesn't actually approve
anything; the three real opt-in channels are ``--accept-hooks``,
``HERMES_ACCEPT_HOOKS=1``, and ``hooks_auto_accept: true``.
Two user-facing diagnostic gaps:

``matcher:`` is only consulted for ``pre_tool_call`` /
``post_tool_call`` — the runtime callback ignores it for every other
event.  Misconfigured hooks like

    pre_llm_call:
      - matcher: "terminal"
        command: ~/hook.sh

used to parse silently and then fire on every LLM call regardless of
the matcher string, which is the opposite of what the user wanted.
``hermes hooks list`` even printed the non-functional matcher back
as if it were active.  ``_parse_single_entry`` now warns and drops
the matcher at parse time so the rendered spec matches runtime
behaviour.

When ``save_allowlist`` fails (disk full, read-only ``~/.hermes``,
permission denied) the caller already logged a generic warning, but
the message didn't tell the user what that meant for the next run
— the approval lands in the in-memory ``_registered`` set and the
hook registers for the current process, yet the next startup
re-prompts because the approval never hit disk.  Without context
the symptom looks like "hermes keeps asking me about the same hook".
Expand the warning to include the allowlist path, the errno, and
the re-prompt caveat explicitly.

Tests:

- ``test_non_tool_event_matcher_warns_and_drops`` — configures a
  matcher on ``pre_llm_call`` and asserts the spec's matcher ends up
  ``None`` and a warning containing ``pre_llm_call`` and
  "only honored for pre_tool_call" is emitted.
- ``test_save_allowlist_failure_logs_actionable_warning`` —
  monkeypatches ``tempfile.mkstemp`` to raise ``OSError(28)``,
  confirms the warning names the file path, the errno message, and
  the re-prompt behaviour.
@github-actions

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

No behaviour change, 143/143 tests still green.  Follow-up cleanup
on the last four commits (a5ea68f, 262f547, 7412d4e, f54a348)
that leaned too heavily on repetition and prose.

- Factor the eight duplicated ``--accept-hooks`` argparse blocks in
  ``hermes_cli/main.py`` into a single ``_add_accept_hooks_flag()``
  helper.  Drops ~45 lines across gateway/cron/mcp/acp subparsers.

- Collapse the three ``_*_AGENT_SUBCOMMANDS`` sets + the four-branch
  ``or``-chain that consumed them into a single ``_AGENT_SUBCOMMANDS``
  dict keyed by command name with ``(attr, allowed_set)`` values.
  The gate is now a two-line lookup.

- Trim the oversized docstrings on ``_command_script_path`` and
  ``save_allowlist`` back to the core invariant — the behaviour is
  already covered by tests.

- Reduce the refactor-note block inside ``register_from_config`` to
  three lines; the structure (check under lock, prompt outside,
  re-check on mutation) is visible from the code.

- Shrink the two new diagnostic tests (non-tool matcher warning,
  save-failure warning) from long-form docstring + explicit helpers
  to tight inline assertions.

Net: −132 lines (213 removed, 81 added) across three files.
@github-actions

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

Found during end-to-end testing of the feature.

After the earlier commit taught ``_command_script_path`` to return
the actual script for interpreter-prefixed commands
(``python3 /path/hook.py``, ``/usr/bin/env bash hook.sh``),
``script_is_executable`` kept requiring ``os.X_OK`` on that path.
But when the command is ``python3 hook.py`` there is no need for the
script to be executable — the interpreter opens and reads it.
``hermes hooks doctor`` reported ``✗ script missing or not executable``
for hooks that ran fine at runtime, which is user-hostile noise.

Narrow the check: if the resolved script path equals ``argv[0]`` the
invocation is bare (``/path/hook.sh``) and X_OK is required; otherwise
the script is an argument to an interpreter and R_OK is enough.
Mirrors what ``subprocess.run`` would actually do.

New ``test_script_is_executable_handles_interpreter_prefix`` covers
both paths and the mode change.
@github-actions

Copy link
Copy Markdown

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Dependency manifest files modified

Changes to dependency files can introduce new packages or change version pins. Verify all dependency changes are intentional and from trusted sources.

Files:

pyproject.toml

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

# Conflicts:
#	hermes_cli/main.py
#	website/docs/user-guide/features/hooks.md
@pefontana pefontana changed the title Shell hooks Shell Hooks Apr 20, 2026
@pefontana pefontana mentioned this pull request Apr 22, 2026
3 tasks
pefontana added a commit that referenced this pull request Apr 24, 2026
Three independent reviews surfaced a handful of real bugs. Fixing all
of them here:

* **SIGTERM orphans hook subprocesses (codex #1).** The CLI only
  installed a SIGINT handler — SIGTERM (from ``kill``, ``timeout``,
  systemd stop, CI harnesses) skips atexit entirely and leaves every
  in-flight hook subprocess running as an orphan owned by init.  Adds
  ``_async_pool_sigterm_handler`` which terminates tracked subprocess
  groups inline, then routes to ``sys.exit(128 + SIGTERM)``.  Inline
  termination is required because ``ThreadPoolExecutor`` uses
  non-daemon threads: Python waits for every worker to return before
  running atexit, and workers block inside
  ``proc.communicate(timeout=spec.timeout)`` until the subprocess
  dies.  Renamed ``_maybe_install_sigint_handler`` →
  ``_maybe_install_signal_handlers`` (with back-compat alias).
  Verified: ``kill -TERM`` on a hermes CLI running a 4 s ``sleep``
  hook now exits in ~0.7 s with no orphan, was 4 s + orphan.

* **Subprocess groups for reliable termination.** Hooks are now
  spawned with ``start_new_session=True`` so the subprocess is its
  own PGID leader.  Shutdown / SIGINT / SIGTERM paths call
  ``os.killpg`` on the group instead of ``proc.terminate()`` — without
  this, a bash script's orphaned ``sleep`` child kept the parent
  stdout FD open and blocked ``proc.communicate`` for the full sleep
  duration.  ``_terminate_group`` / ``_kill_group`` helpers fall back
  to plain ``terminate`` / ``kill`` on edge cases where ``getpgid``
  fails (already-exited proc, non-POSIX).

* **``hermes hooks test --no-wait`` blocks for full hook runtime
  (codex #2).** The flag advertised fire-and-forget but the CLI's
  ``ThreadPoolExecutor`` atexit ``pool.shutdown(wait=True)`` joined
  the worker anyway, which in turn waited for the subprocess.
  ``_cmd_test`` now polls briefly for ``_live_procs`` to fill (so the
  subprocess definitely spawned), then ``os._exit(0)`` — skipping
  atexit entirely.  The subprocess keeps running under init because
  of ``start_new_session=True``.  Verified: CLI exit dropped from 2.3
  s to 76 ms for a 2-second hook, and the hook still writes its
  audit log 3 s later after the CLI is gone.

* **Stale ``_child_role_for_batch`` test (claude #1 / hermes #2).**
  The test from commit 76d3ffd4 asserted the *old* helper field name
  — no code path sets it post-refactor (455c136f), so the test
  passed trivially without verifying anything.  Fixed to assert
  ``_child_role`` (the real field) is stripped, and added an
  explanatory message so a future failure is easier to diagnose.
  Module-header docstring updated too.

* **``submit()`` RuntimeError branch: stale-semaphore parity fix
  (claude #3).**  Same pattern I already fixed in
  ``_on_async_future_done``, missed here: a concurrent
  ``_reset_async_pool`` between ``acquire`` and ``release`` would
  cause ``_async_sem_get()`` to lazy-create a fresh sem and over-
  release on it.  Snapshot ``_async_sem_inst`` + swallow
  ``ValueError`` like the symmetric path.

* **Shutdown race: proc registered after the snapshot (claude #4 /
  hermes #1).** Worker that got between ``subprocess.Popen()`` and
  ``_register_live_proc(proc)`` would miss the shutdown-sweep
  snapshot and block for the full ``spec.timeout``.  After
  registering, the worker now checks ``_async_shutting_down`` and
  self-terminates its subprocess group.

* **WARN log noise on SIGTERM'd children (claude #5).**
  Shutdown-induced exits (rc = -15 / -9) no longer spam a per-proc
  ``WARNING`` — demoted to ``DEBUG`` when ``_async_shutting_down``
  is set.  Both the atexit path and the signal handlers now set the
  flag before terminating, so a Ctrl-C or a ``kill -TERM`` with 10
  running hooks emits zero warn lines instead of 10.

Still outstanding (documented trade-offs, not fixed here):

* Gateway shutdown blocks the event loop for up to ``grace_seconds``
  (claude #2).  Acknowledged as a follow-up candidate via
  ``loop.run_in_executor``.
* ``_maybe_install_signal_handlers`` is still leading-underscore
  (claude NousResearch#6).  Cosmetic; kept consistent with the rest of the
  module's private-by-convention API.

All 101 hook tests still pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant