Shell hooks for `pre_tool_call` and `on_session_finalize` do not fire reliably in kanban-worker `chat -q` context (v0.13.0)

**Version observed:** v0.13.0 (release tag 2026-05-07 per `RELEASE_v0.13.0.md`)
**Discovered during:** M2 §5.6 audit-schema enforcement work in a downstream Hermes-based orchestrator project (private dev-notes; happy to share specific excerpts).
**Diagnostic plugin (minimal reproducer):** see the Setup section of Repro 1 below.

---

## Summary

Two distinct symptoms with similar shape:

1. **`pre_tool_call` shell hooks do not fire from the worker's tool-dispatch path** when the worker is spawned by the kanban dispatcher OR invoked manually as `hermes --accept-hooks -p <profile> --skills kanban-worker chat -q "<prompt>"`. Registration succeeds, but `invoke_hook("pre_tool_call", ...)` is never called for the worker's `kanban_show`, `write_file`, `terminal`, `kanban_complete`, etc.

2. **`on_session_finalize` shell hooks beyond the first one in config also do not fire for clean kanban-worker sessions.** A second on_session_finalize hook (registered alongside the existing safety-net hook) does not fire on a worker that calls `kanban_complete` cleanly and exits 0. The same hook fires perfectly when invoked directly as a subprocess.

Both failures look the same from the outside: registration succeeds (`hermes hooks list` shows ✓ allowed; `agent.log` confirms `INFO agent.shell_hooks: shell hook registered: ...`), the hook script is correct and can be invoked directly, but the worker's session lifecycle never triggers the relevant `invoke_hook` call.

---

## Why this matters for downstream projects

The Hermes Orchestrator project's M2 milestone needed structural enforcement of `kanban_complete(metadata=...)` schema (PAF §4.2 / kickoff §4.2). Two natural Hermes-canonical implementations failed for the reasons above:

- `pre_tool_call` hook scoped to `matcher=kanban_complete` — would reject invalid `metadata` BEFORE the task transitions to `done`. Cannot be wired in worker context.
- `on_session_finalize` hook — would validate `runs[-1].metadata` AFTER the worker exits but while the substrate can still annotate / spawn follow-up tasks. Cannot be wired either.

The project's workaround is a cron-driven scanner that runs `audit_validate_recent.py` every 5 min. This loses structural strength (the task is `done` before invalid metadata is detected) and adds operational complexity (cron registration, retry hygiene). With either hook mechanism working in worker context, the workaround would be unnecessary.

---

## Repro 1 — `pre_tool_call` does not fire in worker

### Setup

1. Add a Python plugin to `~/.hermes/plugins/m2_5_6_diagnostic/` with:

   **`plugin.yaml`:**
   ```yaml
   name: m2_5_6_diagnostic
   version: 0.0.1
   description: "pre_tool_call dispatch diagnostic"
   author: "anyone"
   hooks:
     - pre_tool_call
   ```

   **`__init__.py`:**
   ```python
   import datetime, os, pathlib
   TRACE = pathlib.Path.home() / ".hermes" / "m2_5_6_plugin_trace.log"

   def _trace(event, **kw):
       try:
           with TRACE.open("a", encoding="utf-8") as f:
               # utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
               f.write(f"{datetime.datetime.now(datetime.timezone.utc).isoformat()}\tpid={os.getpid()}\tevent={event}\t{kw}\n")
       except Exception:
           pass

   def _pre_tool_call(**kwargs):
       _trace("pre_tool_call", **kwargs)
       return None  # observer-only

   def register(ctx):
       _trace("register_called")
       ctx.register_hook("pre_tool_call", _pre_tool_call)
       _trace("register_done")
   ```

2. Enable the plugin: `hermes plugins enable m2_5_6_diagnostic`.

3. Verify in-process firing works:
   ```bash
   hermes-agent/venv/bin/python -c "
   from hermes_cli.plugins import discover_plugins, get_plugin_manager
   discover_plugins(force=True)
   get_plugin_manager().invoke_hook('pre_tool_call', tool_name='test', args={}, task_id='t', session_id='s', tool_call_id='c')
   "
   cat ~/.hermes/m2_5_6_plugin_trace.log
   # Expect: 3 trace lines (register_called, register_done, pre_tool_call)
   ```

4. Submit a fresh kanban task that exercises a tool call:
   ```bash
   rm -f ~/.hermes/m2_5_6_plugin_trace.log
   hermes kanban create "diagnostic probe $(date +%s)" \
     --assignee claude-code-worker \
     --idempotency-key "diag-$(date +%s%N)" \
     --body "Call kanban_show, write a trivial file, then call kanban_complete. Any tool call is fine."
   sleep 90
   cat ~/.hermes/m2_5_6_plugin_trace.log
   ```

### Expected

The trace log should contain at least one `pre_tool_call` line per tool the worker invoked (`kanban_show`, possibly `write_file`, `kanban_complete`).

### Observed

The trace log is **not created**, even though the worker's session log shows multiple tool calls completing. The plugin's `register(ctx)` runs (verified by its trace lines) only in the in-process `python -c` check from step 3, NOT in the worker subprocess.

Manual worker invocation produces the same result:
```bash
HERMES_KANBAN_TASK=t_existing hermes --accept-hooks -p claude-code-worker --skills kanban-worker chat -q "say hi" 2>&1
# "say hi" triggers 0 tool calls, but a prompt that does invoke tools produces no trace lines either.
```

### Suspected location

`run_agent.py:_invoke_tool` at line 10452 explicitly calls `get_pre_tool_call_block_message` from `hermes_cli.plugins` before tool execution. Either:
- This code path is not reached by the worker subprocess (the worker uses a different dispatch path); OR
- The worker subprocess's plugin manager has no registered callbacks for `pre_tool_call` because `register_from_config` was bypassed or silently failed during worker startup.

`main.py:11800-11808` calls `register_from_config(load_config(), accept_hooks=_accept_hooks)` for `args.command in _AGENT_COMMANDS = {None, "chat", "acp", "rl"}` — `chat -q` should match. The wrapping try/except logs failures at DEBUG only, so silent failures are plausible.
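
One way to separate the two hypotheses is to group the diagnostic plugin's trace file by pid and see how far each process got. The sketch below assumes only the tab-separated line format written by `_trace` in the Setup above; `classify_trace` and `STAGES` are illustrative names, not part of Hermes.

```python
from collections import defaultdict

# Stages in the order the diagnostic plugin can reach them.
STAGES = ["register_called", "register_done", "pre_tool_call"]


def classify_trace(text):
    """Map each pid found in the trace log to the furthest stage it reached."""
    seen = defaultdict(set)
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid_field, event_field = fields[1], fields[2]
        if not (pid_field.startswith("pid=") and event_field.startswith("event=")):
            continue
        seen[pid_field[len("pid="):]].add(event_field[len("event="):])

    result = {}
    for pid, events in seen.items():
        reached = [stage for stage in STAGES if stage in events]
        result[pid] = reached[-1] if reached else "unknown"
    return result
```

A worker pid that stops at `register_done` would point at the dispatch path; a worker pid that never appears at all would point at plugin discovery or registration in the subprocess. The worker's pid has to be obtained separately, which this sketch does not attempt.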

---

## Repro 2 — `on_session_finalize` does not fire for clean kanban-worker sessions

### Setup

1. Add a second on_session_finalize shell hook to `~/.hermes/config.yaml` alongside the existing one (the project has an M1-deployed safety-net hook):

   ```yaml
   hooks:
     on_session_finalize:
     - command: /home/<user>/projects/<project>/.hermes/hooks/safety_net.py
       timeout: 30
     - command: /home/<user>/projects/<project>/.hermes/hooks/diagnostic.py
       timeout: 30
   ```

   `diagnostic.py` is a bare-script hook with `#!/usr/bin/env python3` shebang that writes one line to a trace log on every invocation:

   ```python
   #!/usr/bin/env python3
   import datetime, os, sys, pathlib
   TRACE = pathlib.Path.home() / ".hermes" / "on_session_finalize_trace.log"
   with TRACE.open("a", encoding="utf-8") as f:
       # utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
       f.write(f"{datetime.datetime.now(datetime.timezone.utc).isoformat()}\tpid={os.getpid()}\tkanban_task={os.environ.get('HERMES_KANBAN_TASK','<none>')}\n")
   sys.stdout.write("{}\n")  # required by hooks contract
   ```

   `chmod +x` the file. Allowlist its (event, command) pair in `~/.hermes/shell-hooks-allowlist.json`.

2. Restart the gateway (`hermes gateway stop && nohup env ORCH_M2_GATES=1 hermes gateway run > log 2>&1 &`).

3. Confirm both hooks are registered:
   ```bash
   tail -10 ~/.hermes/logs/agent.log | grep "shell hook registered"
   # Expect: 2 entries for on_session_finalize (safety_net, diagnostic)
   ```

4. Verify direct invocation works:
   ```bash
   rm -f ~/.hermes/on_session_finalize_trace.log
   # The variable must prefix the hook, not echo, or the hook never sees it
   echo '{}' | HERMES_KANBAN_TASK=t_test /path/to/diagnostic.py
   cat ~/.hermes/on_session_finalize_trace.log
   # Expect: 1 trace line
   ```

5. Submit a kanban task that the worker will complete cleanly:
   ```bash
   rm -f ~/.hermes/on_session_finalize_trace.log
   hermes kanban create "diag $(date +%s)" \
     --assignee claude-code-worker \
     --idempotency-key "diag-$(date +%s%N)" \
     --body "Trivial work. Call kanban_complete when done."
   sleep 90
   cat ~/.hermes/on_session_finalize_trace.log
   ```

### Expected

The trace log should contain at least one entry (the worker's session ending fires on_session_finalize, invoking all registered callbacks including the diagnostic).

### Observed

The trace log is **not created**. Direct invocation works perfectly. Both hooks are registered (confirmed via agent.log AND `hermes hooks list`). The worker completes the task cleanly and the session ends — but `invoke_hook("on_session_finalize", ...)` is not called from this code path.

The existing safety-net hook (1st in config) does fire for *protocol-violation* sessions (M1 evidence: two prior tasks where workers crashed with HTTP 400; specific task IDs available on request). We have no direct evidence it fires for clean sessions either: it exits silently when the task is already `done`, so the absence of side effects does not rule out that it fired.

### Suspected location

`cli.py:728` calls `invoke_hook("on_session_finalize", session_id=..., platform="cli")` at "actual session boundary" per the surrounding comment. `gateway/run.py:8099` has a similar call for gateway-tracked sessions. For a worker invoked as `hermes ... chat -q "..."`, the cli.py call site should fire. Either:

- The `chat -q` (quiet) code path bypasses the `cli.py:728` invocation entirely; OR
- It reaches the call but `invoke_hook` returns immediately because the worker's plugin manager has no callbacks registered for the event (despite agent.log confirming gateway-side registration).
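
If the hook does fire in some contexts, it helps to know which process invoked it. The following is a minimal sketch of a richer trace line for `diagnostic.py`, recording the parent pid and the keys of the stdin payload. `build_trace_line` is an illustrative name, and the assumption that the hook receives a JSON object on stdin is inferred from the direct-invocation step above, not confirmed against the Hermes source.

```python
import json


def build_trace_line(stdin_text, environ, pid, ppid, now):
    """Format one tab-separated trace line describing who invoked the hook."""
    try:
        payload = json.loads(stdin_text) if stdin_text.strip() else {}
    except json.JSONDecodeError:
        # Keep a marker so unparseable input is still visible in the log.
        payload = {"_unparsed": stdin_text[:200]}
    if isinstance(payload, dict):
        keys = sorted(payload)
    else:
        keys = ["<non-dict payload>"]
    return "\t".join([
        now.isoformat(),
        f"pid={pid}",
        f"ppid={ppid}",  # parent pid: tells gateway-spawned from CLI-spawned runs
        f"kanban_task={environ.get('HERMES_KANBAN_TASK', '<none>')}",
        f"payload_keys={','.join(keys)}",
    ]) + "\n"
```

In the hook script this would be called with `sys.stdin.read()`, `os.environ`, `os.getpid()`, `os.getppid()`, and the current UTC time, with the result appended to the trace log before writing `{}` to stdout.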

---

## Common pattern

Both repros share this property: **the hook is correctly registered (per `hermes hooks list` AND per `agent.log` `shell hook registered` INFO line) and works perfectly when invoked directly as a subprocess. But the worker's session lifecycle never triggers `invoke_hook(...)` for the registered hook.**

The §5.6 investigation took two hours initially, plus several more hours during the Path A (`pre_tool_call`) and Path B (`on_session_finalize`) confirmations, without isolating which code path the worker takes that bypasses the hook dispatch. A maintainer with knowledge of the worker spawn lifecycle could likely resolve this in minutes.

---

## Workaround (for downstream visibility)

The Hermes Orchestrator project closed M2 §5.6 by pivoting to a cron-driven scanner that bypasses the hook system entirely:

```bash
*/5 * * * * /path/to/audit_validate_recent.py --once --apply
```

The scanner queries `hermes kanban list --json --status done`, filters by `completed_at` window, validates `runs[-1].metadata` against the project schema, and applies remediation (`kanban_comment` + AUDIT_REVIEW follow-up task) for invalid completions. It works reliably, at the cost of up to 5 min of added detection latency compared with `on_session_finalize`.
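
The filter-and-validate step can be sketched roughly as follows. This is not the project's `audit_validate_recent.py`: the `id` field, ISO-8601 `completed_at` strings with a UTC offset, and the list of required metadata keys are all assumptions made for illustration, and the real schema check is likely stricter.

```python
import datetime


def find_invalid_completions(tasks, now, window, required_keys):
    """Return (task_id, problems) for recently completed tasks with bad metadata."""
    invalid = []
    for task in tasks:
        raw = task.get("completed_at")
        if not raw:
            continue  # not completed, nothing to validate
        completed_at = datetime.datetime.fromisoformat(raw)
        if now - completed_at > window:
            continue  # outside the scan window
        runs = task.get("runs") or []
        metadata = runs[-1].get("metadata") if runs else None
        if not isinstance(metadata, dict):
            invalid.append((task["id"], ["metadata missing or not an object"]))
            continue
        missing = [key for key in required_keys if key not in metadata]
        if missing:
            invalid.append((task["id"], [f"missing key: {key}" for key in missing]))
    return invalid
```

The input would be the parsed output of the `hermes kanban list --json --status done` call, and each returned pair would drive one remediation action.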

---

## Asks

- Could a maintainer confirm whether `register_from_config` actually runs (and completes) in kanban-spawned worker subprocesses? If it does, the issue is in `invoke_hook` dispatch; if it doesn't, the issue is registration.
- If the worker subprocess uses a distinct dispatch path that intentionally skips plugin hooks (for performance, isolation, etc.), please document this in `agent/shell_hooks.py` docstring + the user-guide hooks page. Downstream consumers will hit the same dead-end without that guidance.
- The `--accept-hooks` / `hooks_auto_accept` / `HERMES_ACCEPT_HOOKS` chain is well-documented for first-use TTY consent, but its interaction with kanban-spawned non-TTY workers is empirically uneven. Worth a §3 example in the docs.

Happy to provide additional artifacts (worker session logs, agent.log snapshots, full plugin source). Two days of investigation across two separate paths (Path A on `pre_tool_call`, Path B on `on_session_finalize`) are documented in the downstream project's private dev-notes; happy to share specific excerpts on request.

---

*Reproducer plugin is in place on the affected install — happy to share its source if helpful.*

