fix(gateway): exclude CLI invoker ancestor chain in _append_unique_pid #13273

Closed
hclsys wants to merge 1 commit into NousResearch:main from hclsys:fix/scan-gateway-exclude-ancestor-pids-13242

Conversation

@hclsys hclsys commented Apr 21, 2026

Problem

`hermes gateway run` invoked from an interactive CLI session gets a self false-positive: the `ps aux` scan in `_scan_gateway_pids` matches generic substrings like `hermes_cli.main gateway`, which picks up the invoking CLI process AND its ancestor chain (parent shell, tmux session, launcher wrapper). The gateway then concludes it's 'already running' and shuts down immediately.
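To make the failure mode concrete, here is a minimal sketch of a substring-based scan. The PIDs and command lines are invented for illustration, and `naive_scan` is a hypothetical stand-in, not the actual body of `_scan_gateway_pids`.

```python
# Hypothetical (pid, command line) rows, standing in for parsed `ps aux` output.
SAMPLE_PS_ROWS = [
    (4000, "bash -c 'python -m hermes_cli.main gateway run'"),  # launcher wrapper (ancestor)
    (5000, "python -m hermes_cli.main gateway run"),            # the invoking CLI process
    (7777, "python -m hermes_cli.main gateway run"),            # an unrelated, real gateway
    (8888, "vim notes.txt"),                                    # unrelated process
]

PATTERN = "hermes_cli.main gateway"


def naive_scan(rows, pattern=PATTERN):
    """Return every PID whose command line contains the pattern."""
    return [pid for pid, cmd in rows if pattern in cmd]


matches = naive_scan(SAMPLE_PS_ROWS)
print(matches)  # the wrapper and the invoker match alongside the real gateway
```

Because the wrapper's command line embeds the same text, a pure substring match cannot tell the invoker tree apart from a running gateway.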

Reported by @yes999zc in #13242.

Fix

`_append_unique_pid` already filtered `os.getpid()` but not the ancestor chain. The repo already has an ancestor-detection helper `_is_pid_ancestor_of_current_process` used by `_request_gateway_self_restart`. Extend `_append_unique_pid` to reuse it — one gate, covers every call site (`find_gateway_pids`, service probes, `_scan_gateway_pids`).
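The single-gate idea can be sketched as follows. This is an illustration under stated assumptions, not the code in `hermes_cli/gateway.py`: the function signatures, the `parent_of` lookup, and the fake parent table are all hypothetical, and the real helper's way of reading parent PIDs is not shown here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical parent table: current process 5000 -> parent 4000 -> init (1).
FAKE_PARENTS: Dict[int, int] = {5000: 4000, 4000: 1}


def is_ancestor(pid: int, start: int, parent_of: Callable[[int], Optional[int]]) -> bool:
    """Walk upward from `start` and report whether `pid` appears in the parent chain."""
    seen = set()
    current: Optional[int] = start
    while current is not None and current > 1 and current not in seen:
        seen.add(current)  # guard against a cyclic or stale parent table
        parent = parent_of(current)
        if parent == pid:
            return True
        current = parent
    return False


def append_unique_pid(pids: List[int], pid: Optional[int], *, current_pid: int,
                      parent_of: Callable[[int], Optional[int]]) -> None:
    """Append `pid` unless it is invalid, a duplicate, the current process, or an ancestor."""
    if pid is None or pid <= 0:
        return
    if pid == current_pid:
        return  # the filter that, per the description, already existed
    if is_ancestor(pid, current_pid, parent_of):
        return  # the added gate: skip the invoker's parent chain
    if pid not in pids:
        pids.append(pid)


found: List[int] = []
for candidate in (5000, 4000, 7777, 7777):
    append_unique_pid(found, candidate, current_pid=5000, parent_of=FAKE_PARENTS.get)
print(found)
```

Putting the check inside the append helper means every caller inherits it without any change at the call sites.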

Pre-implement audit

  • A (existing helper): `_is_pid_ancestor_of_current_process` already defined at `hermes_cli/gateway.py:150`. Reuse rather than re-walking the parent chain. ✓
  • B (shared callers): `_append_unique_pid` is called from `find_gateway_pids` (service + PID-file + ps-scan paths). Adding an extra exclusion strictly narrows what's returned — existing callers that wanted to kill/probe gateway processes still get real gateway PIDs, they just stop seeing their own invoker tree. Contract preserved. ✓
  • C (broader rival): No rival fix exists for #13242 ("gateway run detects calling CLI process as running gateway instance (self false positive)"). ✓

Testing

  • New `test_append_unique_pid_excludes_current_process_ancestors` — monkeypatches the parent chain to return `5000 → 4000 → 1`, asserts that `_append_unique_pid` rejects both 5000 (current) and 4000 (parent), keeps only the unrelated 7777 gateway PID.
  • Touched the existing `test_find_gateway_pids_falls_back_to_pid_file_when_process_scan_fails` to stub `_is_pid_ancestor_of_current_process` (so the fallback path doesn't shell out to real `ps -o ppid=` for the fake service pid 321).
  • Full `test_gateway.py` + `test_gateway_service.py` suites pass (117/117).
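The stubbing approach in the second bullet might look roughly like the sketch below. It uses `unittest.mock` against a stand-in namespace rather than the real module, so `gateway`, `collect`, and the test name are hypothetical; only the helper name and the PIDs 321 and 4000 come from the description above.

```python
import types
from unittest import mock

# Stand-in for the real module; the actual test would patch hermes_cli.gateway instead.
gateway = types.SimpleNamespace()


def _unstubbed_ancestor_check(pid):
    # Represents the real helper, which would shell out to `ps -o ppid=`.
    raise RuntimeError("unstubbed helper would shell out")


gateway._is_pid_ancestor_of_current_process = _unstubbed_ancestor_check


def collect(candidates):
    """Hypothetical caller: keep unique PIDs that are not ancestors of this process."""
    kept = []
    for pid in candidates:
        if gateway._is_pid_ancestor_of_current_process(pid):
            continue
        if pid not in kept:
            kept.append(pid)
    return kept


def test_stubbed_helper_keeps_service_pid():
    # Treat 4000 as the only ancestor; the fake service PID 321 must survive.
    with mock.patch.object(
        gateway,
        "_is_pid_ancestor_of_current_process",
        side_effect=lambda pid: pid == 4000,
    ) as stub:
        assert collect([4000, 321, 321]) == [321]
    stub.assert_any_call(321)


test_stubbed_helper_keeps_service_pid()
```

With the stub in place, the test never touches the real process table, so fake PIDs cannot produce environment-dependent results.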

Fixes #13242

Reporter @yes999zc in NousResearch#13242: `hermes gateway run` from an interactive
CLI session got false-positive 'already running' and shut down
immediately. The `ps aux` scan in `_scan_gateway_pids` matches
generic substrings like 'hermes_cli.main gateway', which picks up the
invoking CLI process AND its ancestor chain (parent shell, tmux session,
launcher).

`_append_unique_pid` already filtered `os.getpid()` but not the
ancestor chain. The repo already has an ancestor-detection helper
(`_is_pid_ancestor_of_current_process`) used by
`_request_gateway_self_restart`; extend `_append_unique_pid` to
reuse it, so CLI-invoker self-matches are filtered at the same point
that covers `find_gateway_pids`, service probes, and scan results.

Fixes NousResearch#13242

hclsys commented Apr 21, 2026

Author

Closing as stale per personal cycle policy: 20h ceiling with zero maintainer movement (no comment, no review, no label). If this becomes relevant again I'll re-open with a fresh rebase.

@hclsys hclsys closed this Apr 21, 2026
@alt-glitch alt-glitch added the type/bug, comp/gateway, and comp/cli labels Apr 21, 2026

Labels

comp/cli: CLI entry point, hermes_cli/, setup wizard
comp/gateway: Gateway runner, session dispatch, delivery
type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

gateway run detects calling CLI process as running gateway instance (self false positive)

2 participants