fix: exclude ancestor PIDs from gateway process scan (#13242) #19146

Closed
cixuuz wants to merge 1 commit into NousResearch:main from cixuuz:fix/gateway-stop-profile-scoped
Conversation

@cixuuz cixuuz commented May 3, 2026

Contributor

Summary

Fixes #13242: _scan_gateway_pids() reports the calling CLI process as a running gateway (false positive).

Problem

_scan_gateway_pids() uses ps pattern matching to find running gateways. When invoked from the CLI (e.g. hermes gateway status), the calling process command line contains hermes gateway, matching the scan patterns. While _append_unique_pid() already excludes os.getpid(), it does not exclude parent/ancestor processes — so wrapper scripts, shell invocations, or nested process trees can still produce false positives.
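
To make the failure mode concrete, here is a minimal sketch of how substring matching over `ps` output can flag a wrapper process. The pattern list, the sample rows, and the helper name `looks_like_gateway` are hypothetical and chosen only for illustration; they are not taken from the Hermes source.

```python
from typing import List, Tuple

# Hypothetical pattern list; the real scan patterns may differ.
GATEWAY_PATTERNS = ("hermes gateway",)


def looks_like_gateway(cmdline: str) -> bool:
    """Return True if the command line contains any gateway pattern."""
    return any(pattern in cmdline for pattern in GATEWAY_PATTERNS)


# Made-up (pid, command line) rows standing in for `ps` output.
ps_rows: List[Tuple[int, str]] = [
    (1200, "python -m hermes gateway run"),              # real gateway
    (1300, "/bin/sh wrapper.sh hermes gateway status"),  # parent wrapper
    (1301, "python hermes gateway status"),              # the CLI itself
]

own_pid = 1301  # stands in for os.getpid()

# Excluding only the current PID still leaves the wrapper (1300) in the result.
matches = [pid for pid, cmd in ps_rows if looks_like_gateway(cmd) and pid != own_pid]
print(matches)
```

In this sketch, PID 1300 survives the filter even though it is only the shell that launched the CLI, which is the kind of false positive described above.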

Fix

  • Add _get_ancestor_pids() that walks the process tree from the current PID up to init (PID 1), capped at 64 iterations.
  • At the top of _scan_gateway_pids(), merge the ancestor set into exclude_pids so the entire chain is filtered out before any pattern matching.
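
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the parent lookup is injected as a callable so the walk can be exercised without real processes, PID 1 is included in the returned set, and `filter_candidates` is a hypothetical stand-in for the exclusion that happens inside `_scan_gateway_pids()`.

```python
from typing import Callable, Iterable, List, Optional, Set

MAX_DEPTH = 64  # iteration cap mentioned in the PR description


def get_ancestor_pids(
    start_pid: int,
    get_parent: Callable[[int], Optional[int]],
) -> Set[int]:
    """Collect ancestors of start_pid, walking up until PID 1 or the cap."""
    ancestors: Set[int] = set()
    seen = {start_pid}  # guards against cycles in a broken parent mapping
    pid = start_pid
    for _ in range(MAX_DEPTH):
        parent = get_parent(pid)
        if parent is None or parent <= 0 or parent in seen:
            break
        ancestors.add(parent)
        seen.add(parent)
        if parent == 1:  # reached init; nothing above it
            break
        pid = parent
    return ancestors


def filter_candidates(
    candidates: Iterable[int],
    exclude_pids: Iterable[int],
    ancestors: Iterable[int],
) -> List[int]:
    """Drop every PID that is explicitly excluded or is an ancestor."""
    excluded = set(exclude_pids) | set(ancestors)
    return [pid for pid in candidates if pid not in excluded]


# Fake process tree: 400 (CLI) -> 300 (wrapper) -> 200 (shell) -> 1 (init)
tree = {400: 300, 300: 200, 200: 1, 1: 0}
chain = get_ancestor_pids(400, tree.get)
print(sorted(chain))
print(filter_candidates([100, 200, 300, 400], {400}, chain))
```

Merging the ancestor set before any pattern matching means a wrapper or shell is never considered a candidate, regardless of what its command line contains.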

This is a hardening fix — the primary duplicate-instance guard in gateway/run.py already uses PID-file-based detection (get_running_pid()), so the self-detection issue mostly manifests in status/stop paths that fall back to process scanning.

Testing

  • Verified _get_parent_pid() correctly walks the chain on Linux (uses /proc/{pid}/status with ps -o ppid= fallback).
  • The existing _is_pid_ancestor_of_current_process() helper (used by _request_gateway_self_restart) validates the same walk logic.
  • No behavioral change for legitimate gateway PIDs — only the invoking CLI's process tree is excluded.
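
For readers unfamiliar with the lookup mentioned in the first bullet, a parent-PID helper along those lines might look like the sketch below. It assumes the `PPid:` field of `/proc/{pid}/status` on Linux and falls back to `ps -o ppid= -p <pid>`; the function names, the timeout, and the error handling are illustrative guesses and may differ from the actual `_get_parent_pid()`.

```python
import os
import subprocess
from typing import Optional


def parse_ppid_from_status(text: str) -> Optional[int]:
    """Extract the PPid field from the contents of /proc/<pid>/status."""
    for line in text.splitlines():
        if line.startswith("PPid:"):
            value = line.split(":", 1)[1].strip()
            return int(value) if value.isdigit() else None
    return None


def get_parent_pid(pid: int) -> Optional[int]:
    """Return the parent PID, or None if it cannot be determined."""
    # Preferred path on Linux: read the status file directly.
    try:
        with open(f"/proc/{pid}/status", encoding="utf-8", errors="replace") as fh:
            ppid = parse_ppid_from_status(fh.read())
        if ppid is not None:
            return ppid
    except OSError:
        pass  # no /proc (e.g. macOS) or the process is gone
    # Fallback: ask ps for the ppid column with an empty header.
    try:
        result = subprocess.run(
            ["ps", "-o", "ppid=", "-p", str(pid)],
            capture_output=True, text=True, timeout=5, check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    out = result.stdout.strip()
    return int(out) if out.isdigit() else None


if __name__ == "__main__":
    # For the current process this should agree with os.getppid().
    print(get_parent_pid(os.getpid()), os.getppid())
```

Keeping the parsing separate from the file and subprocess access makes the field extraction easy to check in isolation.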

_scan_gateway_pids() uses ps-based pattern matching to find running
gateways. When invoked from the CLI (e.g. `hermes gateway status`),
the calling process itself matches gateway patterns, causing false
positives — the CLI is mistakenly counted as a running gateway.

Add _get_ancestor_pids() that walks the process tree from the current
PID up to init (PID 1). Merge this set into exclude_pids at the top
of _scan_gateway_pids() so the entire ancestor chain is filtered out.

This complements the existing os.getpid() exclusion in
_append_unique_pid() by also covering parent/grandparent processes
(e.g. when hermes is invoked via a wrapper script or shell).

Closes NousResearch#13242

@alt-glitch added labels on May 3, 2026: type/bug (Something isn't working), comp/cli (CLI entry point, hermes_cli/, setup wizard), comp/gateway (Gateway runner, session dispatch, delivery), P2 (Medium: degraded but workaround exists)

@alt-glitch

Collaborator

Fixes #13242. Competing PRs #13273 and #14177 were previously closed — this approach (ancestor chain exclusion in _scan_gateway_pids) looks correct.

teknium1 commented May 4, 2026

Contributor

Salvaged via #19586 onto current main - your commit authorship was preserved. Thanks!

@teknium1 teknium1 closed this May 4, 2026
Development

Successfully merging this pull request may close these issues.

gateway run detects calling CLI process as running gateway instance (self false positive)

3 participants