fix(gateway): unify gateway status across CLI surfaces (salvaged from #11167) #11896

Merged: teknium1 merged 2 commits into main from hermes/hermes-d2ed496c on Apr 18, 2026
@teknium1

Contributor

hermes gateway status, hermes status, hermes dump, and the profile gateway-running check now agree on whether a gateway is live.

Root cause

Each surface had its own liveness check (a systemd/launchd service probe, a ps scan, or a bare os.kill probe), while hermes gateway run relied on get_running_pid() reading the profile-scoped PID file. With the service stopped but a manually started gateway process still running, status reported "not running" while hermes gateway run refused to start, because the PID file pointed to a live instance.
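The disagreement can be reproduced in miniature. The following is only an illustrative sketch, not the project's code: `pid_is_alive` and `pid_file_reports_running` are hypothetical helpers, the PID file is assumed to hold a bare integer, and the service state is simulated with a plain boolean. It assumes a POSIX host, where signal 0 probes a process without affecting it.

```python
import os
import tempfile
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Bare liveness probe (POSIX): signal 0 checks existence, delivers nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pid_file_reports_running(pid_path: Path) -> bool:
    """Hypothetical PID-file check: assumes the file holds a bare integer PID."""
    try:
        pid = int(pid_path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    return pid_is_alive(pid)


with tempfile.TemporaryDirectory() as tmp:
    pid_path = Path(tmp) / "gateway.pid"
    # Our own PID stands in for a manually started gateway process.
    pid_path.write_text(str(os.getpid()))

    service_active = False  # simulated: the systemd/launchd unit is stopped

    # Surface 1 consults only the service manager.
    status_view = "running" if service_active else "not running"
    # Surface 2 consults only the PID file.
    run_view = "refuse to start" if pid_file_reports_running(pid_path) else "start"

print(status_view, "/", run_view)
```

Both answers are locally correct, which is why the contradiction only goes away once every surface reads from one shared check.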

Changes

  • gateway/status.py: get_running_pid() / is_gateway_running() accept an explicit pid_path + cleanup_stale kwarg so other profiles' PID files aren't deleted by a check on the current profile.
  • hermes_cli/gateway.py: new GatewayRuntimeSnapshot dataclass + get_gateway_runtime_snapshot() as the single source of truth for gateway liveness. find_gateway_pids() now seeds from the profile PID file before scanning ps.
  • hermes gateway status: after systemd_status / launchd_status, prints a mismatch warning when the service is inactive but a gateway process is running for this profile.
  • hermes_cli/status.py and hermes_cli/dump.py: replace per-platform if-ladders with the shared snapshot.
  • hermes_cli/profiles.py: _check_gateway_running() delegates to get_running_pid() with cleanup_stale=False instead of a weaker custom probe.
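As a rough picture of the single-source-of-truth idea, a snapshot object might look like the sketch below. The real GatewayRuntimeSnapshot fields are not shown in this PR description, so the class name `RuntimeSnapshot`, its fields, and `status_lines` are all assumptions made for illustration, as is the warning wording.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeSnapshot:
    """Hypothetical stand-in for GatewayRuntimeSnapshot (field names assumed)."""

    service_manager: Optional[str]  # e.g. "systemd", "launchd", or None
    service_active: bool            # what the service manager reports
    gateway_pids: tuple[int, ...]   # live gateway processes for this profile

    @property
    def running(self) -> bool:
        # A gateway counts as live if either source sees it.
        return self.service_active or bool(self.gateway_pids)

    @property
    def mismatch(self) -> bool:
        # Service stopped, yet a gateway process exists for this profile.
        return not self.service_active and bool(self.gateway_pids)


def status_lines(snap: RuntimeSnapshot) -> list[str]:
    """Render the snapshot the way a status command might (wording assumed)."""
    manager = snap.service_manager or "no service manager"
    state = "active" if snap.service_active else "inactive"
    lines = [f"service ({manager}): {state}"]
    if snap.mismatch:
        pids = ", ".join(str(pid) for pid in snap.gateway_pids)
        lines.append(
            f"warning: service is inactive but a gateway process is running (PID {pids})"
        )
    return lines


snap = RuntimeSnapshot(service_manager="systemd", service_active=False, gateway_pids=(4242,))
for line in status_lines(snap):
    print(line)
```

Because every surface would format the same snapshot, a status command and a dump command cannot disagree about liveness; they can only differ in how they present it.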

Plus AUTHOR_MAP entry for Sara Reynolds.

Validation

| Scenario | Before | After |
| --- | --- | --- |
| service inactive + manual hermes gateway run | status says "not running"; run refuses to start | status reports service state + mismatch warning naming the live PID |
| profile gateway check | bare os.kill probe; matches any live PID | validates start_time + gateway-like cmdline |
| status output paths | 3 independent platform ladders | one GatewayRuntimeSnapshot |
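The "profile gateway check" comparison, together with the pid_path and cleanup_stale kwargs, can be sketched as follows. This is not the signature or file format of the real get_running_pid(): the JSON PID record, the `ProcInfo` type, the injected `inspect` callable, and the one-second start-time tolerance are all assumptions chosen to keep the example self-contained.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass(frozen=True)
class ProcInfo:
    start_time: float
    cmdline: str


# Hypothetical inspector: returns None when no process has that PID.
Inspector = Callable[[int], Optional[ProcInfo]]


def read_running_pid(
    pid_path: Path, inspect: Inspector, cleanup_stale: bool = True
) -> Optional[int]:
    """Return the recorded PID only if it still looks like the same gateway."""
    try:
        record = json.loads(pid_path.read_text())
        pid = int(record["pid"])
        recorded_start = float(record["start_time"])
    except (OSError, ValueError, KeyError, TypeError):
        return None  # missing or malformed record: nothing to validate

    info = inspect(pid)
    live = (
        info is not None
        and abs(info.start_time - recorded_start) < 1.0  # same process, not a recycled PID
        and "gateway" in info.cmdline                    # looks like a gateway command line
    )
    if live:
        return pid
    if cleanup_stale:
        pid_path.unlink(missing_ok=True)  # only the owning profile should sweep
    return None


def recycled(pid: int) -> Optional[ProcInfo]:
    # The PID is alive, but it now belongs to an unrelated, later-started process.
    return ProcInfo(start_time=900.0, cmdline="vim notes.txt")


def genuine(pid: int) -> Optional[ProcInfo]:
    return ProcInfo(start_time=100.2, cmdline="hermes gateway run")


with tempfile.TemporaryDirectory() as tmp:
    pid_path = Path(tmp) / "gateway.pid"
    pid_path.write_text(json.dumps({"pid": 4242, "start_time": 100.0}))

    found = read_running_pid(pid_path, genuine)
    peeked = read_running_pid(pid_path, recycled, cleanup_stale=False)
    kept_after_peek = pid_path.exists()
    swept = read_running_pid(pid_path, recycled, cleanup_stale=True)
    removed_after_sweep = not pid_path.exists()
```

A bare os.kill probe would have accepted the recycled PID, since some process does answer to it. Passing cleanup_stale=False lets a check made on behalf of another profile report "not running" without deleting that profile's PID file.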

Targeted tests: 209 passed (gateway/test_status, test_gateway*, test_profiles, test_status, test_runtime_health, test_container_aware_cli).

Live E2E: hermes gateway status and hermes status exercised against the isolated HERMES_HOME on a host with an active systemd gateway — both correctly read the live service state via the new snapshot path. Mismatch warning verified via direct function call with simulated service-inactive + manual-process state.

Credit

Original PR: #11167 by @snreynolds — cherry-picked with authorship preserved. Supersedes #9559 (@LevSky22, narrower docker-focused fix) and #11445 (@Hyena0x, narrower macOS-focused fix).
