fix(gateway): prefer pid file for manual status#9559

Closed
LevSky22 wants to merge 1 commit into NousResearch:main from LevSky22:fix/gateway-status-docker

@LevSky22

What does this PR do?

Fixes a remaining Docker false negative in hermes gateway status when the gateway is running as the container foreground process.

After #7032, the packaged image includes procps, so ps is available. But the manual gateway status path in hermes_cli/gateway.py still relies on find_gateway_pids(), which shells out to ps eww -ax -o pid=,command=. In the Docker environment I tested, that probe can still fail to find the gateway even though it is running as PID 1 and the runtime state files are correct.

This PR makes hermes gateway status prefer the existing PID-file based helper from gateway.status before falling back to raw process scanning. That is the right approach because Hermes already persists authoritative gateway runtime metadata in gateway.pid / gateway_state.json, and that path is more robust in container foreground mode than shelling out to ps.
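A minimal sketch of the lookup order described above, not the actual Hermes code. It assumes gateway.pid holds a bare integer and that a POSIX signal-0 probe is an acceptable liveness check. The helper names read_pid_file, pid_is_alive, and resolve_gateway_pids are hypothetical, and the process scanner is passed in as a callable standing in for find_gateway_pids().

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def pid_is_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without delivering a signal."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def read_pid_file(pid_path: Path) -> Optional[int]:
    """Return the PID stored in pid_path if it parses and is alive, else None.

    Assumption: the file contains a bare integer. The real gateway.pid
    format may differ.
    """
    try:
        pid = int(pid_path.read_text().strip())
    except (OSError, ValueError):
        return None
    return pid if pid_is_alive(pid) else None


def resolve_gateway_pids(
    pid_path: Path, scan_processes: Callable[[], List[int]]
) -> List[int]:
    """Prefer the PID file; fall back to process scanning only if it yields nothing."""
    pid = read_pid_file(pid_path)
    if pid is not None:
        return [pid]
    return scan_processes()


if __name__ == "__main__":
    # Demo: record this process's own PID, then resolve it with a scanner
    # that finds nothing (simulating an unreliable ps probe).
    with tempfile.TemporaryDirectory() as tmp:
        pid_file = Path(tmp) / "gateway.pid"
        pid_file.write_text(str(os.getpid()))
        print(resolve_gateway_pids(pid_file, lambda: []))
```

The scanner is only consulted when the PID file is missing, unparsable, or stale, so a failing ps probe cannot override valid runtime metadata.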

Related Issue

Related to:

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • Updated hermes_cli/gateway.py so manual hermes gateway status prefers gateway.status.get_running_pid() before find_gateway_pids()
  • Added a regression test in tests/hermes_cli/test_gateway_service.py covering the manual gateway status path when PID-file metadata is valid but process scanning is unreliable
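The shape of such a regression test might look like the sketch below. It does not reproduce the test added in tests/hermes_cli/test_gateway_service.py. The function manual_status_line is a hypothetical stand-in for the manual status path, and the two Mock objects stand in for gateway.status.get_running_pid() and find_gateway_pids(), whose real signatures are not shown in this PR description.

```python
from typing import Callable, List, Optional
from unittest.mock import Mock


def manual_status_line(
    get_running_pid: Callable[[], Optional[int]],
    find_gateway_pids: Callable[[], List[int]],
) -> str:
    """Hypothetical stand-in for the manual status path: PID file first, scan second."""
    pid = get_running_pid()
    pids = [pid] if pid is not None else find_gateway_pids()
    if pids:
        return f"✓ Gateway is running (PID: {', '.join(str(p) for p in pids)})"
    return "✗ Gateway is not running"


def test_status_prefers_pid_file_when_scan_is_unreliable():
    # PID-file metadata is valid, while the process scan would find nothing.
    get_running_pid = Mock(return_value=1)
    find_gateway_pids = Mock(return_value=[])

    line = manual_status_line(get_running_pid, find_gateway_pids)

    assert line == "✓ Gateway is running (PID: 1)"
    # The unreliable scan must not be consulted at all.
    find_gateway_pids.assert_not_called()


def test_status_falls_back_to_scan_without_pid_file():
    # No usable PID file: the scan result decides the outcome.
    get_running_pid = Mock(return_value=None)
    find_gateway_pids = Mock(return_value=[])

    line = manual_status_line(get_running_pid, find_gateway_pids)

    assert line == "✗ Gateway is not running"
    find_gateway_pids.assert_called_once()
```

Asserting that the scanner is never called pins down the ordering itself, not just the final message, which is what guards against the false negative returning.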

How to Test

  1. Build and run Hermes in Docker with hermes gateway run as the main container process
  2. Confirm the gateway is actually running via gateway.pid / gateway_state.json
  3. Run hermes gateway status

Before this change:

  • hermes gateway status can incorrectly report ✗ Gateway is not running

After this change:

  • hermes gateway status reports ✓ Gateway is running (PID: 1)

Automated validation performed:

  • pytest -o addopts="" tests/test_hermes_constants.py tests/hermes_cli/test_gateway_service.py tests/hermes_cli/test_gateway_runtime_health.py tests/hermes_cli/test_container_aware_cli.py tests/hermes_cli/test_status.py tests/gateway/test_status.py -k "not test_system_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout and not test_supports_systemd_services_returns_true_when_systemctl_present and not test_system_unit_includes_local_bin_in_path"
  • Result: 110 passed, 3 deselected

The three deselected tests are systemd/root-oriented service-install cases that are not relevant to the Docker foreground gateway status path this PR changes.

Manual validation performed:

  • Built and ran the patched image in Docker Compose
  • Confirmed hermes gateway status changed from a false negative to:
    • ✓ Gateway is running (PID: 1)
    • (Running manually, not as a system service)

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Ubuntu 24.04 / Docker Compose

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Before:

✗ Gateway is not running

After:

✓ Gateway is running (PID: 1)
  (Running manually, not as a system service)

@teknium1

Contributor

Superseded by #11896 (which salvaged @snreynolds's broader #11167). Your PR correctly identified the root cause: status was shelling out to ps when authoritative runtime metadata already lives in gateway.pid. The fix that landed covers the Docker foreground case you were targeting. Thanks @LevSky22!

@teknium1 closed this on Apr 18, 2026