fix(gateway): prefer pid file for manual status#9559
Closed
LevSky22 wants to merge 1 commit into
Closed
Conversation
Contributor
|
Superseded by #11896 (which salvaged @snreynolds's broader #11167). Your PR correctly identified the root cause — status was shelling out to ps when authoritative runtime metadata already lives in gateway.pid — and the fix landed covers the Docker foreground case you were targeting. Thanks @LevSky22! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Fixes a remaining Docker false negative in
hermes gateway statuswhen the gateway is running as the container foreground process.After
#7032, the packaged image includesprocps, sopsis available. But the manual gateway status path inhermes_cli/gateway.pystill relies onfind_gateway_pids(), which shells out tops eww -ax -o pid=,command=. In the Docker environment I tested, that probe can still fail even though the gateway is actually running as PID 1 and the runtime state files are correct.This PR makes
hermes gateway statusprefer the existing PID-file based helper fromgateway.statusbefore falling back to raw process scanning. That is the right approach because Hermes already persists authoritative gateway runtime metadata ingateway.pid/gateway_state.json, and that path is more robust in container foreground mode than shelling out tops.Related Issue
Related to:
Type of Change
Changes Made
hermes_cli/gateway.pyso manualhermes gateway statusprefersgateway.status.get_running_pid()beforefind_gateway_pids()tests/hermes_cli/test_gateway_service.pycovering the manual gateway status path when PID-file metadata is valid but process scanning is unreliableHow to Test
hermes gateway runas the main container processgateway.pid/gateway_state.jsonhermes gateway statusBefore this change:
hermes gateway statuscan incorrectly report✗ Gateway is not runningAfter this change:
hermes gateway statusreports✓ Gateway is running (PID: 1)Automated validation performed:
pytest -o addopts="" tests/test_hermes_constants.py tests/hermes_cli/test_gateway_service.py tests/hermes_cli/test_gateway_runtime_health.py tests/hermes_cli/test_container_aware_cli.py tests/hermes_cli/test_status.py tests/gateway/test_status.py -k "not test_system_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout and not test_supports_systemd_services_returns_true_when_systemctl_present and not test_system_unit_includes_local_bin_in_path"110 passed, 3 deselectedThe three deselected tests are systemd/root-oriented service-install cases that are not relevant to the Docker foreground
gateway statuspath this PR changes.Manual validation performed:
hermes gateway statuschanged from a false negative to:✓ Gateway is running (PID: 1)(Running manually, not as a system service)Checklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/AScreenshots / Logs
Before:
After: