[codex] Fix gateway update restart race #13713
Closed
tommy29tmar wants to merge 1 commit into
This was referenced Apr 22, 2026
Closing as superseded by #14200. Triage notes (medium confidence): Thanks for the contribution. The underlying problem this PR addresses has been resolved by the linked PR on current main. If you believe this was closed in error, please comment and we'll reopen. (Bulk-closed during a CLI PR triage sweep.)
Summary
This fixes a gateway restart race seen during `hermes update` when multiple `hermes-gateway*` systemd units are active. The change is not related to STT, Whisper, CUDA, or any local audio configuration.
Root Cause
`get_running_pid()` already determines when a PID file is stale, but cleanup for the current `HERMES_HOME` path delegated to `remove_pid_file()`. That helper intentionally refuses to delete a PID file owned by another process. For a stale PID record from a previous gateway process, this left `gateway.pid` behind and caused the next systemd start to fail repeatedly with `PID file race lost to another gateway instance`.
Separately, `hermes update` discovered active `hermes-gateway*` units in systemctl output order. If a profile gateway restarted before the default gateway, the default profile could hit stale PID state while another profile was already running.
Changes
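To make the stale PID behavior described under Root Cause concrete, here is a minimal sketch of the difference between a guarded removal and a stale-record cleanup. It is not the code from this PR: `pid_is_alive`, `guarded_remove`, and `running_pid_or_cleanup` are hypothetical names, the PID file is assumed to hold a bare integer, and the liveness check assumes a POSIX system.

```python
import os
from pathlib import Path
from typing import Optional


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def guarded_remove(pid_file: Path) -> bool:
    """Remove the PID file only if it records the current process."""
    try:
        recorded = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False
    if recorded != os.getpid():
        return False  # owned by another process: refuse to delete
    pid_file.unlink(missing_ok=True)
    return True


def running_pid_or_cleanup(pid_file: Path) -> Optional[int]:
    """Return the live PID recorded in pid_file, deleting stale records."""
    try:
        recorded = int(pid_file.read_text().strip())
    except FileNotFoundError:
        return None
    except ValueError:
        recorded = -1  # unparseable content is treated as stale
    if pid_is_alive(recorded):
        return recorded
    # The recorded process is gone, so the ownership guard must not apply here;
    # routing this case through guarded_remove() would leave the file behind.
    pid_file.unlink(missing_ok=True)
    return None
```

The point of the sketch is that a stale record always names some other (dead) process, so a helper that refuses to touch files owned by another PID can never clean it up.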
`hermes-gateway` restarts before profile units such as `hermes-gateway-scout`.
Validation
PYTHONPATH=/tmp/hermes-agent-upstream-pr /home/tommaso/.hermes/hermes-agent/venv/bin/python -m pytest /tmp/hermes-agent-upstream-pr/tests/gateway/test_status.py /tmp/hermes-agent-upstream-pr/tests/hermes_cli/test_update_gateway_restart.py -q
68 passed in 3.50s
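The restart ordering described under Changes could be expressed as a sort key that places the default unit first. This is a hedged sketch rather than the PR's implementation: `order_units_for_restart` and `_base_name` are hypothetical names, and it assumes unit names may or may not carry a `.service` suffix.

```python
from typing import Iterable, List

DEFAULT_UNIT = "hermes-gateway"  # default gateway unit name, as given in the PR text


def _base_name(unit: str) -> str:
    """Strip a trailing '.service' suffix, if present."""
    suffix = ".service"
    return unit[: -len(suffix)] if unit.endswith(suffix) else unit


def order_units_for_restart(units: Iterable[str]) -> List[str]:
    """Put the default gateway unit first, then profile units sorted by name."""
    return sorted(
        units,
        key=lambda u: (_base_name(u) != DEFAULT_UNIT, _base_name(u)),
    )
```

Sorting on a `(is_profile, name)` tuple makes the result independent of whatever order the units were discovered in, which is the property the ordering change relies on.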