fix(gateway): explain stale runtime lock failures (#28561) #28603
Open
wesleysimplicio wants to merge 1 commit into
What does this PR do?
This PR improves gateway startup diagnostics around runtime-lock acquisition.
When Hermes cannot acquire `gateway.lock` and also cannot identify a live gateway PID, it now reports the failure as a likely stale-lock case and points the user to the supported recovery path (`hermes gateway run --replace`) instead of implying that another live gateway instance definitely exists.
Root cause
The detailed rationale from the original PR body is preserved below. This template update keeps the review structure consistent with #29640.
Fix
- `gateway.lock` and `gateway.pid` paths in the error so users can inspect the runtime state quickly.
- `--replace` path instead of force-killing or auto-removing lock files.
Why this shape
This shape mirrors #29640 so reviewers can quickly compare scope, root cause, fix, tests, and related context without having to decode a custom PR description.
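For reviewers who want a concrete picture of the diagnostic described under "What does this PR do?" and "Fix", here is a minimal sketch of how such a message could be assembled. It is not the code in this PR: the function name `describe_lock_failure`, its parameters, and the exact message wording are hypothetical. Only the file names `gateway.lock` and `gateway.pid` and the command `hermes gateway run --replace` come from the description above.

```python
from pathlib import Path
from typing import Optional


def describe_lock_failure(
    lock_path: Path, pid_path: Path, live_pid: Optional[int]
) -> str:
    """Build a startup error message for a failed runtime-lock acquisition.

    Hypothetical helper for illustration only. `live_pid` is the PID of a
    gateway process known to be running, or None when none could be found.
    """
    if live_pid is not None:
        # A live gateway was identified, so the lock is genuinely held.
        return (
            f"Another gateway instance appears to be running (PID {live_pid}). "
            f"Lock file: {lock_path}"
        )

    # No live PID could be identified: treat the lock as likely stale and
    # point at the supported recovery path rather than deleting files.
    return (
        "Could not acquire the gateway runtime lock, and no live gateway PID "
        "was found. The lock is likely stale. "
        f"Lock file: {lock_path}; PID file: {pid_path}. "
        "To recover, run: hermes gateway run --replace"
    )


if __name__ == "__main__":
    print(describe_lock_failure(Path("gateway.lock"), Path("gateway.pid"), None))
```

The sketch deliberately takes the liveness result as an argument instead of probing the process itself, since how a live PID is detected is platform-specific and is not specified in this description.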
Tests
The original PR body below contains the previous validation notes, commands, or test plan.
No code changes are introduced by this formatting update itself.
Related PRs / issues
Fixes #28561
Original body
What does this PR do?
This PR improves gateway startup diagnostics around runtime-lock acquisition.
When Hermes cannot acquire `gateway.lock` and also cannot identify a live gateway PID, it now reports the failure as a likely stale-lock case and points the user to the supported recovery path (`hermes gateway run --replace`) instead of implying that another live gateway instance definitely exists.
Solution Sketch
- `gateway.lock` and `gateway.pid` paths in the error so users can inspect the runtime state quickly.
- `--replace` path instead of force-killing or auto-removing lock files.
Related Issue
Fixes #28561
Related / Overlap Check
- fix(gateway): auto-recover stale runtime lock after Windows sleep/wake).
- `--replace` recovery path.
Type of Change
Changes Made
- `gateway.lock` and `gateway.pid` paths plus the supported recovery path
How to Test
`python -m pytest tests/gateway/test_runner_startup_failures.py::test_start_gateway_reports_stale_runtime_lock_guidance tests/gateway/test_runner_startup_failures.py::test_start_gateway_replace_clears_marker_on_permission_denied tests/gateway/test_runner_startup_failures.py::test_start_gateway_verbosity_imports_redacting_formatter -q -n 4`.
Checklist
Code
- `fix(scope):`, `feat(scope):`, etc.)
- `pytest tests/ -q` and all tests pass
Documentation & Housekeeping
- `docs/`, docstrings), or N/A
- `cli-config.yaml.example` if I added/changed config keys, or N/A
- `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows, or N/A
Screenshots / Logs
- `python -m pytest tests/gateway/test_runner_startup_failures.py::test_start_gateway_reports_stale_runtime_lock_guidance tests/gateway/test_runner_startup_failures.py::test_start_gateway_replace_clears_marker_on_permission_denied tests/gateway/test_runner_startup_failures.py::test_start_gateway_verbosity_imports_redacting_formatter -q -n 4`: `3 passed in 15.33s` on Windows.
- `test` workflow is currently failing in unrelated gateway/kanban/tools tests; the touched focused startup tests are listed above.
Generated by Hermes Turbo