fix(gateway): auto-recover stale runtime lock after Windows sleep/wake #22483
Open
wshshz wants to merge 1 commit into
When Windows enters sleep mode, the gateway process is suspended but its byte-range lock (gateway.lock) persists. After wake-up, the old process may be alive but its connections are dead. The lock is mandatory on Windows (msvcrt.LK_NBLCK), so a new instance cannot start, hitting "Gateway runtime lock is already held by another instance. Exiting."

Changes in gateway/status.py:
- Add Windows-compatible process detection via kernel32.OpenProcess + wmic fallback (the old code relied on /proc/pid/cmdline, which does not exist on Windows)
- Add force_recover_stale_gateway_lock() with three-tier detection:
  1. PID lookup (process dead → lock is stale)
  2. Gateway process identity check (not a gateway → lock is stale)
  3. Heartbeat staleness (last heartbeat > 10 min → sleep/wake)
- When any tier flags staleness, force-kill the old process and clean up the lock file so the new instance can proceed

Changes in gateway/run.py:
- On lock acquisition failure, call force_recover_stale_gateway_lock() before giving up
- Add a periodic heartbeat thread (120 s) that updates the runtime status timestamp so the staleness detection is reliable

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
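For reviewers, the three-tier check described above can be read as a short decision function. The following is only a sketch of that ordering, not the code in this PR: `LockRecord`, `stale_reason`, and the injected `pid_alive` / `is_gateway` probes are hypothetical names chosen for illustration, and the real `force_recover_stale_gateway_lock()` may be structured differently. The probes are passed in as callables so the logic can be exercised without touching `kernel32` or `wmic`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# "last heartbeat > 10 min" from the PR description.
HEARTBEAT_STALE_SECONDS = 10 * 60


@dataclass
class LockRecord:
    """Hypothetical view of what the runtime status stores about the lock holder."""
    pid: int
    last_heartbeat: float  # epoch seconds of the most recent heartbeat


def stale_reason(
    record: LockRecord,
    pid_alive: Callable[[int], bool],
    is_gateway: Callable[[int], bool],
    now: Optional[float] = None,
) -> Optional[str]:
    """Return why the lock looks stale, or None if the holder seems healthy.

    The tiers are checked in the order listed in the description; the first
    tier that flags staleness wins.
    """
    now = time.time() if now is None else now

    # Tier 1: the recorded PID no longer exists.
    if not pid_alive(record.pid):
        return "process-dead"

    # Tier 2: the PID exists but belongs to some other program (PID reuse).
    if not is_gateway(record.pid):
        return "not-a-gateway"

    # Tier 3: the process is a gateway but has not updated its heartbeat,
    # which is the sleep/wake case.
    if now - record.last_heartbeat > HEARTBEAT_STALE_SECONDS:
        return "heartbeat-stale"

    return None
```

With a 120 s heartbeat, the 10-minute threshold corresponds to five heartbeat intervals, so a healthy gateway would have to miss several consecutive updates before tier 3 fires.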