fix(gateway): catch OSError from os.kill on Windows for stale PID detection #14364
Bartok9 wants to merge 1 commit
On Windows, os.kill(pid, 0) raises OSError([WinError 11]) instead of ProcessLookupError when the target process no longer exists. The current handler only catches ProcessLookupError and PermissionError, so the OSError propagates unhandled and the gateway refuses to start until the stale PID file is manually deleted.

Fix: add OSError to the exception tuple. This matches the pattern already used at other os.kill call sites in the same file (lines 522, 633).

Adds regression tests covering the Windows OSError path, the existing ProcessLookupError path, and a missing-PID-file baseline.

Fixes NousResearch#14359
Related: #14388 by @eLeanwang addresses a complementary stale-PID scenario.
Thanks for the fix, @Bartok9! This is an automated hermes-sweeper review. The functional change in this PR is already on
The one thing not yet on
Problem
Fixes #14359.
On Windows, `os.kill(existing_pid, 0)` raises `OSError([WinError 11])` (and other `OSError` variants) instead of `ProcessLookupError` when the process no longer exists. The current handler only catches `ProcessLookupError` and `PermissionError`.

This causes every gateway restart on Windows after a non-graceful shutdown (terminal close, crash, Ctrl+C) to fail with an unhandled `OSError`, requiring manual deletion of the PID file each time.

Root Cause
Windows does not raise `ProcessLookupError` (`ESRCH`) for dead PIDs in `os.kill`. It raises a platform-specific `OSError`. This is a known Python/Windows platform difference.

Fix
Add `OSError` to the exception tuple at the one affected call site.

This matches the pattern already used at two other `os.kill` call sites in the same file (lines 522 and 633 in the current main), which already include `OSError`.

Changes
- `gateway/status.py`: add `OSError` to exception tuple at line 499
- `tests/test_gateway_status_pid_check.py`

Tests
New test file `tests/test_gateway_status_pid_check.py` covers:
- `os.kill` raises `OSError(11, ...)` → treated as stale (the regression)
- `os.kill` raises `ProcessLookupError` → treated as stale (non-regression)
- `PermissionError` path: no unhandled exception (non-regression)
- `None` returned (baseline)
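
The scenarios listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of the actual test file: `pid_looks_stale` and `simulate` are hypothetical names standing in for the handler in `gateway/status.py` and its tests, and `os.kill` is replaced with a mock so no real signal is ever sent. The sketch assumes that every exception in the tuple leads to the PID being treated as stale; the real handler may treat `PermissionError` differently.

```python
import os
from unittest import mock


def pid_looks_stale(pid: int) -> bool:
    """Hypothetical stand-in for the PID check; not the real gateway code."""
    try:
        os.kill(pid, 0)
    except (ProcessLookupError, PermissionError, OSError):
        # Assumption: all three outcomes mean "treat the PID file as stale".
        # OSError is the parent class of the other two in Python 3, so it
        # alone would already catch them; listing them keeps intent explicit.
        return True
    return False


def simulate(side_effect=None) -> bool:
    """Run the check with os.kill mocked out, so no process is signalled."""
    with mock.patch("os.kill", side_effect=side_effect):
        return pid_looks_stale(12345)  # arbitrary PID, never really signalled


if __name__ == "__main__":
    # Regression path: a generic OSError, as described for Windows.
    print(simulate(OSError(11, "simulated Windows failure")))
    # Existing path: the POSIX-style "no such process" error.
    print(simulate(ProcessLookupError()))
    # Baseline: os.kill succeeds, so the PID is not considered stale.
    print(simulate(None))
```

Run directly, the three calls print `True`, `True`, and `False`: the first two exercise the exception paths, and the last one the case where the mocked `os.kill` returns without raising.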