Environment
- Hermes Agent v0.10.0 (2026.4.16)
- OS: Windows 11
- Python: 3.11.15
Summary
On Windows, `os.kill(pid, 0)` raises `OSError: [WinError 87] ERROR_INVALID_PARAMETER` for non-existent PIDs, not `ProcessLookupError` as on POSIX. Several liveness checks only catch `(ProcessLookupError, PermissionError)`, so a stale `gateway.pid` file from a previous run makes `hermes gateway run` crash before the gateway can start.
Reproduce
- On Windows, run `hermes gateway run`, then kill the process (or reboot) so that `%LOCALAPPDATA%\hermes\gateway.pid` is left behind referencing a PID that no longer exists.
- Run `hermes gateway run` again.
Actual
```
File "...\gateway\status.py", line 578, in get_running_pid
os.kill(pid, 0)
OSError: [WinError 87] The parameter is incorrect.
```
Additional identical failures surface downstream once the first one is fixed:
- `gateway/status.py:343` in `acquire_scoped_lock` — breaks Telegram connect (`[Telegram] Failed to connect to Telegram: [WinError 87]`)
- `tools/process_registry.py:258` in `_is_host_pid_alive` — logged as `"Process checkpoint recovery: [WinError 87]"`
- `gateway/run.py:10318` — `--replace` wait loop would hang/crash on a dead PID
Expected
Stale/non-existent PIDs on Windows are treated as "process not found", the stale pid/lock files are cleaned up, and the gateway starts normally.
Root cause
Python docs note that on Windows `os.kill` reports invalid PIDs via a plain `OSError` (errno 22 / WinError 87), not `ProcessLookupError`. The existing handlers only cover the POSIX exceptions.
Suggested fix
Add `except OSError` (treat as "process gone" + cleanup) alongside the existing `except (ProcessLookupError, PermissionError)` at:
- `gateway/status.py:578` (`get_running_pid`)
- `gateway/status.py:343` (`acquire_scoped_lock`)
- `gateway/run.py:10318` (replace-wait loop)
- `tools/process_registry.py:258` (`_is_host_pid_alive`)
Each site is a one-line addition mirroring the existing cleanup branch. Verified locally on Windows 11 — gateway starts cleanly after applying these four changes. Happy to open a PR if useful.
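For illustration, the handler ordering could look roughly like the sketch below. `pid_is_alive` and its injectable `kill` parameter are hypothetical names used only to keep the example testable on any platform. Treating `PermissionError` as "alive" is an assumption here; the existing cleanup branch in the project may handle it differently.

```python
import os


def pid_is_alive(pid: int, kill=os.kill) -> bool:
    """Best-effort liveness probe for `pid` (hypothetical helper).

    `kill` defaults to os.kill and is injectable only so the logic can
    be exercised without real processes.
    """
    try:
        kill(pid, 0)
    except ProcessLookupError:
        # POSIX: no such process.
        return False
    except PermissionError:
        # Assumption: the process exists but is owned by another user.
        return True
    except OSError:
        # Windows: a non-existent PID surfaces as a plain OSError
        # (WinError 87), so treat it as "process gone".
        return False
    return True
```

Because `ProcessLookupError` and `PermissionError` are both subclasses of `OSError`, the generic `except OSError` clause has to come after them, otherwise it would shadow the more specific branches.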