[BUG]: Windows: os.kill(pid, 0) raises OSError [WinError 87] and crashes gateway startup #12359

@yushuosun

Environment

  • Hermes Agent v0.10.0 (2026.4.16)
  • OS: Windows 11
  • Python: 3.11.15

Summary

On Windows, `os.kill(pid, 0)` raises `OSError: [WinError 87]` (`ERROR_INVALID_PARAMETER`) for non-existent PIDs, not `ProcessLookupError` as on POSIX. Several liveness checks catch only `(ProcessLookupError, PermissionError)`, so a stale `gateway.pid` file left by a previous run makes `hermes gateway run` crash before the gateway can start.

Reproduce

  1. On Windows, run `hermes gateway run`, then kill the process (or reboot) so that `%LOCALAPPDATA%\hermes\gateway.pid` is left behind, referencing a PID that no longer exists.
  2. Run `hermes gateway run` again.

Actual

```
  File "...\gateway\status.py", line 578, in get_running_pid
    os.kill(pid, 0)
OSError: [WinError 87] The parameter is incorrect.
```

Additional identical failures surface downstream once the first one is fixed:

  • `gateway/status.py:343` in `acquire_scoped_lock` — breaks Telegram connect (`[Telegram] Failed to connect to Telegram: [WinError 87]`)
  • `tools/process_registry.py:258` in `_is_host_pid_alive` — logged as `"Process checkpoint recovery: [WinError 87]"`
  • `gateway/run.py:10318` — `--replace` wait loop would hang/crash on a dead PID
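
For the `--replace` case, a bounded wait loop avoids hanging once the liveness probe can report a dead PID. This is only a sketch: the real loop in `gateway/run.py` is not shown in this report, and `wait_for_exit`, `is_alive`, `timeout`, and `poll_interval` are hypothetical names.

```python
import time
from typing import Callable


def wait_for_exit(
    pid: int,
    is_alive: Callable[[int], bool],
    timeout: float = 10.0,
    poll_interval: float = 0.2,
) -> bool:
    """Poll until `pid` is gone; return False if it is still alive at the deadline.

    `is_alive` is a hypothetical probe that must return False (not raise)
    for a PID that no longer exists, on every platform.
    """
    deadline = time.monotonic() + timeout
    while is_alive(pid):
        if time.monotonic() >= deadline:
            return False  # old process did not exit in time
        time.sleep(poll_interval)
    return True
```

With a probe that raises on a dead PID, the loop crashes on the first iteration; with one that returns `False`, it exits immediately.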

Expected

Stale/non-existent PIDs on Windows are treated as "process not found", the stale pid/lock files are cleaned up, and the gateway starts normally.
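
A minimal sketch of that expected behavior, assuming the pid file holds a single integer (the actual `gateway.pid` format is not shown in this report). `read_live_pid` and `is_alive` are hypothetical names, not the project's API.

```python
from pathlib import Path
from typing import Callable, Optional


def read_live_pid(pid_file: Path, is_alive: Callable[[int], bool]) -> Optional[int]:
    """Return the PID recorded in `pid_file` if it is still alive, else None.

    A missing, unparsable, or stale file means "no gateway running".
    Unparsable and stale files are deleted so the next start is clean.
    """
    try:
        pid = int(pid_file.read_text().strip())
    except FileNotFoundError:
        return None
    except ValueError:
        pid_file.unlink(missing_ok=True)  # corrupt contents: clean up
        return None
    if is_alive(pid):
        return pid
    pid_file.unlink(missing_ok=True)  # stale PID: clean up and report "not running"
    return None
```

The key point is that `is_alive` returns `False` for a dead PID on every platform instead of letting an exception escape.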

Root cause

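Python docs note that on Windows `os.kill` signals invalid PIDs via `OSError` (errno 22 / WinError 87), not `ProcessLookupError`. Errno 22 (`EINVAL`) has no dedicated `OSError` subclass, so the exception falls through the existing handlers, which catch only the POSIX-style `ProcessLookupError` and `PermissionError`.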
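
The mismatch can be illustrated on any platform by constructing the exceptions directly. This is not project code; it relies only on how Python maps errno values to `OSError` subclasses, and the helper name `caught_by_posix_handlers` is hypothetical.

```python
import errno


def caught_by_posix_handlers(exc: OSError) -> bool:
    """Return True if the handler tuple described in this issue would catch `exc`."""
    try:
        raise exc
    except (ProcessLookupError, PermissionError):
        return True
    except OSError:
        return False


# POSIX: a missing PID yields errno ESRCH, which Python maps to ProcessLookupError.
posix_missing = OSError(errno.ESRCH, "No such process")

# Windows (per this report): a missing PID yields errno 22 (EINVAL), which has
# no dedicated subclass and therefore stays a plain OSError.
windows_missing = OSError(errno.EINVAL, "Invalid argument")

print(type(posix_missing).__name__, caught_by_posix_handlers(posix_missing))
print(type(windows_missing).__name__, caught_by_posix_handlers(windows_missing))
```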

Suggested fix

Add `except OSError` (treat as "process gone" + cleanup) alongside the existing `except (ProcessLookupError, PermissionError)` at:

  • `gateway/status.py:578` (`get_running_pid`)
  • `gateway/status.py:343` (`acquire_scoped_lock`)
  • `gateway/run.py:10318` (replace-wait loop)
  • `tools/process_registry.py:258` (`_is_host_pid_alive`)

Each site is a one-line addition mirroring the existing cleanup branch. Verified locally on Windows 11 — gateway starts cleanly after applying these four changes. Happy to open a PR if useful.
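
A rough sketch of what the handler ordering could look like at one call site. Assumptions: `is_pid_alive` and its injectable `kill` parameter are hypothetical (the parameter exists only so the branches can be exercised without touching real processes), and treating `PermissionError` as "alive" is the usual POSIX convention, not confirmed from the project's code.

```python
import os
from typing import Callable


def is_pid_alive(pid: int, kill: Callable[[int, int], None] = os.kill) -> bool:
    """Probe `pid` and map every failure mode to a boolean."""
    try:
        kill(pid, 0)
    except ProcessLookupError:
        return False  # POSIX: no such process
    except PermissionError:
        return True   # assumed: process exists but is owned by someone else
    except OSError:
        # Windows reports a non-existent PID as a plain OSError (WinError 87).
        # This clause must come after the two subclasses above, otherwise it
        # would shadow them.
        return False
    return True
```

Because `ProcessLookupError` and `PermissionError` both derive from `OSError`, the broad clause has to be last. A narrower variant could check `exc.winerror == 87` and re-raise anything else.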

Metadata

    Labels

      • P1: High (major feature broken, no workaround)
      • area/config: Config system, migrations, profiles
      • comp/gateway: Gateway runner, session dispatch, delivery
      • type/bug: Something isn't working
