fix(gateway): handle Windows OSError/SystemError in os.kill probes (#5760) #5762

Open
r266-tech wants to merge 2 commits into NousResearch:main from r266-tech:fix-windows-os-kill-status

Conversation

@r266-tech

Contributor

Summary

Fixes #5760. On Windows, os.kill(pid, 0) does not behave like POSIX. When the PID is invalid or the process has already exited, CPython raises:

  • OSError: [WinError 87] The parameter is incorrect (instead of ProcessLookupError)
  • SystemError: <built-in function kill> returned a result with an exception set (when the C-level implementation sets an exception but still returns a result instead of signalling failure)

gateway/status.py only catches (ProcessLookupError, PermissionError) in two places, so on Windows the exception escapes and:

  • acquire_scoped_lock (line 267) blocks the Telegram platform connection
  • get_running_pid (line 370) blocks gateway startup entirely

Both happen whenever a previous gateway instance leaves a stale PID file behind, which is the common case after a crash.

Fix

Add except (OSError, SystemError) clauses next to the existing (ProcessLookupError, PermissionError) clauses at both call sites. The new clauses follow the same code path (treat the previous owner as gone). ProcessLookupError and PermissionError are subclasses of OSError and their clause comes first, so POSIX still hits the original branches for those errors; the new clauses matter mainly on Windows, although they also catch any other OSError raised by the probe.
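
For illustration, a minimal sketch of the clause ordering described above. The helper name `previous_owner_is_gone` and its injectable `kill` parameter are hypothetical and exist only to make the example testable; this is not the actual code in gateway/status.py.

```python
import os
from typing import Callable


def previous_owner_is_gone(
    pid: int,
    kill: Callable[[int, int], None] = os.kill,
) -> bool:
    """Return True when the liveness probe indicates the owner is gone.

    Hypothetical helper, not the real gateway/status.py code. `kill` is
    injectable so the Windows-style failures can be simulated anywhere.
    """
    try:
        kill(pid, 0)
    except (ProcessLookupError, PermissionError):
        # Existing branch: per the PR description, the previous owner
        # is treated as gone.
        return True
    except (OSError, SystemError):
        # New branch: Windows-style failures such as WinError 87 or
        # "returned a result with an exception set".
        return True
    # The probe returned normally, so the process is treated as alive.
    return False
```

Because the narrower clause is listed first, a ProcessLookupError never reaches the broader (OSError, SystemError) clause even though it is an OSError subclass.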

Tests

Added 4 new tests in tests/gateway/test_status.py::TestGatewayWindowsCompatibility covering both call sites under both OSError and SystemError. They monkeypatch status.os.kill to raise the Windows-style exception and assert:

  • get_running_pid() returns None and removes the stale PID file
  • acquire_scoped_lock() treats the existing record as stale and acquires the lock cleanly

Verified manually against the patched module before pushing — all four new scenarios pass and the existing "kill returns None means alive" path still works.
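
A rough sketch of what such a test can look like. The real tests monkeypatch status.os.kill; this version uses unittest.mock from the standard library so that it is self-contained, and `get_running_pid_sketch` and `stale_pid_is_cleaned` are hypothetical stand-ins, not the functions in gateway/status.py or tests/gateway/test_status.py.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional
from unittest import mock


def get_running_pid_sketch(pid_file: Path) -> Optional[int]:
    """Hypothetical stand-in for get_running_pid(); illustration only."""
    try:
        pid = int(pid_file.read_text(encoding="utf-8").strip())
    except (FileNotFoundError, ValueError):
        return None
    try:
        os.kill(pid, 0)
    except (ProcessLookupError, PermissionError):
        stale = True
    except (OSError, SystemError):
        # Windows-style failures are treated like a missing process.
        stale = True
    else:
        stale = False
    if stale:
        pid_file.unlink(missing_ok=True)  # remove the stale PID file
        return None
    return pid


def stale_pid_is_cleaned(exc: BaseException) -> bool:
    """Simulate os.kill raising `exc` and report whether cleanup happened."""
    with tempfile.TemporaryDirectory() as tmp:
        pid_file = Path(tmp) / "gateway.pid"
        pid_file.write_text("99999", encoding="utf-8")
        # Patch os.kill so no real process is ever probed or signalled.
        with mock.patch("os.kill", side_effect=exc):
            result = get_running_pid_sketch(pid_file)
        return result is None and not pid_file.exists()
```

Calling `stale_pid_is_cleaned(OSError(87, "The parameter is incorrect"))` or `stale_pid_is_cleaned(SystemError("..."))` exercises the two Windows-style cases without needing a Windows host.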

Test plan

  • Patched module syntax-checked locally
  • All 4 new tests pass against the patched module
  • Negative case (os.kill succeeds -> process treated as alive) still works
  • CI green on Linux/macOS/Windows

zhanggttry added a commit to zhanggttry/hermes-agent that referenced this pull request Apr 22, 2026
…g to file reads

- os.kill(pid, 0) on Windows raises OSError (WinError 87) for non-existent
  PIDs instead of ProcessLookupError. Catch OSError everywhere to prevent
  crash on Windows process-existence checks.

- Path.read_text() and open() default to gbk on Chinese Windows. Add
  explicit encoding='utf-8' to all file reads to prevent UnicodeDecodeError
  when config files or skill manifests contain non-ASCII characters.

Files changed:
- gateway/run.py, gateway/status.py (os.kill + read_text)
- hermes_cli/*.py (read_text + open)
- tools/*.py (read_text)

Closes: NousResearch#13587 NousResearch#5762 NousResearch#7835 NousResearch#9024
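
As an aside on the encoding change in that commit, a small sketch of the explicit-encoding pattern. The helper name `read_config_text` and the file name are hypothetical; the commit itself edits existing call sites rather than adding a helper.

```python
import tempfile
from pathlib import Path


def read_config_text(path: Path) -> str:
    """Read a text file as UTF-8 regardless of the platform default.

    Hypothetical helper for illustration. Without encoding=..., the
    read_text() call falls back to the locale's preferred encoding,
    which is gbk on Chinese Windows according to the commit message.
    """
    return path.read_text(encoding="utf-8")


def roundtrip_non_ascii(text: str) -> str:
    """Write `text` as UTF-8 to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "skill_manifest.txt"  # hypothetical file name
        path.write_text(text, encoding="utf-8")
        return read_config_text(path)
```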
@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) labels on Apr 30, 2026
@alt-glitch

Collaborator

Competing with #6310 and #7552 which also fix #5760. Maintainer should pick one.

Development

Successfully merging this pull request may close these issues.

Windows: os.kill(pid, 0) raises SystemError/OSError in gateway/status.py