fix(gateway): #8978 resolve Windows startup crash caused by os.kill(0… #9024

Closed: dip8989 wants to merge 1 commit into NousResearch:main from dip8989:patch-6

dip8989 commented Apr 13, 2026


…) WinError 87

What does this PR do?

Resolves #8978

🚨 Architectural Defect

A critical OS-compatibility issue was identified in gateway/status.py. The get_running_pid() function relies on os.kill(pid, 0) to check whether a process exists. This is a standard POSIX convention, but on Windows the same call raises an OSError (WinError 87: The parameter is incorrect). As a result, the gateway daemon crashed during its startup health-check sequence on Windows machines.

🛠️ Mitigation Strategy

Implemented zero-dependency, cross-platform branching logic:

  • Windows (nt): Calls OpenProcess and GetExitCodeProcess through ctypes.windll.kernel32. An exit code of STILL_ACTIVE indicates the process is still running, so liveness is determined without sending a signal or spawning a subshell.
  • POSIX: Retained the fast os.kill(pid, 0) check.
  • Broadened error handling: Added OSError to the exception catch block so that permission or access errors on either operating system no longer crash the gateway.
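
For readers who want to see the shape of the branching described above, here is a minimal sketch. It is not the code from this PR: the helper names (is_process_alive, _is_alive_posix, _is_alive_windows) are hypothetical, and it uses ctypes.WinDLL with explicit argument types rather than ctypes.windll so that handles are not truncated on 64-bit Windows. Treating "access denied" as "process exists" is also an assumption of this sketch. One known limitation: a process that actually exited with code 259 would be misreported as still running.

```python
import os


def is_process_alive(pid: int) -> bool:
    """Return True if a process with this PID appears to be running.

    Hypothetical helper for illustration only.
    """
    if pid <= 0:
        return False  # 0 and negative values address process groups, not a PID
    if os.name == "nt":
        return _is_alive_windows(pid)
    return _is_alive_posix(pid)


def _is_alive_posix(pid: int) -> bool:
    try:
        os.kill(pid, 0)  # signal 0 performs the checks without sending a signal
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    except OSError:
        return False  # any other failure is treated as "not running"
    return True


def _is_alive_windows(pid: int) -> bool:
    # Imported lazily so the module also loads on non-Windows platforms.
    import ctypes
    from ctypes import wintypes

    PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
    STILL_ACTIVE = 259
    ERROR_ACCESS_DENIED = 5

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    kernel32.OpenProcess.restype = wintypes.HANDLE
    kernel32.GetExitCodeProcess.argtypes = [
        wintypes.HANDLE,
        ctypes.POINTER(wintypes.DWORD),
    ]
    kernel32.GetExitCodeProcess.restype = wintypes.BOOL
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.restype = wintypes.BOOL

    handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
    if not handle:
        # Assumption: "access denied" means the process exists but is protected.
        return ctypes.get_last_error() == ERROR_ACCESS_DENIED
    try:
        exit_code = wintypes.DWORD()
        if not kernel32.GetExitCodeProcess(handle, ctypes.byref(exit_code)):
            return False
        return exit_code.value == STILL_ACTIVE
    finally:
        kernel32.CloseHandle(handle)
```

Calling is_process_alive(os.getpid()) should return True on either platform, since the current process is always running.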

📈 System Impact

  • Full gateway runtime restored for Windows 10/11 environments.
  • No added external dependencies (e.g., psutil), maintaining a lightweight core.

zhanggttry added a commit to zhanggttry/hermes-agent that referenced this pull request Apr 22, 2026
…g to file reads

- os.kill(pid, 0) on Windows raises OSError (WinError 87) for non-existent
  PIDs instead of ProcessLookupError. Catch OSError everywhere to prevent
  crash on Windows process-existence checks.

- Path.read_text() and open() default to gbk on Chinese Windows. Add
  explicit encoding='utf-8' to all file reads to prevent UnicodeDecodeError
  when config files or skill manifests contain non-ASCII characters.

Files changed:
- gateway/run.py, gateway/status.py (os.kill + read_text)
- hermes_cli/*.py (read_text + open)
- tools/*.py (read_text)

Closes: NousResearch#13587 NousResearch#5762 NousResearch#7835 NousResearch#9024
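
The encoding change in the commit message above can be illustrated with a short sketch. The helper names (read_text_utf8, load_lines_utf8) are hypothetical and do not come from the referenced commit; the point is only that passing encoding="utf-8" explicitly makes file reads independent of the platform's locale encoding.

```python
from pathlib import Path


def read_text_utf8(path):
    """Read a whole text file as UTF-8, ignoring the locale's default encoding."""
    return Path(path).read_text(encoding="utf-8")


def load_lines_utf8(path):
    """Read a text file line by line as UTF-8 and strip trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```

Without the explicit argument, both Path.read_text() and open() fall back to the locale's preferred encoding, which is why the same file can load on one machine and raise UnicodeDecodeError on another.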
alt-glitch added the type/bug, P3, and comp/gateway labels Apr 27, 2026
alt-glitch (Collaborator) commented

Superseded by #14504 (merged) which fixes the same os.kill(pid, 0) Windows crash in gateway/status.py. See also #13587 (closed) and #12359 (closed).

teknium1 (Contributor) commented

Thanks for the detailed write-up and the fix, @dip8989! The underlying WinError 87 crash from os.kill(pid, 0) on Windows has already been resolved on main.

Automated hermes-sweeper review

  • The fix landed in commit 4c02e459 on 2026-04-23 (fix(status): catch OSError in os.kill(pid, 0) for Windows compatibility).
  • gateway/status.py now has except OSError clauses in both get_running_pid() and acquire_scoped_lock(), with comments explaining the WinError 87 scenario — covering exactly the crash this PR reported.
  • The implementation chose the lightweight except OSError path rather than the ctypes.windll.kernel32 approach you proposed, but the root crash is fully addressed.
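
As a rough illustration of the lightweight path described in the review above, the sketch below reads a PID file and probes the process while catching OSError. It is not the code from commit 4c02e459: the function name read_live_pid and the PID-file format (a single integer stored as text) are assumptions made for this example.

```python
import os
from pathlib import Path
from typing import Optional


def read_live_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file if that process looks alive, else None.

    Hypothetical helper; assumes the file holds a single integer as text.
    """
    try:
        pid = int(pid_file.read_text(encoding="utf-8").strip())
    except (OSError, ValueError):
        return None  # missing, unreadable, or malformed PID file
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)
    except PermissionError:
        return pid  # the process exists but is owned by another user
    except OSError:
        # Covers ProcessLookupError on POSIX as well as the plain OSError
        # (WinError 87) that this thread reports on Windows.
        return None
    return pid
```

Because PermissionError and ProcessLookupError are both subclasses of OSError, the PermissionError clause has to come first; the broad OSError clause then absorbs every other failure instead of letting it propagate and crash startup.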

Closing as implemented on main.

Linked issue: [Bug]: Windows: hermes gateway crashes on startup with OSError WinError 87
