fix(windows): add Windows-compatible PID checking and file locking#9112
fix(windows): add Windows-compatible PID checking and file locking#9112PhilipAD wants to merge 1 commit into
Conversation
- Add _pid_is_running() using OpenProcess on Windows; os.kill(pid, 0) raises WinError 87 on Windows when the process does not exist, breaking every PID liveness check in the codebase - Replace all bare os.kill(pid, 0) calls with _pid_is_running() - Add msvcrt-based file locking for MemoryStore on Windows (fcntl unavailable) - Import _pid_is_running into browser_tool, process_registry, gateway, profiles - Normalise emoji log markers to [ok]/[fail] for Windows console compatibility
|
Thanks for this — appreciate the work. We're closing the entire cluster of open native-Windows PRs (44 of them spanning installer, terminal routing, file ops, gateway PID handling, encoding, docs, and more) because the surface area needs a designed, consolidated approach rather than piecemeal merges. Cherry-picking individual fixes keeps leaving inconsistencies and we'd rather land Windows support properly, in one coherent pass.\n\nYour PR is catalogued in our internal Windows support plan. When we pick this back up (soon), we'll mine every PR in the cluster for its fix shape and credit all contributors whose work informs the final patch via lines. Watch for the consolidating PR and feel free to chime in with context on the specific failure mode you were hitting.\n\nClosing for now, not as a rejection of the fix — just queueing it for the designed rollout. Thanks again. |
What does this PR do?
On Windows, os.kill(pid, 0) raises WinError 87 ("The parameter is incorrect") instead of ProcessLookupError when the target process does not exist. This means every PID liveness check across the codebase silently mis-classifies dead processes as alive (or crashes unexpectedly), breaking gateway lock handling, browser tool lifecycle tracking, process registry management, and profile session detection.
This PR introduces a single cross-platform _pid_is_running() helper that uses OpenProcess on Windows and falls back to os.kill(pid, 0) on POSIX. All seven call-sites that previously used the raw os.kill pattern are replaced with this helper. The PR also adds msvcrt-based file locking for MemoryStore (since fcntl is unavailable on Windows) and normalises emoji log markers (✓/✗) to plain [ok]/[fail] so output is readable on Windows console code pages that cannot render those characters.
Related Issue
Fixes #
Type of Change
Changes Made
gateway/status.py — Add _pid_is_running() using ctypes.windll.kernel32.OpenProcess on Windows with os.kill(pid, 0) POSIX fallback; replace all bare os.kill(pid, 0) liveness checks with _pid_is_running()
gateway/run.py — Replace os.kill(pid, 0) with _pid_is_running(); normalise emoji log markers to [ok]/[fail]
hermes_cli/gateway.py — Import and use _pid_is_running() in place of raw os.kill calls
hermes_cli/profiles.py — Import and use _pid_is_running() for session liveness detection
tools/browser_tool.py — Import and use _pid_is_running(); replace emoji markers with [ok]/[fail]
tools/memory_tool.py — Add msvcrt-based _lock_file() / _unlock_file() for Windows alongside the existing fcntl path; guard import behind sys.platform
tools/process_registry.py — Import and use _pid_is_running() in process liveness checks
How to Test
PID check regression (Windows) — Start a gateway session, note the PID, kill the process from Task Manager, then run hermes status. Confirm it reports the gateway as stopped rather than hanging or raising WinError 87.
MemoryStore file locking (Windows) — Run pytest tests/ -q -k memory on Windows and confirm no AttributeError: module 'fcntl' or lock-related failures.
Emoji / console output (Windows) — Run any hermes command that previously emitted ✓/✗ markers (e.g., hermes gateway start) in a Windows cmd.exe or PowerShell window with a non-UTF-8 code page (e.g., chcp 437) and confirm output is clean with [ok]/[fail] instead of garbled characters.
POSIX regression — Run the full test suite on Linux/macOS (pytest tests/ -q) and confirm no regressions.
Checklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/AScreenshots / Logs