Skip to content

fix(windows): add Windows-compatible PID checking and file locking#9112

Closed
PhilipAD wants to merge 1 commit into
NousResearch:mainfrom
PhilipAD:windows-compat-20260413
Closed

fix(windows): add Windows-compatible PID checking and file locking#9112
PhilipAD wants to merge 1 commit into
NousResearch:mainfrom
PhilipAD:windows-compat-20260413

Conversation

@PhilipAD

Copy link
Copy Markdown
Contributor
  • Add _pid_is_running() using OpenProcess on Windows; os.kill(pid, 0) raises WinError 87 on Windows when the process does not exist, breaking every PID liveness check in the codebase
  • Replace all bare os.kill(pid, 0) calls with _pid_is_running()
  • Add msvcrt-based file locking for MemoryStore on Windows (fcntl unavailable)
  • Import _pid_is_running into browser_tool, process_registry, gateway, profiles
  • Normalise emoji log markers to [ok]/[fail] for Windows console compatibility

What does this PR do?

On Windows, os.kill(pid, 0) raises WinError 87 ("The parameter is incorrect") instead of ProcessLookupError when the target process does not exist. This means every PID liveness check across the codebase silently mis-classifies dead processes as alive (or crashes unexpectedly), breaking gateway lock handling, browser tool lifecycle tracking, process registry management, and profile session detection.

This PR introduces a single cross-platform _pid_is_running() helper that uses OpenProcess on Windows and falls back to os.kill(pid, 0) on POSIX. All seven call-sites that previously used the raw os.kill pattern are replaced with this helper. The PR also adds msvcrt-based file locking for MemoryStore (since fcntl is unavailable on Windows) and normalises emoji log markers (✓/✗) to plain [ok]/[fail] so output is readable on Windows console code pages that cannot render those characters.

Related Issue

Fixes #

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • gateway/status.py — Add _pid_is_running() using ctypes.windll.kernel32.OpenProcess on Windows with os.kill(pid, 0) POSIX fallback; replace all bare os.kill(pid, 0) liveness checks with _pid_is_running()

  • gateway/run.py — Replace os.kill(pid, 0) with _pid_is_running(); normalise emoji log markers to [ok]/[fail]

  • hermes_cli/gateway.py — Import and use _pid_is_running() in place of raw os.kill calls

  • hermes_cli/profiles.py — Import and use _pid_is_running() for session liveness detection

  • tools/browser_tool.py — Import and use _pid_is_running(); replace emoji markers with [ok]/[fail]

  • tools/memory_tool.py — Add msvcrt-based _lock_file() / _unlock_file() for Windows alongside the existing fcntl path; guard import behind sys.platform

  • tools/process_registry.py — Import and use _pid_is_running() in process liveness checks

How to Test

  1. PID check regression (Windows) — Start a gateway session, note the PID, kill the process from Task Manager, then run hermes status. Confirm it reports the gateway as stopped rather than hanging or raising WinError 87.

  2. MemoryStore file locking (Windows) — Run pytest tests/ -q -k memory on Windows and confirm no AttributeError: module 'fcntl' or lock-related failures.

  3. Emoji / console output (Windows) — Run any hermes command that previously emitted ✓/✗ markers (e.g., hermes gateway start) in a Windows cmd.exe or PowerShell window with a non-UTF-8 code page (e.g., chcp 437) and confirm output is clean with [ok]/[fail] instead of garbled characters.

  4. POSIX regression — Run the full test suite on Linux/macOS (pytest tests/ -q) and confirm no regressions.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform:

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

- Add _pid_is_running() using OpenProcess on Windows; os.kill(pid, 0) raises
  WinError 87 on Windows when the process does not exist, breaking every PID
  liveness check in the codebase
- Replace all bare os.kill(pid, 0) calls with _pid_is_running()
- Add msvcrt-based file locking for MemoryStore on Windows (fcntl unavailable)
- Import _pid_is_running into browser_tool, process_registry, gateway, profiles
- Normalise emoji log markers to [ok]/[fail] for Windows console compatibility
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery comp/agent Core agent loop, run_agent.py, prompt builder tool/browser Browser automation (CDP, Playwright) labels Apr 27, 2026
@teknium1

teknium1 commented May 7, 2026

Copy link
Copy Markdown
Contributor

Thanks for this — appreciate the work. We're closing the entire cluster of open native-Windows PRs (44 of them spanning installer, terminal routing, file ops, gateway PID handling, encoding, docs, and more) because the surface area needs a designed, consolidated approach rather than piecemeal merges. Cherry-picking individual fixes keeps leaving inconsistencies and we'd rather land Windows support properly, in one coherent pass.\n\nYour PR is catalogued in our internal Windows support plan. When we pick this back up (soon), we'll mine every PR in the cluster for its fix shape and credit all contributors whose work informs the final patch via lines. Watch for the consolidating PR and feel free to chime in with context on the specific failure mode you were hitting.\n\nClosing for now, not as a rejection of the fix — just queueing it for the designed rollout. Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists tool/browser Browser automation (CDP, Playwright) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants