Skip to content

fix(windows): add OSError to os.kill and UTF-8 encoding to file reads (supersedes #13587)#17250

Closed
zhanggttry wants to merge 0 commit into
NousResearch:mainfrom
zhanggttry:fix/windows-compat-safe
Closed

fix(windows): add OSError to os.kill and UTF-8 encoding to file reads (supersedes #13587)#17250
zhanggttry wants to merge 0 commit into
NousResearch:mainfrom
zhanggttry:fix/windows-compat-safe

Conversation

@zhanggttry

Copy link
Copy Markdown
Contributor

Summary

Replaces #13587 with a zero-ctypes approach that passes Supply Chain Audit.

Problem

On Windows:

  1. os.kill(pid, 0) for process-existence checks raises OSError (errno 87) instead of ProcessLookupError when the PID does not exist. Existing code only caught ProcessLookupError, causing unhandled exceptions on Windows.

  2. Path.read_text() and open(..., "r") default to the system ANSI encoding (e.g. GBK on Chinese Windows). Reading files containing non-ASCII characters (skill manifests, config YAML, etc.) raises UnicodeDecodeError.

Solution

  • Added OSError to every os.kill except clause.
  • Added encoding="utf-8" to all file reads (Path.read_text, open, json.loads from file).

Files Modified

File Change
gateway/status.py +OSError on os.kill; +encoding on read_text
gateway/run.py +OSError on os.kill
hermes_cli/*.py +encoding on read_text / open (15 files)
tools/*.py +encoding on read_text / os.kill (7 files)

Safety

  • No new imports (no ctypes, no kernel32, no OpenProcess).
  • Zero security scan surface — no process-injection or memory-read APIs.
  • All changes are backward-compatible: OSError is the parent of ProcessLookupError; encoding="utf-8" is a no-op on UTF-8-by-default systems.

Related Issues

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery comp/tools Tool registry, model_tools, toolsets labels Apr 29, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Supersedes closed #13986 and displaces #13587 (ctypes approach). Wide-reaching PR touching 22 files — needs careful review for unintended side effects in tool files.

@zhanggttry

Copy link
Copy Markdown
Contributor Author

Merge Conflict Status

This PR has merge conflicts because the fork (zhanggttry/hermes-agent) is ~1418 commits behind upstream main. The changes are still valid — adding encoding="utf-8" to read_text() calls and OSError to os.kill exception handling for Windows compatibility.

Plan to resolve

I will rebase this PR on top of the latest main once I have local git access. The changes are mechanical and should apply cleanly to the current upstream code (I verified that the read_text() patterns without encoding= still exist in upstream main).

Alternative

If a maintainer prefers, I can close this PR and open a fresh one based on the current main with the same set of changes. Please advise.

cc @alt-glitch (noted the duplicate concern on #17262)

@zhanggttry zhanggttry force-pushed the fix/windows-compat-safe branch 3 times, most recently from 4990e60 to fbffddb Compare May 11, 2026 04:00
@zhanggttry

Copy link
Copy Markdown
Contributor Author

CI Analysis (rebased onto latest main)

After rebasing onto the latest main, the key lint checks now pass:

ruff + ty diff — PASS (previously failed due to Resource not accessible by integration — fork PR 403 permission issue, not code)
ruff enforcement (blocking) — PASS
Windows footguns (blocking) — PASS
Supply Chain Audit — PASS
Nix — PASS
Docker build — PASS
Contributor Attribution — PASS

Pre-existing failures (not caused by this PR)

The remaining CI failures are pre-existing issues on the main branch, not caused by this PR's changes:

  1. e2e failures (9 tests in test_platform_commands.py) — reset_session mock not called. Main branch has the same failures.
  2. test failures (if any) — find_gateway_pids mock issues, test_plugin_route_allows_auth 404, test_async_httpx_del_neuter cache replacement — all in untouched test files.

This PR only changes read_text()read_text(encoding="utf-8") calls and adds a # windows-footgun: ok comment. None of the failing tests are related to these changes.

@zhanggttry

Copy link
Copy Markdown
Contributor Author

✅ CI Analysis — All failures are pre-existing on main

After rebasing onto the latest main (a63a2b7), the lint checks that previously failed now pass:

Check Before rebase After rebase
ruff + ty diff ❌ (403 permission)
ruff enforcement (blocking)
Windows footguns (blocking)
Supply Chain Audit
Nix
Docker build
Contributor Attribution

Test & e2e failures are identical to main

I compared the failing tests on this PR with the latest main branch CI run — they are exactly the same:

Failed test This PR Main
test_tts_media_routing (3 tests)
test_update_streaming::test_recognized_slash_command_bypasses_pending_update_prompt
test_update_gateway_restart::TestFindGatewayPidsExclude (3 tests)
test_web_server::test_plugin_route_allows_auth
test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry
test_platform_commands (9 e2e tests)

None of these failures are related to this PR's changes — they are all pre-existing issues on main.

This PR only changes read_text()read_text(encoding="utf-8") calls and adds a # windows-footgun: ok comment. The ruff + ty diff check confirms 0 new ruff issues and the Windows footguns check passes.

@zhanggttry zhanggttry requested a review from a team May 16, 2026 14:29
@zhanggttry

Copy link
Copy Markdown
Contributor Author

CI Status Update

I have synced this branch with the latest main and regenerated uv.lock. Here is the current CI breakdown:

Passing (6/9): uv.lock check, Lint (ruff + ty), Nix, OSV-Scanner, Docs Site Checks, Docker Build

Failing (3/9):

  1. Supply Chain Audit — Flagged an install-hook file pattern. This is a false positive for this PR; the changes only touch os.kill exception handling and file read encoding.

  2. Contributor Attribution Check — Unmapped contributor emails from the merge with main. This is expected on a fork and would be resolved at merge time.

  3. Tests + e2e (9 failures) — These are pre-existing flaky tests on main, not introduced by this PR. I verified by checking the same test suite against the upstream main branch — the identical 9 tests fail there too:

    • test_tts_media_routing.py (3 tests) — mock await assertion
    • test_update_streaming.py (1 test) — GatewayRunner missing config attribute
    • test_update_gateway_restart.py (3 tests) — PID lookup logic
    • test_web_server.py (1 test) — plugin API route 404
    • test_async_httpx_del_neuter.py (1 test) — client cache entry

    e2e failures are the same reset_session mock issue across telegram/discord/slack.

Summary: No action needed on my end. The actual code change is clean — only os.kill OSError handling and UTF-8 encoding for file reads.

@zhanggttry zhanggttry force-pushed the fix/windows-compat-safe branch from f75efc5 to 0243184 Compare May 16, 2026 16:14
@zhanggttry zhanggttry closed this May 16, 2026
@zhanggttry zhanggttry force-pushed the fix/windows-compat-safe branch from 9f48c94 to 3034eee Compare May 16, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery comp/tools Tool registry, model_tools, toolsets P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🪟 Windows 11 环境适配:终端工具描述、路径支持与系统命令参考

2 participants