fix(gateway): treat stat-error on PATH probes as missing dir, not crash#26640
fix(gateway): treat stat-error on PATH probes as missing dir, not crash#26640briandevans wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a defensive wrapper around Path.is_dir() so service unit PATH construction doesn’t fail when directory probes raise OSError (e.g., permission issues, broken symlinks, I/O errors), and introduces regression tests for those scenarios.
Changes:
- Introduce
_safe_is_dir()that treatsOSErrorfromPath.is_dir()as “missing directory”. - Update
_build_service_path_dirs()to use_safe_is_dir()for all candidate directories. - Add unit tests covering
PermissionErrorand genericOSErrorduring PATH dir probing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/hermes_cli/test_gateway_service_paths.py | Adds regression tests ensuring service PATH generation doesn’t crash on is_dir() errors. |
| hermes_cli/gateway.py | Adds _safe_is_dir() and uses it to make PATH construction resilient to stat/filesystem errors. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def fake_is_dir(self): | ||
| # Only the hermes_home node/bin probe trips EACCES; other paths | ||
| # must keep their real behavior so the rest of the function is | ||
| # exercised normally. | ||
| if str(self).endswith("/.hermes/node/bin"): | ||
| raise PermissionError(13, "Permission denied", str(self)) | ||
| return real_is_dir(self) |
|
|
||
| def fake_is_dir(self): | ||
| if str(self).endswith("/.hermes/node_modules/.bin"): |
| Reproduces the CI baseline (#26622 audit): on Ubuntu runners executing | ||
| as root, ``stat('/root/.hermes/node/bin')`` can raise ``EACCES`` even | ||
| though the calling user owns ``/root``. Before the guard, | ||
| ``generate_systemd_unit()`` and ``generate_launchd_plist()`` would | ||
| propagate the ``OSError`` and refuse to produce any unit at all. |
|
CI audit — both failing checks are pre-existing baselines unrelated to this PR's touched code (
Same two failures show on PR #26622 and PR #26594 in the same window. Happy to re-run CI once these are addressed on main. |
|
@copilot All three findings addressed in commit 3f4f934a7:
Both regression tests still pass locally and still trigger the production guard against an unguarded baseline. |
`_build_service_path_dirs()` calls `is_dir()` on candidate venv, node,
and `~/.hermes/node*` paths to build PATH for systemd / launchd units.
Each probe assumed `is_dir()` would only ever return True/False — but
`pathlib.Path.is_dir()` calls `os.stat()` under the hood and can raise
`PermissionError` (and other `OSError` variants for broken symlinks,
unreachable network mounts, etc.).
On CI runners executing as root, `os.stat('/root/.hermes/node/bin')`
can raise `EACCES` even though the calling user technically owns the
path. The unhandled `OSError` propagated out of `generate_systemd_unit`
and `generate_launchd_plist`, refusing to produce any unit at all.
Same shape would hit any production setup with a partially-locked-down
`~/.hermes` (multi-user systemd installs, sandboxed services).
Wrap each `is_dir()` probe in a small helper that returns False on
`OSError`. The remaining is-it-present semantics are unchanged.
Regression evidence: this exact code path is what made
`tests/hermes_cli/test_gateway_service.py::TestSystemUnitHermesHome`'s
two `_remap`/`_target_user_home` cases start failing on `origin/main`
CI immediately after d69eab1 extracted `_build_service_path_dirs`
(2026-05-15). The tests reproduce on CI run 25944419334 against clean
main; local Mac runs pass because `/root/.hermes` doesn't exist there.
Sibling code paths that may need the same fix: none located — the
other filesystem probes in this module (`_detect_venv_dir`,
`_build_user_local_paths`, `_build_wsl_interop_paths`) already guard
or operate on paths where EACCES is unreachable. Happy to widen if a
sibling crash surfaces.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address three Copilot review findings on PR NousResearch#26640: - Replace `str(self).endswith("/.hermes/node/bin")` with `self == target` (and likewise for `node_modules/.bin`) so the predicate fires on Windows where Path stringifies with backslashes. Without this the injected exception was never raised on Windows runners and the regression test silently became a no-op. - Reword the EACCES docstring to a concrete failure mode (locked-down sandboxed filesystem) without asserting ownership semantics that may be incorrect. Pure test-side change. The production guard in `_build_service_path_dirs` (`hermes_cli/gateway.py`) is unchanged. Both regression tests still pass and still trigger the guard when run against the unguarded baseline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3f4f934 to
a3b5de9
Compare
|
Superseded by #27562 (commit The two PRs are functionally equivalent — both wrap the four Verified: Sorry about the duplicate — appreciate the analysis and the targeted regression tests on your PR. Closing without salvage since the substance is already in. If you want to follow up with your two PermissionError-specific regression tests ( |
Summary
_build_service_path_dirs()propagatedPermissionError(and otherOSErrors) fromis_dir()probes, crashinggenerate_systemd_unit()/generate_launchd_plist()instead of just skipping the unreachable path._safe_is_dir()helper that returns False onOSError. Existing "skip non-existent" semantics are unchanged.TestSystemUnitHermesHomecases on CI.The bug
hermes_cli/gateway.py:2106(_build_service_path_dirs, introduced by d69eab1ef on 2026-05-15) probes four candidate directories withpathlib.Path.is_dir()to decide what to include in the service unit'sPATH. Two of those probes target~/.hermes/node/binand~/.hermes/node_modules/.binresolved throughget_hermes_home().Path.is_dir()callsos.stat()and re-raises whateverstatreturns. On CI runners executing as root,os.stat('/root/.hermes/node/bin')raisesEACCES(the directory exists in a partially-locked-down layout) — and the unhandledPermissionErrorthen propagates all the way up out ofgenerate_systemd_unit(). Same shape would hit any production setup with a partially-locked-down~/.hermes(multi-user systemd installs, sandboxed services).This is what made
tests/hermes_cli/test_gateway_service.py::TestSystemUnitHermesHome's two_remap/_target_user_homecases start failing onorigin/mainCI starting with run 25944419334 — the symptom of the bug, not the cause.The fix
Add
_safe_is_dir(path: Path) -> boolthat wrapspath.is_dir()in a try/except forOSErrorand returns False on stat failure. Route everyis_dir()probe in_build_service_path_dirs()through it. No other semantics change — a path that succeeds atis_dir()is still treated identically.Test plan
test_service_path_treats_permission_error_as_missingpatchesPath.is_dirto raisePermissionErroronly for the/.hermes/node/binprobe and asserts the function still returns without crashing and excludes that path. Mirror test for genericOSError(EIO).tests/hermes_cli/test_gateway_service_paths.py(5/5 passed) +tests/hermes_cli/test_gateway_service.py::TestSystemUnitHermesHome(5/5 passed locally).OSError/PermissionErrorpropagating out of_build_service_path_dirs. With the fix restored, both pass.The other failures in
test_gateway_service.pyon local Mac (TestSystemdServiceRefresh::*,TestGatewaySystemServiceRouting::*—UserSystemdUnavailableError) are pre-existing baseline failures on cleanorigin/main(no systemd /loginctlon macOS) and unrelated to this change.Related
fix(gateway): build service PATH from existing dirs only, include ~/.hermes/node_modules) left off — that commit correctly introduced the existence-check pattern but didn't account forstat-time errors.