fix(file-safety): add sandbox-mirror soft guard for writes to per-task .hermes mirrors (#32049)#32213
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new “sandbox-mirror” soft-guard to prevent writes into per-task .hermes mirrors created by non-local terminal backends (e.g., Docker/Daytona), and wires it into the existing cross-profile write warning path.
Changes:
- Extend
_check_cross_profile_pathto also detect and warn on sandbox-mirror targets. - Implement sandbox-mirror path-shape detection and a model-facing warning message.
- Add a dedicated pytest suite validating sandbox-mirror classification/warning behavior and orthogonality vs cross-profile detection.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/file_tools.py | Integrates sandbox-mirror detection into the existing cross-profile warning hook and updates its docstring. |
| agent/file_safety.py | Adds sandbox-mirror classifier + warning generator used by file write guards. |
| tests/agent/test_file_safety_sandbox_mirror.py | Adds tests covering the new sandbox-mirror detection and warning behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assert result["mirror_root"].endswith( | ||
| "sandboxes/docker/default/home/.hermes" | ||
| ) | ||
| assert result["inner_path"] == "profiles/group1/SOUL.md" |
| assert result["inner_path"] == inner | ||
| assert backend in result["mirror_root"] |
| assert "sandboxes/docker/default/home/.hermes" in warn | ||
| # Must hint at what the agent likely meant. | ||
| assert "profiles/group1/SOUL.md" in warn |
| def _check_cross_profile_path(filepath: str, task_id: str = "default") -> str | None: | ||
| """Return a cross-profile warning string when ``filepath`` lands in | ||
| another Hermes profile's skills/plugins/cron/memories directory. | ||
|
|
||
| Returns ``None`` when the write is in-scope (same profile) or outside | ||
| Hermes scope entirely. Soft guard — the agent can override by passing | ||
| ``cross_profile=True`` to its write tool after explicit user direction. | ||
|
|
||
| Defense-in-depth, NOT a security boundary — the terminal tool runs | ||
| as the same OS user and can write any of these paths directly. | ||
| See ``agent/file_safety.classify_cross_profile_target`` for the | ||
| detection rules. | ||
| """Return a soft-guard warning when ``filepath`` lands in another Hermes | ||
| profile's scoped area or a sandbox-mirror of authoritative profile state. |
|
|
||
| Two detectors run in order: | ||
|
|
||
| * cross-profile (#TBD) — writes that hit another profile's |
|
This closes the host-side speculation case nicely. One gap: when write_file executes inside the Docker container (the actual failure mode in #32049 — evidenced by root:root ownership on the divergent file), the guard receives /root/.hermes/… with no sandboxes/ segment and passes through. A ContextVar set by DockerEnvironment.init when persistent=True would let classify_sandbox_mirror_target catch this second path with the same warning and bypass contract. Working on a follow-up PR for this + a SKILL.md path-boundary section — happy to fold it here instead if that's cleaner. |
6bf3e7f to
168a7b5
Compare
…k .hermes mirrors NousResearch#32049 reports that under terminal.backend: docker, write_file / patch calls to authoritative profile state (SOUL.md, memories, etc.) land on the sandbox-local mirror at ``<HERMES_HOME>/profiles/<name>/sandboxes/<backend>/<task>/home/.hermes/...`` — a path the host Hermes process never reads. The tool reports success, the user sees no behavior change, and on disk two divergent copies of SOUL.md (or any other profile file) accumulate. The existing classify_cross_profile_target guard does not catch this: its parts[2] check sees "sandboxes" and returns None, and the path is in-profile from the inner-mirror perspective so even a fixed version would not fire. Add a parallel sandbox-mirror classifier in agent/file_safety: * classify_sandbox_mirror_target() detects the ``…/sandboxes/<backend>/<task>/home/.hermes/…`` shape via path parts. Detection is path-shape only — backend-agnostic, does not require the file to exist, and works regardless of which HERMES_HOME resolves. * get_sandbox_mirror_warning() returns a model-facing warning that names the mirror root and the inner authoritative path the agent likely meant. Wire both detectors through tools/file_tools._check_cross_profile_path so the existing write_file and v4a patch call sites pick up the new guard with no API change. The bypass kwarg (``cross_profile=True``) remains shared between the two guards — same "I know what I'm doing" escape valve after explicit user direction. This is the defense-in-depth piece of the proposal in NousResearch#32049 ("any …/sandboxes/<backend>/…/home/…hermes/… path as sandbox-mirror"). It catches the host-side speculation case where the agent writes a literal sandbox-mirror path. The inner-container case (where the bind mount strips the ``sandboxes/`` prefix from the agent's path view) is out of scope for this surgical change — that requires either a dispatch-layer host-side check before the container handoff, or the host-side ``profile_state`` / ``soul`` tool the issue also proposes. Soft guard, NOT a security boundary — matches the existing classify_cross_profile_target contract.
168a7b5 to
c4661e8
Compare
…r path (#32049) (#32407) * fix(file-safety): extend sandbox-mirror guard to cover inner-container path (#32049) Brian's shape-based guard (#32213) catches paths that still carry the full sandboxes/<backend>/<task>/home/.hermes/… prefix on the host side. The inner-container case is not covered: when file tools execute inside Docker the bind-mount strips that prefix, so the guard receives plain /root/.hermes/… and passes through. The root:root ownership on the divergent SOUL.md in #32049 confirms this is the primary failure mode. Add a ContextVar (_CONTAINER_HERMES_MIRROR) set by DockerEnvironment when persistent=True. classify_container_mirror_target / get_container_ mirror_warning detect any write whose resolved path falls under that prefix, using the same warning format and cross_profile=True bypass contract as the existing guards. Chain the new guard in _check_cross_profile_path after the two existing detectors. * fix(file-safety): derive Docker mirror guard from task --------- Co-authored-by: Ben <ben@nousresearch.com>
…k .hermes mirrors (NousResearch#32213) NousResearch#32049 reports that under terminal.backend: docker, write_file / patch calls to authoritative profile state (SOUL.md, memories, etc.) land on the sandbox-local mirror at ``<HERMES_HOME>/profiles/<name>/sandboxes/<backend>/<task>/home/.hermes/...`` — a path the host Hermes process never reads. The tool reports success, the user sees no behavior change, and on disk two divergent copies of SOUL.md (or any other profile file) accumulate. The existing classify_cross_profile_target guard does not catch this: its parts[2] check sees "sandboxes" and returns None, and the path is in-profile from the inner-mirror perspective so even a fixed version would not fire. Add a parallel sandbox-mirror classifier in agent/file_safety: * classify_sandbox_mirror_target() detects the ``…/sandboxes/<backend>/<task>/home/.hermes/…`` shape via path parts. Detection is path-shape only — backend-agnostic, does not require the file to exist, and works regardless of which HERMES_HOME resolves. * get_sandbox_mirror_warning() returns a model-facing warning that names the mirror root and the inner authoritative path the agent likely meant. Wire both detectors through tools/file_tools._check_cross_profile_path so the existing write_file and v4a patch call sites pick up the new guard with no API change. The bypass kwarg (``cross_profile=True``) remains shared between the two guards — same "I know what I'm doing" escape valve after explicit user direction. This is the defense-in-depth piece of the proposal in NousResearch#32049 ("any …/sandboxes/<backend>/…/home/…hermes/… path as sandbox-mirror"). It catches the host-side speculation case where the agent writes a literal sandbox-mirror path. The inner-container case (where the bind mount strips the ``sandboxes/`` prefix from the agent's path view) is out of scope for this surgical change — that requires either a dispatch-layer host-side check before the container handoff, or the host-side ``profile_state`` / ``soul`` tool the issue also proposes. Soft guard, NOT a security boundary — matches the existing classify_cross_profile_target contract. Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Co-authored-by: Ben Barclay <ben@nousresearch.com>
…r path (NousResearch#32049) (NousResearch#32407) * fix(file-safety): extend sandbox-mirror guard to cover inner-container path (NousResearch#32049) Brian's shape-based guard (NousResearch#32213) catches paths that still carry the full sandboxes/<backend>/<task>/home/.hermes/… prefix on the host side. The inner-container case is not covered: when file tools execute inside Docker the bind-mount strips that prefix, so the guard receives plain /root/.hermes/… and passes through. The root:root ownership on the divergent SOUL.md in NousResearch#32049 confirms this is the primary failure mode. Add a ContextVar (_CONTAINER_HERMES_MIRROR) set by DockerEnvironment when persistent=True. classify_container_mirror_target / get_container_ mirror_warning detect any write whose resolved path falls under that prefix, using the same warning format and cross_profile=True bypass contract as the existing guards. Chain the new guard in _check_cross_profile_path after the two existing detectors. * fix(file-safety): derive Docker mirror guard from task --------- Co-authored-by: Ben <ben@nousresearch.com>
What does this PR do?
Adds a path-shape soft guard that detects when a
write_file/patchtarget lands in the per-task sandbox-mirror of authoritative Hermes state — paths of the shape…/sandboxes/<backend>/<task>/home/.hermes/…. Those writes never make it back to the host process; the agent reports success while the authoritative file is left untouched and a divergent copy accumulates inside the sandbox.Detection lives next to
classify_cross_profile_targetinagent/file_safety.pyand reuses the same bypass kwarg (cross_profile=True). It mirrors the existing cross-profile guard's contract: soft, defense-in-depth, surfaces a model-facing warning that names the mirror root and the inner authoritative path the agent likely meant.Scope is the "defense-in-depth" piece of the proposal in #32049 — specifically the path-shape detector for the host-side speculation case. The inner-container case (where the bind mount strips the
sandboxes/prefix from the agent's view) is out of scope: it needs either a dispatch-layer check before the container handoff or the host-sideprofile_state/soultool the issue also proposes.Related Issue
Fixes #32049
Type of Change
Changes Made
agent/file_safety.py— newclassify_sandbox_mirror_target()+get_sandbox_mirror_warning(); pure path-shape detection, backend-agnostic, does not require the file to exist (the agent is about to create the mirror file — that's the bug).tools/file_tools.py—_check_cross_profile_path()now runs the cross-profile classifier first and falls through to the sandbox-mirror classifier; existing call sites inwrite_fileand v4apatchpick up the new guard with no API change.tests/agent/test_file_safety_sandbox_mirror.py— 13 new tests covering the exact path shape from the issue, multiple backends (docker / daytona / podman), false-positive guards (plain workspace dirs undersandboxes/, truncated paths,home/<dotfile-not-hermes>), warning format invariants, and orthogonality with the existing cross-profile classifier.How to Test
Expected: 29 passed (13 new + 16 existing cross-profile tests, all still green).
Broader sweep on
tests/agent/test_file_safety.py+tests/tools/test_file_tools.py: 49 passed.Note:
tests/tools/test_cross_profile_guard.pyhas 7 pre-existing failures on cleanorigin/main— the macOS/private/var/...tmp path is matched by_check_sensitive_pathbefore reaching the cross-profile guard, so the assertion expects"cross-profile"in the error but gets"sensitive system path". Not caused by this PR.Checklist
Code
fix(scope):,feat(scope):, etc.)Documentation & Housekeeping
docs/, docstrings) — module docstrings on both new functions cover the contract; no user-facing docs touchedcli-config.yaml.exampleif I added/changed config keys — N/A (no new config keys)CONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — N/APath.partsso the path-shape match is portable; macOS-tested locallyRelated / Positioning
classify_cross_profile_targetsoft guard on file-write tools. This PR is the sibling axis: cross-profile catches writes into another profile'sskills/plugins/cron/memories; sandbox-mirror catches writes into a non-local backend's per-task mirror, where the path is in-profile but the mirror is the wrong "world." Both share thecross_profile=Truebypass kwarg.soul/profile_statetool and (b) aSKILL.md"Docker backend path boundary" doc section. Both are out of scope here — the first is a feature, the second is best done after this guard exists so the docs can cite it.Contract Protected
…/sandboxes/<backend>/<task>/home/.hermes/<thing>is classified as a sandbox-mirror target regardless of which Hermes profile is active.…/profiles/group1/sandboxes/docker/default/home/.hermes/profiles/group1/SOUL.md), plus parametrized backends (docker,daytona,podman) and inner files (SOUL.md,memories/MEMORY.md,cron/jobs.json,.env).sandboxes/<name>/<task>/home/layout is covered automatically.sandboxes/<backend>/<task>/workspace/main.py(non-Hermes),sandboxes/<backend>/<task>/home/.bashrc(home but not.hermes), and truncatedsandboxes/<backend>/<task>(nohome/.hermes/<thing>tail) all returnNoneand the existing cross-profile test (test_hermes_config_not_classified_as_cross_profile) is preserved.