Skip to content

fix(file-safety): add sandbox-mirror soft guard for writes to per-task .hermes mirrors (#32049)#32213

Merged
benbarclay merged 2 commits into
NousResearch:mainfrom
briandevans:fix/file-safety-sandbox-mirror-32049
Jun 2, 2026
Merged

fix(file-safety): add sandbox-mirror soft guard for writes to per-task .hermes mirrors (#32049)#32213
benbarclay merged 2 commits into
NousResearch:mainfrom
briandevans:fix/file-safety-sandbox-mirror-32049

Conversation

@briandevans

Copy link
Copy Markdown
Contributor

What does this PR do?

Adds a path-shape soft guard that detects when a write_file / patch target lands in the per-task sandbox-mirror of authoritative Hermes state — paths of the shape …/sandboxes/<backend>/<task>/home/.hermes/…. Those writes never make it back to the host process; the agent reports success while the authoritative file is left untouched and a divergent copy accumulates inside the sandbox.

Detection lives next to classify_cross_profile_target in agent/file_safety.py and reuses the same bypass kwarg (cross_profile=True). It mirrors the existing cross-profile guard's contract: soft, defense-in-depth, surfaces a model-facing warning that names the mirror root and the inner authoritative path the agent likely meant.

Scope is the "defense-in-depth" piece of the proposal in #32049 — specifically the path-shape detector for the host-side speculation case. The inner-container case (where the bind mount strips the sandboxes/ prefix from the agent's view) is out of scope: it needs either a dispatch-layer check before the container handoff or the host-side profile_state / soul tool the issue also proposes.

Audited siblings: tools/file_tools._check_cross_profile_path is the single chokepoint both write_file and v4a patch go through. No widening needed at the call-site layer.

Related Issue

Fixes #32049

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • agent/file_safety.py — new classify_sandbox_mirror_target() + get_sandbox_mirror_warning(); pure path-shape detection, backend-agnostic, does not require the file to exist (the agent is about to create the mirror file — that's the bug).
  • tools/file_tools.py_check_cross_profile_path() now runs the cross-profile classifier first and falls through to the sandbox-mirror classifier; existing call sites in write_file and v4a patch pick up the new guard with no API change.
  • tests/agent/test_file_safety_sandbox_mirror.py — 13 new tests covering the exact path shape from the issue, multiple backends (docker / daytona / podman), false-positive guards (plain workspace dirs under sandboxes/, truncated paths, home/<dotfile-not-hermes>), warning format invariants, and orthogonality with the existing cross-profile classifier.

How to Test

uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest \
  tests/agent/test_file_safety_sandbox_mirror.py tests/agent/test_file_safety_cross_profile.py -v

Expected: 29 passed (13 new + 16 existing cross-profile tests, all still green).

Broader sweep on tests/agent/test_file_safety.py + tests/tools/test_file_tools.py: 49 passed.

Note: tests/tools/test_cross_profile_guard.py has 7 pre-existing failures on clean origin/main — the macOS /private/var/... tmp path is matched by _check_sensitive_path before reaching the cross-profile guard, so the assertion expects "cross-profile" in the error but gets "sensitive system path". Not caused by this PR.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run focused tests for the touched code and all pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15.x

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — module docstrings on both new functions cover the contract; no user-facing docs touched
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A (no new config keys)
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — detector uses Path.parts so the path-shape match is portable; macOS-tested locally
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A (no tool schema changes; guard surfaces as an error string via the existing tool-result channel)

Related / Positioning

Contract Protected

  • Invariant: any path matching …/sandboxes/<backend>/<task>/home/.hermes/<thing> is classified as a sandbox-mirror target regardless of which Hermes profile is active.
  • Known-bad inputs: the exact path from the issue (…/profiles/group1/sandboxes/docker/default/home/.hermes/profiles/group1/SOUL.md), plus parametrized backends (docker, daytona, podman) and inner files (SOUL.md, memories/MEMORY.md, cron/jobs.json, .env).
  • Future-input coverage: backend name is not whitelisted; any new non-local backend that follows the established sandboxes/<name>/<task>/home/ layout is covered automatically.
  • Negative cases: plain sandboxes/<backend>/<task>/workspace/main.py (non-Hermes), sandboxes/<backend>/<task>/home/.bashrc (home but not .hermes), and truncated sandboxes/<backend>/<task> (no home/.hermes/<thing> tail) all return None and the existing cross-profile test (test_hermes_config_not_classified_as_cross_profile) is preserved.

Copilot AI review requested due to automatic review settings May 25, 2026 19:19

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new “sandbox-mirror” soft-guard to prevent writes into per-task .hermes mirrors created by non-local terminal backends (e.g., Docker/Daytona), and wires it into the existing cross-profile write warning path.

Changes:

  • Extend _check_cross_profile_path to also detect and warn on sandbox-mirror targets.
  • Implement sandbox-mirror path-shape detection and a model-facing warning message.
  • Add a dedicated pytest suite validating sandbox-mirror classification/warning behavior and orthogonality vs cross-profile detection.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
tools/file_tools.py Integrates sandbox-mirror detection into the existing cross-profile warning hook and updates its docstring.
agent/file_safety.py Adds sandbox-mirror classifier + warning generator used by file write guards.
tests/agent/test_file_safety_sandbox_mirror.py Adds tests covering the new sandbox-mirror detection and warning behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +44 to +47
assert result["mirror_root"].endswith(
"sandboxes/docker/default/home/.hermes"
)
assert result["inner_path"] == "profiles/group1/SOUL.md"
Comment on lines +71 to +72
assert result["inner_path"] == inner
assert backend in result["mirror_root"]
Comment on lines +171 to +173
assert "sandboxes/docker/default/home/.hermes" in warn
# Must hint at what the agent likely meant.
assert "profiles/group1/SOUL.md" in warn
Comment thread tools/file_tools.py
Comment on lines 196 to +198
def _check_cross_profile_path(filepath: str, task_id: str = "default") -> str | None:
"""Return a cross-profile warning string when ``filepath`` lands in
another Hermes profile's skills/plugins/cron/memories directory.

Returns ``None`` when the write is in-scope (same profile) or outside
Hermes scope entirely. Soft guard — the agent can override by passing
``cross_profile=True`` to its write tool after explicit user direction.

Defense-in-depth, NOT a security boundary — the terminal tool runs
as the same OS user and can write any of these paths directly.
See ``agent/file_safety.classify_cross_profile_target`` for the
detection rules.
"""Return a soft-guard warning when ``filepath`` lands in another Hermes
profile's scoped area or a sandbox-mirror of authoritative profile state.
Comment thread tools/file_tools.py

Two detectors run in order:

* cross-profile (#TBD) — writes that hit another profile's
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder tool/file File tools (read, write, patch, search) backend/docker Docker container execution labels May 25, 2026
@whyhkzk

whyhkzk commented May 26, 2026

Copy link
Copy Markdown
Contributor

This closes the host-side speculation case nicely. One gap: when write_file executes inside the Docker container (the actual failure mode in #32049 — evidenced by root:root ownership on the divergent file), the guard receives /root/.hermes/… with no sandboxes/ segment and passes through. A ContextVar set by DockerEnvironment.init when persistent=True would let classify_sandbox_mirror_target catch this second path with the same warning and bypass contract. Working on a follow-up PR for this + a SKILL.md path-boundary section — happy to fold it here instead if that's cleaner.

…k .hermes mirrors

NousResearch#32049 reports that under terminal.backend: docker, write_file / patch
calls to authoritative profile state (SOUL.md, memories, etc.) land on
the sandbox-local mirror at
``<HERMES_HOME>/profiles/<name>/sandboxes/<backend>/<task>/home/.hermes/...``
— a path the host Hermes process never reads. The tool reports success,
the user sees no behavior change, and on disk two divergent copies of
SOUL.md (or any other profile file) accumulate.

The existing classify_cross_profile_target guard does not catch this:
its parts[2] check sees "sandboxes" and returns None, and the path is
in-profile from the inner-mirror perspective so even a fixed version
would not fire.

Add a parallel sandbox-mirror classifier in agent/file_safety:

  * classify_sandbox_mirror_target() detects the
    ``…/sandboxes/<backend>/<task>/home/.hermes/…`` shape via path parts.
    Detection is path-shape only — backend-agnostic, does not require
    the file to exist, and works regardless of which HERMES_HOME resolves.
  * get_sandbox_mirror_warning() returns a model-facing warning that
    names the mirror root and the inner authoritative path the agent
    likely meant.

Wire both detectors through tools/file_tools._check_cross_profile_path
so the existing write_file and v4a patch call sites pick up the new
guard with no API change. The bypass kwarg (``cross_profile=True``)
remains shared between the two guards — same "I know what I'm doing"
escape valve after explicit user direction.

This is the defense-in-depth piece of the proposal in NousResearch#32049 ("any
…/sandboxes/<backend>/…/home/…hermes/… path as sandbox-mirror"). It
catches the host-side speculation case where the agent writes a literal
sandbox-mirror path. The inner-container case (where the bind mount
strips the ``sandboxes/`` prefix from the agent's path view) is out of
scope for this surgical change — that requires either a dispatch-layer
host-side check before the container handoff, or the host-side
``profile_state`` / ``soul`` tool the issue also proposes.

Soft guard, NOT a security boundary — matches the existing
classify_cross_profile_target contract.
@briandevans briandevans force-pushed the fix/file-safety-sandbox-mirror-32049 branch from 168a7b5 to c4661e8 Compare June 1, 2026 04:14
@benbarclay benbarclay merged commit 162c785 into NousResearch:main Jun 2, 2026
23 checks passed
benbarclay added a commit that referenced this pull request Jun 2, 2026
…r path (#32049) (#32407)

* fix(file-safety): extend sandbox-mirror guard to cover inner-container path (#32049)

Brian's shape-based guard (#32213) catches paths that still carry the
full sandboxes/<backend>/<task>/home/.hermes/… prefix on the host side.
The inner-container case is not covered: when file tools execute inside
Docker the bind-mount strips that prefix, so the guard receives plain
/root/.hermes/… and passes through. The root:root ownership on the
divergent SOUL.md in #32049 confirms this is the primary failure mode.

Add a ContextVar (_CONTAINER_HERMES_MIRROR) set by DockerEnvironment
when persistent=True. classify_container_mirror_target / get_container_
mirror_warning detect any write whose resolved path falls under that
prefix, using the same warning format and cross_profile=True bypass
contract as the existing guards. Chain the new guard in
_check_cross_profile_path after the two existing detectors.

* fix(file-safety): derive Docker mirror guard from task

---------

Co-authored-by: Ben <ben@nousresearch.com>
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…k .hermes mirrors (NousResearch#32213)

NousResearch#32049 reports that under terminal.backend: docker, write_file / patch
calls to authoritative profile state (SOUL.md, memories, etc.) land on
the sandbox-local mirror at
``<HERMES_HOME>/profiles/<name>/sandboxes/<backend>/<task>/home/.hermes/...``
— a path the host Hermes process never reads. The tool reports success,
the user sees no behavior change, and on disk two divergent copies of
SOUL.md (or any other profile file) accumulate.

The existing classify_cross_profile_target guard does not catch this:
its parts[2] check sees "sandboxes" and returns None, and the path is
in-profile from the inner-mirror perspective so even a fixed version
would not fire.

Add a parallel sandbox-mirror classifier in agent/file_safety:

  * classify_sandbox_mirror_target() detects the
    ``…/sandboxes/<backend>/<task>/home/.hermes/…`` shape via path parts.
    Detection is path-shape only — backend-agnostic, does not require
    the file to exist, and works regardless of which HERMES_HOME resolves.
  * get_sandbox_mirror_warning() returns a model-facing warning that
    names the mirror root and the inner authoritative path the agent
    likely meant.

Wire both detectors through tools/file_tools._check_cross_profile_path
so the existing write_file and v4a patch call sites pick up the new
guard with no API change. The bypass kwarg (``cross_profile=True``)
remains shared between the two guards — same "I know what I'm doing"
escape valve after explicit user direction.

This is the defense-in-depth piece of the proposal in NousResearch#32049 ("any
…/sandboxes/<backend>/…/home/…hermes/… path as sandbox-mirror"). It
catches the host-side speculation case where the agent writes a literal
sandbox-mirror path. The inner-container case (where the bind mount
strips the ``sandboxes/`` prefix from the agent's path view) is out of
scope for this surgical change — that requires either a dispatch-layer
host-side check before the container handoff, or the host-side
``profile_state`` / ``soul`` tool the issue also proposes.

Soft guard, NOT a security boundary — matches the existing
classify_cross_profile_target contract.

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
Co-authored-by: Ben Barclay <ben@nousresearch.com>
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…r path (NousResearch#32049) (NousResearch#32407)

* fix(file-safety): extend sandbox-mirror guard to cover inner-container path (NousResearch#32049)

Brian's shape-based guard (NousResearch#32213) catches paths that still carry the
full sandboxes/<backend>/<task>/home/.hermes/… prefix on the host side.
The inner-container case is not covered: when file tools execute inside
Docker the bind-mount strips that prefix, so the guard receives plain
/root/.hermes/… and passes through. The root:root ownership on the
divergent SOUL.md in NousResearch#32049 confirms this is the primary failure mode.

Add a ContextVar (_CONTAINER_HERMES_MIRROR) set by DockerEnvironment
when persistent=True. classify_container_mirror_target / get_container_
mirror_warning detect any write whose resolved path falls under that
prefix, using the same warning format and cross_profile=True bypass
contract as the existing guards. Chain the new guard in
_check_cross_profile_path after the two existing detectors.

* fix(file-safety): derive Docker mirror guard from task

---------

Co-authored-by: Ben <ben@nousresearch.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend/docker Docker container execution comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists tool/file File tools (read, write, patch, search) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Docker terminal backend lets file tools write to a sandbox-mirror copy of authoritative profile state (SOUL.md, etc.) — follow-up to #31290

5 participants