fix(file-safety): add sandbox-mirror soft guard for writes to per-task .hermes mirrors (#32049) by briandevans · Pull Request #32213 · NousResearch/hermes-agent

briandevans · 2026-05-25T19:19:17Z

What does this PR do?

Adds a path-shape soft guard that detects when a write_file / patch target lands in the per-task sandbox-mirror of authoritative Hermes state — paths of the shape …/sandboxes/<backend>/<task>/home/.hermes/…. Those writes never make it back to the host process; the agent reports success while the authoritative file is left untouched and a divergent copy accumulates inside the sandbox.

Detection lives next to classify_cross_profile_target in agent/file_safety.py and reuses the same bypass kwarg (cross_profile=True). It mirrors the existing cross-profile guard's contract: soft, defense-in-depth, surfaces a model-facing warning that names the mirror root and the inner authoritative path the agent likely meant.

Scope is the "defense-in-depth" piece of the proposal in #32049 — specifically the path-shape detector for the host-side speculation case. The inner-container case (where the bind mount strips the sandboxes/ prefix from the agent's view) is out of scope: it needs either a dispatch-layer check before the container handoff or the host-side profile_state / soul tool the issue also proposes.

Audited siblings: tools/file_tools._check_cross_profile_path is the single chokepoint both write_file and v4a patch go through. No widening needed at the call-site layer.

Related Issue

Fixes #32049

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

agent/file_safety.py — new classify_sandbox_mirror_target() + get_sandbox_mirror_warning(); pure path-shape detection, backend-agnostic, does not require the file to exist (the agent is about to create the mirror file — that's the bug).
tools/file_tools.py — _check_cross_profile_path() now runs the cross-profile classifier first and falls through to the sandbox-mirror classifier; existing call sites in write_file and v4a patch pick up the new guard with no API change.
tests/agent/test_file_safety_sandbox_mirror.py — 13 new tests covering the exact path shape from the issue, multiple backends (docker / daytona / podman), false-positive guards (plain workspace dirs under sandboxes/, truncated paths, home/<dotfile-not-hermes>), warning format invariants, and orthogonality with the existing cross-profile classifier.

How to Test

uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest \
  tests/agent/test_file_safety_sandbox_mirror.py tests/agent/test_file_safety_cross_profile.py -v

Expected: 29 passed (13 new + 16 existing cross-profile tests, all still green).

Broader sweep on tests/agent/test_file_safety.py + tests/tools/test_file_tools.py: 49 passed.

Note: tests/tools/test_cross_profile_guard.py has 7 pre-existing failures on clean origin/main — the macOS /private/var/... tmp path is matched by _check_sensitive_path before reaching the cross-profile guard, so the assertion expects "cross-profile" in the error but gets "sensitive system path". Not caused by this PR.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run focused tests for the touched code and all pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS 15.x

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — module docstrings on both new functions cover the contract; no user-facing docs touched
I've updated cli-config.yaml.example if I added/changed config keys — N/A (no new config keys)
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — detector uses Path.parts so the path-shape match is portable; macOS-tested locally
I've updated tool descriptions/schemas if I changed tool behavior — N/A (no tool schema changes; guard surfaces as an error string via the existing tool-result channel)

Related / Positioning

fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint #31290 (merged) — classify_cross_profile_target soft guard on file-write tools. This PR is the sibling axis: cross-profile catches writes into another profile's skills/plugins/cron/memories; sandbox-mirror catches writes into a non-local backend's per-task mirror, where the path is in-profile but the mirror is the wrong "world." Both share the cross_profile=True bypass kwarg.
Issue [Bug]: Docker terminal backend lets file tools write to a sandbox-mirror copy of authoritative profile state (SOUL.md, etc.) — follow-up to #31290 #32049 also proposes (a) a host-side soul / profile_state tool and (b) a SKILL.md "Docker backend path boundary" doc section. Both are out of scope here — the first is a feature, the second is best done after this guard exists so the docs can cite it.

Contract Protected

Invariant: any path matching …/sandboxes/<backend>/<task>/home/.hermes/<thing> is classified as a sandbox-mirror target regardless of which Hermes profile is active.
Known-bad inputs: the exact path from the issue (…/profiles/group1/sandboxes/docker/default/home/.hermes/profiles/group1/SOUL.md), plus parametrized backends (docker, daytona, podman) and inner files (SOUL.md, memories/MEMORY.md, cron/jobs.json, .env).
Future-input coverage: backend name is not whitelisted; any new non-local backend that follows the established sandboxes/<name>/<task>/home/ layout is covered automatically.
Negative cases: plain sandboxes/<backend>/<task>/workspace/main.py (non-Hermes), sandboxes/<backend>/<task>/home/.bashrc (home but not .hermes), and truncated sandboxes/<backend>/<task> (no home/.hermes/<thing> tail) all return None and the existing cross-profile test (test_hermes_config_not_classified_as_cross_profile) is preserved.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new “sandbox-mirror” soft-guard to prevent writes into per-task .hermes mirrors created by non-local terminal backends (e.g., Docker/Daytona), and wires it into the existing cross-profile write warning path.

Changes:

Extend _check_cross_profile_path to also detect and warn on sandbox-mirror targets.
Implement sandbox-mirror path-shape detection and a model-facing warning message.
Add a dedicated pytest suite validating sandbox-mirror classification/warning behavior and orthogonality vs cross-profile detection.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
tools/file_tools.py	Integrates sandbox-mirror detection into the existing cross-profile warning hook and updates its docstring.
agent/file_safety.py	Adds sandbox-mirror classifier + warning generator used by file write guards.
tests/agent/test_file_safety_sandbox_mirror.py	Adds tests covering the new sandbox-mirror detection and warning behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        assert result["mirror_root"].endswith(
+            "sandboxes/docker/default/home/.hermes"
+        )
+        assert result["inner_path"] == "profiles/group1/SOUL.md"


+        assert result["inner_path"] == inner
+        assert backend in result["mirror_root"]


+        assert "sandboxes/docker/default/home/.hermes" in warn
+        # Must hint at what the agent likely meant.
+        assert "profiles/group1/SOUL.md" in warn


 def _check_cross_profile_path(filepath: str, task_id: str = "default") -> str | None:
-    """Return a cross-profile warning string when ``filepath`` lands in
-    another Hermes profile's skills/plugins/cron/memories directory.
-
-    Returns ``None`` when the write is in-scope (same profile) or outside
-    Hermes scope entirely. Soft guard — the agent can override by passing
-    ``cross_profile=True`` to its write tool after explicit user direction.
-
-    Defense-in-depth, NOT a security boundary — the terminal tool runs
-    as the same OS user and can write any of these paths directly.
-    See ``agent/file_safety.classify_cross_profile_target`` for the
-    detection rules.
+    """Return a soft-guard warning when ``filepath`` lands in another Hermes
+    profile's scoped area or a sandbox-mirror of authoritative profile state.


+
+    Two detectors run in order:
+
+    * cross-profile (#TBD) — writes that hit another profile's


whyhkzk · 2026-05-26T04:07:26Z

This closes the host-side speculation case nicely. One gap: when write_file executes inside the Docker container (the actual failure mode in #32049 — evidenced by root:root ownership on the divergent file), the guard receives /root/.hermes/… with no sandboxes/ segment and passes through. A ContextVar set by DockerEnvironment.init when persistent=True would let classify_sandbox_mirror_target catch this second path with the same warning and bypass contract. Working on a follow-up PR for this + a SKILL.md path-boundary section — happy to fold it here instead if that's cleaner.

…k .hermes mirrors NousResearch#32049 reports that under terminal.backend: docker, write_file / patch calls to authoritative profile state (SOUL.md, memories, etc.) land on the sandbox-local mirror at ``<HERMES_HOME>/profiles/<name>/sandboxes/<backend>/<task>/home/.hermes/...`` — a path the host Hermes process never reads. The tool reports success, the user sees no behavior change, and on disk two divergent copies of SOUL.md (or any other profile file) accumulate. The existing classify_cross_profile_target guard does not catch this: its parts[2] check sees "sandboxes" and returns None, and the path is in-profile from the inner-mirror perspective so even a fixed version would not fire. Add a parallel sandbox-mirror classifier in agent/file_safety: * classify_sandbox_mirror_target() detects the ``…/sandboxes/<backend>/<task>/home/.hermes/…`` shape via path parts. Detection is path-shape only — backend-agnostic, does not require the file to exist, and works regardless of which HERMES_HOME resolves. * get_sandbox_mirror_warning() returns a model-facing warning that names the mirror root and the inner authoritative path the agent likely meant. Wire both detectors through tools/file_tools._check_cross_profile_path so the existing write_file and v4a patch call sites pick up the new guard with no API change. The bypass kwarg (``cross_profile=True``) remains shared between the two guards — same "I know what I'm doing" escape valve after explicit user direction. This is the defense-in-depth piece of the proposal in NousResearch#32049 ("any …/sandboxes/<backend>/…/home/…hermes/… path as sandbox-mirror"). It catches the host-side speculation case where the agent writes a literal sandbox-mirror path. The inner-container case (where the bind mount strips the ``sandboxes/`` prefix from the agent's path view) is out of scope for this surgical change — that requires either a dispatch-layer host-side check before the container handoff, or the host-side ``profile_state`` / ``soul`` tool the issue also proposes. Soft guard, NOT a security boundary — matches the existing classify_cross_profile_target contract.

…r path (#32049) (#32407) * fix(file-safety): extend sandbox-mirror guard to cover inner-container path (#32049) Brian's shape-based guard (#32213) catches paths that still carry the full sandboxes/<backend>/<task>/home/.hermes/… prefix on the host side. The inner-container case is not covered: when file tools execute inside Docker the bind-mount strips that prefix, so the guard receives plain /root/.hermes/… and passes through. The root:root ownership on the divergent SOUL.md in #32049 confirms this is the primary failure mode. Add a ContextVar (_CONTAINER_HERMES_MIRROR) set by DockerEnvironment when persistent=True. classify_container_mirror_target / get_container_ mirror_warning detect any write whose resolved path falls under that prefix, using the same warning format and cross_profile=True bypass contract as the existing guards. Chain the new guard in _check_cross_profile_path after the two existing detectors. * fix(file-safety): derive Docker mirror guard from task --------- Co-authored-by: Ben <ben@nousresearch.com>

…k .hermes mirrors (NousResearch#32213) NousResearch#32049 reports that under terminal.backend: docker, write_file / patch calls to authoritative profile state (SOUL.md, memories, etc.) land on the sandbox-local mirror at ``<HERMES_HOME>/profiles/<name>/sandboxes/<backend>/<task>/home/.hermes/...`` — a path the host Hermes process never reads. The tool reports success, the user sees no behavior change, and on disk two divergent copies of SOUL.md (or any other profile file) accumulate. The existing classify_cross_profile_target guard does not catch this: its parts[2] check sees "sandboxes" and returns None, and the path is in-profile from the inner-mirror perspective so even a fixed version would not fire. Add a parallel sandbox-mirror classifier in agent/file_safety: * classify_sandbox_mirror_target() detects the ``…/sandboxes/<backend>/<task>/home/.hermes/…`` shape via path parts. Detection is path-shape only — backend-agnostic, does not require the file to exist, and works regardless of which HERMES_HOME resolves. * get_sandbox_mirror_warning() returns a model-facing warning that names the mirror root and the inner authoritative path the agent likely meant. Wire both detectors through tools/file_tools._check_cross_profile_path so the existing write_file and v4a patch call sites pick up the new guard with no API change. The bypass kwarg (``cross_profile=True``) remains shared between the two guards — same "I know what I'm doing" escape valve after explicit user direction. This is the defense-in-depth piece of the proposal in NousResearch#32049 ("any …/sandboxes/<backend>/…/home/…hermes/… path as sandbox-mirror"). It catches the host-side speculation case where the agent writes a literal sandbox-mirror path. The inner-container case (where the bind mount strips the ``sandboxes/`` prefix from the agent's path view) is out of scope for this surgical change — that requires either a dispatch-layer host-side check before the container handoff, or the host-side ``profile_state`` / ``soul`` tool the issue also proposes. Soft guard, NOT a security boundary — matches the existing classify_cross_profile_target contract. Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Co-authored-by: Ben Barclay <ben@nousresearch.com>

…r path (NousResearch#32049) (NousResearch#32407) * fix(file-safety): extend sandbox-mirror guard to cover inner-container path (NousResearch#32049) Brian's shape-based guard (NousResearch#32213) catches paths that still carry the full sandboxes/<backend>/<task>/home/.hermes/… prefix on the host side. The inner-container case is not covered: when file tools execute inside Docker the bind-mount strips that prefix, so the guard receives plain /root/.hermes/… and passes through. The root:root ownership on the divergent SOUL.md in NousResearch#32049 confirms this is the primary failure mode. Add a ContextVar (_CONTAINER_HERMES_MIRROR) set by DockerEnvironment when persistent=True. classify_container_mirror_target / get_container_ mirror_warning detect any write whose resolved path falls under that prefix, using the same warning format and cross_profile=True bypass contract as the existing guards. Chain the new guard in _check_cross_profile_path after the two existing detectors. * fix(file-safety): derive Docker mirror guard from task --------- Co-authored-by: Ben <ben@nousresearch.com>

Copilot AI review requested due to automatic review settings May 25, 2026 19:19

Copilot AI reviewed May 25, 2026

View reviewed changes

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder tool/file File tools (read, write, patch, search) backend/docker Docker container execution labels May 25, 2026

whyhkzk mentioned this pull request May 26, 2026

fix(file-safety): extend sandbox-mirror guard to cover inner-container path (#32049) #32407

Merged

briandevans force-pushed the fix/file-safety-sandbox-mirror-32049 branch 7 times, most recently from 6bf3e7f to 168a7b5 Compare May 31, 2026 12:34

briandevans force-pushed the fix/file-safety-sandbox-mirror-32049 branch from 168a7b5 to c4661e8 Compare June 1, 2026 04:14

Merge branch 'main' into fix/file-safety-sandbox-mirror-32049

96e408e

benbarclay merged commit 162c785 into NousResearch:main Jun 2, 2026
23 checks passed

benbarclay mentioned this pull request Jun 2, 2026

chore(release): add whyhkzk to AUTHOR_MAP for PR #32407 #37121

Merged

github-actions Bot mentioned this pull request Jun 6, 2026

chore: bump NousResearch/hermes-agent version from v2026.5.29.2 to v2026.6.5 Docker-Hub-sirmark/docker-hermes-agent#9

Merged

sadeegamhewa mentioned this pull request Jun 10, 2026

Honcho memory prefetch hang on fresh CLI subprocess in v0.15.0 (regression from #27190) #34070

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(file-safety): add sandbox-mirror soft guard for writes to per-task .hermes mirrors (#32049)#32213

fix(file-safety): add sandbox-mirror soft guard for writes to per-task .hermes mirrors (#32049)#32213
benbarclay merged 2 commits into
NousResearch:mainfrom
briandevans:fix/file-safety-sandbox-mirror-32049

briandevans commented May 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

whyhkzk commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		assert result["inner_path"] == inner
		assert backend in result["mirror_root"]


		Two detectors run in order:

		* cross-profile (#TBD) — writes that hit another profile's

Conversation

briandevans commented May 25, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Related / Positioning

Contract Protected

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

whyhkzk commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants