fix(file-safety): refuse read_file on credential paths (SSH keys, .env, …) (#16809)#16851
fix(file-safety): refuse read_file on credential paths (SSH keys, .env, …) (#16809)#16851briandevans wants to merge 3 commits into
Conversation
…v, …) `read_file_tool` had no sensitive-path deny list. SSH private keys, `~/.hermes/.env`, ~/.aws credentials, ~/.gnupg secrings, ~/.kube/config, and shell history were all freely readable, leaving content-redaction (off by default since 7375341) as the only line of defense — and pattern-based redaction is best-effort by design. Add a read-side deny list as the access-control floor: * `agent/file_safety.py` gains `build_read_denied_paths()`, `build_read_denied_prefixes()`, and `is_read_denied()` mirroring the write-side helpers. The path set covers credential files (SSH keys, `.netrc`, `.pgpass`, `.npmrc`, `.pypirc`, the active HERMES_HOME `.env`) and shell history (`.bash_history`, `.zsh_history`, `.psql_history`); the prefix list covers `.ssh`, `.aws`, `.gnupg`, `.kube`, `.docker`, `.azure`, and `.config/gh`. * `read_file_tool` calls `is_read_denied()` after the existing device / binary / Hermes-internal guards and returns a structured error pointing the model at the `terminal` tool (which gates on user approval) when a credential read is genuinely intentional. Scope notes: * `/etc/passwd` is intentionally not blocked — it's world-readable user metadata, not a secret, and `/etc/shadow` is already unreadable to non-root callers. * Shell config files (`.bashrc`, `.zshrc`, …) are write-denied but *read*-allowed: legitimate debugging cases (PATH issues, alias setup) outweigh the risk, and inline `export FOO=key` strings remain covered by `redact_sensitive_text` when the user opts in. * Symlink shims are caught: `is_read_denied` resolves the input via `os.path.realpath` before checking, matching the write-side behavior. Refs NousResearch#16809. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds an access-control “floor” for read_file_tool to prevent the model from reading high-value credential paths (SSH keys, .env, shell history, and common credential directories), complementing the existing write-side protections.
Changes:
- Introduce read-side denylist helpers (
build_read_denied_paths/prefixes,is_read_denied) inagent/file_safety.py. - Enforce the read denylist in
tools/file_tools.py::read_file_toolbefore dedup/caching returns content. - Add a new test suite validating deny/allow behavior and the end-to-end
read_file_tooldenial response.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
tools/file_tools.py |
Applies is_read_denied() inside read_file_tool to block sensitive reads with a structured error. |
agent/file_safety.py |
Adds read-side denylist construction + is_read_denied() implementation. |
tests/tools/test_read_deny.py |
New tests covering exact-path denies, prefix denies, allowed paths, and read_file_tool denial behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if is_read_denied(path): | ||
| return json.dumps({ | ||
| "error": ( |
There was a problem hiding this comment.
is_read_denied() is called with the raw path string. This can be bypassed with task-relative paths (e.g. if the terminal cwd is $HOME, reading .ssh/id_ed25519 would resolve to a sensitive file for file_ops.read_file, but is_read_denied() will resolve the relative path against the Python process cwd instead). Use the same resolved path you already computed (_resolved/resolved_str) for the deny check (and ideally add a regression test that sets TERMINAL_CWD and uses a relative .ssh/... path).
| from agent.file_safety import is_read_denied | ||
| from agent.redact import redact_sensitive_text |
There was a problem hiding this comment.
This adds a second import from agent.file_safety (is_read_denied) even though the file already imports get_read_block_error from the same module above. Consider importing both names in the existing import to avoid duplication and keep imports grouped consistently.
| Mirrors build_write_denied_paths plus shell history files. Pattern-based | ||
| output redaction (agent/redact.py) is best-effort and was made off by | ||
| default; this list is the access-control floor for high-value credential | ||
| files where leakage is unrecoverable (private keys, .env, history). |
There was a problem hiding this comment.
The docstring says this "Mirrors build_write_denied_paths plus shell history files", but the sets are not actually symmetric (e.g. this includes id_ecdsa/id_dsa while the write-deny exact list currently doesn’t, and the write-deny list includes /etc/* paths that this intentionally omits). Please reword to reflect the actual intent (credential files under $HOME + hermes .env + history), rather than claiming it mirrors the write list.
| Mirrors build_write_denied_paths plus shell history files. Pattern-based | |
| output redaction (agent/redact.py) is best-effort and was made off by | |
| default; this list is the access-control floor for high-value credential | |
| files where leakage is unrecoverable (private keys, .env, history). | |
| Covers high-value credential files under ``home``, the Hermes ``.env``, | |
| and shell/database history files. Pattern-based output redaction | |
| (agent/redact.py) is best-effort and was made off by default; this list | |
| is the access-control floor for files where leakage is unrecoverable | |
| (private keys, .env, history). |
…ring Address Copilot review on PR NousResearch#16851: **Finding 1 (`tools/file_tools.py:490` — task-relative path bypass):** ``read_file_tool`` previously called ``is_read_denied(path)`` with the *raw* string. ``is_read_denied`` resolves relative paths against the Python process cwd via ``os.path.realpath``, NOT the terminal cwd that ``_resolve_path_for_task`` uses. So an agent whose terminal cwd was ``$HOME`` could pass ``".ssh/id_ed25519"``, hit the resolver, get back ``$HOME/.ssh/id_ed25519`` — but ``is_read_denied`` would resolve the same string against, say, ``/tmp/runner``, miss the deny list, and let the read through. Pass the already-resolved path (``str(_resolved)``) to ``is_read_denied`` so the security check sees the same path the file system sees. New ``test_task_relative_ssh_key_path_does_not_bypass_guard`` covers the bypass; without the fix the test reads back the SSH key contents, with the fix it returns the credential-deny error. **Finding 2 (`tools/file_tools.py:21` — duplicate file_safety import):** Folded ``is_read_denied`` into the existing ``from agent.file_safety import get_read_block_error`` line. **Finding 3 (`agent/file_safety.py:99` — misleading "mirrors" docstring):** Replaced the "Mirrors build_write_denied_paths" claim with a precise list of the asymmetries: read-deny adds shell-history files and older SSH key types (``id_ecdsa`` / ``id_dsa``) that the write list omits, and intentionally drops ``/etc/*`` paths (those are useful to read for debugging; the truly sensitive ones are already root-only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@copilot All three findings addressed in commit Finding 1 ( Added regression test Finding 2 ( Finding 3 ( All 27 tests in |
…orkers
The ``test_task_relative_ssh_key_path_does_not_bypass_guard`` regression
test passed locally and on a fresh CI runner but failed on Linux CI when
the same xdist worker had previously run a terminal-tool test:
AssertionError: 'sensitive credential path' in 'File not found: .ssh/id_ed25519'
Root cause: ``_resolve_path_for_task`` prefers the live tracking cwd
(``_file_ops_cache`` / ``_active_environments`` lookups via
``_resolve_container_task_id``) over the ``TERMINAL_CWD`` env var. A
warm worker carries those caches across tests; my ``test_relative_ssh``
task_id was being resolved against a stale ``"default"`` entry pointing
at the runner workspace, so the resolved path landed at
``/home/runner/.ssh/id_ed25519`` (no such file) instead of the
test-tmp ``$HOME/.ssh/id_ed25519`` we created.
Fix: monkeypatch ``_get_live_tracking_cwd`` to return ``None`` for the
duration of the test, forcing the predictable ``TERMINAL_CWD`` branch.
Also use a unique task_id so we don't share state with sibling tests.
This is purely a test-stability fix — the production guard added in
a1d0427 is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
CI follow-up — |
|
CI audit — all 26
Same baseline set is documented on adjacent open PRs (#16786, #16734, #16666). The new |
Summary
build_write_denied_paths()inagent/file_safety.pyand haveread_file_toolconsult it before performing the read..env, shell history, and credential directories (.ssh,.aws,.gnupg,.kube,.docker,.azure,.config/gh) from being read by the model.The bug
read_filehad no sensitive-path deny list. The reproduction in #16809 — and on a fresh checkout ofmain— succeeds today:The only defense was pattern-based output redaction via
redact_sensitive_text(), and that path was made off by default in73753417("feat(security): make secret redaction off by default", #16794) and uses a module-level_REDACT_ENABLEDsnapshotted at import time — so config-driven opt-in is fragile, and well-known credential prefixes are the only thing recognised even when redaction is on.Write-side protection exists (
build_write_denied_paths+_check_sensitive_path) but has no read-side counterpart.The fix
Add an access-control floor for high-value credential files where leakage is unrecoverable.
agent/file_safety.pygains three new helpers:build_read_denied_paths(home)— exact paths: SSH keys (id_rsa/id_ed25519/id_ecdsa/id_dsa/authorized_keys/config),.netrc,.pgpass,.npmrc,.pypirc, the activeHERMES_HOME / .env(resolved throughget_hermes_home()so profile-aware), and shell history (.bash_history,.zsh_history,.psql_history).build_read_denied_prefixes(home)— directory prefixes:~/.ssh,~/.aws,~/.gnupg,~/.kube,~/.docker,~/.azure,~/.config/gh.is_read_denied(path)— symmetric helper. Resolves throughos.path.realpathbefore checking, so symlink shims are caught.tools/file_tools.py—read_file_toolcallsis_read_denied(path)after the existing device / binary / Hermes-internal guards (so denied paths can't return cached content from the dedup tracker either) and returns a structured error message pointing the model at theterminaltool when a credential read is genuinely intentional.Intentional scope decisions
/etc/passwdis not blocked — world-readable user metadata, not a secret./etc/shadowis already unreadable to non-root callers, so blocking it would be cosmetic..bashrc,.zshrc,.profile, …) is write-denied but read-allowed: debugging PATH issues / alias setup is a legitimate case, and inlineexport FOO=keystrings remain covered byredact_sensitive_textwhensecurity.redact_secretsis on.terminaltool, which gates on user approval — exactly the layer this guard is designed to defer to.Test plan
tests/tools/test_read_deny.py(new): 26 tests covering exact paths, prefixes, allowed paths, and end-to-endread_file_tooldenial. All pass.tests/tools/test_write_deny.py,tests/tools/test_file_read_guards.py,tests/tools/test_file_operations.py,tests/tools/test_read_loop_detection.py,tests/tools/test_file_write_safety.py,tests/agent/test_copilot_acp_client.py. The 6test_file_read_guards.py::TestFileDedup/TestWriteInvalidatesDedupfailures and the 1test_copilot_acp_client.py::test_read_text_file_redacts_sensitive_contentfailure reproduce on cleanorigin/main(8081425) — they're pre-existing macOS/private/varwrite-deny collision and the redaction-off-by-default regression respectively, unrelated to this change.Related