Security: read_file can exfiltrate credentials from auth.json and .anthropic_oauth.json #17656

@haolxx

Summary

The agent's read_file tool is sandboxed to HERMES_HOME (typically ~/.hermes or /opt/data in containerized deploys). Inside that scope, agent/file_safety.py:get_read_block_error deny-lists skills/.hub/ but nothing else. That leaves credential-pool files — auth.json (provider OAuth state + plaintext API keys) and .anthropic_oauth.json (Anthropic PKCE tokens) — fully readable by the agent. A prompt-injection attack reaching read_file can exfiltrate active provider credentials in plaintext.

Reproducer

Tested against the current main branch (commit e63929d4).

>>> from agent.file_safety import get_read_block_error
>>> get_read_block_error("/opt/data/auth.json")
>>> # Returns None — read is allowed.

Concretely, on a running deployment with DEEPSEEK_API_KEY set in the process environment, auth.json materializes at ${HERMES_HOME}/auth.json with mode 0600 and holds the active key as plaintext in the credential_pool.deepseek[].access_token field. Mode 0600 prevents other Unix users from reading the file, but the agent itself runs as the file's owner, so read_file is unaffected.
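To illustrate why mode 0600 does not help here, the following is a minimal standalone sketch, not taken from the Hermes codebase. It writes a placeholder file shaped like the credential_pool.deepseek[].access_token path described above, restricts it to 0600, and reads it back from the same process. The function name, the temporary directory, and the token value are all hypothetical.

```python
import json
import os
import stat
import tempfile
from typing import Tuple


def owner_can_read_0600() -> Tuple[int, str]:
    """Create a 0600 file as the current user and read it back.

    Returns the file's permission bits and the token that was read.
    """
    with tempfile.TemporaryDirectory() as home:  # stand-in for HERMES_HOME
        auth_path = os.path.join(home, "auth.json")
        # Placeholder payload; only the key path comes from the issue text.
        payload = {"credential_pool": {"deepseek": [{"access_token": "sk-placeholder"}]}}
        with open(auth_path, "w") as fh:
            json.dump(payload, fh)
        os.chmod(auth_path, 0o600)  # owner read/write only

        mode = stat.S_IMODE(os.stat(auth_path).st_mode)
        # The reader is the same user that owns the file, so this succeeds.
        with open(auth_path) as fh:
            token = json.load(fh)["credential_pool"]["deepseek"][0]["access_token"]
        return mode, token
```

On a POSIX system the read succeeds even though group and other have no access, which is the situation the agent is in when it calls read_file.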

Suggested fix

Extend get_read_block_error to also block reads of ${HERMES_HOME}/auth.json, ${HERMES_HOME}/auth.lock, and ${HERMES_HOME}/.anthropic_oauth.json. This follows the same pattern as the existing skills/.hub/ deny: a pure path check with no I/O. The function would return a "credential store, cannot be read directly" error message so that the agent (and humans reading the trace) understand the boundary.

The agent doesn't need to read its own credentials — provider tools (auxiliary_client, credential_pool) consume them through process env / OAuth flows that bypass read_file.
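A rough sketch of what such a check might look like is below. It is not the actual patch: the helper name credential_read_block_error, the CREDENTIAL_FILENAMES constant, the hermes_home parameter, and the ~/.hermes default lookup are assumptions made for illustration, and the real code in agent/file_safety.py may be organized differently.

```python
import os
from typing import Optional

# Hypothetical constant: the three files named in the suggested fix.
CREDENTIAL_FILENAMES = ("auth.json", "auth.lock", ".anthropic_oauth.json")


def _normalize(path: str) -> str:
    # abspath collapses "." and ".." lexically; it does not open or stat the file.
    return os.path.abspath(os.path.expanduser(path))


def credential_read_block_error(path: str, hermes_home: Optional[str] = None) -> Optional[str]:
    """Return an error message if `path` is a credential store, else None.

    Assumption: HERMES_HOME comes from the environment and defaults to ~/.hermes.
    """
    home = _normalize(hermes_home or os.environ.get("HERMES_HOME", "~/.hermes"))
    candidate = _normalize(path)
    for name in CREDENTIAL_FILENAMES:
        if candidate == os.path.join(home, name):
            return f"{name} is a credential store, cannot be read directly"
    return None
```

Because the comparison is purely lexical, a path such as /opt/data/skills/../auth.json is caught, but a symlink elsewhere in HERMES_HOME that points at auth.json would not be. Catching that case would require resolving the path on disk, which gives up the no-I/O property.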

Materials we have ready

We're running this fix as a local Dockerfile-overlay patch in production. Happy to send it as a PR if useful — it's:

  • ~30 lines added to agent/file_safety.py (one helper + one extra branch in get_read_block_error).
  • 7 unit tests covering the deny-list, the existing skills/.hub regression, path-traversal resolution, and the negative case (arbitrary HERMES_HOME files remain readable). All pass against main with the patch applied.
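For reference, the four test categories listed above could be sketched along these lines. These are not the seven tests from the patch. The check function is a hypothetical stand-in for the patched get_read_block_error, HOME is an assumed HERMES_HOME, and treating skills/.hub as a directory-prefix deny is also an assumption.

```python
import os
from typing import Optional

HOME = "/opt/data"  # assumed HERMES_HOME for these tests
_CREDENTIALS = {os.path.join(HOME, n) for n in ("auth.json", "auth.lock", ".anthropic_oauth.json")}
_HUB = os.path.join(HOME, "skills", ".hub")


def check(path: str) -> Optional[str]:
    """Hypothetical stand-in; swap in agent.file_safety.get_read_block_error."""
    p = os.path.abspath(path)  # lexical normalization, no file access
    if p == _HUB or p.startswith(_HUB + os.sep):
        return "skills hub cannot be read directly"
    if p in _CREDENTIALS:
        return "credential store, cannot be read directly"
    return None


def test_credential_files_blocked():
    for name in ("auth.json", "auth.lock", ".anthropic_oauth.json"):
        assert check(os.path.join(HOME, name)) is not None


def test_skills_hub_still_blocked():
    assert check(os.path.join(HOME, "skills", ".hub", "index.json")) is not None


def test_traversal_is_resolved():
    assert check(os.path.join(HOME, "skills", "..", "auth.json")) is not None


def test_other_files_readable():
    assert check(os.path.join(HOME, "notes.md")) is None
    assert check(os.path.join(HOME, "sub", "auth.json")) is None
```

The functions follow pytest naming so they would be collected automatically. Against the real module, HOME would more likely be set per test through the environment than hard-coded.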

Let me know if you'd like the PR or if you'd prefer a different shape (e.g. extending tools/credential_files.py to centrally register these paths so other readers benefit too).

Metadata

Assignees

No one assigned

Labels

  • P0: Critical (data loss, security, crash loop)
  • area/auth: Authentication, OAuth, credential pools
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • tool/file: File tools (read, write, patch, search)
  • type/security: Security vulnerability or hardening
