docs(security): rewrite policy around OS-level isolation as the boundary#20317

Merged
jquesnelle merged 2 commits into
mainfrom
meta/security-policy
May 11, 2026

@jquesnelle

Collaborator

This is a proposed rewrite of the core security policy of Hermes Agent. It outlines the trust model that the agent operates under, and the processes for security vulnerability reporting. The key pieces of it are:

  • Restate the trust model from first principles: the OS is the only load-bearing boundary against an adversarial LLM
  • Distinguish terminal-backend isolation from whole-process wrapping
  • Name in-process components (approval gate, output redaction, Skills Guard) as heuristics, and the class of reports that defeat them as out of scope under this policy while explicitly welcoming them as regular issues or PRs

This creates a much narrower scope of what constitutes a security vulnerability vs. what can go through the normal PR process. It also gives a firmer commitment on what really can be guaranteed at the various trust boundaries.

We'd like to gather community feedback on adopting this new security policy. Please leave your comments below!

Restate the trust model from first principles: the OS is the only
load-bearing boundary against an adversarial LLM. Distinguish
terminal-backend isolation (sandboxes the shell tool) from
whole-process wrapping (sandboxes the agent itself, reference
deployment NVIDIA OpenShell). Name in-process components (approval
gate, output redaction, Skills Guard) as heuristics, and the class
of reports that defeat them as out of scope under this policy —
while explicitly welcoming them as regular issues or PRs.

Introduce 'agent-loaded content' as the narrow, honest commitment:
attacker-influenced input must not chain into a write the agent
later loads on its own initiative.

Strip implementation-detail enumerations (backend names, adapter
names, config keys, env vars, internal symbols) so the doc stays
evergreen as code evolves.
alt-glitch added labels on May 5, 2026: type/security (Security vulnerability or hardening), comp/agent (Core agent loop, run_agent.py, prompt builder), P3 (Low: cosmetic, nice to have)
@vominh1919

Contributor

Thank you for this rewrite — the shift to "OS is the only load-bearing boundary" is a much clearer mental model than the previous policy, and naming in-process components as heuristics explicitly sets the right expectations for reporters and operators.

I have a few observations from reading the policy against the codebase:

1. External-surface allowlist enforcement (§2.6, rule 2)

Adapters must refuse to dispatch agent work, resolve approvals, or relay output until an allowlist is set. Code paths that fail open when no allowlist is configured are code bugs in scope under §3.1.

This is a strong and testable contract. Has the team audited the current adapters against this rule? I ask because some adapters (e.g. api_server) configure auth via api_key in the config block, and the gateway's PlatformConfig.from_dict() reads from the extra dict — if a user writes api_key at the top level instead of under extra, the adapter silently falls back to no-auth (see #20501). That's arguably a "fail open when no allowlist is configured" code path. If the team agrees this falls under §3.1, it could be worth calling out as an example in the policy or in a follow-up issue.
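To make the fail-closed expectation concrete, here is a minimal sketch of what rejecting a misplaced key could look like. It is an illustration only: `AdapterConfig`, `MissingAuthError`, and the exact checks are hypothetical names I made up, not the real `PlatformConfig` code, and I'm assuming the only credential involved is `api_key` under `extra`.

```python
from dataclasses import dataclass, field
from typing import Any


class MissingAuthError(ValueError):
    """Raised when an adapter would otherwise start without authentication."""


@dataclass
class AdapterConfig:
    # Hypothetical stand-in for a platform config; not the real PlatformConfig.
    name: str
    extra: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, name: str, raw: dict[str, Any]) -> "AdapterConfig":
        extra = dict(raw.get("extra") or {})
        # Fail closed on a misplaced key instead of silently ignoring it.
        if "api_key" in raw and "api_key" not in extra:
            raise MissingAuthError(
                f"{name}: 'api_key' found at top level; move it under 'extra'"
            )
        # Fail closed when no credential is configured at all.
        if not extra.get("api_key"):
            raise MissingAuthError(f"{name}: no api_key configured; refusing to start")
        return cls(name=name, extra=extra)
```

With a check like this, `AdapterConfig.from_dict("api_server", {"api_key": "..."})` raises instead of starting an unauthenticated adapter, which is the behavior I read rule 2 as requiring.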

2. MCP trust boundary (§2.3 vs §2.4)

The policy says MCP subprocesses receive a filtered environment (credential scrubbing), but doesn't explicitly classify MCP servers as either a boundary or a heuristic. From the code:

  • tools/mcp_tool.py runs MCP servers as host subprocesses with _build_safe_env()
  • OSV malware checking happens for npx/uvx packages
  • MCP servers can register tools that the LLM calls directly

Under terminal-backend isolation, MCP servers run on the host (not inside the sandbox), so they're inside the trust envelope. Under whole-process wrapping, they'd be confined. It might be worth a one-liner in §2.2 or §2.3 clarifying where MCP servers sit relative to the two postures, since "MCP server" is a term operators will encounter frequently and the current policy leaves it implicit.
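For readers less familiar with the pattern, environment filtering for a subprocess can be sketched roughly as below. This is not the actual `_build_safe_env()` implementation: the allowlist contents and the names `SAFE_ENV_KEYS`, `build_filtered_env`, and `spawn_server` are assumptions for illustration.

```python
import os
import subprocess

# Hypothetical allowlist; the real _build_safe_env() may use different rules.
SAFE_ENV_KEYS = frozenset({"PATH", "HOME", "LANG", "LC_ALL", "TMPDIR"})


def build_filtered_env(parent_env, allowed=SAFE_ENV_KEYS):
    """Return a new dict holding only the allowlisted variables."""
    return {key: value for key, value in parent_env.items() if key in allowed}


def spawn_server(command, parent_env=None):
    """Start a server process that sees only the filtered environment."""
    env = build_filtered_env(os.environ if parent_env is None else parent_env)
    return subprocess.Popen(command, env=env)
```

The filter keeps credentials in environment variables away from the child, but the child still runs on the host with the same filesystem and network access as the agent. That is why I think the policy should say explicitly which side of the boundary MCP servers sit on in each posture.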

3. Skill import-time execution (§2.4)

Reviewing a skill means reading its Python code and scripts, not just its SKILL.md description — skills execute arbitrary Python at import time.

This is an important statement. For operators who want to do this review, it might help to name the specific code path: skill_commands.py scans ~/.hermes/skills/ and injects the skill as a user message, but the Python import happens via the tool discovery chain (_discover_tools() in model_tools.py). Knowing when the arbitrary code runs (at import during tool discovery, not at invocation) tells an operator what to audit and when.
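The import-versus-invocation distinction is a general Python property and can be shown without any Hermes code. In the sketch below, the skill source, the `DEMO_SKILL_IMPORTED` variable, and both helper functions are hypothetical; only the `importlib` calls are real standard-library APIs.

```python
import importlib.util
import os
import tempfile
from pathlib import Path

# Hypothetical skill source. The os.environ write stands in for arbitrary
# code; it runs when the module is imported, not when run() is called.
SKILL_SOURCE = """\
import os
os.environ["DEMO_SKILL_IMPORTED"] = "1"

def run():
    return "invoked"
"""


def load_module(path, name):
    """Import a module from an explicit file path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # top-level statements execute here
    return module


def side_effect_fires_on_import():
    """Return True if merely importing the skill triggered its side effect."""
    os.environ.pop("DEMO_SKILL_IMPORTED", None)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_skill.py"
        path.write_text(SKILL_SOURCE)
        load_module(path, "demo_skill")  # note: run() is never called
    return os.environ.get("DEMO_SKILL_IMPORTED") == "1"
```

Because the side effect fires inside `exec_module`, an audit that only reads the functions a skill exposes would miss it; the module's top level is what needs reviewing.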

4. Minor: OpenShell setup reference

The policy references NVIDIA OpenShell as a supported whole-process wrapping option, but there's no link to a Hermes-specific setup guide. If the integration is production-ready, a short "see docs/openshell.md" pointer would help operators actually adopt it. If it's aspirational, a note like "integration in progress" would set expectations.

5. Suggestion: concrete examples for §3.2

The out-of-scope section is clear in principle, but a few concrete examples would help community contributors calibrate before filing:

| Scenario | Why it's out of scope |
|---|---|
| LLM emits a malicious URL via prompt injection and the agent fetches it | Prompt injection alone; no §3.1 boundary crossed |
| Approval gate regex bypassed by obfuscated shell command | In-process heuristic; not a boundary |
| Skill reads ~/.hermes/.env at import time | Inside trust envelope; operator should have reviewed before install |

These don't need to be in the policy itself — a SECURITY-FAQ.md or a pinned issue could serve the same purpose.
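As an illustration of the approval-gate scenario, here is a toy pattern check and one input that slips past it. The regex and the function name `needs_approval` are hypothetical and far simpler than any real gate; the point is only that string matching on shell text is a heuristic.

```python
import re

# Hypothetical deny pattern; the real approval gate's rules are not shown here.
DANGEROUS = re.compile(r"\brm\s+-rf\b")


def needs_approval(command: str) -> bool:
    """Flag commands that match the deny pattern."""
    return bool(DANGEROUS.search(command))


plain = "rm -rf /tmp/project"
# A POSIX shell removes the empty quote pairs, so this runs the same command.
obfuscated = "r''m -r''f /tmp/project"
```

Here `needs_approval(plain)` is true while `needs_approval(obfuscated)` is false, even though a shell would execute both identically. Under the new policy that is a normal issue or PR against the heuristic, not a security report.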


Overall this is a strong policy rewrite. The explicit "heuristic vs boundary" distinction will save both the team and reporters significant time triaging reports. Happy to help with any of the above if the team agrees on direction.

jquesnelle merged commit bf2cc8b into main May 11, 2026
7 checks passed
jquesnelle deleted the meta/security-policy branch May 11, 2026 05:36
JinyuID pushed a commit to JinyuID/hermes-agent that referenced this pull request May 11, 2026
…-policy

docs(security): rewrite policy around OS-level isolation as the boundary
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…-policy

docs(security): rewrite policy around OS-level isolation as the boundary
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
…-policy

docs(security): rewrite policy around OS-level isolation as the boundary
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…-policy

docs(security): rewrite policy around OS-level isolation as the boundary