docs(security): rewrite policy around OS-level isolation as the boundary#20317
Conversation
Restate the trust model from first principles: the OS is the only load-bearing boundary against an adversarial LLM. Distinguish terminal-backend isolation (sandboxes the shell tool) from whole-process wrapping (sandboxes the agent itself, reference deployment NVIDIA OpenShell). Name in-process components (approval gate, output redaction, Skills Guard) as heuristics, and the class of reports that defeat them as out of scope under this policy — while explicitly welcoming them as regular issues or PRs. Introduce 'agent-loaded content' as the narrow, honest commitment: attacker-influenced input must not chain into a write the agent later loads on its own initiative. Strip implementation-detail enumerations (backend names, adapter names, config keys, env vars, internal symbols) so the doc stays evergreen as code evolves.
|
Thank you for this rewrite — the shift to "OS is the only load-bearing boundary" is a much clearer mental model than the previous policy, and naming in-process components as heuristics explicitly sets the right expectations for reporters and operators. I have a few observations from reading the policy against the codebase: 1. External-surface allowlist enforcement (§2.6, rule 2)
This is a strong and testable contract. Has the team audited the current adapters against this rule? I ask because some adapters (e.g. 2. MCP trust boundary (§2.3 vs §2.4)The policy says MCP subprocesses receive a filtered environment (credential scrubbing), but doesn't explicitly classify MCP servers as either a boundary or a heuristic. From the code:
Under terminal-backend isolation, MCP servers run on the host (not inside the sandbox), so they're inside the trust envelope. Under whole-process wrapping, they'd be confined. It might be worth a one-liner in §2.2 or §2.3 clarifying where MCP servers sit relative to the two postures, since "MCP server" is a term operators will encounter frequently and the current policy leaves it implicit. 3. Skill import-time execution (§2.4)
This is an important statement. For operators who want to do this review, it might help to name the specific code path — 4. Minor: OpenShell setup referenceThe policy references NVIDIA OpenShell as a supported whole-process wrapping option, but there's no link to a Hermes-specific setup guide. If the integration is production-ready, a short "see docs/openshell.md" pointer would help operators actually adopt it. If it's aspirational, a note like "integration in progress" would set expectations. 5. Suggestion: concrete examples for §3.2The out-of-scope section is clear in principle, but a few concrete examples would help community contributors calibrate before filing:
These don't need to be in the policy itself — a Overall this is a strong policy rewrite. The explicit "heuristic vs boundary" distinction will save both the team and reporters significant time triaging reports. Happy to help with any of the above if the team agrees on direction. |
…-policy docs(security): rewrite policy around OS-level isolation as the boundary
…-policy docs(security): rewrite policy around OS-level isolation as the boundary
…-policy docs(security): rewrite policy around OS-level isolation as the boundary
…-policy docs(security): rewrite policy around OS-level isolation as the boundary
This is a proposed rewrite of the core security policy of Hermes Agent. It outlines the trust model that the agent operates under, and the processes for security vulnerability reporting. The key pieces of it are:
This creates a much narrower scope of what constitutes a security vulnerability vs. what can go through the normal PR process. It also gives a firmer commitment on what really can be guaranteed at the various trust boundaries.
We'd like gather community feedback on adopting this new security policy, please leave your comments below!