spec(security): plan trust context and audience policy #249
Critical missing pieces based on real-world agent failures
Great work on the trust-context model. Based on our recent analysis of real agent failures (OpenClaw email disasters), I see gaps that need addressing before implementation starts.

Real-world evidence of why this matters
Our blog post documents 5 specific failure modes from OpenClaw:
These aren't theoretical. They're happening right now with agents that have "good intentions" but no guardrails.

How other agents handle this
OpenClaw explicitly states in their security docs that they assume a "personal assistant model" with one trusted operator boundary per gateway, and they don't support adversarial multi-tenant scenarios. Hermes Agent uses a 3-tier authorization model with allowlists, DM pairing with 1-time codes, rate limiting (1 code per 10 minutes), and file permissions set to 0600.

What the trust-context model prevents
Your design correctly identifies the root cause: no gate between agent decision and action. The trust-context approach should:

Critical missing pieces
0. Gateway Authorization (NEW)
1. Sandbox enforcement
2. Memory migration strategy
3. Verified transport criteria

Recommendation
Add Phase 0 (Gateway Authorization) to the task list before starting Phase 1. The trust-context model is the right direction, but it needs authorization enforcement at the transport layer first. Also worth adding the OpenClaw failure cases as a reference section in the design doc. They're perfect examples of what happens when these patterns aren't in place.
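
To make the Phase 0 idea concrete, here is a minimal sketch of a gateway-side gate that combines an allowlist with one-time pairing codes and the "1 code per 10 minutes" rate limit mentioned for Hermes Agent. Everything named here (`PairingGate`, `issue_code`, `redeem`, the in-memory dictionaries) is hypothetical and for illustration only; it is not the Hermes, OpenClaw, or this project's API.

```python
import secrets
import time

# Assumption: one pairing code per sender per 10 minutes, as described above.
PAIRING_INTERVAL_SECONDS = 600


class PairingGate:
    """Hypothetical gateway gate: allowlist plus one-time pairing codes."""

    def __init__(self, allowlist, clock=time.monotonic):
        self.allowlist = set(allowlist)
        self._clock = clock
        self._last_issued = {}  # sender -> time the last code was issued
        self._pending = {}      # sender -> outstanding one-time code

    def is_authorized(self, sender):
        # Fail closed: a sender that is not on the allowlist is rejected.
        return sender in self.allowlist

    def issue_code(self, sender):
        """Return a new pairing code, or None if the sender is rate limited."""
        now = self._clock()
        last = self._last_issued.get(sender)
        if last is not None and now - last < PAIRING_INTERVAL_SECONDS:
            return None
        code = secrets.token_hex(4)
        self._last_issued[sender] = now
        self._pending[sender] = code
        return code

    def redeem(self, sender, code):
        """Consume the pending code. Any attempt, right or wrong, uses it up."""
        expected = self._pending.pop(sender, None)
        if expected is None:
            return False
        if not secrets.compare_digest(expected.encode(), code.encode()):
            return False
        self.allowlist.add(sender)
        return True
```

In this sketch a wrong guess also invalidates the pending code, so an attacker gets one attempt per rate-limit window. Whether that trade-off suits real operators is a design decision for the spec, not something this example settles.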
163c1d7 to a0e1a2d
Proposed security-policy / trust-context test sequence now that the branch is rebased onto latest
If helpful, I can turn this into a checkbox-based test matrix next.
a0e1a2d to dc385f5
f19709f to 8f1af03
Cross-reference: Per-turn memory policy filtering limitation
Discovered during the #370 (memory recall optimization) review and filed as #376. The trust context spec includes scenarios where per-turn audience/sensitivity filtering excludes memories from recall when trust degrades mid-session. However, the LLM's own responses are persisted to
Per-turn filtering still provides value as damage limitation: it prevents additional sensitive memories from being introduced after trust degrades, which limits the blast radius. But it can't protect information that was already surfaced in a higher-trust turn. The specs should state this limitation explicitly. The broader question is whether the trust boundary should be session-scoped rather than turn-scoped: mid-session trust degradation could fork or terminate the session instead of filtering recall while the history is already contaminated. See #376 for the full discussion.
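
The limitation can be reproduced in a few lines. The sketch below is a toy model, not the project's memory code: `Memory`, `Session`, the three trust levels, and the rule that the reply quotes whatever was recalled are all assumptions made for illustration. It shows recall being filtered correctly after trust degrades while the persisted history still carries what an earlier, higher-trust turn surfaced.

```python
from dataclasses import dataclass, field

# Hypothetical trust levels, ordered from least to most trusted.
TRUST_ORDER = {"public": 0, "trusted": 1, "owner": 2}


@dataclass
class Memory:
    text: str
    min_trust: str  # lowest trust level allowed to see this memory


@dataclass
class Session:
    memories: list
    history: list = field(default_factory=list)  # persisted turns, never re-filtered

    def recall(self, trust):
        """Per-turn filter: only memories the current trust level may see."""
        level = TRUST_ORDER[trust]
        return [m.text for m in self.memories if TRUST_ORDER[m.min_trust] <= level]

    def turn(self, trust):
        recalled = self.recall(trust)
        # Stand-in for the model's reply, which may quote what was recalled.
        self.history.append(" / ".join(recalled))
        return recalled

    def visible_context(self, trust):
        """What the next turn sees: fresh recall plus the unfiltered history."""
        return self.recall(trust), list(self.history)


session = Session(memories=[Memory("likes tea", "public"),
                            Memory("home address", "owner")])
session.turn("owner")                                  # high-trust turn surfaces everything
recalled, history = session.visible_context("public")  # trust has degraded
```

After the degraded turn, `recalled` contains only the public memory, but `history` still includes the owner-only one. A session-scoped boundary would instead start the lower-trust turn from a fresh or forked history.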
Dependency: Skills directory must be always-readable
The skill discovery redesign (#355) moves from internal daemon-side file reads
These paths should join identity files (SOUL.md, AGENTS.md, TOOLING.md) in the
Without this, the LLM will get "Access denied by security policy" when it tries
This is the same pattern as identity files: the bot needs to read its own
Keep lower-trust sessions fail-closed by routing built-in tools and MCP servers through explicit audience profiles. Surface the effective policy in onboarding, doctor output, and follow-up sandbox planning so operators can widen access deliberately.
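
As an illustration of what "fail-closed" routing through audience profiles could look like, here is a minimal sketch. The profile names, tool names, and functions (`AUDIENCE_PROFILES`, `authorize_tool_call`, `describe_policy`) are hypothetical and are not taken from this PR's code. The point is only that a tool is callable when a profile lists it explicitly and denied in every other case.

```python
# Hypothetical audience profiles: each maps to the tools it may call.
AUDIENCE_PROFILES = {
    "owner": {"read_file", "send_email", "shell"},
    "trusted": {"read_file"},
    "public": set(),
}


def allowed_tools(audience):
    """Fail closed: an unknown or missing audience gets no tools."""
    return AUDIENCE_PROFILES.get(audience, set())


def authorize_tool_call(audience, tool):
    """Return True only if the audience's profile explicitly lists the tool."""
    return tool in allowed_tools(audience)


def describe_policy(audience):
    """One-line summary of the effective policy, e.g. for diagnostic output."""
    tools = sorted(allowed_tools(audience))
    return f"{audience}: {', '.join(tools) if tools else 'no tools'}"
```

Widening access then means editing a profile on purpose, and a summary like `describe_policy("trusted")` gives operators a place to see the effect of that change.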
d20e0ed to 35b3386
Summary
Testing