spec(security): plan trust context and audience policy by Aaronontheweb · Pull Request #249 · netclaw-dev/netclaw

Aaronontheweb · 2026-03-15T23:41:07Z

Summary

add a new OpenSpec change for a cross-cutting trust-context and audience model spanning channels, memory, tools, MCP, and source provenance
define strict-default, fail-closed policy behavior with downgrade-only trust transitions and posture-aware shell handling
add follow-on implementation tasks for config schema updates, doctor/onboarding UX, and future sandboxed execution work

Testing

not run (planning/spec change only)

Aaronontheweb · 2026-03-16T00:03:24Z

Critical missing pieces based on real-world agent failures

Great work on the trust-context model. Based on our recent analysis of real agent failures (OpenClaw email disasters), I see gaps that need addressing before implementation starts.

Real-world evidence of why this matters

Our blog post documents 5 specific failure modes from OpenClaw:

Speed-run deletions - Agent ignored "confirm before acting" and mass-deleted an inbox faster than a human could intervene
Infinite loops - Sent 500+ confirmation messages to a user's wife, requiring a power pull to stop
Internal monologue leaks - Exposed file paths, API errors, and other customers' data to public channels
Fabricated emails - Created fake reply chains that had real-world consequences
Prompt injection - Extracted SSH keys from hidden instructions in email bodies

These aren't theoretical. They're happening right now with agents that have "good intentions" but no guardrails.

What the trust-context model prevents

Your design correctly identifies the root cause: no gate between agent decision and action. The trust-context approach should:

Block speed-run behavior via rate limiting and confirmation requirements
Prevent infinite loops via working-context timeouts
Stop internal monologue leaks via audience-aware memory/output filtering
Catch prompt injection via payload provenance validation

Critical missing pieces

0. Gateway Authorization (NEW)

Rate limiting at connection level (the 500-iMessage story shows why this is non-negotiable)
JWT bearer tokens for SignalR connections
Connection health checks with auto-reconnect limits

1. Sandbox enforcement

The "speed-run" incident happened because there was no sandbox to throttle execution
You can't have shell mode policy without an actual sandbox implementation
OpenClaw's "Computer Use" mode needs explicit isolation

2. Memory migration strategy

"Conservative defaults" is too vague
Need concrete rules for existing memories when audience model changes
What happens if a "team" memory gets exposed to "public" context?

3. Verified transport criteria

What makes a webhook "verified"?
Signature verification requirements
Payload taint detection rules

Recommendation

Add Phase 0 (Gateway Authorization) to the task list before starting Phase 1. The trust-context model is the right direction, but it needs authorization enforcement at the transport layer first.

Also worth adding the OpenClaw failure cases as a reference section in the design doc - they're perfect examples of what happens when these patterns aren't in place.

Aaronontheweb · 2026-03-16T00:10:05Z

Critical missing pieces based on real-world agent failures

Great work on the trust-context model. Based on our recent analysis of real agent failures (OpenClaw email disasters), I see gaps that need addressing before implementation starts.

Real-world evidence of why this matters

Our blog post documents 5 specific failure modes from OpenClaw:

Speed-run deletions - Agent ignored "confirm before acting" and mass-deleted an inbox faster than a human could intervene. Meta AI safety director Summer Yue had to run to her Mac to kill the process.
Infinite loops - Sent 500+ confirmation messages to a user's wife, requiring a power pull to stop. Story covered by Bloomberg.
Internal monologue leaks - Exposed file paths, API errors, and other customers' data to public channels.
Fabricated emails - Created fake reply chains that had real-world consequences. Ars Technica coverage.
Prompt injection - Extracted SSH keys from hidden instructions in email bodies. Proof of concept by researcher Johann Rehberger.

These aren't theoretical. They're happening right now with agents that have "good intentions" but no guardrails.

How other agents handle this

OpenClaw's security docs explicitly state that it assumes a "personal assistant model" with one trusted operator boundary per gateway and that it does not support adversarial multi-tenant scenarios.

Hermes Agent uses a 3-tier authorization model with allowlists, DM pairing with one-time codes, rate limiting (1 code per 10 minutes), and file permissions set to 0600.

What the trust-context model prevents

Your design correctly identifies the root cause: no gate between agent decision and action. The trust-context approach should:

Block speed-run behavior via rate limiting and confirmation requirements
Prevent infinite loops via working-context timeouts
Stop internal monologue leaks via audience-aware memory/output filtering
Catch prompt injection via payload provenance validation

Critical missing pieces

0. Gateway Authorization (NEW)

Rate limiting at connection level (the 500-iMessage story shows why this is non-negotiable)
JWT bearer tokens for SignalR connections
Connection health checks with auto-reconnect limits
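To make the connection-level rate limiting item concrete, here is a minimal token-bucket sketch. It is not taken from the netclaw codebase: `TokenBucket`, its fields, and the choice of a token bucket are all assumptions for illustration, and the caller supplies the clock so the behavior is easy to test.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Hypothetical per-connection limiter (illustrative names only)."""

    capacity: float        # maximum burst size, in messages
    refill_per_sec: float  # sustained messages per second
    tokens: float = field(init=False)
    last: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start with a full bucket so a fresh connection can burst once.
        self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Return True if one more message may be sent at time `now` (seconds)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the caller should drop or queue the message
```

With `capacity=3` and `refill_per_sec=1.0`, a runaway loop gets three sends through immediately and is then held to one per second, which is the kind of ceiling the 500-message incident lacked.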

1. Sandbox enforcement

The "speed-run" incident happened because there was no sandbox to throttle execution
You can't have shell mode policy without an actual sandbox implementation
OpenClaw's "Computer Use" mode needs explicit isolation

2. Memory migration strategy

"Conservative defaults" is too vague
Need concrete rules for existing memories when audience model changes
What happens if a "team" memory gets exposed to "public" context?

3. Verified transport criteria

What makes a webhook "verified"?
Signature verification requirements
Payload taint detection rules
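One possible answer to "what makes a webhook verified" is an HMAC signature over a timestamp plus the raw body, checked with a constant-time comparison and a freshness window. The sketch below assumes that scheme; the function names, the `timestamp.body` signing format, and the 300-second window are hypothetical and not something the spec defines.

```python
import hashlib
import hmac


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 over '<timestamp>.<body>' (assumed format)."""
    signed = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, signed, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes,
    timestamp: int,
    body: bytes,
    signature_hex: str,
    now: int,
    max_age_sec: int = 300,
) -> bool:
    """Return True only if the signature matches and the request is fresh."""
    # Reject stale or future-dated requests to limit replay.
    if abs(now - timestamp) > max_age_sec:
        return False
    expected = sign(secret, timestamp, body)
    # Constant-time comparison; encode both sides so odd input cannot raise.
    return hmac.compare_digest(expected.encode("ascii"), signature_hex.encode("utf-8"))
```

Anything that fails this check would stay in the lowest-trust audience rather than being rejected silently, so the policy reason can still be surfaced.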

Recommendation

Add Phase 0 (Gateway Authorization) to the task list before starting Phase 1. The trust-context model is the right direction, but it needs authorization enforcement at the transport layer first.

Also worth adding the OpenClaw failure cases as a reference section in the design doc — they're perfect examples of what happens when these patterns aren't in place.

Aaronontheweb · 2026-03-21T13:39:28Z

Proposed security-policy / trust-context test sequence now that the branch is rebased onto latest dev and the full solution test suite is green:

Trust-context derivation

Confirm Slack thread turns enter team
Confirm local / SignalR / TUI turns enter personal
Confirm any untrusted/public ingress stays public
Verify posture only downgrades capability; nothing auto-widens to personal
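The derivation checks above can be read as a small mapping plus a downgrade-only rule. This is a sketch under assumptions: the channel keys (`slack_thread`, `local`, `signalr`, `tui`) and the function names are hypothetical, and only the three audience names come from this PR.

```python
from enum import IntEnum


class Audience(IntEnum):
    PUBLIC = 0
    TEAM = 1
    PERSONAL = 2


# Hypothetical channel identifiers; anything not listed is treated as public.
CHANNEL_AUDIENCE = {
    "slack_thread": Audience.TEAM,
    "local": Audience.PERSONAL,
    "signalr": Audience.PERSONAL,
    "tui": Audience.PERSONAL,
}


def derive_audience(channel: str) -> Audience:
    """Unknown or untrusted ingress fails closed to PUBLIC."""
    return CHANNEL_AUDIENCE.get(channel, Audience.PUBLIC)


def apply_posture(current: Audience, requested: Audience) -> Audience:
    """Downgrade-only transition: the result is never wider than `current`."""
    return min(current, requested)
```

Because `apply_posture` takes the minimum, a request to move a team session to personal is a no-op, which is the "nothing auto-widens" property the last check asks for.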

Tool exposure and invocation

public: verify restricted tool set, no shell, no high-impact MCP tools
team: verify team-safe tools only; shell still denied
personal: verify shell only works when ShellExecutionMode=HostAllowed and audience profiles allow it
Confirm denied tools fail closed with an explicit policy reason
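A rough sketch of what "fail closed with an explicit policy reason" could look like for these cases. The profile contents, `Decision`, `authorize_tool`, and the `"Denied"` default are hypothetical; only `ShellExecutionMode=HostAllowed` and the audience names are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str  # always populated so denials are explainable


# Hypothetical audience profiles; real profiles would come from config.
PROFILES = {
    "public": frozenset({"web_search"}),
    "team": frozenset({"web_search", "file_read"}),
    "personal": frozenset({"web_search", "file_read", "shell"}),
}


def authorize_tool(audience: str, tool: str, shell_mode: str = "Denied") -> Decision:
    profile = PROFILES.get(audience)
    if profile is None:
        # A missing profile denies everything rather than falling back to a default.
        return Decision(False, f"no audience profile for '{audience}'")
    if tool not in profile:
        return Decision(False, f"tool '{tool}' is not in the '{audience}' profile")
    if tool == "shell" and shell_mode != "HostAllowed":
        return Decision(False, "shell requires ShellExecutionMode=HostAllowed")
    return Decision(True, "allowed by audience profile")
```

Shell needs both conditions to pass, so `authorize_tool("team", "shell", "HostAllowed")` is still denied by the profile check before the mode is even considered.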

MCP discovery

Run search_tools in each audience and confirm only allowed MCP servers/tools appear
Verify dynamically discovered MCP tools still honor invocation policy after discovery
Verify sensitive/high-impact capability classes stay hidden outside personal
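The discovery and invocation checks can share one filter so that a tool hidden from `search_tools` can never be invoked by guessing its name. The sketch below assumes a flat catalog with a per-tool minimum audience; `McpTool`, `min_audience`, `invoke`, and the signature given to `search_tools` here are illustrative only.

```python
from dataclasses import dataclass

RANK = {"public": 0, "team": 1, "personal": 2}


@dataclass(frozen=True)
class McpTool:
    server: str
    name: str
    min_audience: str  # lowest audience allowed to see and call the tool


def search_tools(catalog: list[McpTool], audience: str) -> list[McpTool]:
    """Return only the tools the given audience may discover."""
    level = RANK.get(audience, 0)  # unknown audience is treated as public
    return [
        t for t in catalog
        # A tool with an unrecognized requirement stays hidden (fail closed).
        if t.min_audience in RANK and RANK[t.min_audience] <= level
    ]


def invoke(catalog: list[McpTool], audience: str, server: str, name: str) -> str:
    """Re-check policy at call time instead of trusting earlier discovery."""
    visible = {(t.server, t.name) for t in search_tools(catalog, audience)}
    if (server, name) not in visible:
        raise PermissionError(f"{server}/{name} is not allowed for audience '{audience}'")
    return f"invoked {server}/{name}"
```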

Memory write / recall policy

Persist a durable fact in a personal turn and confirm it is not visible from public or team
Persist a team fact and confirm it is visible to team and personal, but not public
Confirm evidence is searchable but never auto-recalled
Confirm explicit find_memories / get_memories respect both audience and boundary
Confirm shared project facts can cross channels only within the same authorized boundary
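As a sketch of the visibility rule these checks imply: a memory is visible when the session's audience is at least as trusted as the memory's audience and both sit in the same boundary, and evidence is excluded from auto recall but not from explicit search. The `Memory` shape, the `kind` field, and the function signatures are assumptions; only the `find_memories` name appears in this PR.

```python
from dataclasses import dataclass

RANK = {"public": 0, "team": 1, "personal": 2}


@dataclass(frozen=True)
class Memory:
    text: str
    audience: str       # audience the memory was written under
    boundary: str       # authorized boundary, e.g. a workspace or org id
    kind: str = "fact"  # "fact" or "evidence"


def visible(m: Memory, audience: str, boundary: str) -> bool:
    # Unknown audiences on either side fail closed.
    if m.audience not in RANK or audience not in RANK:
        return False
    return m.boundary == boundary and RANK[m.audience] <= RANK[audience]


def auto_recall(store: list[Memory], audience: str, boundary: str) -> list[Memory]:
    """Evidence is never injected automatically."""
    return [m for m in store if m.kind != "evidence" and visible(m, audience, boundary)]


def find_memories(store: list[Memory], query: str, audience: str, boundary: str) -> list[Memory]:
    """Explicit search may return evidence, but still honors audience and boundary."""
    q = query.lower()
    return [m for m in store if q in m.text.lower() and visible(m, audience, boundary)]
```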

Secret handling

Try to store raw secret material and confirm it is rejected/redacted before durable persistence
Confirm secret-bearing memory never shows up in auto recall
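A minimal sketch of a redaction pass that could run before durable persistence. The patterns are illustrative and far from exhaustive, and `redact` is a hypothetical helper; a real implementation would likely need entropy checks or a dedicated secret scanner.

```python
import re

# Illustrative patterns only; not a complete secret taxonomy.
SECRET_PATTERNS = [
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    re.compile(r"(?i)\b(password|api[_-]?key|token)\s*[:=]\s*\S+"),
]


def redact(text: str) -> tuple[str, bool]:
    """Return (redacted_text, had_secret).

    The flag lets the caller reject the write outright or mark the memory
    as secret-bearing so it is excluded from auto recall.
    """
    found = False
    for pattern in SECRET_PATTERNS:
        text, count = pattern.subn("[REDACTED]", text)
        found = found or count > 0
    return text, found
```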

Public file confinement

In a public session, verify file read/write stays confined to the session directory
Confirm path traversal / arbitrary host-path access is denied
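The confinement check can be sketched as "resolve first, then compare against the session root". `resolve_in_session` is a hypothetical helper name, and this assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_in_session(session_dir: Path, requested: str) -> Path:
    """Resolve `requested` against the session directory or raise.

    Resolving before the comparison collapses '..' segments and follows
    existing symlinks, so traversal and absolute host paths are both caught.
    """
    root = session_dir.resolve()
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes session directory: {requested}")
    return candidate
```

Joining an absolute path such as `/etc/passwd` onto `root` discards `root` entirely, which is why the check runs on the resolved result rather than on the input string.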

Operator workflows

Run netclaw doctor and confirm bad/missing audience profiles are flagged
Run init/onboarding flow and confirm recommended default profiles are generated safely

Regression smoke

Re-run the end-to-end happy paths for Slack + SignalR after the negative tests
If all of the above pass, move the PR out of draft and do a final mergeability / CI check

If helpful, I can turn this into a checkbox-based test matrix next.

Aaronontheweb · 2026-03-21T21:46:46Z

Cross-reference: Per-turn memory policy filtering limitation

Discovered during #370 (memory recall optimization) review — filed as #376.

The trust context spec includes scenarios where per-turn audience/sensitivity filtering excludes memories from recall when trust degrades mid-session. However, the LLM's own responses are persisted to _state.History while memory injections are transient. This means:

Turn 1 (high trust): Memory recalled — "preferred airport is IEH"
LLM responds: "I'll book from IEH..." → persisted to history
Trust degrades → policy excludes the memory from recall
Information is still in the conversation history from the LLM's prior output

Per-turn filtering still provides value as damage limitation — it prevents additional sensitive memories from being introduced after trust degrades, limiting blast radius. But it can't protect information that was already surfaced in a higher-trust turn. The specs should be honest about this limitation.
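The limitation can be reproduced in a few lines. This is a toy model, not the actual recall pipeline: `build_prompt`, the dict-shaped memories, and the list used for history are all stand-ins, with the history list playing the role of `_state.History`.

```python
RANK = {"public": 0, "team": 1, "personal": 2}


def build_prompt(history: list[str], memories: list[dict], trust: str) -> list[str]:
    """Memory injection is filtered per turn; persisted history is not."""
    recalled = [m["text"] for m in memories if RANK[m["audience"]] <= RANK[trust]]
    return recalled + history


def demo() -> tuple[list[str], list[str]]:
    history: list[str] = []
    memories = [{"text": "preferred airport is IEH", "audience": "personal"}]

    # Turn 1 (high trust): the memory is recalled and the reply is persisted.
    turn1 = build_prompt(history, memories, "personal")
    history.append("assistant: I'll book from IEH...")

    # Turn 2 (trust degraded): the memory is filtered, but the reply remains.
    turn2 = build_prompt(history, memories, "public")
    return turn1, turn2
```

In the second prompt the memory text itself is gone, yet the airport code is still present through the assistant's earlier reply.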

The broader question is whether the trust boundary should be session-scoped rather than turn-scoped — trust degradation mid-session could fork/terminate the session rather than filtering recall while history is already contaminated. See #376 for the full discussion.

Aaronontheweb · 2026-03-22T02:22:24Z

Dependency: Skills directory must be always-readable

The skill discovery redesign (#355) moves from internal daemon-side file reads
to LLM-invoked file_read tool calls. This means the skills directory
(~/.netclaw/skills/ and feeds/skills/.system/files/) must be whitelisted
in the read policy by default, regardless of security posture.

These paths should join identity files (SOUL.md, AGENTS.md, TOOLING.md) in the
"always allowed" read set:

~/.netclaw/skills/** — user-installed skills
System skill feed paths — operator-managed skills

Without this, the LLM will get "Access denied by security policy" when it tries
to load a skill via file_read, breaking the entire skill discovery pipeline.

This is the same pattern as identity files — the bot needs to read its own
operational guidance to function. Blocking it would be like blocking the system
prompt.
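A sketch of how the "always allowed" read set might be checked ahead of posture-based policy. Assumptions are labeled in the code: identity files are taken to live directly under the netclaw home directory, and system skill feed roots are passed in by the caller because their absolute location is not stated here.

```python
from pathlib import Path

IDENTITY_FILES = {"SOUL.md", "AGENTS.md", "TOOLING.md"}


def is_always_readable(
    path: Path,
    netclaw_home: Path,
    skill_feed_roots: tuple[Path, ...] = (),
) -> bool:
    """Return True for identity files and anything under a skills root.

    Assumption: identity files sit directly in `netclaw_home` (e.g. ~/.netclaw).
    Paths are resolved first so '..' cannot smuggle other files into the set.
    """
    target = path.resolve()
    home = netclaw_home.resolve()
    if target.parent == home and target.name in IDENTITY_FILES:
        return True
    roots = (home / "skills", *(r.resolve() for r in skill_feed_roots))
    return any(target.is_relative_to(root) for root in roots)
```

This check would only widen reads; writes to the same paths would still go through the normal audience policy.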

Keep lower-trust sessions fail-closed by routing built-in tools and MCP servers through explicit audience profiles. Surface the effective policy in onboarding, doctor output, and follow-up sandbox planning so operators can widen access deliberately.

Aaronontheweb mentioned this pull request Mar 16, 2026

Add gateway authorization for remote SignalR and webhook exposure #252

Closed

Aaronontheweb force-pushed the feature/trust-context-security-planning branch 3 times, most recently from 163c1d7 to a0e1a2d on March 21, 2026 04:05

Aaronontheweb force-pushed the feature/trust-context-security-planning branch from a0e1a2d to dc385f5 on March 21, 2026 13:56

Aaronontheweb mentioned this pull request Mar 21, 2026

feat(security): command-level approval gates within tool grants #352

Closed

Aaronontheweb force-pushed the feature/trust-context-security-planning branch from f19709f to 8f1af03 on March 21, 2026 20:58

This was referenced Mar 21, 2026

arch(security): per-turn memory policy filtering is defeated by LLM output persistence #376

Open

perf(memory): stop re-injecting same memories on every tool loop iteration #370

Closed

Aaronontheweb added 14 commits March 22, 2026 23:34

spec(security): add trust context planning artifacts

e65bece

feat(security): add trust context foundations

85de2c8

feat(memory): scope recall and storage by audience

a28bc98

spec(memory): separate boundaries from memory domains

193179d

feat(memory): add trust boundaries to recall

8fa6fca

feat(memory): tighten retrieval trust boundaries

a60aa68

feat(security): gate shell by trust context

172fb25

feat(security): filter MCP discovery by trust context

f7e17f4

fix(security): confine public file access to sessions

1feabd8

spec(security): add audience policy profiles

2ba74c9

feat(security): add audience-scoped tool profiles

a57414b

fix(security): restore build after trust-context rebase

774b018

fix(memory): enforce trust policy in recall and writes

35b3386

Aaronontheweb force-pushed the feature/trust-context-security-planning branch from d20e0ed to 35b3386 on March 22, 2026 23:53

spec(memory): clarify persisted recall trust limits

d54c109

Aaronontheweb marked this pull request as ready for review March 23, 2026 00:39

Aaronontheweb merged commit a800e56 into dev Mar 23, 2026
3 checks passed

Aaronontheweb deleted the feature/trust-context-security-planning branch March 23, 2026 00:46

Aaronontheweb mentioned this pull request Mar 23, 2026

feat(security): integrate trust context policy with global read roots and channel audiences #387

Merged

7 tasks

Aaronontheweb merged 15 commits into dev from feature/trust-context-security-planning
