Skip to content

arch(security): per-turn memory policy filtering is defeated by LLM output persistence #376

@Aaronontheweb

Description

@Aaronontheweb

Context

Discovered during #370 review. Intersects with the trust context security policy work in PR #249.

Problem

The current memory recall architecture filters memories per-turn based on trust context, audience, and sensitivity policy. But this protection is illusory because the LLM's own responses are persisted to _state.History while the memory injections are transient.

Example:

  1. Turn 1 (high trust, audience=personal): Memory recalled — "User's preferred airport is IEH"
  2. LLM responds: "I'll book your flight from IEH..." → persisted to _state.History
  3. Trust degrades mid-session (e.g., reading an email from an untrusted source enters the context)
  4. Memory policy dutifully excludes the airport preference from recall on subsequent turns
  5. Doesn't matter — "IEH" is already in the conversation history from the LLM's prior response

The sensitive information leaked into the persistent conversation the moment the LLM referenced it. Per-turn recall filtering is protecting the front door while the information already walked out through the LLM's output.

Implications for PR #249

PR #249's trust context spec includes scenarios like:

GIVEN a stored memory item is marked audience=personal
WHEN a public turn runs recall
THEN the item is excluded from both automatic recall and inline retrieval

This scenario is correct for first-contact gating (preventing a sensitive memory from being introduced during a low-trust turn). But it provides no protection for information that was already surfaced in a prior high-trust turn within the same session.

What per-turn filtering still protects

  • Cross-session isolation — memories from one session domain don't leak into a different session. This boundary is real.
  • First-contact gating — prevents sensitive memories from being introduced during a low-trust turn (the memory never enters the conversation at all).
  • Raw secret rejection at write time — preventing credentials from being stored as memories in the first place. Genuinely valuable.

Proposed direction

The trust boundary should be at the session level, not the turn level:

  • Trust context evaluated at session start (or when a new participant/channel joins)
  • If trust degrades mid-session, the session forks or terminates rather than trying to filter recall while history is already contaminated
  • Per-turn recall filtering becomes simpler — apply the session's established trust level, don't re-evaluate policy on every call

This also has implications for the transient injection architecture. If policy is session-scoped, the main justification for making memory injections transient (re-evaluating policy per-turn) weakens. Memories could potentially be persisted to session history on first injection, eliminating the per-call re-injection cost entirely.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    securitySecurity-related changes

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions