
[Feature]: Add a memory hygiene doctor and sanitizer #77447

@sercada

Summary

Add a configurable memory/transcript hygiene doctor and sanitizer that detects and removes unsafe persisted artifacts, such as raw local paths, file:// URLs, delivery directives, and leaked internal tool output.

Problem to solve

Long-running agents can persist text that was useful during execution but should not become durable conversational memory: raw /workspace/... paths, file://... URLs, MEDIA: delivery directives, local attachment claims, raw tool instructions, or backend-specific diagnostics. Once persisted, those strings can reappear in future prompts, summaries, active memory, or user-facing replies.

Today operators can grep for these strings or reset sessions manually, but there is no first-class dry-run scan/sanitize workflow that reports the affected sessions and memory stores precisely enough to fix them safely.

Proposed solution

Add a memory/transcript hygiene command or plugin surface whose rules can run in dry-run mode first:

  • Scan durable session transcripts, summaries, active memory, and other configured memory stores.
  • Detect configurable unsafe patterns, starting with raw local paths, file:// URLs, MEDIA: directives, and local-path delivery claims.
  • Report affected files/records, rule IDs, counts, and sample snippets with sensitive values redacted/truncated.
  • Provide a conservative sanitizer that can remove or replace only the unsafe spans, with --dry-run as the default posture.
  • Support allowlisted roots and pattern exceptions for legitimate developer debugging contexts.
  • Emit machine-readable JSON so operators can gate deployments or session resets.
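
The read-only scan described above could look roughly like the following minimal TypeScript sketch. It is an illustration under stated assumptions, not an OpenClaw API: HygieneRule, scanRecords, the rule IDs, the regex patterns, the allowlist-by-root-prefix check, and the report fields (gatePassed, total, findings) are all hypothetical. It matches the starting pattern classes, skips allowlisted roots, redacts samples to a short prefix, and prints JSON evidence.

```typescript
// Hypothetical sketch: HygieneRule, scanRecords, the rule IDs, and the report
// shape are illustrative assumptions, not an existing OpenClaw API.

interface HygieneRule {
  id: string;
  // Convention: "g" flag required; capture group 1 is the unsafe span and ends
  // the match. Anything matched before it is leading context (e.g. a space).
  pattern: RegExp;
}

interface MemoryRecord {
  source: string; // e.g. a transcript file or memory-store key
  text: string;
}

interface Finding {
  ruleId: string;
  source: string;
  count: number;
  samples: string[]; // redacted prefixes, never the full raw value
}

// Order matters: a MEDIA: directive is claimed before the URL or path inside it.
const DEFAULT_RULES: HygieneRule[] = [
  { id: "media-directive", pattern: /(\bMEDIA:[^\s"'<>)]*)/g },
  { id: "file-url", pattern: /(file:\/\/[^\s"'<>)]+)/g },
  { id: "local-path", pattern: /(?:^|[\s"'(])(\/(?:workspace|home|tmp)\/[^\s"'<>)]+)/g },
];

// Keep a short prefix so operators can recognise the hit without re-leaking it.
function redact(span: string, keep = 12): string {
  return span.slice(0, keep) + "...[redacted]";
}

function scanRecords(
  records: MemoryRecord[],
  rules: HygieneRule[] = DEFAULT_RULES,
  allowedRoots: string[] = [],
  maxSamples = 3,
): Finding[] {
  const findings: Finding[] = [];
  for (const record of records) {
    // Read-only: matches are masked in a scratch copy so later rules skip them.
    let scratch = record.text;
    for (const rule of rules) {
      const finding: Finding = { ruleId: rule.id, source: record.source, count: 0, samples: [] };
      scratch = scratch.replace(rule.pattern, (match: string, span: string) => {
        const allowed = allowedRoots.some(
          (root) => span.startsWith(root) || span.startsWith("file://" + root),
        );
        if (!allowed) {
          finding.count += 1;
          if (finding.samples.length < maxSamples) finding.samples.push(redact(span));
        }
        // Mask allowlisted spans too, so a later rule cannot flag them either.
        const context = match.slice(0, match.length - span.length);
        return context + " ".repeat(span.length);
      });
      if (finding.count > 0) findings.push(finding);
    }
  }
  return findings;
}

const findings = scanRecords(
  [
    { source: "sessions/demo.jsonl", text: "Saved to /workspace/out/report.pdf" },
    { source: "memory/active.md", text: "MEDIA:file:///workspace/out/chart.png attached" },
    { source: "memory/debug-notes.md", text: "Repro lives in /workspace/debug/trace.log" },
  ],
  DEFAULT_RULES,
  ["/workspace/debug/"],
);

const total = findings.reduce((sum, f) => sum + f.count, 0);
// Machine-readable evidence; a CI wrapper could fail the deploy when gatePassed is false.
console.log(JSON.stringify({ gatePassed: total === 0, total, findings }, null, 2));
```

Running the rules in a fixed order and masking each matched span keeps one artifact from being reported twice: the MEDIA: directive wrapping a file:// URL counts once, and the allowlisted debug path is not reported at all. The patterns are deliberately naive starting points and would need their own false-positive tests before anything is gated on the report.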

Alternatives considered

  • Manual grep: works for emergencies but misses structured stores and is hard to audit.
  • Full session reset only: safe but heavy-handed; it discards useful context when a narrow sanitize would do.
  • Prompting agents not to remember paths: helpful but insufficient once tool outputs or final delivery text have already been persisted.
  • Deployment-specific scripts: useful locally, but the safety invariant is generic for OpenClaw operators.

Impact

Affected users/systems/channels:

  • Long-running agents with memory enabled.
  • Document/artifact-heavy workflows.
  • Operators who need a safe deployment gate after changing tools, wrappers, channel delivery, or memory behavior.

Severity: medium. The issue rarely crashes the system, but it can poison future context and leak implementation details into user-facing replies.

Frequency: recurring in persistent multi-agent deployments and artifact-heavy workflows.

Consequence: stale/raw internal references in future turns, misleading delivery claims, and manual cleanup/reset work.

Evidence/examples

In a private OpenClaw deployment, a local memory firewall/scan gate catches issues such as:

  • raw generated file paths stored as final answers
  • file:// URLs in conversation memory
  • local-path attachment claims that should have been transport metadata
  • raw tool/workflow instructions that should stay out of durable seller/user memory

That implementation is deployment-specific. This issue proposes the upstreamable core: configurable scan/sanitize primitives and JSON evidence, not private policy rules.

Additional information

This would complement transcript hygiene and artifact delivery hardening. The safest initial PR would likely be read-only scan/reporting plus tests; sanitizer/apply behavior can follow once the rule contract is reviewed.
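
A follow-up sanitizer could reuse the same rule shape and make the dry-run posture explicit. This is a hypothetical sketch: sanitize, SanitizeRule, the placeholder strings "[attachment]" and "[local file]", and the dryRun option are illustrative names, not an existing OpenClaw interface. In dry-run mode it returns the input untouched together with a preview and per-rule counts; only an explicit dryRun: false returns rewritten text.

```typescript
// Hypothetical sketch: SanitizeRule, sanitize, the placeholders, and the
// dryRun option are illustrative assumptions, not an existing OpenClaw API.

interface SanitizeRule {
  id: string;
  // "g" flag required; capture group 1 is the unsafe span and ends the match.
  pattern: RegExp;
  placeholder: string;
}

interface SanitizeResult {
  applied: boolean; // false in dry-run mode
  text: string; // rewritten text if applied, otherwise the untouched input
  preview: string; // what the text would become if applied
  replacements: Record<string, number>; // rule ID -> number of spans replaced
}

// Order matters: the MEDIA: directive is replaced before the path inside it.
const RULES: SanitizeRule[] = [
  { id: "media-directive", pattern: /(\bMEDIA:[^\s"'<>)]*)/g, placeholder: "[attachment]" },
  { id: "file-url", pattern: /(file:\/\/[^\s"'<>)]+)/g, placeholder: "[local file]" },
  {
    id: "local-path",
    pattern: /(?:^|[\s"'(])(\/(?:workspace|home|tmp)\/[^\s"'<>)]+)/g,
    placeholder: "[local file]",
  },
];

function sanitize(
  text: string,
  rules: SanitizeRule[] = RULES,
  options: { dryRun?: boolean } = {},
): SanitizeResult {
  const dryRun = options.dryRun ?? true; // dry-run unless the caller opts out
  const replacements: Record<string, number> = {};
  let preview = text;
  for (const rule of rules) {
    preview = preview.replace(rule.pattern, (match: string, span: string) => {
      replacements[rule.id] = (replacements[rule.id] ?? 0) + 1;
      // Keep leading context (e.g. the space before a path); swap only the span.
      return match.slice(0, match.length - span.length) + rule.placeholder;
    });
  }
  return { applied: !dryRun, text: dryRun ? text : preview, preview, replacements };
}

const input = "Report ready: /workspace/out/q3.pdf (MEDIA:/workspace/out/q3.pdf)";
const dry = sanitize(input); // default: nothing is rewritten
const applied = sanitize(input, RULES, { dryRun: false });
console.log(JSON.stringify(dry, null, 2));
```

Because only the captured span is swapped for a placeholder, the surrounding prose survives, which is the narrow alternative to a full session reset; the dry-run preview and counts give reviewers something concrete to approve before anything is written back.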

Labels: stale (marked as stale due to inactivity)

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions