Summary
Add a configurable memory/transcript hygiene doctor and sanitizer for unsafe persisted artifacts such as raw local paths, file URLs, delivery directives, and internal tool-output leakage.
Problem to solve
Long-running agents can persist text that was useful during execution but should not become durable conversational memory: raw /workspace/... paths, file://... URLs, MEDIA: delivery directives, local attachment claims, raw tool instructions, or backend-specific diagnostics. Once persisted, those strings can reappear in future prompts, summaries, active memory, or user-facing replies.
Today operators can grep/reset manually, but there is no first-class dry-run scan/sanitize workflow that reports affected session or memory surfaces with enough precision to fix them safely.
Proposed solution
Add a memory/transcript hygiene command or plugin surface whose rules run in dry-run mode first:
- Scan durable session transcripts, summaries, active memory, and other configured memory stores.
- Detect configurable unsafe patterns, starting with raw local paths, file:// URLs, MEDIA: directives, and local-path delivery claims.
- Report affected files/records, rule IDs, counts, and sample snippets with sensitive values redacted/truncated.
- Provide a conservative sanitizer that can remove or replace only the unsafe spans, with --dry-run as the default posture.
- Support allowlisted roots/pattern exceptions for legitimate developer debugging contexts.
- Emit machine-readable JSON so operators can gate deployments or session resets.
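To make the scan/report half of this list concrete, here is a minimal Python sketch of a read-only scanner. Everything named in it is an assumption for illustration: the rule IDs, the regexes, the `Finding` fields, the `allow_roots` parameter, and the report shape are hypothetical and not an existing OpenClaw contract.

```python
import json
import re
from collections import Counter
from dataclasses import asdict, dataclass

# Hypothetical rule set: IDs and patterns are illustrative only.
RULES = {
    # Absolute paths under a few common roots, not preceded by a URL scheme.
    "raw-local-path": re.compile(r"(?<![\w:/])/(?:workspace|home|tmp)/[^\s\"']+"),
    "file-url": re.compile(r"file://[^\s\"']+"),
    # A whole line that starts with a MEDIA: delivery directive.
    "media-directive": re.compile(r"^MEDIA:.*$", re.MULTILINE),
}


@dataclass
class Finding:
    source: str    # session or memory record identifier
    rule_id: str
    start: int     # character offsets of the unsafe span
    end: int
    snippet: str   # truncated sample, never the full value


def redact(value, keep=8):
    """Truncate a matched value so the report does not re-leak it."""
    return value if len(value) <= keep else value[:keep] + "..."


def scan_text(source, text, allow_roots=()):
    """Return findings for one record; never modifies the text."""
    findings = []
    for rule_id, pattern in RULES.items():
        for match in pattern.finditer(text):
            value = match.group(0)
            # Allowlisted roots only exempt raw local paths in this sketch.
            if rule_id == "raw-local-path" and value.startswith(tuple(allow_roots)):
                continue
            findings.append(
                Finding(source, rule_id, match.start(), match.end(), redact(value))
            )
    return findings


def build_report(findings):
    """Serialize findings as machine-readable JSON with per-rule counts."""
    counts = Counter(f.rule_id for f in findings)
    return json.dumps(
        {
            "total": len(findings),
            "counts": dict(counts),
            "findings": [asdict(f) for f in findings],
        },
        indent=2,
        sort_keys=True,
    )
```

Calling `scan_text("session-1", transcript_text, allow_roots=["/workspace/out/"])` would skip paths under the allowlisted root while still reporting file:// URLs and MEDIA: lines. Reporting offsets rather than full values is one way to give a later sanitizer enough precision without copying the unsafe string into the report.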
Alternatives considered
- Manual grep: works for emergencies but misses structured stores and is hard to audit.
- Full session reset only: safe but heavy-handed; it discards useful context when a narrow sanitize would do.
- Prompting agents not to remember paths: helpful but insufficient once tool outputs or final delivery text have already been persisted.
- Deployment-specific scripts: useful locally, but the safety invariant is generic for OpenClaw operators.
Impact
Affected users/systems/channels:
- Long-running agents with memory enabled.
- Document/artifact-heavy workflows.
- Operators who need a safe deployment gate after changing tools, wrappers, channel delivery, or memory behavior.
Severity: medium. The issue rarely crashes the system, but it can poison future context and leak implementation details into user-facing replies.
Frequency: recurring in persistent multi-agent deployments and artifact-heavy workflows.
Consequence: stale/raw internal references in future turns, misleading delivery claims, and manual cleanup/reset work.
Evidence/examples
In a private OpenClaw deployment, a local memory firewall/scan gate catches issues such as:
- raw generated file paths stored as final answers
- file:// URLs in conversation memory
- local-path attachment claims that should have been transport metadata
- raw tool/workflow instructions that should stay out of durable seller/user memory
That implementation is deployment-specific. This issue proposes the upstreamable core: configurable scan/sanitize primitives and JSON evidence, not private policy rules.
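As a rough illustration of how JSON evidence could drive a deployment gate, the sketch below turns a scan report into a process exit code. The report shape (a top-level `total` count) and the `max_findings` parameter are assumptions made for this example, not a defined format.

```python
import json


def gate(report_json, max_findings=0):
    """Return an exit code: 0 if the report is within budget, 1 otherwise.

    Assumes a hypothetical report with a top-level integer "total" field.
    """
    report = json.loads(report_json)
    return 0 if report.get("total", 0) <= max_findings else 1
```

A CI step or session-reset script could call `sys.exit(gate(report_text))` so that any finding blocks the rollout until it is reviewed.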
Additional information
This would complement transcript hygiene and artifact delivery hardening. The safest initial PR would likely be read-only scan/reporting plus tests; sanitizer/apply behavior can follow once the rule contract is reviewed.
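For the later sanitizer step, one possible shape is a function that replaces only the matched spans and leaves the text untouched unless the caller opts out of dry-run. This is a sketch under stated assumptions: the `UNSAFE` patterns, the `[redacted:<rule>]` placeholder, and the `dry_run` parameter name are hypothetical.

```python
import re

# Hypothetical patterns; file URLs are handled first so the path rule
# never sees the path portion of a file:// URL.
UNSAFE = {
    "file-url": re.compile(r"file://[^\s\"']+"),
    "raw-local-path": re.compile(r"(?<![\w:/])/workspace/[^\s\"']+"),
}


def sanitize(text, dry_run=True):
    """Return (result_text, replacement_count).

    With dry_run=True (the default) the original text is returned unchanged
    and only the count of spans that would be replaced is reported.
    """
    cleaned = text
    total = 0
    for rule_id, pattern in UNSAFE.items():
        cleaned, replaced = pattern.subn(f"[redacted:{rule_id}]", cleaned)
        total += replaced
    return (text if dry_run else cleaned), total
```

Keeping the count identical between dry-run and apply mode lets an operator confirm that the applied change matches what the dry run reported.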