memory: block dreaming self-ingestion by gumadeiras · Pull Request #66852 · openclaw/openclaw

gumadeiras · 2026-04-14T22:45:47Z

Summary

Problem: dreaming-generated narrative/report artifacts could loop back into short-term promotion, while prompt-text-only transcript detection could also suppress ordinary chats that merely quoted the dream prompt.
Why it matters: synthetic dreaming output should not become durable memory, and real user transcripts should not be dropped from session recall ingestion.
What changed: session transcript skipping now relies on internal dreaming run markers with prefix-only matching instead of prompt text; short-term promotion rejects dreaming/promotion-shaped snippets during normalization, recording, ranking, and apply; the dreaming lead detector was hardened with a linear parser that handles markdown/diff wrappers without the backtracking-prone regex shape.
What did NOT change (scope boundary): this PR does not add end-to-end provenance plumbing or a new persisted schema for first-class origin tracking.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related #
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: dreaming-generated text and grounded recall snippets converged into downstream plain-text surfaces, so promotion-path filtering had to infer provenance from content patterns.
Missing detection / guardrail: transcript ingestion also treated quoted prompt text as an authoritative dreaming marker, which could falsely classify ordinary chats as internal dreaming runs.
Contributing context (if known): dreaming reports are emitted as markdown bullets/diff-like snippets, so narrow start-of-string heuristics were insufficient to catch all contamination shapes.
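
The contributing-context note explains why a start-of-string check misses wrapped snippets. A minimal sketch of a linear lead check follows. The wrapper character set, the function name `hasDreamingLead`, and the exact lead labels are assumptions for illustration (the `Candidate:` and `Reflections:` labels are taken from the review discussion, not from the merged code).

```typescript
// Hypothetical wrapper characters: whitespace, list bullets, diff prefixes, quote marks, brackets.
const WRAPPER_CHARS = new Set([" ", "\t", "-", "*", "+", ">", "[", "`"]);

// Assumed lead labels; the real detector may recognize a different set.
const DREAMING_LEADS = ["candidate:", "reflections:"];

// Single forward pass: skip wrapper characters by index, then compare the lead once.
// No regex is involved, so there is nothing that can backtrack.
function hasDreamingLead(snippet: string): boolean {
  let i = 0;
  while (i < snippet.length && WRAPPER_CHARS.has(snippet.charAt(i))) {
    i += 1;
  }
  const rest = snippet.slice(i, i + 16).toLowerCase();
  return DREAMING_LEADS.some((lead) => rest.startsWith(lead));
}
```

In this sketch a bullet-wrapped `- Candidate: ...` and a diff-wrapped `+ [Reflections: ...` both match, while ordinary prose that merely mentions the word "candidate" mid-sentence does not.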

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
- src/memory-host-sdk/host/session-files.test.ts
- extensions/memory-core/src/short-term-promotion.test.ts
- extensions/memory-core/src/dreaming-phases.test.ts
Scenario the test should lock in: real chats that quote the dream prompt still ingest normally, actual dreaming transcripts checkpoint as skipped, and dreaming markdown/diff candidate snippets never enter short-term promotion.
Why this is the smallest reliable guardrail: the bug spans one transcript-ingestion seam plus the short-term promotion insertion/ranking/apply surfaces.
Existing test that already covers this (if any): N/A
If no new test is added, why not: N/A

User-visible / Behavior Changes

Dreaming diary/report artifacts no longer feed durable memory promotion.
Ordinary transcripts that quote the dream-diary prompt continue to participate in session recall ingestion.

Diagram (if applicable)

Before:
[dreaming transcript or dreaming-shaped snippet] -> [session recall / short-term store] -> [promotion candidate] -> [durable memory pollution]
[ordinary chat quoting prompt] -> [misclassified as dreaming] -> [session ingestion skipped]

After:
[dreaming transcript] -> [checkpoint as skipped] -> [no session recall ingestion]
[dreaming-shaped snippet] -> [filtered from short-term promotion paths] -> [no durable promotion]
[ordinary chat quoting prompt] -> [normal session ingestion] -> [eligible grounded recall]
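
The "After" flow for transcripts can be sketched as a marker-only classifier. This is an illustration under stated assumptions: the record type, the `text` field, and the function names `hasDreamingMarker` and `classifyTranscript` are hypothetical, and only the `dreaming-narrative-` prefix and the `runId`/`sessionKey` field names come from the review discussion.

```typescript
const DREAMING_NARRATIVE_RUN_PREFIX = "dreaming-narrative-";

// Hypothetical record shape: only the fields this sketch inspects.
type TranscriptRecord = {
  runId?: unknown;
  sessionKey?: unknown;
  data?: { runId?: unknown; sessionKey?: unknown };
  text?: string;
};

function hasDreamingMarker(value: unknown): boolean {
  return typeof value === "string" && value.startsWith(DREAMING_NARRATIVE_RUN_PREFIX);
}

// A transcript is skipped only when some record carries an internal run marker.
// Message text is never consulted, so quoting the dream prompt has no effect.
function classifyTranscript(records: TranscriptRecord[]): "skipped" | "ingest" {
  const marked = records.some(
    (r) =>
      hasDreamingMarker(r.runId) ||
      hasDreamingMarker(r.sessionKey) ||
      hasDreamingMarker(r.data?.runId) ||
      hasDreamingMarker(r.data?.sessionKey),
  );
  return marked ? "skipped" : "ingest";
}
```

Because the classifier ignores `text` entirely, an ordinary chat that quotes the prompt falls through to `"ingest"`, which is the behavior the third diagram row describes.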

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: macOS
Runtime/container: local repo workspace
Model/provider: N/A
Integration/channel (if any): N/A
Relevant config (redacted): default memory-core dreaming/session ingestion paths

Steps

Create a dreaming transcript with internal dreaming bootstrap/run markers, or create a short-term snippet shaped like a dreaming candidate/report.
Run the memory-core transcript ingestion / short-term promotion path.
Create a normal transcript that merely quotes the dream-diary prompt and run the same ingestion path.

Expected

Dreaming-generated transcripts and dreaming-shaped snippets are excluded from promotion inputs.
Ordinary prompt-quoting transcripts are still ingested.

Actual

Matches expected on the current PR head.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios:
- targeted session-files regression for prompt-quoting chats vs dreaming-marker transcripts
- targeted short-term promotion filtering for direct, stored, markdown-bullet, and diff-prefixed dreaming snippets
- targeted dreaming-phases checkpoint expectation for skipped dreaming transcripts
Edge cases checked:
- prefix-only runId/sessionKey matching
- markdown bullet wrappers
- diff-prefix wrappers
- bracket wrappers
What you did not verify:
- full end-to-end provenance redesign beyond the current heuristic/path-level fix
- full repo test suite

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)
If yes, exact upgrade steps:

Risks and Mitigations

Risk: content-based contamination heuristics remain a compatibility backstop rather than first-class provenance.
- Mitigation: transcript-side false positives were narrowed to internal markers, promotion-path filters now cover the known dreaming formats, and a provenance-first follow-up can replace the remaining heuristics later.

greptile-apps · 2026-04-14T22:48:53Z

Greptile Summary

This PR blocks dreaming self-ingestion by broadening narrative transcript detection in session-files.ts (prompt-body and runId/sessionKey fields beyond just bootstrap records) and adding isContaminatedDreamingSnippet guards at all five short-term promotion insertion points in memory-core. New tests validate recording, ranking, and apply paths. Documentation is updated to reflect that dreaming diary/report artifacts are excluded from MEMORY.md promotion.

Confidence Score: 5/5

Safe to merge; all findings are minor style or low-risk broadening choices.

Both P2 findings are non-blocking: one is an `includes` vs `startsWith` preference with negligible false-positive risk in practice, and the other is an inline string literal that could be a named constant. Core logic is correct and well-tested across all affected promotion stages.

src/memory-host-sdk/host/session-files.ts — minor: hasDreamingNarrativeRunId uses includes instead of startsWith
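
The practical difference between the two matchers is small but easy to show. The sketch below compares them side by side; `looseMatch` and `prefixMatch` are hypothetical names, and the sample strings are illustrative rather than real run IDs.

```typescript
const DREAMING_NARRATIVE_RUN_PREFIX = "dreaming-narrative-";

// Substring match: flags the marker anywhere in the value.
const looseMatch = (value: unknown): boolean =>
  typeof value === "string" && value.includes(DREAMING_NARRATIVE_RUN_PREFIX);

// Prefix match: flags the marker only at the start of the value.
const prefixMatch = (value: unknown): boolean =>
  typeof value === "string" && value.startsWith(DREAMING_NARRATIVE_RUN_PREFIX);

// Illustrative inputs: a conventional run ID, a key that merely contains the marker, a non-string.
const samples: unknown[] = ["dreaming-narrative-abc", "my-context-dreaming-narrative-abc", 42];

const comparison = samples.map((value) => ({
  value,
  loose: looseMatch(value),
  prefix: prefixMatch(value),
}));
```

Only the second sample is classified differently: the substring matcher flags it and the prefix matcher does not.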

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/memory-host-sdk/host/session-files.ts
Line: 50-52

Comment:
**`includes` vs `startsWith` for runId matching**

`hasDreamingNarrativeRunId` uses `includes`, while the `isDreamingNarrativeBootstrapRecord` it wraps uses `startsWith`. Any string containing `"dreaming-narrative-"` as a substring (e.g. a session key like `"my-context-dreaming-narrative-abc"`) would now be flagged, which is broader than the original intent. If run IDs always follow the `dreaming-narrative-<suffix>` prefix convention, `startsWith` is the safer choice here.

```suggestion
  return typeof value === "string" && value.startsWith(DREAMING_NARRATIVE_RUN_PREFIX);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/memory-core/src/short-term-promotion.ts
Line: 248

Comment:
**Inline string vs named constant**

`"dreaming-narrative-"` is used inline here while `DREAMING_NARRATIVE_PROMPT_PREFIX` and `DREAMING_PROMOTION_META_PREFIX` are pulled from module-level constants in the same file. Extracting this as `const DREAMING_NARRATIVE_RUN_PREFIX = "dreaming-narrative-"` (mirroring the `session-files.ts` constant name) would make the three sentinel strings consistent and easier to update together.

```suggestion
    snippet.includes(DREAMING_NARRATIVE_PROMPT_PREFIX) ||
    snippet.includes(DREAMING_PROMOTION_META_PREFIX) ||
    snippet.includes(DREAMING_NARRATIVE_RUN_PREFIX)
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "memory: block dreaming self-ingestion"

Copilot

Pull request overview

This PR tightens the memory system’s “dreaming” hygiene by preventing dream-diary narrative transcripts and dream/promotion-shaped artifacts from being ingested or promoted as durable memories, keeping short-term promotion grounded in real recall evidence.

Changes:

Broaden session transcript detection to flag dreaming narrative runs via bootstrap metadata, run/session IDs, and the dream-diary prompt text; skip collecting transcript content once flagged.
Filter dreaming-/promotion-shaped snippets throughout short-term promotion (store normalization, recall recording, candidate ranking, and apply), including protection during candidate rehydration.
Document that dreaming diary/report artifacts are excluded from short-term promotion into MEMORY.md.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/memory-host-sdk/host/session-files.ts | Adds broader dreaming narrative detection and ensures flagged transcripts don’t contribute session content for ingestion. |
| src/memory-host-sdk/host/session-files.test.ts | Adds coverage for prompt-body detection and verifies flagged transcripts yield empty content/line maps. |
| extensions/memory-core/src/short-term-promotion.ts | Introduces dreaming contamination detection and applies it across normalize/record/rank/apply promotion flow. |
| extensions/memory-core/src/short-term-promotion.test.ts | Adds tests ensuring contaminated snippets are ignored during recording, ranking, and direct apply. |
| docs/concepts/dreaming.md | Clarifies that dreaming diary/report artifacts are excluded from short-term promotion to `MEMORY.md`. |

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c7d94114c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 556949020c

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a7145cac7b

gumadeiras · 2026-04-14T23:54:44Z

Scanner note triage:

extensions/memory-core/src/short-term-promotion.ts false-positive concerns were valid and are fixed on the current head by narrowing the contamination heuristic to exact promotion-marker comments, transcript-style dreaming prompt echoes, and the full generated candidate shape.
src/memory-host-sdk/host/session-files.ts spoofed dreaming-narrative-* marker claim is not being treated as a security issue under SECURITY.md.

Why the remaining security claim is rejected:

SECURITY.md explicitly treats reports that require write access to trusted local state (~/.openclaw, workspace files, session artifacts) as out of scope.
It also explicitly treats heuristic/parity findings and trusted-operator local DoS/data-loss claims without an auth/policy/sandbox boundary bypass as hardening, not vulnerabilities.
The scanner note does not demonstrate a documented trust-boundary bypass beyond modifying trusted local transcript/state content.

So the remaining session-files.ts concern is, at most, future hardening, not a merge blocker or accepted security finding for this PR.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d307cde276

gumadeiras · 2026-04-15T00:02:57Z

Scanner note triage refresh on current head:

extensions/memory-core/src/short-term-promotion.ts parser complexity concern was valid as hardening; fixed by switching the diff-prefix scan to sticky-regex/index-based parsing instead of per-iteration suffix slicing.
src/memory-host-sdk/host/session-files.ts zero-line dreaming transcript checkpoint concern was valid as a performance regression; fixed by letting zero-line checkpointed files take the unchanged fast path on later sweeps.
The remaining session-files.ts spoofed dreaming-narrative-* marker claim is still rejected as a security finding under SECURITY.md.
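
The sticky-regex change mentioned above can be sketched as follows. This is not the merged parser: the token pattern and the name `skipDiffPrefixes` are assumptions, chosen only to show how the `y` flag lets a scan advance by index instead of slicing the remaining suffix on every iteration.

```typescript
// Hypothetical prefix token: optional whitespace, one marker character, optional whitespace.
const PREFIX_TOKEN = /[ \t]*[+\->*][ \t]*/y;

// Returns the index of the first character after all leading wrapper tokens.
// The sticky flag anchors each match at lastIndex, so no substring is allocated per step.
function skipDiffPrefixes(line: string): number {
  let index = 0;
  for (;;) {
    PREFIX_TOKEN.lastIndex = index;
    const match = PREFIX_TOKEN.exec(line);
    if (match === null) {
      return index;
    }
    // Every successful match consumes at least the marker character, so the loop always advances.
    index += match[0].length;
  }
}
```

A caller would then inspect the text starting at the returned index, for example with `line.startsWith(lead, index)`, which also avoids creating a sliced copy.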

Why that remaining security claim is rejected:

SECURITY.md treats reports requiring write access to trusted local state (~/.openclaw, workspace/session artifacts) as out of scope.
It also treats trusted-operator local DoS/data-loss and heuristic/parity findings without an auth/policy/sandbox boundary bypass as hardening, not vulnerabilities.
The report still does not show a documented trust-boundary bypass beyond modifying trusted local transcript/state content.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c759fc0d83

aisle-research-bot · 2026-04-15T00:11:12Z

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

| # | Severity | Title |
| --- | --- | --- |
| 1 | 🟡 Medium | Session transcript ingestion can be suppressed by spoofed dreaming-narrative runId/sessionKey prefix |
| 2 | 🔵 Low | Content-based heuristic drops legitimate short-term memory entries (integrity/availability risk) |

1. 🟡 Session transcript ingestion can be suppressed by spoofed dreaming-narrative runId/sessionKey prefix

| Property | Value |
| --- | --- |
| Severity | Medium |
| CWE | CWE-345 |
| Location | `src/memory-host-sdk/host/session-files.ts:10-201` |

Description

buildSessionEntry() marks the entire session file as generatedByDreamingNarrative if any JSONL record contains a runId or sessionKey (top-level or nested in data) starting with "dreaming-narrative-". Once that flag is set, it skips all message content during ingestion.

This creates an integrity / audit-evasion and data-loss risk if an attacker (or any untrusted producer of JSONL records) can inject a single record with a spoofed runId/sessionKey prefix into an otherwise normal transcript:

Input: JSONL record content read from absPath (session transcript file)
Decision: isDreamingNarrativeGeneratedRecord(record) returns true for broad shapes ({runId}, {sessionKey}, or {data:{runId|sessionKey}})
Impact: after generatedByDreamingNarrative becomes true, message records are silently dropped (continue), producing empty/partial entry.content, lineMap, and affecting downstream memory ingestion/audit.

Vulnerable logic:

- sets `generatedByDreamingNarrative` based on a broad prefix match
- skips all subsequent message content when the flag is true

```typescript
if (!generatedByDreamingNarrative && isDreamingNarrativeGeneratedRecord(record)) {
  generatedByDreamingNarrative = true;
}
...
if (generatedByDreamingNarrative) {
  continue;
}
```

Recommendation

Tighten dreaming-narrative detection and avoid dropping all message content based on a single loosely-matched record.

Suggested fixes (choose one or combine):

Narrow the predicate to only trust an explicit, well-scoped marker record (e.g. the existing bootstrap record) rather than any runId/sessionKey occurrence.
If you must detect by runId/sessionKey, require a trusted record type/customType and/or a schema version field so arbitrary custom records can’t trigger it.
If the goal is to exclude only dreaming-generated messages, filter only those specific message records rather than skipping the entire transcript.

Example (only treat the explicit bootstrap record as authoritative, and never skip all messages):

```typescript
// Only set the flag on the explicit bootstrap marker
if (!generatedByDreamingNarrative && isDreamingNarrativeBootstrapRecord(record)) {
  generatedByDreamingNarrative = true;
}

// If you still want to ingest nothing for dreaming sessions, make it explicit:
// return early once, rather than silently skipping per-message.
if (generatedByDreamingNarrative) {
  return {
    path: sessionPathForFile(absPath),
    absPath,
    mtimeMs: stat.mtimeMs,
    size: stat.size,
    hash: hashText(""),
    content: "",
    lineMap: [],
    messageTimestampsMs: [],
    generatedByDreamingNarrative: true,
  };
}
```

Additionally, consider logging when a transcript is classified as dreaming-narrative, and why (which line/record), to improve auditability.
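
One way to make the classification auditable is to return the reason alongside the decision. The sketch below is hypothetical: `DreamingClassification`, `MarkerRecord`, and `classifyWithReason` are names invented for illustration, and the caller is assumed to log the returned line and field.

```typescript
const DREAMING_NARRATIVE_RUN_PREFIX = "dreaming-narrative-";

// Hypothetical result type: records why a transcript was classified, for audit logs.
type DreamingClassification =
  | { dreaming: false }
  | { dreaming: true; line: number; field: "runId" | "sessionKey"; value: string };

// Hypothetical record shape: only the two marker fields are inspected.
type MarkerRecord = { runId?: unknown; sessionKey?: unknown };

function classifyWithReason(records: MarkerRecord[]): DreamingClassification {
  for (let i = 0; i < records.length; i += 1) {
    const record = records[i];
    if (record === undefined) {
      continue;
    }
    for (const field of ["runId", "sessionKey"] as const) {
      const value = record[field];
      if (typeof value === "string" && value.startsWith(DREAMING_NARRATIVE_RUN_PREFIX)) {
        // Report a 1-based line number so it lines up with the JSONL file.
        return { dreaming: true, line: i + 1, field, value };
      }
    }
  }
  return { dreaming: false };
}
```

The first matching record wins, so the log entry points at the earliest line that caused the transcript to be skipped.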

2. 🔵 Content-based heuristic drops legitimate short-term memory entries (integrity/availability risk)

| Property | Value |
| --- | --- |
| Severity | Low |
| CWE | CWE-20 |
| Location | `extensions/memory-core/src/short-term-promotion.ts:275-474` |

Description

The short-term promotion pipeline introduces isContaminatedDreamingSnippet() and uses it to silently skip snippets/entries at multiple stages (store load, recall recording, ranking, and apply).

Because the filter is content-based (not source/path based), legitimate user-authored memory notes can be misclassified as “dreaming contamination” and then:

Not recorded when recalled
Removed/ignored when loading an existing short-term store (normalizeStore drops entries)
Excluded from ranking and promotion

This creates a data integrity/availability issue where a user (or any actor able to influence memory file content/snippets) can suppress promotion of specific memories by crafting text that matches the heuristic (e.g., starting with Candidate:/Reflections: and containing fields like confidence:, evidence: memory/..., status: staged, recalls:). The evidence: check is broad (memory/), increasing false-positive risk.

Vulnerable logic (key points):

detection heuristic:

```typescript
const hasEvidence = /\bevidence:\s*(?:memory\/\.dreams\/session-corpus\/|memory\/)/i.test(snippet);
return hasNarrativeLead && hasConfidence && hasEvidence && hasStatus && hasRecalls;
```

destructive behavior on load:

```typescript
if (snippet && isContaminatedDreamingSnippet(snippet)) {
  continue;
}
```

Impact: legitimate memories matching this format can be permanently excluded from promotion workflows (and effectively dropped from the normalized in-memory store representation).

Recommendation

Avoid content-only heuristics for deleting/ignoring user memory entries.

Suggested mitigations (combine as appropriate):

Scope by provenance: only apply the contamination filter to snippets known to originate from dreaming artifacts (e.g., path under memory/.dreams/ or a dedicated source: "dreaming").
Narrow the evidence matcher: remove the broad memory/ alternative and match only memory/.dreams/session-corpus/ (or another dreaming-specific prefix).
Quarantine instead of drop: retain entries but mark them with a flag (e.g., suspectedDreaming: true) and exclude them only from promotion, not from store normalization.

Example narrowing (conceptual):

```typescript
const hasEvidence = /\bevidence:\s*memory\/\.dreams\/session-corpus\//i.test(snippet);

function isContaminatedDreamingSnippet(snippet: string, path?: string): boolean {
  if (path && !path.startsWith("memory/.dreams/")) return false;
  ...
}
```

This reduces false positives and prevents accidental suppression of legitimate user notes.
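
The "quarantine instead of drop" mitigation could look roughly like this. It is a sketch under assumptions: `ShortTermEntry`, `normalizeEntries`, and `promotionCandidates` are hypothetical names, the detector is passed in as a parameter rather than tied to any real heuristic, and only the `suspectedDreaming` flag name comes from the recommendation above.

```typescript
// Hypothetical entry shape; `suspectedDreaming` is the flag name suggested in the recommendation.
type ShortTermEntry = { key: string; snippet: string; suspectedDreaming?: boolean };

// Normalization keeps every entry and only annotates suspects.
function normalizeEntries(
  entries: ShortTermEntry[],
  isSuspect: (snippet: string) => boolean,
): ShortTermEntry[] {
  return entries.map((entry) =>
    isSuspect(entry.snippet) ? { ...entry, suspectedDreaming: true } : entry,
  );
}

// Promotion is the only stage that excludes flagged entries.
function promotionCandidates(entries: ShortTermEntry[]): ShortTermEntry[] {
  return entries.filter((entry) => entry.suspectedDreaming !== true);
}
```

With this split, a false positive costs a missed promotion rather than a lost store entry, and the flag can be cleared later if the heuristic is narrowed.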

Analyzed PR: #66852 at commit 4742656

Last updated on: 2026-04-15T00:38:40Z

gumadeiras · 2026-04-15T00:29:16Z

Merged via squash.

Prepared head SHA: 4742656a0d03c90902383213ac0608bcc51c0fbd
Merge commit: 0c4e0d703023c93bed101fcf62a92dbfd3537bcc

Thanks @gumadeiras!
