fix(memory-core): stop dreaming from promoting transport metadata #67601

Closed
leochame wants to merge 6 commits into openclaw:main from leochame:codex/issue-67442-dreaming-memory-guard

@leochame leochame commented Apr 16, 2026

Summary

  • Problem: Dreaming could promote transport/session wrapper metadata into MEMORY.md.
  • Why it matters: transient wrapper noise could become durable memory and degrade long-term memory quality.
  • What changed: added deterministic metadata stripping/rejection during dreaming ingestion and kept a final promotion-time safety gate.
  • What did NOT change (scope boundary): no Plugin SDK changes, no docs/changelog updates, and no attempt to redesign durable-memory distillation in this PR.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: dreaming session ingestion normalized transcript lines into corpus snippets without treating transport/session wrappers as non-memory content (extensions/memory-core/src/dreaming-phases.ts:554, extensions/memory-core/src/dreaming-phases.ts:839).
  • Missing detection / guardrail: promotion contamination checks only recognized dreaming-generated artifact shapes, not transport metadata shapes, so wrapper-heavy snippets could still survive to promotion (extensions/memory-core/src/short-term-promotion.ts:275, extensions/memory-core/src/short-term-promotion.ts:1545).
  • Contributing context (if known): dreaming re-ingests both session corpus and daily memory, so once metadata leaked in it could be amplified by later passes. This PR centralizes the metadata heuristics in extensions/memory-core/src/dreaming-shared.ts:4.
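
A minimal sketch of what a centralized line-level heuristic could look like. The regexes and the name `looksLikeTransportMetadata` are assumptions for illustration, modeled only on the wrapper shapes named in this PR; they are not the contents of dreaming-shared.ts.

```typescript
// Hypothetical patterns based on the wrapper shapes reported in #67442.
// None of these names or regexes are taken from dreaming-shared.ts.
const METADATA_SENTINEL_RE = /^(?:Conversation info|Sender) \(untrusted metadata\):?\s*$/i;
const RAW_METADATA_KEY_RE = /^"?message_id"?\s*:/i;
const REPLY_TAG_ONLY_RE = /^\[\[\s*reply_to[^\]]*\]\]$/i;

/** Returns true when a single line looks like transport/session wrapper noise. */
function looksLikeTransportMetadata(line: string): boolean {
  const trimmed = line.trim();
  return (
    METADATA_SENTINEL_RE.test(trimmed) ||
    RAW_METADATA_KEY_RE.test(trimmed) ||
    REPLY_TAG_ONLY_RE.test(trimmed)
  );
}
```

Keeping the patterns in one module means ingestion and promotion would share the same definition of "metadata", which is the property the root-cause notes above say was missing.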

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/memory-core/src/dreaming-phases.test.ts, extensions/memory-core/src/short-term-promotion.test.ts
  • Scenario the test should lock in: transport metadata is stripped at ingestion and contaminated snippets are rejected before ranking or final promotion.
  • Why this is the smallest reliable guardrail: the bug spans session ingestion, daily re-ingestion, and promotion, so plugin-level tests are the narrowest place that exercises the full path.
  • Existing test that already covers this (if any): none directly covered wrapper-metadata contamination into promotion.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Dreaming no longer persists transport/session wrapper metadata as promotable memory candidates, and contaminated snippets are refused before writing MEMORY.md.

Diagram (if applicable)

Before:
transport metadata -> session/daily dreaming artifacts -> short-term candidates -> MEMORY.md contamination

After:
transport metadata -> strip/reject at ingestion -> reject during ranking/promotion -> real memory only reaches MEMORY.md
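
The "After" flow can be sketched as three independent gates. This is an illustrative model under stated assumptions: `sanitizeAtIngestion`, `isRankable`, `promote`, and the regex are hypothetical names, not the functions changed in this PR.

```typescript
// Hypothetical three-gate pipeline; every name here is illustrative only.
type Candidate = { snippet: string };

const WRAPPER_LINE_RE =
  /^(?:Conversation info|Sender) \(untrusted metadata\)|^"?message_id"?\s*:|^\[\[reply_to_\w+\]\]$/i;

const isWrapperLine = (line: string): boolean => WRAPPER_LINE_RE.test(line.trim());

// Gate 1 (ingestion): drop wrapper lines, keep everything else.
function sanitizeAtIngestion(raw: string): string {
  return raw
    .split("\n")
    .filter((line) => !isWrapperLine(line))
    .join("\n")
    .trim();
}

// Gate 2 (ranking): reject candidates that are empty or still carry wrapper lines.
function isRankable(candidate: Candidate): boolean {
  return candidate.snippet.length > 0 && !candidate.snippet.split("\n").some(isWrapperLine);
}

// Gate 3 (promotion): re-check immediately before the durable write.
function promote(candidates: Candidate[], memory: string[]): void {
  for (const candidate of candidates) {
    if (isRankable(candidate)) memory.push(candidate.snippet);
  }
}

const memory: string[] = [];
const rawInputs = [
  'Conversation info (untrusted metadata):\n"message_id": 5417\nUser prefers concise answers.',
  "[[reply_to_current]]",
];
const candidates = rawInputs
  .map((raw) => ({ snippet: sanitizeAtIngestion(raw) }))
  .filter(isRankable);
promote(candidates, memory);
```

The promotion gate repeats the ranking check on purpose: a candidate that reached the store through some other path is still refused before anything is written.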

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS arm64
  • Runtime/container: local Node/pnpm test environment
  • Model/provider: N/A for automated regression tests
  • Integration/channel (if any): Dreaming / memory-core
  • Relevant config (redacted): Dreaming enabled in memory-core

Steps

  1. Feed Dreaming session or daily-memory content that includes transport wrappers like Conversation info (untrusted metadata), raw message_id, or [[reply_to_current]].
  2. Run the dreaming ingestion / ranking / promotion flow.
  3. Inspect short-term candidates and MEMORY.md.

Expected

  • Metadata wrappers are stripped or rejected.
  • Only real memory content remains promotable.
  • MEMORY.md contains no transport/session wrapper noise.

Actual

  • Before this fix, metadata garbage could survive and be promoted into MEMORY.md.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Issue reference: #67442

Before:

  • Issue #67442 ("[Bug]: Dreaming can promote transport/session metadata into MEMORY.md as durable memory") reports a live contamination path where Dreaming promoted transport/session wrapper metadata into MEMORY.md, including:
    • Conversation info (untrusted metadata)
    • Sender (untrusted metadata)
    • raw message_id values such as 5417 / 5421
    • reply wrapper tags such as [[reply_to_current]]
  • Reported path in the issue:
    • raw transport metadata -> dreaming/session corpus pollution -> daily memory pollution -> erroneous promotion into MEMORY.md

After:

  • Added deterministic metadata sanitizing + rejection in the shared dreaming filter layer:
    • extensions/memory-core/src/dreaming-shared.ts
  • Applied that filter at all relevant stages:
    • session corpus ingestion in extensions/memory-core/src/dreaming-phases.ts
    • daily snippet re-ingestion in extensions/memory-core/src/dreaming-phases.ts
    • final promotion safety gate in extensions/memory-core/src/short-term-promotion.ts

Passing verification:

  • pnpm test extensions/memory-core/src/dreaming-phases.test.ts
  • pnpm test extensions/memory-core/src/short-term-promotion.test.ts

Result:

  • 2 test files passed
  • 74 tests passed
  • 0 failures

Targeted proof added in tests:

  • extensions/memory-core/src/dreaming-phases.test.ts
    • verifies inbound metadata wrappers are stripped before session corpus is written
    • verifies polluted daily metadata chunks are not re-ingested into recall candidates
    • verifies a dreaming sweep can still promote the real content while blocking metadata contamination from reaching MEMORY.md
  • extensions/memory-core/src/short-term-promotion.test.ts
    • verifies transport metadata wrappers are classified as contaminated
    • verifies metadata snippets are not recorded as short-term recall candidates
    • verifies contaminated candidates are refused during final append to MEMORY.md
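
For readers without the test files at hand, the shape of such a regression check can be sketched as a small table of cases. The classifier `isContaminated` below is a hypothetical stand-in, and the case data is illustrative; neither is copied from the real test suites.

```typescript
// Stand-in classifier (hypothetical): a snippet counts as contaminated when it
// contains any known wrapper shape. The real check in memory-core may differ.
const WRAPPER_SHAPES: RegExp[] = [
  /\(untrusted metadata\)/i,
  /"?message_id"?\s*:/i,
  /\[\[reply_to_\w+\]\]/i,
];

function isContaminated(snippet: string): boolean {
  return WRAPPER_SHAPES.some((re) => re.test(snippet));
}

interface Case {
  name: string;
  snippet: string;
  expectContaminated: boolean;
}

const cases: Case[] = [
  { name: "reply tag", snippet: "[[reply_to_current]] sounds good", expectContaminated: true },
  { name: "sentinel header", snippet: "Sender (untrusted metadata):", expectContaminated: true },
  { name: "raw metadata key", snippet: '"message_id": 5421', expectContaminated: true },
  { name: "real memory", snippet: "User ships releases on Fridays.", expectContaminated: false },
];

// Returns the names of cases whose classification differs from the expectation.
function runCases(): string[] {
  return cases
    .filter((c) => isContaminated(c.snippet) !== c.expectContaminated)
    .map((c) => c.name);
}
```

An empty result from `runCases()` means every case was classified as expected; a table like this makes it cheap to add a new wrapper shape when one is reported.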

Net effect:

Human Verification (required)

  • Verified scenarios: targeted memory-core regression tests for session ingestion, daily re-ingestion, ranking, and final promotion.
  • Edge cases checked: reply tags, JSON metadata blocks, raw metadata keys, and mixed valid/invalid content in the same sweep.
  • What you did not verify: manual UI/production-channel reproduction outside the automated plugin tests.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: metadata heuristics could reject a legitimate note that looks too wrapper-like.
    • Mitigation: filtering is constrained to known transport/session metadata patterns and backed by targeted regression tests.

@leochame leochame changed the title from "[codex] memory-core: block dreaming metadata promotion contamination" to "fix(memory-core): block transport metadata promotion into MEMORY.md" on Apr 16, 2026
@leochame leochame changed the title from "fix(memory-core): block transport metadata promotion into MEMORY.md" to "fix(memory-core): stop dreaming from promoting transport metadata" on Apr 16, 2026
@leochame leochame marked this pull request as ready for review April 16, 2026 09:39

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5159d3138

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread extensions/memory-core/src/dreaming-shared.ts Outdated
@leochame

Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5159d3138

Comment thread extensions/memory-core/src/dreaming-shared.ts Outdated
@greptile-apps

greptile-apps Bot commented Apr 16, 2026

Contributor

Greptile Summary

This PR adds deterministic transport-metadata stripping and rejection across the dreaming ingestion pipeline to fix contamination of MEMORY.md with noise like Conversation info (untrusted metadata): blocks, raw message_id values, and [[reply_to_current]] tags. The fix is applied at three layers: session corpus ingestion (normalizeSessionCorpusSnippet + post-normalize check), daily snippet re-ingestion (collectDailyIngestionBatches / seedHistoricalDailyMemorySignals), and final promotion time (applyShortTermPromotions), with the shared heuristics centralized in dreaming-shared.ts.

Confidence Score: 5/5

Safe to merge — the fix is well-scoped, all remaining findings are P2, and test coverage directly locks in the contamination-blocked paths.

No P0/P1 bugs found. The sentinel-then-JSON stripping path is correct, the index management in the block-skipping loops is sound, and the defense-in-depth design (sanitize → garbage-check → contamination-guard) prevents metadata from slipping through any single layer. The two P2 comments note a standalone-JSON-block false-positive risk and minor regex-object churn — neither affects correctness.

extensions/memory-core/src/dreaming-shared.ts — specifically the standalone "```json" stripping logic at line 132 and the daily ingestion asymmetry (sanitize skipped for chunk snippets).

Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/memory-core/src/dreaming-shared.ts
Line: 132-138

Comment:
**Standalone `"```json"` block stripping may drop legitimate code examples**

Any fenced JSON block whose first content line matches `JSON_METADATA_KEY_RE` is stripped, even without a preceding metadata sentinel. A legitimate memory note like:

````
Review the retry payload format:
```json
{"message_id": "evt_123", "type": "webhook", "content": "Hello"}
```
Notes: use the id to correlate retries.
````

…would have its `"```json"` block silently removed because `"message_id":` matches the key regex. The surrounding text is preserved, but the code example is lost. The daily ingestion path (`collectDailyIngestionBatches` / `seedHistoricalDailyMemorySignals`) skips sanitization entirely and relies solely on `isMetadataGarbageText` at the chunk boundary — so a chunk that is mixed real-content + metadata block is passed through with the metadata still in the snippet.

Consider tightening the standalone-block guard to require a preceding sentinel, or explicitly documenting that code examples whose first JSON key matches transport field names are stripped as an accepted tradeoff.

How can I resolve this? If you propose a fix, please make it concise.
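
One way to picture the "require a preceding sentinel" option is the sketch below. It is an assumption-laden illustration: `stripSentinelGatedJsonBlocks` and the regex constants are hypothetical names, not code from dreaming-shared.ts.

```typescript
// Hypothetical names throughout; this is not the code in dreaming-shared.ts.
const FENCE = "`".repeat(3); // built at runtime so the literal fence never appears in this example
const SENTINEL_RE = /\(untrusted metadata\):?\s*$/i;
const FENCE_OPEN_RE = new RegExp("^" + FENCE + "json\\s*$", "i");
const FENCE_CLOSE_RE = new RegExp("^" + FENCE + "\\s*$");

/** Strips a fenced JSON block only when the line directly before it is a metadata sentinel. */
function stripSentinelGatedJsonBlocks(text: string): string {
  const lines = text.split("\n");
  const kept: string[] = [];
  let i = 0;
  while (i < lines.length) {
    const isSentinel = SENTINEL_RE.test(lines[i].trim());
    const nextOpensFence = i + 1 < lines.length && FENCE_OPEN_RE.test(lines[i + 1].trim());
    if (isSentinel && nextOpensFence) {
      // Skip the sentinel, the opening fence, and everything up to the closing fence.
      i += 2;
      while (i < lines.length && !FENCE_CLOSE_RE.test(lines[i].trim())) i++;
      i++; // step past the closing fence (or past the end if the block is unterminated)
      continue;
    }
    kept.push(lines[i]);
    i++;
  }
  return kept.join("\n");
}
```

With this gating, a note that merely documents a payload containing `message_id` keeps its fenced example, while a block introduced by a `(untrusted metadata)` header is still removed. The tradeoff is that a metadata block whose sentinel was already lost upstream would pass through and need a later gate to catch it.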

---

This is a comment left during a code review.
Path: extensions/memory-core/src/dreaming-shared.ts
Line: 72-84

Comment:
**RegExp objects reconstructed on every tagged line**

`stripInlineMetadataTag` creates two `new RegExp(...)` instances on each call when the line contains a reply tag. Since this function is called inside a `.map()` over every line of every sanitized snippet, the repeated object construction adds up for larger corpora. The two derived patterns can be hoisted to module-level constants alongside `INLINE_METADATA_TAG_RE`.

```suggestion
const INLINE_METADATA_TAG_FULL_RE = new RegExp(`^${INLINE_METADATA_TAG_RE.source}$`, "i");
const INLINE_METADATA_TAG_TRAILING_RE = new RegExp(
  `\\s+${INLINE_METADATA_TAG_RE.source}\\s*$`,
  "i",
);

function stripInlineMetadataTag(line: string): string {
  const trimmed = line.trim();
  if (!INLINE_METADATA_TAG_RE.test(trimmed)) {
    return line;
  }
  if (trimmed.match(INLINE_METADATA_TAG_FULL_RE)) {
    return "";
  }
  return line.replace(INLINE_METADATA_TAG_TRAILING_RE, "");
}
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "memory-core: block dreaming metadata pro..." | Re-trigger Greptile

Comment thread extensions/memory-core/src/dreaming-shared.ts Outdated
Comment thread extensions/memory-core/src/dreaming-shared.ts
@openclaw-barnacle openclaw-barnacle Bot added the "gateway" (Gateway runtime) label on Apr 17, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7478ba759e

Comment thread extensions/memory-core/src/dreaming-shared.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d369aa9ec

Comment thread extensions/memory-core/src/dreaming-shared.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2897c2989f

Comment thread extensions/memory-core/src/dreaming-shared.ts Outdated
@leochame leochame force-pushed the codex/issue-67442-dreaming-memory-guard branch from 2897c29 to be07e81 on April 17, 2026 02:56

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be07e81d0c

Comment thread .github/workflows/openclaw-cross-os-release-checks-reusable.yml Outdated
Comment thread extensions/memory-core/src/dreaming-shared.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef913cdc47

Comment thread extensions/memory-core/src/dreaming-shared.ts
@clawsweeper

clawsweeper Bot commented Apr 26, 2026

Contributor

Closing this as implemented after Codex automated review.

Current main already implements the central user-facing fix behind PR #67601: transport metadata is stripped from user session-corpus entries before Dreaming sees them, Dreaming phase output is separate by default, and managed Dreaming artifacts are filtered before daily ingestion/promotion. The PR is now obsolete and includes unrelated workflow/gateway-test changes; the remaining staged-candidate leak is tracked separately in #68774.

Best possible solution:

Close PR #67601 as implemented/obsolete. Keep the shipped main implementation from v2026.4.15, avoid merging the older broad sanitizer patch with unrelated workflow and gateway-test churn, and let the narrower #68774 handle any remaining staged dream-candidate leakage.

What I checked:

So I’m closing this as already implemented rather than keeping a duplicate issue open.

Codex Review notes: model gpt-5.5, reasoning high; reviewed against d54d2d6b9b8a; fix evidence: release v2026.4.15, commit 82e349a48ad9.

Labels

  • extensions: memory-core (Extension: memory-core)
  • gateway (Gateway runtime)
  • size: L

Development

Successfully merging this pull request may close these issues.

[Bug]: Dreaming can promote transport/session metadata into MEMORY.md as durable memory
