fix(memory-core): stop dreaming from promoting transport metadata#67601
fix(memory-core): stop dreaming from promoting transport metadata#67601leochame wants to merge 6 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5159d3138
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5159d3138
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Greptile SummaryThis PR adds deterministic transport-metadata stripping and rejection across the dreaming ingestion pipeline to fix contamination of Confidence Score: 5/5Safe to merge — the fix is well-scoped, all remaining findings are P2, and test coverage directly locks in the contamination-blocked paths. No P0/P1 bugs found. The sentinel-then-JSON stripping path is correct, the index management in the block-skipping loops is sound, and the defense-in-depth design (sanitize → garbage-check → contamination-guard) prevents metadata from slipping through any single layer. The two P2 comments note a standalone-JSON-block false-positive risk and minor regex-object churn — neither affects correctness. extensions/memory-core/src/dreaming-shared.ts — specifically the standalone Prompt To Fix All With AIThis is a comment left during a code review.
Path: extensions/memory-core/src/dreaming-shared.ts
Line: 132-138
Comment:
**Standalone `"```json"` block stripping may drop legitimate code examples**
Any fenced JSON block whose first content line matches `JSON_METADATA_KEY_RE` is stripped, even without a preceding metadata sentinel. A legitimate memory note like:
```
Review the retry payload format:
```json
{"message_id": "evt_123", "type": "webhook", "content": "Hello"}
```
Notes: use the id to correlate retries.
```
…would have its `"```json"` block silently removed because `"message_id":` matches the key regex. The surrounding text is preserved, but the code example is lost. The daily ingestion path (`collectDailyIngestionBatches` / `seedHistoricalDailyMemorySignals`) skips sanitization entirely and relies solely on `isMetadataGarbageText` at the chunk boundary — so a chunk that is mixed real-content + metadata block is passed through with the metadata still in the snippet.
Consider tightening the standalone-block guard to require a preceding sentinel, or explicitly documenting that code examples whose first JSON key matches transport field names are stripped as an accepted tradeoff.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: extensions/memory-core/src/dreaming-shared.ts
Line: 72-84
Comment:
**RegExp objects reconstructed on every tagged line**
`stripInlineMetadataTag` creates two `new RegExp(...)` instances on each call when the line contains a reply tag. Since this function is called inside a `.map()` over every line of every sanitized snippet, the repeated object construction adds up for larger corpora. The two derived patterns can be hoisted to module-level constants alongside `INLINE_METADATA_TAG_RE`.
```suggestion
const INLINE_METADATA_TAG_FULL_RE = new RegExp(`^${INLINE_METADATA_TAG_RE.source}$`, "i");
const INLINE_METADATA_TAG_TRAILING_RE = new RegExp(
`\\s+${INLINE_METADATA_TAG_RE.source}\\s*$`,
"i",
);
function stripInlineMetadataTag(line: string): string {
const trimmed = line.trim();
if (!INLINE_METADATA_TAG_RE.test(trimmed)) {
return line;
}
if (trimmed.match(INLINE_METADATA_TAG_FULL_RE)) {
return "";
}
return line.replace(INLINE_METADATA_TAG_TRAILING_RE, "");
}
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "memory-core: block dreaming metadata pro..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7478ba759e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d369aa9ec
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2897c2989f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
2897c29 to
be07e81
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be07e81d0c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef913cdc47
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Closing this as implemented after Codex automated review. Current main already implements the central user-facing fix behind PR #67601: transport metadata is stripped from user session-corpus entries before Dreaming sees them, Dreaming phase output is separate by default, and managed Dreaming artifacts are filtered before daily ingestion/promotion. The PR is now obsolete and includes unrelated workflow/gateway-test changes; the remaining staged-candidate leak is tracked separately in #68774. Best possible solution: Close PR #67601 as implemented/obsolete. Keep the shipped main implementation from v2026.4.15, avoid merging the older broad sanitizer patch with unrelated workflow and gateway-test churn, and let the narrower #68774 handle any remaining staged dream-candidate leakage. What I checked:
So I’m closing this as already implemented rather than keeping a duplicate issue open. Codex Review notes: model gpt-5.5, reasoning high; reviewed against d54d2d6b9b8a; fix evidence: release v2026.4.15, commit 82e349a48ad9. |
Summary
MEMORY.md.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
extensions/memory-core/src/dreaming-phases.ts:554,extensions/memory-core/src/dreaming-phases.ts:839).extensions/memory-core/src/short-term-promotion.ts:275,extensions/memory-core/src/short-term-promotion.ts:1545).extensions/memory-core/src/dreaming-shared.ts:4.Regression Test Plan (if applicable)
extensions/memory-core/src/dreaming-phases.test.ts,extensions/memory-core/src/short-term-promotion.test.tsUser-visible / Behavior Changes
Dreaming no longer persists transport/session wrapper metadata as promotable memory candidates, and contaminated snippets are refused before writing
MEMORY.md.Diagram (if applicable)
Security Impact (required)
Yes/No) NoYes/No) NoYes/No) NoYes/No) NoYes/No) NoYes, explain risk + mitigation: N/ARepro + Verification
Environment
memory-coreSteps
Conversation info (untrusted metadata), rawmessage_id, or[[reply_to_current]].MEMORY.md.Expected
MEMORY.mdcontains no transport/session wrapper noise.Actual
MEMORY.md.Evidence
Issue reference: #67442
Before:
MEMORY.md, including:Conversation info (untrusted metadata)Sender (untrusted metadata)message_idvalues such as5417/5421[[reply_to_current]]MEMORY.mdAfter:
extensions/memory-core/src/dreaming-shared.tsextensions/memory-core/src/dreaming-phases.tsextensions/memory-core/src/dreaming-phases.tsextensions/memory-core/src/short-term-promotion.tsPassing verification:
pnpm test extensions/memory-core/src/dreaming-phases.test.tspnpm test extensions/memory-core/src/short-term-promotion.test.tsResult:
Targeted proof added in tests:
extensions/memory-core/src/dreaming-phases.test.tsMEMORY.mdextensions/memory-core/src/short-term-promotion.test.tsMEMORY.mdNet effect:
MEMORY.mdHuman Verification (required)
Review Conversations
Compatibility / Migration
Yes/No) YesYes/No) NoYes/No) NoRisks and Mitigations