fix(memory-core): keep rotated cron transcripts out of dreaming corpus + add operator filters by zqchris · Pull Request #72913 · openclaw/openclaw

zqchris · 2026-04-27T16:23:33Z

Problem

Two related leaks let isolated cron transcripts flow into the Dreaming session corpus (memory/.dreams/session-corpus/<day>.txt) on real deployments, even with sessionTarget: isolated and existing generatedByCronRun / DIRECT_CRON_PROMPT_RE filtering:

Path-comparison miss after rotation. loadSessionTranscriptClassificationForSessionsDir indexes cron / dreaming-narrative transcripts by the live sessionFile absolute path read from sessions.json. As soon as that transcript is rotated to *.jsonl.deleted.<ts> (or *.trajectory.jsonl[.deleted.<ts>]), the on-disk path no longer matches the live entry, so the rotated artifact is no longer attributed to the cron session. dreaming-phases.collectSessionIngestionBatches then reads it as a normal transcript and ingests its [cron:<id> ...] content lines. A real-world recurrence is reported in "Dreaming needs configurable session/cron exclusions; isolated cron transcripts still enter session corpus" (#72611), with paths like main/sessions/<uuid>.jsonl.deleted.2026-04-25T06-33-10.801Z.
No operator-facing exclusion knob. dreaming.* only exposes phase-level settings. There is no way to say "skip every cron run for the main agent" or "skip session keys starting with agent:ops:" without forking memory-core. Cron runs whose key has the broader cron:<id> shape (without :run:<runId>) are not even matched by isCronRunSessionKey, so the built-in classifier silently lets them through.

Fix

Two complementary changes in one commit:

1. Session-id-aware classification (rotated artifacts)

SessionTranscriptClassification now also exposes session-id sets and reverse-lookup maps:

type SessionTranscriptClassification = {
  // existing path sets retained for back-compat
  dreamingNarrativeTranscriptPaths: ReadonlySet<string>;
  cronRunTranscriptPaths: ReadonlySet<string>;
  // new: sessionId-keyed sets for rotated artifacts
  dreamingNarrativeSessionIds: ReadonlySet<string>;
  cronRunSessionIds: ReadonlySet<string>;
  // new: reverse lookup so callers can resolve sessionKey for any transcript
  transcriptPathToSessionKey: ReadonlyMap<string, string>;
  sessionIdToSessionKey: ReadonlyMap<string, string>;
};

New helpers in session-files.ts (also re-exported from openclaw/plugin-sdk/memory-core-host-engine-qmd):

extractSessionIdFromTranscriptFileName(fileName) — handles <id>.jsonl, <id>.jsonl.deleted.<ts>, <id>.jsonl.reset.<ts>, <id>.trajectory.jsonl[.deleted.<ts> | .reset.<ts>]. Returns null for non-transcript file shapes.
isCronRunTranscriptPath(classification, absPath) / isDreamingNarrativeTranscriptPath(...) — try direct path lookup first, fall back to extractSessionIdFromTranscriptFileName(...) plus the new sessionId set.
lookupSessionKeyForTranscriptPath(classification, absPath) — resolves the owning session key for a (possibly rotated) transcript; needed by the operator filter below.

dreaming-phases.collectSessionIngestionBatches is wired to use the new helpers in place of Set.has(normalizedPath).
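The path-then-session-id fallback described above can be sketched as follows. This is a minimal illustration rather than the PR's actual code: the function names mirror the description, but the regular expression, the trimmed-down `Classification` type, and the manual basename extraction are assumptions made for the example.

```typescript
// Hypothetical, trimmed-down view of SessionTranscriptClassification:
// only the two cron-related members needed for this sketch.
type Classification = {
  cronRunTranscriptPaths: ReadonlySet<string>;
  cronRunSessionIds: ReadonlySet<string>;
};

// Assumed pattern: <id>.jsonl or <id>.trajectory.jsonl, each optionally
// followed by a .deleted.<ts> or .reset.<ts> rotation suffix.
const TRANSCRIPT_FILE_RE =
  /^(.+?)(?:\.trajectory)?\.jsonl(?:\.(?:deleted|reset)\..+)?$/;

function extractSessionIdFromTranscriptFileName(fileName: string): string | null {
  const match = TRANSCRIPT_FILE_RE.exec(fileName);
  return match?.[1] ?? null;
}

function isCronRunTranscriptPath(
  classification: Classification,
  absPath: string,
): boolean {
  // 1. Direct lookup: still works for live transcripts.
  if (classification.cronRunTranscriptPaths.has(absPath)) {
    return true;
  }
  // 2. Fallback: recover the session id from the (possibly rotated) file name.
  const fileName = absPath.slice(absPath.lastIndexOf("/") + 1);
  const sessionId = extractSessionIdFromTranscriptFileName(fileName);
  return sessionId !== null && classification.cronRunSessionIds.has(sessionId);
}
```

With a classification whose path set holds only the live `<id>.jsonl` path, a rotated `<id>.jsonl.deleted.<ts>` file misses the first check but is still attributed to the cron session through the id set.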

2. Operator-facing dreaming session filters

New dreaming.sessionFilter config block in extensions/memory-core/openclaw.plugin.json:

{
  "plugins": {
    "entries": {
      "memory-core": {
        "config": {
          "dreaming": {
            "sessionFilter": {
              "excludeCronJobIds":          ["job-1", "job-2"],
              "excludeSessionKeyPrefixes":  ["agent:main:cron:", "agent:ops:"],
              "excludeAgentIds":            ["batch-runner"],
              "excludeSourcePathRegex":     ["^ops/sessions/.*\\.jsonl$"]
            }
          }
        }
      }
    }
  }
}

resolveSessionIngestionExcludePredicate(cfg, logger) builds a single predicate per ingestion sweep:

Returns a no-op () => false when no filter is configured (zero overhead in the common case).
Compiles excludeSourcePathRegex once; invalid patterns are logged as warnings and skipped, never throw.
Operator-driven; the built-in classifier still runs, so this is defense-in-depth for shapes the classifier misses (notably cron:<id> without :run:).

excludeSessionKeyPrefixes is the most flexible knob since it covers both cron:<id>:run:<runId> and cron:<id> shapes via a single agent:<id>:cron: prefix.
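The predicate behaviour listed above (no-op fast path, one-time regex compilation, warn-and-skip on invalid patterns) can be sketched roughly like this. The name `buildExcludePredicate`, the `Candidate` shape, and the `Logger` type are hypothetical stand-ins, not the real `resolveSessionIngestionExcludePredicate` signature, and plain `RegExp` is used here for simplicity.

```typescript
type SessionFilter = {
  excludeCronJobIds?: string[];
  excludeSessionKeyPrefixes?: string[];
  excludeAgentIds?: string[];
  excludeSourcePathRegex?: string[];
};

// Hypothetical per-transcript input; the real ingestion loop may differ.
type Candidate = { sessionKey: string | null; agentId: string; sourcePath: string };
type Logger = { warn: (message: string) => void };

// Assumed to be the same shape the built-in cron job id extraction uses.
const SESSION_KEY_CRON_JOB_RE = /(?:^|:)cron:([^:]+)/;

function buildExcludePredicate(
  filter: SessionFilter | undefined,
  logger: Logger,
): (candidate: Candidate) => boolean {
  const jobIds = new Set(filter?.excludeCronJobIds ?? []);
  const prefixes = filter?.excludeSessionKeyPrefixes ?? [];
  const agentIds = new Set(filter?.excludeAgentIds ?? []);

  // Compile each pattern once; invalid patterns are logged and skipped.
  const regexes: RegExp[] = [];
  for (const source of filter?.excludeSourcePathRegex ?? []) {
    try {
      regexes.push(new RegExp(source));
    } catch {
      logger.warn(`ignoring invalid excludeSourcePathRegex pattern: ${source}`);
    }
  }

  // Nothing configured: return a constant predicate.
  if (jobIds.size === 0 && prefixes.length === 0 && agentIds.size === 0 && regexes.length === 0) {
    return () => false;
  }

  return (candidate) => {
    if (agentIds.has(candidate.agentId)) return true;
    if (regexes.some((re) => re.test(candidate.sourcePath))) return true;
    const key = candidate.sessionKey;
    if (key === null) return false;
    if (prefixes.some((prefix) => key.startsWith(prefix))) return true;
    const jobId = SESSION_KEY_CRON_JOB_RE.exec(key)?.[1];
    return jobId !== undefined && jobIds.has(jobId);
  };
}
```

In this sketch a prefix such as `agent:main:cron:` excludes both `agent:main:cron:job-1` and `agent:main:cron:job-1:run:<runId>`, which is the case the built-in classifier is said to miss.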

Tests

src/memory-host-sdk/host/session-files.test.ts (+10 tests, 34 total):

extractSessionIdFromTranscriptFileName — primary jsonl, .jsonl.deleted.<ts>, .jsonl.reset.<ts>, .trajectory.jsonl, .trajectory.jsonl.deleted.<ts>, non-transcript / null shapes.
isCronRunTranscriptPath / isDreamingNarrativeTranscriptPath — live path classification, rotated .deleted.<ts> recovery, rotated .trajectory.jsonl.deleted.<ts> recovery, live .trajectory.jsonl recovery, unrelated transcripts not matched.
lookupSessionKeyForTranscriptPath — live path, rotated path, unknown path.

extensions/memory-core/src/dreaming-phases.test.ts (+2 tests, 32 total):

"skips rotated cron run transcripts (.jsonl.deleted.<ts>) via session id (Dreaming needs configurable session/cron exclusions; isolated cron transcripts still enter session corpus #72611)" — full ingestion harness, asserts session-corpus/2026-04-05.txt is not created.
"respects dreaming.sessionFilter.excludeSessionKeyPrefixes for operator-driven exclusion" — uses a cron:<id> (non-run) key the built-in classifier wouldn't catch; only the operator filter drops it.

Out of scope

Does not change isCronRunSessionKey regex — kept strict for back-compat. Operators who want to drop cron:<id> (non-run) shapes use the new excludeSessionKeyPrefixes knob.
Does not change disk repair (session-file-repair.ts) — only the in-memory ingestion path.
Cron transcript content already ingested into the corpus stays; this fix only stops new ingestion. A separate cleanup task (e.g. an openclaw doctor --fix extension) could prune historical leaks.

Verified

pnpm tsgo:core ✓
pnpm tsgo:extensions ✓
pnpm test src/memory-host-sdk/host/session-files.test.ts → 34/34 pass
pnpm test extensions/memory-core/src/dreaming-phases.test.ts → 32/32 pass
pnpm check:changed → all gates pass (lint / typecheck / 0 import cycles / policy guards)

greptile-apps · 2026-04-27T16:27:54Z

Greptile Summary

This PR fixes a regression (#72611) where isolated cron transcripts leaked into the dreaming session corpus after rotation by augmenting SessionTranscriptClassification with session-id sets and reverse-lookup maps, then using session-id-aware helpers (isCronRunTranscriptPath, isDreamingNarrativeTranscriptPath, lookupSessionKeyForTranscriptPath) in place of the raw Set.has(normalizedPath) check. It also adds an operator-facing dreaming.sessionFilter config block with four exclusion knobs. The core fix is well-tested with dedicated regression tests; the only material issue is a wrong path-format example in the excludeSourcePathRegex schema description that would cause silently-failing operator regexes.

Confidence Score: 4/5

Safe to merge after correcting the excludeSourcePathRegex schema description; the core session-id classification fix is correct and well-tested.

One P1 finding: the excludeSourcePathRegex schema description documents main/sessions/<uuid>.jsonl but the runtime value produced by sessionPathForFile has no agent prefix (sessions/<uuid>.jsonl), so operator-written patterns following the example will silently never match. The underlying classification logic and the other three filter knobs are sound.

extensions/memory-core/openclaw.plugin.json — incorrect path format in the excludeSourcePathRegex description.

Comments Outside Diff (1)

extensions/memory-core/src/dreaming-phases.ts, line 292 (link)

excludeCronJobIds silently filters across all agents

SESSION_KEY_CRON_JOB_RE (/(?:^|:)cron:([^:]+)/) extracts the cron job ID regardless of which agent owns the session. If two agents both have a cron job named "sync", adding "sync" to excludeCronJobIds will silently drop transcripts from both agents. The schema description says "agent:<id>:cron:<cronJobId>" in its example, implying per-agent scoping, but the implementation is global. Operators who want per-agent exclusion should use excludeSessionKeyPrefixes with a fully-qualified prefix like "agent:main:cron:sync:" — worth a callout in the field description.
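The difference between the global job-id match and a fully qualified prefix can be illustrated with the regex quoted in this comment. The session keys and the `cronJobIdOf` helper below are made up for the example; only the regex itself comes from the review.

```typescript
// Regex as quoted in the review comment.
const SESSION_KEY_CRON_JOB_RE = /(?:^|:)cron:([^:]+)/;

// Hypothetical helper: returns the cron job id embedded in a session key.
function cronJobIdOf(sessionKey: string): string | null {
  return SESSION_KEY_CRON_JOB_RE.exec(sessionKey)?.[1] ?? null;
}

// Illustrative keys: two agents that each own a cron job named "sync".
const sessionKeys = [
  "agent:main:cron:sync:run:r1",
  "agent:ops:cron:sync:run:r2",
  "agent:main:chat:c1",
];

// excludeCronJobIds: ["sync"] style matching ignores the owning agent.
const droppedByJobId = sessionKeys.filter((key) => cronJobIdOf(key) === "sync");

// excludeSessionKeyPrefixes: ["agent:main:cron:sync:"] style matching is per agent.
const droppedByPrefix = sessionKeys.filter((key) =>
  key.startsWith("agent:main:cron:sync:"),
);
```

Here `droppedByJobId` contains the cron keys of both agents, while `droppedByPrefix` contains only the `main` one. Note that a prefix ending in a colon would not match a bare `agent:main:cron:sync` key, so operators targeting the non-run shape may need the prefix without the trailing colon.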


Reviews (1): Last reviewed commit: "fix(memory-core): keep rotated cron tran..."

greptile-apps · 2026-04-27T16:27:58Z

+              },
+              "excludeSourcePathRegex": {
+                "type": "array",
+                "description": "Skip transcripts whose normalized source path (e.g. `main/sessions/<uuid>.jsonl`) matches any of these regular expressions. Invalid patterns are ignored at runtime with a warning. Use for ad-hoc one-off exclusions where neither cron-id nor agent-id captures the target.",


Wrong path format in excludeSourcePathRegex schema example

The description shows main/sessions/<uuid>.jsonl but sessionPathForFile(absPath) (called in collectSessionIngestionBatches) actually returns "sessions/" + path.basename(absPath) — no agent prefix, just "sessions/<uuid>.jsonl". Any operator who copies the documented example and writes "^main/sessions/.*" will get a silently-failing filter.

Additionally, for rotated artifacts the basename retains the rotation suffix (e.g. "sessions/<uuid>.jsonl.deleted.2026-04-25T06-33-10.801Z"), so a strict \.jsonl$-anchored pattern would also miss them. Both facts belong in the description.

Suggested change

-                "description": "Skip transcripts whose normalized source path (e.g. `main/sessions/<uuid>.jsonl`) matches any of these regular expressions. Invalid patterns are ignored at runtime with a warning. Use for ad-hoc one-off exclusions where neither cron-id nor agent-id captures the target.",
+                "description": "Skip transcripts whose normalized source path (e.g. `sessions/<uuid>.jsonl` for a live transcript or `sessions/<uuid>.jsonl.deleted.<ts>` for a rotated artifact) matches any of these regular expressions. Invalid patterns are ignored at runtime with a warning. Use for ad-hoc one-off exclusions where neither cron-id nor agent-id captures the target.",
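To make the two pitfalls concrete, the sketch below reproduces the normalization as this comment describes it ("sessions/" plus the basename) and tries three patterns against a live and a rotated path. The local `sessionPathForFile` is a stand-in written for the example, not the real helper, and the absolute paths and the third pattern are illustrative assumptions.

```typescript
// Stand-in for the behaviour described in the comment:
// "sessions/" + basename, with no agent prefix.
function sessionPathForFile(absPath: string): string {
  return "sessions/" + absPath.slice(absPath.lastIndexOf("/") + 1);
}

function matchesSourcePath(pattern: string, absPath: string): boolean {
  return new RegExp(pattern).test(sessionPathForFile(absPath));
}

// Hypothetical on-disk locations.
const livePath = "/home/op/.openclaw/agents/main/sessions/1234.jsonl";
const rotatedPath = livePath + ".deleted.2026-04-25T06-33-10.801Z";

// Pattern copied from the original schema example: never matches.
const withAgentPrefix = "^main/sessions/.*";

// Anchored on .jsonl: matches live files but misses rotated artifacts.
const strictJsonl = "^sessions/.*\\.jsonl$";

// One possible pattern that tolerates rotation suffixes.
const rotationAware = "^sessions/[^/]+\\.jsonl(\\.(deleted|reset)\\..+)?$";
```

Under these assumptions `withAgentPrefix` matches neither path, `strictJsonl` matches only `livePath`, and `rotationAware` matches both.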


clawsweeper · 2026-04-28T07:42:16Z

Codex review: needs changes before merge.

Summary
The PR extends memory-host transcript classification with session-id/run-id lookup, wires memory-core Dreaming session filters, marks heartbeat/cron/exec prompts as internal-system provenance, and updates docs, changelog, and focused tests.

Reproducibility: yes. A sessions store with an explicit non-cron entry using sessionId shared-id and a cron entry whose :run: id is also shared-id makes lookupSessionKeyForTranscriptPath resolve the explicit session while isCronRunTranscriptPath still returns true from cronRunSessionIds.

Next step before merge
A repair worker can address the narrow helper/test blocker on the PR branch, with rebase/conflict handling if needed before normal merge gates.

Security
Cleared: The diff does not add dependencies, workflows, package-resolution changes, broader permissions, secret handling, or unguarded network execution; the new regex surface uses existing safe-regex helpers with bounded input.

Review findings

[P2] Avoid treating runId collisions as cron transcripts — packages/memory-host-sdk/src/host/session-files.ts:223-225

Review details

Best possible solution:

Keep the PR open, make cron transcript classification honor explicit session ownership when runIds collide, then rerun the focused memory-host/memory-core tests and changed gate before merge.

Do we have a high-confidence way to reproduce the issue?

Yes. A sessions store with an explicit non-cron entry using sessionId shared-id and a cron entry whose :run: id is also shared-id makes lookupSessionKeyForTranscriptPath resolve the explicit session while isCronRunTranscriptPath still returns true from cronRunSessionIds.

Is this the best way to solve the issue?

No, not as currently implemented. The session-id-aware helper direction is maintainable, but cron classification should follow the resolved owning session key or avoid adding colliding runIds to the cron set.

Full review comments:

[P2] Avoid treating runId collisions as cron transcripts — packages/memory-host-sdk/src/host/session-files.ts:223-225
The runId fallback adds runIdFromKey to cronRunSessionIds even when that id is already registered as another session's sessionId. In that collision, lookupSessionKeyForTranscriptPath resolves the explicit non-cron session, but isCronRunTranscriptPath still returns true from the id set, so Dreaming will skip a real user transcript. Please make cron classification follow the owning session key or avoid adding colliding runIds to the cron set.
Confidence: 0.93
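One way to honor explicit session ownership, sketched under simplifying assumptions: register every entry's own sessionId first, then add a runId from the session key only if no entry already claims that id. The `StoreEntry` shape, `CRON_RUN_KEY_RE`, and `classifyEntries` are hypothetical and much simpler than the real classification loader.

```typescript
// Hypothetical, simplified sessions.json entry.
type StoreEntry = { sessionKey: string; sessionId: string };

// Assumed shape of a cron run key: ...cron:<id>:run:<runId>
const CRON_RUN_KEY_RE = /(?:^|:)cron:[^:]+:run:([^:]+)$/;

function classifyEntries(entries: StoreEntry[]): {
  cronRunSessionIds: Set<string>;
  sessionIdToSessionKey: Map<string, string>;
} {
  const cronRunSessionIds = new Set<string>();
  const sessionIdToSessionKey = new Map<string, string>();

  // Pass 1: explicit ownership. Every entry claims its own sessionId.
  for (const entry of entries) {
    sessionIdToSessionKey.set(entry.sessionId, entry.sessionKey);
    if (CRON_RUN_KEY_RE.test(entry.sessionKey)) {
      cronRunSessionIds.add(entry.sessionId);
    }
  }

  // Pass 2: runId fallback, skipped when the id is already owned.
  for (const entry of entries) {
    const runId = CRON_RUN_KEY_RE.exec(entry.sessionKey)?.[1];
    if (runId === undefined || sessionIdToSessionKey.has(runId)) {
      continue;
    }
    sessionIdToSessionKey.set(runId, entry.sessionKey);
    cronRunSessionIds.add(runId);
  }

  return { cronRunSessionIds, sessionIdToSessionKey };
}
```

With a non-cron entry whose sessionId is `shared-id` and a cron entry keyed `...:run:shared-id`, this sketch leaves `shared-id` out of `cronRunSessionIds`, so the id-set check and the key lookup agree. The alternative named in the finding, deriving the cron decision from the resolved owning session key, would avoid the second set entirely.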

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

pnpm test packages/memory-host-sdk/src/host/session-files.test.ts extensions/memory-core/src/dreaming-phases.test.ts
pnpm tsgo:core
pnpm tsgo:extensions
pnpm check:changed

What I checked:

Current main classifies by live transcript path only: SessionTranscriptClassification on current main only has path sets, and loadSessionTranscriptClassificationForSessionsDir adds cron entries from the resolved live sessionFile path. (packages/memory-host-sdk/src/host/session-files.ts:54, 95cee64ca6e8)
Current main ingestion checks direct set membership: collectSessionIngestionBatches builds generatedByDreamingNarrative/generatedByCronRun from direct normalized path membership before calling buildSessionEntry. (extensions/memory-core/src/dreaming-phases.ts:757, 95cee64ca6e8)
Current main has no memory-core sessionFilter schema: A targeted rg over the memory-core plugin schema and dreaming docs found no sessionFilter/excludeCronJobIds/excludeSourcePathRegex entries on current main. (extensions/memory-core/openclaw.plugin.json:34, 95cee64ca6e8)
PR adds the requested filters and helper surface: The PR diff adds dreaming.sessionFilter schema/docs/tests and session-id/runId-aware helpers exported through the memory host SDK facade. (extensions/memory-core/openclaw.plugin.json:190, 743556fd2672)
Blocking source-level collision remains: The PR adds runIdFromKey to cronRunSessionIds even when sessionIdToSessionKey already maps that id to another explicit session, while isCronRunTranscriptPath later trusts only the id set. (packages/memory-host-sdk/src/host/session-files.ts:223, 743556fd2672)
Live PR state: GitHub API reports the PR open at head 743556f, with mergeable_state=dirty and maintainer_can_modify=true. (743556fd2672)

Likely related people:

jalehman: Merged fix(memory-core): skip dreaming transcript ingestion via session store #67315 taught dreaming ingestion to classify dreaming transcripts from sessions.json before reading transcript content, which is the same classification boundary this PR changes. (role: introduced related session-store classification behavior; confidence: high; commits: 87c09b2a7544; files: packages/memory-host-sdk/src/host/session-files.ts, extensions/memory-core/src/dreaming-phases.ts)
Patrick-Erichsen: Merged memory/dreaming: decouple managed cron from heartbeat #70737 moved managed dreaming cron onto isolated agent turns and included cron/wrapper transcript-noise filtering in the same memory-core Dreaming area. (role: recent maintainer of managed dreaming cron behavior; confidence: medium; commits: bed48c1a09e9; files: extensions/memory-core/src/dreaming-phases.ts, src/infra/heartbeat-runner.ts)
vignesh07: Merged feat(memory-core): ingest session transcripts into dreaming corpus #62227 added dreaming session-transcript ingestion, per-day session-corpus files, and checkpointing, which is the feature surface being protected here. (role: introduced session-corpus ingestion feature; confidence: medium; commits: 8697d1a8c05f; files: extensions/memory-core/src/dreaming-phases.ts, extensions/memory-core/src/short-term-promotion.ts)
zqchris: In addition to this PR, prior merged memory: strip inbound metadata envelopes from user messages in session corpus #66548 changed session-corpus text extraction in the same memory-host/session-files path. (role: recent adjacent contributor; confidence: medium; commits: 98562b2a8445; files: packages/memory-host-sdk/src/host/session-files.ts, extensions/memory-core/src/dreaming-phases.ts)

Remaining risk / open question:

The current PR head can falsely skip a legitimate non-cron transcript when its sessionId collides with a cron runId.
GitHub currently reports the branch as dirty against the base, so a repair may need rebase or conflict handling before merge gates are meaningful.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 95cee64ca6e8.

…g corpus

cron runs sometimes leave a second transcript whose basename equals the runId embedded in the sessionKey, distinct from `entry.sessionId`. With classification only indexing `entry.sessionId`, that mirror file looks like an unowned orphan: `lookupSessionKeyForTranscriptPath` returns null, `isCronRunTranscriptPath` returns false, and the file's content slips into the dreaming session corpus despite the operator's `agent:<id>:cron:` prefix exclusion.

Add `extractCronRunIdFromSessionKey` in `src/sessions/session-key-utils.ts` and let `loadSessionTranscriptClassificationForSessionsDir` register the extracted runId in both `cronRunSessionIds` and (when not already taken) `sessionIdToSessionKey`. This makes the existing sessionId-fallback in `extractSessionIdFromTranscriptFileName` recognize mirror files (including their `.deleted.<ts>` / `.reset.<ts>` rotated variants) as cron-owned, so dreaming auto-skips them via `generatedByCronRun` even without operator-supplied `excludeSessionKeyPrefixes`.

Tests: `session-key-utils.test.ts` covers the extractor's accept/reject boundary and asserts agreement with `isCronRunSessionKey`. `session-files.test.ts` adds two regression cases:
- a mirror transcript named after the runId (live + rotated) resolves back to the cron sessionKey and is reported as cron-owned;
- a registered explicit-session entry whose sessionId collides with some cron's runId is not overwritten by the runId fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

zqchris · 2026-05-03T07:56:26Z

Closing to stay under the active-PR quota — will resubmit if there's reviewer interest. Patch lives on zqchris/openclaw#patch/chris (ac32b2305a) and continues to ship to my local stack.

…penclaw#73241, openclaw#73400

Second hardening pass after the first round triggered new findings on Aisle's re-scan. PR openclaw#73406 (auto-reply voice silent) was meanwhile landed by the maintainer in upstream commit 28bf71d; that one will drop out at the next rebase via patch-id match, no source change here.

PR openclaw#73241 (BB http hardening), 1 HIGH + 2 MEDIUM:
- monitor-processing.sanitizeForLog: redact common secret-bearing patterns (`?password=`, `?token=`, `?api_key=`, `?secret=`, `Authorization: Bearer / Basic …`) before the value reaches the log sink. BlueBubbles uses query-string auth by default, so attachment download failures and similar errors can carry the API password in the captured request URL (CWE-532).
- history.fetchBlueBubblesMessageByGuid: validate the derived bareGuid (length ≤128, charset `[A-Za-z0-9._:-]+`, non-empty) before issuing the request. A trailing `/` would otherwise produce an empty bareGuid and turn the call into an unintended `/api/v1/message/` collection query (CWE-20).
- monitor-processing reply-context verbose logs: log only `bodyLen=` metadata, not the quoted message text itself. Verbose logs are retained / shipped to aggregators and would otherwise leak private chat content (CWE-532).

PR openclaw#73235 (BB routing guards), 1 MEDIUM + 2 LOW:
- monitor-reply-cache.resolveBlueBubblesMessageId: when `requireKnownShortId=true` and `chatContext` lacks any identifier, throw "requires a chat scope" instead of resolving the short id. Short ids are allocated from a single global counter across every account and chat, so an action call without chat scope could silently apply to the wrong conversation (CWE-285). Test updated to expect fail-closed (was previously fail-open with a comment that acknowledged the risk).
- monitor-reply-cache.buildCrossChatError: replace raw inputId in the thrown error message with `<short:N-digit>` or `<uuid:prefix…>` shape descriptors. Combined with the earlier chatGuid redaction, cross-chat errors now leak no concrete identifier (CWE-117 / CWE-200).

The Low PII finding on monitor-processing verbose logs is resolved automatically by the sanitizeForLog redaction extension added for PR openclaw#73241 openclaw#1.

PR openclaw#73400 (silent-reply 🧵), 1 MEDIUM:
- silent-reply-policy.classifySilentReplyConversationType: classify a session key as `internal` only when `parseThreadSessionSuffix` returns a real `🧵<id>` suffix, not on a free-form `🧵` substring match. Caller-supplied session keys can no longer embed the marker mid-string to force the conversation type to `internal` and bypass silent-reply rewrites (CWE-840).

, openclaw#73241

Bundled hardening for the three open PRs:
- session-files: use path.posix.join + strip CR/LF/TAB instead of rewriting backslashes to forward slashes. POSIX filenames may legally contain `\`, and the prior translation would synthesize fake path segments that bypass `excludeSourcePathRegex` (CWE-20). NUL is already rejected by Node's fs path layer.
- monitor-reply-cache: redact chat identifiers in cross-chat error messages (CWE-200). Phone numbers / email addresses / chat GUIDs must not leak into agent transcripts, tool results, or remote channel deliveries.
- monitor-reply-cache: rewrite isCrossChatMismatch so chatIdentifier and chatId comparisons run independently. Earlier version gated fallback comparisons on `!ctxChatGuid`, which let any non-empty ctx.chatGuid suppress the fallback checks when cached entry lacked chatGuid, letting a short id from chat A be reused while acting in chat B (CWE-697).
- monitor-processing: drop group reactions where chatGuid / chatIdentifier is whitespace-only, not just empty. A webhook sender supplying " " or "\t" must not satisfy the guard and degrade peerId to the literal "group".
- monitor-processing: sanitizeForLog around webhook senderId / messageId / action / attachment guid / err in verbose log lines and attachment download error logs (CWE-117).
- monitor-processing: validate replyToId shape (length ≤128, charset alnum + `._:-`) before issuing the API fallback request, so a webhook with a pathological replyToId cannot drive arbitrary outbound load (cheap CWE-400 mitigation).

Tests updated: monitor-reply-cache.test.ts cross-chat error assertions now expect `chatGuid=<redacted>` instead of raw values.

Also fix unrelated typecheck regression: delivery-dispatch.mirror-bluebubbles.test.ts makeBaseParams missing the `runSessionKey` field introduced in DispatchCronDeliveryParams as part of v2026.4.26.

…penclaw#73241, openclaw#73400 Second hardening pass after the first round triggered new findings on Aisle's re-scan. PR openclaw#73406 (auto-reply voice silent) was meanwhile landed by the maintainer in upstream commit 28bf71d — that one will drop out at the next rebase via patch-id match, no source change here. PR openclaw#73241 (BB http hardening) — 1 HIGH + 2 MEDIUM: - monitor-processing.sanitizeForLog: redact common secret-bearing patterns (`?password=`, `?token=`, `?api_key=`, `?secret=`, `Authorization: Bearer / Basic …`) before the value reaches the log sink. BlueBubbles uses query-string auth by default, so attachment download failures and similar errors can carry the API password in the captured request URL (CWE-532). - history.fetchBlueBubblesMessageByGuid: validate the derived bareGuid (length ≤128, charset `[A-Za-z0-9._:-]+`, non-empty) before issuing the request. A trailing `/` would otherwise produce an empty bareGuid and turn the call into an unintended `/api/v1/message/` collection query (CWE-20). - monitor-processing reply-context verbose logs: log only `bodyLen=` metadata, not the quoted message text itself. Verbose logs are retained / shipped to aggregators and would otherwise leak private chat content (CWE-532). PR openclaw#73235 (BB routing guards) — 1 MEDIUM + 2 LOW: - monitor-reply-cache.resolveBlueBubblesMessageId: when `requireKnownShortId=true` and `chatContext` lacks any identifier, throw "requires a chat scope" instead of resolving the short id. Short ids are allocated from a single global counter across every account and chat, so an action call without chat scope could silently apply to the wrong conversation (CWE-285). Test updated to expect fail-closed (was previously fail-open with a comment that acknowledged the risk). - monitor-reply-cache.buildCrossChatError: replace raw inputId in the thrown error message with `<short:N-digit>` or `<uuid:prefix…>` shape descriptors. 
Combined with the earlier chatGuid redaction, cross-chat errors now leak no concrete identifier (CWE-117 / CWE-200). The Low PII finding on monitor-processing verbose logs is resolved automatically by the sanitizeForLog redaction extension added for PR openclaw#73241 openclaw#1. PR openclaw#73400 (silent-reply 🧵) — 1 MEDIUM: - silent-reply-policy.classifySilentReplyConversationType: classify a session key as `internal` only when `parseThreadSessionSuffix` returns a real `🧵<id>` suffix, not on a free-form `🧵` substring match. Caller-supplied session keys can no longer embed the marker mid-string to force the conversation type to `internal` and bypass silent-reply rewrites (CWE-840).

, openclaw#73241 Bundled hardening for the three open PRs: - session-files: use path.posix.join + strip CR/LF/TAB instead of rewriting backslashes to forward slashes. POSIX filenames may legally contain `\`, and the prior translation would synthesize fake path segments that bypass `excludeSourcePathRegex` (CWE-20). NUL is already rejected by Node's fs path layer. - monitor-reply-cache: redact chat identifiers in cross-chat error messages (CWE-200). Phone numbers / email addresses / chat GUIDs must not leak into agent transcripts, tool results, or remote channel deliveries. - monitor-reply-cache: rewrite isCrossChatMismatch so chatIdentifier and chatId comparisons run independently. Earlier version gated fallback comparisons on `!ctxChatGuid`, which let any non-empty ctx.chatGuid suppress the fallback checks when cached entry lacked chatGuid — letting a short id from chat A be reused while acting in chat B (CWE-697). - monitor-processing: drop group reactions where chatGuid / chatIdentifier is whitespace-only, not just empty. A webhook sender supplying " " or "\t" must not satisfy the guard and degrade peerId to the literal "group". - monitor-processing: sanitizeForLog around webhook senderId / messageId / action / attachment guid / err in verbose log lines and attachment download error logs (CWE-117). - monitor-processing: validate replyToId shape (length ≤128, charset alnum + `._:-`) before issuing the API fallback request, so a webhook with a pathological replyToId cannot drive arbitrary outbound load (cheap CWE-400 mitigation). Tests updated: monitor-reply-cache.test.ts cross-chat error assertions now expect `chatGuid=<redacted>` instead of raw values. Also fix unrelated typecheck regression: delivery-dispatch.mirror- bluebubbles.test.ts makeBaseParams missing the `runSessionKey` field introduced in DispatchCronDeliveryParams as part of v2026.4.26.

…penclaw#73241, openclaw#73400

Second hardening pass after the first round triggered new findings on Aisle's re-scan. PR openclaw#73406 (auto-reply voice silent) was meanwhile landed by the maintainer in upstream commit 28bf71d; that one will drop out at the next rebase via patch-id match, no source change here.

PR openclaw#73241 (BB http hardening): 1 HIGH + 2 MEDIUM

- monitor-processing.sanitizeForLog: redact common secret-bearing patterns (`?password=`, `?token=`, `?api_key=`, `?secret=`, `Authorization: Bearer / Basic …`) before the value reaches the log sink. BlueBubbles uses query-string auth by default, so attachment download failures and similar errors can carry the API password in the captured request URL (CWE-532).
- history.fetchBlueBubblesMessageByGuid: validate the derived bareGuid (length ≤128, charset `[A-Za-z0-9._:-]+`, non-empty) before issuing the request. A trailing `/` would otherwise produce an empty bareGuid and turn the call into an unintended `/api/v1/message/` collection query (CWE-20).
- monitor-processing reply-context verbose logs: log only `bodyLen=` metadata, not the quoted message text itself. Verbose logs are retained / shipped to aggregators and would otherwise leak private chat content (CWE-532).

PR openclaw#73235 (BB routing guards): 1 MEDIUM + 2 LOW

- monitor-reply-cache.resolveBlueBubblesMessageId: when `requireKnownShortId=true` and `chatContext` lacks any identifier, throw "requires a chat scope" instead of resolving the short id. Short ids are allocated from a single global counter across every account and chat, so an action call without chat scope could silently apply to the wrong conversation (CWE-285). Test updated to expect fail-closed (was previously fail-open with a comment that acknowledged the risk).
- monitor-reply-cache.buildCrossChatError: replace raw inputId in the thrown error message with `<short:N-digit>` or `<uuid:prefix…>` shape descriptors. Combined with the earlier chatGuid redaction, cross-chat errors now leak no concrete identifier (CWE-117 / CWE-200).
- The Low PII finding on monitor-processing verbose logs is resolved automatically by the sanitizeForLog redaction extension added for PR openclaw#73241 openclaw#1.

PR openclaw#73400 (silent-reply 🧵): 1 MEDIUM

- silent-reply-policy.classifySilentReplyConversationType: classify a session key as `internal` only when `parseThreadSessionSuffix` returns a real `🧵<id>` suffix, not on a free-form `🧵` substring match. Caller-supplied session keys can no longer embed the marker mid-string to force the conversation type to `internal` and bypass silent-reply rewrites (CWE-840).
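To make the secret-redaction step concrete, here is a minimal sketch of how query-string credentials and Authorization headers might be masked before logging. The function name `redactSecretsForLog`, the `<redacted>` placeholder for this purpose, and the exact regexes are assumptions for illustration; the real sanitizeForLog may differ.

```typescript
// Hypothetical sketch; not the actual sanitizeForLog implementation.
function redactSecretsForLog(input: string): string {
  return input
    // Query-string credentials such as ?password=..., &token=..., &api_key=..., &secret=...
    // The value runs until the next `&`, `#`, or whitespace.
    .replace(/([?&](?:password|token|api_key|secret)=)[^&\s#]*/gi, "$1<redacted>")
    // Authorization headers using the Bearer or Basic scheme.
    .replace(/(Authorization:\s*(?:Bearer|Basic)\s+)\S+/gi, "$1<redacted>");
}
```

Keeping the parameter name and auth scheme while masking only the value leaves log lines useful for debugging without exposing the credential.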

openclaw-barnacle Bot added extensions: memory-core Extension: memory-core size: L labels Apr 27, 2026

greptile-apps Bot reviewed Apr 27, 2026

zqchris force-pushed the fix/dream-cron-session-exclusions branch from 35176b2 to e821864 Compare April 27, 2026 16:37

openclaw-barnacle Bot added size: XL and removed size: L labels Apr 27, 2026

zqchris force-pushed the fix/dream-cron-session-exclusions branch 2 times, most recently from 35d6cd3 to 676df8f Compare April 28, 2026 03:14

clawsweeper Bot mentioned this pull request Apr 30, 2026

Dreaming needs configurable session/cron exclusions; isolated cron transcripts still enter session corpus #72611

Open

zqchris force-pushed the fix/dream-cron-session-exclusions branch from 5e61679 to 90e6c11 Compare May 1, 2026 06:37

openclaw-barnacle Bot added docs Improvements or additions to documentation size: L channel: voice-call Channel integration: voice-call cli CLI command changes scripts Repository scripts extensions: kilocode app: macos App: macos and removed size: XL labels May 1, 2026

zqchris force-pushed the fix/dream-cron-session-exclusions branch from 67a7aa9 to 177756f Compare May 1, 2026 07:33

zqchris added 3 commits May 1, 2026 15:42

fix(memory-core): move dreaming filters onto packaged SDK

0a784bf

test(ci): repair PR 72913 failing gates

7b1fe84

style(macos): wrap menu status item

25ec37b

zqchris added 2 commits May 1, 2026 15:42

test(ci): keep supervisor marker export in mock

79d0ea0

style(macos): wrap session status text

b885a67

zqchris force-pushed the fix/dream-cron-session-exclusions branch from d9307e5 to b885a67 Compare May 1, 2026 07:43

openclaw-barnacle Bot removed channel: voice-call Channel integration: voice-call extensions: kilocode labels May 1, 2026

clawsweeper Bot mentioned this pull request May 2, 2026

Bug: Built-in dreaming system contaminates agent identity in multi-agent setups #65374

Open

zqchris closed this May 3, 2026

zqchris mentioned this pull request May 3, 2026

fix(provider-transport): pass baseUrl hostname to SSRF guard so fake-IP proxies don't block model API calls #76549

Closed

4 tasks

zqchris wants to merge 6 commits into openclaw:main from zqchris:fix/dream-cron-session-exclusions


	"description": "Skip transcripts whose normalized source path (e.g. `main/sessions/<uuid>.jsonl`) matches any of these regular expressions. Invalid patterns are ignored at runtime with a warning. Use for ad-hoc one-off exclusions where neither cron-id nor agent-id captures the target.",
	"description": "Skip transcripts whose normalized source path (e.g. `sessions/<uuid>.jsonl` for a live transcript or `sessions/<uuid>.jsonl.deleted.<ts>` for a rotated artifact) matches any of these regular expressions. Invalid patterns are ignored at runtime with a warning. Use for ad-hoc one-off exclusions where neither cron-id nor agent-id captures the target.",

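The behavior this description promises (invalid patterns are ignored with a warning, and matching runs against the normalized source path) could look roughly like the sketch below. The function names `compileExcludePatterns` and `isExcludedSourcePath` are hypothetical and do not come from memory-core.

```typescript
// Hypothetical sketch of regex-based source-path exclusion; names are assumptions.
function compileExcludePatterns(
  patterns: readonly string[],
  warn: (message: string) => void = console.warn,
): RegExp[] {
  const compiled: RegExp[] = [];
  for (const pattern of patterns) {
    try {
      compiled.push(new RegExp(pattern));
    } catch (err) {
      // An invalid pattern is skipped with a warning instead of aborting the run.
      warn(`ignoring invalid exclude pattern ${JSON.stringify(pattern)}: ${String(err)}`);
    }
  }
  return compiled;
}

// True when any compiled pattern matches the normalized source path,
// e.g. `sessions/<uuid>.jsonl` or `sessions/<uuid>.jsonl.deleted.<ts>`.
function isExcludedSourcePath(sourcePath: string, excludes: readonly RegExp[]): boolean {
  return excludes.some((re) => re.test(sourcePath));
}
```

For example, a pattern such as `\.jsonl\.deleted\.` would skip rotated artifacts while leaving live transcripts eligible for ingestion.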

greptile-apps Bot commented Apr 27, 2026 (edited)

Greptile Summary

Confidence Score: 4/5

Comments Outside Diff (1)


clawsweeper Bot commented Apr 28, 2026 (edited)

zqchris commented May 3, 2026
