
fix(memory-core): keep rotated cron transcripts out of dreaming corpus + add operator filters #72913

Closed
zqchris wants to merge 6 commits into openclaw:main from zqchris:fix/dream-cron-session-exclusions

@zqchris
Contributor

zqchris commented Apr 27, 2026

Closes #72611.

Problem

Two related leaks let isolated cron transcripts flow into the Dreaming session corpus (memory/.dreams/session-corpus/<day>.txt) on real deployments, even with sessionTarget: isolated and existing generatedByCronRun / DIRECT_CRON_PROMPT_RE filtering:

  1. Path-comparison miss after rotation. loadSessionTranscriptClassificationForSessionsDir indexes cron / dreaming-narrative transcripts by the live sessionFile absolute path read from sessions.json. As soon as that transcript is rotated to *.jsonl.deleted.<ts> (or *.trajectory.jsonl[.deleted.<ts>]), the on-disk path no longer matches the live entry, so the rotated artifact is no longer attributed to the cron session. dreaming-phases.collectSessionIngestionBatches then reads it as a normal transcript and ingests its [cron:<id> ...] content lines. The real-world recurrence reported in #72611 ("Dreaming needs configurable session/cron exclusions; isolated cron transcripts still enter session corpus") shows paths like main/sessions/<uuid>.jsonl.deleted.2026-04-25T06-33-10.801Z.

  2. No operator-facing exclusion knob. dreaming.* only exposes phase-level settings, so there is no way to say "skip every cron run for the main agent" or "skip session keys starting with agent:ops:" without forking memory-core. Cron runs whose key has the broader cron:<id> shape (without :run:<runId>) are not even matched by isCronRunSessionKey, so the built-in classifier silently lets them through.

Fix

Two complementary changes in one commit:

1. Session-id-aware classification (rotated artifacts)

SessionTranscriptClassification now also exposes session-id sets and reverse-lookup maps:

type SessionTranscriptClassification = {
  // existing path sets retained for back-compat
  dreamingNarrativeTranscriptPaths: ReadonlySet<string>;
  cronRunTranscriptPaths: ReadonlySet<string>;
  // new: sessionId-keyed sets for rotated artifacts
  dreamingNarrativeSessionIds: ReadonlySet<string>;
  cronRunSessionIds: ReadonlySet<string>;
  // new: reverse lookup so callers can resolve sessionKey for any transcript
  transcriptPathToSessionKey: ReadonlyMap<string, string>;
  sessionIdToSessionKey: ReadonlyMap<string, string>;
};

New helpers in session-files.ts (also re-exported from openclaw/plugin-sdk/memory-core-host-engine-qmd):

  • extractSessionIdFromTranscriptFileName(fileName) — handles <id>.jsonl, <id>.jsonl.deleted.<ts>, <id>.jsonl.reset.<ts>, <id>.trajectory.jsonl[.deleted.<ts> | .reset.<ts>]. Returns null for non-transcript file shapes.
  • isCronRunTranscriptPath(classification, absPath) / isDreamingNarrativeTranscriptPath(...) — try direct path lookup first, fall back to extractSessionIdFromTranscriptFileName(...) plus the new sessionId set.
  • lookupSessionKeyForTranscriptPath(classification, absPath) — resolves the owning session key for a (possibly rotated) transcript; needed by the operator filter below.
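A minimal sketch of the file-name parsing described for extractSessionIdFromTranscriptFileName, assuming a single regex over the basename is enough. The pattern below is illustrative and is not the shipped implementation:

```typescript
// Illustrative only: accepted shapes follow the PR description.
//   <id>.jsonl
//   <id>.jsonl.deleted.<ts> | <id>.jsonl.reset.<ts>
//   <id>.trajectory.jsonl[.deleted.<ts> | .reset.<ts>]
// The lazy (.+?) capture stops before the optional ".trajectory" segment,
// so "abc.trajectory.jsonl" yields "abc", not "abc.trajectory".
const TRANSCRIPT_FILE_RE =
  /^(.+?)(?:\.trajectory)?\.jsonl(?:\.(?:deleted|reset)\.[^/]+)?$/;

function extractSessionIdFromTranscriptFileName(fileName: string): string | null {
  const match = TRANSCRIPT_FILE_RE.exec(fileName);
  return match?.[1] ?? null; // null for sessions.json, *.jsonl.bak, etc.
}

// Usage: a rotated artifact still yields its session id.
console.log(
  extractSessionIdFromTranscriptFileName("3f2a.jsonl.deleted.2026-04-25T06-33-10.801Z"),
); // 3f2a
```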

dreaming-phases.collectSessionIngestionBatches is wired to use the new helpers in place of Set.has(normalizedPath).
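Roughly, the direct-lookup-then-session-id fallback could look like the sketch below. It assumes the absolute path is already normalized (the real code compares a normalized path), uses only fields from the classification type above, and replaces the real parser with a simplified stand-in; the paths and session key are made up:

```typescript
import * as path from "node:path";

// Subset of SessionTranscriptClassification from the PR description.
type SessionTranscriptClassification = {
  cronRunTranscriptPaths: ReadonlySet<string>;
  cronRunSessionIds: ReadonlySet<string>;
  transcriptPathToSessionKey: ReadonlyMap<string, string>;
  sessionIdToSessionKey: ReadonlyMap<string, string>;
};

// Simplified stand-in for extractSessionIdFromTranscriptFileName.
function sessionIdFromFileName(fileName: string): string | null {
  const m = /^(.+?)(?:\.trajectory)?\.jsonl(?:\.(?:deleted|reset)\.[^/]+)?$/.exec(fileName);
  return m?.[1] ?? null;
}

function isCronRunTranscriptPath(c: SessionTranscriptClassification, absPath: string): boolean {
  // 1. Live transcript: the sessionFile path recorded in sessions.json.
  if (c.cronRunTranscriptPaths.has(absPath)) return true;
  // 2. Rotated or trajectory artifact: fall back to the session id in the file name.
  const id = sessionIdFromFileName(path.basename(absPath));
  return id !== null && c.cronRunSessionIds.has(id);
}

function lookupSessionKeyForTranscriptPath(
  c: SessionTranscriptClassification,
  absPath: string,
): string | null {
  const direct = c.transcriptPathToSessionKey.get(absPath);
  if (direct !== undefined) return direct;
  const id = sessionIdFromFileName(path.basename(absPath));
  return id === null ? null : (c.sessionIdToSessionKey.get(id) ?? null);
}

// Usage with a made-up cron entry whose sessionId is "abc".
const cronKey = "agent:main:cron:job-1:run:r1";
const classification: SessionTranscriptClassification = {
  cronRunTranscriptPaths: new Set(["/state/main/sessions/abc.jsonl"]),
  cronRunSessionIds: new Set(["abc"]),
  transcriptPathToSessionKey: new Map([["/state/main/sessions/abc.jsonl", cronKey]]),
  sessionIdToSessionKey: new Map([["abc", cronKey]]),
};
const rotated = "/state/main/sessions/abc.jsonl.deleted.2026-04-25T06-33-10.801Z";
console.log(isCronRunTranscriptPath(classification, rotated)); // true
console.log(lookupSessionKeyForTranscriptPath(classification, rotated)); // agent:main:cron:job-1:run:r1
```

A plain Set.has(normalizedPath) would return false for the rotated path, which is exactly the leak described in the Problem section.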

2. Operator-facing dreaming session filters

New dreaming.sessionFilter config block in extensions/memory-core/openclaw.plugin.json:

{
  "plugins": {
    "entries": {
      "memory-core": {
        "config": {
          "dreaming": {
            "sessionFilter": {
              "excludeCronJobIds":          ["job-1", "job-2"],
              "excludeSessionKeyPrefixes":  ["agent:main:cron:", "agent:ops:"],
              "excludeAgentIds":            ["batch-runner"],
              "excludeSourcePathRegex":     ["^ops/sessions/.*\\.jsonl$"]
            }
          }
        }
      }
    }
  }
}

resolveSessionIngestionExcludePredicate(cfg, logger) builds a single predicate per ingestion sweep:

  • Returns a no-op () => false when no filter is configured (zero overhead in the common case).
  • Compiles excludeSourcePathRegex once; invalid patterns are logged as warnings and skipped, never throw.
  • Operator-driven; the built-in classifier still runs, so this is defense-in-depth for shapes the classifier misses (notably cron:<id> without :run:).
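A sketch of how such a predicate could be assembled, assuming the filter block has already been read out of the plugin config and that each candidate carries its owning session key (for example from lookupSessionKeyForTranscriptPath) plus its normalized source path. IngestionCandidate, Logger, and AGENT_ID_RE are hypothetical; CRON_JOB_RE is the pattern quoted in the Greptile review:

```typescript
// Field names follow the dreaming.sessionFilter block; everything else is assumed.
type SessionFilterConfig = {
  excludeCronJobIds?: string[];
  excludeSessionKeyPrefixes?: string[];
  excludeAgentIds?: string[];
  excludeSourcePathRegex?: string[];
};
type IngestionCandidate = { sessionKey: string | null; sourcePath: string };
type Logger = { warn: (message: string) => void };

const CRON_JOB_RE = /(?:^|:)cron:([^:]+)/; // pattern quoted in the Greptile review
const AGENT_ID_RE = /^agent:([^:]+):/; // hypothetical: assumes "agent:<id>:..." keys

function resolveSessionIngestionExcludePredicate(
  filter: SessionFilterConfig | undefined,
  logger: Logger,
): (candidate: IngestionCandidate) => boolean {
  const cronIds = new Set(filter?.excludeCronJobIds ?? []);
  const prefixes = filter?.excludeSessionKeyPrefixes ?? [];
  const agentIds = new Set(filter?.excludeAgentIds ?? []);
  // Compile each regex once per sweep; invalid patterns are logged, never thrown.
  const regexes: RegExp[] = [];
  for (const pattern of filter?.excludeSourcePathRegex ?? []) {
    try {
      regexes.push(new RegExp(pattern));
    } catch {
      logger.warn(`dreaming.sessionFilter: ignoring invalid regex ${JSON.stringify(pattern)}`);
    }
  }
  if (cronIds.size + prefixes.length + agentIds.size + regexes.length === 0) {
    return () => false; // nothing configured: no per-transcript work
  }
  return ({ sessionKey, sourcePath }) => {
    if (regexes.some((re) => re.test(sourcePath))) return true;
    if (sessionKey === null) return false; // key-based knobs need an owning session
    const key: string = sessionKey;
    if (prefixes.some((prefix) => key.startsWith(prefix))) return true;
    const cronId = CRON_JOB_RE.exec(key)?.[1];
    if (cronId !== undefined && cronIds.has(cronId)) return true;
    const agentId = AGENT_ID_RE.exec(key)?.[1];
    return agentId !== undefined && agentIds.has(agentId);
  };
}

const exclude = resolveSessionIngestionExcludePredicate(
  { excludeSessionKeyPrefixes: ["agent:main:cron:"], excludeSourcePathRegex: ["(unclosed"] },
  { warn: (message) => console.warn(message) },
);
console.log(exclude({ sessionKey: "agent:main:cron:job-1:run:r1", sourcePath: "sessions/a.jsonl" })); // true
console.log(exclude({ sessionKey: "agent:main:cron:job-1", sourcePath: "sessions/b.jsonl" })); // true
console.log(exclude({ sessionKey: "agent:main:main", sourcePath: "sessions/c.jsonl" })); // false
```

With excludeSessionKeyPrefixes: ["agent:main:cron:"], both the cron:<id>:run:<runId> and bare cron:<id> key shapes are dropped by one prefix, while the invalid "(unclosed" pattern only produces a warning.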

excludeSessionKeyPrefixes is the most flexible knob since it covers both cron:<id>:run:<runId> and cron:<id> shapes via a single agent:<id>:cron: prefix.

Tests

src/memory-host-sdk/host/session-files.test.ts (+10 tests, 34 total):

  • extractSessionIdFromTranscriptFileName — primary jsonl, .jsonl.deleted.<ts>, .jsonl.reset.<ts>, .trajectory.jsonl, .trajectory.jsonl.deleted.<ts>, non-transcript / null shapes.
  • isCronRunTranscriptPath / isDreamingNarrativeTranscriptPath — live path classification, rotated .deleted.<ts> recovery, rotated .trajectory.jsonl.deleted.<ts> recovery, live .trajectory.jsonl recovery, unrelated transcripts not matched.
  • lookupSessionKeyForTranscriptPath — live path, rotated path, unknown path.

extensions/memory-core/src/dreaming-phases.test.ts (+2 tests, 32 total):

Out of scope

  • Does not change isCronRunSessionKey regex — kept strict for back-compat. Operators who want to drop cron:<id> (non-run) shapes use the new excludeSessionKeyPrefixes knob.
  • Does not change disk repair (session-file-repair.ts) — only the in-memory ingestion path.
  • Cron transcripts that already leaked into the corpus stay there; this fix only stops new ingestion. A separate cleanup task (e.g. an openclaw doctor --fix extension) could prune historical leaks.

Verified

  • pnpm tsgo:core
  • pnpm tsgo:extensions
  • pnpm test src/memory-host-sdk/host/session-files.test.ts → 34/34 pass
  • pnpm test extensions/memory-core/src/dreaming-phases.test.ts → 32/32 pass
  • pnpm check:changed → all gates pass (lint / typecheck / 0 import cycles / policy guards)

@greptile-apps
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR fixes a regression (#72611) where isolated cron transcripts leaked into the dreaming session corpus after rotation by augmenting SessionTranscriptClassification with session-id sets and reverse-lookup maps, then using session-id-aware helpers (isCronRunTranscriptPath, isDreamingNarrativeTranscriptPath, lookupSessionKeyForTranscriptPath) in place of the raw Set.has(normalizedPath) check. It also adds an operator-facing dreaming.sessionFilter config block with four exclusion knobs. The core fix is well-tested with dedicated regression tests; the only material issue is a wrong path-format example in the excludeSourcePathRegex schema description that would cause silently-failing operator regexes.

Confidence Score: 4/5

Safe to merge after correcting the excludeSourcePathRegex schema description; the core session-id classification fix is correct and well-tested.

One P1 finding: the excludeSourcePathRegex schema description documents main/sessions/<uuid>.jsonl but the runtime value produced by sessionPathForFile has no agent prefix (sessions/<uuid>.jsonl), so operator-written patterns following the example will silently never match. The underlying classification logic and the other three filter knobs are sound.

extensions/memory-core/openclaw.plugin.json — incorrect path format in the excludeSourcePathRegex description.

Comments Outside Diff (1)

  1. extensions/memory-core/src/dreaming-phases.ts, line 292

    P2 excludeCronJobIds silently filters across all agents

    SESSION_KEY_CRON_JOB_RE (/(?:^|:)cron:([^:]+)/) extracts the cron job ID regardless of which agent owns the session. If two agents both have a cron job named "sync", adding "sync" to excludeCronJobIds will silently drop transcripts from both agents. The schema description says "agent:<id>:cron:<cronJobId>" in its example, implying per-agent scoping, but the implementation is global. Operators who want per-agent exclusion should use excludeSessionKeyPrefixes with a fully-qualified prefix like "agent:main:cron:sync:" — worth a callout in the field description.
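To make the cross-agent behaviour concrete, here is a small sketch that runs the regex quoted above against two made-up session keys:

```typescript
const SESSION_KEY_CRON_JOB_RE = /(?:^|:)cron:([^:]+)/; // as quoted in the review

function cronJobIdOf(sessionKey: string): string | null {
  return SESSION_KEY_CRON_JOB_RE.exec(sessionKey)?.[1] ?? null;
}

// Made-up keys: two agents that each own a cron job called "sync".
const keys = ["agent:main:cron:sync:run:r1", "agent:ops:cron:sync:run:r2"];

// excludeCronJobIds: ["sync"] matches on the job id alone, so both agents lose the run.
const droppedByJobId = keys.filter((key) => cronJobIdOf(key) === "sync");

// A fully-qualified prefix scopes the exclusion to the main agent only.
const droppedByPrefix = keys.filter((key) => key.startsWith("agent:main:cron:sync:"));

console.log(droppedByJobId.length, droppedByPrefix.length); // 2 1
```

The trailing colon in the prefix keeps a job named sync-nightly out of scope, but it also skips a bare agent:main:cron:sync key, so an operator may need both forms.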

Reviews (1): Last reviewed commit: "fix(memory-core): keep rotated cron tran..."

},
"excludeSourcePathRegex": {
"type": "array",
"description": "Skip transcripts whose normalized source path (e.g. `main/sessions/<uuid>.jsonl`) matches any of these regular expressions. Invalid patterns are ignored at runtime with a warning. Use for ad-hoc one-off exclusions where neither cron-id nor agent-id captures the target.",

P1 Wrong path format in excludeSourcePathRegex schema example

The description shows main/sessions/<uuid>.jsonl but sessionPathForFile(absPath) (called in collectSessionIngestionBatches) actually returns "sessions/" + path.basename(absPath) — no agent prefix, just "sessions/<uuid>.jsonl". Any operator who copies the documented example and writes "^main/sessions/.*" will get a silently-failing filter.

Additionally, for rotated artifacts the basename retains the rotation suffix (e.g. "sessions/<uuid>.jsonl.deleted.2026-04-25T06-33-10.801Z"), so a strict \.jsonl$-anchored pattern would also miss them. Both facts belong in the description.
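A quick sketch of why the old example never matches, assuming sessionPathForFile behaves as described here ("sessions/" + path.basename(absPath)); the absolute paths are made up:

```typescript
import * as path from "node:path";

// Assumed to match the behaviour described in the review; not copied from dreaming-phases.ts.
function sessionPathForFile(absPath: string): string {
  return "sessions/" + path.basename(absPath);
}

const live = sessionPathForFile("/state/main/sessions/abc.jsonl");
const rotated = sessionPathForFile(
  "/state/main/sessions/abc.jsonl.deleted.2026-04-25T06-33-10.801Z",
);

const followsOldExample = /^main\/sessions\/.*/; // never matches: there is no agent prefix
const anchoredOnJsonl = /^sessions\/.*\.jsonl$/; // misses rotated artifacts
const liveAndRotated = /^sessions\/[^/]+\.jsonl(?:\.(?:deleted|reset)\.[^/]+)?$/;

console.log(live); // sessions/abc.jsonl
console.log(followsOldExample.test(live)); // false
console.log(anchoredOnJsonl.test(live), anchoredOnJsonl.test(rotated)); // true false
console.log(liveAndRotated.test(live), liveAndRotated.test(rotated)); // true true
```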

Suggested change
"description": "Skip transcripts whose normalized source path (e.g. `main/sessions/<uuid>.jsonl`) matches any of these regular expressions. Invalid patterns are ignored at runtime with a warning. Use for ad-hoc one-off exclusions where neither cron-id nor agent-id captures the target.",
"description": "Skip transcripts whose normalized source path (e.g. `sessions/<uuid>.jsonl` for a live transcript or `sessions/<uuid>.jsonl.deleted.<ts>` for a rotated artifact) matches any of these regular expressions. Invalid patterns are ignored at runtime with a warning. Use for ad-hoc one-off exclusions where neither cron-id nor agent-id captures the target.",

zqchris force-pushed the fix/dream-cron-session-exclusions branch from 35176b2 to e821864 April 27, 2026 16:37
zqchris force-pushed the fix/dream-cron-session-exclusions branch 2 times, most recently from 35d6cd3 to 676df8f April 28, 2026 03:14
@clawsweeper
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: needs changes before merge.

Summary
The PR extends memory-host transcript classification with session-id/run-id lookup, wires memory-core Dreaming session filters, marks heartbeat/cron/exec prompts as internal-system provenance, and updates docs, changelog, and focused tests.

Reproducibility: yes. A sessions store with an explicit non-cron entry using sessionId shared-id and a cron entry whose :run: id is also shared-id makes lookupSessionKeyForTranscriptPath resolve the explicit session while isCronRunTranscriptPath still returns true from cronRunSessionIds.

Next step before merge
A repair worker can address the narrow helper/test blocker on the PR branch, with rebase/conflict handling if needed before normal merge gates.

Security
Cleared: The diff does not add dependencies, workflows, package-resolution changes, broader permissions, secret handling, or unguarded network execution; the new regex surface uses existing safe-regex helpers with bounded input.

Review findings

  • [P2] Avoid treating runId collisions as cron transcripts — packages/memory-host-sdk/src/host/session-files.ts:223-225
Review details

Best possible solution:

Keep the PR open, make cron transcript classification honor explicit session ownership when runIds collide, then rerun the focused memory-host/memory-core tests and changed gate before merge.

Do we have a high-confidence way to reproduce the issue?

Yes. A sessions store with an explicit non-cron entry using sessionId shared-id and a cron entry whose :run: id is also shared-id makes lookupSessionKeyForTranscriptPath resolve the explicit session while isCronRunTranscriptPath still returns true from cronRunSessionIds.

Is this the best way to solve the issue?

No, not as currently implemented. The session-id-aware helper direction is maintainable, but cron classification should follow the resolved owning session key or avoid adding colliding runIds to the cron set.

Full review comments:

  • [P2] Avoid treating runId collisions as cron transcripts — packages/memory-host-sdk/src/host/session-files.ts:223-225
    The runId fallback adds runIdFromKey to cronRunSessionIds even when that id is already registered as another session's sessionId. In that collision, lookupSessionKeyForTranscriptPath resolves the explicit non-cron session, but isCronRunTranscriptPath still returns true from the id set, so Dreaming will skip a real user transcript. Please make cron classification follow the owning session key or avoid adding colliding runIds to the cron set.
    Confidence: 0.93
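One way to implement the second option suggested here (keep colliding runIds out of the cron set) is a two-pass build in which explicit sessionIds claim ownership first. This is a sketch with a simplified, hypothetical entry shape and run-key regex, not the code in session-files.ts:

```typescript
// Simplified, hypothetical sessions.json entry and run-key grammar.
type SessionEntry = { sessionKey: string; sessionId: string };
const CRON_RUN_KEY_RE = /(?:^|:)cron:[^:]+:run:([^:]+)$/;

function buildIdClassification(entries: readonly SessionEntry[]) {
  const sessionIdToSessionKey = new Map<string, string>();
  const cronRunSessionIds = new Set<string>();
  // Pass 1: every explicit sessionId claims ownership first.
  for (const entry of entries) {
    sessionIdToSessionKey.set(entry.sessionId, entry.sessionKey);
    if (CRON_RUN_KEY_RE.test(entry.sessionKey)) cronRunSessionIds.add(entry.sessionId);
  }
  // Pass 2: register a cron runId only when no entry already owns that id,
  // so a collision cannot flip a real user transcript to "cron-owned".
  for (const entry of entries) {
    const runId = CRON_RUN_KEY_RE.exec(entry.sessionKey)?.[1];
    if (runId === undefined || sessionIdToSessionKey.has(runId)) continue;
    sessionIdToSessionKey.set(runId, entry.sessionKey);
    cronRunSessionIds.add(runId);
  }
  return { sessionIdToSessionKey, cronRunSessionIds };
}

// The reported collision: a non-cron session and a cron runId share "shared-id".
const collided = buildIdClassification([
  { sessionKey: "agent:main:main", sessionId: "shared-id" },
  { sessionKey: "agent:main:cron:job-1:run:shared-id", sessionId: "cron-session-1" },
]);
console.log(collided.sessionIdToSessionKey.get("shared-id")); // agent:main:main
console.log(collided.cronRunSessionIds.has("shared-id")); // false
```

Because ownership and cron membership are both decided in the same place, lookupSessionKeyForTranscriptPath and isCronRunTranscriptPath cannot disagree about a colliding id in this sketch.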

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

  • pnpm test packages/memory-host-sdk/src/host/session-files.test.ts extensions/memory-core/src/dreaming-phases.test.ts
  • pnpm tsgo:core
  • pnpm tsgo:extensions
  • pnpm check:changed

What I checked:

Likely related people:

  • jalehman: Merged #67315 ("fix(memory-core): skip dreaming transcript ingestion via session store"), which taught dreaming ingestion to classify dreaming transcripts from sessions.json before reading transcript content, the same classification boundary this PR changes. (role: introduced related session-store classification behavior; confidence: high; commits: 87c09b2a7544; files: packages/memory-host-sdk/src/host/session-files.ts, extensions/memory-core/src/dreaming-phases.ts)
  • Patrick-Erichsen: Merged #70737 ("memory/dreaming: decouple managed cron from heartbeat"), which moved managed dreaming cron onto isolated agent turns and included cron/wrapper transcript-noise filtering in the same memory-core Dreaming area. (role: recent maintainer of managed dreaming cron behavior; confidence: medium; commits: bed48c1a09e9; files: extensions/memory-core/src/dreaming-phases.ts, src/infra/heartbeat-runner.ts)
  • vignesh07: Merged #62227 ("feat(memory-core): ingest session transcripts into dreaming corpus"), which added dreaming session-transcript ingestion, per-day session-corpus files, and checkpointing, the feature surface being protected here. (role: introduced session-corpus ingestion feature; confidence: medium; commits: 8697d1a8c05f; files: extensions/memory-core/src/dreaming-phases.ts, extensions/memory-core/src/short-term-promotion.ts)
  • zqchris: In addition to this PR, their previously merged #66548 ("memory: strip inbound metadata envelopes from user messages in session corpus") changed session-corpus text extraction in the same memory-host/session-files path. (role: recent adjacent contributor; confidence: medium; commits: 98562b2a8445; files: packages/memory-host-sdk/src/host/session-files.ts, extensions/memory-core/src/dreaming-phases.ts)

Remaining risk / open question:

  • The current PR head can falsely skip a legitimate non-cron transcript when its sessionId collides with a cron runId.
  • GitHub currently reports the branch as dirty against the base, so a repair may need rebase or conflict handling before merge gates are meaningful.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 95cee64ca6e8.

zqchris force-pushed the fix/dream-cron-session-exclusions branch from 5e61679 to 90e6c11 May 1, 2026 06:37
openclaw-barnacle Bot added the docs, size: L, channel: voice-call, cli, scripts, extensions: kilocode, and app: macos labels and removed the size: XL label May 1, 2026
zqchris force-pushed the fix/dream-cron-session-exclusions branch from 67a7aa9 to 177756f May 1, 2026 07:33
zqchris force-pushed the fix/dream-cron-session-exclusions branch from d9307e5 to b885a67 May 1, 2026 07:43
openclaw-barnacle Bot removed the channel: voice-call and extensions: kilocode labels May 1, 2026
…g corpus

cron runs sometimes leave a second transcript whose basename equals the
runId embedded in the sessionKey, distinct from `entry.sessionId`. With
classification only indexing `entry.sessionId`, that mirror file looks
like an unowned orphan: `lookupSessionKeyForTranscriptPath` returns null,
`isCronRunTranscriptPath` returns false, and the file's content slips
into the dreaming session corpus despite the operator's
`agent:<id>:cron:` prefix exclusion.

Add `extractCronRunIdFromSessionKey` in `src/sessions/session-key-utils.ts`
and let `loadSessionTranscriptClassificationForSessionsDir` register the
extracted runId in both `cronRunSessionIds` and (when not already taken)
`sessionIdToSessionKey`. This makes the existing sessionId-fallback in
`extractSessionIdFromTranscriptFileName` recognize mirror files —
including their `.deleted.<ts>` / `.reset.<ts>` rotated variants — as
cron-owned, so dreaming auto-skips them via `generatedByCronRun` even
without operator-supplied `excludeSessionKeyPrefixes`.
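A sketch of the extractor's accept/reject boundary, assuming a run key ends in cron:<jobId>:run:<runId>; the regex and this local isCronRunSessionKey are hypothetical stand-ins. Deriving both helpers from one pattern is one way to get the agreement with isCronRunSessionKey that the new test asserts:

```typescript
// Hypothetical grammar: a run key ends in "cron:<jobId>:run:<runId>".
const CRON_RUN_KEY_RE = /(?:^|:)cron:([^:]+):run:([^:]+)$/;

function isCronRunSessionKey(sessionKey: string): boolean {
  return CRON_RUN_KEY_RE.test(sessionKey);
}

function extractCronRunIdFromSessionKey(sessionKey: string): string | null {
  return CRON_RUN_KEY_RE.exec(sessionKey)?.[2] ?? null;
}

// Accept/reject boundary on made-up keys.
for (const key of [
  "agent:main:cron:job-1:run:r1", // run key: runId "r1"
  "agent:main:cron:job-1", // bare cron:<id> shape: not a run
  "agent:main:cron:job-1:run:", // empty runId: rejected
]) {
  console.log(key, extractCronRunIdFromSessionKey(key), isCronRunSessionKey(key));
}
```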

Tests: `session-key-utils.test.ts` covers the extractor's accept/reject
boundary and asserts agreement with `isCronRunSessionKey`.
`session-files.test.ts` adds two regression cases:
- a mirror transcript named after the runId (live + rotated) resolves
  back to the cron sessionKey and is reported as cron-owned;
- a registered explicit-session entry whose sessionId collides with some
  cron's runId is not overwritten by the runId fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zqchris
Contributor Author

zqchris commented May 3, 2026

Closing to stay under the active-PR quota — will resubmit if there's reviewer interest. Patch lives on zqchris/openclaw#patch/chris (ac32b2305a) and continues to ship to my local stack.

zqchris closed this May 3, 2026
zqchris pushed a commit to zqchris/openclaw that referenced this pull request May 4, 2026
…penclaw#73241, openclaw#73400

Second hardening pass after the first round triggered new findings on
Aisle's re-scan. PR openclaw#73406 (auto-reply voice silent) was meanwhile
landed by the maintainer in upstream commit 28bf71d — that one will
drop out at the next rebase via patch-id match, no source change here.

PR openclaw#73241 (BB http hardening) — 1 HIGH + 2 MEDIUM:

- monitor-processing.sanitizeForLog: redact common secret-bearing
  patterns (`?password=`, `?token=`, `?api_key=`, `?secret=`,
  `Authorization: Bearer / Basic …`) before the value reaches the log
  sink. BlueBubbles uses query-string auth by default, so attachment
  download failures and similar errors can carry the API password in
  the captured request URL (CWE-532).
- history.fetchBlueBubblesMessageByGuid: validate the derived bareGuid
  (length ≤128, charset `[A-Za-z0-9._:-]+`, non-empty) before issuing
  the request. A trailing `/` would otherwise produce an empty bareGuid
  and turn the call into an unintended `/api/v1/message/` collection
  query (CWE-20).
- monitor-processing reply-context verbose logs: log only `bodyLen=`
  metadata, not the quoted message text itself. Verbose logs are
  retained / shipped to aggregators and would otherwise leak private
  chat content (CWE-532).
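The redaction and GUID shape check above could be sketched as follows; the patterns are modelled on the list in this commit message and are assumptions, not the BlueBubbles extension's actual code:

```typescript
// Hypothetical redaction patterns modelled on the commit message.
const SECRET_QUERY_RE = /([?&](?:password|token|api_key|secret)=)[^&\s#]*/gi;
const AUTH_HEADER_RE = /(Authorization:\s*(?:Bearer|Basic)\s+)\S+/gi;

function redactSecrets(value: string): string {
  // Keep the parameter or header name, drop only the credential.
  return value
    .replace(SECRET_QUERY_RE, "$1<redacted>")
    .replace(AUTH_HEADER_RE, "$1<redacted>");
}

// bareGuid shape check as described: non-empty, at most 128 chars, [A-Za-z0-9._:-].
const GUID_SHAPE_RE = /^[A-Za-z0-9._:-]{1,128}$/;

function isSafeGuid(guid: string): boolean {
  return GUID_SHAPE_RE.test(guid);
}

console.log(redactSecrets("GET /api/v1/attachment/x/download?password=hunter2&width=0"));
// GET /api/v1/attachment/x/download?password=<redacted>&width=0
console.log(isSafeGuid(""), isSafeGuid("A1B2-C3D4")); // false true
```

An empty bareGuid (from a trailing "/") fails the {1,128} bound, so the request is never issued against the /api/v1/message/ collection.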

PR openclaw#73235 (BB routing guards) — 1 MEDIUM + 2 LOW:

- monitor-reply-cache.resolveBlueBubblesMessageId: when
  `requireKnownShortId=true` and `chatContext` lacks any identifier,
  throw "requires a chat scope" instead of resolving the short id.
  Short ids are allocated from a single global counter across every
  account and chat, so an action call without chat scope could
  silently apply to the wrong conversation (CWE-285). Test updated to
  expect fail-closed (was previously fail-open with a comment that
  acknowledged the risk).
- monitor-reply-cache.buildCrossChatError: replace raw inputId in the
  thrown error message with `<short:N-digit>` or `<uuid:prefix…>`
  shape descriptors. Combined with the earlier chatGuid redaction,
  cross-chat errors now leak no concrete identifier (CWE-117 /
  CWE-200). The Low PII finding on monitor-processing verbose logs is
  resolved automatically by the sanitizeForLog redaction extension
  added for PR openclaw#73241 (first item above).
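A hedged sketch of the two routing guards; ChatContext, assertChatScope, describeIdShape, and the four-character prefix are illustrative choices rather than the real monitor-reply-cache API:

```typescript
// Illustrative shape; field names echo chatGuid / chatIdentifier / chatId.
type ChatContext = { chatGuid?: string; chatIdentifier?: string; chatId?: string | number };

// Fail closed: a short id is only resolvable inside an explicit chat scope.
function assertChatScope(requireKnownShortId: boolean, ctx: ChatContext | undefined): void {
  const hasScope =
    Boolean(ctx?.chatGuid?.trim()) ||
    Boolean(ctx?.chatIdentifier?.trim()) ||
    ctx?.chatId !== undefined;
  if (requireKnownShortId && !hasScope) {
    throw new Error("resolving a short message id requires a chat scope");
  }
}

// Describe an id's shape without echoing it; the prefix length is an assumption.
function describeIdShape(inputId: string): string {
  const id = inputId.trim();
  return /^\d+$/.test(id) ? `<short:${id.length}-digit>` : `<uuid:${id.slice(0, 4)}…>`;
}

console.log(describeIdShape("42")); // <short:2-digit>
console.log(describeIdShape("1A2B3C4D-5E6F-7890")); // <uuid:1A2B…>
```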

PR openclaw#73400 (silent-reply 🧵) — 1 MEDIUM:

- silent-reply-policy.classifySilentReplyConversationType: classify
  a session key as `internal` only when `parseThreadSessionSuffix`
  returns a real `🧵<id>` suffix, not on a free-form `🧵`
  substring match. Caller-supplied session keys can no longer embed
  the marker mid-string to force the conversation type to `internal`
  and bypass silent-reply rewrites (CWE-840).
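The difference between the old substring check and a suffix parse, sketched with a hypothetical parseThreadSuffix (the real parseThreadSessionSuffix grammar, including the allowed id characters, may differ) and made-up keys:

```typescript
// Old check, per the commit message: any "🧵" anywhere in the key counted.
function isInternalBySubstring(sessionKey: string): boolean {
  return sessionKey.includes("🧵");
}

// Hypothetical suffix parser: only a trailing "🧵<id>" counts; the id charset is assumed.
function parseThreadSuffix(sessionKey: string): string | null {
  return /🧵([A-Za-z0-9._-]+)$/u.exec(sessionKey)?.[1] ?? null;
}

function isInternalBySuffix(sessionKey: string): boolean {
  return parseThreadSuffix(sessionKey) !== null;
}

// Made-up keys: a caller embeds the marker mid-string vs. a real thread suffix.
const spoofed = "agent:main:🧵123:telegram:dm:999";
const threaded = "agent:main:telegram:group:42🧵9001";
console.log(isInternalBySubstring(spoofed), isInternalBySuffix(spoofed)); // true false
console.log(isInternalBySuffix(threaded), parseThreadSuffix(threaded)); // true 9001
```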
zqchris pushed a commit to zqchris/openclaw that referenced this pull request May 6, 2026
, openclaw#73241

Bundled hardening for the three open PRs:

- session-files: use path.posix.join + strip CR/LF/TAB instead of
  rewriting backslashes to forward slashes. POSIX filenames may legally
  contain `\`, and the prior translation would synthesize fake path
  segments that bypass `excludeSourcePathRegex` (CWE-20). NUL is
  already rejected by Node's fs path layer.

- monitor-reply-cache: redact chat identifiers in cross-chat error
  messages (CWE-200). Phone numbers / email addresses / chat GUIDs
  must not leak into agent transcripts, tool results, or remote
  channel deliveries.

- monitor-reply-cache: rewrite isCrossChatMismatch so chatIdentifier
  and chatId comparisons run independently. Earlier version gated
  fallback comparisons on `!ctxChatGuid`, which let any non-empty
  ctx.chatGuid suppress the fallback checks when cached entry lacked
  chatGuid — letting a short id from chat A be reused while acting in
  chat B (CWE-697).

- monitor-processing: drop group reactions where chatGuid /
  chatIdentifier is whitespace-only, not just empty. A webhook sender
  supplying " " or "\t" must not satisfy the guard and degrade peerId
  to the literal "group".

- monitor-processing: sanitizeForLog around webhook senderId /
  messageId / action / attachment guid / err in verbose log lines and
  attachment download error logs (CWE-117).

- monitor-processing: validate replyToId shape (length ≤128, charset
  alnum + `._:-`) before issuing the API fallback request, so a
  webhook with a pathological replyToId cannot drive arbitrary
  outbound load (cheap CWE-400 mitigation).
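A sketch of the session-files change in the first bullet, contrasting a naive backslash rewrite with a path.posix.join that leaves backslashes alone; both functions are illustrative, not the shipped code:

```typescript
import * as path from "node:path";

// Naive variant (illustrative): backslashes become separators, so ".." segments appear.
function naiveSourcePath(fileName: string): string {
  return path.posix.join("sessions", fileName.replace(/\\/g, "/"));
}

// Described fix: drop CR/LF/TAB, keep "\" as an ordinary POSIX filename character.
function normalizeSourcePath(fileName: string): string {
  return path.posix.join("sessions", fileName.replace(/[\r\n\t]/g, ""));
}

const hostile = "x\\..\\..\\evil.jsonl"; // a legal POSIX file name
console.log(naiveSourcePath(hostile)); // evil.jsonl (escaped the sessions/ prefix)
console.log(normalizeSourcePath(hostile)); // sessions/x\..\..\evil.jsonl
```

In the naive variant, path.posix.join normalizes the synthesized ".." segments away, so an excludeSourcePathRegex anchored on ^sessions/ no longer sees the file.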

Tests updated: monitor-reply-cache.test.ts cross-chat error
assertions now expect `chatGuid=<redacted>` instead of raw values.

Also fix unrelated typecheck regression: delivery-dispatch.mirror-
bluebubbles.test.ts makeBaseParams missing the `runSessionKey` field
introduced in DispatchCronDeliveryParams as part of v2026.4.26.
zqchris pushed a commit to zqchris/openclaw that referenced this pull request May 7, 2026
zqchris pushed a commit to zqchris/openclaw that referenced this pull request May 7, 2026
…penclaw#73241, openclaw#73400

Second hardening pass after the first round triggered new findings on
Aisle's re-scan. PR openclaw#73406 (auto-reply voice silent) was meanwhile
landed by the maintainer in upstream commit 28bf71d — that one will
drop out at the next rebase via patch-id match, no source change here.

PR openclaw#73241 (BB http hardening) — 1 HIGH + 2 MEDIUM:

- monitor-processing.sanitizeForLog: redact common secret-bearing
  patterns (`?password=`, `?token=`, `?api_key=`, `?secret=`,
  `Authorization: Bearer / Basic …`) before the value reaches the log
  sink. BlueBubbles uses query-string auth by default, so attachment
  download failures and similar errors can carry the API password in
  the captured request URL (CWE-532).
- history.fetchBlueBubblesMessageByGuid: validate the derived bareGuid
  (length ≤128, charset `[A-Za-z0-9._:-]+`, non-empty) before issuing
  the request. A trailing `/` would otherwise produce an empty bareGuid
  and turn the call into an unintended `/api/v1/message/` collection
  query (CWE-20).
- monitor-processing reply-context verbose logs: log only `bodyLen=`
  metadata, not the quoted message text itself. Verbose logs are
  retained / shipped to aggregators and would otherwise leak private
  chat content (CWE-532).

PR openclaw#73235 (BB routing guards) — 1 MEDIUM + 2 LOW:

- monitor-reply-cache.resolveBlueBubblesMessageId: when
  `requireKnownShortId=true` and `chatContext` lacks any identifier,
  throw "requires a chat scope" instead of resolving the short id.
  Short ids are allocated from a single global counter across every
  account and chat, so an action call without chat scope could
  silently apply to the wrong conversation (CWE-285). Test updated to
  expect fail-closed (was previously fail-open with a comment that
  acknowledged the risk).
- monitor-reply-cache.buildCrossChatError: replace raw inputId in the
  thrown error message with `<short:N-digit>` or `<uuid:prefix…>`
  shape descriptors. Combined with the earlier chatGuid redaction,
  cross-chat errors now leak no concrete identifier (CWE-117 /
  CWE-200). The Low PII finding on monitor-processing verbose logs is
  resolved automatically by the sanitizeForLog redaction extension
  added for PR openclaw#73241 openclaw#1.

PR openclaw#73400 (silent-reply 🧵) — 1 MEDIUM:

- silent-reply-policy.classifySilentReplyConversationType: classify
  a session key as `internal` only when `parseThreadSessionSuffix`
  returns a real `🧵<id>` suffix, not on a free-form `🧵`
  substring match. Caller-supplied session keys can no longer embed
  the marker mid-string to force the conversation type to `internal`
  and bypass silent-reply rewrites (CWE-840).
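The difference between a substring match and a real suffix parse can be sketched as follows. Only the `🧵` marker and the idea of a parsed suffix come from the commit. The suffix grammar (`🧵` followed by a non-empty run of `[A-Za-z0-9._-]` at the very end of the key) and the two-value `ConversationType` are assumptions made for illustration. They are not the actual parseThreadSessionSuffix contract.

```typescript
// Two-value type for illustration only; the real classifier has its own types.
type ConversationType = "internal" | "external";

// Assumption: a thread session key ends in "🧵<id>", where <id> is a
// non-empty run of [A-Za-z0-9._-]. The real parser may use another grammar.
const THREAD_SUFFIX_RE = /🧵([A-Za-z0-9._-]+)$/u;

function parseThreadSuffix(sessionKey: string): string | undefined {
  const match = THREAD_SUFFIX_RE.exec(sessionKey);
  return match ? match[1] : undefined;
}

function classifyConversationType(sessionKey: string): ConversationType {
  // Unsafe predecessor: sessionKey.includes("🧵") ? "internal" : "external".
  // A marker embedded mid-string no longer counts; only a parsed suffix does.
  return parseThreadSuffix(sessionKey) !== undefined ? "internal" : "external";
}
```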
zqchris pushed a commit to zqchris/openclaw that referenced this pull request May 8, 2026
, openclaw#73241

Bundled hardening for the three open PRs:

- session-files: use path.posix.join + strip CR/LF/TAB instead of
  rewriting backslashes to forward slashes. POSIX filenames may legally
  contain `\`, and the prior translation would synthesize fake path
  segments that bypass `excludeSourcePathRegex` (CWE-20). NUL is
  already rejected by Node's fs path layer.

- monitor-reply-cache: redact chat identifiers in cross-chat error
  messages (CWE-200). Phone numbers / email addresses / chat GUIDs
  must not leak into agent transcripts, tool results, or remote
  channel deliveries.

- monitor-reply-cache: rewrite isCrossChatMismatch so the chatIdentifier
  and chatId comparisons run independently. The earlier version ran the
  fallback comparisons only when `!ctxChatGuid`, so any non-empty
  ctx.chatGuid suppressed them. If the cached entry also lacked
  chatGuid, no comparison ran at all, letting a short id from chat A be
  reused while acting in chat B (CWE-697).

- monitor-processing: drop group reactions where chatGuid /
  chatIdentifier is whitespace-only, not just empty. A webhook sender
  supplying " " or "\t" must not satisfy the guard and degrade peerId
  to the literal "group".

- monitor-processing: wrap webhook senderId / messageId / action /
  attachment guid / err in sanitizeForLog in verbose log lines and
  attachment download error logs (CWE-117).

- monitor-processing: validate replyToId shape (length ≤128, charset
  alnum + `._:-`) before issuing the API fallback request, so a
  webhook with a pathological replyToId cannot drive arbitrary
  outbound load (cheap CWE-400 mitigation).
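A minimal sketch of the isCrossChatMismatch rewrite, where each identifier is compared on its own. The `ChatScope` shape, including typing chatId as a string, is hypothetical. The sketch also assumes that a field only counts as a mismatch when both sides carry it.

```typescript
// Hypothetical shape of the chat scope carried by cache entries and callers.
interface ChatScope {
  chatGuid?: string;
  chatIdentifier?: string;
  chatId?: string;
}

// A field only counts as a mismatch when both sides carry it and they differ.
function differs(a: string | undefined, b: string | undefined): boolean {
  return a !== undefined && b !== undefined && a !== b;
}

function isCrossChatMismatch(cached: ChatScope, ctx: ChatScope): boolean {
  // Every identifier is compared independently. Nothing is skipped just
  // because ctx.chatGuid is set, so a cached entry without chatGuid is
  // still checked via chatIdentifier and chatId.
  return (
    differs(cached.chatGuid, ctx.chatGuid) ||
    differs(cached.chatIdentifier, ctx.chatIdentifier) ||
    differs(cached.chatId, ctx.chatId)
  );
}
```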

Tests updated: monitor-reply-cache.test.ts cross-chat error
assertions now expect `chatGuid=<redacted>` instead of raw values.

Also fix an unrelated typecheck regression: makeBaseParams in
delivery-dispatch.mirror-bluebubbles.test.ts was missing the
`runSessionKey` field that v2026.4.26 added to DispatchCronDeliveryParams.
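The session-files change from this commit can be sketched as follows. `toCorpusPath` and `toCorpusPathLegacy` are hypothetical names, and the legacy function is only an approximation of the old backslash rewrite. The `excludeSourcePathRegex` value is a made-up operator filter. The sketch shows one way a rewritten backslash can produce a fake `..` segment that `path.posix.join` then normalizes away, so the resulting path no longer matches the filter.

```typescript
import * as path from "node:path";

// Current approach per the commit: strip CR/LF/TAB, keep backslashes, and
// join with POSIX semantics so "\" is never treated as a separator.
function toCorpusPath(dir: string, fileName: string): string {
  return path.posix.join(dir, fileName.replace(/[\r\n\t]/g, ""));
}

// Approximation of the previous approach, kept only for comparison.
function toCorpusPathLegacy(dir: string, fileName: string): string {
  return path.posix.join(dir, fileName.replace(/\\/g, "/"));
}

// Hypothetical operator filter that excludes one agent's session directory.
const excludeSourcePathRegex = /^main\/sessions\//;

// A legal POSIX filename containing backslashes.
const trickyName = "..\\..\\x.jsonl";
// toCorpusPath keeps it as one segment inside main/sessions, so the filter
// matches. The legacy rewrite yields "../../x.jsonl", which join normalizes
// to "x.jsonl", escaping the filter.
```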
zqchris pushed a commit to zqchris/openclaw that referenced this pull request May 8, 2026
…penclaw#73241, openclaw#73400

Second hardening pass, prompted by new findings from Aisle's re-scan of
the first round. PR openclaw#73406 (auto-reply voice silent) has meanwhile
been landed by the maintainer in upstream commit 28bf71d; it will drop
out at the next rebase via patch-id match, so no source change is
needed here.

PR openclaw#73241 (BB http hardening) — 1 HIGH + 2 MEDIUM:

- monitor-processing.sanitizeForLog: redact common secret-bearing
  patterns (`?password=`, `?token=`, `?api_key=`, `?secret=`,
  `Authorization: Bearer / Basic …`) before the value reaches the log
  sink. BlueBubbles uses query-string auth by default, so attachment
  download failures and similar errors can carry the API password in
  the captured request URL (CWE-532).
- history.fetchBlueBubblesMessageByGuid: validate the derived bareGuid
  (length ≤128, charset `[A-Za-z0-9._:-]+`, non-empty) before issuing
  the request. A trailing `/` would otherwise produce an empty bareGuid
  and turn the call into an unintended `/api/v1/message/` collection
  query (CWE-20).
- monitor-processing reply-context verbose logs: log only `bodyLen=`
  metadata, not the quoted message text itself. Verbose logs are
  retained / shipped to aggregators and would otherwise leak private
  chat content (CWE-532).
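A hedged sketch of the extended sanitizeForLog redaction. The regexes are assumptions. They also match `&`-prefixed parameters, which the commit message does not state. Collapsing CR/LF stands in for the existing CWE-117 handling. This is an illustration, not the monitor-processing implementation.

```typescript
// Hypothetical stand-in for sanitizeForLog: redacts the secret-bearing
// patterns named in the commit, then neutralises CR/LF (CWE-117).
const QUERY_SECRET_RE = /([?&](?:password|token|api_key|secret)=)[^&#\s]*/gi;
const AUTH_HEADER_RE = /(Authorization:\s*(?:Bearer|Basic)\s+)\S+/gi;

function sanitizeForLog(value: unknown): string {
  const text = value instanceof Error ? value.message : String(value);
  return text
    .replace(QUERY_SECRET_RE, "$1<redacted>") // keep the key, drop the value
    .replace(AUTH_HEADER_RE, "$1<redacted>") // keep the scheme, drop the credential
    .replace(/[\r\n]+/g, " "); // one log record per call
}
```

Keeping the parameter name and auth scheme while dropping the value leaves the log useful for diagnosing which credential path failed.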

PR openclaw#73235 (BB routing guards) — 1 MEDIUM + 2 LOW:

- monitor-reply-cache.resolveBlueBubblesMessageId: when
  `requireKnownShortId=true` and `chatContext` lacks any identifier,
  throw "requires a chat scope" instead of resolving the short id.
  Short ids are allocated from a single global counter across every
  account and chat, so an action call without chat scope could
  silently apply to the wrong conversation (CWE-285). The test now
  expects fail-closed behavior; it previously asserted fail-open, with
  a comment acknowledging the risk.
- monitor-reply-cache.buildCrossChatError: replace raw inputId in the
  thrown error message with `<short:N-digit>` or `<uuid:prefix…>`
  shape descriptors. Combined with the earlier chatGuid redaction,
  cross-chat errors now leak no concrete identifier (CWE-117 /
  CWE-200). The Low PII finding on monitor-processing verbose logs is
  resolved automatically by the sanitizeForLog redaction extension
  added for PR openclaw#73241 openclaw#1.
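The fail-closed rule for `requireKnownShortId=true` can be sketched as follows. `resolveMessageId`, `hasChatScope`, the `ChatContext` shape and the in-memory map are hypothetical. Treating whitespace-only identifiers as absent mirrors the group-reaction guard from the first hardening pass and is an assumption here. The cross-chat mismatch check that would follow a successful lookup is omitted.

```typescript
// Hypothetical context shape; the real chatContext fields may differ.
interface ChatContext {
  chatGuid?: string;
  chatIdentifier?: string;
  chatId?: string;
}

// A context has scope only if at least one identifier is non-blank.
function hasChatScope(ctx: ChatContext | undefined): boolean {
  if (!ctx) return false;
  return [ctx.chatGuid, ctx.chatIdentifier, ctx.chatId].some(
    (v) => typeof v === "string" && v.trim().length > 0,
  );
}

function resolveMessageId(
  inputId: string,
  shortIdMap: ReadonlyMap<string, string>,
  opts: { requireKnownShortId: boolean; chatContext?: ChatContext },
): string {
  // In this sketch, anything non-numeric is a full GUID and passes through.
  if (!/^\d+$/.test(inputId)) return inputId;
  if (opts.requireKnownShortId && !hasChatScope(opts.chatContext)) {
    // Short ids come from one global counter across accounts and chats, so
    // without a chat scope the target conversation is ambiguous: fail closed.
    throw new Error("resolving a short message id requires a chat scope");
  }
  const guid = shortIdMap.get(inputId);
  if (guid === undefined) {
    if (opts.requireKnownShortId) throw new Error("unknown short message id");
    return inputId;
  }
  return guid;
}
```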

PR openclaw#73400 (silent-reply 🧵) — 1 MEDIUM:

- silent-reply-policy.classifySilentReplyConversationType: classify
  a session key as `internal` only when `parseThreadSessionSuffix`
  returns a real `🧵<id>` suffix, not on a free-form `🧵`
  substring match. Caller-supplied session keys can no longer embed
  the marker mid-string to force the conversation type to `internal`
  and bypass silent-reply rewrites (CWE-840).

Labels

app: macos, cli, docs, extensions: memory-core, scripts, size: L

Development

Successfully merging this pull request may close these issues.

Dreaming needs configurable session/cron exclusions; isolated cron transcripts still enter session corpus