
[Bug]: Dreaming narrative subagent text never reaches DREAMS.md (race in waitForRun + missing legacy context-engine init) #79500

@brickthompson

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

The dreaming narrative subagent successfully generates Dream Diary entries via claude-cli, but DREAMS.md never receives them. Phase reports under memory/dreaming/{light,deep,rem}/YYYY-MM-DD.md are written normally; the diary text exists in orphaned subagent session jsonl files but is never appended to the diary by appendNarrativeEntry.

Two upstream defects appear to compound:

  1. Race between subagent.waitForRun() and persistCliTurnTranscript(). generateAndAppendDreamNarrative in dreaming-phases.ts awaits waitForRun (status ok), then calls subagent.getSessionMessages({ sessionKey, limit: 5 }) to extract the narrative text. On this install, getSessionMessages consistently returns no assistant message because the CLI agent-command pipeline hasn't yet finished writing it to the session jsonl. extractNarrativeText returns null, the warning memory-core: narrative generation produced no text for {phase} phase is logged, and the phase aborts.

  2. Context engine "legacy" is not registered. The error Context engine "legacy" is not registered. Available engines: (none) is raised during runCliTurnCompactionLifecycle for the same subagent run. cli-compaction.ts calls cliCompactionDeps.resolveContextEngine(params.cfg) directly, but ensureContextEnginesInitialized() is never invoked along the CLI compaction code path (it is only called from pi-embedded, subagent-spawn, and subagent-registry). agent-command.ts catches the error and logs it as a CLI transcript persistence failed warning, but it is a real registration gap.
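
The first defect can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: FakeSubagent, its 20 ms persistence delay, and waitForTranscript are hypothetical stand-ins, not the real OpenClaw subagent API. It shows how a run-completion signal that resolves before the transcript write makes an immediate read come back empty.

```typescript
// Hypothetical stand-ins for illustration; not the real OpenClaw types.
type Message = { role: "user" | "assistant"; text: string };

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

class FakeSubagent {
  private store: Message[] = [{ role: "user", text: "Write a dream diary entry" }];
  private persisted: Promise<void> = Promise.resolve();

  // Resolves as soon as the run is "done"; the transcript write is still pending.
  async waitForRun(): Promise<{ status: "ok" }> {
    this.persisted = sleep(20).then(() => {
      this.store.push({ role: "assistant", text: "The night was quiet." });
    });
    return { status: "ok" };
  }

  async getSessionMessages(): Promise<Message[]> {
    return [...this.store];
  }

  // Assumed "transcript flushed" signal that a caller could await.
  async waitForTranscript(): Promise<void> {
    await this.persisted;
  }
}

// Returns the most recent assistant text, or null when there is none.
function extractText(messages: Message[]): string | null {
  const reply = [...messages].reverse().find((m) => m.role === "assistant");
  return reply ? reply.text : null;
}

async function demoRace(): Promise<{ early: string | null; late: string | null }> {
  const subagent = new FakeSubagent();
  await subagent.waitForRun();
  // Read immediately after run completion: the assistant message is not there yet.
  const early = extractText(await subagent.getSessionMessages());
  // Read again after the assumed flush signal: the message is present.
  await subagent.waitForTranscript();
  const late = extractText(await subagent.getSessionMessages());
  return { early, late };
}
```

In this simulation the early read yields null and the late read yields the narrative, which mirrors the "produced no text" warning followed by a populated session file.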

Steps to reproduce

  1. Install OpenClaw 2026.5.6 with agentRuntime.id: "claude-cli" and the default claude-opus-4-7 primary model.
  2. Enable dreaming: plugins.entries.memory-core.config.dreaming.enabled: true (cron at 0 3 * * *).
  3. Allow the 3am sweep to run on a workspace with enough recall traffic to generate light/REM/deep candidates.
  4. After the sweep, observe:
    • memory/dreaming/light/YYYY-MM-DD.md, rem/YYYY-MM-DD.md, deep/YYYY-MM-DD.md are written.
    • DREAMS.md is not created (or, if pre-existing, no new diary block is appended).

Expected behavior

After each phase produces material, appendNarrativeEntry writes a flowing-prose Dream Diary block to DREAMS.md between the <!-- openclaw:dreaming:diary:start/end --> markers, in the *Friday, May 8, 2026 at 3:00 AM MDT* format documented in concepts/dreaming.md.
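
As a rough illustration of the expected output shape, the sketch below builds the heading and places an entry between the markers. It rests on assumptions: the start/end shorthand is taken to expand to two separate comments, the IANA zone America/Denver is assumed for MDT, and formatDiaryHeading and appendDiaryEntry are hypothetical helpers, not the real appendNarrativeEntry.

```typescript
// Assumed expansion of the "start/end" shorthand used in the issue text.
const DIARY_START = "<!-- openclaw:dreaming:diary:start -->";
const DIARY_END = "<!-- openclaw:dreaming:diary:end -->";

// Builds a heading such as *Friday, May 8, 2026 at 3:00 AM MDT*.
function formatDiaryHeading(when: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "long",
    month: "long",
    day: "numeric",
    year: "numeric",
    hour: "numeric",
    minute: "2-digit",
    hour12: true,
    timeZoneName: "short",
  }).formatToParts(when);
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return (
    `*${get("weekday")}, ${get("month")} ${get("day")}, ${get("year")} ` +
    `at ${get("hour")}:${get("minute")} ${get("dayPeriod")} ${get("timeZoneName")}*`
  );
}

// Inserts an entry before the end marker, creating the marker pair if missing.
function appendDiaryEntry(doc: string, heading: string, narrative: string): string {
  const entry = `${heading}\n\n${narrative.trim()}\n`;
  const endAt = doc.lastIndexOf(DIARY_END);
  if (endAt === -1 || !doc.includes(DIARY_START)) {
    const prefix = doc.trim().length > 0 ? `${doc.trimEnd()}\n\n` : "";
    return `${prefix}${DIARY_START}\n${entry}${DIARY_END}\n`;
  }
  return `${doc.slice(0, endAt)}\n${entry}${doc.slice(endAt)}`;
}

const heading = formatDiaryHeading(new Date("2026-05-08T09:00:00Z"), "America/Denver");
const diary = appendDiaryEntry("", heading, "A quiet night of sorting memories.");
```

Assembling the heading from formatToParts rather than a single format call keeps the separators and the word "at" under the caller's control, since locale data can vary in how it joins date and time.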

Actual behavior

Per-run gateway log evidence (single 3am sweep, MDT — 2026-05-08T03:00):

03:00:06.683 [plugins] memory-core: narrative generation produced no text for light phase.
03:00:06.725 [agents/agent-command] CLI transcript persistence failed for agent:main:dreaming-narrative-light-…: Context engine "legacy" is not registered. Available engines: (none)
03:00:17.296 [plugins] memory-core: narrative generation produced no text for rem phase.
03:00:17.317 [agents/agent-command] CLI transcript persistence failed for agent:main:dreaming-narrative-rem-…: Context engine "legacy" is not registered. Available engines: (none)
03:00:28.670 [plugins] memory-core: narrative generation produced no text for deep phase.
03:00:28.695 [agents/agent-command] CLI transcript persistence failed for agent:main:dreaming-narrative-deep-…: Context engine "legacy" is not registered. Available engines: (none)

The session jsonl for the deep phase (<sessions>/11a061c2-….jsonl) actually does contain a fully-formed assistant narrative, written at 09:00:28.690Z — i.e. 20 ms after the "produced no text" log fired. That timing strongly indicates waitForRun resolved (and getSessionMessages was called) before persistCliTurnTranscript finished appending the assistant message to the session file.
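
The 20 ms figure can be checked with a short calculation. The sketch assumes the gateway log times are MDT (UTC-6) and that the jsonl timestamp falls on the same date, 2026-05-08; mdtLogTimeToUtcMs is a hypothetical helper.

```typescript
// Converts an MDT wall-clock log time (UTC-6) to epoch milliseconds.
function mdtLogTimeToUtcMs(date: string, time: string): number {
  return Date.parse(`${date}T${time}-06:00`);
}

// "produced no text for deep phase" warning, from the gateway log.
const warningAtMs = mdtLogTimeToUtcMs("2026-05-08", "03:00:28.670");
// Assistant narrative timestamp in the session jsonl (date assumed).
const writtenAtMs = Date.parse("2026-05-08T09:00:28.690Z");
// "CLI transcript persistence failed" warning, from the gateway log.
const failureAtMs = mdtLogTimeToUtcMs("2026-05-08", "03:00:28.695");

const gapAfterWarningMs = writtenAtMs - warningAtMs; // write lands after the warning
const gapBeforeFailureMs = failureAtMs - writtenAtMs; // failure is logged after the write
```

Under these assumptions the narrative is written 20 ms after the warning and 5 ms before the persistence-failure log line.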

The same is true for the REM and light orphans (REM at …c118039b….jsonl, light at …ca6618ea….jsonl.deleted.<ts> after the cleanup scrub archived it). All three contain genuine model output that never made it into DREAMS.md.

scrubDreamingNarrativeArtifacts then prunes the dreaming session-store entries on the next run, so the diary text is left only in orphan jsonl files (some live, some .jsonl.deleted.<ts>).

Suspected root cause(s)

  • generateAndAppendDreamNarrative in extensions/memory-core/src/dreaming-narrative.ts should not rely on waitForRun(status="ok") to guarantee that persistCliTurnTranscript has flushed the assistant message to the session jsonl. The subagent's run-completion signal can resolve before agent-command.ts finishes its post-run persistence steps. Either:

    • extract the narrative text directly from the run's result payloads (avoids touching the session store entirely), or
    • wait on a transcript-flushed signal before calling getSessionMessages.
  • runCliTurnCompactionLifecycle in cli-compaction.ts calls resolveContextEngine without first invoking ensureContextEnginesInitialized(). Adding that call (matching the patterns in pi-embedded.ts:86, subagent-spawn.ts:464, subagent-registry.ts:1368) would eliminate the Context engine "legacy" is not registered warning for any CLI-runner subagent, not just the dreaming one. Even though the error is caught and downgraded to a warning by agent-command.ts, it currently masks the legacy compaction lifecycle entirely on this code path.
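
Both suggestions can be sketched in isolation. Everything below is hypothetical: the RunResult payload shape, the local registry, and the function bodies are stand-ins written for illustration, and only the names ensureContextEnginesInitialized and resolveContextEngine are borrowed from the issue. The first function reads narrative text from the run result instead of the session store; the second shows an idempotent init guard invoked before engine resolution.

```typescript
// Assumed shape of a run result; the real payload type may differ.
type RunResult = { status: string; payloads?: Array<{ text?: string }> };

// Option 1: take the narrative from the run's own payloads.
function narrativeFromPayloads(result: RunResult): string | null {
  if (result.status !== "ok") return null;
  const text = (result.payloads ?? [])
    .map((p) => (p.text ?? "").trim())
    .filter((t) => t.length > 0)
    .join("\n\n");
  return text.length > 0 ? text : null;
}

// Local stand-in for an engine registry.
const registry = new Map<string, () => string>();
let enginesInitialized = false;

// Idempotent: safe to call from every code path that resolves an engine.
function ensureContextEnginesInitialized(): void {
  if (enginesInitialized) return;
  registry.set("legacy", () => "legacy-engine");
  enginesInitialized = true;
}

function resolveContextEngine(id: string): string {
  // The guard runs first, so the registry is never empty at lookup time.
  ensureContextEnginesInitialized();
  const factory = registry.get(id);
  if (!factory) {
    const available = [...registry.keys()].join(", ") || "(none)";
    throw new Error(`Context engine "${id}" is not registered. Available engines: ${available}`);
  }
  return factory();
}
```

Because the guard is idempotent, calling it on the CLI compaction path would add no cost where another entry point has already initialized the registry.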

Workaround

For users hitting this: the orphan *.jsonl files (and their .jsonl.deleted.<ts> archived counterparts) contain real diary text. A small recovery script can scan the agent sessions directory for sessions whose first user message starts with Write a dream diary entry, extract the assistant reply, and append it to DREAMS.md using the same start/end markers and *<weekday>, <month> <day>, <year> at <h>:<mm> <AM|PM> <TZ>* heading format. Filtering on len(narrative) >= 250 and rejecting outputs that contain User: / Assistant: labels or that are substrings of the user prompt is sufficient to skip the occasional light-phase regurgitation.
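
The filtering rules above could look roughly like the following. This is a sketch, not the recovery script itself: the one-object-per-line shape with role and text fields is an assumption (the real session jsonl schema is not shown in this report), and parseTurns and recoverNarrative are hypothetical names. Reading files and appending to DREAMS.md are left out.

```typescript
type Turn = { role: string; text: string };

// Assumed line shape: {"role": "...", "text": "..."}. Malformed lines are skipped.
function parseTurns(jsonl: string): Turn[] {
  const turns: Turn[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const rec = JSON.parse(line) as { role?: unknown; text?: unknown };
      if (typeof rec.role === "string" && typeof rec.text === "string") {
        turns.push({ role: rec.role, text: rec.text });
      }
    } catch {
      // Not valid JSON (or not an object); ignore this line.
    }
  }
  return turns;
}

// Returns the recoverable narrative, or null if the session should be skipped.
function recoverNarrative(jsonl: string): string | null {
  const turns = parseTurns(jsonl);
  const prompt = turns.find((t) => t.role === "user");
  if (!prompt || !prompt.text.startsWith("Write a dream diary entry")) return null;
  const reply = [...turns].reverse().find((t) => t.role === "assistant");
  if (!reply) return null;
  const narrative = reply.text.trim();
  if (narrative.length < 250) return null; // too short to be a diary entry
  if (/\b(User|Assistant):/.test(narrative)) return null; // transcript labels leaked in
  if (prompt.text.includes(narrative)) return null; // regurgitated prompt text
  return narrative;
}
```

A caller would run recoverNarrative over each live and .jsonl.deleted.<ts> file and append only the non-null results.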

OpenClaw version

2026.5.6

Operating system

Darwin 25.4.0 (arm64), Node.js v25.8.2

Install method

npm global

Model

anthropic/claude-opus-4-7 (via agentRuntime.id: "claude-cli")

    Labels

    P2 - Normal backlog priority with limited blast radius.
    impact:data-loss - Can lose, corrupt, or silently drop user/session/config data.
    impact:session-state - Session, memory, transcript, context, or agent state can drift or corrupt.
