Skip to content

fix(memory, builtin backend): eagerly preload session transcript listeners at gateway startup#76666

Open
buyitsydney wants to merge 2 commits into
openclaw:mainfrom
buyitsydney:fix/memory-session-listeners-eager-preload
Open

fix(memory, builtin backend): eagerly preload session transcript listeners at gateway startup#76666
buyitsydney wants to merge 2 commits into
openclaw:mainfrom
buyitsydney:fix/memory-session-listeners-eager-preload

Conversation

@buyitsydney

@buyitsydney buyitsydney commented May 3, 2026

Copy link
Copy Markdown
Contributor

Summary (scope narrowed to builtin backend per Codex review)

Eagerly preload MemoryIndexManager at gateway startup for every agent whose agents.defaults.memorySearch.sources (or per-agent overlay) includes "sessions", when using the builtin session-file backend. This arms ensureSessionListener() before the first sessionTranscriptUpdate event fires, so /reset and /new archive emits actually land in the listener set and get indexed into chunks without the user having to run openclaw memory index --force.

Why it was broken

MemoryIndexManager is lazy-loaded on the first memory_search tool call. If a user types /reset or /new before that first search, the builtin archiveFileOnDisk emit lands in an empty SESSION_TRANSCRIPT_LISTENERS set and is silently dropped, so the resulting .jsonl.reset.<iso> / .jsonl.deleted.<iso> archives never enter chunks. Only memory index --force recovers them (which is #76311 / #76452's escape hatch but is not a cold-start fix).

Why builtin-only (Codex review response)

Thanks @clawsweeper for the P2 catch on the qmd path. The initial revision of this PR treated any successful getActiveMemorySearchManager result as listener-armed, but under the qmd backend that call returns the FallbackMemoryManager wrapper whose inner MemoryIndexManager is only constructed by fallbackFactory. So on qmd, merely obtaining the wrapper does NOT run ensureSessionListener(), and the cold-start race still drops sessionTranscriptUpdate emits.

Three options were on the table (per Codex):

  1. Explicitly preload the builtin MemoryIndexManager listener owner even for qmd — requires reaching into the FallbackMemoryManager fallback slot.
  2. Add qmd-native transcript event handling inside qmd-manager.ts.
  3. Narrow this PR to builtin-only and defer qmd to a follow-up.

This revision takes option 3: added an explicit early-return guard if (params.cfg.memory?.backend === "qmd") { ... return; } at the top of startGatewayMemorySessionListeners, with a log line making the scope decision visible, plus a regression test (skips preload entirely when memory.backend is qmd (Codex review scope guard)). The changelog entry is also retitled Memory/sessions (builtin backend): and calls out the intentional qmd deferral.

Independent verification (builtin only)

CarHer builtin-backend deployment (carher-199 baseline). Three cold restarts, same container, identical /reset sequence:

pre-preload post-preload
archive emit lands in listener set ❌ dropped ✅ captured
chunks delta after 30s +0 +41
verified commit main @ 9a49d143c819 HEAD of this PR

What is NOT included

  • qmd backend session-listener arming (follow-up PR, tracked in body + changelog).
  • Changes to qmd-manager.ts transcript event handling.
  • Changes to FallbackMemoryManager fallback construction.

Refs #56131. AI-assisted.

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: S labels May 3, 2026
@clawsweeper

clawsweeper Bot commented May 3, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Summary
The PR adds a gateway startup sidecar that preloads builtin memory session listeners for sessions-enabled agents, skips qmd, and covers the path with tests plus a changelog entry.

Reproducibility: yes. Source inspection on current main shows /reset and /new archive emits go to a non-queued listener set, while the builtin sessions listener is installed only after a non-transient memory manager is constructed.

Real behavior proof
Sufficient (live_output): The PR body includes after-fix live-output evidence from a builtin-backend deployment showing reset archive capture and indexing after the patch.

Next step before merge
The branch already implements the narrow fix; the remaining action is maintainer review, current-head validation, and proof-gate synchronization rather than an automated repair.

Security
Cleared: The diff touches gateway startup wiring, memory startup tests, and changelog text without adding dependencies, workflows, permissions, secret handling, package resolution, or download/execute paths.

Review details

Best possible solution:

Land the scoped builtin-backend preload after maintainer review and current-head validation, while keeping qmd listener arming as a separate backend-specific follow-up.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows /reset and /new archive emits go to a non-queued listener set, while the builtin sessions listener is installed only after a non-transient memory manager is constructed.

Is this the best way to solve the issue?

Yes for the scoped builtin fix. Preloading the cached non-transient manager before archive events is the narrow maintainable path, and the explicit qmd skip keeps the fix aligned with the lazy fallback-wrapper contract.

Acceptance criteria:

  • node scripts/run-vitest.mjs src/gateway/server-startup-memory.test.ts
  • node scripts/crabbox-wrapper.mjs run ... --shell -- "pnpm check:changed"

What I checked:

  • Current main lacks the preload sidecar: Current main only schedules the existing qmd memory startup sidecar and has no startGatewayMemorySessionListeners or sidecars.memory-session-listeners implementation. (src/gateway/server-startup-post-attach.ts:552, c320da79edd1)
  • Transcript events are non-queued: emitSessionTranscriptUpdate iterates only the current in-process listener set, so an archive emit that happens before listener registration has no replay path. (src/sessions/transcript-events.ts:14, c320da79edd1)
  • Reset/new archive emits happen immediately: archiveFileOnDisk renames the transcript and immediately emits a session transcript update for the archived path, making listener timing decisive for incremental indexing. (src/gateway/session-transcript-files.fs.ts:127, c320da79edd1)
  • Listener registration depends on non-transient manager construction: MemoryIndexManager.get treats omitted purpose as the cached default manager, and the constructor calls ensureSessionListener() only for non-status/cli managers. (extensions/memory-core/src/memory/manager.ts:167, c320da79edd1)
  • qmd scope guard matches the wrapper contract: The qmd full manager path returns a FallbackMemoryManager wrapper whose builtin fallback is only constructed later through fallbackFactory, so simply obtaining the wrapper would not arm the builtin session listener. (extensions/memory-core/src/memory/search-manager.ts:223, c320da79edd1)
  • PR implements the implicated builtin path: The diff adds startGatewayMemorySessionListeners, skips qmd, calls getActiveMemorySearchManager without a transient purpose for sessions-enabled builtin agents, and intentionally leaves the manager open so the listener remains subscribed. (src/gateway/server-startup-memory.ts:108, b3f39fc38569)

Likely related people:

  • steipete: History ties this person to the experimental session memory source and later memory session sync helper extraction that the listener preload relies on. (role: feature-history contributor; confidence: high; commits: 0e49dca53c3b, d2a03eca1a5e; files: src/agents/memory-search.ts, src/sessions/transcript-events.ts, extensions/memory-core/src/memory/manager-sync-ops.ts)
  • buyitsydney: Beyond authoring this PR, this contributor has merged adjacent reset/deleted archive reindexing and archive visibility fixes on the same session archive recovery path. (role: recent adjacent contributor; confidence: high; commits: aba97a4c7cff, 2ffdb5d248a1; files: src/gateway/session-transcript-files.fs.ts, extensions/memory-core/src/memory/manager-sync-ops.ts, src/plugin-sdk/session-transcript-hit.ts)
  • vignesh07: Commit history ties this person to the qmd gateway startup initialization path that defines the PR's explicit qmd scope boundary. (role: qmd backend contributor; confidence: medium; commits: efc79f69a268; files: src/gateway/server-startup-memory.ts, src/gateway/server-startup-post-attach.ts)

Remaining risk / open question:

  • The PR head is older than current main, so the maintainer should rerun current-head validation before merge even though GitHub reports it mergeable.
  • The qmd backend race is intentionally left out of scope and should remain tracked separately.
  • GitHub currently shows repeated failing Real behavior proof checks and the triage: needs-real-behavior-proof label despite sufficient body proof and prior ClawSweeper sufficient-proof reviews, so the proof gate needs synchronization.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c320da79edd1.

@buyitsydney buyitsydney force-pushed the fix/memory-session-listeners-eager-preload branch from 6ccc895 to c2cdae0 Compare May 3, 2026 13:39
@buyitsydney buyitsydney changed the title fix(memory): eagerly preload session transcript listeners at gateway startup fix(memory, builtin backend): eagerly preload session transcript listeners at gateway startup May 3, 2026
@buyitsydney

Copy link
Copy Markdown
Contributor Author

Addressed @clawsweeper P2 finding in the latest push (c2cdae015b). Going with option 3: narrow scope to builtin backend.

Changes in this revision:

  • Added an explicit early-return guard in startGatewayMemorySessionListeners that skips preload entirely when params.cfg.memory?.backend === "qmd", with an info-level log line documenting the scope decision.
  • Added a regression test (skips preload entirely when memory.backend is qmd (Codex review scope guard)) asserting getActiveMemorySearchManager is never called under qmd.
  • Retitled PR + CHANGELOG entry to Memory/sessions (builtin backend) and explicitly documented qmd as out of scope with a follow-up PR deferred for arming the qmd listener owner.
  • Rebased onto latest upstream/main (past the 5.2 release and fix(memory): resolve archived transcript hits during session visibility filtering (follow-up to #76311, AI-assisted) #76452 merge).

Rationale: the qmd wrapper lazy-constructs its inner MemoryIndexManager via fallbackFactory, so a simple getActiveMemorySearchManager call on the qmd path does not arm the listener. That's a separate design problem in the FallbackMemoryManager / qmd-manager path that deserves its own PR and review, not a bolt-on inside this one.

Ready for re-review.

@buyitsydney buyitsydney force-pushed the fix/memory-session-listeners-eager-preload branch from c2cdae0 to 1da8e47 Compare May 3, 2026 14:02
@buyitsydney

buyitsydney commented May 3, 2026

Copy link
Copy Markdown
Contributor Author

P3 addressed in 1da8e474ca:

  • Added experimental: { sessionMemory: true } to createBuiltinSessionsConfig fixture so resolveMemorySearchConfig keeps the sessions source and the preload path actually reaches getActiveMemorySearchManager.
  • Strengthened the close-check regression test with expect(getMemorySearchManagerMock).toHaveBeenCalledWith(expect.objectContaining({ agentId: "main" })) — so any future regression that silently drops sessions-source resolution (or skips manager acquisition) fails the test instead of passing vacuously.
  • Also added the flag to createBuiltinNoSessionsConfig for symmetry; that fixture's intent is 'the agent has no sessions source', and the flag not being the gating reason makes that clearer.

Ready for re-review.

Re-review progress:

@buyitsydney buyitsydney marked this pull request as ready for review May 3, 2026 14:03
@buyitsydney buyitsydney force-pushed the fix/memory-session-listeners-eager-preload branch from c5615a8 to bbeae3e Compare May 3, 2026 14:49
@buyitsydney

Copy link
Copy Markdown
Contributor Author

FYI on the red check: the 3 failing jobs are all the same upstream/main flake, not from this PR's diff.

Failed test: src/gateway/server-startup-web-fetch-bind.test.ts > binds HTTP with credential-free tools.web.fetch config without fetch provider discovery

Error chain:

[vitest] There was an error when mocking a module. 
Caused by: ReferenceError: Cannot access '__vi_import_1__' before initialization
 ❯ src/gateway/test-helpers.mocks.ts:204:3
 ❯ src/agents/context.ts:6:1
 ❯ src/gateway/session-utils.ts:12:1

This is a vitest vi.mock hoisting / circular-import race, not a test assertion failure. None of those files are in this PR's diff (only server-startup-memory.ts/.test.ts, server-startup-post-attach.ts, CHANGELOG.md). The previous upstream commit literally titled ci: retrigger checks (lint + dist-shard flakes in previous run) confirms this is a known-flaky shard.

Just rebased onto the latest upstream/main (past 9e56cfc + friends) to retrigger CI; if the flake persists after re-run, it's a maintainer side fix, not this PR.

My 11 tests in server-startup-memory.test.ts all pass (✓ 11 tests).

@buyitsydney buyitsydney force-pushed the fix/memory-session-listeners-eager-preload branch from bbeae3e to 05638b1 Compare May 3, 2026 15:08
@buyitsydney

Copy link
Copy Markdown
Contributor Author

Friendly ping — CI is now clean on the latest head (05638b1fa57): 100/100 check runs completed, 79 success / 20 skipped / 1 neutral, 0 failures.

This PR is intentionally scoped to the builtin backend startup listener preload; it leaves the qmd follow-up out of scope per review feedback. Ready for maintainer review when you have bandwidth. Thanks!

…startup

MemoryIndexManager subscribes to sessionTranscriptUpdate events only when it
is instantiated, via ensureSessionListener() inside its constructor. Before
this change the manager was lazy-loaded on the first memory_search tool call,
which means any sessionTranscriptUpdate emit landing earlier in the process
lifetime went to an empty SESSION_TRANSCRIPT_LISTENERS set and was silently
lost.

The most user-visible casualty was the archiveFileOnDisk emit triggered by
/reset and /new: the resulting .jsonl.reset.<iso> / .jsonl.deleted.<iso>
archive file stayed on disk but never entered the chunks table, so
memory_search could not surface it until the user manually ran
`openclaw memory index --force`.

The existing sidecars.memory startup task (startGatewayMemoryBackend) only
runs for the qmd backend (via resolveGatewayMemoryStartupPolicy returning
mode=off on non-qmd), so builtin-backend deployments had no eager-preload
path at all.

Fix
---

Add a sibling startup task sidecars.memory-session-listeners that runs for
every backend. It calls getActiveMemorySearchManager without a purpose
(so MemoryIndexManager.get picks the non-transient 'default' branch and
ensureSessionListener() runs), then deliberately leaves the manager cached
rather than closing it, because close() would unsubscribe the listener we
just attached.

Per-agent failures are swallowed with log.warn; the task as a whole is
wrapped in setImmediate and never blocks gateway boot.

Verification
------------

Independently verified on a builtin-backend deployment running a single
default agent with memorySearch.sources = ["memory", "sessions"]:

  - Before this patch, cold-start (container recreated, no prior
    memory_search call): /reset archive file generated on disk, but
    chunks delta = 0 and files delta = 0 after 3 min of waiting.
  - After this patch, identical cold-start: /reset archive indexed
    within ~30 s (chunks delta = 41, files +1, new path carries the
    .jsonl.reset.<iso> suffix).

Repeated across three consecutive container recreates with stable results.

Refs openclaw#56131.

AI-assisted.
@buyitsydney buyitsydney force-pushed the fix/memory-session-listeners-eager-preload branch from 05638b1 to b3f39fc Compare May 4, 2026 15:14
@buyitsydney

Copy link
Copy Markdown
Contributor Author

Gentle ping @steipete @vincentkoc — this PR is scoped to the builtin-backend startup listener preload and has been sitting ready for ~45h.

Status:

  • ,
  • CI: 100/100 check runs — 79 success / 20 skipped / 1 neutral, 0 failures on head b3f39fc385
  • clawsweeper verdict: needs-human, confidence=high — no repair job needed, normal maintainer review/merge is the remaining action
  • qmd listener arming intentionally left out of scope as you suggested (separate follow-up)

Happy to rebase if you want it on latest main (currently behind 498 but still merges clean). Appreciate your time!

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 14, 2026
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 14, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 14, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 14, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 14, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 14, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 15, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 16, 2026
@RomneyDa

Copy link
Copy Markdown
Member

Heads up: this PR needs to be updated against current main before the new required Dependency Guard check can pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime size: M triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants