fix(memory, builtin backend): eagerly preload session transcript listeners at gateway startup#76666
Conversation
|
Codex review: needs maintainer review before merge. Summary Reproducibility: yes. Source inspection on current main shows Real behavior proof Next step before merge Security Review detailsBest possible solution: Land the scoped builtin-backend preload after maintainer review and current-head validation, while keeping qmd listener arming as a separate backend-specific follow-up. Do we have a high-confidence way to reproduce the issue? Yes. Source inspection on current main shows Is this the best way to solve the issue? Yes for the scoped builtin fix. Preloading the cached non-transient manager before archive events is the narrow maintainable path, and the explicit qmd skip keeps the fix aligned with the lazy fallback-wrapper contract. Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against c320da79edd1. |
6ccc895 to
c2cdae0
Compare
|
Addressed @clawsweeper P2 finding in the latest push ( Changes in this revision:
Rationale: the qmd wrapper lazy-constructs its inner Ready for re-review. |
c2cdae0 to
1da8e47
Compare
|
P3 addressed in
Ready for re-review. Re-review progress:
|
c5615a8 to
bbeae3e
Compare
|
FYI on the red check: the 3 failing jobs are all the same upstream/main flake, not from this PR's diff. Failed test: Error chain: This is a vitest Just rebased onto the latest upstream/main (past 9e56cfc + friends) to retrigger CI; if the flake persists after re-run, it's a maintainer side fix, not this PR. My 11 tests in |
bbeae3e to
05638b1
Compare
|
Friendly ping — CI is now clean on the latest head ( This PR is intentionally scoped to the builtin backend startup listener preload; it leaves the qmd follow-up out of scope per review feedback. Ready for maintainer review when you have bandwidth. Thanks! |
…startup
MemoryIndexManager subscribes to sessionTranscriptUpdate events only when it
is instantiated, via ensureSessionListener() inside its constructor. Before
this change the manager was lazy-loaded on the first memory_search tool call,
which means any sessionTranscriptUpdate emit landing earlier in the process
lifetime went to an empty SESSION_TRANSCRIPT_LISTENERS set and was silently
lost.
The most user-visible casualty was the archiveFileOnDisk emit triggered by
/reset and /new: the resulting .jsonl.reset.<iso> / .jsonl.deleted.<iso>
archive file stayed on disk but never entered the chunks table, so
memory_search could not surface it until the user manually ran
`openclaw memory index --force`.
The existing sidecars.memory startup task (startGatewayMemoryBackend) only
runs for the qmd backend (via resolveGatewayMemoryStartupPolicy returning
mode=off on non-qmd), so builtin-backend deployments had no eager-preload
path at all.
Fix
---
Add a sibling startup task sidecars.memory-session-listeners that runs for
every backend. It calls getActiveMemorySearchManager without a purpose
(so MemoryIndexManager.get picks the non-transient 'default' branch and
ensureSessionListener() runs), then deliberately leaves the manager cached
rather than closing it, because close() would unsubscribe the listener we
just attached.
Per-agent failures are swallowed with log.warn; the task as a whole is
wrapped in setImmediate and never blocks gateway boot.
Verification
------------
Independently verified on a builtin-backend deployment running a single
default agent with memorySearch.sources = ["memory", "sessions"]:
- Before this patch, cold-start (container recreated, no prior
memory_search call): /reset archive file generated on disk, but
chunks delta = 0 and files delta = 0 after 3 min of waiting.
- After this patch, identical cold-start: /reset archive indexed
within ~30 s (chunks delta = 41, files +1, new path carries the
.jsonl.reset.<iso> suffix).
Repeated across three consecutive container recreates with stable results.
Refs openclaw#56131.
AI-assisted.
05638b1 to
b3f39fc
Compare
|
Gentle ping @steipete @vincentkoc — this PR is scoped to the builtin-backend startup listener preload and has been sitting ready for ~45h. Status:
Happy to rebase if you want it on latest main (currently behind 498 but still merges clean). Appreciate your time! |
|
Heads up: this PR needs to be updated against current |
Summary (scope narrowed to builtin backend per Codex review)
Eagerly preload
MemoryIndexManagerat gateway startup for every agent whoseagents.defaults.memorySearch.sources(or per-agent overlay) includes"sessions", when using the builtin session-file backend. This armsensureSessionListener()before the firstsessionTranscriptUpdateevent fires, so/resetand/newarchive emits actually land in the listener set and get indexed intochunkswithout the user having to runopenclaw memory index --force.Why it was broken
MemoryIndexManageris lazy-loaded on the firstmemory_searchtool call. If a user types/resetor/newbefore that first search, the builtinarchiveFileOnDiskemit lands in an emptySESSION_TRANSCRIPT_LISTENERSset and is silently dropped, so the resulting.jsonl.reset.<iso>/.jsonl.deleted.<iso>archives never enterchunks. Onlymemory index --forcerecovers them (which is #76311 / #76452's escape hatch but is not a cold-start fix).Why builtin-only (Codex review response)
Thanks @clawsweeper for the P2 catch on the qmd path. The initial revision of this PR treated any successful
getActiveMemorySearchManagerresult as listener-armed, but under the qmd backend that call returns theFallbackMemoryManagerwrapper whose innerMemoryIndexManageris only constructed byfallbackFactory. So on qmd, merely obtaining the wrapper does NOT runensureSessionListener(), and the cold-start race still dropssessionTranscriptUpdateemits.Three options were on the table (per Codex):
MemoryIndexManagerlistener owner even for qmd — requires reaching into the FallbackMemoryManager fallback slot.qmd-manager.ts.This revision takes option 3: added an explicit early-return guard
if (params.cfg.memory?.backend === "qmd") { ... return; }at the top ofstartGatewayMemorySessionListeners, with a log line making the scope decision visible, plus a regression test (skips preload entirely when memory.backend is qmd (Codex review scope guard)). The changelog entry is also retitledMemory/sessions (builtin backend):and calls out the intentional qmd deferral.Independent verification (builtin only)
CarHer builtin-backend deployment (
carher-199baseline). Three cold restarts, same container, identical/resetsequence:chunksdelta after 30smain @ 9a49d143c819HEAD of this PRWhat is NOT included
qmd-manager.tstranscript event handling.FallbackMemoryManagerfallback construction.Refs #56131. AI-assisted.