fix(memory): resolve archived transcript hits during session visibility filtering (follow-up to #76311, AI-assisted)#76452
Conversation
|
Codex review: needs changes before merge. Summary Reproducibility: yes. Current-main source inspection shows usage-counted archives are listed, then Next step before merge Security Review findings
Review detailsBest possible solution: Keep the archive indexing and owner-aware visibility fix, but preserve legacy live transcript path identity or add an explicit migration/full reindex before landing with focused memory session tests. Do we have a high-confidence way to reproduce the issue? Yes. Current-main source inspection shows usage-counted archives are listed, then Is this the best way to solve the issue? No. The current PR is directionally correct for archive content and visibility, but rewriting all live transcript index paths is not the narrowest maintainable fix; preserve legacy live paths or add an explicit migration/full reindex. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against d89be34360ec. |
a1e1ae0 to
a7fe5b6
Compare
|
Thanks for the review, @clawsweeper — actioned the [P2] findings in force-push What changed in this push
AC re-run (locally, on this SHA)
Security note unchanged: the authoritative visibility guard is untouched. An archive hit for a session the requester cannot see is still dropped — just further downstream at the guard instead of silently at extraction/empty-content. Re-requesting review. @clawsweeper re-review |
…ive hits during session visibility filtering
Two coupled regressions together made post-reset memory_search return
empty on archived session content, even when the live session was
visible to the requester.
1. `buildSessionEntry` short-circuits every archive file (`.reset`,
`.deleted`, `.bak`) to `{ content: '', lineMap: [] }` via
`shouldSkipTranscriptFileForDreaming`. That is correct for opaque
backups and compaction checkpoints, but it also zeros out the two
usage-counted archive kinds (`.jsonl.reset.<iso>` and
`.jsonl.deleted.<iso>`) that are meant to be searchable as the
rotated copy of a real session. No chunks are ever produced from
them, so `memory_search` has nothing to return.
2. `extractTranscriptStemFromSessionsMemoryHit` recognises only
`.jsonl` and `.md` suffixes. Even if archive content is indexed,
hits land on `sessions/<stem>.jsonl.reset.<iso>` paths that the
stem extractor resolves to `null`, so
`filterMemorySearchHitsBySessionVisibility` drops them before the
guard check runs.
Fix
- `shouldSkipTranscriptFileForDreaming` now distinguishes
usage-counted archives from opaque ones: compaction checkpoints and
non-usage-counted archives (`.bak`, legacy store backups) stay
skipped; `.jsonl.reset.<iso>` and `.jsonl.deleted.<iso>` fall through
to the normal JSONL reader and produce searchable content,
`lineMap`, and timestamps like any other session transcript.
- `extractTranscriptStemFromSessionsMemoryHit` learns the archive
suffix shape (`\.jsonl\.(?:reset|deleted)\.<iso>$`) and maps those
hit paths back to the same live `<stem>`. The existing visibility
guard (`resolveTranscriptStemToSessionKeys` →
`createSessionVisibilityGuard`) is then authoritative and gates the
archive hit against the same session entry as a live `.jsonl` hit.
Security / scope
- No change to the authoritative visibility guard. An archive hit for
a session the requester cannot see is still dropped, just further
downstream than before (at the guard instead of silently at stem
extraction / empty-content). `.bak` and compaction checkpoints still
produce no searchable content.
Tests
- `packages/memory-host-sdk/src/host/session-files.test.ts` now
asserts that reset/deleted archives produce real content and
`lineMap`, while `.bak` and compaction checkpoints stay empty.
- `src/plugin-sdk/session-transcript-hit.test.ts` covers both
archive suffixes with and without the `sessions/` prefix, plus a
negative case rejecting unrelated `.jsonl.backup.` shaped suffixes.
Refs openclaw#56131. Follow-up to openclaw#76311 (socket retry).
AI-assisted: Prepared with Claude (Opus 4.7). Validated locally with
`pnpm check` (typecheck/lint/guards EXIT 0) and the exact vitest AC
set from the earlier Codex review (session-transcript-hit,
session-files, session-search-visibility, memory/index) all green.
e215eef to
4881323
Compare
… delta threshold for usage-counted archives so new reset/deleted artifacts index incrementally
Two coupled gaps together meant post-reset `.jsonl.reset.<iso>` /
`.jsonl.deleted.<iso>` archives never entered `chunks` on the
incremental memory sync path, only on a manual
`openclaw memory index --force` full reindex.
1. `archiveFileOnDisk` renamed the live `.jsonl` onto disk but never
emitted a `sessionTranscriptUpdate` event, so the memory sync's
`ensureSessionListener` never learned the archived path existed.
Every other in-process session-file mutation already goes through
`emitSessionTranscriptUpdate` (appendMessage, compaction,
tool-result rewrite, chat transcript inject, command execution,
pi-embedded tool-result truncation, `config/sessions/transcript`
append/truncate plumbing, session tool-result guard); archive was
the sole emit gap.
2. Even with the emit in place, `processSessionDeltaBatch` would
forward the event to `updateSessionDelta` and gate it behind the
`deltaBytes` / `deltaMessages` thresholds (defaults
`agents.defaults.memorySearch.sync.sessions.deltaBytes: 100000` and
`deltaMessages: 50`). Those thresholds are designed for live
transcripts accumulating appended messages; a one-shot archive
rename below the threshold would never be marked dirty and never
trigger a session-delta sync, so the archive would still not
reindex.
Fix
- `src/gateway/session-transcript-files.fs.ts::archiveFileOnDisk`
emits `emitSessionTranscriptUpdate({ sessionFile: archived })` after
`fs.renameSync`. Same pattern as the eight existing emit points.
- `extensions/memory-core/src/memory/manager-sync-ops.ts::`
`processSessionDeltaBatch` now short-circuits the delta accounting
for usage-counted archive artifacts (classified by
`isSessionArchiveArtifactName` + `isUsageCountedSessionTranscriptFileName`),
marking the archive path dirty directly and triggering a session-delta
sync regardless of size. `.jsonl.bak.<iso>` is explicitly NOT treated
as usage-counted and therefore does not bypass \u2014 it stays opaque to
the indexer the same way `buildSessionEntry` already skips it.
- `packages/memory-host-sdk/src/engine-qmd.ts` re-exports the two
classification helpers so memory-core can consume them via the
existing plugin-sdk boundary.
Scope
- No change to indexing policy for live transcripts, the chokidar
watch set, force-reindex behavior, visibility filtering, the
downstream `onSessionTranscriptUpdate` contract, or the archive
`.bak`/compaction-checkpoint skip list. The authoritative visibility
guard remains authoritative.
Tests
- `src/gateway/session-transcript-files.fs.archive-events.test.ts`
asserts archive emit for reset / deleted / bak.
- `extensions/memory-core/src/memory/manager-sync-ops.archive-delta-bypass.test.ts`
locks in the classification invariants the bypass depends on
(reset+deleted bypass, bak explicitly does not, live `.jsonl`
explicitly does not, compaction checkpoint does not), plus an
end-to-end verification that `emitSessionTranscriptUpdate` delivers
the archived path verbatim through the transcript-events bus.
Refs openclaw#56131. Follow-up to openclaw#76311 (socket retry, merged) and openclaw#76452
(archive content indexing + visibility stem). Together with those two
PRs this closes the archive-path memory indexing story: archive files
are read (`openclaw#76452`), mapped back to the live transcript stem
(`openclaw#76452`), the archive event is emitted (this PR), and the memory
sync incremental path now actually reindexes the archive instead of
dropping the event at the delta gate (this PR).
AI-assisted: Prepared with Claude (Opus 4.7). Validated locally with
`pnpm check` EXIT 0 and the two new test files (6 + 8 passed across
all three gateway test environments + extension-memory environment).
Summary
Follow-up to #76311 (socket retry). Closes the other half of the CarHer downstream fix: archived session transcript hits from
memory_searchare currently being dropped by the visibility guard because the stem extractor doesn't recognise the archive suffix.memory_searchreturns no hits on any.jsonl.reset.<iso>or.jsonl.deleted.<iso>archived transcript, even when the user has full visibility on the live session the archive was rotated from.memory_searchqueries returned empty for clearly indexed content.extractTranscriptStemFromSessionsMemoryHitto recognise the two archive suffixes and map them back to the same live stem the archive was rotated from. The existing visibility guard (filterMemorySearchHitsBySessionVisibility→resolveTranscriptStemToSessionKeys) then correctly gates archived hits against the requester's session store entry just like live-transcript hits.buildSessionEntry, no changes to thefilterMemorySearchHitsBySessionVisibilitybody, no changes totools.tsorcitations.test.ts. The upstream already has themessageTimestampsMsplumbing and the visibility filter skeleton — only the stem-suffix recognition was missing.Change Type
Scope
Linked Issue/PR
Root Cause
extractTranscriptStemFromSessionsMemoryHit(hitPath)only recognisesendsWith(".jsonl")andendsWith(".md"). Archived basenames look like<stem>.jsonl.reset.2026-02-16T22-26-33.000Z, so neither branch matches and the function returnsnull.filterMemorySearchHitsBySessionVisibilitythencontinues past the hit without resolving any session keys, and the hit is permanently filtered out.source: "sessions"hits recognise those suffixes.Regression Test Plan
src/plugin-sdk/session-transcript-hit.test.tsextractTranscriptStemFromSessionsMemoryHit("sessions/abc.jsonl.reset.<iso>")returns"abc".jsonl.deleted.<iso>sessions/prefix.jsonl.backup.*, etc.) still returnnull(no false positives).jsonland.md).User-visible / Behavior Changes
memory_searchresults fromsource: "sessions"now include hits on archived transcripts (.jsonl.reset.<iso>/.jsonl.deleted.<iso>) when the live session's visibility allows it, restoring searchability of historical session content after reset.Diagram
Security Impact
Repro + Verification
Environment
Steps
/resetor gateway session rotation) so a.jsonl.reset.<iso>archive lands insessions/.memory_searchwith a query that uniquely matches content only present in the archived transcript.resultsis empty even though the archive is on disk and indexed.Evidence
Local validation on upstream
mainat this commit SHA (Linux, Node 24.14.0, pnpm 10.33.2):pnpm check— EXIT 0 (typecheck, lint, policy guards, import-cycle check, temp-path guardrails all green)npx vitest run src/plugin-sdk/session-transcript-hit.test.ts→ 8 passed (4 pre-existing + 4 new archive-suffix cases)npx vitest run extensions/memory-core/src/session-search-visibility.test.ts→ passing (visibility guard consumes the extractor and sees correctly-extracted stems now)npx vitest run extensions/memory-core/src/tools.citations.test.ts→ 14 passed (no regression in citation surfacing)Human Verification
memory_searchon archived session content returns hits instead of empty. The stem extractor change is covered by targeted unit tests with explicit examples of both archive suffixes and the negative case.sessions/prefix with and without; bare basenames; unrelated.jsonl.backup.*style suffixes (correctly rejected as not-an-archive); existing.jsonl/.mdcases (unchanged).pnpm buildand the fullpnpm testsweep on upstream main at this exact SHA — only the three targeted test files were exercised; CI will cover the rest on the PR.Review Conversations
Compatibility / Migration
Risks and Mitigations
.jsonl.reset.or.jsonl.deleted.mid-name (not as a terminal archive suffix) could be misread as an archive./\.jsonl\.(?:reset|deleted)\.[^/]+$/is anchored to the end of the basename and requires content after the ISO dot, so mid-name false positives cannot match. A unit test explicitly rejectsweird.jsonl.backup.<ts>.zst..jsonlhits; the guard's authoritative check (guard.check(key).allowed) is unchanged.AI-assisted PR: Prepared with Claude (Opus 4.7). Lightly tested — the underlying fix is in active production use downstream; upstream-rebased version has been unit-test-validated against three targeted test files. Full
pnpm testsweep on upstream main is pending CI.