Skip to content

fix(memory): include archived session transcripts in QMD export#43947

Closed
holgergruenhagen wants to merge 2 commits into
openclaw:mainfrom
holgergruenhagen:fix/qmd-export-archived-session-transcripts
Closed

fix(memory): include archived session transcripts in QMD export#43947
holgergruenhagen wants to merge 2 commits into
openclaw:mainfrom
holgergruenhagen:fix/qmd-export-archived-session-transcripts

Conversation

@holgergruenhagen

Copy link
Copy Markdown

Summary

  • include archived session transcript files in listSessionFilesForAgent()
  • keep the scope intentionally small by handling .jsonl.reset.* and .jsonl.deleted.* in the agent sessions root
  • add a unit test covering live files, archived files, and ignored nested/archive entries

Problem

The QMD session export currently only picks up live *.jsonl transcript files.

When a session is reset, OpenClaw renames the transcript to something like:

  • session-id.jsonl.reset.<timestamp>
  • session-id.jsonl.deleted.<timestamp>

After that rename, the transcript disappears from the QMD session export because listSessionFilesForAgent() filters strictly on name.endsWith(".jsonl").

In practice this means older sessions can silently drop out of qmd/sessions/*.md even though the transcript still exists locally.

Why this change

This is the smallest safe fix for the observed data-loss path:

  • live sessions still work as before
  • archived reset/deleted transcripts in the sessions root are exported again
  • nested sessions/archive/... files are still out of scope for this PR on purpose

That broader archive behavior likely deserves a separate product decision around retention/deduplication.

Test

  • pnpm exec vitest run --config vitest.unit.config.ts src/memory/session-files.test.ts

@greptile-apps

greptile-apps Bot commented Mar 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a silent data-loss path in the QMD session export by extending listSessionFilesForAgent() to include archived transcript files (.jsonl.reset.<timestamp> and .jsonl.deleted.<timestamp>) alongside live .jsonl files. The change is intentionally minimal: a single compiled regex replaces the .endsWith(".jsonl") check, readdir still operates non-recursively so nested sessions/archive/ entries remain out of scope, and the existing call sites are unaffected.

Key observations:

  • The regex /\.jsonl(?:\.(?:reset|deleted)\..+)?$/ is correct and covers all described cases — it properly rejects .jsonl.lock, .jsonl.reset (no trailing segment), and .jsonl.reset. (empty trailing segment), while matching the two new archive patterns.
  • No risk of ReDoS — the regex has no nested quantifiers or catastrophic backtracking paths.
  • The new test uses vi.stubEnv("OPENCLAW_HOME", tmpDir) correctly and verifies all four boundary cases (live, reset, deleted, ignored, nested directory).
  • One downstream consideration worth being aware of: sessionPathForFile() (not changed here) will produce keys like sessions/session.jsonl.reset.2026-03-12T08-50-29.182Z for archived files. Callers that use these as output document names in the QMD export may want to normalise or sanitise them — but that is explicitly noted as out of scope for this PR.

Confidence Score: 5/5

  • This PR is safe to merge — the change is minimal, well-tested, and non-breaking.
  • The modification is a single-line filter replacement backed by a carefully crafted, non-catastrophic regex. Live session handling is unchanged, archived file matching is verified by a dedicated test, and no downstream API surfaces are altered. No logical errors, security issues, or regressions were found.
  • No files require special attention.

Last reviewed commit: 4d4ebb4

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d4ebb4f53

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

.filter((entry) => entry.isFile())
.map((entry) => entry.name)
.filter((name) => name.endsWith(".jsonl"))
.filter((name) => SESSION_EXPORT_FILE_RE.test(name))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Exclude archived transcripts from shared session file listing

listSessionFilesForAgent() is now returning *.jsonl.reset.* and *.jsonl.deleted.*, but that helper is also used by syncSessionFiles (src/memory/manager-sync-ops.ts) to build the searchable sessions index, not just by QMD export. In environments with sessions indexing enabled, reset/deleted transcripts are re-indexed as active docs, so content users explicitly reset/delete can still be recalled and index size grows with archive retention; this should be scoped to QMD export or gated behind a separate archive-aware path.

Useful? React with 👍 / 👎.

@holgergruenhagen

Copy link
Copy Markdown
Author

Follow-up after opening this PR: I found related existing threads that I should have linked up front.

Related:

So for maintainers: this PR should be read as a minimal alternative to the broader variants, not as the first report/fix for the problem.

What is intentionally different here:

  • minimal surface area
  • no config/schema changes
  • includes both .jsonl.reset.* and .jsonl.deleted.*
  • only covers files in the agent sessions root
  • deliberately leaves sessions/archive/ out of scope for a separate product decision

Also, the follow-up push fixes the Windows CI failure from the first version of the new test by making the test state-dir-aware (OPENCLAW_STATE_DIR + OPENCLAW_HOME).

If maintainers prefer consolidating on #30273 or #20183 instead of keeping this smaller variant, I'm happy to close this PR.

@vincentkoc

Copy link
Copy Markdown
Member

Thanks for the fix here.

Closing this as superseded by #57446, which carries the archived-session export fix onto the current packages/memory-host-sdk path on top of latest main.

I kept the same small scope: include reset/deleted transcript history without widening into unrelated archive traversal behavior.

@vincentkoc vincentkoc closed this Mar 30, 2026
@holgergruenhagen

Copy link
Copy Markdown
Author

Thanks for picking this up and porting it to the current code path, @vincentkoc! Glad the fix made it into #57446 — happy to see the archived transcript coverage land properly on top of the new memory-host-sdk structure. Appreciate you taking the time to review and carry it forward. 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants