Skip to content

fix(memory): index reset session transcripts#20183

Closed
VACInc wants to merge 5 commits into
openclaw:mainfrom
VACInc:fix/session-index-reset-deleted
Closed

fix(memory): index reset session transcripts#20183
VACInc wants to merge 5 commits into
openclaw:mainfrom
VACInc:fix/session-index-reset-deleted

Conversation

@VACInc

@VACInc VACInc commented Feb 18, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: session indexing previously only matched *.jsonl, so reset archives (*.jsonl.reset.<timestamp>) were excluded from recall.
  • Scope: support reset-archive recall without pulling deleted/backup artifacts back into memory.
  • Final behavior: reset archives are opt-in (default remains off).

Final Behavior

  • Included for indexing:
    • *.jsonl
    • *.jsonl.reset.<timestamp> (only when explicitly enabled)
  • Excluded from indexing:
    • *.jsonl.deleted.*
    • *.jsonl.bak.*
    • compounded artifacts such as *.jsonl.reset.<ts>.deleted.<ts> / *.bak.<ts>

Config (Opt-In)

  • QMD path:
    • memory.qmd.sessions.includeResetArchives: true
  • Builtin memory-search sync path:
    • agents.defaults.memorySearch.sync.sessions.includeResetArchives: true

Both default to false to preserve existing safety expectations.

What Changed

  • Added opt-in config plumbing + schema/help/types for reset-archive indexing.
  • Tightened reset archive matching to canonical reset archive names only.
  • Updated sync behavior so newly discovered archived paths are indexed when enabled.
  • Ensured transcript-update emission on reset archive lifecycle paths to avoid stale indexed rows after cleanup/removal.
  • Added/updated targeted tests across session-file filtering, archived sync behavior, qmd/session paths, and cleanup-event behavior.

Codex Review Notes

  • Prior Codex findings about Cron/Mattermost were from earlier polluted branch history and are not part of this PR scope.
  • Memory-related Codex findings are addressed in this branch:
    • reset matching is strict to canonical reset artifacts
    • cleanup/removal paths emit updates needed for index invalidation

Branch Hygiene

  • PR branch is now squashed to a single commit on top of main:
    • 7b3a34266
  • Previous noisy merge/squash history has been force-rewritten out of the PR head.

Verification

  • Tree/content matches the previously validated clean tip (history-only rewrite for squash).
  • Targeted memory/session tests were used during validation of this content:
    • src/memory/session-files.test.ts
    • src/memory/manager.session-delta-archived.test.ts
    • plus related gateway/qmd tests in this diff set

Linked

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: M labels Feb 18, 2026
@VACInc VACInc changed the title memory: index reset/deleted session transcripts fix(memory): index reset/deleted session transcripts Feb 18, 2026
@VACInc VACInc force-pushed the fix/session-index-reset-deleted branch from db9217e to 645a664 Compare February 20, 2026 09:09
@VACInc

VACInc commented Feb 20, 2026

Copy link
Copy Markdown
Contributor Author

@vignesh07 you might be interested in this fix. I just rebased and squashed when I saw you pushed some changes and remembered your focus on memory enhancements

@VACInc VACInc force-pushed the fix/session-index-reset-deleted branch 4 times, most recently from c0e989c to cb32759 Compare February 25, 2026 03:45
@openclaw-barnacle openclaw-barnacle Bot added the agents Agent runtime and tooling label Feb 26, 2026
@VACInc VACInc force-pushed the fix/session-index-reset-deleted branch from 5b930d3 to cb32759 Compare February 26, 2026 18:00
@openclaw-barnacle openclaw-barnacle Bot removed the agents Agent runtime and tooling label Feb 26, 2026
@VACInc VACInc force-pushed the fix/session-index-reset-deleted branch 2 times, most recently from 2b1f577 to 210db85 Compare February 27, 2026 20:22
@jaybuidl

Copy link
Copy Markdown

+1 — we hit this exact scenario tonight. After a session cleanup, dozens of .jsonl.deleted.* files were invisible to the indexer. Had to manually mv them back to .jsonl as a workaround before we could even begin indexing them (which itself required working around #20148).

This PR + #20148 together would fully resolve the issue. 🙏

@buyitsydney

Copy link
Copy Markdown
Contributor

+1 from a production deployment.

We have 92 .reset/.deleted session transcript files totaling 37.9 MB that are completely invisible to the memory indexer. This represents ~213% of our currently indexed content — more than double what's searchable is sitting in the dark.

After a session cleanup cycle, memory_search effectively loses access to the majority of historical context. This PR would fix that.

Happy to help test if needed.

@VACInc

VACInc commented Feb 28, 2026

Copy link
Copy Markdown
Contributor Author

Been actively using and debasing since I opened the PR with no issues. Hopefully someone takes a peek soon 🤞

@VACInc

VACInc commented Feb 28, 2026

Copy link
Copy Markdown
Contributor Author

Thanks all for the production feedback and +1s it's actually helpful in a place where there are literally thousands of PRs/issues.

I pushed an update to scope this PR to reset archives only. This changes the scope to indexes *.jsonl and *.jsonl.reset.* only and no longer indexes *.jsonl.deleted.*

Why the change?

From the session lifecycle paths:

  • .reset is created when a session is explicitly replaced/reset (for example /new, /reset, or sessions.reset), i.e. “continue conversation lineage from a fresh session”.
  • .deleted is created when sessions are explicitly removed/pruned (for example sessions.delete, maintenance prune/cap in enforce mode, and cron run-session reaping).

So .reset is a continuity artifact, while .deleted is a removal artifact. This PR now targets continuity recall only.

Result

This fixes missing memory recall from reset snapshots while intentionally keeping deleted transcripts out of indexing.

If you're looking for *.jsonl.reset.* files to not get deleted then you should also set the following:

  • session.maintenance.mode: "warn"
  • session.maintenance.resetArchiveRetention: false

If there’s demand for deleted indexing, that should likely be a separate, explicit follow-up (probably as an opt-in and scoped by session type).

@VACInc VACInc changed the title fix(memory): index reset/deleted session transcripts fix(memory): index reset session transcripts Feb 28, 2026
@VACInc

VACInc commented Feb 28, 2026

Copy link
Copy Markdown
Contributor Author

@codex review

@VACInc

VACInc commented Feb 28, 2026

Copy link
Copy Markdown
Contributor Author

@greptile-apps review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8115b99aea

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/memory/session-files.ts Outdated
@greptile-apps

greptile-apps Bot commented Feb 28, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds opt-in recall of reset-archive transcripts (*.jsonl.reset.<timestamp>) to both the builtin memory-search sync path and the QMD manager. The implementation wires a new includeResetArchives boolean through schema validation, config types, help text, labels, merge logic, and runtime meta tracking — all consistent with existing patterns. Archive lifecycle events (archiveFileOnDisk, cleanupArchivedSessionTranscripts) now emit emitSessionTranscriptUpdate so index entries can be invalidated after cleanup. Two issues were found:

  • emitSessionTranscriptUpdate emitted after a silently-swallowed rm failure (src/gateway/session-utils.fs.ts:269-273): if fs.promises.rm fails and the error is caught, the removal signal still fires — potentially causing the memory manager to drop an index entry for a file that remains on disk.
  • Stale session-row cleanup skipped on full reindex (src/memory/manager-sync-ops.ts:784-789): sessionRowsBefore is set to [] when needsFullReindex is true, making the stale-row deletion loop a no-op. The old code always queried and cleaned stale rows regardless of reindex mode. This is safe only if the parent sync path already bulk-deletes session rows before calling syncSessionFiles — please confirm this assumption or restore the unconditional query for the cleanup pass.

Confidence Score: 3/5

  • Safe to merge only after confirming that stale session rows are cleaned on full reindex and the emitSessionTranscriptUpdate emit is guarded against failed rm calls.
  • Config plumbing, filtering logic, and most of the lifecycle emit work are solid and well-tested. Two logic concerns reduce confidence: (1) the emitSessionTranscriptUpdate call fires even when rm silently fails, sending a false removal signal downstream, and (2) the sessionRowsBefore = [] optimization on full reindex may prevent stale rows from being cleaned up unless a bulk-delete exists elsewhere in the sync path — the test at line 814 should surface this if broken, but the assumption is not documented in the code.
  • src/memory/manager-sync-ops.ts (stale cleanup on full reindex) and src/gateway/session-utils.fs.ts (emit after failed rm).
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/gateway/session-utils.fs.ts
Line: 269-273

Comment:
**Transcript update emitted even when `rm` fails silently**

`fs.promises.rm` errors are swallowed by `.catch(() => undefined)`, but `emitSessionTranscriptUpdate` is called unconditionally afterward. If the file removal fails, downstream subscribers (memory index manager) will receive a "file removed" signal for a file that still exists on disk. This can cause the index entry to be deleted and then re-added on the next sync cycle — at minimum, unnecessary churn.

Compare with `archiveFileOnDisk` where `renameSync` throws on failure so the emit is implicitly guarded. The cleanup path should apply the same guard:

```typescript
const removed_ = await fs.promises.rm(fullPath).then(() => true, () => false);
if (removed_) {
  removed += 1;
  if (reason === "reset") {
    emitSessionTranscriptUpdate(fullPath);
  }
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/memory/manager-sync-ops.ts
Line: 784-789

Comment:
**Stale session rows not cleaned up on full reindex**

`sessionRowsBefore` is set to `[]` when `needsFullReindex` is `true`, which means the stale-row cleanup loop at the bottom of `syncSessionFiles` becomes a no-op during full reindexes. Prior to this PR, the `staleRows` query ran *after* indexing regardless of `needsFullReindex`, so it always cleaned up paths no longer in `activePaths`.

The practical impact is visible in the toggle scenario: when `includeResetArchives` flips from `true``false`, a full reindex is triggered (via `metaSessionIncludeResetArchivesDiffers`), `activePaths` no longer includes reset archive paths, but because `sessionRowsBefore = []` the stale reset archive rows are never deleted from the `files`/`chunks` tables.

The test `triggers a full session reindex when includeResetArchives toggles` expects `withoutArchivesFiles` to equal `1` after the toggle, which would fail unless there is a pre-existing bulk-delete of session rows elsewhere in the parent sync path (not visible in this diff). If that bulk-delete exists, this is intentional; if not, it is a regression. Please confirm that the full reindex path clears session rows before calling `syncSessionFiles({ needsFullReindex: true })`, or restore the unconditional query for the cleanup pass:

```typescript
const sessionRowsBefore =
  (this.db.prepare(`SELECT path FROM files WHERE source = ?`).all("sessions") as Array<{
    path: string;
  }>);
const knownPaths = params.needsFullReindex
  ? new Set<string>()
  : new Set(sessionRowsBefore.map((row) => row.path));
```

This preserves the optimization (skip known unchanged paths on incremental sync) while still ensuring stale rows are cleaned up on full reindex.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: cd22347

@VACInc VACInc force-pushed the fix/session-index-reset-deleted branch from fb5961e to 386a672 Compare March 2, 2026 19:28

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 386a672d9d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/memory/session-files.ts Outdated
@VACInc VACInc force-pushed the fix/session-index-reset-deleted branch from c5bdf44 to 35d782b Compare March 4, 2026 01:35
@openclaw-barnacle openclaw-barnacle Bot added channel: googlechat Channel integration: googlechat extensions: memory-core Extension: memory-core and removed channel: googlechat Channel integration: googlechat extensions: memory-core Extension: memory-core labels Mar 4, 2026
@openclaw-barnacle openclaw-barnacle Bot added size: L and removed docs Improvements or additions to documentation channel: mattermost Channel integration: mattermost size: XL labels Mar 5, 2026
@VACInc VACInc force-pushed the fix/session-index-reset-deleted branch 3 times, most recently from 7b3a342 to 2f8de9b Compare March 5, 2026 15:46

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f8de9bda8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/memory-search.ts
@VACInc

VACInc commented Mar 5, 2026

Copy link
Copy Markdown
Contributor Author

@greptile-apps review
@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@VACInc

VACInc commented Mar 5, 2026

Copy link
Copy Markdown
Contributor Author

@vignesh07 would you mind taking a look?

@VACInc

VACInc commented Mar 12, 2026

Copy link
Copy Markdown
Contributor Author

@greptile-apps review
@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/memory/manager-sync-ops.ts Outdated
Comment thread src/memory/manager-sync-ops.ts
@VACInc VACInc force-pushed the fix/session-index-reset-deleted branch from c38b418 to cd22347 Compare March 12, 2026 17:52
@VACInc

VACInc commented Mar 12, 2026

Copy link
Copy Markdown
Contributor Author

@greptile-apps review
@codex review

Comment thread src/gateway/session-utils.fs.ts Outdated
Comment thread src/memory/manager-sync-ops.ts Outdated
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Delightful!

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@VACInc

VACInc commented Mar 12, 2026

Copy link
Copy Markdown
Contributor Author

@vignesh07 rebased with main and addressed additional bot reviews 🤞

VACInc and others added 3 commits March 12, 2026 14:45
…et-deleted

# Conflicts:
#	src/agents/memory-search.test.ts
#	src/agents/memory-search.ts
#	src/config/schema.help.ts
#	src/config/schema.labels.ts
#	src/config/types.tools.ts
#	src/config/zod-schema.agent-runtime.ts
#	src/memory/manager-sync-ops.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3dcaf0b236

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +267 to +269
if (reason === "reset") {
emitSessionTranscriptUpdate(archived);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Update the session-archive reset path too

Fresh evidence: the message-driven reset flow in src/auto-reply/reply/session.ts:571-579 still calls archiveSessionTranscripts from src/gateway/session-archive.fs.ts, and that helper's archiveFileOnDisk never emits emitSessionTranscriptUpdate. When agents.defaults.memorySearch.sync.sessions.includeResetArchives is enabled, a normal /reset creates a new *.jsonl.reset.<ts> file, but the listener in src/memory/manager-sync-ops.ts:443-456 never sees it, so the reset archive is not indexed until some unrelated full/manual sync happens.

Useful? React with 👍 / 👎.

Comment on lines +361 to +362
if (reason === "reset") {
emitSessionTranscriptUpdate(fullPath);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mirror reset-retention invalidation in session-archive.fs

Fresh evidence: regular session-store maintenance still imports cleanupArchivedSessionTranscripts from src/gateway/session-archive.fs.ts (src/config/sessions/store.ts:468-478), while only this copy now emits a transcript update after removing expired reset archives. In deployments that index reset archives, aging them out through the normal store maintenance path leaves stale sessions/*reset* rows behind, so deleted reset content remains searchable until another sync eventually rebuilds the session index.

Useful? React with 👍 / 👎.

@buyitsydney

Copy link
Copy Markdown
Contributor

+1 — this PR fixes a real pain point (92 .reset files / 37.9MB excluded from search). Codex review passed, rebased clean. Would love to see this merged! 🙏

@vignesh07

Copy link
Copy Markdown
Member

Taking a look.

@VACInc VACInc closed this Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling gateway Gateway runtime size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants