fix(gateway): preserve chat history across compaction checkpoint chains#68765
fix(gateway): preserve chat history across compaction checkpoint chains#68765cholaolu-boop wants to merge 12 commits into
Conversation
…hain for agent:main:main For `agent:main:main` sessions whose `sessions.json` entry carries a non-empty `compactionCheckpoints[]` chain, `readSessionMessages` only read the single transcript resolved by `resolveSessionTranscriptCandidates`. All predecessor transcripts were silently dropped, so consumers of `chat.history` saw a history truncated at the last compaction boundary, with `seq` reset to 1 inside the trailing segment. Introduce a multi-file transcript resolver that, for `agent:main:main` only, walks `compactionCheckpoints[]` plus the current session id and collects every existing transcript variant (`.jsonl`, `.jsonl.reset.*`, `.jsonl.deleted.*`, `.jsonl.bak`, excluding `.lock`), deduplicating and returning them in `mtimeMs` order. `readSessionMessages` iterates this file list with a single shared `messageSeq` counter so the reassembled history is continuous across the full chain. For non-`agent:main:main` sessions, and when the chained lookup yields no files, `resolveTranscriptReadFiles` falls back to the existing single-file path. Behavior byte-compatible with today's upstream there. Inner parse loop for `message` / `compaction` entries unchanged. No UX boundary markers, no sanitize-pipeline or threshold changes.
Greptile SummaryThis PR adds multi-file transcript resolution to
Confidence Score: 3/5Not safe to merge as-is: the unfiltered reset-file scan can corrupt session history with data from unrelated sessions. The overall approach is architecturally sound and the compaction-chain walking logic is correct. However, the reset-file collection loop does not constrain itself to session IDs in the checkpoint chain, which is a present data-integrity defect — foreign messages will be returned in the history response. This needs to be fixed before merging. src/gateway/session-utils.fs.ts — specifically the resetFiles IIFE in resolveMainSessionTranscriptFiles Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/gateway/session-utils.fs.ts
Line: 197-218
Comment:
**Unfiltered reset-file scan includes files from unrelated sessions**
`resetFiles` is built by scanning the entire `sessionsDir` for any filename containing `.jsonl.reset.` — with no check that the name belongs to a session ID in the checkpoint chain. The latest such file is then pushed unconditionally into the transcript list.
In a directory that holds transcripts for multiple sessions (the typical case), this picks up reset archives from completely unrelated sessions. `pushFile` deduplicates by path, but it doesn't filter by session ownership, so a foreign session's archive ends up merged into the current chain. Messages from the wrong session are then re-sequenced and returned to callers of `readSessionMessages`.
The existing `listExistingTranscriptVariantsForSessionId` already collects `.jsonl.reset.*` variants for every session ID in `orderedSessionIds` (because `name.startsWith(`${sessionId}.jsonl.`)` matches them). The separate `latestReset` block is therefore both redundant for in-chain sessions and harmful for out-of-chain ones. The safest fix is to remove the `resetFiles` block entirely and rely on the per-session-ID collection loop below it.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/gateway/session-utils.fs.ts
Line: 248-254
Comment:
**`fs.statSync` called inside sort comparators causes repeated syscalls**
Both `listExistingTranscriptVariantsForSessionId` and the final `files.toSorted` in `resolveMainSessionTranscriptFiles` call `fs.statSync` inside their comparator lambdas. A sort over N files triggers O(N log N) comparisons, each issuing a live syscall. After `listExistingTranscriptVariantsForSessionId` pre-sorts its list, `resolveMainSessionTranscriptFiles` immediately re-sorts the same files — doubling the syscall budget unnecessarily.
Pre-compute mtimes into a `Map<string, number>` before sorting and reuse those values in both places, or stat each file once and store the result alongside the path.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(gateway): read session transcripts a..." | Re-trigger Greptile |
| const resetFiles: string[] = (() => { | ||
| try { | ||
| const parsed = JSON.parse(line); | ||
| if (parsed?.message) { | ||
| messageSeq += 1; | ||
| messages.push( | ||
| attachOpenClawTranscriptMeta(parsed.message, { | ||
| ...(typeof parsed.id === "string" ? { id: parsed.id } : {}), | ||
| seq: messageSeq, | ||
| }), | ||
| ); | ||
| return fs | ||
| .readdirSync(context.sessionsDir) | ||
| .filter((name) => name.includes(".jsonl.reset.")) | ||
| .map((name) => path.join(context.sessionsDir, name)) | ||
| .filter((filePath) => fs.existsSync(filePath)) | ||
| .toSorted((a, b) => { | ||
| try { | ||
| return fs.statSync(a).mtimeMs - fs.statSync(b).mtimeMs; | ||
| } catch { | ||
| return a.localeCompare(b); | ||
| } | ||
| }); | ||
| } catch { | ||
| return []; | ||
| } | ||
| })(); | ||
| const latestReset = resetFiles[resetFiles.length - 1]; | ||
| if (latestReset) { | ||
| pushFile(latestReset); | ||
| } |
There was a problem hiding this comment.
Unfiltered reset-file scan includes files from unrelated sessions
resetFiles is built by scanning the entire sessionsDir for any filename containing .jsonl.reset. — with no check that the name belongs to a session ID in the checkpoint chain. The latest such file is then pushed unconditionally into the transcript list.
In a directory that holds transcripts for multiple sessions (the typical case), this picks up reset archives from completely unrelated sessions. pushFile deduplicates by path, but it doesn't filter by session ownership, so a foreign session's archive ends up merged into the current chain. Messages from the wrong session are then re-sequenced and returned to callers of readSessionMessages.
The existing listExistingTranscriptVariantsForSessionId already collects .jsonl.reset.* variants for every session ID in orderedSessionIds (because name.startsWith(${sessionId}.jsonl.) matches them). The separate latestReset block is therefore both redundant for in-chain sessions and harmful for out-of-chain ones. The safest fix is to remove the resetFiles block entirely and rely on the per-session-ID collection loop below it.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/session-utils.fs.ts
Line: 197-218
Comment:
**Unfiltered reset-file scan includes files from unrelated sessions**
`resetFiles` is built by scanning the entire `sessionsDir` for any filename containing `.jsonl.reset.` — with no check that the name belongs to a session ID in the checkpoint chain. The latest such file is then pushed unconditionally into the transcript list.
In a directory that holds transcripts for multiple sessions (the typical case), this picks up reset archives from completely unrelated sessions. `pushFile` deduplicates by path, but it doesn't filter by session ownership, so a foreign session's archive ends up merged into the current chain. Messages from the wrong session are then re-sequenced and returned to callers of `readSessionMessages`.
The existing `listExistingTranscriptVariantsForSessionId` already collects `.jsonl.reset.*` variants for every session ID in `orderedSessionIds` (because `name.startsWith(`${sessionId}.jsonl.`)` matches them). The separate `latestReset` block is therefore both redundant for in-chain sessions and harmful for out-of-chain ones. The safest fix is to remove the `resetFiles` block entirely and rely on the per-session-ID collection loop below it.
How can I resolve this? If you propose a fix, please make it concise.| return files.toSorted((a, b) => { | ||
| try { | ||
| return fs.statSync(a).mtimeMs - fs.statSync(b).mtimeMs; | ||
| } catch { | ||
| return a.localeCompare(b); | ||
| } | ||
| }); |
There was a problem hiding this comment.
fs.statSync called inside sort comparators causes repeated syscalls
Both listExistingTranscriptVariantsForSessionId and the final files.toSorted in resolveMainSessionTranscriptFiles call fs.statSync inside their comparator lambdas. A sort over N files triggers O(N log N) comparisons, each issuing a live syscall. After listExistingTranscriptVariantsForSessionId pre-sorts its list, resolveMainSessionTranscriptFiles immediately re-sorts the same files — doubling the syscall budget unnecessarily.
Pre-compute mtimes into a Map<string, number> before sorting and reuse those values in both places, or stat each file once and store the result alongside the path.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/session-utils.fs.ts
Line: 248-254
Comment:
**`fs.statSync` called inside sort comparators causes repeated syscalls**
Both `listExistingTranscriptVariantsForSessionId` and the final `files.toSorted` in `resolveMainSessionTranscriptFiles` call `fs.statSync` inside their comparator lambdas. A sort over N files triggers O(N log N) comparisons, each issuing a live syscall. After `listExistingTranscriptVariantsForSessionId` pre-sorts its list, `resolveMainSessionTranscriptFiles` immediately re-sorts the same files — doubling the syscall budget unnecessarily.
Pre-compute mtimes into a `Map<string, number>` before sorting and reuse those values in both places, or stat each file once and store the result alongside the path.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f90c0e0042
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ); | ||
| return fs | ||
| .readdirSync(context.sessionsDir) | ||
| .filter((name) => name.includes(".jsonl.reset.")) |
There was a problem hiding this comment.
Scope reset-archive fallback to the target session
This reset-file scan is global to sessionsDir, so resolveMainSessionTranscriptFiles can prepend the most recent .jsonl.reset.* from an unrelated (or previously reset) session when reading agent:main:main. In practice, once any reset archive exists in the store, readSessionMessages may merge stale/cross-session turns into the current chat history, which breaks reset semantics and can expose messages from the wrong session. The fallback should be constrained to the current session/checkpoint chain instead of using any reset file in the directory.
Useful? React with 👍 / 👎.
… cache stat for sort Addresses review feedback on openclaw#68765 (greptile P1 + codex P1, greptile P2). P1 (data integrity): remove the unscoped `.jsonl.reset.*` scan over `sessionsDir` in `resolveMainSessionTranscriptFiles`. The previous implementation picked the most recent reset archive in the directory without filtering by session ownership, so reset files from unrelated sessions could be merged into the returned `agent:main:main` history. `listExistingTranscriptVariantsForSessionId` already collects `<sessionId>.jsonl.reset.*` variants for every session id in the checkpoint chain (including the current one), so the separate global scan was redundant for in-chain sessions and actively harmful for out-of-chain ones. P2 (performance): pre-compute `fs.statSync().mtimeMs` per file into a Map before sorting the merged transcript list, so each file is stat'd once instead of O(N log N) times inside the comparator. Drop the redundant inner sort in `listExistingTranscriptVariantsForSessionId` now that the single outer sort in `resolveMainSessionTranscriptFiles` is the authoritative ordering step across the full merged list. Behavior: unchanged for any session that has a `.jsonl` or rotated variant matching a session id in its checkpoint chain — which is the designed and tested case. Behavior improvement: reset archives of unrelated sessions are no longer merged into the response.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 619ed80284
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| try { | ||
| return fs | ||
| .readdirSync(sessionsDir) | ||
| .filter((name) => name === `${sessionId}.jsonl` || name.startsWith(`${sessionId}.jsonl.`)) |
There was a problem hiding this comment.
Stop reading
.bak archives as live transcript segments
This variant matcher pulls in every file that starts with ${sessionId}.jsonl. so it will include .bak archives, but sessions.compact creates those by renaming the full transcript and then writing a truncated replacement (src/gateway/server-methods/sessions.ts:1631-1633). Reading both files replays the same turns twice (and can reintroduce lines intentionally compacted away), so readSessionMessages can return duplicated/stale history and inflated seq counts after compaction.
Useful? React with 👍 / 👎.
| [...checkpointSessionIds, sessionId].filter((value) => looksLikeGeneratedSessionId(value)), | ||
| ), |
There was a problem hiding this comment.
Remove UUID-only filtering from checkpoint session traversal
The new chain resolver drops any session/checkpoint id that is not UUID-shaped, so those sessions immediately fall back to the old single-file read path. Session IDs are not UUID-only in this codebase (validateSessionId accepts broader IDs in src/config/sessions/paths.ts:61-69), so compacted main sessions with legacy/custom IDs will still lose pre-checkpoint history despite this fix.
Useful? React with 👍 / 👎.
…ID_RE Addresses the second codex review pass on openclaw#68765 (P1 + P2 on commit 619ed80). P1 (data integrity): restrict listExistingTranscriptVariantsForSessionId to the primary `.jsonl` and `.jsonl.reset.*` files. The previous matcher accepted every `${sessionId}.jsonl.*` filename, which included `.bak` archives created by `sessions.compact` as pre-compaction snapshots. Read- ing those alongside the truncated post-compaction `.jsonl` would double- count the compacted turns and re-emit messages that compaction intention- ally removed from the live transcript. Narrowing the matcher to the two known-safe suffixes drops `.bak` (and `.deleted.*`) from the chain. P2 (correctness of chain coverage): replace the UUID-only local `looksLikeGeneratedSessionId` guard with the codebase's own `SAFE_SESSION_ID_RE` from `src/config/sessions/paths.ts`. Session IDs in this codebase are not UUID-only (e.g. `boot-<timestamp>-<shortid>` boot sessions); the previous filter silently excluded them from the chain walk and fell back to the legacy single-file path. Using the already- exported SAFE_SESSION_ID_RE accepts the same set of IDs that `validateSessionId` accepts, so every session id that can legitimately appear in `compactionCheckpoints[]` or as the current sessionId is now walked. Behavior: for `agent:main:main` sessions with non-UUID session IDs, the chain walk now activates (previously inactive, silently falling through to single-file read); for UUID-based sessions, behavior is unchanged except `.bak` / `.deleted.*` siblings are no longer merged. No reset- archive behavior change — `.jsonl.reset.*` still participates in the chain as before.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b5842d3a9f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| try { | ||
| return fs | ||
| .readdirSync(sessionsDir) | ||
| .filter((name) => name === jsonlName || name.startsWith(resetPrefix)) |
There was a problem hiding this comment.
Exclude reset archives from checkpoint-chain replay
This variant filter includes *.jsonl.reset.* for every checkpoint session id, so a main session that has been reset after compaction can replay pre-reset transcripts: reset preserves compactionCheckpoints (src/gateway/session-reset-service.ts), resolveMainSessionTranscriptFiles walks those old ids, and this line pulls their reset archives back into readSessionMessages. In that flow, chat.history shows messages that were explicitly cleared by reset, which breaks reset semantics.
Useful? React with 👍 / 👎.
| const da = getMtime(a); | ||
| const db = getMtime(b); | ||
| if (da !== null && db !== null) { | ||
| return da - db; |
There was a problem hiding this comment.
Add deterministic tie-break for equal mtimes
The chain sort returns only da - db when both mtimes are present, so equal timestamps produce 0 and preserve the original readdirSync order (which is not guaranteed). On filesystems with coarse timestamp resolution, this can reorder transcript segments nondeterministically, changing replay order and assigned seq values for the same session history. Add a stable secondary key (for example filename or explicit checkpoint order) when da === db.
Useful? React with 👍 / 👎.
…ise sort ties Addresses third codex review pass on openclaw#68765 (both P1 and P2 on commit b5842d3). P1 (reset-semantics correctness): drop .jsonl.reset.* siblings from the compaction-chain replay. resolveMainSessionTranscriptFiles now reads only ${sessionId}.jsonl per checkpoint session id. Upstream session-reset-service.ts preserves compactionCheckpoints[] across a /reset, so including per-session .reset.* archives would resurrect content the user explicitly cleared via reset and violate reset semantics. Reset-archive fallback for the non-chained path remains out of scope for this PR and is handled by openclaw#60409. P2 (determinism): when both compared file mtimes are equal, fall through to a filename localeCompare tie-break instead of returning 0 and relying on readdir order. Filesystems with coarse timestamp resolution (FAT32, some network mounts) or rapid writes can produce identical mtimes; without a secondary key, the replay order and assigned seq values are non-deterministic across runs and platforms. Behavior: for sessions whose chained transcripts have distinct mtimes, unchanged. For sessions with coincident mtimes, deterministic and stable. Reset-after-compaction scenarios no longer resurrect cleared content. No change to the non-chained fallback path. Also removes the now-redundant listExistingTranscriptVariantsForSessionId helper (its single call site used only the primary-jsonl lookup after this change); inlined into resolveMainSessionTranscriptFiles for clarity.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 148bfce6b4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| sessionFile?: string, | ||
| ): string[] | null { | ||
| const context = resolveSessionMessagesStoreContext(sessionId, storePath, sessionFile); | ||
| if (!context?.sessionsDir || context.key !== "agent:main:main") { |
There was a problem hiding this comment.
Support configured main-session keys in chain replay
Gate the chained-read path on the resolved session being the configured main key, not the hardcoded literal agent:main:main. OpenClaw supports non-default main keys/agents (for example agent:ops:work), and for those installs this condition is always false, so readSessionMessages falls back to single-file reads and still truncates history at compaction boundaries for the actual main session.
Useful? React with 👍 / 👎.
| new Set([...checkpointSessionIds, sessionId].filter((value) => SAFE_SESSION_ID_RE.test(value))), | ||
| ); | ||
| for (const chainedSessionId of orderedSessionIds) { | ||
| const filePath = path.join(context.sessionsDir, `${chainedSessionId}.jsonl`); |
There was a problem hiding this comment.
Load checkpoint snapshots instead of only
<sessionId>.jsonl
Resolve chained transcript files from checkpoint file references, not just path.join(sessionsDir, ${id}.jsonl). Compaction checkpoints persist pre-compaction snapshots as *.checkpoint.*.jsonl, and post-compaction truncation can remove older turns from the live <sessionId>.jsonl; with the current lookup, those snapshot-only turns are never read, so the new chain logic can still return history that drops pre-compaction messages.
Useful? React with 👍 / 👎.
…onFile refs and configured main key
Rewrites resolveMainSessionTranscriptFiles in src/gateway/session-utils.fs.ts
to deliver the stated PR objective instead of being a de-facto no-op.
What was wrong in the previous revision
----------------------------------------
1. Hardcoded main-session-key literal: the chain walk was gated on
`context.key !== "agent:main:main"`. The codebase resolves the main
session key via `resolveMainSessionKey` / `resolveMainSessionKeyFromConfig`,
which respects `session.scope` (`"global"` short-circuit), `session.mainKey`,
and the `agents.default` selection. Any non-default config silently
disabled the chain walk.
2. Wrong file-resolution for chained transcripts: each chained sessionId
was looked up via `path.join(sessionsDir, `${chainedSessionId}.jsonl`)`.
That does not match the on-disk convention. session-compaction-checkpoints.ts
writes pre-compaction snapshots as `<name>.checkpoint.<uuid>.jsonl` and
records the authoritative path on each checkpoint entry under
`preCompaction.sessionFile`. In practice the previous resolver found no
files, returned null, and left callers on the legacy single-file path —
the PR delivered nothing for real compaction chains.
3. Broad variant enumeration: `.jsonl.reset.*`, `.jsonl.bak`, `.jsonl.deleted.*`
siblings were pulled into the chain. `.bak` is a pre-compaction snapshot
already represented by a predecessor checkpoint entry and double-counts
compacted turns. `.reset.*` archives resurface content that the user
explicitly cleared via /reset, breaking reset semantics
(session-reset-service.ts preserves `compactionCheckpoints[]` across a
reset, so the previous code could replay cleared turns for every chained
session id).
What this commit does
---------------------
- Imports `resolveMainSessionKeyFromConfig` from
`../config/sessions/main-session.runtime.js`. The chained read activates
only when `context.key === resolveMainSessionKeyFromConfig()`; on a
config read error the function returns null and the legacy path takes
over. Non-main sessions (isolated, subagent, heartbeat, explicit agent
sessions) stay on the legacy single-file path.
- Walks `entry.compactionCheckpoints` in recorded order. For each entry
reads `preCompaction.sessionFile` and pushes that exact path. Guards
against non-object entries and missing fields.
- Appends `entry.sessionFile` as the newest segment so the chain ends
with live content.
- No readdir scan. No `.reset.*`, `.bak`, or `.deleted.*` inclusion.
- Deterministic sort: ascending mtime with filename localeCompare
tie-break. fs.statSync cached per path (one syscall per file instead
of O(N log N) inside the comparator).
- Dropped SAFE_SESSION_ID_RE import and the UUID-shape filter; no
filenames are constructed from session ids anymore, so path-safety
validation is no longer required.
Review cluster addressed
------------------------
- greptile P1 (unfiltered reset-file scan leaks foreign-session archives):
resolved — no directory scan remains.
- greptile P2 (fs.statSync inside sort comparator): resolved — cached.
- codex P1 on 619ed80 (scope reset-archive fallback to target session):
resolved — no reset fallback in chain path at all.
- codex P1 on b5842d3 (stop reading .bak as live segments): resolved.
- codex P2 on b5842d3 (remove UUID-only filter): resolved — filter
dropped; non-UUID session ids no longer excluded.
- codex P1 on 148bfce (exclude reset archives from replay / reset
semantics): resolved — no reset replay.
- codex P2 on 148bfce (deterministic tie-break): resolved — retained.
- codex P2 line 152 (gate on configured main key, not literal): resolved.
- codex P1 line 182 (resolve chained files from preCompaction.sessionFile,
not ${sessionId}.jsonl): resolved.
Behavior
--------
- Pre-patch (no-op in practice): no regression, no real improvement.
- Post-patch: for the configured main session with a non-empty
`compactionCheckpoints[]` chain, `chat.history` returns messages from
every recorded pre-compaction snapshot plus the current primary
transcript, in mtime order, with a single continuous seq counter.
Reset-cleared content stays cleared. Fallback for non-chained and
non-main sessions is byte-compatible with today's upstream.
Scope strictly limited to src/gateway/session-utils.fs.ts.
Refs openclaw#68765
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c9e2108a5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const reference = (preCompaction as Record<string, unknown>).sessionFile; | ||
| if (typeof reference === "string" && reference.trim()) { | ||
| pushFile(reference.trim()); |
There was a problem hiding this comment.
Skip pre-reset checkpoints when rebuilding transcript chain
This loop replays every compactionCheckpoints[*].preCompaction.sessionFile without checking whether the checkpoint belongs to the current post-reset session. Because reset preserves compactionCheckpoints on the new entry (src/gateway/session-reset-service.ts:612), a compacted session that is later reset will have readSessionMessages prepend old checkpoint snapshots and chat.history can show messages the user explicitly cleared. Restrict checkpoint replay to the active session/reset epoch (or otherwise suppress pre-reset checkpoints) before adding these files.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Valid concern. session-reset-service.ts:612 explicitly preserves compactionCheckpoints[] on the new entry after /reset, so this loop as-is will replay pre-reset checkpoint snapshots if the user resets a compacted session. I have not addressed this in the current PR because the right fix is a data-model decision, not a mechanical change:
- Option A — filter checkpoints by comparing
checkpoint.createdAtagainst a reset timestamp on the entry (requires a newresetAt/resetEpochfield onSessionEntry, since it doesn't currently carry one). - Option B — suppress the chain walk entirely when the entry shows reset markers (requires defining what a "reset marker" on the entry is).
- Option C — change the reset service to drop
compactionCheckpoints[]on reset (conflicts with the intent atsession-reset-service.ts:612, which looks deliberate).
Each needs a maintainer call on what the intended semantics are for "chain history survives reset" vs "reset clears everything". Happy to open a dedicated follow-up PR for whichever direction you pick.
For this PR's narrow scope (restoring the chain read for the configured main session), the P2 canonicalization fix just landed as 6ab4646.
| if (context.key !== configuredMainKey) { | ||
| return null; |
There was a problem hiding this comment.
Canonicalize legacy main keys before enabling chain replay
Chained replay is currently enabled only when the raw store key exactly equals resolveMainSessionKeyFromConfig(). In stores that still use legacy/mixed-case aliases for the same main session key (a case the gateway explicitly supports elsewhere via case-insensitive legacy-key scans), this condition fails and readSessionMessages silently falls back to single-file reads, so compaction checkpoint history is still truncated for those users. Compare against canonicalized/alias-aware key forms instead of strict raw-key equality.
Useful? React with 👍 / 👎.
… replay Addresses codex P2 review comment on 5c9e210 (openclaw#68765#discussion_r3111205457). Chained transcript replay was gated on raw-key equality: `context.key === resolveMainSessionKeyFromConfig()`. Stores with legacy or mixed-case aliases of the configured main session key (`agent:main:main` while the configured default agent is e.g. "ops", `agent:main:<configured mainKey>`, `agent:<configured agent>:main`, the bare "main" / mainKey forms) did not match and silently fell back to the legacy single-file path, so compaction-chain history stayed truncated for those users. Gateway write paths canonicalize the same way (see auto-reply/reply/ session.ts). Replaces the raw comparison with `canonicalizeMainSessionAlias` from `src/config/sessions/main-session.ts`, passed the resolved default agent id and the full `OpenClawConfig`. The canonicalizer: - short-circuits to "global" when session.scope is "global" and the input is any main alias; - collapses the six legacy/aliased main-key shapes to the canonical `agent:<default>:<mainKey>` form; - returns the raw key unchanged for genuine non-main sessions (isolated, subagent, heartbeat, explicit agent sessions). `canonicalContextKey === resolveMainSessionKey(cfg)` is now the gate. Config-load failure still falls through to the legacy path. Imports: +resolveDefaultAgentId from ../agents/agent-scope-config.js, +loadConfig from ../config/config.js, +canonicalizeMainSessionAlias and +resolveMainSessionKey from ../config/sessions/main-session.js, drops the -resolveMainSessionKeyFromConfig shortcut (no longer needed). Deliberately NOT addressed in this commit: codex P1 on the same review (r3111205471, "Skip pre-reset checkpoints when rebuilding transcript chain"). Pre-reset checkpoint filtering is a data-model question about whether checkpoints should carry a reset-epoch tag or whether chain-walk should be suppressed entirely when the entry shows reset markers, both of which are beyond this PR's narrow scope of restoring chain-read for the configured main session. Happy to coordinate a follow-up PR once that approach is agreed. Scope still limited to src/gateway/session-utils.fs.ts. Refs openclaw#68765
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ab4646526
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (targetSessionFile && entrySessionFile && entrySessionFile !== targetSessionFile) { | ||
| continue; |
There was a problem hiding this comment.
Require sessionFile match when target transcript is provided
When sessionFile is passed in, this filter still accepts store rows that have the same sessionId but no record.sessionFile, because the mismatch check only runs when both sides are present. In stores with duplicate/legacy rows for one session id, resolveSessionMessagesStoreContext can return a stale row without a transcript path; resolveMainSessionTranscriptFiles then cannot append the live transcript from entry.sessionFile and may replay only checkpoint snapshots, dropping the newest turns from readSessionMessages.
Useful? React with 👍 / 👎.
| const chained = resolveMainSessionTranscriptFiles(sessionId, storePath, sessionFile); | ||
| if (chained && chained.length > 0) { |
There was a problem hiding this comment.
Avoid unconditional chained-resolver I/O on every message read
resolveTranscriptReadFiles always invokes resolveMainSessionTranscriptFiles, which synchronously parses sessions.json and calls loadConfig() before it can decide that chaining is not applicable. Since readSessionMessages is used on hot request paths, this adds blocking disk I/O and JSON parsing to non-main session reads as well, creating avoidable latency that grows with session-store size.
Useful? React with 👍 / 👎.
…vided Rows with no record.sessionFile bypassed the entrySessionFile guard when targetSessionFile was set, allowing a stale/legacy row to be returned as the store context. Drop the redundant entrySessionFile && guard so any row without a matching sessionFile is skipped when the caller specifies a target file. Refs openclaw#68765
…ckpoints resolveTranscriptReadFiles is on the hot path for every message read, but resolveMainSessionTranscriptFiles parsed sessions.json via the store context loader AND then called loadConfig() + the main-key canonicalizer before deciding whether chaining applied. The existing compactionCheckpoints[] guard sat after the config work, so isolated, subagent, heartbeat, and explicit-agent session reads all paid for config I/O that was always thrown away. Move the checkpoint-array check ahead of the config/canonicalize work. Behavior is unchanged: main sessions with a checkpoint chain still walk the chain, and everything else still falls through to the legacy single-file path. The only difference is that the loadConfig() and canonicalizeMainSessionAlias() costs are now gated on the cheap "entry has compactionCheckpoints" field check, which is false for the vast majority of reads. Refs openclaw#68765
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c877c3ad4b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const reference = (preCompaction as Record<string, unknown>).sessionFile; | ||
| if (typeof reference === "string" && reference.trim()) { | ||
| pushFile(reference.trim()); | ||
| } |
There was a problem hiding this comment.
Ignore pre-restore checkpoints when chaining transcripts
When a user runs sessions.compaction.restore, the store keeps prior compactionCheckpoints while switching entry.sessionFile to a new transcript forked from the selected checkpoint snapshot (src/gateway/server-methods/sessions.ts:1142-1162). This loop always replays every preCompaction.sessionFile, so readSessionMessages reads both the checkpoint snapshot and the restored transcript, duplicating the same turns and inflating chat.history sequence/message counts right after restore. Restrict checkpoint replay to the active restore/reset epoch (or otherwise suppress snapshots that are already represented by the current transcript).
Useful? React with 👍 / 👎.
Add two red-bar fixtures for readSessionMessages that pin the compaction-chain epoch semantics we expect, but that the current replay-everything logic in resolveMainSessionTranscriptFiles does not honor: 1. Compact -> Reset -> Read: performGatewaySessionReset keeps compactionCheckpoints[] on the new entry, so the chain walk resurfaces pre-reset preCompaction.sessionFile snapshots in the replay. The user explicitly archived that content; it must not reappear. 2. Compact -> Restore -> Read: sessions.compaction.restore forks a new transcript from a chosen checkpoint's preCompaction.sessionFile and writes it as the entry.sessionFile while preserving compactionCheckpoints[] (preserveCompactionCheckpoints: true). The chain walk then reads the checkpoint snapshot AND the forked copy, and also replays checkpoints taken between the restore target and the restore point. These tests are intentionally left failing. The fix requires a product decision about reset/restore epoch semantics (prune checkpoints at /reset? suppress post-target checkpoints after restore? stamp a resetEpoch / restoreEpoch on the entry?) rather than another mechanical patch to the chain walker. Once that decision is made, a targeted fix should turn these red tests green. Separate follow-up noted but not addressed here: canonicalizeMainSessionAlias (src/config/sessions/main-session.ts) has no dedicated test file. Legacy alias coverage (agent:main:<mainKey> vs. agent:<default>:<mainKey>, scope=global) should be pinned in its own small test before the chain-walker is hardened. Refs openclaw#68765
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 451fed833e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for (const checkpoint of rawCheckpoints) { | ||
| if (!checkpoint || typeof checkpoint !== "object" || Array.isArray(checkpoint)) { | ||
| continue; |
There was a problem hiding this comment.
Stop replaying checkpoints from prior reset/restore epochs
resolveMainSessionTranscriptFiles replays every compactionCheckpoints[*].preCompaction.sessionFile without filtering to the active reset/restore epoch, so after /reset or sessions.compaction.restore it can prepend snapshots that were intentionally discarded. In those flows readSessionMessages can resurface cleared history and double-count restored content (checkpoint snapshot + restored fork), which is a correctness/privacy regression in the new chain logic; the newly added tests in src/gateway/session-utils.fs.test.ts for reset/restore are fresh evidence of this failure mode.
Useful? React with 👍 / 👎.
resolveSessionMessagesStoreContext parsed sessions.json directly via JSON.parse(fs.readFileSync(...)) on every call. Switch to the existing loadSessionStore() helper from config/sessions/store.js so the chain walker reuses the shared session-store cache instead of paying a fresh parse for each readSessionMessages call. Pure refactor: no behavior change for single reads; repeated reads now reuse the cached parse. Refs openclaw#68765
Add four green-bar tests that pin the current chain-walk semantics so the behavior can't silently regress under future edits: - skips stale duplicate rows without matching sessionFile: covers the sessionFile-match guard added by df6b239. A duplicate store row for the same sessionId without a matching record.sessionFile must be skipped; only the row whose sessionFile matches targetSessionFile provides the chain. - chains checkpoints for configured main session even when the store key uses a legacy alias: exercises canonicalizeMainSessionAlias on the chain-activation gate. A store keyed "agent:main:main" under a config where the default agent is "ops" and session.mainKey is "work" must still canonicalize to the configured main key and walk the checkpoint chain. - keeps non-main sessions on the legacy single-file fallback even when checkpoints exist: confirms the canonical-key mismatch branch. A non-main store entry with compactionCheckpoints[] must not trigger chain replay; only entry.sessionFile is read. - reuses the cached session store on repeated chained reads: asserts that loadSessionStore's cache eliminates repeated fs.readFileSync calls for the same storePath inside the cache TTL. Refs openclaw#68765
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e26f54940
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const reference = (preCompaction as Record<string, unknown>).sessionFile; | ||
| if (typeof reference === "string" && reference.trim()) { | ||
| pushFile(reference.trim()); |
There was a problem hiding this comment.
Trim checkpoint snapshots at compaction boundaries
This code appends every preCompaction.sessionFile as a full transcript segment, but compaction checkpoints already record firstKeptEntryId precisely because each next transcript keeps an unsummarized tail from the prior one (see src/agents/pi-embedded-runner/session-truncation.ts:84-93). With two or more normal compactions, replaying whole snapshots here duplicates retained turns, so readSessionMessages returns inflated history and incorrect seq progression even without reset/restore flows. The chain reader should cut each snapshot at its boundary instead of concatenating entire files.
Useful? React with 👍 / 👎.
|
Added production-derived main-session checkpoint-chain evidence, plus a focused readSessionMessages regression test for the checkpoint-chain read path. Targeted test is green on this branch. Reset/restore remains intentionally out of scope for this evidence block. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 711c0d1c23
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return { | ||
| key, | ||
| entry: value, | ||
| sessionsDir, | ||
| }; |
There was a problem hiding this comment.
Prefer the canonical matching store row before returning
In resolveSessionMessagesStoreContext, the loop returns on the first row whose sessionId (and optional sessionFile) matches, so duplicate rows for the same session can select a stale/non-main entry and stop the scan early. If that first match has no compactionCheckpoints (or a non-main key), resolveMainSessionTranscriptFiles falls back to single-file reads and drops pre-compaction history even when a later canonical main entry has the full checkpoint chain. This regression is triggered by legacy alias duplicates in sessions.json and directly undermines this fix’s intended behavior.
Useful? React with 👍 / 👎.
| const normalized = path.resolve(filePath); | ||
| if (seen.has(normalized) || !fs.existsSync(normalized)) { | ||
| return; |
There was a problem hiding this comment.
Constrain chained transcript paths to the session store root
pushFile trusts compactionCheckpoints[*].preCompaction.sessionFile and entry.sessionFile via path.resolve(...) and only checks existence before reading, which bypasses the containment/sanitization logic used by resolveSessionTranscriptCandidates. With stale or corrupted sessions.json metadata, readSessionMessages can ingest unrelated files outside the active sessions directory and merge them into chat history, exposing wrong-session content instead of safely skipping invalid paths.
Useful? React with 👍 / 👎.
|
Thanks for the context here. I swept through the related work, and this is now duplicate or superseded. Close as superseded: current main has a merged maintainer/member product path that preserves bounded active-transcript So I’m closing this here and keeping the remaining discussion on the canonical linked item. Review detailsBest possible solution: Keep the shipped bounded active-history behavior and checkpoint recovery affordance; revisit full checkpoint replay only as a new opt-in API or recovery design with explicit reset, restore, branch, boundary, and containment rules. Do we have a high-confidence way to reproduce the issue? Yes, source-reproducible: current Is this the best way to solve the issue? No. This PR is not the best current solution because the merged #76437 path preserves bounded active history and points users to checkpoint controls, while this branch adds default replay without the needed privacy and correctness contract. Security review: Security review needs attention: The diff adds default transcript replay from stored checkpoint paths without containment, reset/restore epoch guards, or active-branch selection.
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 686b93e5c710. |
Problem
When
readSessionMessagesis called for the configured main session (asreturned by
resolveMainSessionKey—"global","agent:main:main", orany
agent:<default>:<mainKey>your config resolves to) and that session'ssessions.jsonentry carries a non-emptycompactionCheckpoints[]chain,the function only reads the single transcript file resolved by
resolveSessionTranscriptCandidates(...).find(existsSync).That single-file read discards every pre-compaction transcript referenced
by the chain, so downstream consumers of
readSessionMessages(notablythe
chat.historygateway handler and anything that feeds it) see atruncated history that silently ends at the most recent compaction
checkpoint boundary. Sequence numbers get reassigned starting from
1inside the last segment only, which makes chain-continuity checks (gap
detection, dedup, ordering) report a healthy chain that is in fact
missing all pre-compaction turns.
The bug is not visible in sessions that have never been compacted (their
compactionCheckpoints[]is empty or absent), which is why it survivesbasic smoke tests on fresh sessions.
Repro
compaction checkpoint (verify via
sessions.json→ the entry hascompactionCheckpoints: [{ checkpointId: "...", preCompaction: { sessionFile: "..." }, ... }, ...]).openclaw gateway call chat.history --params '{"sessionKey":"<main_session_key>","limit":200}'.messages[]starts at the most recentpost-compaction segment. All messages from transcripts listed in
compactionCheckpoints[].preCompaction.sessionFileare missing.Sequence numbers begin at
1inside the returned window instead ofbeing continuous across the chain.
Expected: the returned history spans every transcript referenced by the
checkpoint chain plus the current primary transcript, in mtime order,
with a single continuous
seqcounter.Fix
Introduce a chained resolver in
resolveMainSessionTranscriptFilesthatactivates only for the configured main session key and walks the stored
checkpoint chain using the authoritative file references recorded at
compaction time:
sessions.jsonvia a newresolveSessionMessagesStoreContext(sessionId, storePath, sessionFile)helper.
context.key === resolveMainSessionKeyFromConfig(). This respectssession.scope("global"short-circuit),session.mainKey, andagents.default— no hardcoded session-key literal. For any othersession (isolated, subagent, heartbeat, explicit agent session) the
resolver returns
nulland the legacy single-file path applies.entry.compactionCheckpoints[]in recorded order. For eachentry read
checkpoint.preCompaction.sessionFile(theSessionCompactionTranscriptReferencewritten bysession-compaction-checkpoints.tsfor files named like<name>.checkpoint.<uuid>.jsonl) and push that exact path.entry.sessionFileas the newest segment so the chain endswith the live primary transcript.
resolves, return
nulland let the legacy single-file fallback apply.mtimeMsascending, with a filenamelocaleComparetie-break.fs.statSyncresults are cached perpath so each file is stat'd once, not O(N log N) times inside the
comparator.
readSessionMessagesiterates over that file list and keeps a singleshared
messageSeqcounter across all files, so theseqon eachreassembled message is continuous across the full chain.
Scope is intentionally narrow:
readdirSyncsweep of the sessions directory..jsonl.reset.*,.jsonl.bak, or.jsonl.deleted.*sibling files.
.bakis a pre-compaction snapshot that the checkpointchain already represents via predecessor entries (reading both would
double-count compacted turns).
.reset.*archives are content theuser explicitly cleared via
/reset; replaying them would resurfacecleared turns and break reset semantics
(
session-reset-service.tspreservescompactionCheckpoints[]acrossa reset).
comes from the stored
preCompaction.sessionFilereference, so nopath-safety validation on raw session ids is needed.
message/compactionentries.
resolveSessionTranscriptCandidatesmodule.compactionCheckpoints[]chain, and any config-read error all fall through to the existing
single-file path — behavior byte-compatible with today's upstream.
All changes are contained in
src/gateway/session-utils.fs.ts.Test notes for reviewers
Suggested test cases:
compactionCheckpoints[]→ behaviorunchanged, same messages, same
seqassignment.resolveMainSessionKeyFromConfig) witha two-step checkpoint chain whose entries carry
preCompaction.sessionFilepaths → returned messages span bothcheckpoint transcripts plus
entry.sessionFilein mtime order,seqstrictly monotonically increasing from 1 to N with no gaps.
sessionFilediffers from the arg →entry mismatch is detected in
resolveSessionMessagesStoreContextand does not produce chained reads.
context.key !== resolveMainSessionKeyFromConfig()(subagent, heartbeat, explicit agent, or reconfigured main key) →
resolver returns
null, legacy single-file path applies.sessions.json→ store lookup returnsnullor{ sessionsDir }only, the chained resolver returnsnull, legacypath applies.
preCompaction.sessionFile→ entryis skipped, remaining chain entries are still used.
resolveMainSessionKeyFromConfig→ caughtand treated as "not the main session"; legacy path applies.
mtimeMs→ deterministic ordering viafilename
localeComparetie-break.Relationship to #60409
#60409 ("fix(session): fall back to reset archive in chat.history when
primary transcript is missing") addresses a different failure mode in
the same function: it triggers when the primary transcript file is
absent and falls back to the latest
.reset.<timestamp>archive. Itstrigger condition is
!filePath && sessionId.This PR addresses a distinct scenario: the primary transcript is
present, but the returned history misses every pre-compaction
transcript referenced via
compactionCheckpoints[]. Trigger conditionis the existence of a non-empty checkpoint chain on the configured
main session.
The two fixes handle disjoint failure paths and stack additively —
neither removes functionality the other needs. If #60409 merges first,
this PR rebases mechanically: the
resolveTranscriptReadFilesdispatcher keeps the existing single-file resolution plus #60409's
reset-archive fallback intact as the non-chained path, and layers the
compaction-chain walk in front of it for the configured main session.
Happy to coordinate during review.
Out of scope
segments (kept for a possible follow-up).
fix(session): fall back to reset archive in chat.history when primary transcript is missing #60409.
changes in
chat.history.main (subagent, heartbeat, explicit agent sessions) — possible
follow-up, but requires a separate conversation about which session
shapes should carry compaction checkpoints.
mainrebase of this branch (force-push required,reviewer-gated).