feat(session): add includeArchived to chat.history and sessions_history#73883
feat(session): add includeArchived to chat.history and sessions_history#73883CadanHu wants to merge 3 commits into
Conversation
Implements PR 3 of the issue openclaw#45003 archive UX roadmap (read-only path): give agents and the chat.history endpoint a way to recover context that lives in .reset.<ts> archives, without resurrecting those archives as live sessions. Changes ------- Protocol (additive, backwards-compatible): - src/gateway/protocol/schema/logs-chat.ts: ChatHistoryParamsSchema gains an optional includeArchived: boolean. Default false preserves all current behavior. Gateway helpers: - src/gateway/session-transcript-files.fs.ts: New listResetArchivesForSession(sessionId, searchDirs) lists every <sessionId>.jsonl.reset.<ts> archive sorted oldest-first. - src/gateway/session-utils.fs.ts: New readSessionMessagesIncludingArchives(sessionId, storePath, sessionFile) chains archive segments before the primary transcript with synthetic system messages tagged __openclaw.kind: "session-reset" so the Web UI / agent can render boundaries. Archive segments are parsed line-by-line; the primary segment delegates to the existing readSessionMessages so it still gets SessionManager active-branch treatment for tree-format transcripts. Gateway handler: - src/gateway/server-methods/chat.ts: chat.history reads includeArchived from params and dispatches to the new aggregator only when explicitly true. Agent tool: - src/agents/tools/sessions-history-tool.ts: SessionsHistoryToolSchema gains optional includeArchived. The tool forwards the flag verbatim to the chat.history call. Tests ----- src/gateway/session-utils.fs.test.ts (6 new tests): - Default behavior unchanged when no archives exist - Archive content surfaced when primary missing - Multiple archives chained chronologically before primary, each with a session-reset boundary marker - Empty result when neither primary nor any archive exists - Other sessions' archives are not pulled in - Empty sessionId short-circuits src/agents/tools/sessions-history-tool.test.ts (2 new tests): - includeArchived=true forwards through to chat.history - includeArchived omitted forwards as false Design notes ------------ This is the read-only path of the openclaw#45003 roadmap. It does not rename, move, or otherwise mutate archive files: archives stay archived, the user opts into visibility for one query at a time. That keeps it consistent with existing "archive lifecycle is one-way" precedent (sessions.delete has no undo) and removes the compaction-chain checkpoint and session-memory-hook double-summary risks that come with any unarchive/restore design. Refs ---- Issue: openclaw#45003 Companion PRs in the same area: - openclaw#60409 (chat.history fallback to most recent reset archive when primary is missing) — different code path, no overlap - openclaw#71537 (session-memory hook + session-logs skill archive support) — different code path, no overlap Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Greptile SummaryAdds optional Confidence Score: 4/5Safe to merge; the one flagged issue is a non-blocking P2 about seq uniqueness that won't affect existing callers. No P0 or P1 issues found. The single P2 concern (non-unique __openclaw.seq across chained archive segments) is speculative and only matters if a consumer treats seq as a globally unique identifier, which is a new use-case introduced by this feature. src/gateway/session-utils.fs.ts — readArchivedTranscriptMessages local seq counter Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/gateway/session-utils.fs.ts
Line: 213-257
Comment:
**`seq` resets to 0 for every archive segment**
`messageSeq` is declared locally inside `readArchivedTranscriptMessages` and restarts at 1 for each archive file. When `readSessionMessagesIncludingArchives` chains N archives plus the primary, the combined result contains duplicate `__openclaw.seq` values — e.g., the first message from archive-1 and the first message from archive-2 both carry `seq: 1`. If any consumer (Web UI renderer, deduplication logic, React key prop) relies on `seq` being unique across the combined list, this will produce silent bugs or duplicate-key warnings.
Consider threading a shared offset into the helper so the combined output has monotonically increasing `seq` across all segments:
```typescript
function readArchivedTranscriptMessages(filePath: string, seqOffset = 0): { messages: unknown[]; nextSeq: number } {
// ...
let messageSeq = seqOffset;
// ...
return { messages, nextSeq: messageSeq };
}
```
And in `readSessionMessagesIncludingArchives`:
```typescript
let seqOffset = 0;
for (const archive of archives) {
const { messages, nextSeq } = readArchivedTranscriptMessages(archive.path, seqOffset);
combined.push(...messages);
seqOffset = nextSeq;
// push boundary marker ...
}
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "feat(session): add includeArchived to ch..." | Re-trigger Greptile |
| function readArchivedTranscriptMessages(filePath: string): unknown[] { | ||
| let lines: string[]; | ||
| try { | ||
| lines = fs.readFileSync(filePath, "utf-8").split(/\r?\n/); | ||
| } catch { | ||
| return []; | ||
| } | ||
| const messages: unknown[] = []; | ||
| let messageSeq = 0; | ||
| for (const line of lines) { | ||
| if (!line.trim()) { | ||
| continue; | ||
| } | ||
| try { | ||
| const parsed = JSON.parse(line); | ||
| if (parsed?.message) { | ||
| messageSeq += 1; | ||
| messages.push( | ||
| attachOpenClawTranscriptMeta(parsed.message, { | ||
| ...(typeof parsed.id === "string" ? { id: parsed.id } : {}), | ||
| seq: messageSeq, | ||
| }), | ||
| ); | ||
| continue; | ||
| } | ||
| if (parsed?.type === "compaction") { | ||
| const ts = typeof parsed.timestamp === "string" ? Date.parse(parsed.timestamp) : Number.NaN; | ||
| const timestamp = Number.isFinite(ts) ? ts : Date.now(); | ||
| messageSeq += 1; | ||
| messages.push({ | ||
| role: "system", | ||
| content: [{ type: "text", text: "Compaction" }], | ||
| timestamp, | ||
| __openclaw: { | ||
| kind: "compaction", | ||
| id: typeof parsed.id === "string" ? parsed.id : undefined, | ||
| seq: messageSeq, | ||
| }, | ||
| }); | ||
| } | ||
| } catch { | ||
| // ignore malformed lines | ||
| } | ||
| } | ||
| return messages; |
There was a problem hiding this comment.
seq resets to 0 for every archive segment
messageSeq is declared locally inside readArchivedTranscriptMessages and restarts at 1 for each archive file. When readSessionMessagesIncludingArchives chains N archives plus the primary, the combined result contains duplicate __openclaw.seq values — e.g., the first message from archive-1 and the first message from archive-2 both carry seq: 1. If any consumer (Web UI renderer, deduplication logic, React key prop) relies on seq being unique across the combined list, this will produce silent bugs or duplicate-key warnings.
Consider threading a shared offset into the helper so the combined output has monotonically increasing seq across all segments:
function readArchivedTranscriptMessages(filePath: string, seqOffset = 0): { messages: unknown[]; nextSeq: number } {
// ...
let messageSeq = seqOffset;
// ...
return { messages, nextSeq: messageSeq };
}And in readSessionMessagesIncludingArchives:
let seqOffset = 0;
for (const archive of archives) {
const { messages, nextSeq } = readArchivedTranscriptMessages(archive.path, seqOffset);
combined.push(...messages);
seqOffset = nextSeq;
// push boundary marker ...
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/session-utils.fs.ts
Line: 213-257
Comment:
**`seq` resets to 0 for every archive segment**
`messageSeq` is declared locally inside `readArchivedTranscriptMessages` and restarts at 1 for each archive file. When `readSessionMessagesIncludingArchives` chains N archives plus the primary, the combined result contains duplicate `__openclaw.seq` values — e.g., the first message from archive-1 and the first message from archive-2 both carry `seq: 1`. If any consumer (Web UI renderer, deduplication logic, React key prop) relies on `seq` being unique across the combined list, this will produce silent bugs or duplicate-key warnings.
Consider threading a shared offset into the helper so the combined output has monotonically increasing `seq` across all segments:
```typescript
function readArchivedTranscriptMessages(filePath: string, seqOffset = 0): { messages: unknown[]; nextSeq: number } {
// ...
let messageSeq = seqOffset;
// ...
return { messages, nextSeq: messageSeq };
}
```
And in `readSessionMessagesIncludingArchives`:
```typescript
let seqOffset = 0;
for (const archive of archives) {
const { messages, nextSeq } = readArchivedTranscriptMessages(archive.path, seqOffset);
combined.push(...messages);
seqOffset = nextSeq;
// push boundary marker ...
}
```
How can I resolve this? If you propose a fix, please make it concise.|
Codex review: needs real behavior proof before merge. Reviewed May 30, 2026, 12:55 AM ET / 04:55 UTC. Summary PR surface: Source +168, Tests +183, Other +8. Total +359 across 10 files. Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Review detailsBest possible solution: Retry the Codex review after fixing the execution failure. Do we have a high-confidence way to reproduce the issue? Unclear. The review failed before ClawSweeper could establish a reproduction path. Is this the best way to solve the issue? Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction. AGENTS.md: unclear because the file could not be read completely. Codex review notes: model gpt-5.5, reasoning high; reviewed against b9933b2ec119. Label changesLabel justifications:
Evidence reviewedPR surface: Source +168, Tests +183, Other +8. Total +359 across 10 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
…udeArchived Updates the auto-generated GatewayModels.swift in both apps/macos and apps/shared/OpenClawKit to reflect the new optional includeArchived field added to ChatHistoryParamsSchema. Run: pnpm protocol:gen && pnpm protocol:gen:swift Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous run failed in command-execution-startup.test.ts:75 (banner-emit assertion) which passes locally 3/3 — flaky CI run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…reads when active is missing When the active session transcript file is missing, async read paths used by chat.history (`readSessionMessagesAsync`, `readRecentSessionMessagesAsync`) and the session title/preview lookup (`readSessionTitleFieldsFromTranscriptAsync`) returned an empty result instead of any archived content. Reset rotates the active `<id>.jsonl` to a sibling `<id>.jsonl.reset.<UTC timestamp>` and creates a new empty active file, so sessions reset before the next read became invisible through the production async API even when the archive still existed on disk. This change adds a narrow auto-fallback on the async paths only: - New `findLatestResetArchiveAsync(candidatePaths)` helper in `session-transcript-files.fs.ts` does an async `readdir` of each candidate parent directory, validates archive entries via the existing `parseSessionArchiveTimestamp(entry, "reset")` primitive, and returns only the path with the newest archive timestamp. - A small internal helper `findActiveOrLatestResetArchiveAsync` in `session-utils.fs.ts` reuses the existing active-candidate `existsSync` check first; only when no active candidate exists does it consult the new helper. - The three async functions above are re-routed through this helper. Behavior boundaries (kept narrow on purpose): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation, no `previousFragments`, no rolled logical-history reconstruction). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are untouched. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
…hive fallback as readers Codex/clawsweeper review on PR openclaw#76134 found that `readRecentSessionMessagesWithStatsAsync` still derived `totalMessages` from `readSessionMessageCountAsync`, which kept the active-only `findExistingTranscriptPath` resolution. After the new async reader fallback, an archive-only transcript longer than the bounded tail window therefore returned the newest messages with low `seq` values and `hasMore = false`. The sessions-history HTTP first page passes `boundedSnapshot.totalMessages` into `buildSessionHistorySnapshot`, so a `nextCursor` follow-up could skip the middle archive page. This commit routes the async count and visit helpers (`readSessionMessageCountAsync` and `visitSessionMessagesAsync`) through the same `findActiveOrLatestResetArchiveAsync` resolution that the message readers use, so an archive-only transcript reports a consistent total, gives messages the right tail-end `seq` values, and exposes the archive's earlier pages via `nextCursor`. Behavior boundaries (kept narrow, same as the original PR): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are still untouched. Tests: 4 new cases in `session-utils.fs.test.ts` cover the count and stats helpers (active-only, archive-only with archive longer than the tail window, both-exist active-priority, both-missing) and assert the returned `seq` metadata matches the archive's tail position. One new case in `sessions-history-http.test.ts` exercises the archive-only HTTP pagination path: archive a 3-message transcript, request the first `limit=2` page, follow `nextCursor`, and assert the older archive message is visible without skipping middle pages. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
…reads when active is missing When the active session transcript file is missing, async read paths used by chat.history (`readSessionMessagesAsync`, `readRecentSessionMessagesAsync`) and the session title/preview lookup (`readSessionTitleFieldsFromTranscriptAsync`) returned an empty result instead of any archived content. Reset rotates the active `<id>.jsonl` to a sibling `<id>.jsonl.reset.<UTC timestamp>` and creates a new empty active file, so sessions reset before the next read became invisible through the production async API even when the archive still existed on disk. This change adds a narrow auto-fallback on the async paths only: - New `findLatestResetArchiveAsync(candidatePaths)` helper in `session-transcript-files.fs.ts` does an async `readdir` of each candidate parent directory, validates archive entries via the existing `parseSessionArchiveTimestamp(entry, "reset")` primitive, and returns only the path with the newest archive timestamp. - A small internal helper `findActiveOrLatestResetArchiveAsync` in `session-utils.fs.ts` reuses the existing active-candidate `existsSync` check first; only when no active candidate exists does it consult the new helper. - The three async functions above are re-routed through this helper. Behavior boundaries (kept narrow on purpose): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation, no `previousFragments`, no rolled logical-history reconstruction). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are untouched. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
…hive fallback as readers Codex/clawsweeper review on PR openclaw#76134 found that `readRecentSessionMessagesWithStatsAsync` still derived `totalMessages` from `readSessionMessageCountAsync`, which kept the active-only `findExistingTranscriptPath` resolution. After the new async reader fallback, an archive-only transcript longer than the bounded tail window therefore returned the newest messages with low `seq` values and `hasMore = false`. The sessions-history HTTP first page passes `boundedSnapshot.totalMessages` into `buildSessionHistorySnapshot`, so a `nextCursor` follow-up could skip the middle archive page. This commit routes the async count and visit helpers (`readSessionMessageCountAsync` and `visitSessionMessagesAsync`) through the same `findActiveOrLatestResetArchiveAsync` resolution that the message readers use, so an archive-only transcript reports a consistent total, gives messages the right tail-end `seq` values, and exposes the archive's earlier pages via `nextCursor`. Behavior boundaries (kept narrow, same as the original PR): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are still untouched. Tests: 4 new cases in `session-utils.fs.test.ts` cover the count and stats helpers (active-only, archive-only with archive longer than the tail window, both-exist active-priority, both-missing) and assert the returned `seq` metadata matches the archive's tail position. One new case in `sessions-history-http.test.ts` exercises the archive-only HTTP pagination path: archive a 3-message transcript, request the first `limit=2` page, follow `nextCursor`, and assert the older archive message is visible without skipping middle pages. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
…reads when active is missing When the active session transcript file is missing, async read paths used by chat.history (`readSessionMessagesAsync`, `readRecentSessionMessagesAsync`) and the session title/preview lookup (`readSessionTitleFieldsFromTranscriptAsync`) returned an empty result instead of any archived content. Reset rotates the active `<id>.jsonl` to a sibling `<id>.jsonl.reset.<UTC timestamp>` and creates a new empty active file, so sessions reset before the next read became invisible through the production async API even when the archive still existed on disk. This change adds a narrow auto-fallback on the async paths only: - New `findLatestResetArchiveAsync(candidatePaths)` helper in `session-transcript-files.fs.ts` does an async `readdir` of each candidate parent directory, validates archive entries via the existing `parseSessionArchiveTimestamp(entry, "reset")` primitive, and returns only the path with the newest archive timestamp. - A small internal helper `findActiveOrLatestResetArchiveAsync` in `session-utils.fs.ts` reuses the existing active-candidate `existsSync` check first; only when no active candidate exists does it consult the new helper. - The three async functions above are re-routed through this helper. Behavior boundaries (kept narrow on purpose): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation, no `previousFragments`, no rolled logical-history reconstruction). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are untouched. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
…hive fallback as readers Codex/clawsweeper review on PR openclaw#76134 found that `readRecentSessionMessagesWithStatsAsync` still derived `totalMessages` from `readSessionMessageCountAsync`, which kept the active-only `findExistingTranscriptPath` resolution. After the new async reader fallback, an archive-only transcript longer than the bounded tail window therefore returned the newest messages with low `seq` values and `hasMore = false`. The sessions-history HTTP first page passes `boundedSnapshot.totalMessages` into `buildSessionHistorySnapshot`, so a `nextCursor` follow-up could skip the middle archive page. This commit routes the async count and visit helpers (`readSessionMessageCountAsync` and `visitSessionMessagesAsync`) through the same `findActiveOrLatestResetArchiveAsync` resolution that the message readers use, so an archive-only transcript reports a consistent total, gives messages the right tail-end `seq` values, and exposes the archive's earlier pages via `nextCursor`. Behavior boundaries (kept narrow, same as the original PR): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are still untouched. Tests: 4 new cases in `session-utils.fs.test.ts` cover the count and stats helpers (active-only, archive-only with archive longer than the tail window, both-exist active-priority, both-missing) and assert the returned `seq` metadata matches the archive's tail position. One new case in `sessions-history-http.test.ts` exercises the archive-only HTTP pagination path: archive a 3-message transcript, request the first `limit=2` page, follow `nextCursor`, and assert the older archive message is visible without skipping middle pages. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
|
This pull request has been automatically marked as stale due to inactivity. |
Summary
Implements the read-only path of #45003's 3-PR roadmap: give agents and
chat.historycallers a way to recover context that lives in.reset.<ts>archives, without resurrecting those archives as live sessions.This PR closes the three concrete upstream gaps identified by clawsweeper / Codex review on issue #45003 (2026-04-28T22:20:54Z):
Changes
Protocol (additive, backwards-compatible)
src/gateway/protocol/schema/logs-chat.ts—ChatHistoryParamsSchemagains an optionalincludeArchived: boolean. Defaultfalsepreserves all current behavior.Gateway helpers
src/gateway/session-transcript-files.fs.ts— newlistResetArchivesForSession(sessionId, searchDirs)lists every<sessionId>.jsonl.reset.<ts>archive sorted oldest-first.src/gateway/session-utils.fs.ts— newreadSessionMessagesIncludingArchives(sessionId, storePath, sessionFile)chains archive segments before the primary transcript with synthetic system messages tagged__openclaw.kind: "session-reset"so callers can render dividers. Archive segments are parsed line-by-line; the primary segment delegates to the existingreadSessionMessagesso it still getsSessionManageractive-branch treatment for tree-format transcripts.Gateway handler
src/gateway/server-methods/chat.ts—chat.historyreadsincludeArchivedfrom params and dispatches to the new aggregator only when explicitlytrue.Agent tool
src/agents/tools/sessions-history-tool.ts—SessionsHistoryToolSchemagains optionalincludeArchived. The tool forwards the flag verbatim to thechat.historycall.Tests
src/gateway/session-utils.fs.test.ts— 6 new tests:sessionIdshort-circuitssrc/agents/tools/sessions-history-tool.test.ts— 2 new tests verifying the param is plumbed through from tool →chat.history.Design notes
This is the read-only path of the #45003 roadmap. It does not rename, move, or otherwise mutate archive files: archives stay archived, the user opts into visibility for one query at a time. That keeps it consistent with the existing "archive lifecycle is one-way" precedent (
sessions.deletehas no undo) and avoids the compaction-chain checkpoint and session-memory-hook double-summary risks that come with any unarchive/restore design.includeArchivedis also a user-explicit opt-in (not an automatic fallback), so it is consistent with/resetsemantics: the user is asking for the recovery view, not having archived content silently re-injected into automated paths.Verification
pnpm test src/gateway/session-utils.fs.test.ts— 60/60 pass (54 originals + 6 new).pnpm tsgo:coreintroduces 0 new errors on the touched files (verified by stash-toggle baseline comparison).:18789against the local production data dir;chat.historyRPC withincludeArchived: truereturns chained archive segments withsession-resetboundary markers, while the default (false/ omitted) returns the same set as before.Related work
chat.historyfallback to most recent reset archive when primary is missing (PR 1 of the roadmap; in flight, CI green).session-memoryhook +session-logsskill archive support (touchessrc/hooks/bundled/session-memory/*, no overlap with this PR).