fix(sessions): fall back to latest reset archive for missing async transcripts#76134
fix(sessions): fall back to latest reset archive for missing async transcripts#76134masatohoshino wants to merge 3 commits into
Conversation
|
Codex review: needs maintainer review before merge. Reviewed June 2, 2026, 1:07 AM ET / 05:07 UTC. Summary PR surface: Source +55, Tests +352, Docs +1. Total +408 across 5 files. Reproducibility: yes. from source inspection: current main's async readers and count paths use active-only transcript resolution while reset archives are valid sibling files. I did not run tests because this was a required read-only review. Review metrics: 2 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Copy recommended automerge instructionNext step before merge
Security Review detailsDo we have a high-confidence way to reproduce the issue? Yes, from source inspection: current main's async readers and count paths use active-only transcript resolution while reset archives are valid sibling files. I did not run tests because this was a required read-only review. Is this the best way to solve the issue? Mostly yes: the fallback is the narrowest maintainable fix for the missing-active async slice and avoids schema/config expansion. It is not merge-ready until the dirty branch state and release-owned changelog edit are cleaned up. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against ebf20241bd17. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +55, Tests +352, Docs +1. Total +408 across 5 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
532865f to
db66238
Compare
…hive fallback as readers Codex/clawsweeper review on PR openclaw#76134 found that `readRecentSessionMessagesWithStatsAsync` still derived `totalMessages` from `readSessionMessageCountAsync`, which kept the active-only `findExistingTranscriptPath` resolution. After the new async reader fallback, an archive-only transcript longer than the bounded tail window therefore returned the newest messages with low `seq` values and `hasMore = false`. The sessions-history HTTP first page passes `boundedSnapshot.totalMessages` into `buildSessionHistorySnapshot`, so a `nextCursor` follow-up could skip the middle archive page. This commit routes the async count and visit helpers (`readSessionMessageCountAsync` and `visitSessionMessagesAsync`) through the same `findActiveOrLatestResetArchiveAsync` resolution that the message readers use, so an archive-only transcript reports a consistent total, gives messages the right tail-end `seq` values, and exposes the archive's earlier pages via `nextCursor`. Behavior boundaries (kept narrow, same as the original PR): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are still untouched. Tests: 4 new cases in `session-utils.fs.test.ts` cover the count and stats helpers (active-only, archive-only with archive longer than the tail window, both-exist active-priority, both-missing) and assert the returned `seq` metadata matches the archive's tail position. One new case in `sessions-history-http.test.ts` exercises the archive-only HTTP pagination path: archive a 3-message transcript, request the first `limit=2` page, follow `nextCursor`, and assert the older archive message is visible without skipping middle pages. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
|
Thanks for the precise root-cause trace. New head Stats/count now uses the same active-or-archive resolution as the readers. Both Archive-only bounded pagination is now covered by tests.
Acceptance criteria run locally on the rebased branch:
Scope remains narrow. Still no The CHANGELOG conflict is also resolved: rebased onto current |
db66238 to
21bd4d3
Compare
…hive fallback as readers Codex/clawsweeper review on PR openclaw#76134 found that `readRecentSessionMessagesWithStatsAsync` still derived `totalMessages` from `readSessionMessageCountAsync`, which kept the active-only `findExistingTranscriptPath` resolution. After the new async reader fallback, an archive-only transcript longer than the bounded tail window therefore returned the newest messages with low `seq` values and `hasMore = false`. The sessions-history HTTP first page passes `boundedSnapshot.totalMessages` into `buildSessionHistorySnapshot`, so a `nextCursor` follow-up could skip the middle archive page. This commit routes the async count and visit helpers (`readSessionMessageCountAsync` and `visitSessionMessagesAsync`) through the same `findActiveOrLatestResetArchiveAsync` resolution that the message readers use, so an archive-only transcript reports a consistent total, gives messages the right tail-end `seq` values, and exposes the archive's earlier pages via `nextCursor`. Behavior boundaries (kept narrow, same as the original PR): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are still untouched. Tests: 4 new cases in `session-utils.fs.test.ts` cover the count and stats helpers (active-only, archive-only with archive longer than the tail window, both-exist active-priority, both-missing) and assert the returned `seq` metadata matches the archive's tail position. One new case in `sessions-history-http.test.ts` exercises the archive-only HTTP pagination path: archive a 3-message transcript, request the first `limit=2` page, follow `nextCursor`, and assert the older archive message is visible without skipping middle pages. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
…reads when active is missing When the active session transcript file is missing, async read paths used by chat.history (`readSessionMessagesAsync`, `readRecentSessionMessagesAsync`) and the session title/preview lookup (`readSessionTitleFieldsFromTranscriptAsync`) returned an empty result instead of any archived content. Reset rotates the active `<id>.jsonl` to a sibling `<id>.jsonl.reset.<UTC timestamp>` and creates a new empty active file, so sessions reset before the next read became invisible through the production async API even when the archive still existed on disk. This change adds a narrow auto-fallback on the async paths only: - New `findLatestResetArchiveAsync(candidatePaths)` helper in `session-transcript-files.fs.ts` does an async `readdir` of each candidate parent directory, validates archive entries via the existing `parseSessionArchiveTimestamp(entry, "reset")` primitive, and returns only the path with the newest archive timestamp. - A small internal helper `findActiveOrLatestResetArchiveAsync` in `session-utils.fs.ts` reuses the existing active-candidate `existsSync` check first; only when no active candidate exists does it consult the new helper. - The three async functions above are re-routed through this helper. Behavior boundaries (kept narrow on purpose): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation, no `previousFragments`, no rolled logical-history reconstruction). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are untouched. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
…hive fallback as readers Codex/clawsweeper review on PR openclaw#76134 found that `readRecentSessionMessagesWithStatsAsync` still derived `totalMessages` from `readSessionMessageCountAsync`, which kept the active-only `findExistingTranscriptPath` resolution. After the new async reader fallback, an archive-only transcript longer than the bounded tail window therefore returned the newest messages with low `seq` values and `hasMore = false`. The sessions-history HTTP first page passes `boundedSnapshot.totalMessages` into `buildSessionHistorySnapshot`, so a `nextCursor` follow-up could skip the middle archive page. This commit routes the async count and visit helpers (`readSessionMessageCountAsync` and `visitSessionMessagesAsync`) through the same `findActiveOrLatestResetArchiveAsync` resolution that the message readers use, so an archive-only transcript reports a consistent total, gives messages the right tail-end `seq` values, and exposes the archive's earlier pages via `nextCursor`. Behavior boundaries (kept narrow, same as the original PR): - Active wins when present (no active+archive merging). - Only the single newest archive is returned (no chain aggregation). - Sync paths, `findExistingTranscriptPath`, the protocol schema, the `includeArchived` opt-in, sessions_history schema, the session-memory hook, and the session-logs skill are still untouched. Tests: 4 new cases in `session-utils.fs.test.ts` cover the count and stats helpers (active-only, archive-only with archive longer than the tail window, both-exist active-priority, both-missing) and assert the returned `seq` metadata matches the archive's tail position. One new case in `sessions-history-http.test.ts` exercises the archive-only HTTP pagination path: archive a 3-message transcript, request the first `limit=2` page, follow `nextCursor`, and assert the older archive message is visible without skipping middle pages. Refs openclaw#56131 openclaw#43929 openclaw#45003 openclaw#60409 openclaw#73883
21bd4d3 to
abc9c16
Compare
|
This pull request has been automatically marked as stale due to inactivity. |
Summary
<id>.jsonl) is missing, three async readers —readRecentSessionMessagesAsync(called bychat.historyinsrc/gateway/server-methods/chat.ts:1684),readSessionMessagesAsync(called by sessions-history HTTP insrc/gateway/sessions-history-http.ts:171), andreadSessionTitleFieldsFromTranscriptAsync(called by session listing/title insrc/gateway/session-utils.ts:1888) — return an empty result even when a<id>.jsonl.reset.<UTC timestamp>archive sibling exists from a prior reset rotation.<id>.jsonlto<id>.jsonl.reset.<ts>via an atomic rename and only re-creates the active file at the next write, so a session reset before the next read leaves the production async API surfacing nothing without raw-file forensics. References the discoverability concerns in Bug: session reset archives are not discoverable via sessions_history, breaking context recovery after reset #56131 / [Feature]: Script to restore session history archived by new daily-reset mechanism (introduced v2026.2+) #45003 / [Feature]: logical chat history across rolled sessions #43929 and complements the in-flight fix(session): fall back to reset archive in chat.history when primary transcript is missing #60409 (sync) / feat(session): add includeArchived to chat.history and sessions_history #73883 (explicit opt-in) PRs.findLatestResetArchiveAsync(candidatePaths)insrc/gateway/session-transcript-files.fs.ts, one new internal helperfindActiveOrLatestResetArchiveAsyncinsrc/gateway/session-utils.fs.ts, and three async readers re-routed through it. Existing helperparseSessionArchiveTimestamp(entry, "reset")is reused for archive validation.includeArchivedAPI, nosessions_historytool schema, no session-memory hook, no session-logs skill, no sync paths, nofindExistingTranscriptPath, no UI sidebar listing, no compaction checkpoint chains, no active+archive merging, no archive chain aggregation, nopreviousFragments/ chained logical-history.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Why
Refsand notFixes:previousFragmentsand chained logical history, both deliberately out of scope here.Root Cause (if applicable)
findExistingTranscriptPathplus the inlineresolveSessionTranscriptCandidates+existsSyncblock inreadSessionTitleFieldsFromTranscriptAsync) only considers active<id>.jsonlcandidates, even thougharchiveFileOnDisk(session-transcript-files.fs.ts) atomically renames the active to<id>.jsonl.reset.<ts>on reset.cleanupArchivedSessionTranscriptsalready enumerates archives viaparseSessionArchiveTimestamp(entry, "reset"), so core was maintaining the on-disk archive while the read side had no async-path equivalent. The sync side is being addressed in fix(session): fall back to reset archive in chat.history when primary transcript is missing #60409; this PR brings the async side to behavioral parity for the missing-active case only.Regression Test Plan (if applicable)
src/gateway/session-utils.fs.test.ts— newdescribe("async archive fallback (missing-primary auto-fallback)")block..reset.<ts>archives exist, only the newest archive's content is returned (no chain aggregation).[]or{firstUserMessage: null, lastMessagePreview: null}).formatSessionArchiveTimestamp(ms)for deterministic archive names, so the cases run in milliseconds and do not depend onDate.now()ordering.archiveSessionTranscriptsdescribe block elsewhere in the same file covers archive creation but not the read-side fallback.User-visible / Behavior Changes
chat.history, sessions-history HTTP, and session listing/title now return the newest reset-archive content (or its title fields) instead of an empty result. Active-present behavior is intended to remain unchanged and is covered by the active-wins tests above. No new config knob, no schema change, no opt-in flag.Diagram (if applicable)
N/A(single-step decision: "no active candidate exists → consultfindLatestResetArchiveAsync→ read that one path").Security Impact (required)
NoNoNoNoNoConcrete invariant: the fallback uses the same candidate roots as
resolveSessionTranscriptCandidates, only inspects sibling filenames in those candidate parent directories, and accepts only entries whose suffix passes the existingparseSessionArchiveTimestamp(entry, "reset")validation (already used bycleanupArchivedSessionTranscripts). No new directory roots, no caller-controlled path components.Repro + Verification
Environment
~/.openclawlayout / temp store underos.tmpdir()for testsSteps
<id>.jsonlforsessionIdin the sessions store; place a<id>.jsonl.reset.<UTC timestamp>sibling with content.readRecentSessionMessagesAsync,readSessionMessagesAsync({mode:"full"}), andreadSessionTitleFieldsFromTranscriptAsync.<id>.jsonlwith different content and repeat — assert active wins.<id>.jsonl.reset.<later-ts>and re-remove the active file — assert only the newer archive is returned.Expected
Actual
src/gateway/session-utils.fs.test.tsenforce all four invariants.Evidence
upstream/mainHEADbd6035d97707b7a62376a30de2eeae9579d56466): three of the new invariants (missing-active fallback, latest-archive-only, archive-derived title fields) exercise code paths that did not exist before this PR, so without these commits the asserted post-conditions could not be produced. I have not separately cherry-picked the newdescribeblock onto the base commit to demonstrate a direct red run.session-utils.fs.test.tscontinue to pass alongside the 10 new cases (81 / 81 green locally on this branch). Adjacent regression:session-archive.imports.test.ts,session-transcript-key.test.ts,session-history-state.test.ts,sessions-history-http.revocation.test.ts,managed-image-attachments.test.ts,embedded-backend.test.tsall pass alongside.## Real behavior proofsection below.readdirper deduped candidate parent dir, only on the missing-primary slow path; on the fast path the only added work is one extra await).Real behavior proof
Captured 2026-05-17 against this branch rebased onto
upstream/main9616aa6e5a. Branch head SHA after rebase:21bd4d3ed080c6cb75f725dbf14a527f22a6074b.Behavior or issue addressed: when a session has been reset (
<id>.jsonlatomically rotated to<id>.jsonl.reset.<UTC timestamp>with no active file recreated yet), the production async readers consumed bychat.history(readRecentSessionMessagesAsync), sessions-history HTTP (readSessionMessagesAsync+readSessionMessageCountAsync+readRecentSessionMessagesWithStatsAsync), and session listing/title (readSessionTitleFieldsFromTranscriptAsync) used to return an empty response. After this PR they auto-fall back to the newest reset-archive sibling.Real environment tested: Linux x86_64 local dev workspace (
Node v22,pnpm 11.1.0) running against this branch's actual source tree undersrc/gateway/. The fs state was constructed on the real local filesystem (/tmp/proof-archive-async-*), the readers were imported directly fromsrc/gateway/session-utils.fs.tsvia tsx (no mocks, no stubs, no vitest harness), and the sessions-history HTTP path was exercised with the real gateway HTTP server bound to a real loopback port.Exact steps or command run after this patch:
sessions.jsonand a sibling<id>.jsonl.reset.2026-05-01T00-00-00.000Zarchive carrying two messages; do not create the active<id>.jsonl.src/gateway/session-utils.fs.tsdirectly and invoke each public async reader against that fs state.archiveSessionTranscriptshelper (reset reason → atomic rename), and follownextCursorpagination with realfetchcalls.Concrete commands run on the rebased branch:
Evidence after fix:
Terminal transcript from the standalone fs probe (no test framework involved — actual production readers invoked against real disk state):
Live HTTP transcript from the regression scenario (real gateway HTTP server, archive-only state, real
fetchcursor pagination):The seq numbering 2,3 then 1 (rather than 1,2 then end) only holds if
totalMessagesis computed from the archived transcript (3 messages) rather than from the missing active file (0) — so the HTTP pagination path is observably routed through the new archive fallback in both the readers and the count helper.Observed result after fix: every public async surface consumed by
chat.history, sessions-history HTTP, and session listing/title returns the archived content against the archive-only fs state (recent.length=2,full.length=2,messageCount=2,stats.totalMessages=2,title.firstUserMsg="archived: please summarize log A"). Cursor pagination over the same archive-only state crosses both pages without skipping or duplicating, with stable__openclaw.seqnumbering anchored to the archivedtotalMessages. Against the same fs state on the prior commitdb66238880the readers return[]/0/ null title fields and the cursor pagination cannot reach the middle archive page.What was not tested:
chat.historyround trip over the OpenClaw protocol seam against a real running OpenClaw gateway with a real client; the change sits below the protocol layer and the protocol shape is unchanged, so the public readers above are the contract surface.fs.promises.readdir+fs.promises.stat, which are platform-stable.prefers active over archive when both existcases) and the change deliberately does not touch the active path.Human Verification (required)
pnpm test src/gateway/session-utils.fs.test.ts(rebased branch, locally) — 81 / 81 passed.pnpm testfor the consumer-side test files listed in Evidence — all passed.pnpm check:changed— typecheck-all (tsgo), lint (oxlint), no-conflict-markers, no wildcard re-export drift, no runtime sidecar drift, no import cycles, all green locally.findLatestResetArchiveAsyncreturnsnull.readdirfailure on a candidate directory → swallowed by.catch(() => []), fallback continues with other deduped candidate dirs.<basename>.but is not a.reset.<ts>archive → rejected byparseSessionArchiveTimestampreturningnull.>keeps the newer one; same-timestamp edge case is not relied on by any test.chat.historyend-to-end (the change is below the protocol seam; the protocol is unchanged so the unit tests on the three readers are the contract).fs.promises.readdir+fs.promises.statsemantics that are stable across platforms.pnpm test:changedfan-out also surfaced 8 unrelated upstream-pre-existing failures insrc/agents/pi-embedded-runner/compact.hooks.test.ts(itsvi.doMock("../session-write-lock.js", ...)factory does not listresolveSessionWriteLockAcquireTimeoutMs, which was added by upstream commitf7ed29e118"fix: thread session write-lock timeout config"). I confirmed those failures reproduce on barebd6035d97707b7a62376a30de2eeae9579d56466without this PR's commits.Review Conversations
Compatibility / Migration
Yes— async readers return strictly more (a previously-empty result becomes the archive's content) only when no active exists; active-present behavior is intended to remain unchanged and is covered by the active-wins tests.NoNoRisks and Mitigations
parseSessionArchiveTimestampvalidates the suffix againstARCHIVE_TIMESTAMP_REand returnsnullfor non-matching suffixes; rogue entries are skipped before sorting.readdircost on the missing-primary path.seenDirsset deduplicates across overlapping candidate parent directories;readdirerrors are swallowed and the fallback yields no archive (caller's existing default applies).