feat(core): persist file history snapshots for cross-session /rewind (T2.1)#4897
Conversation
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
Hi @doudouOUC — thanks for working on file-history snapshot persistence, this is a meaningful feature! 👋
However, this PR doesn't follow our PR template. A few sections are missing or renamed:
- "What this PR does" and "Why it's needed" — your
## Summarycovers what, but the why (motivation, user-facing benefit) should be its own section. - "Reviewer Test Plan" — this is the most important missing piece. Reviewers need
How to verifysteps,Evidence (Before & After)(tmux output or screenshots showing/rewindworking across session resume), and theTested onOS table. Without this, review may be significantly delayed. - "Risk & Scope" — your "Known Limitations" section is useful, but the template asks for main risk/tradeoff, what's out of scope, and breaking changes in a structured format.
- "Linked Issues" — mention of PR #4253 is inline but there's no dedicated section with
Closes #N/Fixes #N. - "中文说明" — the bilingual
<details>block is missing.
Could you reformat the PR body to match the template? The content is good — it just needs to be in the right structure so reviewers can evaluate it efficiently. 🙏
— Qwen Code · qwen3.7-max
There was a problem hiding this comment.
Pull request overview
This PR adds persistent file-history snapshot recording to session JSONL logs so /rewind can function across session resume, including restoring the snapshot chain on load and copying backup files when sessions are forked. It also introduces a stable session_resume capability tag while keeping the prior unstable alias.
Changes:
- Persist
FileHistorySnapshotdata asfile_history_snapshotsystem records, and restore snapshots duringSessionService.loadSession()/Config.getFileHistoryService()on resume. - Copy file-history backup files on session fork (hard-link with copy fallback) and validate restored backups with bounded parallel stats.
- Add
session_resumecapability tag (aliasingunstable_session_resume) and update CLI/ACP call sites to pass surviving snapshots during rewind.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/services/sessionService.ts | Extract persisted snapshot records on resume; copy backup files on session fork. |
| packages/core/src/services/fileHistoryService.ts | Export snapshot constants; add snapshot (de)serialization and restored-backup validation. |
| packages/core/src/services/chatRecordingService.ts | Add file_history_snapshot record subtype and snapshot recording APIs; re-record snapshots after rewind. |
| packages/core/src/core/client.ts | Record latest snapshot to JSONL after makeSnapshot() on user turns. |
| packages/core/src/config/config.ts | Restore snapshots into FileHistoryService and kick off validation on resume. |
| packages/cli/src/ui/AppContainer.tsx | Pass surviving snapshots into rewindRecording() during UI rewind. |
| packages/cli/src/serve/server.test.ts | Update expected capabilities to include session_resume. |
| packages/cli/src/serve/capabilities.ts | Register session_resume and keep unstable_session_resume as deprecated alias. |
| packages/cli/src/acp-integration/session/Session.ts | Pass surviving snapshots into rewindRecording() during ACP rewind. |
| packages/cli/src/acp-integration/session/Session.test.ts | Mock getFileHistoryService().getSnapshots() for updated rewind signature usage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thanks — reformatted the PR body to match the template. Added: What/Why sections, Risk & Scope, Linked Issues, Reviewer Test Plan with verification steps, Tested-on table, and 中文说明 details block. |
|
Review round summary (Copilot)
Resolved 4/4 threads. |
wenshao
left a comment
There was a problem hiding this comment.
Additional notes (not on diff lines)
[Suggestion] packages/webui/src/daemon/session/DaemonSessionProvider.tsx:622 — console.warn('[DaemonSessionProvider] SSE stream ended') fires on every normal SSE close. Other console.warn calls in this file are reserved for actual failures. Consider removing or using a debug logger.
(Posted as body because this file is not in the GitHub PR diff.)
— qwen3.7-max via Qwen Code /review
|
Review round summary (wenshao R1)
Resolved: 3 fixed + 2 pushed back + 1 deferred = all threads replied. |
|
Review round summary (wenshao R2)
Resolved 3/3 threads. |
|
Qwen Code review did not complete successfully: Qwen review timed out after 55 minutes. See workflow logs. |
1 similar comment
|
Qwen Code review did not complete successfully: Qwen review timed out after 55 minutes. See workflow logs. |
Local runtime verification reportVerified locally on Linux x64 / Node v22.22.2: PR head TL;DR
Verified working (PASS)
Finding 1 — daemon sessions never make snapshots (Reviewer Test Plan §4 cannot pass)
Observed on the live PR daemon:
To be fair, this gating predates the PR — the same condition already leaves the #4820 in-process daemon rewind endpoints with an empty snapshot list. But the PR's stated goal (unblocking T2.1 daemon resume + Finding 2 — TUI post-resume promptId collision corrupts the persisted chain (new)TUI prompt ids are A rewind targeting "turn 0" in that session silently restores v3 (the newest content) instead of the original — a silent wrong-content restore, worse than an honest failure. Pre-PR the collision existed only in memory; with persistence it becomes durable and survives every later resume. Suggested fix: seed the TUI prompt counter from the resumed conversation (mirror Finding 3 — "last turn may be incomplete" is actually per-file-first-editA file's pre-first-edit backup (v1, captured by Finding 4 — no user-reachable surface performs cross-session file restore yet
Unit-test attribution
Suggested pre-merge actions
If 1–2 are tracked as immediate follow-ups instead, the persistence layer itself is sound and mergeable on its own merits — but the PR description should then stop implying the daemon E2E (§4) currently works. Repro notes
|
|
Fixes for verification findings (wenshao runtime report)
|
|
Qwen Code review did not complete successfully: Qwen review timed out after 85 minutes. See workflow logs. |
doudouOUC
left a comment
There was a problem hiding this comment.
Critical
forkSession resurrects abandoned-branch snapshot records after rewind (packages/core/src/services/sessionService.ts)
forkSession rebuilds a linear parentUuid chain from ALL JSONL records, including abandoned-branch ones severed by rewindRecording. After a rewind, file_history_snapshot records from discarded turns remain in the JSONL file. The original session excludes them (unreachable via reconstructHistory), but forkSession resurrects them into the forked session's snapshot chain. The dedup logic only covers promptIds present in the rewind batch record, so snapshots from turns past the rewind point leak through.
Scenario: Session with 5 turns S0–S4 → rewind to turn 2 (S3/S4 orphaned in JSONL) → fork session → forked session loads [S0', S1', S2', S3, S4] instead of [S0', S1', S2']. Subsequent /rewind in the forked session can target wrong snapshots.
Fix: Have forkSession use reconstructHistory() active-branch traversal (same as loadSession) rather than rebuilding a linear chain from all JSONL records.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| } | ||
| const cappedSnapshots = | ||
| fileHistorySnapshots.length > MAX_SNAPSHOTS | ||
| ? fileHistorySnapshots.slice(-MAX_SNAPSHOTS) |
There was a problem hiding this comment.
[Suggestion] Snapshot capping at MAX_SNAPSHOTS does not call cleanupOrphanedBackups
When loadSession caps snapshots with slice(-MAX_SNAPSHOTS), the dropped snapshots' backup files are never cleaned up from disk. makeSnapshot and rewind both call cleanupOrphanedBackups after truncation, but this path doesn't.
| ? fileHistorySnapshots.slice(-MAX_SNAPSHOTS) | |
| ? fileHistorySnapshots.slice(-MAX_SNAPSHOTS) | |
| : fileHistorySnapshots; | |
| // TODO: call fileHistoryService.cleanupOrphanedBackups() for dropped snapshots |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
doudouOUC
left a comment
There was a problem hiding this comment.
Overall: well-structured persistence layer for file history snapshots. One Critical correctness issue found around MAX_SNAPSHOTS front-trim desynchronizing snapshot array indices from turn indices — affects sessions exceeding 100 user turns that are resumed then rewound.
| } | ||
| } | ||
| } | ||
| const cappedSnapshots = |
There was a problem hiding this comment.
[Critical] slice(-MAX_SNAPSHOTS) drops oldest snapshots from the front, but slice(0, targetTurnIndex + 1) in Session.ts:571 and AppContainer.tsx:~2635 still assumes snapshot[i] corresponds to turn i. After front-trimming, the resulting array no longer starts at turn 0.
Example: 150 turns → capped to last 100 snapshots (turns 50–149). Rewinding to turn 80 → slice(0, 81) returns snapshots for turns 50–130 instead of 50–80, silently restoring wrong file content.
| const cappedSnapshots = | |
| const baseTurnIndex = Math.max(0, fileHistorySnapshots.length - MAX_SNAPSHOTS); | |
| const cappedSnapshots = | |
| fileHistorySnapshots.length > MAX_SNAPSHOTS | |
| ? fileHistorySnapshots.slice(-MAX_SNAPSHOTS) | |
| : fileHistorySnapshots; |
Return baseTurnIndex alongside cappedSnapshots in ResumedSessionData, and have callers adjust: slice(0, targetTurnIndex + 1 - baseTurnIndex).
— qwen3.7-max via Qwen Code /review
| } | ||
|
|
||
| recordFileHistorySnapshotBatch(snapshots: FileHistorySnapshot[]): void { | ||
| if (snapshots.length === 0) return; |
There was a problem hiding this comment.
[Suggestion] recordFileHistorySnapshot() and recordFileHistorySnapshotBatch() — the sole mechanism for persisting file history snapshots to JSONL — have no test coverage. A regression in record structure (wrong subtype, missing systemPayload, bad serialization) would silently break /rewind across sessions.
Suggested tests:
recordFileHistorySnapshotwrites a single-entry batch record with correctsubtype: 'file_history_snapshot'and serializedsystemPayloadrecordFileHistorySnapshotBatch([])is a no-op- Write errors are caught without throwing
— qwen3.7-max via Qwen Code /review
|
|
||
| // Re-record surviving file history snapshots on the active branch so | ||
| // they are visible to reconstructHistory on resume. | ||
| if (survivingFileHistorySnapshots?.length) { |
There was a problem hiding this comment.
[Suggestion] The new 3-arg branch of rewindRecording (re-recording surviving snapshots on the active branch after rewind) has no test coverage. The existing test at line 669 uses the old 2-arg call, and Session.test.ts passes [] as the third arg — neither exercises the if (survivingFileHistorySnapshots?.length) branch.
If this branch breaks, resumed sessions lose file history state for surviving turns after rewind.
Suggested test: call rewindRecording(targetTurnIndex, payload, [mockSnapshot]) and verify the batch record is written alongside the rewind record.
— qwen3.7-max via Qwen Code /review
| const survivingSnapshots = fileHistoryService | ||
| .getSnapshots() | ||
| .slice(0, targetTurnIndex + 1); | ||
|
|
There was a problem hiding this comment.
[Suggestion] restoreFromSnapshots(survivingSnapshots) replaces in-memory state but never calls cleanupOrphanedBackups for the removed snapshots' backup files. The TUI path uses fileHistoryService.rewind() which calls cleanupOrphanedBackups(removed) explicitly. Over repeated ACP rewinds in long-lived daemon sessions, orphaned backup files accumulate on disk.
Consider exposing cleanupOrphanedBackups and calling it here after restoreFromSnapshots, or having restoreFromSnapshots internally detect and clean up orphans:
const allSnapshots = fileHistoryService.getSnapshots();
const survivingSnapshots = allSnapshots.slice(0, targetTurnIndex + 1);
const removed = allSnapshots.slice(targetTurnIndex + 1);
fileHistoryService.restoreFromSnapshots(survivingSnapshots);
fileHistoryService.cleanupOrphanedBackups(removed);— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Already addressed — orphaned backup cleanup deferred in wenshao review. Bounded by MAX_SNAPSHOTS, cleaned up by subsequent makeSnapshot calls.
| // Snapshot file state before this turn (mirrors the makeSnapshot | ||
| // block in GeminiClient.sendMessageStream). Placed after | ||
| // slash-command and hook early-returns so locally handled commands | ||
| // don't create phantom snapshots that desync the snapshot index. |
There was a problem hiding this comment.
[Suggestion] This makeSnapshot + recordFileHistorySnapshot block (lines 1012–1028) is nearly identical to the one in packages/core/src/core/client.ts:1624-1643 — same 15-line structure with identical nested try/catch and identical error messages. The comment above even acknowledges it "mirrors the makeSnapshot block in GeminiClient.sendMessageStream." The two sites already differ slightly (this one caches the service in a local, client.ts calls the getter twice), and they will silently diverge further over time.
Consider extracting a shared helper:
async function snapshotAndRecordFileHistory(
config: Config,
promptId: string,
): Promise<void> {
try {
const fhs = config.getFileHistoryService();
await fhs.makeSnapshot(promptId);
try {
const latest = fhs.getSnapshots().at(-1);
if (latest) {
config.getChatRecordingService()?.recordFileHistorySnapshot(latest);
}
} catch (e) {
debugLogger.error(`FileHistory: recordSnapshot failed: ${e}`);
}
} catch (e) {
debugLogger.error(`FileHistory: makeSnapshot failed: ${e}`);
}
}— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Already addressed — two call sites with different contexts don't justify extraction. Addressed in multiple prior review rounds.
… /rewind (T2.1) File history snapshots were purely in-memory — lost on process exit, making /rewind unusable after session resume. This adds JSONL persistence so restored sessions can rewind to any pre-resume turn. Key changes: - Serialize/deserialize FileHistorySnapshot to/from JSONL system records - Record each snapshot after makeSnapshot succeeds (incremental per turn) - Re-record surviving snapshots after rewind (full batch on active branch) - Parse file_history_snapshot records in sessionService.loadSession() - Restore snapshot chain in config.getFileHistoryService() on resume - Copy backup files on session fork (hard link with copy fallback) - Add session_resume capability tag (stable alias for unstable_session_resume) - validateRestoredSnapshots with dedup + batched parallel stat 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Change snapshot dedup from first-wins to last-wins so rewind batch records (which contain the most up-to-date snapshot state) override earlier incremental records for the same promptId. - Guard validateRestoredSnapshots behind isEnabled() to skip I/O when file checkpointing is disabled. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Session.ts: slice snapshots to targetTurnIndex+1 to exclude turns being discarded (fixes ghost-snapshot persistence) - AppContainer.tsx: only pass snapshots when file restore succeeded (avoids writing un-truncated snapshots for conversation-only rewind) - Session.test.ts: update rewindRecording assertion to expect 3 args 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…ialize warning - AppContainer.tsx: use .slice(0, targetTurnIndex + 1) to match Session.ts behavior (prevents ghost snapshots for conversation-only rewind) - sessionService.ts: log warning instead of silent continue on malformed file_history_snapshot deserialization 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Two fixes from wenshao's local verification: 1. ACP daemon sessions had fileCheckpointingEnabled=false because stdin is a pipe (non-TTY). Add enableFileCheckpointing() to Config and call it in acpAgent.newSessionConfig so daemon /rewind works. 2. TUI prompt counter restarted at 0 on --resume, colliding with restored snapshot promptIds and corrupting the chain via last-wins dedup. Seed promptCount from the resumed conversation's user turn count so new promptIds don't overlap. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
ACP sessions drive the chat through Session.prompt → GeminiChat, bypassing GeminiClient.sendMessageStream where makeSnapshot lives. This meant daemon-created sessions never produced file history snapshots, leaving /rewind non-functional. Add makeSnapshot + recordFileHistorySnapshot at the start of each ACP prompt turn (mirroring client.ts:1488), using the existing sessionId########turn promptId format. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Locally handled slash commands (/help, /memory, etc.) previously created file history snapshots even though no model turn was added. This caused the snapshot index to drift from the real user turn count, breaking rewind in web-shell sessions that use slash commands frequently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove stale line number from comment (Session.ts) - Single cast instead of double cast for systemPayload (sessionService.ts) - Guard mkdir in copyFileHistoryBackups to prevent fork failure (sessionService.ts) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add makeSnapshot/rewind to FileHistoryService mock in Session.test.ts - Invalidate cached FileHistoryService on enableFileCheckpointing() to prevent stale enabled=false if service was lazily created first - Move copyFileHistoryBackups above class JSDoc to fix association Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ileHistorySnapshot - Call restoreFromSnapshots(survivingSnapshots) in ACP rewindToTurn to prevent phantom snapshots from accumulating in the in-memory array after conversation-only rewind - Simplify recordFileHistorySnapshot to delegate to the batch variant Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7169461 to
714ce9d
Compare
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
No high-confidence issues found. Build passes, 193 unit tests pass, lint clean. Two low-probability edge cases noted for human review (ACP orphan cleanup, snapshot/turn index coupling). LGTM! ✅ — qwen3.7-max via Qwen Code /review
Local verification report — real build + runtime E2EI built this PR at its head ( Verdict: PASS for the primary goal (daemon ✅ What works (verified at runtime)1. Daemon / ACP cross-session The rewind also re-recorded the surviving snapshots on the active branch (JSONL gained a 2.
3. Snapshot persistence + self-healing. Snapshots are serialized to JSONL as 4. Base build lacks the machinery (confirms the PR is what adds it):
|
|
Thanks for the thorough runtime E2E — the daemon cross-session rewind verification is exactly what we needed to confirm T2.1. Re the TUI This is outside the scope of this PR ( |
What this PR does
Persists
FileHistorySnapshotto JSONL asfile_history_snapshotsystem records, enabling/rewindto work across session resume. Previously, file history snapshots were purely in-memory and lost on process exit.Why it's needed
T2.1 (
loadSession/resumegraduation fromunstable_) is blocked because/rewinddoesn't work after session resume — the snapshot chain is empty. This PR closes that gap, making the resume feature reliable enough to drop theunstable_prefix.Replaces the stalled PR #4253 with a fresh implementation.
Key changes
FileHistorySnapshotto/from JSONL system recordsmakeSnapshotsucceeds (incremental per turn)file_history_snapshotrecords insessionService.loadSession()config.getFileHistoryService()on resumesession_resumecapability tag (stable alias forunstable_session_resume)validateRestoredSnapshotswith dedup + batched parallel statenableFileCheckpointing()in Config — daemon/ACP sessions run with piped stdin (non-TTY), sofileCheckpointingEnableddefaulted tofalse; explicitly enabled viaacpAgent.newSessionConfigseedPromptCount()in SessionContext — on TUI--continueresume, the prompt counter restarted at 0, causing promptId collisions with pre-resume turns; now seeded from resumed conversation user turn countSession.promptnow callsmakeSnapshot+recordFileHistorySnapshot— daemon-created sessions previously never produced file history snapshotsRisk & Scope
Main risk: Last turn's snapshot may be incomplete (serialized before
trackEditmutates it). Self-healing for all but the very last turn before exit.Out of scope (follow-up):
promptIdrewriting (forked snapshots carry source session's promptId prefix)Breaking changes: None. Old sessions without
file_history_snapshotrecords resume normally (graceful degradation).Linked Issues
Reviewer Test Plan
How to verify
Tested on
中文说明
这个 PR 做了什么
将
FileHistorySnapshot持久化到 JSONL 的file_history_snapshot系统记录中,使/rewind能跨 session resume 工作。此前快照完全在内存中,进程退出即丢失。为什么需要
T2.1(
loadSession/resume从unstable_毕业)被阻塞,因为 resume 后/rewind不工作——快照链为空。本 PR 修复了这个核心差距。替代已停滞 3 周的 PR #4253。
主要变更
makeSnapshot成功后录制快照(增量写入)loadSession解析file_history_snapshot记录并恢复快照链session_resume稳定 capability tagvalidateRestoredSnapshots去重 + 批量并行 statenableFileCheckpointing()— 修复 daemon/ACP 会话因 piped stdin 导致文件检查点被禁用的问题seedPromptCount()— 修复 TUI--continue恢复后 promptId 计数器归零导致的冲突Session.prompt新增makeSnapshot调用 — 修复 daemon 创建的会话从不产生快照的问题已知限制(后续 PR)
promptId包含源 session ID,可能导致不匹配🤖 Generated with Qwen Code