fix(tui): stop compacting tool outputs on session save/load to preserve LLM cache by LeoAlex0 · Pull Request #2388 · Hmbown/CodeWhale

LeoAlex0 · 2026-05-31T06:34:57Z

Summary / 概述

Remove compact_session_tool_outputs from session save/load paths to preserve message fidelity for LLM KV-cache hits on resume.

从 session 的 save/load 路径中移除 compact_session_tool_outputs，保持消息完整性以确保 resume 时 LLM KV-cache 命中。

Problem / 问题

Session compaction replaces large tool outputs (>12K chars) with compact receipts during save and load. This changes message content between sessions, breaking the LLM KV-cache prefix on resume and causing near-zero cache hit rates for continued conversations.

Session 持久化时，compact_session_tool_outputs 会将超过 12K 字符的大型 tool output 替换为简短的 receipt。这导致 save/load 后消息内容发生变化，LLM 的 KV-cache 前缀不再匹配，resume 时缓存命中率几乎为零。

Fix / 修复

Remove compaction from save_session, save_checkpoint, /save command
Remove compaction from load_session, load_checkpoint, /load command
Remove the now-unused compact_session_tool_outputs wrapper
Live compaction (compact_messages_for_persistence in tui/ui.rs) is unaffected — context budgets remain protected during active conversation
Remove unnecessary mut on loaded session variables
移除 save_session、save_checkpoint、/save 命令中的 compaction
移除 load_session、load_checkpoint、/load 命令中的 compaction
移除已无调用的 compact_session_tool_outputs 包装函数
交互时的实时 compaction（tui/ui.rs 中的 compact_messages_for_persistence）不受影响，对话过程中的上下文预算保护仍然有效
移除 loaded session 变量上不必要的 mut

Tradeoffs / 权衡

Pros / 优点:

Resumed sessions preserve exact message content → LLM prefix-cache hits on every continue
No silent content mutation across save/load round-trips
Simpler code: persistence layer stores what the user sees, context management stays in the engine

Resume 后的消息内容与原始完全一致 → 每次 continue 都能命中 LLM 前缀缓存；持久化层不再静默修改消息内容；代码更简洁。

Cons / 代价:

Session JSON files may grow larger for conversations with many large tool outputs
Resuming a session on a smaller-context model may trigger emergency compaction (recover_context_overflow) on the first turn — this is reactive but bounded by the existing cap_messages (500) and cycle-archive safety nets

包含大量 tool output 的会话 JSON 文件可能变大；切换到小上下文模型 resume 时首次发送可能触发紧急压缩（recover_context_overflow），但受现有 cap_messages(500) 和 cycle-archive 安全网保护，不会无限循环。

⚠️ Remaining Risk / 仍存在的风险: `MAX_PERSISTED_MESSAGES`

This PR fixes one source of cache invalidation, but a bigger one remains.

本次 PR 修复了缓存失效的一个来源，但一个更大的问题依然存在。

The problem / 问题分析

MAX_PERSISTED_MESSAGES = 500 (session_manager.rs:25) performs tail truncation when a session exceeds 500 messages — the oldest messages are silently dropped, while the most recent 500 are kept.

Since LLM KV-cache is prefix-based (keyed by the full message sequence from the start), dropping oldest messages changes the entire cache prefix. The first message after truncation becomes a new "first message", so the cache key is completely different. Every resume of a session with >500 messages results in a 100% cache miss — even worse than the compaction issue this PR fixes.

LLM 的 KV-cache 是按前缀（prefix）构建的，缓存键基于从第一条消息开始的完整序列。cap_messages 保留最新 500 条（尾部），丢弃最旧消息（头部）。截断后的第一条消息变成了新的"第一条"，缓存前缀完全改变 → 100% 缓存失效，比本 PR 修复的 compaction 问题更严重。

Why 500 is arbitrary / 为什么 500 不合理

500 messages can be well under the budget for large-context models (1M tokens) → wastes cache unnecessarily
500 messages can already overflow small-context models (32K tokens) → does not actually protect them
The limit is driven by file-size concerns, not model capability
对大上下文模型（1M tokens），500 条消息远未触及预算 → 白白浪费缓存
对小上下文模型（32K tokens），500 条消息可能已经超了 → 实际上也保护不了
这个限制来自文件大小的工程考量，而非模型能力

Recommended direction / 建议方向

Replace MAX_PERSISTED_MESSAGES with a token-budget-aware cap that uses the saved model's context window as the bound, or move truncation to load time so the same model on resume gets the full, cacheable message list. The persistence layer should store faithfully; context limits should be enforced at send time by the engine.

建议将 MAX_PERSISTED_MESSAGES 替换为基于 token 预算 的智能裁剪——使用模型上下文窗口作为上限，或在 load 时按当前模型预算动态裁剪。持久化层应忠实存储，上下文限制应由 engine 在发送时按模型能力执行。

Tests / 测试

Updated two existing tests to verify raw output preservation instead of compaction
cargo test -p codewhale-tui -- session_manager: 39 passed, 0 failed
更新两个已有测试，验证原始输出保留而非压缩
cargo test -p codewhale-tui -- session_manager: 39 passed, 0 failed

…ve LLM cache

gemini-code-assist

Code Review

This pull request removes the session tool output compaction logic (compact_session_tool_outputs) during session saving, loading, and checkpointing. By preserving the raw, large tool outputs in the saved sessions instead of replacing them with tool output receipts, the application ensures better cache fidelity when resuming LLM sessions. The corresponding unit tests have been updated to verify that raw outputs are preserved. There are no review comments to address, so I have no additional feedback to provide.

Hmbown · 2026-05-31T06:42:36Z

Thanks @LeoAlex0, this is a clean and useful cache-fidelity fix. I verified the branch merged cleanly on current main and ran: cargo fmt --all -- --check, git diff --check HEAD, python3 scripts/check-provider-registry.py, cargo test -p codewhale-tui -- session_manager --nocapture (39 passed), and cargo check -p codewhale-tui --all-features --locked.

I also read Greptile’s checkpoint-size concern. That is the real tradeoff here, but the PR body calls it out and the behavior matches the goal: persistence should preserve the exact message stream for reliable resume/KV-cache behavior, while live context compaction remains separate. Merging this now; appreciate the focused fix and the bilingual writeup.

LeoAlex0 · 2026-05-31T06:49:17Z

@Hmbown FYI: There's still a Remaining Risk: MAX_PERSISTED_MESSAGES performs tail truncation which will break ALL LLM cache, I've edit this PR message.

fix(tui): stop compacting tool outputs on session save/load to preser…

32c8530

…ve LLM cache

gemini-code-assist Bot reviewed May 31, 2026

View reviewed changes

Hmbown merged commit 85764b9 into Hmbown:main May 31, 2026
2 checks passed

This was referenced May 31, 2026

fix(tui): remove message-count truncation from session persistence #2395

Merged

There still seems to be some problems with cache hits缓存命中方面似乎还是有些问题 #1120

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tui): stop compacting tool outputs on session save/load to preserve LLM cache#2388

fix(tui): stop compacting tool outputs on session save/load to preserve LLM cache#2388
Hmbown merged 1 commit into
Hmbown:mainfrom
LeoAlex0:main

LeoAlex0 commented May 31, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Hmbown commented May 31, 2026

Uh oh!

Uh oh!

LeoAlex0 commented May 31, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

LeoAlex0 commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary / 概述

Problem / 问题

Fix / 修复

Tradeoffs / 权衡

⚠️ Remaining Risk / 仍存在的风险: MAX_PERSISTED_MESSAGES

The problem / 问题分析

Why 500 is arbitrary / 为什么 500 不合理

Recommended direction / 建议方向

Tests / 测试

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Hmbown commented May 31, 2026

Uh oh!

Uh oh!

LeoAlex0 commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

LeoAlex0 commented May 31, 2026 •

edited

Loading

⚠️ Remaining Risk / 仍存在的风险: `MAX_PERSISTED_MESSAGES`

LeoAlex0 commented May 31, 2026 •

edited

Loading