What would you like to be added?

Replace qwen-code's current auto-compaction model (split history by character count, summarize the front 70%, preserve the recent 30% verbatim) with a model inspired by claude-code:

1. Always send the entire curated history to the summary side-query (no split, no tail preservation).
2. Replace the post-compact history with a structured composition:
   - A 9-section structured summary (Primary Request, Key Concepts, Files, Errors, Problem Solving, All user messages verbatim, Pending Tasks, Current Work, Optional Next Step).
   - A synthetic model ack message to keep role alternation correct.
   - Top 5 recently touched files restored as attachments: small files (≤ 5K tokens) embed their full current content, read fresh from disk; large files are listed as path-only references with an explicit instruction to call `read_file` to view current content.
   - Top 3 recently captured images restored as a single user message with a metadata header (turn index + source tool name + args) followed by the `inlineData` parts in chronological order.

All composition uses the existing `Content` + `user`/`model` roles with `text` / `inlineData` parts. No new message types, no attachment subsystem, no backward-compatibility concerns.

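
To make the composition concrete, here is a minimal sketch of how the summary, the synthetic ack, and the restored images could be assembled from plain `Content` entries. The `Part` and `Content` types are simplified stand-ins, and `CapturedImage`, `buildImageMessage`, `composePostCompactHistory`, the header wording, and the ack text are hypothetical names and formats chosen for illustration, not qwen-code's actual API. Where the file attachments sit relative to these messages is not specified above, so they are left out of this sketch.

```typescript
// Simplified stand-ins for the Content / Part shapes referred to above.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface Content {
  role: 'user' | 'model';
  parts: Part[];
}

// Hypothetical record of an image captured earlier in the session.
interface CapturedImage {
  turnIndex: number;
  toolName: string; // source tool, e.g. "computer_use__click"
  args: Record<string, unknown>;
  mimeType: string;
  data: string; // base64 payload
}

const MAX_RESTORED_IMAGES = 3;

// Builds the single user message that restores the most recent images.
function buildImageMessage(images: CapturedImage[]): Content | null {
  // Sort oldest-first, then keep the last N so the result stays chronological.
  const recent = [...images]
    .sort((a, b) => a.turnIndex - b.turnIndex)
    .slice(-MAX_RESTORED_IMAGES);
  if (recent.length === 0) return null;

  // Assumed header format: one line per image, in the same order as the parts.
  const header = recent
    .map(
      (img, i) =>
        `${i + 1}. turn ${img.turnIndex}, tool ${img.toolName}, ` +
        `args ${JSON.stringify(img.args)}`,
    )
    .join('\n');

  const imageParts: Part[] = recent.map((img) => ({
    inlineData: { mimeType: img.mimeType, data: img.data },
  }));

  return {
    role: 'user',
    parts: [{ text: `Restored images (chronological):\n${header}` }, ...imageParts],
  };
}

function composePostCompactHistory(
  summary: string,
  images: CapturedImage[],
): Content[] {
  const composed: Content[] = [
    { role: 'user', parts: [{ text: summary }] },
    // Synthetic ack so a following user-role message does not break alternation.
    { role: 'model', parts: [{ text: 'Understood. Continuing from the summary.' }] },
  ];
  const imageMessage = buildImageMessage(images);
  if (imageMessage) composed.push(imageMessage);
  return composed;
}
```

With four captured images, this sketch yields three entries (`user` summary, `model` ack, `user` images), and the last entry carries one header part followed by the three most recent images, oldest first.
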
Why is this needed?
The current tail-preservation model has a critical failure mode that hits single-turn long-running tasks — the dominant pattern for any computer-use-style workflow ("open Safari, click the first result, scroll, take a screenshot").
In findCompressSplitPoint, the "preserve the most recent 30% by char-count" rule requires at least two non-functionResponse user messages in the history — one early, one past the 70% char-count mark — to find a "clean" split point. A single-turn task only ever has one such user message (the original prompt at index 0). The clean-split scan therefore never succeeds, and the code falls through to one of three fallback branches:

| Compression-trigger moment | Behavior | Entries preserved verbatim |
| --- | --- | --- |
| Mid-tool-call (`model`+fc is last) | `splitPointRetainingTrailingPairs(retainCount=2)` | up to 5 entries |
| Tool result just returned (`user`+fr is last) | `return contents.length` → compress everything | 0 |
| Final text response (`model` without fc) | `return contents.length` → compress everything | 0 |

The code comment at lines 193–204 explicitly acknowledges this: "the gate that gets us here has already decided we need to compress, so all three fallbacks bias toward more compression rather than less". The 30% preserve guarantee is never enforced in single-turn tasks; it's only an upper bound for multi-turn ones.
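
The fall-through can be illustrated with a simplified sketch. This is not the actual `findCompressSplitPoint` source: `sketchSplitPoint`, `PRESERVE_FRACTION`, the use of `JSON.stringify` length as the char count, and the `1 + 2 * RETAIN_COUNT` arithmetic standing in for `splitPointRetainingTrailingPairs` are all assumptions, chosen only to reproduce the behavior listed in the table.

```typescript
// Simplified stand-ins for the history entry shapes.
type Part =
  | { text: string }
  | { functionCall: { name: string } }
  | { functionResponse: { name: string } };

interface Content {
  role: 'user' | 'model';
  parts: Part[];
}

const PRESERVE_FRACTION = 0.3; // "preserve the most recent 30% by char-count"
const RETAIN_COUNT = 2; // mirrors retainCount=2 from the table

const hasPart = (c: Content, key: 'functionCall' | 'functionResponse'): boolean =>
  c.parts.some((p) => key in p);

// Returns the split index: entries before it are summarized,
// entries from it onward are kept verbatim.
function sketchSplitPoint(contents: Content[]): number {
  const sizes = contents.map((c) => JSON.stringify(c).length);
  const total = sizes.reduce((a, b) => a + b, 0);
  const target = total * (1 - PRESERVE_FRACTION);

  // Clean split: a plain user message (not a functionResponse)
  // that starts at or past the 70% char-count mark.
  let seen = 0;
  for (let i = 0; i < contents.length; i++) {
    const c = contents[i]!;
    if (c.role === 'user' && !hasPart(c, 'functionResponse') && seen >= target) {
      return i;
    }
    seen += sizes[i]!;
  }

  // Fallbacks: no clean split exists (always the case in a single-turn task).
  const last = contents[contents.length - 1];
  if (last && last.role === 'model' && hasPart(last, 'functionCall')) {
    // Mid-tool-call: keep the trailing call plus up to RETAIN_COUNT
    // preceding call/response pairs (assumed arithmetic, up to 5 entries).
    return Math.max(0, contents.length - (1 + 2 * RETAIN_COUNT));
  }
  // Tool result just returned, or final text response: compress everything.
  return contents.length;
}
```

For a history made of one prompt followed by ten call/response rounds (21 entries), this sketch returns 21, so nothing is kept verbatim. Appending one more `model`+fc entry returns 17, which keeps only the last 5 entries.
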
Concrete impact on a representative computer-use session:
1. User asks: "open Safari and read the first headline". This is the only user message, at index 0.
2. Agent runs 10+ rounds of `computer_use__get_app_state` / `computer_use__click` / `computer_use__scroll`, each round containing a base64 screenshot (~6400 chars equivalent under our `imageTokenEstimate`).
3. If the trigger lands in the `user`+fr or `model`-no-fc terminal state (the common cases between tool calls or just before the final response), all screenshots AND the original user prompt are replaced by an opaque text summary.
4. The agent resumes "blind": no visual context, no verbatim user intent. It often loops back to clarifying questions or re-screenshots the entire app from scratch.

The claude-code-style model addresses these gaps:

- User intent survives via section 6 of the new summary template ("All user messages, chronological, verbatim"). Even with only one user message, that message is preserved word-for-word in the summary.
- Recent screenshots survive via the image restoration block (top 3 most recent images, regardless of whether they came from a tool result or a user paste). The metadata header tells the model which tool call produced each image, so it can correlate visual state with the actions that produced it.
- Recent files survive via the file restoration block (top 5 most recently touched files via `read_file` / `write_file` / `edit` / `replace`, size-adaptive embed vs. reference). The model doesn't have to re-read files that were just being worked on.

The pattern is conceptually simple: trust the summary for narrative continuity, trust selective restoration for state continuity. It mirrors how human engineers resume a context-overflowed conversation in practice.
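
The size-adaptive file restoration could look roughly like the sketch below. `TouchedFile`, `buildFileAttachments`, the injected `readCurrent` callback, the attachment text format, and the four-characters-per-token estimate are hypothetical and only illustrate the embed-versus-reference decision; the real module may track and size files differently.

```typescript
// Hypothetical record of a file touched by read_file / write_file / edit / replace.
interface TouchedFile {
  path: string;
  lastTouchedTurn: number;
}

const MAX_RESTORED_FILES = 5;
const EMBED_TOKEN_LIMIT = 5000; // "small files (≤ 5K tokens)"

// Rough token estimate. Assumption: about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function buildFileAttachments(
  touched: TouchedFile[],
  // Injected reader so content is fetched fresh at compaction time.
  readCurrent: (path: string) => string,
): string[] {
  // Deduplicate by path, keeping the most recent touch for each file.
  const latest: Record<string, number> = {};
  for (const f of touched) {
    const prev = latest[f.path];
    if (prev === undefined || f.lastTouchedTurn > prev) {
      latest[f.path] = f.lastTouchedTurn;
    }
  }

  return Object.keys(latest)
    .sort((a, b) => latest[b]! - latest[a]!) // most recently touched first
    .slice(0, MAX_RESTORED_FILES)
    .map((path) => {
      const content = readCurrent(path);
      return estimateTokens(content) <= EMBED_TOKEN_LIMIT
        ? `File: ${path}\n${content}`
        : `File: ${path} (too large to embed; call read_file to view current content)`;
    });
}
```

Passing the reader in as a parameter keeps the sketch testable without touching the disk; in practice it would wrap whatever file-reading helper the service already uses.
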
Additional context

Implementation plan: A detailed TDD-driven 10-task plan is at `docs/superpowers/plans/2026-05-28-claude-code-style-compaction.md`. Estimated effort: ~1.5 weeks (1 engineer).

Files touched:

- `packages/core/src/services/postCompactAttachments.ts` (new module, ~250 LOC + tests)
- `packages/core/src/services/chatCompressionService.ts` (rewrite `compress()`, delete `findCompressSplitPoint` / `splitPointRetainingTrailingPairs` / `COMPRESSION_PRESERVE_THRESHOLD` / `MIN_COMPRESSION_FRACTION` / `TOOL_ROUND_RETAIN_COUNT`)
- `packages/core/src/core/prompts.ts` (replace `getCompressionPrompt` with the 9-section template)

Builds on prior compression work:

- feat(core): strip inline media before chat compaction summary #4101: inline media stripping for the summary side-query (image base64 → `[image: <mime>]` placeholder). The placeholder behavior in the side-query is preserved. This issue is orthogonal: it changes what happens to the live history after compression, not what is sent to the summary model.
- `computeThresholds` is untouched; this issue only changes what `compress()` does once the threshold fires.

Out of scope (to be addressed separately):

- Microcompact (`microcompaction/microcompact.ts`): currently clears nested media in tool results (including computer-use screenshots) under `keepRecent=5` per kind. This is independently problematic and will be addressed in a follow-up focused on whitelist-based clearing.
- `mcp_instructions_delta` / `deferred_tools_delta` re-injection: qwen-code's `MCPClientManager` and `ToolRegistry` (including `revealedDeferred: Set<string>`) hold runtime state that survives in-session compaction, so no special handling is needed.
- Skill activation state: `SkillManager.activationRegistry` is runtime state; conditional skill activations survive compaction natively, and `SkillTool.refreshSkills()` keeps the tool description in sync.
- Session resume: loading a compacted session from disk requires runtime-state reconstruction, a separate concern.
- Transcript-path pointer in the summary (claude-code includes the source `.jsonl` path so the model can `Read` it for details): requires plumbing the session path through; potential follow-up.

No public API breaks. The symbols being removed (`findCompressSplitPoint`, `splitPointRetainingTrailingPairs`, `COMPRESSION_PRESERVE_THRESHOLD`, `MIN_COMPRESSION_FRACTION`, `TOOL_ROUND_RETAIN_COUNT`) are not re-exported from `packages/core/src/index.ts`, so they are inaccessible to `@qwen-code/sdk` (TypeScript/Python/Java) and `packages/acp-bridge` consumers. The only in-repo reference outside `chatCompressionService.ts` itself is an internal test hatch (`TEST_ONLY.COMPRESSION_PRESERVE_THRESHOLD` in `packages/core/src/core/client.ts`), which will be removed in the same change. Two other files (`config.ts`, `compactionInputSlimming.ts`) reference these names only in docstring comments, which will be updated.