refactor(core): replace tail-preservation compaction with claude-code-style "summary + restoration attachments" model #4592

@LaZzyMan

What would you like to be added?

Replace qwen-code's current auto-compaction model — split history by character-count, summarize the front 70%, preserve the recent 30% verbatim — with a model inspired by claude-code:

  1. Always send the entire curated history to the summary side-query (no split, no tail preservation).
  2. Replace the post-compact history with a structured composition:
    • A 9-section structured summary (Primary Request, Key Concepts, Files, Errors, Problem Solving, All user messages verbatim, Pending Tasks, Current Work, Optional Next Step).
    • A synthetic model ack message to keep role alternation correct.
    • Top 5 recently-touched files restored as attachments: small files (≤ 5K tokens) embed full current content read fresh from disk; large files listed as path-only references with an explicit instruction to call read_file to view current content.
    • Top 3 recently-captured images restored as a single user message with a metadata header (turn index + source tool name + args) followed by the inlineData parts in chronological order.

All composition uses the existing Content + user/model roles with text / inlineData parts. No new message types, no attachment subsystem, no backward-compatibility concerns.
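
The composition above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: `composePostCompactHistory`, `RestoredFile`, `RestoredImage`, the ack wording, the header format, and the local `Content`/`Part` stand-ins are all hypothetical names chosen for this example, and the ordering of the restoration blocks after the ack is an assumption.

```typescript
// Local stand-ins for the existing Content / Part shapes (assumed, simplified).
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface Content {
  role: "user" | "model";
  parts: Part[];
}

// Hypothetical inputs: the caller has already read files fresh from disk
// and collected images, both ordered most-recent-first.
interface RestoredFile {
  path: string;
  tokenCount: number;
  content: string;
}

interface RestoredImage {
  turnIndex: number;
  toolName: string;
  args: string;
  mimeType: string;
  data: string; // base64 payload
}

const MAX_FILES = 5;
const MAX_IMAGES = 3;
const EMBED_TOKEN_LIMIT = 5_000;

function composePostCompactHistory(
  summary: string,
  recentFiles: RestoredFile[],
  recentImages: RestoredImage[],
): Content[] {
  const history: Content[] = [
    { role: "user", parts: [{ text: summary }] },
    // Synthetic ack so the summary is followed by a model turn.
    { role: "model", parts: [{ text: "Understood. Continuing from the summary." }] },
  ];

  // Small files embed their current content; large files become path-only references.
  const files = recentFiles.slice(0, MAX_FILES);
  if (files.length > 0) {
    const parts: Part[] = files.map((f) =>
      f.tokenCount <= EMBED_TOKEN_LIMIT
        ? { text: `File ${f.path} (current content):\n${f.content}` }
        : { text: `File ${f.path} is too large to embed; call read_file to view its current content.` },
    );
    history.push({ role: "user", parts });
  }

  // Keep the most recent images, then flip them back to chronological order.
  const images = recentImages.slice(0, MAX_IMAGES).reverse();
  if (images.length > 0) {
    const header = images
      .map((img, i) => `Image ${i + 1}: turn ${img.turnIndex}, ${img.toolName}(${img.args})`)
      .join("\n");
    history.push({
      role: "user",
      parts: [
        { text: header },
        ...images.map((img): Part => ({ inlineData: { mimeType: img.mimeType, data: img.data } })),
      ],
    });
  }

  return history;
}
```

Only `text` and `inlineData` parts on `user`/`model` roles appear in the result, which is the point of the "no new message types" constraint.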

Why is this needed?

The current tail-preservation model has a critical failure mode that hits single-turn long-running tasks — the dominant pattern for any computer-use-style workflow ("open Safari, click the first result, scroll, take a screenshot").

In findCompressSplitPoint, the "preserve the most recent 30% by char-count" rule requires at least two non-functionResponse user messages in the history — one early, one past the 70% char-count mark — to find a "clean" split point. A single-turn task only ever has one such user message (the original prompt at index 0). The clean-split scan therefore never succeeds, and the code falls through to one of three fallback branches:

| Compression-trigger moment | Behavior | Entries preserved verbatim |
| --- | --- | --- |
| Mid-tool-call (model+fc is last) | splitPointRetainingTrailingPairs(retainCount=2) | up to 5 entries |
| Tool result just returned (user+fr is last) | return contents.length → compress everything | 0 |
| Final text response (model without fc) | return contents.length → compress everything | 0 |

The code comment at lines 193–204 explicitly acknowledges this: "the gate that gets us here has already decided we need to compress, so all three fallbacks bias toward more compression rather than less". The 30% preserve guarantee is never enforced in single-turn tasks; it's only an upper bound for multi-turn ones.
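
To make the failure mode concrete, here is a simplified model of the rule described above. It is not the actual `findCompressSplitPoint` code: `findCleanSplit`, `fallbackBranch`, the `JSON.stringify`-based size measure, and the trimmed-down `Content`/`Part` types are assumptions made for illustration only.

```typescript
// Simplified stand-ins for the history types (assumed).
interface Part {
  text?: string;
  functionCall?: { name: string };
  functionResponse?: { name: string };
}

interface Content {
  role: "user" | "model";
  parts: Part[];
}

const hasFunctionCall = (c: Content): boolean =>
  c.parts.some((p) => p.functionCall !== undefined);

const hasFunctionResponse = (c: Content): boolean =>
  c.parts.some((p) => p.functionResponse !== undefined);

// A "fresh" user message is a user entry that is not a tool result.
const isFreshUserMessage = (c: Content): boolean =>
  c.role === "user" && !hasFunctionResponse(c);

// Rough character-count proxy for an entry's size.
const charCount = (c: Content): number => JSON.stringify(c).length;

// Returns the index of the first fresh user message that starts at or past
// the compress fraction of total chars, or undefined if none exists.
function findCleanSplit(contents: Content[], compressFraction = 0.7): number | undefined {
  const total = contents.reduce((sum, c) => sum + charCount(c), 0);
  const target = total * compressFraction;
  let seen = 0;
  let index = 0;
  for (const c of contents) {
    if (isFreshUserMessage(c) && seen >= target) return index;
    seen += charCount(c);
    index++;
  }
  return undefined;
}

// Mirrors the three rows of the table: only a trailing model+fc entry
// keeps anything verbatim; the other two states compress everything.
function fallbackBranch(contents: Content[]): "retain-trailing-pairs" | "compress-everything" {
  const last = contents[contents.length - 1];
  if (last !== undefined && last.role === "model" && hasFunctionCall(last)) {
    return "retain-trailing-pairs";
  }
  return "compress-everything";
}
```

In this model, a history consisting of one prompt followed only by model+fc / user+fr rounds never yields a clean split: the sole fresh user message sits at offset 0, so `seen >= target` cannot hold for it, and the outcome depends entirely on which terminal state the trigger lands in.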

Concrete impact on a representative computer-use session:

  • User asks: "open Safari and read the first headline" — one user message at index 0.
  • Agent runs 10+ rounds of computer_use__get_app_state / computer_use__click / computer_use__scroll, each round containing a base64 screenshot (~6400 chars equivalent under our imageTokenEstimate).
  • Auto-compact triggers (e.g., via the three-tier ladder from PR #4345, "feat(core)!: redesign auto-compaction thresholds with three-tier ladder"). The clean-split scan finds nothing (only one fresh user message, at <1% of total chars).
  • If the trigger lands in the user+fr or model-no-fc terminal state (the common cases between tool calls or just before final response), all screenshots AND the original user prompt are replaced by an opaque text summary.
  • The agent resumes "blind": no visual context, no verbatim user intent. It often loops back to clarifying questions or re-screenshots the entire app from scratch.

The claude-code-style model addresses both gaps and additionally preserves working file state:

  1. User intent survives via section 6 of the new summary template ("All user messages, chronological, verbatim"). Even with only one user message, that message is preserved word-for-word in the summary.
  2. Recent screenshots survive via the image restoration block (top 3 most recent images, regardless of whether they came from a tool result or a user paste). The metadata header tells the model which tool call produced each image, so it can correlate visual state with the actions that produced it.
  3. Recent files survive via the file restoration block (top 5 most recently touched files via read_file / write_file / edit / replace, size-adaptive embed vs. reference). The model doesn't have to re-read files that were just being worked on.
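
One way the "top 5 most recently touched files" selection could work is a backwards scan over the history that de-duplicates by path. The sketch below is an assumption-laden illustration: `recentlyTouchedFiles` is a hypothetical helper, the `file_path` argument key is a guess at how the file tools name their path parameter, and the types are simplified stand-ins.

```typescript
// Simplified stand-ins for the history types (assumed).
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

interface Part {
  text?: string;
  functionCall?: FunctionCall;
}

interface Content {
  role: "user" | "model";
  parts: Part[];
}

// Tool names taken from the issue text.
const FILE_TOOLS = new Set(["read_file", "write_file", "edit", "replace"]);

// Assumption: the argument key that carries the path. The real tools may differ.
const PATH_ARG = "file_path";

// Walks the history from newest to oldest and returns up to `limit`
// distinct paths, most recently touched first.
function recentlyTouchedFiles(history: Content[], limit = 5): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const content of [...history].reverse()) {
    for (const part of [...content.parts].reverse()) {
      const call = part.functionCall;
      if (call === undefined || !FILE_TOOLS.has(call.name)) continue;
      const path = call.args[PATH_ARG];
      if (typeof path !== "string" || seen.has(path)) continue;
      seen.add(path);
      result.push(path);
      if (result.length === limit) return result;
    }
  }
  return result;
}
```

Each returned path would then be read fresh from disk and either embedded or listed as a reference depending on its size, so the restored content reflects the file's current state rather than whatever an old tool result contained.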

The pattern is conceptually simple: trust the summary for narrative continuity, trust selective restoration for state continuity. It mirrors how human engineers resume a context-overflowed conversation in practice.

Additional context

Implementation plan: A detailed TDD-driven 10-task plan is at docs/superpowers/plans/2026-05-28-claude-code-style-compaction.md. Estimated effort: ~1.5 weeks (1 engineer).

Files touched:

  • packages/core/src/services/postCompactAttachments.ts (new module, ~250 LOC + tests)
  • packages/core/src/services/chatCompressionService.ts (rewrite compress(), delete findCompressSplitPoint / splitPointRetainingTrailingPairs / COMPRESSION_PRESERVE_THRESHOLD / MIN_COMPRESSION_FRACTION / TOOL_ROUND_RETAIN_COUNT)
  • packages/core/src/core/prompts.ts (replace getCompressionPrompt with the 9-section template)

Builds on prior compression work:

Out of scope (to be addressed separately):

  • Microcompact (microcompaction/microcompact.ts) — currently clears nested media in tool results (including computer-use screenshots) under keepRecent=5 per kind. This is independently problematic and will be addressed in a follow-up focused on whitelist-based clearing.
  • mcp_instructions_delta / deferred_tools_delta re-injection — qwen-code's MCPClientManager and ToolRegistry (including revealedDeferred: Set<string>) hold runtime state that survives in-session compaction, so no special handling is needed.
  • Skill activation state — SkillManager.activationRegistry is runtime state; conditional skill activations survive compaction natively, and SkillTool.refreshSkills() keeps the tool description in sync.
  • Session resume — loading a compacted session from disk requires runtime-state reconstruction, a separate concern.
  • Transcript-path pointer in the summary (claude-code includes the source .jsonl path so the model can Read it for details) — requires plumbing the session path through; potential follow-up.

No public API breaks. The symbols being removed (findCompressSplitPoint, splitPointRetainingTrailingPairs, COMPRESSION_PRESERVE_THRESHOLD, MIN_COMPRESSION_FRACTION, TOOL_ROUND_RETAIN_COUNT) are not re-exported from packages/core/src/index.ts, so they are inaccessible to @qwen-code/sdk (TypeScript/Python/Java) and packages/acp-bridge consumers. The only in-repo reference outside chatCompressionService.ts itself is an internal test hatch (TEST_ONLY.COMPRESSION_PRESERVE_THRESHOLD in packages/core/src/core/client.ts) which will be removed in the same change. Two other files (config.ts, compactionInputSlimming.ts) reference these names only in docstring comments, which will be updated.
