refactor(core): replace tail-preservation compaction with claude-code-style "summary + restoration attachments" model

## What would you like to be added?

Replace qwen-code's current auto-compaction model — split history by character-count, summarize the front 70%, preserve the recent 30% verbatim — with a model inspired by claude-code:

1. Always send the **entire** curated history to the summary side-query (no split, no tail preservation).
2. Replace the post-compact history with a structured composition:
   - A **9-section structured summary** (Primary Request, Key Concepts, Files, Errors, Problem Solving, **All user messages verbatim**, Pending Tasks, Current Work, Optional Next Step).
   - A synthetic model ack message to keep role alternation correct.
   - **Top 5 recently-touched files** restored as attachments: small files (≤ 5K tokens) are embedded with their full current content, read fresh from disk; large files are listed as path-only references with an explicit instruction to call `read_file` to view current content.
   - **Top 3 recently-captured images** restored as a single user message with a metadata header (turn index + source tool name + args) followed by the inlineData parts in chronological order.

All composition uses the existing `Content` + `user`/`model` roles with `text` / `inlineData` parts. **No new message types, no attachment subsystem, no backward-compatibility concerns.**
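
As a rough illustration of that composition, the sketch below assembles a post-compact history from a summary string, pre-rendered file attachment texts, and an optional image message. The `Part` and `Content` types are simplified local stand-ins for the existing ones. `buildPostCompactHistory`, the ack wording, and the choice to carry file attachments as extra text parts on the summary message are assumptions made for illustration, not the planned API.

```typescript
// Simplified local stand-ins for the existing Content / Part shapes (assumption).
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface Content {
  role: 'user' | 'model';
  parts: Part[];
}

// Hypothetical helper: assembles the post-compact history.
// Assumed ordering: summary + file attachments (user), ack (model), images (user).
function buildPostCompactHistory(
  summary: string,
  fileAttachments: string[], // one text block per restored file
  imageMessage?: Content, // single user message holding the restored images
): Content[] {
  const summaryParts: Part[] = [
    { text: summary },
    ...fileAttachments.map((text): Part => ({ text })),
  ];

  const history: Content[] = [
    { role: 'user', parts: summaryParts },
    // Synthetic ack so the next user message does not directly follow another one.
    { role: 'model', parts: [{ text: 'Understood. Continuing from the summary.' }] },
  ];

  // Only append the image message when there is something to restore.
  if (imageMessage && imageMessage.parts.length > 0) {
    history.push(imageMessage);
  }
  return history;
}

const example = buildPostCompactHistory(
  '1. Primary Request: ...',
  ['File src/a.ts (current content):\nexport const a = 1;'],
  {
    role: 'user',
    parts: [
      { text: 'Image 1: turn 12, tool computer_use__get_app_state' },
      { inlineData: { mimeType: 'image/png', data: 'iVBORw0KGgo=' } },
    ],
  },
);
console.log(example.map((c) => c.role).join(' -> ')); // user -> model -> user
```

Everything here is plain `text` and `inlineData` parts on `user` / `model` messages, which is the point of the design: no new message type is needed to express the restored state.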

## Why is this needed?

The current tail-preservation model has a critical failure mode that hits **single-turn long-running tasks** — the dominant pattern for any computer-use-style workflow ("open Safari, click the first result, scroll, take a screenshot").

In [`findCompressSplitPoint`](packages/core/src/services/chatCompressionService.ts), the "preserve the most recent 30% by char-count" rule requires at least two non-functionResponse user messages in the history — one early, one past the 70% char-count mark — to find a "clean" split point. A single-turn task only ever has **one** such user message (the original prompt at index 0). The clean-split scan therefore never succeeds, and the code falls through to one of three fallback branches:

| Compression-trigger moment | Behavior | Entries preserved verbatim |
|---|---|---|
| Mid-tool-call (`model+fc` is last) | `splitPointRetainingTrailingPairs(retainCount=2)` | up to 5 entries |
| Tool result just returned (`user+fr` is last) | `return contents.length` → compress everything | **0** |
| Final text response (`model` without fc) | `return contents.length` → compress everything | **0** |

The code comment at lines 193–204 explicitly acknowledges this: *"the gate that gets us here has already decided we need to compress, so all three fallbacks bias toward more compression rather than less"*. The 30% preserve guarantee is **never enforced** in single-turn tasks; it's only an upper bound for multi-turn ones.
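
The failure can be reproduced with a deliberately simplified model of the clean-split scan. This is not the code in `chatCompressionService.ts`: `findCleanSplit`, the `Entry` shape, and the `fraction` parameter are illustrative assumptions that capture only the rule described above, namely that a split is clean when it lands on a non-functionResponse user message at or past the 70% char-count mark.

```typescript
// Hypothetical, reduced view of a history entry: just role, kind, and size.
interface Entry {
  role: 'user' | 'model';
  kind: 'text' | 'functionCall' | 'functionResponse';
  chars: number;
}

// Returns the index of the first fresh user message that starts at or past
// `fraction` of the total char count, or -1 when no clean split exists.
function findCleanSplit(history: Entry[], fraction = 0.7): number {
  const total = history.reduce((sum, e) => sum + e.chars, 0);
  let seen = 0;
  for (let i = 0; i < history.length; i++) {
    const e = history[i];
    if (e.role === 'user' && e.kind === 'text' && seen >= total * fraction) {
      return i;
    }
    seen += e.chars;
  }
  return -1;
}

// One tool round: a function call followed by a screenshot-sized response.
const round: Entry[] = [
  { role: 'model', kind: 'functionCall', chars: 50 },
  { role: 'user', kind: 'functionResponse', chars: 6400 },
];

// Single-turn task: one prompt at index 0, then ten tool rounds.
const singleTurn: Entry[] = [{ role: 'user', kind: 'text', chars: 40 }];
for (let i = 0; i < 10; i++) singleTurn.push(...round);

// Multi-turn task: the same history plus a second user prompt near the end.
const multiTurn: Entry[] = [
  ...singleTurn,
  { role: 'user', kind: 'text', chars: 40 },
  { role: 'model', kind: 'text', chars: 200 },
];

console.log(findCleanSplit(singleTurn)); // -1
console.log(findCleanSplit(multiTurn)); // 21
```

In the single-turn case the only fresh user message sits at offset 0, so the scan has nothing to land on and the caller has to take one of the fallback branches in the table.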

**Concrete impact** on a representative computer-use session:

- User asks: *"open Safari and read the first headline"* — one user message at index 0.
- Agent runs 10+ rounds of `computer_use__get_app_state` / `computer_use__click` / `computer_use__scroll`, each round containing a base64 screenshot (~6400 chars equivalent under our `imageTokenEstimate`).
- Auto-compact triggers (e.g., from PR #4345's three-tier ladder). The clean-split scan finds nothing (only one fresh user message, at <1% of total chars).
- If the trigger lands in the `user+fr` or `model`-no-fc terminal state (the common cases between tool calls or just before final response), **all screenshots AND the original user prompt are replaced by an opaque text summary**.
- The agent resumes "blind": no visual context, no verbatim user intent. It often loops back to clarifying questions or re-screenshots the entire app from scratch.

The claude-code-style model addresses these gaps:

1. **User intent survives** via section 6 of the new summary template ("All user messages, chronological, verbatim"). Even with only one user message, that message is preserved word-for-word in the summary.
2. **Recent screenshots survive** via the image restoration block (top 3 most recent images, regardless of whether they came from a tool result or a user paste). The metadata header tells the model which tool call produced each image, so it can correlate visual state with the actions that produced it.
3. **Recent files survive** via the file restoration block (top 5 most recently touched files via `read_file` / `write_file` / `edit` / `replace`, size-adaptive embed vs. reference). The model doesn't have to re-read files that were just being worked on.

The pattern is conceptually simple: **trust the summary for narrative continuity, trust selective restoration for state continuity**. It mirrors how human engineers resume a context-overflowed conversation in practice.
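
A minimal sketch of the selective-restoration side, under stated assumptions: the limits (5 files, 3 images, 5K tokens) come from this issue, while `restoreFiles`, `restoreImages`, the `FileTouch` / `CapturedImage` records, the injected `readFile` / `estimateTokens` callbacks, and the attachment wording are hypothetical names used only for illustration.

```typescript
// Hypothetical records; the real tracking structures may differ.
interface FileTouch {
  path: string;
  turnIndex: number; // turn of a read_file / write_file / edit / replace call
}

interface CapturedImage {
  turnIndex: number;
  toolName: string;
  args: string; // serialized tool arguments
  mimeType: string;
  data: string; // base64
}

type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

const MAX_FILES = 5;
const MAX_IMAGES = 3;
const EMBED_TOKEN_LIMIT = 5_000;

// Picks the most recently touched files (deduplicated by path) and embeds
// only those whose current content fits under the token limit.
function restoreFiles(
  touches: FileTouch[],
  readFile: (path: string) => string, // injected: content is read at compaction time
  estimateTokens: (text: string) => number,
): string[] {
  const latest = new Map<string, number>();
  for (const t of touches) {
    latest.set(t.path, Math.max(latest.get(t.path) ?? -1, t.turnIndex));
  }
  return Array.from(latest.entries())
    .sort((a, b) => b[1] - a[1]) // most recent first
    .slice(0, MAX_FILES)
    .map(([path]) => {
      const content = readFile(path);
      return estimateTokens(content) <= EMBED_TOKEN_LIMIT
        ? `File ${path} (current content):\n${content}`
        : `File ${path}: too large to embed; call read_file to view current content.`;
    });
}

// Keeps the most recent images, emitted oldest-first after a metadata header.
function restoreImages(images: CapturedImage[]): Part[] {
  const recent = [...images]
    .sort((a, b) => a.turnIndex - b.turnIndex)
    .slice(-MAX_IMAGES);
  if (recent.length === 0) return [];
  const header = recent
    .map((img, i) => `Image ${i + 1}: turn ${img.turnIndex}, tool ${img.toolName}, args ${img.args}`)
    .join('\n');
  return [
    { text: header },
    ...recent.map((img): Part => ({
      inlineData: { mimeType: img.mimeType, data: img.data },
    })),
  ];
}
```

Passing `readFile` in as a callback keeps the selection logic testable without touching disk, and makes it explicit that embedded content reflects the file as it exists when compaction runs rather than whatever an earlier tool result contained.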

## Additional context

**Implementation plan:** A detailed TDD-driven 10-task plan is at [`docs/superpowers/plans/2026-05-28-claude-code-style-compaction.md`](docs/superpowers/plans/2026-05-28-claude-code-style-compaction.md). Estimated effort: ~1.5 weeks (1 engineer).

**Files touched:**
- `packages/core/src/services/postCompactAttachments.ts` (new module, ~250 LOC + tests)
- `packages/core/src/services/chatCompressionService.ts` (rewrite `compress()`, delete `findCompressSplitPoint` / `splitPointRetainingTrailingPairs` / `COMPRESSION_PRESERVE_THRESHOLD` / `MIN_COMPRESSION_FRACTION` / `TOOL_ROUND_RETAIN_COUNT`)
- `packages/core/src/core/prompts.ts` (replace `getCompressionPrompt` with the 9-section template)

**Builds on prior compression work:**
- #4101 — inline media stripping for the summary side-query (image base64 → `[image: <mime>]` placeholder). The placeholder behavior in side-query is preserved; this issue is orthogonal — it changes what happens to the **live** history after compression, not what is sent to the summary model.
- #4345 — three-tier auto-compaction threshold ladder (warn / auto / hard). The threshold logic in `computeThresholds` is untouched; this issue only changes what `compress()` does once the threshold fires.
- #3879 — reactive compression on context overflow. Same trigger entrypoint; downstream behavior changes.

**Out of scope (to be addressed separately):**
- Microcompact (`microcompaction/microcompact.ts`) — currently clears nested media in tool results (including computer-use screenshots) under `keepRecent=5` per kind. This is independently problematic and will be addressed in a follow-up focused on whitelist-based clearing.
- `mcp_instructions_delta` / `deferred_tools_delta` re-injection — qwen-code's `MCPClientManager` and `ToolRegistry` (including `revealedDeferred: Set<string>`) hold runtime state that survives in-session compaction, so no special handling is needed.
- Skill activation state — `SkillManager.activationRegistry` is runtime state; conditional skill activations survive compaction natively, and `SkillTool.refreshSkills()` keeps the tool description in sync.
- Session resume — loading a compacted session from disk requires runtime-state reconstruction, a separate concern.
- Transcript-path pointer in the summary (claude-code includes the source `.jsonl` path so the model can `Read` it for details) — requires plumbing the session path through; potential follow-up.

**No public API breaks.** The symbols being removed (`findCompressSplitPoint`, `splitPointRetainingTrailingPairs`, `COMPRESSION_PRESERVE_THRESHOLD`, `MIN_COMPRESSION_FRACTION`, `TOOL_ROUND_RETAIN_COUNT`) are not re-exported from `packages/core/src/index.ts`, so they are inaccessible to `@qwen-code/sdk` (TypeScript/Python/Java) and `packages/acp-bridge` consumers. The only in-repo reference outside `chatCompressionService.ts` itself is an internal test-only hatch (`TEST_ONLY.COMPRESSION_PRESERVE_THRESHOLD` in `packages/core/src/core/client.ts`), which will be removed in the same change. Two other files (`config.ts`, `compactionInputSlimming.ts`) reference these names only in docstring comments, which will be updated.

