Context-limit handoff as a default replacement for routine compaction

## Summary

Replace routine compaction-as-default with **agent-aware handoff**: when an agent's context approaches a configurable threshold (proposed default ~80% of the model's window — ~800K on 1M models), the agent writes a structured handoff prompt for a successor and the engine spawns a fresh agent loop seeded with that prompt. The successor's context begins clean, with the handoff as its stable cache prefix.

This is one concrete mechanism for the goal already tracked at #541 ("V4: cache-maximal context as default, demote routine compaction"). Filing as its own issue because the mechanism is specific enough to design + spec without conflating with #541's broader intent.

## Why this beats compaction for cache behavior

V4 caches shared prefixes at 128-token granularity with ~90% cost discount. Compaction breaks that cache: the new summarized prefix is byte-different from anything cached upstream, so the next turn pays the full uncached prompt cost. On a 800K-token-prefix session, that's roughly a 10× cost spike on the post-compaction turn.

Handoff doesn't pay that cost — it doesn't mutate the existing agent's prefix at all. The outgoing agent finishes its current turn cleanly; the *incoming* agent starts with a brand new prefix (the handoff prompt) that becomes its own stable cache key for the rest of its run.

Empirically, the cost-of-aggressive-compaction signal has been visible in community feedback (some users disable compaction entirely to dodge the cache miss). That's a reasonable workaround for the symptom but loses the long-context benefit. Handoff keeps the long-context benefit by being more architectural about *when* the prefix changes.

## Mechanism sketch

1. **Awareness threshold** — engine surfaces context fill ratio to the agent each turn. At ~70% the agent is told "approaching limit; consider wrapping current sub-task." At ~80% it's told "draft a handoff." At ~90% it's told "stop and hand off now." Thresholds configurable.

2. **Handoff prompt format** — structured markdown the agent writes into a designated artifact (e.g. `.deepseek/handoff.md` — already a real path used by `prompts.rs::HANDOFF_RELATIVE_PATH` for cross-session handoffs). Sections:
   - Goal (what the user is trying to do)
   - Current state (what's been done, what's in flight)
   - Open blockers (what the next agent needs to figure out)
   - Recent decisions (what's been ruled in / out)
   - Files-of-interest (working set)
   - Recovery hints (commands to re-discover state if anything's missing)

3. **Successor spawn** — engine reads the handoff, builds a fresh system prompt that prepends the handoff block, and starts a new turn loop. The user sees a marker in the transcript ("→ handed off to successor agent") but otherwise continues talking normally.

4. **Hybrid fallback** — if the agent self-assesses "this conversation can't be cleanly handed off" (e.g. mid-thought reasoning chain that needs full continuity), it can opt to compact instead. The decision becomes the agent's, not the system's. Both behaviors stay available; the *default* flips from compact-first to handoff-first.

## Risks worth naming

- **Producing a great handoff is itself expensive.** The outgoing agent has to spend reasoning effort to write the handoff right when its context is most cluttered. Mitigation: the handoff template is structured (not free-form summary) so the cost is bounded.
- **UX cost of mid-task handoff.** The user notices the transition. Mitigation: clear visual marker, plus the successor opens with "I'm picking up from {summary}" so the seam reads as continuity, not amnesia.
- **Not all conversations have natural break-points.** Single-thread deep debugging can't be cleanly chunked. Hybrid fallback (compact-when-handoff-fails) handles this.
- **Handoff quality is bounded by the outgoing agent's self-awareness.** Some signals it should track: open files, recent tool failures, decisions, what the user was just told. We can scaffold these via the handoff template.

## Relation to existing issues

- **#541** — broader meta-goal: "demote routine compaction." This issue is one specific mechanism for that goal.
- **#528** — "re-read active files instead of summarizing." Complementary: handoff covers the *cross-conversation* case; #528 covers the *within-conversation* case. Both are anti-compaction.
- **#547 / #548** — workshop pattern for tool output. Orthogonal — handoff is about the *primary* agent context, workshop is about *tool output* spillover.

## Out of scope

- The actual handoff template content (separate spec).
- Multi-agent orchestration patterns beyond simple succession.
- v0.8.11 work — this is v0.9.0 candidate.

## Reporter

Maintainer-scoped design exploration (Hmbown), 2026-05-04. Spawned from a session reviewing the cache-cost economics of routine compaction.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Context-limit handoff as a default replacement for routine compaction #664

Summary

Why this beats compaction for cache behavior

Mechanism sketch

Risks worth naming

Relation to existing issues

Out of scope

Reporter

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Context-limit handoff as a default replacement for routine compaction #664

Description

Summary

Why this beats compaction for cache behavior

Mechanism sketch

Risks worth naming

Relation to existing issues

Out of scope

Reporter

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions