Feature: Two-Phase Context Management — Prune Tool Outputs Before Full Compaction (inspired by Kilocode)

## Overview

Hermes uses a single-phase context compression: when context exceeds 85% of the model window, summarize all middle turns with an LLM call. Kilocode uses a **two-phase approach** that is both cheaper and produces better results:

**Phase 1 (Prune):** Walk backwards through messages. Keep the last 40K tokens of tool outputs untouched. Mark all older tool outputs as "compacted" (elided from context). No LLM call required — just removal. Only prune if >20K tokens can be reclaimed.

**Phase 2 (Compact):** If pruning isn't enough, run LLM summarization with a **structured template** producing actionable context, not a narrative blob.

Source: `packages/opencode/src/session/compaction.ts`

---

## Research Findings

### Kilocode's Pruning Phase

```
Walk backward through message parts.
Skip the last 2 turns entirely (always protected).
Keep 40K tokens (PRUNE_PROTECT) of recent tool call outputs.
Beyond that, set time.compacted=true on old tool outputs.
Protected tools (e.g., "skill") are never pruned.
Only prune if >20K tokens (PRUNE_MINIMUM) would be reclaimed.
```

The key insight: **tool outputs are the bulkiest content** in a conversation. A single `terminal` or `search_files` result can be 10-50K characters. Pruning these first recovers enormous amounts of context without losing the conversational structure (user messages, assistant reasoning, tool *calls* are all preserved — only tool *results* are removed).

### Kilocode's Structured Compaction Template

When pruning isn't enough, the compaction agent uses this template:

```
## Goal
[What the user is trying to accomplish]

## Instructions
[Standing instructions from the user]

## Discoveries
[Key findings, relevant code, data discovered]

## Accomplished
[What has been completed so far]

## Relevant files/directories
[Files and paths that matter for the ongoing task]
```

After compaction, injects: "Continue if you have next steps, or stop and ask for clarification."

### Current Hermes Approach

`agent/context_compressor.py`:
- Single phase: protect first 3 + last 4 turns, summarize middle with LLM
- Generic prompt: "Summarize these conversation turns concisely..." covering actions, results, decisions, data
- Produces a `[CONTEXT SUMMARY]` narrative blob
- No tool-output-specific pruning
- No structured template for resumption

---

## Implementation Plan

### Classification

**Core codebase change** to `agent/context_compressor.py` and `run_agent.py`. Not a skill or tool.

### Phase 1: Tool Output Pruning (no LLM call)

Add a `_prune_tool_outputs()` method that runs BEFORE the current LLM-based compression:

```python
PRUNE_PROTECT_TOKENS = 40_000  # Keep last 40K tokens of tool outputs
PRUNE_MINIMUM_TOKENS = 20_000  # Only prune if we reclaim >20K tokens
NEVER_PRUNE_TOOLS = {"clarify", "memory", "skill_view", "todo"}

def _prune_tool_outputs(self, messages: list) -> tuple[list, int]:
    """Remove old tool outputs while preserving recent ones.
    Returns (pruned_messages, tokens_saved)."""
    # Walk backward, accumulate tool output token estimates
    # After PRUNE_PROTECT reached, replace old tool content with
    # "[Tool output pruned — was N chars]"
    ...
```

Integrate before the existing compression check:
```python
if self.should_compress_preflight(messages):
    messages, saved = self._prune_tool_outputs(messages)
    if self.should_compress_preflight(messages):
        # Still over threshold — do full LLM compression
        messages = self.compress(messages)
```

### Phase 2: Structured Compaction Template

Replace the generic summarization prompt with the structured template:

```python
COMPACTION_TEMPLATE = """Summarize the compressed conversation turns into a structured resumption context.

## Goal
[What is the user trying to accomplish?]

## Standing Instructions
[Any persistent instructions or constraints from the user]

## Key Discoveries
[Important findings, relevant code, data, error messages]

## Accomplished So Far
[What has been completed — be specific about files changed, commands run]

## Relevant Files & Paths
[List all file paths, URLs, and resources that matter]

## Next Steps
[What was the agent about to do when compression triggered?]
"""
```

### Phase 3: Adaptive thresholds

Scale PRUNE_PROTECT based on model context window:
- 128K models: protect last 40K tokens
- 32K models: protect last 10K tokens
- 1M models: protect last 100K tokens

---

## Pros & Cons

### Pros
- **Phase 1 is free** — no LLM call, just string replacement. Saves both money and latency.
- **Preserves conversation structure** — User messages, assistant reasoning, and tool call names stay intact. Only the bulky output blobs are removed.
- **Structured template** produces actionable resumption context vs a narrative blob that loses task structure
- **Prompt-cache friendly** — pruning could mark old tool results at insertion time, preserving prefix cache
- **Composable** — prune first, then compress only if still needed. Often pruning alone is enough.

### Cons
- **Information loss** — Old tool outputs may contain data the agent needs to reference later
- **Threshold tuning** — 40K tokens of protection may be too much for small-context models or too little for large ones
- **Interaction with #415** — If insertion-time trimming lands first, tool outputs will already be smaller, reducing the need for pruning (but the two are complementary, not conflicting)

---

## References

- [Kilocode compaction.ts](https://github.com/Kilo-Org/kilocode/blob/main/packages/opencode/src/session/compaction.ts) — Two-phase prune+compact
- Hermes `agent/context_compressor.py` — Current single-phase compression
- Related: #415 (Insertion-Time Tool Result Trimming) — complementary; trims at write time, this trims at read time


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Two-Phase Context Management — Prune Tool Outputs Before Full Compaction (inspired by Kilocode) #513

Overview

Research Findings

Kilocode's Pruning Phase

Kilocode's Structured Compaction Template

Current Hermes Approach

Implementation Plan

Classification

Phase 1: Tool Output Pruning (no LLM call)

Phase 2: Structured Compaction Template

Phase 3: Adaptive thresholds

Pros & Cons

Pros

Cons

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Feature: Two-Phase Context Management — Prune Tool Outputs Before Full Compaction (inspired by Kilocode) #513

Description

Overview

Research Findings

Kilocode's Pruning Phase

Kilocode's Structured Compaction Template

Current Hermes Approach

Implementation Plan

Classification

Phase 1: Tool Output Pruning (no LLM call)

Phase 2: Structured Compaction Template

Phase 3: Adaptive thresholds

Pros & Cons

Pros

Cons

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions