Feature: Context Compaction Quality Overhaul — Handoff-Oriented Prompts, User Message Preservation, and Configurable Compaction (inspired by Codex CLI)

## Overview

Deep research into OpenAI's Codex CLI context compaction architecture reveals several concrete improvements we can make to Hermes Agent's existing `ContextCompressor` (`agent/context_compressor.py`). The research was prompted by a [prompt injection experiment](https://x.com/DimitrisPapail) demonstrating that Codex's encrypted API `compact()` path uses the **same prompts** internally as the open-source local path — validating these prompts as OpenAI's production-tested approach.

The Codex CLI (cloned at `~/agent-codebases/codex`) implements two compaction paths: a **local LLM path** (for non-OpenAI models) and a **remote API path** (for OpenAI models, calling `POST /responses/compact`). The decision is a simple `provider.is_openai()` check (`codex-rs/core/src/compact.rs:50`). The local path is fully visible and the prompts are well-crafted for task continuity. The remote path returns encrypted blobs, but prompt injection revealed it uses near-identical prompts internally.

Hermes already has working context compression (fires at 85% threshold, uses Gemini Flash for summarization, protects head/tail turns). This issue proposes **quality-focused improvements** to how compaction works — better prompts, smarter preservation, handoff framing — not a rewrite.

---

## Research Findings

### Codex Compaction Architecture

**Local Compaction Flow** (`codex-rs/core/src/compact.rs`):
1. The compaction prompt is sent as a user message to the model
2. Model generates a summary (streamed via normal inference)
3. Summary is prepended with `SUMMARY_PREFIX` (the handoff prompt)
4. New history is built: `[preserved user messages, up to 20K tokens] + [summary as user message]`
5. Stale developer/system messages are stripped
6. Fresh initial context (permissions, environment, instructions) is re-injected
7. Ghost snapshots (for `/undo`) are preserved across compaction

**The Compaction Prompt** (`codex-rs/core/templates/compact/prompt.md`):
```
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff
summary for another LLM that will resume the task.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue

Be concise, structured, and focused on helping the next LLM seamlessly
continue the work.
```

**The Handoff Prompt** (`codex-rs/core/templates/compact/summary_prefix.md`):
```
Another language model started to solve this problem and produced a
summary of its thinking process. You also have access to the state of
the tools that were used by that language model. Use this to build on
the work that has already been done and avoid duplicating work. Here is
the summary produced by the other language model, use the information
in this summary to assist with your own analysis:
```

**Remote Compaction Flow** (`codex-rs/core/src/compact_remote.rs`):
1. Trims function call history items that exceed context window
2. Calls `POST /responses/compact` with full conversation + instructions
3. API returns `Vec<ResponseItem>` — may include `Compaction { encrypted_content }` blobs
4. Processes returned history: drops stale developer messages, drops non-user-content items
5. Re-injects fresh initial context before the last real user message
6. Replaces session history with processed result

**Key Design Decisions in Codex:**
- Summary is always encoded as a **user message** (not system/developer)
- User's actual messages preserved **by content** (up to 20K tokens, most recent first) — not by position
- Initial context is **always refreshed** after compaction — stale system/developer messages stripped and rebuilt fresh
- Custom compaction prompts supported via `config.compact_prompt`
- Warning emitted about degradation: "Long threads and multiple compactions can cause the model to be less accurate"
- Model-switch-aware compaction: if switching to smaller context window model, compact first using the old model

### What the Prompt Injection Reveals

The prompt injection experiment (2 API calls, 35 lines of Python) showed:
1. The encrypted `compact()` API **does use an LLM** internally to summarize context
2. The server-side compaction prompt is **nearly identical** to the open-source `prompt.md`
3. A handoff prompt is **prepended** to the summary before it's encrypted
4. The encryption likely serves: tamper prevention between API calls, consistent client surface, potentially additional metadata

This validates the open-source prompts as production-grade. We can confidently adopt them.

### Open Question from the Research

Why does Codex use two entirely different compaction paths when the prompts are nearly identical? Possible reasons:
- The encrypted blob may carry metadata beyond the summary (tool state, token accounting)
- Encryption prevents the client from tampering with the summary between calls
- Remote compaction may use a specialized model variant optimized for summarization
- API path provides a consistent interface regardless of model — the server handles routing

---

## Current State in Hermes Agent

**What we have** (`agent/context_compressor.py`, 265 lines):

| Feature | Hermes Current | Codex Approach |
|---------|---------------|----------------|
| **Compaction prompt** | Generic: "Summarize these conversation turns concisely" | Task-oriented: "CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task" |
| **Handoff framing** | None — summary inserted with `[CONTEXT SUMMARY]:` text prefix | Explicit: "Another language model started to solve this problem and produced a summary..." |
| **Message preservation** | First 3 + Last 4 turns (positional) | User messages up to 20K tokens (semantic, most-recent-first) |
| **System context** | Appends note to system prompt on first compression | Strips all stale developer/system messages, re-injects fresh context |
| **Summarization model** | Gemini Flash via auxiliary client | Same model (local path) or API (remote) |
| **Custom prompt** | Not configurable | `config.compact_prompt` override |
| **Degradation warning** | None | Warns about accuracy loss from multiple compactions |
| **Model-switch** | Not handled | Preemptively compacts with old model before switching to smaller context |
| **Memory flush** | Yes (`flush_memories()`) | No equivalent (advantage for Hermes) |
| **Manual command** | `/compress` | `/compact` |

**Relevant existing issues:**
- #415: Insertion-Time Tool Result Trimming (complementary — trims individual tool outputs)
- #480: LLM-Based Context Condensation (Phase 1-2 already exist; updated with comment)
- #132: Unsafe context length assumption (closed)

---

## Implementation Plan

### Skill vs. Tool Classification

This is a **core codebase change** to `agent/context_compressor.py` and `run_agent.py`. It modifies the internal conversation management pipeline — not expressible as a skill (no shell commands involved) and not a new tool (no new user-facing function). Per CONTRIBUTING.md criteria, this is neither skill nor tool.

### What We'd Need

1. **New compaction prompt** — Replace the inline summarization prompt with a Codex-style task-oriented handoff prompt
2. **Handoff prefix** — Prepend a framing message to the summary before inserting it
3. **User message collector** — Extract and preserve actual user messages (up to a token budget) rather than positional turns
4. **System context refresh** — After compaction, rebuild system prompt fresh instead of appending a note
5. **Custom prompt config** — Support `CONTEXT_COMPACTION_PROMPT` env var or config.yaml setting
6. **Degradation warning** — Emit a warning on repeated compactions

### Phased Rollout

**Phase 1: Compaction Prompt & Handoff Framing** (small, high-impact)
- Replace the generic summarization prompt in `_generate_summary()` with a task-oriented compaction prompt (adapted from Codex's `prompt.md`)
- Add `SUMMARY_PREFIX` handoff framing to the inserted summary message
- The summary message changes from `[CONTEXT SUMMARY]: <raw summary>` to `<SUMMARY_PREFIX>\n<task-oriented summary>`
- Support custom compaction prompt via `CONTEXT_COMPACTION_PROMPT` env var
- Estimated: ~30 lines changed in `context_compressor.py`

Proposed compaction prompt (adapted for Hermes):
```
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff
summary for the AI assistant that will resume this conversation.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences discovered
- What remains to be done (clear next steps)
- Any critical data: file paths, variable names, URLs, error messages,
  or code snippets needed to continue
- Tool calls made and their key results

Be concise, structured, and focused on helping the assistant seamlessly
continue the work without re-doing what's already been done.
```

Proposed handoff prefix:
```
[CONTEXT COMPACTION] An earlier part of this conversation was
summarized to preserve context space. Below is the summary — use it to
build on the work already done and avoid duplicating effort:
```

**Phase 2: Smart Message Preservation** (medium complexity)
- Replace positional `protect_first_n/protect_last_n` with semantic preservation
- Always keep: system message(s), all user messages (up to configurable token budget, default 20K)
- Summarize: assistant responses, tool calls, tool results (the bulk of context)
- This better preserves user intent and task continuity across compaction
- Estimated: ~50 lines changed in `compress()` method

**Phase 3: System Context Refresh & Robustness** (medium complexity)
- After compaction, fully rebuild the system prompt from scratch instead of appending notes
- Strip any stale system-injected messages from the compressed conversation
- Add degradation warning after 2+ compactions: "Long sessions with multiple compressions may cause accuracy loss. Consider starting a new session."
- Model-switch detection: if `self.model` changed since last call and context is large, trigger preemptive compaction
- Estimated: ~40 lines across `context_compressor.py` and `run_agent.py`

---

## Pros & Cons

### Pros
- **Better compaction quality**: Task-oriented prompts preserve intent and next-steps better than generic summarization
- **Handoff framing reduces confusion**: The model knows the summary exists and should build on it, not re-do work
- **User message preservation**: Keeps what the user actually asked for, even across compaction boundaries
- **Validated approach**: These exact prompts are used in production by OpenAI's Codex CLI (confirmed via prompt injection)
- **Low risk**: Phase 1 is a prompt change with no architectural modifications
- **Backward compatible**: No changes to the compression trigger logic, threshold, or auxiliary model routing

### Cons / Risks
- **Summary format change**: Existing sessions that relied on `[CONTEXT SUMMARY]:` prefix may not parse the new format — but this is internal, not user-facing
- **Prompt sensitivity**: The quality of compaction is very sensitive to the prompt wording; needs testing across models (Gemini Flash, GPT-4o-mini, local models)
- **User message preservation cost**: Keeping more user messages means the summary has less token budget — need to balance
- **Multiple compaction degradation**: Even with better prompts, "summaries of summaries" still lose information over time (Codex had a known bug with this before their Compactor 2 rewrite)

---

## Open Questions

- Should the handoff prefix use the `user` role (like Codex) or a distinct marker format? Using `user` role is simpler but could confuse models that track conversation turns. Codex does it this way and it works.
- Should we support a `CONTEXT_COMPACTION_PROMPT_FILE` path (like Codex's `experimental_compact_prompt_file`) for multi-line custom prompts?
- Should Phase 2 user message preservation use a fixed token budget (20K like Codex) or a percentage of the context window (e.g., 15%)?
- What's the right threshold for the degradation warning? Codex warns after every compaction. Should Hermes warn after the 2nd or 3rd?
- Should we consider using the main model for compaction instead of/in addition to the auxiliary model? Codex's local path uses the main model, which may produce better summaries at the cost of more tokens.

---

## References

- Codex CLI source: `~/agent-codebases/codex/codex-rs/core/src/compact.rs` (local compaction)
- Codex CLI source: `~/agent-codebases/codex/codex-rs/core/src/compact_remote.rs` (remote compaction)
- Codex compaction prompt: `~/agent-codebases/codex/codex-rs/core/templates/compact/prompt.md`
- Codex handoff prompt: `~/agent-codebases/codex/codex-rs/core/templates/compact/summary_prefix.md`
- Hermes compressor: `agent/context_compressor.py`
- Hermes integration: `run_agent.py:2380-2412` (compression), `run_agent.py:2880-2922` (preflight), `run_agent.py:3691` (mid-loop)
- Prompt injection research by Dimitris Papailiopoulos (@DimitrisPapail on X/Twitter)
- OpenAI compaction API docs: https://developers.openai.com/api/docs/guides/compaction/
- Codex CLI compaction issues: #5957 (context loss), #5799 (obscured context), #10986 (forced remote), #13142 (custom control)
- Related Hermes issues: #415 (tool result trimming), #480 (context condensation — already implemented), #132 (context length)
- Community research: https://gist.github.com/badlogic (cross-agent compaction comparison)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Context Compaction Quality Overhaul — Handoff-Oriented Prompts, User Message Preservation, and Configurable Compaction (inspired by Codex CLI) #499

Overview

Research Findings

Codex Compaction Architecture

What the Prompt Injection Reveals

Open Question from the Research

Current State in Hermes Agent

Implementation Plan

Skill vs. Tool Classification

What We'd Need

Phased Rollout

Pros & Cons

Pros

Cons / Risks

Open Questions

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Feature	Hermes Current	Codex Approach
Compaction prompt	Generic: "Summarize these conversation turns concisely"	Task-oriented: "CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task"
Handoff framing	None — summary inserted with `[CONTEXT SUMMARY]:` text prefix	Explicit: "Another language model started to solve this problem and produced a summary..."
Message preservation	First 3 + Last 4 turns (positional)	User messages up to 20K tokens (semantic, most-recent-first)
System context	Appends note to system prompt on first compression	Strips all stale developer/system messages, re-injects fresh context
Summarization model	Gemini Flash via auxiliary client	Same model (local path) or API (remote)
Custom prompt	Not configurable	`config.compact_prompt` override
Degradation warning	None	Warns about accuracy loss from multiple compactions
Model-switch	Not handled	Preemptively compacts with old model before switching to smaller context
Memory flush	Yes (`flush_memories()`)	No equivalent (advantage for Hermes)
Manual command	`/compress`	`/compact`

Feature: Context Compaction Quality Overhaul — Handoff-Oriented Prompts, User Message Preservation, and Configurable Compaction (inspired by Codex CLI) #499

Description

Overview

Research Findings

Codex Compaction Architecture

What the Prompt Injection Reveals

Open Question from the Research

Current State in Hermes Agent

Implementation Plan

Skill vs. Tool Classification

What We'd Need

Phased Rollout

Pros & Cons

Pros

Cons / Risks

Open Questions

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions