Overview
Deep research into OpenAI's Codex CLI context compaction architecture reveals several concrete improvements we can make to Hermes Agent's existing ContextCompressor (agent/context_compressor.py). The research was prompted by a prompt injection experiment demonstrating that Codex's encrypted API compact() path uses the same prompts internally as the open-source local path — validating these prompts as OpenAI's production-tested approach.
The Codex CLI (cloned at ~/agent-codebases/codex) implements two compaction paths: a local LLM path (for non-OpenAI models) and a remote API path (for OpenAI models, calling POST /responses/compact). The decision is a simple provider.is_openai() check (codex-rs/core/src/compact.rs:50). The local path is fully visible and the prompts are well-crafted for task continuity. The remote path returns encrypted blobs, but prompt injection revealed it uses near-identical prompts internally.
Hermes already has working context compression (fires at 85% threshold, uses Gemini Flash for summarization, protects head/tail turns). This issue proposes quality-focused improvements to how compaction works — better prompts, smarter preservation, handoff framing — not a rewrite.
Research Findings
Codex Compaction Architecture
Local Compaction Flow (codex-rs/core/src/compact.rs):
- The compaction prompt is sent as a user message to the model
- Model generates a summary (streamed via normal inference)
- Summary is prepended with
SUMMARY_PREFIX (the handoff prompt)
- New history is built:
[preserved user messages, up to 20K tokens] + [summary as user message]
- Stale developer/system messages are stripped
- Fresh initial context (permissions, environment, instructions) is re-injected
- Ghost snapshots (for
/undo) are preserved across compaction
The Compaction Prompt (codex-rs/core/templates/compact/prompt.md):
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff
summary for another LLM that will resume the task.
Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue
Be concise, structured, and focused on helping the next LLM seamlessly
continue the work.
The Handoff Prompt (codex-rs/core/templates/compact/summary_prefix.md):
Another language model started to solve this problem and produced a
summary of its thinking process. You also have access to the state of
the tools that were used by that language model. Use this to build on
the work that has already been done and avoid duplicating work. Here is
the summary produced by the other language model, use the information
in this summary to assist with your own analysis:
Remote Compaction Flow (codex-rs/core/src/compact_remote.rs):
- Trims function call history items that exceed context window
- Calls
POST /responses/compact with full conversation + instructions
- API returns
Vec<ResponseItem> — may include Compaction { encrypted_content } blobs
- Processes returned history: drops stale developer messages, drops non-user-content items
- Re-injects fresh initial context before the last real user message
- Replaces session history with processed result
Key Design Decisions in Codex:
- Summary is always encoded as a user message (not system/developer)
- User's actual messages preserved by content (up to 20K tokens, most recent first) — not by position
- Initial context is always refreshed after compaction — stale system/developer messages stripped and rebuilt fresh
- Custom compaction prompts supported via
config.compact_prompt
- Warning emitted about degradation: "Long threads and multiple compactions can cause the model to be less accurate"
- Model-switch-aware compaction: if switching to smaller context window model, compact first using the old model
What the Prompt Injection Reveals
The prompt injection experiment (2 API calls, 35 lines of Python) showed:
- The encrypted
compact() API does use an LLM internally to summarize context
- The server-side compaction prompt is nearly identical to the open-source
prompt.md
- A handoff prompt is prepended to the summary before it's encrypted
- The encryption likely serves: tamper prevention between API calls, consistent client surface, potentially additional metadata
This validates the open-source prompts as production-grade. We can confidently adopt them.
Open Question from the Research
Why does Codex use two entirely different compaction paths when the prompts are nearly identical? Possible reasons:
- The encrypted blob may carry metadata beyond the summary (tool state, token accounting)
- Encryption prevents the client from tampering with the summary between calls
- Remote compaction may use a specialized model variant optimized for summarization
- API path provides a consistent interface regardless of model — the server handles routing
Current State in Hermes Agent
What we have (agent/context_compressor.py, 265 lines):
| Feature |
Hermes Current |
Codex Approach |
| Compaction prompt |
Generic: "Summarize these conversation turns concisely" |
Task-oriented: "CONTEXT CHECKPOINT COMPACTION. Create a handoff summary for another LLM that will resume the task" |
| Handoff framing |
None — summary inserted with [CONTEXT SUMMARY]: text prefix |
Explicit: "Another language model started to solve this problem and produced a summary..." |
| Message preservation |
First 3 + Last 4 turns (positional) |
User messages up to 20K tokens (semantic, most-recent-first) |
| System context |
Appends note to system prompt on first compression |
Strips all stale developer/system messages, re-injects fresh context |
| Summarization model |
Gemini Flash via auxiliary client |
Same model (local path) or API (remote) |
| Custom prompt |
Not configurable |
config.compact_prompt override |
| Degradation warning |
None |
Warns about accuracy loss from multiple compactions |
| Model-switch |
Not handled |
Preemptively compacts with old model before switching to smaller context |
| Memory flush |
Yes (flush_memories()) |
No equivalent (advantage for Hermes) |
| Manual command |
/compress |
/compact |
Relevant existing issues:
Implementation Plan
Skill vs. Tool Classification
This is a core codebase change to agent/context_compressor.py and run_agent.py. It modifies the internal conversation management pipeline — not expressible as a skill (no shell commands involved) and not a new tool (no new user-facing function). Per CONTRIBUTING.md criteria, this is neither skill nor tool.
What We'd Need
- New compaction prompt — Replace the inline summarization prompt with a Codex-style task-oriented handoff prompt
- Handoff prefix — Prepend a framing message to the summary before inserting it
- User message collector — Extract and preserve actual user messages (up to a token budget) rather than positional turns
- System context refresh — After compaction, rebuild system prompt fresh instead of appending a note
- Custom prompt config — Support
CONTEXT_COMPACTION_PROMPT env var or config.yaml setting
- Degradation warning — Emit a warning on repeated compactions
Phased Rollout
Phase 1: Compaction Prompt & Handoff Framing (small, high-impact)
- Replace the generic summarization prompt in
_generate_summary() with a task-oriented compaction prompt (adapted from Codex's prompt.md)
- Add
SUMMARY_PREFIX handoff framing to the inserted summary message
- The summary message changes from
[CONTEXT SUMMARY]: <raw summary> to <SUMMARY_PREFIX>\n<task-oriented summary>
- Support custom compaction prompt via
CONTEXT_COMPACTION_PROMPT env var
- Estimated: ~30 lines changed in
context_compressor.py
Proposed compaction prompt (adapted for Hermes):
You are performing a CONTEXT CHECKPOINT COMPACTION. Create a handoff
summary for the AI assistant that will resume this conversation.
Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences discovered
- What remains to be done (clear next steps)
- Any critical data: file paths, variable names, URLs, error messages,
or code snippets needed to continue
- Tool calls made and their key results
Be concise, structured, and focused on helping the assistant seamlessly
continue the work without re-doing what's already been done.
Proposed handoff prefix:
[CONTEXT COMPACTION] An earlier part of this conversation was
summarized to preserve context space. Below is the summary — use it to
build on the work already done and avoid duplicating effort:
Phase 2: Smart Message Preservation (medium complexity)
- Replace positional
protect_first_n/protect_last_n with semantic preservation
- Always keep: system message(s), all user messages (up to configurable token budget, default 20K)
- Summarize: assistant responses, tool calls, tool results (the bulk of context)
- This better preserves user intent and task continuity across compaction
- Estimated: ~50 lines changed in
compress() method
Phase 3: System Context Refresh & Robustness (medium complexity)
- After compaction, fully rebuild the system prompt from scratch instead of appending notes
- Strip any stale system-injected messages from the compressed conversation
- Add degradation warning after 2+ compactions: "Long sessions with multiple compressions may cause accuracy loss. Consider starting a new session."
- Model-switch detection: if
self.model changed since last call and context is large, trigger preemptive compaction
- Estimated: ~40 lines across
context_compressor.py and run_agent.py
Pros & Cons
Pros
- Better compaction quality: Task-oriented prompts preserve intent and next-steps better than generic summarization
- Handoff framing reduces confusion: The model knows the summary exists and should build on it, not re-do work
- User message preservation: Keeps what the user actually asked for, even across compaction boundaries
- Validated approach: These exact prompts are used in production by OpenAI's Codex CLI (confirmed via prompt injection)
- Low risk: Phase 1 is a prompt change with no architectural modifications
- Backward compatible: No changes to the compression trigger logic, threshold, or auxiliary model routing
Cons / Risks
- Summary format change: Existing sessions that relied on
[CONTEXT SUMMARY]: prefix may not parse the new format — but this is internal, not user-facing
- Prompt sensitivity: The quality of compaction is very sensitive to the prompt wording; needs testing across models (Gemini Flash, GPT-4o-mini, local models)
- User message preservation cost: Keeping more user messages means the summary has less token budget — need to balance
- Multiple compaction degradation: Even with better prompts, "summaries of summaries" still lose information over time (Codex had a known bug with this before their Compactor 2 rewrite)
Open Questions
- Should the handoff prefix use the
user role (like Codex) or a distinct marker format? Using user role is simpler but could confuse models that track conversation turns. Codex does it this way and it works.
- Should we support a
CONTEXT_COMPACTION_PROMPT_FILE path (like Codex's experimental_compact_prompt_file) for multi-line custom prompts?
- Should Phase 2 user message preservation use a fixed token budget (20K like Codex) or a percentage of the context window (e.g., 15%)?
- What's the right threshold for the degradation warning? Codex warns after every compaction. Should Hermes warn after the 2nd or 3rd?
- Should we consider using the main model for compaction instead of/in addition to the auxiliary model? Codex's local path uses the main model, which may produce better summaries at the cost of more tokens.
References
Overview
Deep research into OpenAI's Codex CLI context compaction architecture reveals several concrete improvements we can make to Hermes Agent's existing
ContextCompressor(agent/context_compressor.py). The research was prompted by a prompt injection experiment demonstrating that Codex's encrypted APIcompact()path uses the same prompts internally as the open-source local path — validating these prompts as OpenAI's production-tested approach.The Codex CLI (cloned at
~/agent-codebases/codex) implements two compaction paths: a local LLM path (for non-OpenAI models) and a remote API path (for OpenAI models, callingPOST /responses/compact). The decision is a simpleprovider.is_openai()check (codex-rs/core/src/compact.rs:50). The local path is fully visible and the prompts are well-crafted for task continuity. The remote path returns encrypted blobs, but prompt injection revealed it uses near-identical prompts internally.Hermes already has working context compression (fires at 85% threshold, uses Gemini Flash for summarization, protects head/tail turns). This issue proposes quality-focused improvements to how compaction works — better prompts, smarter preservation, handoff framing — not a rewrite.
Research Findings
Codex Compaction Architecture
Local Compaction Flow (
codex-rs/core/src/compact.rs):SUMMARY_PREFIX(the handoff prompt)[preserved user messages, up to 20K tokens] + [summary as user message]/undo) are preserved across compactionThe Compaction Prompt (
codex-rs/core/templates/compact/prompt.md):The Handoff Prompt (
codex-rs/core/templates/compact/summary_prefix.md):Remote Compaction Flow (
codex-rs/core/src/compact_remote.rs):POST /responses/compactwith full conversation + instructionsVec<ResponseItem>— may includeCompaction { encrypted_content }blobsKey Design Decisions in Codex:
config.compact_promptWhat the Prompt Injection Reveals
The prompt injection experiment (2 API calls, 35 lines of Python) showed:
compact()API does use an LLM internally to summarize contextprompt.mdThis validates the open-source prompts as production-grade. We can confidently adopt them.
Open Question from the Research
Why does Codex use two entirely different compaction paths when the prompts are nearly identical? Possible reasons:
Current State in Hermes Agent
What we have (
agent/context_compressor.py, 265 lines):[CONTEXT SUMMARY]:text prefixconfig.compact_promptoverrideflush_memories())/compress/compactRelevant existing issues:
Implementation Plan
Skill vs. Tool Classification
This is a core codebase change to
agent/context_compressor.pyandrun_agent.py. It modifies the internal conversation management pipeline — not expressible as a skill (no shell commands involved) and not a new tool (no new user-facing function). Per CONTRIBUTING.md criteria, this is neither skill nor tool.What We'd Need
CONTEXT_COMPACTION_PROMPTenv var or config.yaml settingPhased Rollout
Phase 1: Compaction Prompt & Handoff Framing (small, high-impact)
_generate_summary()with a task-oriented compaction prompt (adapted from Codex'sprompt.md)SUMMARY_PREFIXhandoff framing to the inserted summary message[CONTEXT SUMMARY]: <raw summary>to<SUMMARY_PREFIX>\n<task-oriented summary>CONTEXT_COMPACTION_PROMPTenv varcontext_compressor.pyProposed compaction prompt (adapted for Hermes):
Proposed handoff prefix:
Phase 2: Smart Message Preservation (medium complexity)
protect_first_n/protect_last_nwith semantic preservationcompress()methodPhase 3: System Context Refresh & Robustness (medium complexity)
self.modelchanged since last call and context is large, trigger preemptive compactioncontext_compressor.pyandrun_agent.pyPros & Cons
Pros
Cons / Risks
[CONTEXT SUMMARY]:prefix may not parse the new format — but this is internal, not user-facingOpen Questions
userrole (like Codex) or a distinct marker format? Usinguserrole is simpler but could confuse models that track conversation turns. Codex does it this way and it works.CONTEXT_COMPACTION_PROMPT_FILEpath (like Codex'sexperimental_compact_prompt_file) for multi-line custom prompts?References
~/agent-codebases/codex/codex-rs/core/src/compact.rs(local compaction)~/agent-codebases/codex/codex-rs/core/src/compact_remote.rs(remote compaction)~/agent-codebases/codex/codex-rs/core/templates/compact/prompt.md~/agent-codebases/codex/codex-rs/core/templates/compact/summary_prefix.mdagent/context_compressor.pyrun_agent.py:2380-2412(compression),run_agent.py:2880-2922(preflight),run_agent.py:3691(mid-loop)