feat: Align context compaction with Codex behavior #776
Closed
kshitijk4poor wants to merge 2 commits into
…ction-parity # Conflicts: # cli.py # gateway/run.py # hermes_cli/config.py # tests/agent/test_context_compressor.py
Thanks for the thorough work on this, @kshitijk4poor! The Codex-style compaction research in #499 is solid and there are several genuinely valuable improvements in this PR — the handoff prompt, multimodal content handling, and configurable compaction prompt are all things we want. However, merging the PR as-is would introduce some issues:
We're going to cherry-pick the good parts into separate atomic PRs:
Your work on #499 and this PR directly inspired these improvements. Thank you! 🙏
teknium1
added a commit
that referenced
this pull request
Mar 11, 2026
Replace the generic summarization prompt ('Summarize these conversation
turns concisely') with a task-oriented handoff prompt inspired by
OpenAI's Codex CLI compaction flow (researched in #499).
The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION'
and instructs the summarization model to produce a structured handoff
summary that includes:
- Current progress and key decisions
- User preferences and constraints discovered
- Clear next steps remaining
- Critical data (file paths, URLs, error messages, code snippets)
- Tool calls made and their key results
This produces better summaries because the model understands the summary
will be used by another LLM to continue the work, rather than treating
it as a generic text compression task.
No behavioral change to the compression algorithm itself — same
positional protection, same role alternation, same [CONTEXT SUMMARY]:
prefix. Only the prompt sent to the summarization model changes.
Inspired by PR #776 by @kshitijk4poor.
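A minimal sketch of how such a handoff prompt could be assembled from the sections listed in the commit message above. The function name `build_handoff_prompt`, the `HANDOFF_SECTIONS` list, and the exact wording are assumptions for illustration. The real prompt text in Hermes is not shown in this thread.

```python
# Hypothetical sketch: names and wording are illustrative, not the Hermes source.
HANDOFF_SECTIONS = [
    "Current progress and key decisions",
    "User preferences and constraints discovered",
    "Clear next steps remaining",
    "Critical data (file paths, URLs, error messages, code snippets)",
    "Tool calls made and their key results",
]


def build_handoff_prompt(turns_text: str, target_tokens: int) -> str:
    """Build a task-oriented summarization prompt for the turns being compacted."""
    bullets = "\n".join(f"- {section}" for section in HANDOFF_SECTIONS)
    return (
        "You are performing a CONTEXT CHECKPOINT COMPACTION.\n"
        "Write a handoff summary for another LLM that will continue this task.\n"
        f"Include:\n{bullets}\n"
        f"Aim for roughly {target_tokens} tokens.\n"
        # The commit keeps the existing prefix instruction unchanged.
        "Start the summary with '[CONTEXT SUMMARY]:'.\n\n"
        f"TURNS TO SUMMARIZE:\n{turns_text}"
    )
```

Only the prompt string differs from a generic "summarize concisely" request. Nothing about which turns are selected or protected is touched, which matches the "no behavioral change" note in the commit.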
teknium1
added a commit
that referenced
this pull request
Mar 11, 2026
Replace the old '[CONTEXT SUMMARY]:' prefix on compressed summaries
with a Codex-inspired handoff framing that tells the model what happened
and how to use the summary.
What changes:
1. New SUMMARY_PREFIX constant — the text prepended to every
compressed summary:
[CONTEXT COMPACTION] An earlier part of this conversation was
summarized to preserve context space. Below is the summary — use
it to build on the work already done and avoid duplicating effort:
2. _with_summary_prefix() helper — normalizes model output by stripping
any legacy '[CONTEXT SUMMARY]:' prefix the summarization model may
have produced, then prepends the new SUMMARY_PREFIX.
3. System message annotation updated — the note appended to the system
prompt on first compression now says 'compacted into a handoff
summary' and instructs 'build on that summary rather than re-doing
work' instead of the old generic note.
Why this is better:
The old prefix ('[CONTEXT SUMMARY]: <raw text>') gave the model no
context about what the summary is or how to use it. The new prefix
explicitly frames it as a context compaction event and instructs the
model to build on prior work rather than re-doing it. This reduces
redundant tool calls and file re-reads after compression.
What does NOT change:
- The compression algorithm (positional protection, boundary alignment)
- The role alternation logic (summary role adapts to avoid consecutive
same-role messages)
- The summarization model or trigger thresholds
- LEGACY_SUMMARY_PREFIX is exported for backward compatibility
Inspired by PR #776 by @kshitijk4poor and the research in #499.
teknium1
added a commit
that referenced
this pull request
Mar 11, 2026
The _generate_summary() method assumed message content is always a
string (msg.get('content') or ''). When content is a multimodal list
(e.g. [{type: 'text', text: '...'}, {type: 'image_url', ...}]), this
produced mangled output: len() returned the list length instead of
character count, and slicing produced list items instead of substrings.
Add _content_to_text() helper that safely converts any content format
to plain text:
- str → returned as-is
- None → empty string
- list (multimodal) → text parts joined, images replaced with [image]
- dict/other → JSON serialization with str() fallback
This ensures multimodal conversations compress correctly instead of
producing garbled summaries.
Inspired by PR #776 by @kshitijk4poor.
teknium1
added a commit
that referenced
this pull request
Mar 11, 2026
Add a compression.prompt config option that lets users override the
default summarization prompt used during context compression.
What changes:
1. ContextCompressor.__init__() accepts compaction_prompt_override param.
When set (non-empty string), it replaces the default summarization
instructions in _generate_summary(). The framing (token target, turns
to summarize, [CONTEXT SUMMARY]: prefix instruction) stays the same.
2. run_agent.py reads CONTEXT_COMPRESSION_PROMPT env var and passes it
to ContextCompressor.
3. Config wiring — the new 'prompt' key under 'compression' section is
mapped to CONTEXT_COMPRESSION_PROMPT env var in:
- cli.py (load_cli_config defaults + env mapping)
- hermes_cli/config.py (DEFAULT_CONFIG + show_config display)
- gateway/run.py (gateway env mapping)
Usage in config.yaml:
compression:
prompt: 'Your custom summarization instructions here'
Or via environment variable:
CONTEXT_COMPRESSION_PROMPT='Your custom instructions'
When empty (default), the built-in summarization prompt is used
unchanged. This gives power users control over how context is
compressed without modifying source code.
Inspired by PR #776 by @kshitijk4poor and the research in #499.
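The override wiring could be sketched as below. `ContextCompressorSketch`, `DEFAULT_INSTRUCTIONS`, and `compressor_from_env` are hypothetical stand-ins, not the real Hermes classes. Only the `compaction_prompt_override` parameter name and the `CONTEXT_COMPRESSION_PROMPT` variable come from the commit message. Treating a whitespace-only value as empty is an assumption.

```python
import os
from typing import Mapping, Optional

# Placeholder for the built-in summarization instructions (not the real text).
DEFAULT_INSTRUCTIONS = "Summarize the turns below as a handoff summary."


class ContextCompressorSketch:
    """Simplified stand-in showing only the prompt-override behavior."""

    def __init__(self, compaction_prompt_override: Optional[str] = None):
        override = (compaction_prompt_override or "").strip()
        # Empty override (the default) keeps the built-in instructions.
        self.summary_instructions = override or DEFAULT_INSTRUCTIONS


def compressor_from_env(env: Optional[Mapping[str, str]] = None) -> ContextCompressorSketch:
    """Read CONTEXT_COMPRESSION_PROMPT and pass it to the compressor."""
    env = os.environ if env is None else env
    return ContextCompressorSketch(
        compaction_prompt_override=env.get("CONTEXT_COMPRESSION_PROMPT", "")
    )
```

Passing the environment as a mapping keeps the sketch testable without mutating `os.environ`.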
This was referenced Mar 11, 2026
teknium1
added a commit
that referenced
this pull request
Mar 14, 2026
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
This was referenced May 16, 2026
This was referenced May 29, 2026
CumulusService
pushed a commit
to Cumulus-Service-GmbH/hermes-agent
that referenced
this pull request
May 30, 2026
Summary
This updates Hermes context compaction to closely mirror the Codex compaction flow described in #499 and cross-verified against the original openai/codex implementation and tests.
This is intended as a behavioral match, not a loose approximation: reviewers should be able to compare the resulting Hermes flow directly against the Codex paths referenced in the issue.
Closes #499.
What changed
content value is a string
Cross-verification
This PR mimics the original Codex behavior and can be cross-checked against the implementation/tests cited from openai/codex in #499:
One detail worth calling out explicitly: the "keep latest user full" behavior is now scoped only to preflight compaction. That matches the Codex behavior cited in #499; ordinary compaction still allows the newest preserved user message to be truncated within the configured budget.
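To illustrate the scoping described above, here is a deliberately small sketch. The function name `preserve_latest_user_text`, the `preflight` flag, and a budget measured in characters are all assumptions for this example. The PR itself refers only to a "configured budget" and does not show the code.

```python
def preserve_latest_user_text(text: str, budget_chars: int, preflight: bool) -> str:
    """Return the newest preserved user message, truncated only when allowed.

    Hypothetical helper: preflight compaction keeps the message in full,
    ordinary compaction may truncate it to the budget.
    """
    if preflight or len(text) <= budget_chars:
        return text
    return text[:budget_chars]
```

The only branch that differs between the two modes is whether an over-budget message is cut, which is the distinction the paragraph above calls out.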