Skip to content

feat: Align context compaction with Codex behavior#776

Closed
kshitijk4poor wants to merge 2 commits into
NousResearch:mainfrom
kshitijk4poor:fix/499-codex-compaction-parity
Closed

feat: Align context compaction with Codex behavior#776
kshitijk4poor wants to merge 2 commits into
NousResearch:mainfrom
kshitijk4poor:fix/499-codex-compaction-parity

Conversation

@kshitijk4poor

@kshitijk4poor kshitijk4poor commented Mar 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

This updates Hermes context compaction to closely mirror the Codex compaction flow described in #499 and cross-verified against the original openai/codex implementation and tests.

This is intended as a behavioral match, not a loose approximation: reviewers should be able to compare the resulting Hermes flow directly against the Codex paths referenced in the issue.

Closes #499.

What changed

  • add Codex-style handoff prompt/prefix handling and a configurable preserved-user token budget
  • preserve multimodal user content during compaction instead of assuming every user content value is a string
  • keep the incoming user request after the compaction summary in preflight compression so the next model call still targets the active prompt
  • keep ordinary/manual/reactive compaction token-limited, rather than exempting the newest preserved user turn in every compaction path
  • insert todo snapshots before the compaction summary so the current request remains the trailing active user turn in preflight compaction
  • wire the compaction prompt and preserved-user budget through CLI, gateway, and config defaults

Cross-verification

This PR mimics the original Codex behavior and can be cross-checked against the implementation/tests cited from openai/codex in #499:

  • token-limited preserved user selection in local compaction
  • pre-turn compaction excluding the incoming user request from the compaction request, then re-appending it afterward
  • multimodal follow-up preservation
  • summary ordering relative to the active user turn

One detail worth calling out explicitly: the "keep latest user full" behavior is now scoped only to preflight compaction. That matches the Codex behavior cited in #499; ordinary compaction still allows the newest preserved user message to be truncated within the configured budget.

…ction-parity

# Conflicts:
#	cli.py
#	gateway/run.py
#	hermes_cli/config.py
#	tests/agent/test_context_compressor.py
@kshitijk4poor kshitijk4poor changed the title Align context compaction with Codex behavior feat: Align context compaction with Codex behavior Mar 9, 2026
@teknium1

Copy link
Copy Markdown
Contributor

Thanks for the thorough work on this, @kshitijk4poor! The Codex-style compaction research in #499 is solid and there are several genuinely valuable improvements in this PR — the handoff prompt, multimodal content handling, and configurable compaction prompt are all things we want.

However, merging the PR as-is would introduce some issues:

  1. Consecutive user messages — the new compress() output drops all assistant/tool messages and can produce 3-5+ consecutive user messages, breaking role alternation for non-OpenAI providers (Anthropic, etc.)
  2. Multi-compaction warning fires on every compression — not just 2+ as intended
  3. File-read history preservation silently dropped — main's _compress_context preserves which files were read so the model doesn't re-read them after compression
  4. Dead code_align_boundary_forward/backward are still defined but never called
  5. 226 commits behind main with a merge conflict

We're going to cherry-pick the good parts into separate atomic PRs:

  • Codex-style compaction prompt
  • Codex-style handoff prefix
  • Multimodal content handling in summarization
  • Custom compaction prompt config option

Your work on #499 and this PR directly inspired these improvements. Thank you! 🙏

@teknium1 teknium1 closed this Mar 11, 2026
teknium1 added a commit that referenced this pull request Mar 11, 2026
Replace the generic summarization prompt ('Summarize these conversation
turns concisely') with a task-oriented handoff prompt inspired by
OpenAI's Codex CLI compaction flow (researched in #499).

The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION'
and instructs the summarization model to produce a structured handoff
summary that includes:
- Current progress and key decisions
- User preferences and constraints discovered
- Clear next steps remaining
- Critical data (file paths, URLs, error messages, code snippets)
- Tool calls made and their key results

This produces better summaries because the model understands the summary
will be used by another LLM to continue the work, rather than treating
it as a generic text compression task.

No behavioral change to the compression algorithm itself — same
positional protection, same role alternation, same [CONTEXT SUMMARY]:
prefix. Only the prompt sent to the summarization model changes.

Inspired by PR #776 by @kshitijk4poor.
teknium1 added a commit that referenced this pull request Mar 11, 2026
Replace the old '[CONTEXT SUMMARY]:' prefix on compressed summaries
with a Codex-inspired handoff framing that tells the model what happened
and how to use the summary.

What changes:

1. New SUMMARY_PREFIX constant — the text prepended to every
   compressed summary:

   [CONTEXT COMPACTION] An earlier part of this conversation was
   summarized to preserve context space. Below is the summary — use
   it to build on the work already done and avoid duplicating effort:

2. _with_summary_prefix() helper — normalizes model output by stripping
   any legacy '[CONTEXT SUMMARY]:' prefix the summarization model may
   have produced, then prepends the new SUMMARY_PREFIX.

3. System message annotation updated — the note appended to the system
   prompt on first compression now says 'compacted into a handoff
   summary' and instructs 'build on that summary rather than re-doing
   work' instead of the old generic note.

Why this is better:

The old prefix ('[CONTEXT SUMMARY]: <raw text>') gave the model no
context about what the summary is or how to use it. The new prefix
explicitly frames it as a context compaction event and instructs the
model to build on prior work rather than re-doing it. This reduces
redundant tool calls and file re-reads after compression.

What does NOT change:

- The compression algorithm (positional protection, boundary alignment)
- The role alternation logic (summary role adapts to avoid consecutive
  same-role messages)
- The summarization model or trigger thresholds
- LEGACY_SUMMARY_PREFIX is exported for backward compatibility

Inspired by PR #776 by @kshitijk4poor and the research in #499.
teknium1 added a commit that referenced this pull request Mar 11, 2026
The _generate_summary() method assumed message content is always a
string (msg.get('content') or ''). When content is a multimodal list
(e.g. [{type: 'text', text: '...'}, {type: 'image_url', ...}]), this
produced mangled output: len() returned the list length instead of
character count, and slicing produced list items instead of substrings.

Add _content_to_text() helper that safely converts any content format
to plain text:
- str → returned as-is
- None → empty string
- list (multimodal) → text parts joined, images replaced with [image]
- dict/other → JSON serialization with str() fallback

This ensures multimodal conversations compress correctly instead of
producing garbled summaries.

Inspired by PR #776 by @kshitijk4poor.
teknium1 added a commit that referenced this pull request Mar 11, 2026
Add a compression.prompt config option that lets users override the
default summarization prompt used during context compression.

What changes:

1. ContextCompressor.__init__() accepts compaction_prompt_override param.
   When set (non-empty string), it replaces the default summarization
   instructions in _generate_summary(). The framing (token target, turns
   to summarize, [CONTEXT SUMMARY]: prefix instruction) stays the same.

2. run_agent.py reads CONTEXT_COMPRESSION_PROMPT env var and passes it
   to ContextCompressor.

3. Config wiring — the new 'prompt' key under 'compression' section is
   mapped to CONTEXT_COMPRESSION_PROMPT env var in:
   - cli.py (load_cli_config defaults + env mapping)
   - hermes_cli/config.py (DEFAULT_CONFIG + show_config display)
   - gateway/run.py (gateway env mapping)

Usage in config.yaml:
  compression:
    prompt: 'Your custom summarization instructions here'

Or via environment variable:
  CONTEXT_COMPRESSION_PROMPT='Your custom instructions'

When empty (default), the built-in summarization prompt is used
unchanged. This gives power users control over how context is
compressed without modifying source code.

Inspired by PR #776 by @kshitijk4poor and the research in #499.
teknium1 added a commit that referenced this pull request Mar 14, 2026
Replace the generic summarization prompt ('Summarize these conversation
turns concisely') with a task-oriented handoff prompt inspired by
OpenAI's Codex CLI compaction flow (researched in #499).

The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION'
and instructs the summarization model to produce a structured handoff
summary that includes:
- Current progress and key decisions
- User preferences and constraints discovered
- Clear next steps remaining
- Critical data (file paths, URLs, error messages, code snippets)
- Tool calls made and their key results

This produces better summaries because the model understands the summary
will be used by another LLM to continue the work, rather than treating
it as a generic text compression task.

No behavioral change to the compression algorithm itself — same
positional protection, same role alternation, same [CONTEXT SUMMARY]:
prefix. Only the prompt sent to the summarization model changes.

Inspired by PR #776 by @kshitijk4poor.
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
Replace the generic summarization prompt ('Summarize these conversation
turns concisely') with a task-oriented handoff prompt inspired by
OpenAI's Codex CLI compaction flow (researched in NousResearch#499).

The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION'
and instructs the summarization model to produce a structured handoff
summary that includes:
- Current progress and key decisions
- User preferences and constraints discovered
- Clear next steps remaining
- Critical data (file paths, URLs, error messages, code snippets)
- Tool calls made and their key results

This produces better summaries because the model understands the summary
will be used by another LLM to continue the work, rather than treating
it as a generic text compression task.

No behavioral change to the compression algorithm itself — same
positional protection, same role alternation, same [CONTEXT SUMMARY]:
prefix. Only the prompt sent to the summarization model changes.

Inspired by PR NousResearch#776 by @kshitijk4poor.
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
Add a compression.prompt config option that lets users override the
default summarization prompt used during context compression.

What changes:

1. ContextCompressor.__init__() accepts compaction_prompt_override param.
   When set (non-empty string), it replaces the default summarization
   instructions in _generate_summary(). The framing (token target, turns
   to summarize, [CONTEXT SUMMARY]: prefix instruction) stays the same.

2. run_agent.py reads CONTEXT_COMPRESSION_PROMPT env var and passes it
   to ContextCompressor.

3. Config wiring — the new 'prompt' key under 'compression' section is
   mapped to CONTEXT_COMPRESSION_PROMPT env var in:
   - cli.py (load_cli_config defaults + env mapping)
   - hermes_cli/config.py (DEFAULT_CONFIG + show_config display)
   - gateway/run.py (gateway env mapping)

Usage in config.yaml:
  compression:
    prompt: 'Your custom summarization instructions here'

Or via environment variable:
  CONTEXT_COMPRESSION_PROMPT='Your custom instructions'

When empty (default), the built-in summarization prompt is used
unchanged. This gives power users control over how context is
compressed without modifying source code.

Inspired by PR NousResearch#776 by @kshitijk4poor and the research in NousResearch#499.
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
Replace the old '[CONTEXT SUMMARY]:' prefix on compressed summaries
with a Codex-inspired handoff framing that tells the model what happened
and how to use the summary.

What changes:

1. New SUMMARY_PREFIX constant — the text prepended to every
   compressed summary:

   [CONTEXT COMPACTION] An earlier part of this conversation was
   summarized to preserve context space. Below is the summary — use
   it to build on the work already done and avoid duplicating effort:

2. _with_summary_prefix() helper — normalizes model output by stripping
   any legacy '[CONTEXT SUMMARY]:' prefix the summarization model may
   have produced, then prepends the new SUMMARY_PREFIX.

3. System message annotation updated — the note appended to the system
   prompt on first compression now says 'compacted into a handoff
   summary' and instructs 'build on that summary rather than re-doing
   work' instead of the old generic note.

Why this is better:

The old prefix ('[CONTEXT SUMMARY]: <raw text>') gave the model no
context about what the summary is or how to use it. The new prefix
explicitly frames it as a context compaction event and instructs the
model to build on prior work rather than re-doing it. This reduces
redundant tool calls and file re-reads after compression.

What does NOT change:

- The compression algorithm (positional protection, boundary alignment)
- The role alternation logic (summary role adapts to avoid consecutive
  same-role messages)
- The summarization model or trigger thresholds
- LEGACY_SUMMARY_PREFIX is exported for backward compatibility

Inspired by PR NousResearch#776 by @kshitijk4poor and the research in NousResearch#499.
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
The _generate_summary() method assumed message content is always a
string (msg.get('content') or ''). When content is a multimodal list
(e.g. [{type: 'text', text: '...'}, {type: 'image_url', ...}]), this
produced mangled output: len() returned the list length instead of
character count, and slicing produced list items instead of substrings.

Add _content_to_text() helper that safely converts any content format
to plain text:
- str → returned as-is
- None → empty string
- list (multimodal) → text parts joined, images replaced with [image]
- dict/other → JSON serialization with str() fallback

This ensures multimodal conversations compress correctly instead of
producing garbled summaries.

Inspired by PR NousResearch#776 by @kshitijk4poor.
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
Replace the generic summarization prompt ('Summarize these conversation
turns concisely') with a task-oriented handoff prompt inspired by
OpenAI's Codex CLI compaction flow (researched in NousResearch#499).

The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION'
and instructs the summarization model to produce a structured handoff
summary that includes:
- Current progress and key decisions
- User preferences and constraints discovered
- Clear next steps remaining
- Critical data (file paths, URLs, error messages, code snippets)
- Tool calls made and their key results

This produces better summaries because the model understands the summary
will be used by another LLM to continue the work, rather than treating
it as a generic text compression task.

No behavioral change to the compression algorithm itself — same
positional protection, same role alternation, same [CONTEXT SUMMARY]:
prefix. Only the prompt sent to the summarization model changes.

Inspired by PR NousResearch#776 by @kshitijk4poor.
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
The _generate_summary() method assumed message content is always a
string (msg.get('content') or ''). When content is a multimodal list
(e.g. [{type: 'text', text: '...'}, {type: 'image_url', ...}]), this
produced mangled output: len() returned the list length instead of
character count, and slicing produced list items instead of substrings.

Add _content_to_text() helper that safely converts any content format
to plain text:
- str → returned as-is
- None → empty string
- list (multimodal) → text parts joined, images replaced with [image]
- dict/other → JSON serialization with str() fallback

This ensures multimodal conversations compress correctly instead of
producing garbled summaries.

Inspired by PR NousResearch#776 by @kshitijk4poor.
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
Replace the generic summarization prompt ('Summarize these conversation
turns concisely') with a task-oriented handoff prompt inspired by
OpenAI's Codex CLI compaction flow (researched in NousResearch#499).

The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION'
and instructs the summarization model to produce a structured handoff
summary that includes:
- Current progress and key decisions
- User preferences and constraints discovered
- Clear next steps remaining
- Critical data (file paths, URLs, error messages, code snippets)
- Tool calls made and their key results

This produces better summaries because the model understands the summary
will be used by another LLM to continue the work, rather than treating
it as a generic text compression task.

No behavioral change to the compression algorithm itself — same
positional protection, same role alternation, same [CONTEXT SUMMARY]:
prefix. Only the prompt sent to the summarization model changes.

Inspired by PR NousResearch#776 by @kshitijk4poor.
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
Add a compression.prompt config option that lets users override the
default summarization prompt used during context compression.

What changes:

1. ContextCompressor.__init__() accepts compaction_prompt_override param.
   When set (non-empty string), it replaces the default summarization
   instructions in _generate_summary(). The framing (token target, turns
   to summarize, [CONTEXT SUMMARY]: prefix instruction) stays the same.

2. run_agent.py reads CONTEXT_COMPRESSION_PROMPT env var and passes it
   to ContextCompressor.

3. Config wiring — the new 'prompt' key under 'compression' section is
   mapped to CONTEXT_COMPRESSION_PROMPT env var in:
   - cli.py (load_cli_config defaults + env mapping)
   - hermes_cli/config.py (DEFAULT_CONFIG + show_config display)
   - gateway/run.py (gateway env mapping)

Usage in config.yaml:
  compression:
    prompt: 'Your custom summarization instructions here'

Or via environment variable:
  CONTEXT_COMPRESSION_PROMPT='Your custom instructions'

When empty (default), the built-in summarization prompt is used
unchanged. This gives power users control over how context is
compressed without modifying source code.

Inspired by PR NousResearch#776 by @kshitijk4poor and the research in NousResearch#499.
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
Replace the old '[CONTEXT SUMMARY]:' prefix on compressed summaries
with a Codex-inspired handoff framing that tells the model what happened
and how to use the summary.

What changes:

1. New SUMMARY_PREFIX constant — the text prepended to every
   compressed summary:

   [CONTEXT COMPACTION] An earlier part of this conversation was
   summarized to preserve context space. Below is the summary — use
   it to build on the work already done and avoid duplicating effort:

2. _with_summary_prefix() helper — normalizes model output by stripping
   any legacy '[CONTEXT SUMMARY]:' prefix the summarization model may
   have produced, then prepends the new SUMMARY_PREFIX.

3. System message annotation updated — the note appended to the system
   prompt on first compression now says 'compacted into a handoff
   summary' and instructs 'build on that summary rather than re-doing
   work' instead of the old generic note.

Why this is better:

The old prefix ('[CONTEXT SUMMARY]: <raw text>') gave the model no
context about what the summary is or how to use it. The new prefix
explicitly frames it as a context compaction event and instructs the
model to build on prior work rather than re-doing it. This reduces
redundant tool calls and file re-reads after compression.

What does NOT change:

- The compression algorithm (positional protection, boundary alignment)
- The role alternation logic (summary role adapts to avoid consecutive
  same-role messages)
- The summarization model or trigger thresholds
- LEGACY_SUMMARY_PREFIX is exported for backward compatibility

Inspired by PR NousResearch#776 by @kshitijk4poor and the research in NousResearch#499.
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
Replace the generic summarization prompt ('Summarize these conversation
turns concisely') with a task-oriented handoff prompt inspired by
OpenAI's Codex CLI compaction flow (researched in NousResearch#499).

The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION'
and instructs the summarization model to produce a structured handoff
summary that includes:
- Current progress and key decisions
- User preferences and constraints discovered
- Clear next steps remaining
- Critical data (file paths, URLs, error messages, code snippets)
- Tool calls made and their key results

This produces better summaries because the model understands the summary
will be used by another LLM to continue the work, rather than treating
it as a generic text compression task.

No behavioral change to the compression algorithm itself — same
positional protection, same role alternation, same [CONTEXT SUMMARY]:
prefix. Only the prompt sent to the summarization model changes.

Inspired by PR NousResearch#776 by @kshitijk4poor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants