feat: Align context compaction with Codex behavior by kshitijk4poor · Pull Request #776 · NousResearch/hermes-agent

kshitijk4poor · 2026-03-09T16:57:03Z

Summary

This updates Hermes context compaction to closely mirror the Codex compaction flow described in #499 and cross-verified against the original openai/codex implementation and tests.

This is intended as a behavioral match, not a loose approximation: reviewers should be able to compare the resulting Hermes flow directly against the Codex paths referenced in the issue.

Closes #499.

What changed

add Codex-style handoff prompt/prefix handling and a configurable preserved-user token budget
preserve multimodal user content during compaction instead of assuming every user content value is a string
keep the incoming user request after the compaction summary in preflight compression so the next model call still targets the active prompt
keep ordinary/manual/reactive compaction token-limited, rather than exempting the newest preserved user turn in every compaction path
insert todo snapshots before the compaction summary so the current request remains the trailing active user turn in preflight compaction
wire the compaction prompt and preserved-user budget through CLI, gateway, and config defaults

Cross-verification

This PR mimics the original Codex behavior and can be cross-checked against the implementation/tests cited from openai/codex in #499:

token-limited preserved user selection in local compaction
pre-turn compaction excluding the incoming user request from the compaction request, then re-appending it afterward
multimodal follow-up preservation
summary ordering relative to the active user turn

One detail worth calling out explicitly: the "keep latest user full" behavior is now scoped only to preflight compaction. That matches the Codex behavior cited in #499; ordinary compaction still allows the newest preserved user message to be truncated within the configured budget.

…ction-parity # Conflicts: # cli.py # gateway/run.py # hermes_cli/config.py # tests/agent/test_context_compressor.py

teknium1 · 2026-03-11T12:37:02Z

Thanks for the thorough work on this, @kshitijk4poor! The Codex-style compaction research in #499 is solid and there are several genuinely valuable improvements in this PR — the handoff prompt, multimodal content handling, and configurable compaction prompt are all things we want.

However, merging the PR as-is would introduce some issues:

Consecutive user messages — the new compress() output drops all assistant/tool messages and can produce 3-5+ consecutive user messages, breaking role alternation for non-OpenAI providers (Anthropic, etc.)
Multi-compaction warning fires on every compression — not just 2+ as intended
File-read history preservation silently dropped — main's _compress_context preserves which files were read so the model doesn't re-read them after compression
Dead code — _align_boundary_forward/backward are still defined but never called
226 commits behind main with a merge conflict

We're going to cherry-pick the good parts into separate atomic PRs:

Codex-style compaction prompt
Codex-style handoff prefix
Multimodal content handling in summarization
Custom compaction prompt config option

Your work on #499 and this PR directly inspired these improvements. Thank you! 🙏

@kshitijk4poor

Replace the generic summarization prompt ('Summarize these conversation turns concisely') with a task-oriented handoff prompt inspired by OpenAI's Codex CLI compaction flow (researched in #499). The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION' and instructs the summarization model to produce a structured handoff summary that includes: - Current progress and key decisions - User preferences and constraints discovered - Clear next steps remaining - Critical data (file paths, URLs, error messages, code snippets) - Tool calls made and their key results This produces better summaries because the model understands the summary will be used by another LLM to continue the work, rather than treating it as a generic text compression task. No behavioral change to the compression algorithm itself — same positional protection, same role alternation, same [CONTEXT SUMMARY]: prefix. Only the prompt sent to the summarization model changes. Inspired by PR #776 by @kshitijk4poor.

@kshitijk4poor

Replace the old '[CONTEXT SUMMARY]:' prefix on compressed summaries with a Codex-inspired handoff framing that tells the model what happened and how to use the summary. What changes: 1. New SUMMARY_PREFIX constant — the text prepended to every compressed summary: [CONTEXT COMPACTION] An earlier part of this conversation was summarized to preserve context space. Below is the summary — use it to build on the work already done and avoid duplicating effort: 2. _with_summary_prefix() helper — normalizes model output by stripping any legacy '[CONTEXT SUMMARY]:' prefix the summarization model may have produced, then prepends the new SUMMARY_PREFIX. 3. System message annotation updated — the note appended to the system prompt on first compression now says 'compacted into a handoff summary' and instructs 'build on that summary rather than re-doing work' instead of the old generic note. Why this is better: The old prefix ('[CONTEXT SUMMARY]: <raw text>') gave the model no context about what the summary is or how to use it. The new prefix explicitly frames it as a context compaction event and instructs the model to build on prior work rather than re-doing it. This reduces redundant tool calls and file re-reads after compression. What does NOT change: - The compression algorithm (positional protection, boundary alignment) - The role alternation logic (summary role adapts to avoid consecutive same-role messages) - The summarization model or trigger thresholds - LEGACY_SUMMARY_PREFIX is exported for backward compatibility Inspired by PR #776 by @kshitijk4poor and the research in #499.

@kshitijk4poor

The _generate_summary() method assumed message content is always a string (msg.get('content') or ''). When content is a multimodal list (e.g. [{type: 'text', text: '...'}, {type: 'image_url', ...}]), this produced mangled output: len() returned the list length instead of character count, and slicing produced list items instead of substrings. Add _content_to_text() helper that safely converts any content format to plain text: - str → returned as-is - None → empty string - list (multimodal) → text parts joined, images replaced with [image] - dict/other → JSON serialization with str() fallback This ensures multimodal conversations compress correctly instead of producing garbled summaries. Inspired by PR #776 by @kshitijk4poor.

@kshitijk4poor

Add a compression.prompt config option that lets users override the default summarization prompt used during context compression. What changes: 1. ContextCompressor.__init__() accepts compaction_prompt_override param. When set (non-empty string), it replaces the default summarization instructions in _generate_summary(). The framing (token target, turns to summarize, [CONTEXT SUMMARY]: prefix instruction) stays the same. 2. run_agent.py reads CONTEXT_COMPRESSION_PROMPT env var and passes it to ContextCompressor. 3. Config wiring — the new 'prompt' key under 'compression' section is mapped to CONTEXT_COMPRESSION_PROMPT env var in: - cli.py (load_cli_config defaults + env mapping) - hermes_cli/config.py (DEFAULT_CONFIG + show_config display) - gateway/run.py (gateway env mapping) Usage in config.yaml: compression: prompt: 'Your custom summarization instructions here' Or via environment variable: CONTEXT_COMPRESSION_PROMPT='Your custom instructions' When empty (default), the built-in summarization prompt is used unchanged. This gives power users control over how context is compressed without modifying source code. Inspired by PR #776 by @kshitijk4poor and the research in #499.

@kshitijk4poor

Replace the generic summarization prompt ('Summarize these conversation turns concisely') with a task-oriented handoff prompt inspired by OpenAI's Codex CLI compaction flow (researched in #499). The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION' and instructs the summarization model to produce a structured handoff summary that includes: - Current progress and key decisions - User preferences and constraints discovered - Clear next steps remaining - Critical data (file paths, URLs, error messages, code snippets) - Tool calls made and their key results This produces better summaries because the model understands the summary will be used by another LLM to continue the work, rather than treating it as a generic text compression task. No behavioral change to the compression algorithm itself — same positional protection, same role alternation, same [CONTEXT SUMMARY]: prefix. Only the prompt sent to the summarization model changes. Inspired by PR #776 by @kshitijk4poor.

@kshitijk4poor

Replace the generic summarization prompt ('Summarize these conversation turns concisely') with a task-oriented handoff prompt inspired by OpenAI's Codex CLI compaction flow (researched in NousResearch#499). The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION' and instructs the summarization model to produce a structured handoff summary that includes: - Current progress and key decisions - User preferences and constraints discovered - Clear next steps remaining - Critical data (file paths, URLs, error messages, code snippets) - Tool calls made and their key results This produces better summaries because the model understands the summary will be used by another LLM to continue the work, rather than treating it as a generic text compression task. No behavioral change to the compression algorithm itself — same positional protection, same role alternation, same [CONTEXT SUMMARY]: prefix. Only the prompt sent to the summarization model changes. Inspired by PR NousResearch#776 by @kshitijk4poor.

@kshitijk4poor

Add a compression.prompt config option that lets users override the default summarization prompt used during context compression. What changes: 1. ContextCompressor.__init__() accepts compaction_prompt_override param. When set (non-empty string), it replaces the default summarization instructions in _generate_summary(). The framing (token target, turns to summarize, [CONTEXT SUMMARY]: prefix instruction) stays the same. 2. run_agent.py reads CONTEXT_COMPRESSION_PROMPT env var and passes it to ContextCompressor. 3. Config wiring — the new 'prompt' key under 'compression' section is mapped to CONTEXT_COMPRESSION_PROMPT env var in: - cli.py (load_cli_config defaults + env mapping) - hermes_cli/config.py (DEFAULT_CONFIG + show_config display) - gateway/run.py (gateway env mapping) Usage in config.yaml: compression: prompt: 'Your custom summarization instructions here' Or via environment variable: CONTEXT_COMPRESSION_PROMPT='Your custom instructions' When empty (default), the built-in summarization prompt is used unchanged. This gives power users control over how context is compressed without modifying source code. Inspired by PR NousResearch#776 by @kshitijk4poor and the research in NousResearch#499.

@kshitijk4poor

Replace the old '[CONTEXT SUMMARY]:' prefix on compressed summaries with a Codex-inspired handoff framing that tells the model what happened and how to use the summary. What changes: 1. New SUMMARY_PREFIX constant — the text prepended to every compressed summary: [CONTEXT COMPACTION] An earlier part of this conversation was summarized to preserve context space. Below is the summary — use it to build on the work already done and avoid duplicating effort: 2. _with_summary_prefix() helper — normalizes model output by stripping any legacy '[CONTEXT SUMMARY]:' prefix the summarization model may have produced, then prepends the new SUMMARY_PREFIX. 3. System message annotation updated — the note appended to the system prompt on first compression now says 'compacted into a handoff summary' and instructs 'build on that summary rather than re-doing work' instead of the old generic note. Why this is better: The old prefix ('[CONTEXT SUMMARY]: <raw text>') gave the model no context about what the summary is or how to use it. The new prefix explicitly frames it as a context compaction event and instructs the model to build on prior work rather than re-doing it. This reduces redundant tool calls and file re-reads after compression. What does NOT change: - The compression algorithm (positional protection, boundary alignment) - The role alternation logic (summary role adapts to avoid consecutive same-role messages) - The summarization model or trigger thresholds - LEGACY_SUMMARY_PREFIX is exported for backward compatibility Inspired by PR NousResearch#776 by @kshitijk4poor and the research in NousResearch#499.

@kshitijk4poor

The _generate_summary() method assumed message content is always a string (msg.get('content') or ''). When content is a multimodal list (e.g. [{type: 'text', text: '...'}, {type: 'image_url', ...}]), this produced mangled output: len() returned the list length instead of character count, and slicing produced list items instead of substrings. Add _content_to_text() helper that safely converts any content format to plain text: - str → returned as-is - None → empty string - list (multimodal) → text parts joined, images replaced with [image] - dict/other → JSON serialization with str() fallback This ensures multimodal conversations compress correctly instead of producing garbled summaries. Inspired by PR NousResearch#776 by @kshitijk4poor.

@kshitijk4poor

Replace the generic summarization prompt ('Summarize these conversation turns concisely') with a task-oriented handoff prompt inspired by OpenAI's Codex CLI compaction flow (researched in NousResearch#499). The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION' and instructs the summarization model to produce a structured handoff summary that includes: - Current progress and key decisions - User preferences and constraints discovered - Clear next steps remaining - Critical data (file paths, URLs, error messages, code snippets) - Tool calls made and their key results This produces better summaries because the model understands the summary will be used by another LLM to continue the work, rather than treating it as a generic text compression task. No behavioral change to the compression algorithm itself — same positional protection, same role alternation, same [CONTEXT SUMMARY]: prefix. Only the prompt sent to the summarization model changes. Inspired by PR NousResearch#776 by @kshitijk4poor.

@kshitijk4poor

The _generate_summary() method assumed message content is always a string (msg.get('content') or ''). When content is a multimodal list (e.g. [{type: 'text', text: '...'}, {type: 'image_url', ...}]), this produced mangled output: len() returned the list length instead of character count, and slicing produced list items instead of substrings. Add _content_to_text() helper that safely converts any content format to plain text: - str → returned as-is - None → empty string - list (multimodal) → text parts joined, images replaced with [image] - dict/other → JSON serialization with str() fallback This ensures multimodal conversations compress correctly instead of producing garbled summaries. Inspired by PR NousResearch#776 by @kshitijk4poor.

@kshitijk4poor

Replace the generic summarization prompt ('Summarize these conversation turns concisely') with a task-oriented handoff prompt inspired by OpenAI's Codex CLI compaction flow (researched in NousResearch#499). The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION' and instructs the summarization model to produce a structured handoff summary that includes: - Current progress and key decisions - User preferences and constraints discovered - Clear next steps remaining - Critical data (file paths, URLs, error messages, code snippets) - Tool calls made and their key results This produces better summaries because the model understands the summary will be used by another LLM to continue the work, rather than treating it as a generic text compression task. No behavioral change to the compression algorithm itself — same positional protection, same role alternation, same [CONTEXT SUMMARY]: prefix. Only the prompt sent to the summarization model changes. Inspired by PR NousResearch#776 by @kshitijk4poor.

@kshitijk4poor

Add a compression.prompt config option that lets users override the default summarization prompt used during context compression. What changes: 1. ContextCompressor.__init__() accepts compaction_prompt_override param. When set (non-empty string), it replaces the default summarization instructions in _generate_summary(). The framing (token target, turns to summarize, [CONTEXT SUMMARY]: prefix instruction) stays the same. 2. run_agent.py reads CONTEXT_COMPRESSION_PROMPT env var and passes it to ContextCompressor. 3. Config wiring — the new 'prompt' key under 'compression' section is mapped to CONTEXT_COMPRESSION_PROMPT env var in: - cli.py (load_cli_config defaults + env mapping) - hermes_cli/config.py (DEFAULT_CONFIG + show_config display) - gateway/run.py (gateway env mapping) Usage in config.yaml: compression: prompt: 'Your custom summarization instructions here' Or via environment variable: CONTEXT_COMPRESSION_PROMPT='Your custom instructions' When empty (default), the built-in summarization prompt is used unchanged. This gives power users control over how context is compressed without modifying source code. Inspired by PR NousResearch#776 by @kshitijk4poor and the research in NousResearch#499.

@kshitijk4poor

Replace the old '[CONTEXT SUMMARY]:' prefix on compressed summaries with a Codex-inspired handoff framing that tells the model what happened and how to use the summary. What changes: 1. New SUMMARY_PREFIX constant — the text prepended to every compressed summary: [CONTEXT COMPACTION] An earlier part of this conversation was summarized to preserve context space. Below is the summary — use it to build on the work already done and avoid duplicating effort: 2. _with_summary_prefix() helper — normalizes model output by stripping any legacy '[CONTEXT SUMMARY]:' prefix the summarization model may have produced, then prepends the new SUMMARY_PREFIX. 3. System message annotation updated — the note appended to the system prompt on first compression now says 'compacted into a handoff summary' and instructs 'build on that summary rather than re-doing work' instead of the old generic note. Why this is better: The old prefix ('[CONTEXT SUMMARY]: <raw text>') gave the model no context about what the summary is or how to use it. The new prefix explicitly frames it as a context compaction event and instructs the model to build on prior work rather than re-doing it. This reduces redundant tool calls and file re-reads after compression. What does NOT change: - The compression algorithm (positional protection, boundary alignment) - The role alternation logic (summary role adapts to avoid consecutive same-role messages) - The summarization model or trigger thresholds - LEGACY_SUMMARY_PREFIX is exported for backward compatibility Inspired by PR NousResearch#776 by @kshitijk4poor and the research in NousResearch#499.

@kshitijk4poor

Replace the generic summarization prompt ('Summarize these conversation turns concisely') with a task-oriented handoff prompt inspired by OpenAI's Codex CLI compaction flow (researched in NousResearch#499). The new prompt frames compression as a 'CONTEXT CHECKPOINT COMPACTION' and instructs the summarization model to produce a structured handoff summary that includes: - Current progress and key decisions - User preferences and constraints discovered - Clear next steps remaining - Critical data (file paths, URLs, error messages, code snippets) - Tool calls made and their key results This produces better summaries because the model understands the summary will be used by another LLM to continue the work, rather than treating it as a generic text compression task. No behavioral change to the compression algorithm itself — same positional protection, same role alternation, same [CONTEXT SUMMARY]: prefix. Only the prompt sent to the summarization model changes. Inspired by PR NousResearch#776 by @kshitijk4poor.

kshitijk4poor added 2 commits March 9, 2026 22:26

Align context compaction with Codex semantics

1382681

Merge remote-tracking branch 'upstream/main' into fix/499-codex-compa…

db62dc2

…ction-parity # Conflicts: # cli.py # gateway/run.py # hermes_cli/config.py # tests/agent/test_context_compressor.py

kshitijk4poor changed the title ~~Align context compaction with Codex behavior~~ feat: Align context compaction with Codex behavior Mar 9, 2026

teknium1 closed this Mar 11, 2026

teknium1 mentioned this pull request Mar 11, 2026

feat: use Codex-style compaction prompt for context compression #915

Closed

teknium1 mentioned this pull request Mar 11, 2026

feat: Codex-style handoff prefix for compressed context summaries #916

Closed

teknium1 mentioned this pull request Mar 11, 2026

fix: handle multimodal content in context compression summarization #917

Closed

teknium1 mentioned this pull request Mar 11, 2026

feat: configurable custom compaction prompt for context compression #919

Closed

This was referenced Mar 11, 2026

refactor: split adapter and cli cleanup #939

Closed

fix: stop repeated identical tool doom loops #823

Closed

This was referenced May 16, 2026

feat: use Codex-style compaction prompt for context compression #27085

Closed

feat: configurable custom compaction prompt for context compression #27087

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Align context compaction with Codex behavior#776

feat: Align context compaction with Codex behavior#776
kshitijk4poor wants to merge 2 commits into
NousResearch:mainfrom
kshitijk4poor:fix/499-codex-compaction-parity

kshitijk4poor commented Mar 9, 2026 •

edited

Loading

Uh oh!

teknium1 commented Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kshitijk4poor commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Cross-verification

Uh oh!

teknium1 commented Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kshitijk4poor commented Mar 9, 2026 •

edited

Loading