fix(agent): robust context compression summary fallback by lumethegreat · Pull Request #4243 · NousResearch/hermes-agent

lumethegreat · 2026-03-31T13:26:14Z

What changed

This PR makes context compaction (conversation compression) more robust when summary generation fails due to misconfigured auxiliary model routing.

Retry summary generation using the main provider/model
- If the configured summary model/provider fails (e.g. google/* model routed to a Codex endpoint), the compressor retries with:
  - provider="main"
  - model=self.model (the main conversation model)
Never fail silently when compaction cannot produce a summary
- If summary generation still returns None, we insert an explicit compaction warning summary marker (with the standard compaction prefix), explaining that earlier turns may have been dropped without a summary and how to fix configuration.

Why

We observed failures where context compression attempted to use an unsupported summarization model for the active endpoint (e.g. google/gemini-3-flash-preview on a Codex ChatGPT account). The summary call failed, a session split occurred due to “compression”, and continuity degraded (inconsistent answers / lost context).

This PR both:

recovers automatically in common misconfiguration cases, and
warns explicitly when compaction cannot summarize.

User impact / behavior

In the common failure mode (auxiliary summary model incompatible with the active endpoint), Hermes will now still generate a compaction summary using the main model instead of failing compaction silently.
If summarization still fails, the transcript includes a clear warning marker that explains the continuity risks and points to config knobs (compression.summary_* and auxiliary.compression.provider=main).

How to reproduce (example)

Configure context compression to use a model that the active endpoint cannot serve (e.g. compression.summary_model: google/gemini-3-flash-preview while using a Codex endpoint).
Generate enough turns/tool output to trigger context compaction.
Observe summary generation failure and degraded continuity.

How this fixes it

On summary failure, compaction performs a second attempt using provider=main + the main model, avoiding the incompatible auxiliary route.
If the fallback also fails, compaction still injects a warning summary marker to make the failure explicit to both users and future assistants.

Tests

Added unit tests covering:
- retry behavior when the first summary call fails and the second succeeds (provider=main)
- warning marker insertion when summary generation returns None

Command run locally:

PYTHONPATH="$PWD" pytest -q tests/agent/test_context_compressor.py

Files changed

agent/context_compressor.py
tests/agent/test_context_compressor.py

Retry summary generation on failure using provider=main + main model to avoid silent compaction failures when auxiliary model routing is misconfigured (e.g. google/* on Codex). Also insert an explicit warning summary marker when summarization fails so users understand continuity may degrade.

This was referenced Apr 2, 2026

fix(compression): use extract_content_or_reasoning for reasoning model summaries #4603

Open

fix(compression): include reasoning tokens in context tracking #4614

Closed

lumethegreat closed this Apr 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): robust context compression summary fallback#4243

fix(agent): robust context compression summary fallback#4243
lumethegreat wants to merge 1 commit into
NousResearch:mainfrom
lumethegreat:fix/compression-fallback-codex

lumethegreat commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lumethegreat commented Mar 31, 2026

What changed

Why

User impact / behavior

How to reproduce (example)

How this fixes it

Tests

Files changed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant