fix(agent): robust context compression summary fallback #4243

Closed
lumethegreat wants to merge 1 commit into NousResearch:main from lumethegreat:fix/compression-fallback-codex

@lumethegreat

What changed

This PR makes context compaction (conversation compression) more robust when summary generation fails due to misconfigured auxiliary model routing.

  1. Retry summary generation using the main provider/model

    • If the configured summary model/provider fails (e.g. google/* model routed to a Codex endpoint), the compressor retries with:
      • provider="main"
      • model=self.model (the main conversation model)
  2. Never fail silently when compaction cannot produce a summary

    • If summary generation still returns None, the compressor inserts an explicit warning marker in place of the summary (with the standard compaction prefix). The marker explains that earlier turns may have been dropped without a summary and how to fix the configuration.
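
The two behaviors above can be sketched roughly as follows. This is a minimal illustration, not the code in agent/context_compressor.py: the function name `summarize_with_fallback`, the `call` signature, the `SUMMARY_PREFIX` value, and the warning wording are all assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical prefix; the real compaction prefix is defined by the compressor.
SUMMARY_PREFIX = "[CONTEXT COMPACTION]"

# Hypothetical shape of the LLM call: (provider, model, prompt) -> text or None.
SummaryCall = Callable[[str, str, str], Optional[str]]


def summarize_with_fallback(
    call: SummaryCall,
    prompt: str,
    summary_provider: str,
    summary_model: str,
    main_model: str,
) -> str:
    """Try the configured summary route, then provider="main" with the main model."""
    attempts = [
        (summary_provider, summary_model),  # configured auxiliary route
        ("main", main_model),               # fallback described in this PR
    ]
    for provider, model in attempts:
        try:
            text = call(provider, model, prompt)
        except Exception:
            # Treat any routing or model error as "no summary" and move on.
            text = None
        if text:
            return f"{SUMMARY_PREFIX} {text}"

    # Both attempts failed: never drop turns silently, emit a warning marker.
    return (
        f"{SUMMARY_PREFIX} WARNING: summary generation failed. Earlier turns "
        "may have been dropped without a summary. Check compression.summary_* "
        "or set auxiliary.compression.provider=main."
    )
```

Returning the warning as a prefixed string keeps the failure path on the same code path as a normal summary, so downstream handling of the compaction marker would not need a special case.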

Why

We observed failures where context compression attempted to use an unsupported summarization model for the active endpoint (e.g. google/gemini-3-flash-preview on a Codex ChatGPT account). The summary call failed, a session split occurred due to “compression”, and continuity degraded (inconsistent answers / lost context).

This PR both:

  • recovers automatically in common misconfiguration cases, and
  • warns explicitly when compaction cannot summarize.

User impact / behavior

  • In the common failure mode (auxiliary summary model incompatible with the active endpoint), Hermes will now still generate a compaction summary using the main model instead of failing compaction silently.
  • If summarization still fails, the transcript includes a clear warning marker that explains the continuity risks and points to config knobs (compression.summary_* and auxiliary.compression.provider=main).

How to reproduce (example)

  1. Configure context compression to use a model that the active endpoint cannot serve (e.g. compression.summary_model: google/gemini-3-flash-preview while using a Codex endpoint).
  2. Generate enough turns/tool output to trigger context compaction.
  3. Observe summary generation failure and degraded continuity.

How this fixes it

  • On summary failure, compaction performs a second attempt using provider=main + the main model, avoiding the incompatible auxiliary route.
  • If the fallback also fails, compaction still injects a warning summary marker to make the failure explicit to both users and future assistants.

Tests

  • Added unit tests covering:
    • retry behavior when the first summary call fails and the second succeeds (provider=main)
    • warning marker insertion when summary generation returns None
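
For illustration, the two test cases could look something like the sketch below. It does not reproduce tests/agent/test_context_compressor.py: `compact`, `PREFIX`, and the `"auxiliary"` / `"aux-model"` values are hypothetical stand-ins so the example runs on its own, and only `unittest.mock.Mock` from the standard library is used.

```python
from unittest.mock import Mock

PREFIX = "[CONTEXT COMPACTION]"  # hypothetical stand-in for the real prefix


def compact(call_llm, main_model):
    """Stand-in for the compressor's summary step (not the real implementation)."""
    routes = (
        {"provider": "auxiliary", "model": "aux-model"},  # assumed configured route
        {"provider": "main", "model": main_model},        # fallback route
    )
    for kwargs in routes:
        try:
            text = call_llm(**kwargs)
        except Exception:
            text = None
        if text:
            return f"{PREFIX} {text}"
    return f"{PREFIX} WARNING: no summary could be generated"


def test_retries_with_main_provider():
    # First call raises, second call returns a summary.
    call_llm = Mock(side_effect=[RuntimeError("unsupported model"), "recap"])
    assert compact(call_llm, "main-model") == f"{PREFIX} recap"
    assert call_llm.call_count == 2
    # The last call must have used the main provider and the main model.
    assert call_llm.call_args.kwargs == {"provider": "main", "model": "main-model"}


def test_warning_marker_when_summary_is_none():
    # Every call returns None, so the warning marker is expected.
    call_llm = Mock(return_value=None)
    result = compact(call_llm, "main-model")
    assert call_llm.call_count == 2
    assert result.startswith(PREFIX)
    assert "WARNING" in result
```

Passing a list to `side_effect` lets a single mock model the "first call fails, second succeeds" sequence without any network access.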

Command run locally:

  • PYTHONPATH="$PWD" pytest -q tests/agent/test_context_compressor.py

Files changed

  • agent/context_compressor.py
  • tests/agent/test_context_compressor.py

Retry summary generation on failure using provider=main + main model to avoid silent compaction failures when auxiliary model routing is misconfigured (e.g. google/* on Codex). Also insert an explicit warning summary marker when summarization fails so users understand continuity may degrade.