Skip to content

fix: harden context compression against silent failures#5457

Closed
luinbytes wants to merge 1 commit into
NousResearch:mainfrom
luinbytes:fix/compaction-resilience
Closed

fix: harden context compression against silent failures#5457
luinbytes wants to merge 1 commit into
NousResearch:mainfrom
luinbytes:fix/compaction-resilience

Conversation

@luinbytes

@luinbytes luinbytes commented Apr 6, 2026

Copy link
Copy Markdown

Summary

When context compression fires, the model sometimes returns empty on the next turn. Hermes sends (empty) to the user and the conversation dies. Two fixes:

  1. Post-compaction empty recovery — sets a _just_compacted flag after compression. If the very next model response is empty, injects a continuation prompt and retries instead of sending (empty).

  2. Retry via provider routing — if the primary summary model fails (bad config, API error, timeout), retries the same call_llm(task="compression", ...) call without the explicit model override. call_llm already has its own provider fallback chain, so this goes through the normal routing pipeline instead of hardcoding a specific provider/model in core code.

Also upgrades summary-failure logging from warning to ERROR with model name and error details, so misconfigured summary models are visible in logs immediately.

What happens today without this

  • User has summary_model pointing at a broken or unreachable endpoint
  • Compression fires, summary generation fails silently, middle turns get dropped without any summary
  • Model sees the truncated context, returns empty
  • User gets (empty) sent to them, conversation is effectively dead

Test plan

  • py_compile passes on both files

@luinbytes luinbytes force-pushed the fix/compaction-resilience branch 2 times, most recently from 2247930 to 60eefd7 Compare April 13, 2026 16:08
@luinbytes

Copy link
Copy Markdown
Author

bump — rebased onto latest upstream, all conflicts resolved. @teknium1

- Add post-compaction empty recovery: when the model returns empty
  right after compression, inject a continuation prompt and retry
  instead of breaking with (empty) and killing the conversation
- Add fallback summary model: if the primary summary model fails
  (bad config, API error, timeout), try openrouter/qwen as fallback
  before dropping all middle turns without a summary
- Upgrade summary-failure logging to ERROR level with model name
  and error details so config mistakes are immediately visible
- Remove dead code: _classify_empty_content_response() was defined
  but never called; also removed now-unused is_local_endpoint import
@luinbytes luinbytes force-pushed the fix/compaction-resilience branch from 60eefd7 to 5a819da Compare April 19, 2026 02:40
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels May 1, 2026
@luinbytes luinbytes closed this Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants