Skip to content

fix(compress): abort instead of dropping messages when summary LLM fails#28102

Merged
teknium1 merged 2 commits into
mainfrom
hermes/hermes-4823e8a4
May 18, 2026
Merged

fix(compress): abort instead of dropping messages when summary LLM fails#28102
teknium1 merged 2 commits into
mainfrom
hermes/hermes-4823e8a4

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

When auto-compression's summary LLM fails, freeze the chat as-is instead of silently dropping N turns and replacing them with a 'summary unavailable' placeholder. User can /compress to retry or /new to start fresh.

What was wrong

context_compressor.compress() called _generate_summary(). When that returned None (aux model errored, returned non-JSON, timed out), the code STILL dropped every middle message between compress_start..compress_end and inserted a static placeholder saying 'Summary generation was unavailable. N message(s) were removed and could not be summarized.' Session continued but you'd just lost N turns of context for nothing.

Changes

  • agent/context_compressor.py: compress() gains force= kwarg. On summary failure, set _last_compress_aborted=True and return messages unchanged.
  • run_agent.py: _compress_context() detects abort, emits warning, skips session-rotation, returns messages unchanged.
  • cli.py + gateway/run.py (manual /compress handler): pass force=True to bypass the failure cooldown so users can retry immediately.
  • gateway/run.py (session hygiene + /compress handlers): detect _last_compress_aborted and emit new gateway.compress.aborted i18n key ("⚠️ Compression aborted (...). No messages were dropped...") instead of the old 'N historical messages were removed' message.
  • locales/*.yaml: new gateway.compress.aborted key in all 16 locales.
  • Tests: updated to assert the abort contract (messages preserved byte-for-byte, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers manual-retry path.

Why force=True on manual /compress

After an auto-compress aborts, _summary_failure_cooldown_until is set for 30-60s to keep the aux model from being burned every turn. Manual /compress is the user explicitly saying 'retry now' — it should bypass that cooldown.

Net effect

  • aux LLM failure → freeze, surface warning, zero silent message loss
  • user retries via /compress → cooldown bypassed, fresh attempt
  • still failing → another visible warning, conversation still intact

Test plan

  • scripts/run_tests.sh tests/agent/test_context_compressor.py → 81 passed
  • scripts/run_tests.sh tests/gateway/test_session_hygiene.py → 23 passed
  • scripts/run_tests.sh tests/gateway/test_compress_command.py → 4 passed
  • scripts/run_tests.sh tests/agent/test_i18n.py → 43 passed (locale parity holds with new key)

teknium1 added 2 commits May 18, 2026 10:16
When auxiliary compression's summary generation returns None (aux model
errored, returned non-JSON, timed out, etc.) the compressor previously
still dropped every middle message between compress_start..compress_end
and replaced them with a static 'Summary generation was unavailable'
placeholder. The session kept going but the user silently lost N turns
of context for nothing.

New behavior: on summary failure, compress() aborts entirely — returns
the input messages unchanged and sets _last_compress_aborted=True. The
existing _summary_failure_cooldown_until gate (30-60s) keeps the aux
model from being burned on every turn. Auto-compress callers detect
the no-op (len(after) == len(before)) and stop looping. The chat is
'frozen' at its current size until the next /compress or /new.

Manual /compress (CLI + gateway) now passes force=True which clears
the cooldown so users can retry immediately after an auto-abort. If
the manual retry also fails, the user gets a visible warning telling
them nothing was dropped and how to retry.

- agent/context_compressor.py: compress() gains force= kwarg; failure
  branch sets _last_compress_aborted and returns messages unchanged
  instead of inserting placeholder.
- run_agent.py: _compress_context() detects abort, surfaces warning,
  skips session-rotation entirely, returns messages unchanged.
- cli.py + gateway/run.py: manual /compress paths pass force=True.
- gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted
  and emit the new 'Compression aborted' warning (gateway.compress.aborted)
  instead of the old 'N historical messages were removed' message.
- locales/*.yaml: new gateway.compress.aborted key in all 16 locales.
- tests: updated to assert the abort contract (messages preserved,
  compression_count not incremented, abort flag set, no placeholder
  leaked). New test_force_true_bypasses_failure_cooldown covers the
  manual-retry path.
…ession.compress_context

Origin/main extracted _compress_context into agent/conversation_compression.
Forward force= through the new function signature and apply the abort
short-circuit there instead of in run_agent.py.
@teknium1 teknium1 merged commit 1634397 into main May 18, 2026
@teknium1 teknium1 deleted the hermes/hermes-4823e8a4 branch May 18, 2026 17:19
@github-actions

Copy link
Copy Markdown
Contributor

🔎 Lint report: hermes/hermes-4823e8a4 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8798 on HEAD, 8798 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4620 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround labels May 18, 2026
teknium1 added a commit that referenced this pull request May 18, 2026
…28117)

PR #28102 made the summary-failure abort path the unconditional default,
changing established behavior. Gate it behind config.yaml flag
`compression.abort_on_summary_failure` (default False = historical
fallback-placeholder behavior).

- hermes_cli/config.py: new `compression.abort_on_summary_failure` key,
  default False, documented inline.
- agent/agent_init.py: read the flag from compression config and pass to
  ContextCompressor.
- agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure`
  (default False). `compress()` failure branch gates the abort on the
  flag; when False, falls through to the restored legacy fallback path
  (static "summary unavailable" placeholder + drop middle window).
- tests: restore original fallback expectations as default; add new
  TestAbortOnSummaryFailure class for the opt-in mode.

Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort
detection, locale `gateway.compress.aborted` key) from PR #28102 stays
intact — those paths only fire when `_last_compress_aborted` is True,
which now only happens when the flag is enabled.
Lillard01 pushed a commit to Lillard01/hermes-agent that referenced this pull request May 21, 2026
…ils (NousResearch#28102)

When auxiliary compression's summary generation returns None (aux model
errored, returned non-JSON, timed out, etc.) the compressor previously
still dropped every middle message between compress_start..compress_end
and replaced them with a static 'Summary generation was unavailable'
placeholder. The session kept going but the user silently lost N turns
of context for nothing.

New behavior: on summary failure, compress() aborts entirely — returns
the input messages unchanged and sets _last_compress_aborted=True. The
existing _summary_failure_cooldown_until gate (30-60s) keeps the aux
model from being burned on every turn. Auto-compress callers detect
the no-op (len(after) == len(before)) and stop looping. The chat is
'frozen' at its current size until the next /compress or /new.

Manual /compress (CLI + gateway) now passes force=True which clears
the cooldown so users can retry immediately after an auto-abort. If
the manual retry also fails, the user gets a visible warning telling
them nothing was dropped and how to retry.

- agent/context_compressor.py: compress() gains force= kwarg; failure
  branch sets _last_compress_aborted and returns messages unchanged
  instead of inserting placeholder.
- run_agent.py: _compress_context() detects abort, surfaces warning,
  skips session-rotation entirely, returns messages unchanged.
- cli.py + gateway/run.py: manual /compress paths pass force=True.
- gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted
  and emit the new 'Compression aborted' warning (gateway.compress.aborted)
  instead of the old 'N historical messages were removed' message.
- locales/*.yaml: new gateway.compress.aborted key in all 16 locales.
- tests: updated to assert the abort contract (messages preserved,
  compression_count not incremented, abort flag set, no placeholder
  leaked). New test_force_true_bypasses_failure_cooldown covers the
  manual-retry path.
Lillard01 pushed a commit to Lillard01/hermes-agent that referenced this pull request May 21, 2026
…ousResearch#28117)

PR NousResearch#28102 made the summary-failure abort path the unconditional default,
changing established behavior. Gate it behind config.yaml flag
`compression.abort_on_summary_failure` (default False = historical
fallback-placeholder behavior).

- hermes_cli/config.py: new `compression.abort_on_summary_failure` key,
  default False, documented inline.
- agent/agent_init.py: read the flag from compression config and pass to
  ContextCompressor.
- agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure`
  (default False). `compress()` failure branch gates the abort on the
  flag; when False, falls through to the restored legacy fallback path
  (static "summary unavailable" placeholder + drop middle window).
- tests: restore original fallback expectations as default; add new
  TestAbortOnSummaryFailure class for the opt-in mode.

Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort
detection, locale `gateway.compress.aborted` key) from PR NousResearch#28102 stays
intact — those paths only fire when `_last_compress_aborted` is True,
which now only happens when the flag is enabled.
Mucky010 pushed a commit to Mucky010/hermes-agent that referenced this pull request May 24, 2026
…ils (NousResearch#28102)

When auxiliary compression's summary generation returns None (aux model
errored, returned non-JSON, timed out, etc.) the compressor previously
still dropped every middle message between compress_start..compress_end
and replaced them with a static 'Summary generation was unavailable'
placeholder. The session kept going but the user silently lost N turns
of context for nothing.

New behavior: on summary failure, compress() aborts entirely — returns
the input messages unchanged and sets _last_compress_aborted=True. The
existing _summary_failure_cooldown_until gate (30-60s) keeps the aux
model from being burned on every turn. Auto-compress callers detect
the no-op (len(after) == len(before)) and stop looping. The chat is
'frozen' at its current size until the next /compress or /new.

Manual /compress (CLI + gateway) now passes force=True which clears
the cooldown so users can retry immediately after an auto-abort. If
the manual retry also fails, the user gets a visible warning telling
them nothing was dropped and how to retry.

- agent/context_compressor.py: compress() gains force= kwarg; failure
  branch sets _last_compress_aborted and returns messages unchanged
  instead of inserting placeholder.
- run_agent.py: _compress_context() detects abort, surfaces warning,
  skips session-rotation entirely, returns messages unchanged.
- cli.py + gateway/run.py: manual /compress paths pass force=True.
- gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted
  and emit the new 'Compression aborted' warning (gateway.compress.aborted)
  instead of the old 'N historical messages were removed' message.
- locales/*.yaml: new gateway.compress.aborted key in all 16 locales.
- tests: updated to assert the abort contract (messages preserved,
  compression_count not incremented, abort flag set, no placeholder
  leaked). New test_force_true_bypasses_failure_cooldown covers the
  manual-retry path.
Mucky010 pushed a commit to Mucky010/hermes-agent that referenced this pull request May 24, 2026
…ousResearch#28117)

PR NousResearch#28102 made the summary-failure abort path the unconditional default,
changing established behavior. Gate it behind config.yaml flag
`compression.abort_on_summary_failure` (default False = historical
fallback-placeholder behavior).

- hermes_cli/config.py: new `compression.abort_on_summary_failure` key,
  default False, documented inline.
- agent/agent_init.py: read the flag from compression config and pass to
  ContextCompressor.
- agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure`
  (default False). `compress()` failure branch gates the abort on the
  flag; when False, falls through to the restored legacy fallback path
  (static "summary unavailable" placeholder + drop middle window).
- tests: restore original fallback expectations as default; add new
  TestAbortOnSummaryFailure class for the opt-in mode.

Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort
detection, locale `gateway.compress.aborted` key) from PR NousResearch#28102 stays
intact — those paths only fire when `_last_compress_aborted` is True,
which now only happens when the flag is enabled.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ils (NousResearch#28102)

When auxiliary compression's summary generation returns None (aux model
errored, returned non-JSON, timed out, etc.) the compressor previously
still dropped every middle message between compress_start..compress_end
and replaced them with a static 'Summary generation was unavailable'
placeholder. The session kept going but the user silently lost N turns
of context for nothing.

New behavior: on summary failure, compress() aborts entirely — returns
the input messages unchanged and sets _last_compress_aborted=True. The
existing _summary_failure_cooldown_until gate (30-60s) keeps the aux
model from being burned on every turn. Auto-compress callers detect
the no-op (len(after) == len(before)) and stop looping. The chat is
'frozen' at its current size until the next /compress or /new.

Manual /compress (CLI + gateway) now passes force=True which clears
the cooldown so users can retry immediately after an auto-abort. If
the manual retry also fails, the user gets a visible warning telling
them nothing was dropped and how to retry.

- agent/context_compressor.py: compress() gains force= kwarg; failure
  branch sets _last_compress_aborted and returns messages unchanged
  instead of inserting placeholder.
- run_agent.py: _compress_context() detects abort, surfaces warning,
  skips session-rotation entirely, returns messages unchanged.
- cli.py + gateway/run.py: manual /compress paths pass force=True.
- gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted
  and emit the new 'Compression aborted' warning (gateway.compress.aborted)
  instead of the old 'N historical messages were removed' message.
- locales/*.yaml: new gateway.compress.aborted key in all 16 locales.
- tests: updated to assert the abort contract (messages preserved,
  compression_count not incremented, abort flag set, no placeholder
  leaked). New test_force_true_bypasses_failure_cooldown covers the
  manual-retry path.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ousResearch#28117)

PR NousResearch#28102 made the summary-failure abort path the unconditional default,
changing established behavior. Gate it behind config.yaml flag
`compression.abort_on_summary_failure` (default False = historical
fallback-placeholder behavior).

- hermes_cli/config.py: new `compression.abort_on_summary_failure` key,
  default False, documented inline.
- agent/agent_init.py: read the flag from compression config and pass to
  ContextCompressor.
- agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure`
  (default False). `compress()` failure branch gates the abort on the
  flag; when False, falls through to the restored legacy fallback path
  (static "summary unavailable" placeholder + drop middle window).
- tests: restore original fallback expectations as default; add new
  TestAbortOnSummaryFailure class for the opt-in mode.

Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort
detection, locale `gateway.compress.aborted` key) from PR NousResearch#28102 stays
intact — those paths only fire when `_last_compress_aborted` is True,
which now only happens when the flag is enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants