
fix(agent): prevent context compression from re-firing after a fresh compress #40246

Closed
davidgut1982 wants to merge 1 commit into NousResearch:main from davidgut1982:fix/hermes-compression-refire

@davidgut1982

Contributor

What does this PR do?

Fixes context compression re-triggering on consecutive turns immediately after a fresh compression, even when the just-compressed context is small.

Root cause: right after a compression, last_prompt_tokens is set to the sentinel -1 and awaiting_real_usage_after_compression is set to True. The preflight display-sync in conversation_loop.py used if _preflight_tokens > (last_prompt_tokens or 0). In Python, (-1 or 0) evaluates to -1 because -1 is truthy, so the comparison became _preflight_tokens > -1. That is True for any non-negative estimate, so the check overwrote the sentinel with a rough estimate and made should_compress() re-fire every turn.
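The truthiness pitfall can be reproduced in a few lines of plain Python. The snippet is a standalone illustration, not the project's code. The variable preflight_tokens and its value are hypothetical. The snippet shows that `or 0` substitutes 0 only for falsy values, so a negative sentinel passes straight through into the comparison.

```python
# `x or 0` substitutes 0 only when x is falsy (None, 0); a negative int is truthy.
for last_prompt_tokens in (None, 0, -1):
    print(repr(last_prompt_tokens), "->", last_prompt_tokens or 0)
# None -> 0
# 0 -> 0
# -1 -> -1

SENTINEL = -1             # stored in last_prompt_tokens right after compression
preflight_tokens = 5_000  # hypothetical rough estimate for the next turn

# The pre-fix preflight condition: any non-negative estimate beats the sentinel,
# so the display-sync overwrites -1 with the rough estimate.
print(preflight_tokens > (SENTINEL or 0))  # True
```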

Related Issue

Fixes #36718

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

Changes Made

  • agent/context_compressor.py should_compress(): early-return guard if self.awaiting_real_usage_after_compression: return False. A suppression counter bounds the guard so that it self-heals after at most 2 evaluations. This stops legitimate compression from being silently suppressed when a turn returns usage=None or raises before usage is recorded.
  • agent/conversation_loop.py: fixed the truthiness bug — bind _last = last_prompt_tokens and guard if _last >= 0 and _preflight_tokens > _last, so the -1 sentinel is no longer treated as a valid lower bound.
  • agent/conversation_compression.py: reset the suppression counter at the compress callsite where the flag is set True, so each compression cycle gets a full window.
  • The counter is also reset in __init__, in on_session_reset, and in update_from_response() when real usage clears the flag.
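The bounded suppression window described in these changes can be sketched as a small standalone class. This is a simplified model, not the project's ContextCompressor. The class name, the threshold_tokens parameter and mark_compressed() are hypothetical. Only the flag, counter and limit names come from the PR. While the class awaits real usage, should_compress() returns False for at most two evaluations. After that it clears the flag itself and falls back to a plain token comparison.

```python
class BoundedSuppressionCompressor:
    """Hypothetical model of the PR's bounded suppression window (not the real class)."""

    _AWAITING_SUPPRESSION_LIMIT = 2  # name and value taken from the PR

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens  # hypothetical trigger threshold
        self.on_session_reset()

    def on_session_reset(self) -> None:
        # Fresh session: no sentinel, no pending suppression.
        self.last_prompt_tokens = 0
        self.awaiting_real_usage_after_compression = False
        self._awaiting_suppression_count = 0

    def mark_compressed(self) -> None:
        # Models the compress callsite: set the sentinel, raise the flag,
        # and give this compression cycle a full suppression window.
        self.last_prompt_tokens = -1
        self.awaiting_real_usage_after_compression = True
        self._awaiting_suppression_count = 0

    def update_from_response(self, prompt_tokens: int) -> None:
        # Real usage arrived: record it and clear both the flag and the counter.
        self.last_prompt_tokens = prompt_tokens
        self.awaiting_real_usage_after_compression = False
        self._awaiting_suppression_count = 0

    def should_compress(self, prompt_tokens: int) -> bool:
        if self.awaiting_real_usage_after_compression:
            if self._awaiting_suppression_count < self._AWAITING_SUPPRESSION_LIMIT:
                self._awaiting_suppression_count += 1
                return False  # still inside the suppression window
            # Window exhausted without real usage (e.g. usage=None): self-heal.
            self.awaiting_real_usage_after_compression = False
            self._awaiting_suppression_count = 0
        return prompt_tokens >= self.threshold_tokens


compressor = BoundedSuppressionCompressor(threshold_tokens=100_000)
compressor.mark_compressed()
print([compressor.should_compress(150_000) for _ in range(3)])  # [False, False, True]
```

In this model, mark_compressed() resets the counter, which keeps one cycle's window from bleeding into the next. In the normal case, update_from_response() clears the flag and the counter before the window runs out, so the limit only comes into play when usage never arrives.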

How to Test

pytest tests/agent/test_context_compressor.py — includes:

  • TestCompressionRefireBug: asserts should_compress does not re-fire while awaiting real usage, and resumes once real usage arrives (fails on unpatched code).
  • TestBoundedSuppressionWindow: asserts suppression is bounded to 2 evaluations (returns False, False, True) so a stuck flag can never silence compression indefinitely.

97 tests pass. The broader compression suites (tests/run_agent/ and tests/gateway/ with -k compress) also pass, and ruff check . and check-windows-footguns.py --all are clean.

Checklist — Code

  • I have read the CONTRIBUTING guide
  • My commit messages follow Conventional Commits
  • No duplicate PR exists
  • Focused on a single concern
  • pytest tests/ -q passes (no new failures)
  • Added tests proving the fix
  • Considered cross-platform behavior (check-windows-footguns.py --all passes)

Checklist — Documentation

  • N/A — internal agent compression-logic fix, no user-facing docs/config/schema changes

@liuhao1024

Contributor

I verified this fix is correct. The root cause and fix are well-analyzed.

Bug verification: compress_context() sets last_prompt_tokens = -1 as a sentinel. The preflight path at conversation_loop.py:628 does _preflight_tokens > (_compressor.last_prompt_tokens or 0). Because -1 is truthy, (-1 or 0) evaluates to -1. The comparison is therefore always True and overwrites the sentinel with the rough estimate, which makes should_compress() re-trigger on the very next turn.

Fix correctness:

  1. _last >= 0 guard at the preflight site correctly preserves the -1 sentinel
  2. should_compress() suppression while awaiting_real_usage_after_compression blocks all trigger paths (preflight, post-tool, post-response)
  3. _AWAITING_SUPPRESSION_LIMIT = 2 with self-healing prevents a stuck flag from permanently silencing compression (handles the usage=None partial-stream case)
  4. All four reset sites (__init__, on_session_reset, update_from_response, compress_context) correctly zero _awaiting_suppression_count

Edge case: the bounded window test (test_suppression_bounded_to_two_turns_without_update_from_response) confirms that a stuck flag self-heals after exactly 2 turns, and test_fresh_compression_resets_suppression_count confirms the window doesn't bleed across compression cycles.

…earch#36718)

After context compression completes, compress_context() sets
last_prompt_tokens=-1 as a sentinel and
awaiting_real_usage_after_compression=True to signal that no real API
usage data has arrived yet. However, should_compress() did not guard on
the awaiting flag, so a schema-heavy rough preflight estimate that still
exceeded the threshold could re-trigger compression on the very next
turn — causing the HUD to show -1/262K and a spurious cmp2/cmp3.

Two-part fix:

1. should_compress() now returns False while
   awaiting_real_usage_after_compression is True. This is the
   single choke-point for all compression-trigger paths (preflight,
   post-API-response, post-tool). Once update_from_response() clears
   the flag, normal compression logic resumes.

2. The preflight display-sync path (conversation_loop.py:631) used
   `last_prompt_tokens or 0` which evaluates to -1 (truthy), making
   the `>` comparison always True and overwriting the sentinel with the
   rough estimate. Changed to an explicit `>= 0` guard so negative
   sentinel values are never treated as a valid lower bound.

Bounded suppression window (adversarial-review hardening):

If a turn returns usage=None (partial-stream stub) or raises before
update_from_response() runs, the awaiting flag stays True across
subsequent turns and would permanently suppress legitimate preflight
compression. Fix: should_compress() counts consecutive evaluations
under the flag via _awaiting_suppression_count. After 2 suppressed
evaluations the flag self-clears so normal token-count logic resumes.
update_from_response() resets the counter when real usage arrives so
each compression cycle's window starts fresh. conversation_compression
also resets it when setting the flag True. The normal case (usage
arrives next turn) is completely unaffected.

Fixes NousResearch#36718

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
davidgut1982 force-pushed the fix/hermes-compression-refire branch from a12798a to 5a5beb3 on June 6, 2026 03:14
alt-glitch added the labels type/bug (Something isn't working), P1 (High: major feature broken, no workaround) and comp/agent (Core agent loop, run_agent.py, prompt builder) on Jun 6, 2026
@teknium1

teknium1 commented Jun 6, 2026

Contributor

Salvaged into #40582 with credit. I kept the root-cause fix (the -1 sentinel guard in the preflight seed) and left out the _awaiting_suppression_count state machine to keep the compression hot path lean — the sentinel guard alone stops the re-fire. If the usage=None edge case proves to matter in practice, happy to add the suppression window as a separate reviewed change. Thanks!


teknium1 closed this on Jun 6, 2026
teknium1 added a commit that referenced this pull request Jun 7, 2026
… preflight seed (#36718)

compress_context() sets last_prompt_tokens=-1 right after compression to
mark "no real API usage yet". The preflight display-seed used
`_preflight_tokens > (last_prompt_tokens or 0)`, and `(-1 or 0)` is -1
(truthy), so any positive rough estimate clobbered the sentinel with a
schema-inflated count — re-triggering compression on the next turn.
Treat any negative value as "no real data yet" and skip the seed.

Salvaged from #40246 as the minimal root-cause fix. The original also
added an `_awaiting_suppression_count` bounded-window state machine to
should_compress() across 3 files; left out here to keep blast radius
small — the sentinel guard alone fixes the re-fire. The suppression
window can be added separately if the usage=None-stub edge case warrants it.

Co-authored-by: davidgut1982 <davidgut1982@users.noreply.github.com>
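The merged commit keeps only the sentinel guard. The sketch below shows that guarded seed together with regression checks in the spirit of the PR's tests. The helper seed_last_prompt_tokens and the test names are hypothetical, and the sketch assumes last_prompt_tokens is always an int. As the commit message describes, the helper treats any negative value, not just -1, as "no real data yet". Otherwise it seeds only when the rough estimate exceeds the stored count, matching the original display-sync condition.

```python
def seed_last_prompt_tokens(last_prompt_tokens: int, preflight_tokens: int) -> int:
    """Return the value to store after the preflight display-seed (hypothetical helper).

    Any negative value means "no real API usage yet" and is left untouched,
    so the rough estimate can no longer clobber the post-compression sentinel.
    """
    if last_prompt_tokens < 0:
        return last_prompt_tokens  # keep the sentinel until real usage arrives
    if preflight_tokens > last_prompt_tokens:
        return preflight_tokens  # seed only when the estimate is larger
    return last_prompt_tokens


def test_sentinel_survives_preflight_seed():
    assert seed_last_prompt_tokens(-1, 300_000) == -1


def test_any_negative_value_is_treated_as_no_data():
    assert seed_last_prompt_tokens(-5, 300_000) == -5


def test_seed_still_raises_real_counts():
    assert seed_last_prompt_tokens(0, 12_000) == 12_000
    assert seed_last_prompt_tokens(10_000, 12_000) == 12_000
    assert seed_last_prompt_tokens(20_000, 12_000) == 20_000


if __name__ == "__main__":
    for test in (
        test_sentinel_survives_preflight_seed,
        test_any_negative_value_is_treated_as_no_data,
        test_seed_still_raises_real_counts,
    ):
        test()
    print("all seed tests passed")
```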
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026

Development

Successfully merging this pull request may close these issues.

Bug: Context compression triggers repeatedly after fresh compress — last_prompt_tokens=-1 not updated until next API call
