fix(agent): prevent context compression from re-firing after a fresh compress by davidgut1982 · Pull Request #40246 · NousResearch/hermes-agent

davidgut1982 · 2026-06-06T02:06:38Z

What does this PR do?

Fixes context compression re-triggering on consecutive turns immediately after a fresh compression, even when the just-compressed context is small.

Root cause: right after a compression, last_prompt_tokens is set to the sentinel -1 and awaiting_real_usage_after_compression is set to True. The preflight display-sync in conversation_loop.py used `if _preflight_tokens > (last_prompt_tokens or 0)`, but `(-1 or 0)` evaluates to -1 in Python because -1 is truthy. The comparison therefore became `_preflight_tokens > -1`, which is True for any non-negative estimate, so the sentinel was overwritten with a rough estimate and should_compress() re-fired every turn.
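The truthiness pitfall can be reproduced in isolation. The snippet below is a minimal sketch, not the code from conversation_loop.py: the variable names are shortened and the token values are made up for illustration.

```python
# Minimal reproduction of the `or 0` pitfall with a negative sentinel.
# Values are illustrative; only the expression shape mirrors the description.
last_prompt_tokens = -1      # sentinel: "no real API usage yet"
preflight_tokens = 5_000     # hypothetical rough estimate

# `or` returns its first truthy operand; -1 is truthy, so the 0 fallback is never used.
lower_bound = last_prompt_tokens or 0
print(lower_bound)                      # -1

# Any non-negative estimate therefore passes the comparison.
buggy_check = preflight_tokens > lower_bound
print(buggy_check)                      # True

# Only falsy values such as None and 0 actually take the `or 0` fallback.
print(None or 0, 0 or 0)                # 0 0
```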

Related Issue

Fixes #36718

Type of Change

Bug fix (non-breaking change which fixes an issue)

Changes Made

agent/context_compressor.py, should_compress(): added an early-return guard (`if self.awaiting_real_usage_after_compression: return False`), bounded by a suppression counter so it self-heals after at most 2 evaluations. This prevents silently suppressing legitimate compression if a turn returns usage=None or errors before usage is recorded.
agent/conversation_loop.py: fixed the truthiness bug. Bind `_last = last_prompt_tokens` and guard with `if _last >= 0 and _preflight_tokens > _last`, so the -1 sentinel is no longer treated as a valid lower bound.
agent/conversation_compression.py: reset the suppression counter at the compress callsite where the flag is set to True, so each compression cycle gets a full window.
The counter is also reset in __init__, in on_session_reset, and in update_from_response() (when real usage clears the flag).
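The bounded suppression window described in these changes can be sketched as follows. This is a hedged illustration under stated assumptions, not the PR's code: `CompressorSketch`, `mark_compressed`, `threshold_tokens`, and the `prompt_tokens` parameters are hypothetical stand-ins, and only the flag, counter, and limit names come from this thread.

```python
class CompressorSketch:
    """Hypothetical stand-in for the real compressor class."""

    _AWAITING_SUPPRESSION_LIMIT = 2  # limit name as quoted in the review

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens  # assumed trigger threshold
        self.last_prompt_tokens = 0
        self.awaiting_real_usage_after_compression = False
        self._awaiting_suppression_count = 0

    def mark_compressed(self) -> None:
        """Models the compress callsite: set sentinel and flag, reset the window."""
        self.last_prompt_tokens = -1
        self.awaiting_real_usage_after_compression = True
        self._awaiting_suppression_count = 0

    def update_from_response(self, prompt_tokens: int) -> None:
        """Real usage arrived: clear the flag and reset the counter."""
        self.last_prompt_tokens = prompt_tokens
        self.awaiting_real_usage_after_compression = False
        self._awaiting_suppression_count = 0

    def should_compress(self, prompt_tokens: int) -> bool:
        if self.awaiting_real_usage_after_compression:
            if self._awaiting_suppression_count < self._AWAITING_SUPPRESSION_LIMIT:
                self._awaiting_suppression_count += 1
                return False  # still inside the suppression window
            # Window exhausted: self-heal so a stuck flag cannot silence compression.
            self.awaiting_real_usage_after_compression = False
            self._awaiting_suppression_count = 0
        return prompt_tokens >= self.threshold_tokens


c = CompressorSketch(threshold_tokens=1_000)
c.mark_compressed()
results = [c.should_compress(5_000) for _ in range(3)]
print(results)  # [False, False, True]
```

In the normal case `update_from_response()` runs before the window is used up, so the counter never reaches the limit and the self-healing branch is never taken.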

How to Test

pytest tests/agent/test_context_compressor.py — includes:

TestCompressionRefireBug: asserts should_compress does not re-fire while awaiting real usage, and resumes once real usage arrives (fails on unpatched code).
TestBoundedSuppressionWindow: asserts suppression is bounded to 2 evaluations (returns False, False, True) so a stuck flag can never silence compression indefinitely.

97 tests pass; broader compression suites (tests/run_agent/, tests/gateway/ -k compress) pass; ruff check . and check-windows-footguns.py --all clean.

Checklist — Code

I have read the CONTRIBUTING guide
My commit messages follow Conventional Commits
No duplicate PR exists
Focused on a single concern
pytest tests/ -q passes (no new failures)
Added tests proving the fix
Considered cross-platform behavior (check-windows-footguns.py --all passes)

Checklist — Documentation

N/A — internal agent compression-logic fix, no user-facing docs/config/schema changes

liuhao1024 · 2026-06-06T02:36:36Z

I verified this fix is correct. The root cause and fix are well-analyzed.

Bug verification: compress_context() sets last_prompt_tokens = -1 as a sentinel. The preflight path at conversation_loop.py:628 does `_preflight_tokens > (_compressor.last_prompt_tokens or 0)`. Since -1 is truthy, `-1 or 0` evaluates to -1, so the comparison is always True and the sentinel is overwritten with the rough estimate, causing should_compress() to re-trigger on the very next turn.

Fix correctness:

_last >= 0 guard at the preflight site correctly preserves the -1 sentinel
should_compress() suppression while awaiting_real_usage_after_compression blocks all trigger paths (preflight, post-tool, post-response)
_AWAITING_SUPPRESSION_LIMIT = 2 with self-healing prevents a stuck flag from permanently silencing compression (handles the usage=None partial-stream case)
All four reset sites (__init__, on_session_reset, update_from_response, compress_context) correctly zero _awaiting_suppression_count

Edge case: the bounded window test (test_suppression_bounded_to_two_turns_without_update_from_response) confirms that a stuck flag self-heals after exactly 2 turns, and test_fresh_compression_resets_suppression_count confirms the window doesn't bleed across compression cycles.

…earch#36718)

After context compression completes, compress_context() sets last_prompt_tokens=-1 as a sentinel and awaiting_real_usage_after_compression=True to signal that no real API usage data has arrived yet. However, should_compress() did not guard on the awaiting flag, so a schema-heavy rough preflight estimate that still exceeded the threshold could re-trigger compression on the very next turn, causing the HUD to show -1/262K and a spurious cmp2/cmp3.

Two-part fix:

1. should_compress() now returns False while awaiting_real_usage_after_compression is True. This is the single choke-point for all compression-trigger paths (preflight, post-API-response, post-tool). Once update_from_response() clears the flag, normal compression logic resumes.
2. The preflight display-sync path (conversation_loop.py:631) used `last_prompt_tokens or 0` which evaluates to -1 (truthy), making the `>` comparison always True and overwriting the sentinel with the rough estimate. Changed to an explicit `>= 0` guard so negative sentinel values are never treated as a valid lower bound.

Bounded suppression window (adversarial-review hardening):

If a turn returns usage=None (partial-stream stub) or raises before update_from_response() runs, the awaiting flag stays True across subsequent turns and would permanently suppress legitimate preflight compression. Fix: should_compress() counts consecutive evaluations under the flag via _awaiting_suppression_count. After 2 suppressed evaluations the flag self-clears so normal token-count logic resumes. update_from_response() resets the counter when real usage arrives so each compression cycle's window starts fresh. conversation_compression also resets it when setting the flag True. The normal case (usage arrives next turn) is completely unaffected.

Fixes NousResearch#36718

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

teknium1 · 2026-06-06T15:21:47Z

Salvaged into #40582 with credit. I kept the root-cause fix (the -1 sentinel guard in the preflight seed) and left out the _awaiting_suppression_count state machine to keep the compression hot path lean — the sentinel guard alone stops the re-fire. If the usage=None edge case proves to matter in practice, happy to add the suppression window as a separate reviewed change. Thanks!

#40582

… preflight seed (#36718)

compress_context() sets last_prompt_tokens=-1 right after compression to mark "no real API usage yet". The preflight display-seed used `_preflight_tokens > (last_prompt_tokens or 0)`, and `(-1 or 0)` is -1 (truthy), so any positive rough estimate clobbered the sentinel with a schema-inflated count, re-triggering compression on the next turn. Treat any negative value as "no real data yet" and skip the seed.

Salvaged from #40246 as the minimal root-cause fix. The original also added an `_awaiting_suppression_count` bounded-window state machine to should_compress() across 3 files; left out here to keep blast radius small, since the sentinel guard alone fixes the re-fire. The suppression window can be added separately if the usage=None-stub edge case warrants it.

Co-authored-by: davidgut1982 <davidgut1982@users.noreply.github.com>
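The minimal guard this commit message describes ("treat any negative value as no real data yet and skip the seed") might look roughly like the sketch below. The function `seed_preflight` and its signature are hypothetical, written only to illustrate the rule; the merged change in #40582 may be structured differently.

```python
from typing import Optional


def seed_preflight(last_prompt_tokens: Optional[int], preflight_tokens: int) -> Optional[int]:
    """Return the value last_prompt_tokens should hold after the preflight seed.

    Hypothetical helper: a negative value is read as a sentinel meaning
    "no real API usage yet" and is left untouched.
    """
    if last_prompt_tokens is not None and last_prompt_tokens < 0:
        return last_prompt_tokens  # sentinel preserved, seed skipped

    current = last_prompt_tokens or 0  # safe here: negatives were handled above
    if preflight_tokens > current:
        return preflight_tokens  # rough estimate is larger, so seed it
    return last_prompt_tokens


print(seed_preflight(-1, 5_000))     # -1
print(seed_preflight(0, 5_000))      # 5000
print(seed_preflight(8_000, 5_000))  # 8000
```

Handling the negative case first keeps the `or 0` fallback limited to None and 0, which are the only values it was ever meant to cover.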

davidgut1982 force-pushed the fix/hermes-compression-refire branch from a12798a to 5a5beb3 on June 6, 2026 03:14

alt-glitch added the labels type/bug (Something isn't working), P1 (High: major feature broken, no workaround), and comp/agent (Core agent loop, run_agent.py, prompt builder) on Jun 6, 2026

teknium1 mentioned this pull request on Jun 6, 2026 in: fix(compression): preserve -1 post-compression sentinel to stop re-fire (#36718) #40582 (Merged)

teknium1 closed this Jun 6, 2026

davidgut1982 wants to merge 1 commit into NousResearch:main from davidgut1982:fix/hermes-compression-refire