fix(agent): prevent context compression from re-firing after a fresh compress#40246
davidgut1982 wants to merge 1 commit into
I verified this fix is correct. The root cause and fix are well-analyzed. Bug verification: Fix correctness:
Edge case: the bounded window test ( |
…earch#36718)

After context compression completes, compress_context() sets last_prompt_tokens=-1 as a sentinel and awaiting_real_usage_after_compression=True to signal that no real API usage data has arrived yet. However, should_compress() did not guard on the awaiting flag, so a schema-heavy rough preflight estimate that still exceeded the threshold could re-trigger compression on the very next turn, causing the HUD to show -1/262K and a spurious cmp2/cmp3.

Two-part fix:

1. should_compress() now returns False while awaiting_real_usage_after_compression is True. This is the single choke-point for all compression-trigger paths (preflight, post-API-response, post-tool). Once update_from_response() clears the flag, normal compression logic resumes.

2. The preflight display-sync path (conversation_loop.py:631) used `last_prompt_tokens or 0` which evaluates to -1 (truthy), making the `>` comparison always True and overwriting the sentinel with the rough estimate. Changed to an explicit `>= 0` guard so negative sentinel values are never treated as a valid lower bound.

Bounded suppression window (adversarial-review hardening):

If a turn returns usage=None (partial-stream stub) or raises before update_from_response() runs, the awaiting flag stays True across subsequent turns and would permanently suppress legitimate preflight compression.

Fix: should_compress() counts consecutive evaluations under the flag via _awaiting_suppression_count. After 2 suppressed evaluations the flag self-clears so normal token-count logic resumes. update_from_response() resets the counter when real usage arrives so each compression cycle's window starts fresh. conversation_compression also resets it when setting the flag True. The normal case (usage arrives next turn) is completely unaffected.

Fixes NousResearch#36718

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
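The bounded suppression window described in this commit message can be sketched as a small state machine. This is a minimal illustration, not the PR's actual code: the class `CompressorSketch`, the `threshold_tokens` attribute, the `mark_compressed()` helper, and the method signatures are assumptions made for the example. Only `should_compress`, `update_from_response`, `last_prompt_tokens`, `awaiting_real_usage_after_compression`, and `_awaiting_suppression_count` are names taken from the message.

```python
class CompressorSketch:
    """Hypothetical stand-in for the real compressor (illustration only)."""

    # Window size stated in the commit message: 2 suppressed evaluations.
    MAX_SUPPRESSED_EVALUATIONS = 2

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens  # assumed attribute name
        self.last_prompt_tokens = 0
        self.awaiting_real_usage_after_compression = False
        self._awaiting_suppression_count = 0

    def mark_compressed(self) -> None:
        """Assumed helper mirroring what happens right after a compression."""
        self.last_prompt_tokens = -1  # sentinel: no real API usage yet
        self.awaiting_real_usage_after_compression = True
        self._awaiting_suppression_count = 0  # each cycle gets a full window

    def update_from_response(self, prompt_tokens: int) -> None:
        """Real usage arrived: clear the flag and reset the counter."""
        self.last_prompt_tokens = prompt_tokens
        self.awaiting_real_usage_after_compression = False
        self._awaiting_suppression_count = 0

    def should_compress(self, estimated_tokens: int) -> bool:
        if self.awaiting_real_usage_after_compression:
            if self._awaiting_suppression_count < self.MAX_SUPPRESSED_EVALUATIONS:
                self._awaiting_suppression_count += 1
                return False  # still inside the suppression window
            # Window exhausted: self-clear so a stuck flag cannot
            # silence compression indefinitely.
            self.awaiting_real_usage_after_compression = False
            self._awaiting_suppression_count = 0
        return estimated_tokens >= self.threshold_tokens
```

In this sketch the flag self-clears on the third evaluation, so a stuck flag yields False, False, True for an over-threshold estimate. Whether the real code clears the flag at exactly that point is not stated in the message.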
a12798a to
5a5beb3
Salvaged into #40582 with credit. I kept the root-cause fix (the |
… preflight seed (#36718)

compress_context() sets last_prompt_tokens=-1 right after compression to mark "no real API usage yet". The preflight display-seed used `_preflight_tokens > (last_prompt_tokens or 0)`, and `(-1 or 0)` is -1 (truthy), so any positive rough estimate clobbered the sentinel with a schema-inflated count, re-triggering compression on the next turn. Treat any negative value as "no real data yet" and skip the seed.

Salvaged from #40246 as the minimal root-cause fix. The original also added an `_awaiting_suppression_count` bounded-window state machine to should_compress() across 3 files; left out here to keep blast radius small, since the sentinel guard alone fixes the re-fire. The suppression window can be added separately if the usage=None-stub edge case warrants it.

Co-authored-by: davidgut1982 <davidgut1982@users.noreply.github.com>
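The truthiness pitfall behind the sentinel bug can be reproduced in isolation. The sketch below is an assumption-laden illustration rather than the code in `conversation_loop.py`: the function names `seed_buggy` and `seed_fixed` are hypothetical, and `last_prompt_tokens` is assumed to always be an `int`.

```python
SENTINEL = -1  # "no real API usage yet", as described in the commit message


def seed_buggy(preflight_tokens: int, last_prompt_tokens: int) -> int:
    """Old comparison: `-1 or 0` is -1 because -1 is truthy in Python."""
    if preflight_tokens > (last_prompt_tokens or 0):
        return preflight_tokens  # sentinel is clobbered by the estimate
    return last_prompt_tokens


def seed_fixed(preflight_tokens: int, last_prompt_tokens: int) -> int:
    """Fixed comparison: a negative value is never a valid lower bound."""
    _last = last_prompt_tokens
    if _last >= 0 and preflight_tokens > _last:
        return preflight_tokens
    return _last  # sentinel (or the larger real count) survives


# Only 0 and None fall through `or`; -1 does not.
print(-1 or 0)                         # -1
print(seed_buggy(180_000, SENTINEL))   # 180000: sentinel overwritten
print(seed_fixed(180_000, SENTINEL))   # -1: sentinel preserved
```

With a real, non-negative count both versions behave the same: the estimate replaces the stored value only when it is larger.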
What does this PR do?
Fixes context compression re-triggering on consecutive turns immediately after a fresh compression, even when the just-compressed context is small.
Root cause: right after a compression, `last_prompt_tokens` is set to the sentinel `-1` and `awaiting_real_usage_after_compression` is set True. The preflight display-sync in `conversation_loop.py` used `if _preflight_tokens > (last_prompt_tokens or 0)`, but `(-1 or 0)` evaluates to `-1` in Python (`-1` is truthy), so the comparison was `_preflight_tokens > -1`, always True, which overwrote the sentinel with a rough estimate and made `should_compress()` re-fire every turn.
Related Issue
Fixes #36718
Type of Change
Changes Made
- `agent/context_compressor.py` `should_compress()`: early-return guard `if self.awaiting_real_usage_after_compression: return False`, bounded by a suppression counter so it self-heals after at most 2 evaluations (prevents silently suppressing legitimate compression if a turn returns `usage=None` or errors before usage is recorded).
- `agent/conversation_loop.py`: fixed the truthiness bug. Bind `_last = last_prompt_tokens` and guard `if _last >= 0 and _preflight_tokens > _last`, so the `-1` sentinel is no longer treated as a valid lower bound.
- `agent/conversation_compression.py`: reset the suppression counter at the compress callsite where the flag is set True, so each compression cycle gets a full window.
- The counter is also reset in `__init__`, in `on_session_reset`, and on `update_from_response()` (when real usage clears the flag).
How to Test
`pytest tests/agent/test_context_compressor.py` includes:
- `TestCompressionRefireBug`: asserts `should_compress` does not re-fire while awaiting real usage, and resumes once real usage arrives (fails on unpatched code).
- `TestBoundedSuppressionWindow`: asserts suppression is bounded to 2 evaluations (returns False, False, True) so a stuck flag can never silence compression indefinitely.

97 tests pass; broader compression suites (`tests/run_agent/`, `tests/gateway/ -k compress`) pass; `ruff check .` and `check-windows-footguns.py --all` clean.
Checklist - Code
- `pytest tests/ -q` passes (no new failures)
Checklist - Documentation