fix(agent): prevent double-compression on turn immediately after compress run #38133

Open
ashishpatel26 wants to merge 2 commits into NousResearch:main from ashishpatel26:fix/compression-double-trigger-36718

Conversation

@ashishpatel26

Contributor

Problem

Closes #36718.

After compress() runs, the scheduler sets:

  • last_prompt_tokens = -1 (sentinel)
  • awaiting_real_usage_after_compression = True

But last_real_prompt_tokens still holds the old pre-compression value (above the threshold). On the very next preflight check, should_defer_preflight_to_real_usage() hit this branch:

if self.last_real_prompt_tokens >= self.threshold_tokens:
    return False   # incorrectly skips deferral

That branch returned False, which allowed should_compress(preflight_tokens) to fire and trigger a second compression before the provider had reported real token usage for the now-shorter conversation.
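
To make the failure concrete, here is a minimal sketch of the pre-fix decision. It assumes a simplified stand-in class: CompressorSketch is hypothetical, the 100_000 threshold and 120_000 reading are made-up numbers, and the final `return True` is an assumed fallback (the real method has more branches than the one quoted above).

```python
from dataclasses import dataclass


@dataclass
class CompressorSketch:
    """Hypothetical stand-in for the compressor state described above."""

    threshold_tokens: int
    last_prompt_tokens: int = 0
    last_real_prompt_tokens: int = 0
    awaiting_real_usage_after_compression: bool = False

    def should_defer_preflight_to_real_usage(self) -> bool:
        # Pre-fix branch quoted above: a stale above-threshold reading
        # disables deferral, and the awaiting flag is never consulted.
        if self.last_real_prompt_tokens >= self.threshold_tokens:
            return False
        # Assumption: in every other case, defer to provider-reported usage.
        return True


# State right after compress(), as the description states it.
state = CompressorSketch(
    threshold_tokens=100_000,                    # made-up threshold
    last_prompt_tokens=-1,                       # sentinel
    last_real_prompt_tokens=120_000,             # stale pre-compression value
    awaiting_real_usage_after_compression=True,  # set, but ignored pre-fix
)

deferred = state.should_defer_preflight_to_real_usage()
print(deferred)  # False: the preflight estimate may trigger compression again
```

The flag is True in this state, yet it has no effect on the result, which is the gap the fix below closes.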

Root cause

awaiting_real_usage_after_compression exists precisely to guard this window, but should_defer_preflight_to_real_usage() never consulted it. The stale last_real_prompt_tokens value short-circuited deferral.

Fix

Add an early-return in should_defer_preflight_to_real_usage() (agent/context_compressor.py):

if self.awaiting_real_usage_after_compression:
    return True   # defer until update_from_response() clears the flag

update_from_response() already clears awaiting_real_usage_after_compression once real prompt_tokens arrive from the provider, so the guard stays active only until the first post-compression response reports usage (normally exactly one turn).
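
A minimal sketch of the guard's lifecycle, under stated assumptions: FixedCompressorSketch is a hypothetical stand-in, the update_from_response(prompt_tokens) signature is simplified for illustration, the fallback `return True` is assumed, and the flag is set by hand here rather than by compress().

```python
class FixedCompressorSketch:
    """Hypothetical stand-in showing the early return and the flag reset."""

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens
        self.last_prompt_tokens = 0
        self.last_real_prompt_tokens = 0
        self.awaiting_real_usage_after_compression = False

    def should_defer_preflight_to_real_usage(self) -> bool:
        # New early return: while real usage is pending, always defer.
        if self.awaiting_real_usage_after_compression:
            return True
        # Existing branch: an above-threshold real reading disables deferral.
        if self.last_real_prompt_tokens >= self.threshold_tokens:
            return False
        # Assumption: otherwise defer to provider-reported usage.
        return True

    def update_from_response(self, prompt_tokens: int) -> None:
        # Assumed, simplified signature: record real usage and clear the flag.
        self.last_prompt_tokens = prompt_tokens
        self.last_real_prompt_tokens = prompt_tokens
        self.awaiting_real_usage_after_compression = False


compressor = FixedCompressorSketch(threshold_tokens=100_000)
compressor.last_real_prompt_tokens = 120_000  # stale pre-compression value
compressor.last_prompt_tokens = -1            # sentinel
compressor.awaiting_real_usage_after_compression = True  # set by hand here

defer_before_usage = compressor.should_defer_preflight_to_real_usage()

# The reported number is chosen above the threshold only to show that the
# ordinary threshold branch is back in control once the flag is cleared.
compressor.update_from_response(prompt_tokens=130_000)
defer_after_usage = compressor.should_defer_preflight_to_real_usage()

print(defer_before_usage, defer_after_usage)  # True False
```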

Test

Added two cases to TestPreflightDeferral in tests/agent/test_context_compressor.py:

  1. test_defers_immediately_after_compression_before_real_usage_arrives — verifies should_defer_preflight_to_real_usage returns True when the flag is set and rough tokens exceed threshold (the previously broken case).
  2. test_no_longer_defers_after_real_usage_clears_flag — verifies normal baseline/growth deferral resumes once the flag is cleared.

Test plan

  • Existing TestPreflightDeferral tests still pass
  • TestUpdateFromResponse tests still pass (flag-clearing path unchanged)
  • Manual: start a long conversation that triggers compression; verify only one compression fires per threshold crossing, not two back-to-back

@alt-glitch added the type/bug, comp/agent, and P1 labels on Jun 3, 2026
@liuhao1024

Contributor

I verified the guard logic and the regression scenario — the test is correct that deferring when awaiting_real_usage_after_compression=True prevents double-compression. However, the flag is never set to True in production.

On current main, awaiting_real_usage_after_compression is:

  • Initialized to False at line 554 and line 655
  • Cleared to False in update_from_response() at line 696
  • Never assigned True anywhere

The compress() method (line 1827+) does not set this flag after a successful compression. So the new guard if self.awaiting_real_usage_after_compression: return True will never trigger — the flag is always False when should_defer_preflight_to_real_usage() runs.

The tests pass because they manually set compressor.awaiting_real_usage_after_compression = True, but no production code path does this.

Suggested fix: Add this to the compress() method, after the summary is generated and before returning the compressed messages:

# Park last_prompt_tokens at -1 so the preflight check in
# should_defer_preflight_to_real_usage() knows real usage
# hasn't arrived yet.
self.last_prompt_tokens = -1
self.awaiting_real_usage_after_compression = True

Without this, the PR fixes the symptom in tests but not in production.

@ashishpatel26

Contributor Author

Great catch — you're absolutely right. The flag and the guard were both present, but compress() never actually set awaiting_real_usage_after_compression = True, so the guard in should_defer_preflight_to_real_usage() could never fire in production. The tests passed only because they set the flag manually.

Fixed in the latest push: at the end of compress() (before return compressed), added:

self.last_prompt_tokens = -1
self.awaiting_real_usage_after_compression = True

This matches your suggested fix exactly. Setting last_prompt_tokens = -1 also matters: because of the "-1 or 0" truthiness bug (the secondary mechanism in #36718), a zero-check on the token count alone isn't sufficient, so the flag is the reliable signal.
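
For reviewers, a rough end-to-end sketch of the production path that drives the flag through compress() instead of setting it by hand. Everything here is a simplified assumption: CompressorSketch, preflight(), compression_count, the token numbers, and the `messages[-2:]` placeholder are all hypothetical and stand in for the real summarization and agent loop.

```python
class CompressorSketch:
    """Hypothetical stand-in; only the flag handling mirrors this PR."""

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens
        self.last_prompt_tokens = 0
        self.last_real_prompt_tokens = 0
        self.awaiting_real_usage_after_compression = False
        self.compression_count = 0  # hypothetical counter for this sketch

    def should_defer_preflight_to_real_usage(self) -> bool:
        if self.awaiting_real_usage_after_compression:
            return True
        if self.last_real_prompt_tokens >= self.threshold_tokens:
            return False
        return True  # assumed fallback

    def should_compress(self, tokens: int) -> bool:
        return tokens >= self.threshold_tokens  # assumed threshold check

    def compress(self, messages: list) -> list:
        compressed = messages[-2:]  # placeholder for real summarization
        self.compression_count += 1
        # The two lines added at the end of compress() in the latest push.
        self.last_prompt_tokens = -1
        self.awaiting_real_usage_after_compression = True
        return compressed

    def update_from_response(self, prompt_tokens: int) -> None:
        # Assumed, simplified signature.
        self.last_prompt_tokens = prompt_tokens
        self.last_real_prompt_tokens = prompt_tokens
        self.awaiting_real_usage_after_compression = False


def preflight(compressor: CompressorSketch, messages: list, rough_tokens: int) -> list:
    """Hypothetical preflight step: compress only when not deferring."""
    if compressor.should_defer_preflight_to_real_usage():
        return messages
    if compressor.should_compress(rough_tokens):
        return compressor.compress(messages)
    return messages


compressor = CompressorSketch(threshold_tokens=100_000)
messages = [f"message {i}" for i in range(10)]

compressor.update_from_response(prompt_tokens=120_000)  # real usage crosses threshold
messages = preflight(compressor, messages, rough_tokens=125_000)  # first compression

# Next turn, before the provider has reported usage; the rough estimate
# is assumed to still overshoot the threshold.
messages = preflight(compressor, messages, rough_tokens=110_000)

compressor.update_from_response(prompt_tokens=30_000)  # real post-compression usage
print(compressor.compression_count)  # 1
```

Deleting the two flag lines from compress() in this sketch makes the second preflight call compress again, which is the double trigger reported in #36718.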

After compress() runs, last_prompt_tokens is set to -1 and
awaiting_real_usage_after_compression=True. last_real_prompt_tokens still
holds the old pre-compression value (above threshold), so
should_defer_preflight_to_real_usage() incorrectly returned False on the
very next turn — letting the preflight estimate re-trigger a second
compression before the API reported real usage for the shorter conversation.

Fix: add an early return in should_defer_preflight_to_real_usage() that
defers any above-threshold preflight compression while the
awaiting_real_usage_after_compression flag is set. The flag is cleared by
update_from_response() once the first real prompt_tokens arrive from the
provider, restoring normal behaviour.

Closes NousResearch#36718
…s() (NousResearch#36718)

The flag and guard were present but compress() never set awaiting_real_usage_after_compression=True,
so should_defer_preflight_to_real_usage() always returned False in production.
Setting last_prompt_tokens=-1 and the flag before returning ensures the preflight
check defers until update_from_response() receives the real post-compress token
count.
@ashishpatel26 force-pushed the fix/compression-double-trigger-36718 branch from dc745d8 to 5e60b4e on June 5, 2026 04:10

Development

Successfully merging this pull request may close these issues.

Bug: Context compression triggers repeatedly after fresh compress — last_prompt_tokens=-1 not updated until next API call
