fix(agent): prevent double-compression on turn immediately after compress run #38133
ashishpatel26 wants to merge 2 commits into
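I verified the guard logic and the regression scenario. The test is correct that deferring while awaiting_real_usage_after_compression is set prevents the second compression. On the current code, however, compress() never sets that flag, so the guard never fires in production. The tests pass because they manually set awaiting_real_usage_after_compression = True.

Suggested fix: add this to the end of compress():

```python
# Park last_prompt_tokens at -1 so the preflight check in
# should_defer_preflight_to_real_usage() knows real usage
# hasn't arrived yet.
self.last_prompt_tokens = -1
self.awaiting_real_usage_after_compression = True
```

Without this, the PR fixes the symptom in tests but not in production.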
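To see why the tests passed while production could still double-compress, here is a toy model. CompressorModel, preflight(), run_two_turns(), the set_flag_in_compress switch, the compressions counter, and all token numbers are hypothetical. Only the names last_prompt_tokens, last_real_prompt_tokens, awaiting_real_usage_after_compression, compress() and should_defer_preflight_to_real_usage() come from this PR, and the pre-existing deferral rule is a simplified assumption, not the real code.

```python
class CompressorModel:
    """Hypothetical, heavily simplified stand-in for the context compressor."""

    def __init__(self, threshold_tokens: int, set_flag_in_compress: bool) -> None:
        self.threshold_tokens = threshold_tokens
        # True models compress() with the suggested fix; False models it without.
        self.set_flag_in_compress = set_flag_in_compress
        self.last_prompt_tokens = 0
        self.last_real_prompt_tokens = 0
        self.awaiting_real_usage_after_compression = False
        self.compressions = 0

    def compress(self) -> None:
        self.compressions += 1  # stands in for the real compression work
        if self.set_flag_in_compress:
            self.last_prompt_tokens = -1
            self.awaiting_real_usage_after_compression = True

    def should_defer_preflight_to_real_usage(self, rough_tokens: int) -> bool:
        # The PR's guard: only effective if something actually sets the flag.
        if self.awaiting_real_usage_after_compression and rough_tokens >= self.threshold_tokens:
            return True
        # Assumed stand-in for the existing check: a stale real count above
        # the threshold means "do not defer".
        return 0 < self.last_real_prompt_tokens < self.threshold_tokens

    def preflight(self, rough_tokens: int) -> None:
        # Compress when the estimate is over threshold and deferral did not apply.
        if rough_tokens >= self.threshold_tokens and not self.should_defer_preflight_to_real_usage(rough_tokens):
            self.compress()


def run_two_turns(set_flag_in_compress: bool) -> int:
    """Return how many compressions fire across two consecutive preflight checks."""
    c = CompressorModel(threshold_tokens=8_000, set_flag_in_compress=set_flag_in_compress)
    c.last_real_prompt_tokens = 9_000  # last real usage before compression
    c.preflight(9_500)  # turn N: over threshold, first compression
    c.preflight(8_200)  # turn N+1: estimate still over threshold, no real usage yet
    return c.compressions


print(run_two_turns(set_flag_in_compress=False))  # 2: double compression
print(run_two_turns(set_flag_in_compress=True))   # 1: second check deferred

# What the tests did: set the flag by hand on the unfixed model, so the guard fires.
unfixed = CompressorModel(threshold_tokens=8_000, set_flag_in_compress=False)
unfixed.last_real_prompt_tokens = 9_000
unfixed.awaiting_real_usage_after_compression = True
print(unfixed.should_defer_preflight_to_real_usage(8_200))  # True
```

In this model the guard itself is correct in both runs; the only difference is whether compress() raises the flag that the guard reads.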
Great catch, you're absolutely right. The flag and the guard were both present, but compress() never set awaiting_real_usage_after_compression = True. Fixed in the latest push: at the end of compress():

```python
self.last_prompt_tokens = -1
self.awaiting_real_usage_after_compression = True
```

This matches your suggested fix exactly.
After compress() runs, last_prompt_tokens is set to -1 and awaiting_real_usage_after_compression=True. last_real_prompt_tokens still holds the old pre-compression value (above threshold), so should_defer_preflight_to_real_usage() incorrectly returned False on the very next turn — letting the preflight estimate re-trigger a second compression before the API reported real usage for the shorter conversation. Fix: add an early return in should_defer_preflight_to_real_usage() that defers any above-threshold preflight compression while the awaiting_real_usage_after_compression flag is set. The flag is cleared by update_from_response() once the first real prompt_tokens arrive from the provider, restoring normal behaviour. Closes NousResearch#36718
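The early return described in this commit can be sketched as a pure function. The parameter list and the fallback rule (defer only when a real count below the threshold is known) are assumptions made for illustration; the real method reads these values from the compressor instance, and its existing logic is not reproduced here.

```python
def should_defer_preflight_to_real_usage(
    rough_tokens: int,
    threshold_tokens: int,
    last_real_prompt_tokens: int,
    awaiting_real_usage_after_compression: bool,
) -> bool:
    """Hypothetical pure-function sketch of the deferral decision."""
    # Early return from the fix: right after compress(), last_real_prompt_tokens
    # still holds the pre-compression count, so defer any above-threshold
    # preflight until the provider reports real usage.
    if awaiting_real_usage_after_compression and rough_tokens >= threshold_tokens:
        return True
    # Assumed stand-in for the existing logic: trust a known real count that is
    # under the threshold; a stale count above it means "do not defer".
    return 0 < last_real_prompt_tokens < threshold_tokens


# Turn right after compression: stale real count 9,000, rough estimate 8,200,
# threshold 8,000 (all numbers are illustrative).
print(should_defer_preflight_to_real_usage(8_200, 8_000, 9_000, True))   # True
print(should_defer_preflight_to_real_usage(8_200, 8_000, 9_000, False))  # False
```

Passing awaiting_real_usage_after_compression=False with the stale 9,000-token count reproduces what the preflight check effectively saw before the fix.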
…s() (NousResearch#36718) The flag and guard were present but compress() never set awaiting_real_usage_after_compression=True, so should_defer_preflight_to_real_usage() always returned False in production. Setting last_prompt_tokens=-1 and the flag before returning ensures the preflight check defers until update_from_response() receives the real post-compress token count.
dc745d8 to 5e60b4e
Problem
Closes #36718.
After compress() runs, the scheduler sets:
- last_prompt_tokens = -1 (sentinel)
- awaiting_real_usage_after_compression = True

But last_real_prompt_tokens still holds the old pre-compression value (above the threshold). On the very next preflight check, should_defer_preflight_to_real_usage() hit this branch:

…

This returned False, allowing should_compress(preflight_tokens) to fire and triggering a second compression before the provider ever reported real token usage for the now-shorter conversation.

Root cause
awaiting_real_usage_after_compression exists precisely to guard this window, but should_defer_preflight_to_real_usage() never consulted it. The stale last_real_prompt_tokens value short-circuited deferral.

Fix
Add an early return in should_defer_preflight_to_real_usage() (agent/context_compressor.py):

update_from_response() already clears awaiting_real_usage_after_compression once real prompt_tokens arrive from the provider, so the guard is active for exactly one turn.

Test
Added two cases to TestPreflightDeferral in tests/agent/test_context_compressor.py:
- test_defers_immediately_after_compression_before_real_usage_arrives: verifies should_defer_preflight_to_real_usage returns True when the flag is set and rough tokens exceed the threshold (the previously broken case).
- test_no_longer_defers_after_real_usage_clears_flag: verifies normal baseline/growth deferral resumes once the flag is cleared.

Test plan
- TestPreflightDeferral tests still pass
- TestUpdateFromResponse tests still pass (flag-clearing path unchanged)
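A self-contained sketch of what the two new cases check, runnable with python -m unittest or pytest. FakeCompressor, TestPreflightDeferralSketch, the baseline deferral rule, and all token counts are hypothetical stand-ins, not the code in tests/agent/test_context_compressor.py; only the test names and the attribute and method names come from this PR.

```python
import unittest


class FakeCompressor:
    """Hypothetical minimal stand-in for the compressor's token bookkeeping."""

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens
        self.last_prompt_tokens = 0
        self.last_real_prompt_tokens = 0
        self.awaiting_real_usage_after_compression = False

    def should_defer_preflight_to_real_usage(self, rough_tokens: int) -> bool:
        # Guard from the PR: hold off while post-compression usage is unknown.
        if self.awaiting_real_usage_after_compression and rough_tokens >= self.threshold_tokens:
            return True
        # Assumed baseline rule: defer when a real count under the threshold is known.
        return 0 < self.last_real_prompt_tokens < self.threshold_tokens

    def update_from_response(self, prompt_tokens: int) -> None:
        # First real count from the provider: record it and clear the flag.
        self.last_prompt_tokens = prompt_tokens
        self.last_real_prompt_tokens = prompt_tokens
        self.awaiting_real_usage_after_compression = False


class TestPreflightDeferralSketch(unittest.TestCase):
    def setUp(self) -> None:
        # State right after compress(): sentinel parked, flag raised,
        # stale pre-compression real count still above the threshold.
        self.c = FakeCompressor(threshold_tokens=8_000)
        self.c.last_real_prompt_tokens = 9_000
        self.c.last_prompt_tokens = -1
        self.c.awaiting_real_usage_after_compression = True

    def test_defers_immediately_after_compression_before_real_usage_arrives(self) -> None:
        self.assertTrue(self.c.should_defer_preflight_to_real_usage(8_500))
        # Without the flag, the stale count alone would not defer (the old bug).
        self.c.awaiting_real_usage_after_compression = False
        self.assertFalse(self.c.should_defer_preflight_to_real_usage(8_500))

    def test_no_longer_defers_after_real_usage_clears_flag(self) -> None:
        self.c.update_from_response(prompt_tokens=9_100)
        self.assertFalse(self.c.awaiting_real_usage_after_compression)
        # Real usage is over the threshold, so the guard no longer holds compression back.
        self.assertFalse(self.c.should_defer_preflight_to_real_usage(9_500))
        # Baseline rule applies again: a real count under the threshold defers.
        self.c.update_from_response(prompt_tokens=3_000)
        self.assertTrue(self.c.should_defer_preflight_to_real_usage(9_500))
```

The setUp state mirrors what compress() leaves behind after the fix, so both cases start from the exact window the guard is meant to cover.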