fix(compaction): token-budget primary tail protection#6240
Conversation
Tail protection was effectively message-count based despite having a token budget, because protect_last_n=20 acted as a hard floor. A single 50K-token tool output would cause all 20 recent messages to be preserved regardless of budget, leaving little room for summarization. Changes: - _find_tail_cut_by_tokens: min_tail reduced from protect_last_n (20) to 3; token budget is now the primary criterion - Soft ceiling at 1.5x budget to avoid cutting mid-oversized-message - _prune_old_tool_results: accepts optional protect_tail_tokens so pruning also respects the token budget instead of a fixed count - compress() minimum message check relaxed from protect_first_n + protect_last_n + 1 to protect_first_n + 3 + 1 - Tool group alignment (no splitting tool_call/result) preserved
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
Test Fix: context_compressor min_tail=3 ✅ pushed3 failing tests in
Root cause: PR changed min_tail from protect_last_n to min(3, ...), raising min_for_compress to 6. Tests with exactly 6 msgs hit the early return. Remaining 10 failures are pre-existing on main (quick_commands print mock, HF model, docker env, vision tools). |
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
CI Failure AnalysisVerified that all 10 test failures are pre-existing on main and unrelated to this PR. The changes here only touch context_compressor.py and its tests — none of the failing test files were modified. Test failures (all pre-existing):
build-and-push failure: pip install hits resolution-too-deep — also fails on the latest main commit. Upstream dependency resolution issue. 9329 tests passed. Safe to merge. |
PR #6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
PR #6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
|
Merged via PR #6453 — your commits were cherry-picked onto current main with authorship preserved. Added 6 new tests covering the motivating large-tool-output scenario, min tail guarantee, 1.5x soft ceiling, and token-budget prune path. Great fix, @BongSuCHOI — this makes compaction actually effective in tool-heavy sessions! |
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
Problem
Tail protection in context compaction is effectively message-count based despite having a token budget.
protect_last_n=20acts as a hard floor, so a single 50K-token tool output (file read, large API response) causes all 20 recent messages to be preserved — even if they total 200K+ tokens. This leaves almost nothing to summarize, making compaction nearly useless in long-tool-output sessions.Current behavior (200K context, 50K threshold):
If the last 5 messages total 120K tokens, all 20 are still protected → only head + tiny middle gets summarized.
Solution
Make token budget the primary criterion for tail protection, with a small message-count floor for safety:
min_tailreduced fromprotect_last_n(20) → 3 messages (hard minimum)_prune_old_tool_resultsalso respects token budget (newprotect_tail_tokensparam)Changes
Example (200K context, 50K threshold, 20K tail budget)
Before: 20 messages protected (could be 200K tokens) → almost nothing to summarize
After: ~20K tokens of recent messages protected (~5 normal msgs, or 1 large tool output + 2 msgs) → much more middle content available for summarization
Backward compatibility
_prune_old_tool_resultsnew param is optional (defaults to None → old behavior)protect_last_nstill exists as a config param, just no longer the tail floor_find_tail_cut_by_tokenssignature unchangedcontext_compressor.py