fix(compaction): token-budget primary tail protection#6453
Merged
Conversation
Tail protection was effectively message-count based despite having a token budget, because protect_last_n=20 acted as a hard floor. A single 50K-token tool output would cause all 20 recent messages to be preserved regardless of budget, leaving little room for summarization. Changes: - _find_tail_cut_by_tokens: min_tail reduced from protect_last_n (20) to 3; token budget is now the primary criterion - Soft ceiling at 1.5x budget to avoid cutting mid-oversized-message - _prune_old_tool_results: accepts optional protect_tail_tokens so pruning also respects the token budget instead of a fixed count - compress() minimum message check relaxed from protect_first_n + protect_last_n + 1 to protect_first_n + 3 + 1 - Tool group alignment (no splitting tool_call/result) preserved
PR #6240 changed tail protection from protect_last_n to min(3, ...) which increased the minimum compressible message count and shifted tail boundaries. Three tests broke: - test_summary_role_avoids_consecutive_user_messages: 6→8 msgs - test_double_collision_user_head_assistant_tail: 7→8 msgs - test_no_collision_scenarios_still_work: 6→8 msgs All tests now exceed the new min_for_compress threshold (6) and maintain proper role alternation in both head and tail sections.
Tests for the new behavior paths: - Large tool outputs no longer block compaction (motivating scenario) - Hard minimum of 3 tail messages always protected - 1.5x soft ceiling for oversized messages - Small conversations still compress (min 8 messages) - Token-budget prune path in _prune_old_tool_results - Fallback to message-count when no token budget
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Salvage of PR #6240 by @BongSuCHOI (cherry-picked onto current main) with added test coverage.
Problem
Tail protection during compaction uses
protect_last_n=20(a hard message count). If those 20 messages include large tool outputs (50K+ chars each), the protected tail can total 200K+ tokens — leaving almost nothing for the compressor to summarize. Compaction fires frequently (especially at the 50% threshold) but accomplishes nothing.Fix
Make token budget the primary criterion for tail protection:
protect_last_n)threshold × summary_target_ratioExample (200K context, 50% threshold, 20K tail budget)
Before: 20 messages protected (could be 200K tokens) → almost nothing to summarize
After: ~20K tokens of tail protected (~5 normal msgs, or 1 large tool output + 2 msgs) → 80K+ of middle content available for summarization
Changes
_find_tail_cut_by_tokens, token-budget prune in_prune_old_tool_results, lower compression guard (7 msgs)Cache impact
Zero. Changes only affect which messages survive compression and when compression triggers — the compression event itself is still a single cache break, same as before.
Test plan
py_compilecleanFixes the "compaction fires but accomplishes nothing" pattern reported alongside #6369.