fix(compaction): token-budget primary tail protection #6240

Closed
BongSuCHOI wants to merge 2 commits into NousResearch:main from BongSuCHOI:feat/token-budget-tail-protection

@BongSuCHOI

Contributor

Problem

Tail protection in context compaction is effectively message-count based despite having a token budget. protect_last_n=20 acts as a hard floor, so a single 50K-token tool output (file read, large API response) causes all 20 recent messages to be preserved — even if they total 200K+ tokens. This leaves almost nothing to summarize, making compaction nearly useless in long-tool-output sessions.

Current behavior (200K context, 50K threshold):

tail_token_budget = 20K tokens  (threshold × summary_target_ratio)
protect_last_n = 20 messages    ← this wins every time

If the last 5 messages total 120K tokens, all 20 are still protected → only head + tiny middle gets summarized.
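
To make the failure mode concrete, here is a minimal sketch of the old count-floor behavior. It is not the code in context_compressor.py: `old_tail_size` is a hypothetical helper, per-message token counts are passed in as plain integers, and `summary_target_ratio = 0.4` is an assumption chosen only because it reproduces the 20K budget shown above.

```python
def old_tail_size(token_counts, tail_token_budget, protect_last_n=20):
    """Number of trailing messages protected when the count acts as a hard floor."""
    total = 0
    by_budget = 0
    # Walk backward and count how many messages fit inside the token budget.
    for tokens in reversed(token_counts):
        if total + tokens > tail_token_budget:
            break
        total += tokens
        by_budget += 1
    # The message-count floor overrides whatever the budget allowed.
    return min(len(token_counts), max(by_budget, protect_last_n))


threshold = 50_000
summary_target_ratio = 0.4  # assumed value, picked to match the 20K budget
tail_token_budget = int(threshold * summary_target_ratio)

# 30 messages: 25 small ones, then 5 large tool outputs of 24K tokens each.
counts = [500] * 25 + [24_000] * 5
protected = old_tail_size(counts, tail_token_budget)
protected_tokens = sum(counts[-protected:])
```

With these numbers the budget alone admits zero messages, yet `protected` is 20 and `protected_tokens` is 127,500, so only the 10 oldest small messages are left for head and summary.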

Solution

Make token budget the primary criterion for tail protection, with a small message-count floor for safety:

  • min_tail reduced from protect_last_n (20) → 3 messages (hard minimum)
  • The tail may grow to 1.5× the budget (soft ceiling) so the cut does not land in the middle of an oversized message
  • If even 3 messages exceed 1.5× budget, compression still runs (cut after head)
  • _prune_old_tool_results also respects token budget (new protect_tail_tokens param)
  • Tool group alignment (no splitting tool_call/result pairs) preserved
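
The rules above can be read as a backward walk over the message list. The sketch below is one interpretation under stated assumptions, not the actual `_find_tail_cut_by_tokens`: the name `find_tail_cut_sketch`, its parameters, and the exact stop conditions are hypothetical, token counts are precomputed integers, and tool group alignment is left out for brevity.

```python
def find_tail_cut_sketch(token_counts, tail_token_budget,
                         head_end=0, min_tail=3, soft_ceiling=1.5):
    """Return the index where the protected tail starts.

    Assumed rules (an interpretation of the PR description):
      * always keep at least `min_tail` messages,
      * after that, stop once the budget is reached,
      * a message may push the tail past the budget, but not past
        `soft_ceiling` times the budget,
      * never cut into the head (indices below `head_end`).
    """
    n = len(token_counts)
    ceiling = tail_token_budget * soft_ceiling
    total = 0
    cut = n  # tail is empty until a message is accepted
    for i in range(n - 1, head_end - 1, -1):
        kept = n - cut
        tokens = token_counts[i]
        if kept >= min_tail:
            if total >= tail_token_budget:
                break  # budget already used up
            if total + tokens > ceiling:
                break  # this message would overshoot the soft ceiling
        total += tokens
        cut = i
    return cut


# Five 24K-token tool outputs at the end: only the 3-message floor is kept.
large = find_tail_cut_sketch([500] * 25 + [24_000] * 5, 20_000)

# Uniform 4K-token messages: the tail fills up to the 20K budget.
normal = find_tail_cut_sketch([4_000] * 30, 20_000)
```

Here `large` is 27 (3 messages protected, even though they exceed the ceiling) and `normal` is 25 (5 messages, 20K tokens), which lines up with the "~5 normal msgs" figure used in this description.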

Changes

| Component | Before | After |
| --- | --- | --- |
| Tail min messages | 20 (protect_last_n) | 3 |
| Tail budget enforcement | Hard floor at 20 msgs | Soft ceiling at 1.5× budget |
| Prune boundary | protect_last_n × 3 = 60 msgs | token budget + min floor |
| Min messages for compression | head + 20 + 1 | head + 3 + 1 |

Example (200K context, 50K threshold, 20K tail budget)

Before: 20 messages protected (could be 200K tokens) → almost nothing to summarize
After: ~20K tokens of recent messages protected (~5 normal msgs, or 1 large tool output + 2 msgs) → much more middle content available for summarization

Backward compatibility

  • _prune_old_tool_results new param is optional (defaults to None → old behavior)
  • protect_last_n still exists as a config param, just no longer the tail floor
  • _find_tail_cut_by_tokens signature unchanged
  • No changes outside context_compressor.py
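
The optional-parameter pattern behind the first bullet might look like the sketch below. This is an illustration only: `prune_boundary_sketch` is a hypothetical stand-in for the boundary logic inside `_prune_old_tool_results`, the `protect_last_n × 3` fallback is taken from the Changes table, and whether the real prune path applies the 1.5× soft ceiling is not assumed here (the sketch uses the plain budget).

```python
from typing import List, Optional


def prune_boundary_sketch(token_counts: List[int],
                          protect_last_n: int = 20,
                          protect_tail_tokens: Optional[int] = None,
                          min_tail: int = 3) -> int:
    """Index below which old tool results would be eligible for pruning."""
    n = len(token_counts)
    if protect_tail_tokens is None:
        # Old behavior: fixed message-count window.
        return max(0, n - protect_last_n * 3)
    # New behavior: protect recent messages up to the token budget,
    # but never fewer than `min_tail` messages.
    total = 0
    boundary = n
    for i in range(n - 1, -1, -1):
        kept = n - boundary
        if kept >= min_tail and total + token_counts[i] > protect_tail_tokens:
            break
        total += token_counts[i]
        boundary = i
    return boundary


counts = [1_000] * 100
old_boundary = prune_boundary_sketch(counts)                              # count-based
new_boundary = prune_boundary_sketch(counts, protect_tail_tokens=20_000)  # budget-based
```

Callers that never pass `protect_tail_tokens` keep the 60-message window (`old_boundary` is 40), while callers that pass a budget protect only what fits in it (`new_boundary` is 80).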

Tail protection was effectively message-count based despite having a
token budget, because protect_last_n=20 acted as a hard floor.  A single
50K-token tool output would cause all 20 recent messages to be
preserved regardless of budget, leaving little room for summarization.

Changes:
- _find_tail_cut_by_tokens: min_tail reduced from protect_last_n (20)
  to 3; token budget is now the primary criterion
- Soft ceiling at 1.5x budget to avoid cutting mid-oversized-message
- _prune_old_tool_results: accepts optional protect_tail_tokens so
  pruning also respects the token budget instead of a fixed count
- compress() minimum message check relaxed from protect_first_n +
  protect_last_n + 1 to protect_first_n + 3 + 1
- Tool group alignment (no splitting tool_call/result) preserved
BongSuCHOI added a commit to BongSuCHOI/hermes-agent that referenced this pull request Apr 8, 2026
PR NousResearch#6240 changed tail protection from protect_last_n to min(3, ...)
which increased the minimum compressible message count and shifted
tail boundaries. Three tests broke:

- test_summary_role_avoids_consecutive_user_messages: 6→8 msgs
- test_double_collision_user_head_assistant_tail: 7→8 msgs
- test_no_collision_scenarios_still_work: 6→8 msgs

All tests now exceed the new min_for_compress threshold (6) and
maintain proper role alternation in both head and tail sections.
@BongSuCHOI

BongSuCHOI commented Apr 8, 2026

Contributor Author

Test Fix: context_compressor min_tail=3 ✅ pushed

3 failing tests in test_context_compressor.py fixed and pushed to this branch:

| Test | Issue | Fix |
| --- | --- | --- |
| test_summary_role_avoids_consecutive_user_messages | 6 msgs = min threshold → returned unchanged | 6→8 msgs |
| test_double_collision_user_head_assistant_tail | Tail shift → consecutive assistant | 7→8 msgs |
| test_no_collision_scenarios_still_work | Same threshold issue | 6→8 msgs, fixed roles |

Root cause: PR changed min_tail from protect_last_n to min(3, ...), raising min_for_compress to 6. Tests with exactly 6 msgs hit the early return.
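
The threshold arithmetic can be sketched as follows. The helper names are hypothetical, `protect_first_n = 2` is an assumption inferred from the stated threshold of 6 (head + 3 + 1), and the `<=` comparison is inferred from "6 msgs = min threshold → returned unchanged"; the real check in `compress()` may be written differently.

```python
MIN_TAIL = 3  # hard minimum tail size from the PR description


def min_for_compress(protect_first_n):
    # head + minimum tail + at least one message to summarize
    return protect_first_n + MIN_TAIL + 1


def should_compress(n_messages, protect_first_n):
    # Assumed early return: conversations at or below the threshold are
    # returned unchanged.
    return n_messages > min_for_compress(protect_first_n)


protect_first_n = 2  # assumed test fixture value
threshold = min_for_compress(protect_first_n)
six_msgs = should_compress(6, protect_first_n)    # old test size
eight_msgs = should_compress(8, protect_first_n)  # new test size
```

Under these assumptions `threshold` is 6, a 6-message test conversation is skipped, and the 8-message versions go through compression.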

Remaining 10 failures are pre-existing on main (quick_commands print mock, HF model, docker env, vision tools).

@BongSuCHOI

Contributor Author

CI Failure Analysis

Verified that all 10 test failures are pre-existing on main and unrelated to this PR. The changes here only touch context_compressor.py and its tests — none of the failing test files were modified.

Test failures (all pre-existing):

  • test_quick_commands.py (5) — print mock issue from a recent CLI interface change
  • test_auxiliary_named_custom_providers.py (1) — custom: prefix normalization
  • test_api_key_providers.py (1) — HF model MiniMaxAI/MiniMax-M2.5 not yet added to DEFAULT_CONTEXT_LENGTHS
  • test_docker_environment.py (2) — Docker env var handling in CI
  • test_vision_tools.py (1) — codex auth check

build-and-push failure: pip install hits resolution-too-deep — also fails on the latest main commit. Upstream dependency resolution issue.

9329 tests passed. Safe to merge.

teknium1 pushed a commit that referenced this pull request Apr 9, 2026
@teknium1

teknium1 commented Apr 9, 2026

Contributor

Merged via PR #6453 — your commits were cherry-picked onto current main with authorship preserved. Added 6 new tests covering the motivating large-tool-output scenario, min tail guarantee, 1.5x soft ceiling, and token-budget prune path. Great fix, @BongSuCHOI — this makes compaction actually effective in tool-heavy sessions!

@teknium1 teknium1 closed this Apr 9, 2026
@BongSuCHOI

Contributor Author

@teknium1
Thank you!! And if it’s not too much trouble, could you also take a look at #6239? It seems to improve usability significantly.

saxster pushed a commit to saxster/hermes-agent that referenced this pull request Apr 9, 2026
Tommyeds pushed a commit to Tommyeds/hermes-agent that referenced this pull request Apr 12, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026