fix(compaction): token-budget primary tail protection#6453

Merged
teknium1 merged 3 commits into main from hermes/hermes-b0a4b31e
Apr 9, 2026

Conversation

teknium1 (Contributor) commented Apr 9, 2026

Summary

Salvage of PR #6240 by @BongSuCHOI (cherry-picked onto current main) with added test coverage.

Problem

Tail protection during compaction uses protect_last_n=20 (a hard message count). If those 20 messages include large tool outputs (50K+ chars each), the protected tail can total 200K+ tokens — leaving almost nothing for the compressor to summarize. Compaction fires frequently (especially at the 50% threshold) but accomplishes nothing.

Fix

Make token budget the primary criterion for tail protection:

| | Before | After |
|---|---|---|
| Tail min messages | 20 (protect_last_n) | 3 (hard minimum) |
| Tail budget | Ignored | Primary; derived from threshold × summary_target_ratio |
| Soft ceiling | N/A | 1.5× budget (avoids splitting oversized messages) |
| Min messages for compression | head + 20 + 1 = 24 | head + 3 + 1 = 7 |
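
The tail rules in the table can be sketched as a backward walk over the messages. This is a minimal illustration under stated assumptions, not the code in agent/context_compressor.py: the names find_tail_start, estimate_tokens, and CHARS_PER_TOKEN are hypothetical, the 4-chars-per-token estimate is a stand-in for whatever the compressor really uses, and tool-group alignment (keeping a tool_call with its result) is left out.

```python
from typing import Dict, List

CHARS_PER_TOKEN = 4  # assumption: rough chars-per-token heuristic


def estimate_tokens(message: Dict[str, str]) -> int:
    """Crude size estimate; a real tokenizer would be more accurate."""
    return len(message.get("content") or "") // CHARS_PER_TOKEN


def find_tail_start(
    messages: List[Dict[str, str]],
    head_end: int,
    tail_budget: int,
    min_tail: int = 3,
    ceiling_factor: float = 1.5,
) -> int:
    """Return the index of the first message in the protected tail.

    head_end is the index of the first message after the protected head.
    """
    n = len(messages)
    soft_ceiling = int(tail_budget * ceiling_factor)
    # Never protect more messages than exist after the head.
    min_tail = max(0, min(min_tail, n - head_end))

    used = 0
    cut = n
    for i in range(n - 1, head_end - 1, -1):
        cost = estimate_tokens(messages[i])
        protected = n - 1 - i  # messages already in the tail
        budget_met = used >= tail_budget
        over_ceiling = used + cost > soft_ceiling
        # The hard minimum wins over both token limits.
        if protected >= min_tail and (budget_met or over_ceiling):
            break
        used += cost
        cut = i
    return cut
```

With twenty 200,000-character tool outputs and a 20,000-token budget, this sketch protects only the last three messages, so the other seventeen remain eligible for summarization. A message that overshoots the budget is still kept whole as long as the running total stays within the 1.5× ceiling.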

Example (200K context, 50% threshold, 20K tail budget)

Before: 20 messages protected (could be 200K tokens) → almost nothing to summarize
After: ~20K tokens of tail protected (~5 normal msgs, or 1 large tool output + 2 msgs) → 80K+ of middle content available for summarization
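
The arithmetic behind this example can be written out as follows. This is a sketch under assumptions: the function names are hypothetical, summary_target_ratio = 0.2 is inferred from the example's 20K budget on a 100K threshold, and protect_first_n = 3 is inferred from the table's "head + 3 + 1 = 7".

```python
def compaction_budget(
    context_length: int,
    threshold_ratio: float,
    summary_target_ratio: float,
    ceiling_factor: float = 1.5,
) -> dict:
    """Derive the tail budget from the compaction threshold."""
    threshold_tokens = round(context_length * threshold_ratio)
    tail_budget = round(threshold_tokens * summary_target_ratio)
    return {
        "threshold_tokens": threshold_tokens,
        "tail_budget": tail_budget,
        "soft_ceiling": round(tail_budget * ceiling_factor),
        # Upper bound on head plus middle content when compaction fires.
        "outside_tail": threshold_tokens - tail_budget,
    }


def min_messages_for_compression(protect_first_n: int, min_tail: int = 3) -> int:
    """Head, minimum tail, and at least one message left to summarize."""
    return protect_first_n + min_tail + 1


budget = compaction_budget(200_000, 0.5, 0.2)
old_guard = min_messages_for_compression(3, min_tail=20)
new_guard = min_messages_for_compression(3)
```

For the 200K context above this gives a 100,000-token threshold, a 20,000-token tail budget, a 30,000-token soft ceiling, and 80,000 tokens outside the tail. The guard values come out to 24 and 7, matching the table.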

Changes

  • agent/context_compressor.py — token-budget tail in _find_tail_cut_by_tokens, token-budget prune in _prune_old_tool_results, lower compression guard (7 msgs)
  • tests/agent/test_context_compressor.py — 6 adapted existing tests + 6 new tests covering: large tool outputs no longer block compaction, min 3 tail guarantee, 1.5× soft ceiling, small conversation compression, token-budget prune path, message-count fallback
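
The token-budget prune path and its message-count fallback might look roughly like the sketch below. Only the parameter name protect_tail_tokens comes from the commit message; the standalone function, the protect_tail_count parameter, the placeholder text, and the 4-chars-per-token estimate are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

PLACEHOLDER = "[old tool output pruned]"  # hypothetical placeholder text


def prune_old_tool_results(
    messages: List[Dict[str, str]],
    protect_tail_count: int = 20,
    protect_tail_tokens: Optional[int] = None,
) -> Tuple[List[Dict[str, str]], int]:
    """Replace tool outputs outside the protected tail with a placeholder."""
    n = len(messages)
    if protect_tail_tokens is not None:
        # Token-budget path: walk backward until the budget is spent,
        # always protecting at least the most recent message.
        used = 0
        boundary = n
        for i in range(n - 1, -1, -1):
            cost = len(messages[i].get("content") or "") // 4
            if boundary < n and used + cost > protect_tail_tokens:
                break
            used += cost
            boundary = i
    else:
        # Fallback path: fixed message count, as before the change.
        boundary = max(0, n - protect_tail_count)

    pruned: List[Dict[str, str]] = []
    count = 0
    for i, msg in enumerate(messages):
        content = msg.get("content") or ""
        if i < boundary and msg.get("role") == "tool" and len(content) > len(PLACEHOLDER):
            pruned.append({**msg, "content": PLACEHOLDER})
            count += 1
        else:
            pruned.append(msg)
    return pruned, count
```

With six 4,000-character tool results, a 2,500-token budget protects the last two and prunes the first four, while the count-based fallback with the old default of 20 prunes nothing.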

Cache impact

Zero. Changes only affect which messages survive compression and when compression triggers — the compression event itself is still a single cache break, same as before.

Test plan

  • 42 compressor tests pass (36 existing + 6 new)
  • py_compile clean

Fixes the "compaction fires but accomplishes nothing" pattern reported alongside #6369.

BongSuCHOI and others added 3 commits April 8, 2026 23:33
Tail protection was effectively message-count based despite having a
token budget, because protect_last_n=20 acted as a hard floor.  A single
50K-token tool output would cause all 20 recent messages to be
preserved regardless of budget, leaving little room for summarization.

Changes:
- _find_tail_cut_by_tokens: min_tail reduced from protect_last_n (20)
  to 3; token budget is now the primary criterion
- Soft ceiling at 1.5x budget to avoid cutting mid-oversized-message
- _prune_old_tool_results: accepts optional protect_tail_tokens so
  pruning also respects the token budget instead of a fixed count
- compress() minimum message check relaxed from protect_first_n +
  protect_last_n + 1 to protect_first_n + 3 + 1
- Tool group alignment (no splitting tool_call/result) preserved

PR #6240 changed tail protection from protect_last_n to min(3, ...)
which increased the minimum compressible message count and shifted
tail boundaries. Three tests broke:

- test_summary_role_avoids_consecutive_user_messages: 6→8 msgs
- test_double_collision_user_head_assistant_tail: 7→8 msgs
- test_no_collision_scenarios_still_work: 6→8 msgs

All tests now exceed the new min_for_compress threshold (6) and
maintain proper role alternation in both head and tail sections.

Tests for the new behavior paths:
- Large tool outputs no longer block compaction (motivating scenario)
- Hard minimum of 3 tail messages always protected
- 1.5x soft ceiling for oversized messages
- Small conversations still compress (min 8 messages)
- Token-budget prune path in _prune_old_tool_results
- Fallback to message-count when no token budget
@teknium1 teknium1 merged commit d40264d into main Apr 9, 2026
2 of 4 checks passed
@SHL0MS SHL0MS mentioned this pull request Apr 11, 2026
2 tasks