[Bug]: Infinite Context Compaction Loop (messages=N->N) on low context_length / limit configurations #40803

@Ardem2025


🌀 Bug Audit & Fix: Solving the Infinite Context Compaction Loop (messages=N->N) on Hermes Agent

Hey Hermes Community! 👋

If you run your agents with custom context sizes (or low Context Lens limits) and notice that an agent is stuck in an endless compression loop, logging messages=16->16 or messages=20->20 on every single turn while adding API overhead and saving no tokens, you are most likely hitting the Compaction Loop bug.

Here is a deep-dive analysis of why this happens, how the code gets stuck in a logic trap, and how you can fix it right now with configuration adjustments.


🔍 The Symptom

In gateway.log or agent.log, search for "context compression done". If you see a sequence like this:

2026-06-05 02:16:46,217 INFO [...] context compression started: messages=16 tokens=~92,997
2026-06-05 02:18:10,357 INFO [...] context compression done: messages=16->16 rough_tokens=~60,092

2026-06-05 02:18:14,905 INFO [...] context compression started: messages=18 tokens=~104,419
2026-06-05 02:19:38,801 INFO [...] context compression done: messages=18->18 rough_tokens=~61,329

2026-06-05 02:19:47,748 INFO [...] context compression started: messages=20 tokens=~106,727
2026-06-05 02:21:11,648 INFO [...] context compression done: messages=20->20 rough_tokens=~62,024

Note that no messages are ever removed (16->16, 18->18, 20->20) and that each new pass starts above the trigger threshold again. The compressor therefore fires on every message exchange, running in circles, wasting API costs, and raising latency dramatically (each pass in the excerpt takes roughly 84 seconds).
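If you have a lot of logs to sift through, a small script can flag the no-op passes for you. This is a minimal sketch that assumes the log lines look exactly like the excerpt above; the regex and the helper name find_noop_compactions are our own, not part of Hermes Agent:

```python
import re

# Assumes "done" lines shaped like the excerpt above, e.g.
# "... context compression done: messages=16->16 rough_tokens=~60,092"
DONE_RE = re.compile(r"context compression done: messages=(\d+)->(\d+)")


def find_noop_compactions(lines):
    """Return (before, after) message counts for passes that removed no messages."""
    noops = []
    for line in lines:
        match = DONE_RE.search(line)
        if match is None:
            continue  # not a "compression done" line
        before, after = int(match.group(1)), int(match.group(2))
        if after >= before:  # the message count did not shrink
            noops.append((before, after))
    return noops


sample = [
    "2026-06-05 02:16:46,217 INFO [...] context compression started: messages=16 tokens=~92,997",
    "2026-06-05 02:18:10,357 INFO [...] context compression done: messages=16->16 rough_tokens=~60,092",
    "2026-06-05 02:19:38,801 INFO [...] context compression done: messages=18->18 rough_tokens=~61,329",
]
print(find_noop_compactions(sample))  # [(16, 16), (18, 18)]
```

To scan a real log, pass an open file handle (for example the result of open("gateway.log")) instead of sample. Several consecutive hits are a strong sign you are in the loop.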


🧮 The Mathematical Trap

Let's look at the math when system parameters are configured to:

  • context_length = 96,000 tokens.
  • threshold_percent = 0.65 (Global threshold to trigger compression).
  • summary_target_ratio = 0.45 (Keep ~45% of context as uncompressed tail).
  • MINIMUM_CONTEXT_LENGTH = 64,000 (Hardcoded floor in agent/model_metadata.py).

1. The Trigger

The trigger threshold is computed as:
$$\mathrm{threshold\_tokens} = \max(96{,}000 \times 0.65,\; 64{,}000) = \max(62{,}400,\; 64{,}000) = 64{,}000 \text{ tokens}$$

So, whenever the active session prompt size exceeds 64,000 tokens, compression triggers.
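The trigger rule is easy to check for your own settings. A minimal sketch, assuming the formula exactly as written above; the function name is illustrative and the int() truncation is our assumption, not the actual Hermes code:

```python
MINIMUM_CONTEXT_LENGTH = 64_000  # hardcoded floor noted in agent/model_metadata.py


def threshold_tokens(context_length, threshold_percent):
    """Prompt size (tokens) above which compression fires: max(length * pct, floor)."""
    # int() truncation is our assumption; the real code may round differently
    return max(int(context_length * threshold_percent), MINIMUM_CONTEXT_LENGTH)


print(threshold_tokens(96_000, 0.65))   # 64000: 62,400 is lifted to the floor
print(threshold_tokens(128_000, 0.65))  # 83200: the percentage wins
```

Because of the floor, with threshold_percent = 0.65 any context_length below 64,000 / 0.65 ≈ 98,461 tokens ends up with the same 64,000-token trigger.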

2. The Tail Allocation

Once compression is triggered, the system tries to determine how much of the "tail" (recent conversation history) to preserve:
$$\mathrm{tail\_token\_budget} = 64{,}000 \times 0.45 = 28{,}800 \text{ tokens}$$

To avoid slicing halfway through a message or formatting block, the compressor applies a hardcoded 1.5x safety multiplier to this budget, yielding a soft_ceiling:
$$\mathrm{soft\_ceiling} = 28{,}800 \times 1.5 = 43{,}200 \text{ tokens}$$
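The tail allocation can be sketched the same way. This only reproduces the arithmetic above and is not the compressor's real code; the function name is ours and rounding to whole tokens is our simplification:

```python
SOFT_CEILING_MULTIPLIER = 1.5  # hardcoded safety margin described above


def tail_limits(threshold_tokens, summary_target_ratio):
    """Return (tail_token_budget, soft_ceiling) in whole tokens."""
    tail_token_budget = round(threshold_tokens * summary_target_ratio)
    soft_ceiling = round(tail_token_budget * SOFT_CEILING_MULTIPLIER)
    return tail_token_budget, soft_ceiling


print(tail_limits(64_000, 0.45))  # (28800, 43200)
print(tail_limits(64_000, 0.20))  # (12800, 19200)
```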

3. The Collapse

When compression is first triggered, the base overhead (system prompt + 42 tool definitions) consumes approximately 21,360 tokens.
This means the transcript history has:
$$\mathrm{transcript\_tokens} = 64{,}000 - 21{,}360 = 42{,}640 \text{ tokens}$$

The compressor walks backward from the latest messages, accumulating token lengths to find the cut boundary. Since the total transcript size ($42,640$) is strictly less than the soft ceiling ($43,200$), the backward walk never breaks early! It searches all the way back to the head of the transcript.
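A simplified model of the backward walk makes the trap easy to reproduce. This sketch only counts tokens; it ignores the message and tool-call boundary handling the real _find_tail_cut_by_tokens performs, the function name find_tail_cut is hypothetical, and the evenly sized messages are made up so that they add up to the 42,640-token transcript above:

```python
def find_tail_cut(message_tokens, soft_ceiling):
    """Walk backward from the newest message and return the index where the
    preserved tail starts; message_tokens[:cut] is what is left to compress.

    Hypothetical, token-only model of the backward walk described above.
    """
    accumulated = 0
    for i in range(len(message_tokens) - 1, -1, -1):
        accumulated += message_tokens[i]
        if accumulated > soft_ceiling:
            return i + 1  # message i would overflow the tail, so the tail starts after it
    return 0  # the walk reached the head without breaking: nothing to compress


transcript = [2_665] * 16  # made-up sizes: 16 messages, 42,640 tokens in total
cut = find_tail_cut(transcript, soft_ceiling=43_200)
print(cut, sum(transcript[:cut]))  # 0 0
```

A cut of 0 means the whole transcript fits inside the soft ceiling, which is exactly the situation the safety limits then have to patch up.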

To avoid discarding the entire conversation, the safety limits in _find_tail_cut_by_tokens (and _ensure_last_user_message_in_tail) then pull the compression boundary to the nearest user-message boundary, which here sits right next to the head (e.g. index 3 or 4).

As a result, the compression window $[\mathrm{compress\_start} : \mathrm{compress\_end}]$ shrinks to almost nothing (a single message). The compressor summarizes that one message, saves effectively 0 tokens once the summary template overhead is added back, and exits. Because the context is still above 64,000 tokens, the next turn triggers compression again, and the loop repeats indefinitely.
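Why this never settles can be seen with a toy model of the turn loop. Everything here is an assumption for illustration (the function name count_compressions, a fixed number of tokens added per turn, a fixed amount removed per compression pass, with 0 standing in for the degenerate one-message window); it is not how Hermes Agent schedules compression internally:

```python
def count_compressions(start_tokens, threshold, saved_per_pass, tokens_per_turn, turns):
    """Count the turns on which compression fires in a toy model of the agent loop."""
    tokens = start_tokens
    fired = 0
    for _ in range(turns):
        tokens += tokens_per_turn  # the new exchange grows the prompt
        if tokens > threshold:
            fired += 1
            tokens -= saved_per_pass  # a no-op pass removes nothing
    return fired


# Degenerate window: nothing is saved, so every one of 10 turns compresses again.
print(count_compressions(64_000, 64_000, 0, 1_000, 10))       # 10
# A pass that frees 20,000 tokens (made-up figure) fires only once in 10 turns.
print(count_compressions(64_000, 64_000, 20_000, 1_000, 10))  # 1
```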


🛠️ The Fix

You can easily break out of this loop without code modifications by adjusting your configuration in one of two ways:

Solution 1: Decrease summary_target_ratio (Highly Recommended)

Lower summary_target_ratio to its default value of 0.20.

  • How it works:
    • tail_token_budget drops to $64,000 \times 0.20 = 12,800$ tokens.
    • soft_ceiling becomes $12,800 \times 1.5 = 19,200$ tokens.
    • When compression triggers, the transcript ($42,640$) is larger than the soft ceiling ($19,200$). The backward walk stops once the tail reaches $19,200$ tokens, leaving $42,640 - 19,200 = 23,440$ tokens to compress.
    • This successfully shrinks the context back down, breaking the loop!
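The effect of the ratio can be checked end to end with the figures from this post. A sketch under the same assumptions as the walkthrough (21,360-token overhead, 1.5x multiplier, 64,000 floor, compression triggering right at the threshold); the function name compressible_tokens is ours:

```python
def compressible_tokens(context_length, summary_target_ratio, threshold_percent=0.65,
                        overhead_tokens=21_360, floor=64_000, multiplier=1.5):
    """Tokens left outside the soft ceiling when compression first triggers."""
    threshold = max(context_length * threshold_percent, floor)
    transcript = threshold - overhead_tokens  # history after system prompt + tools
    soft_ceiling = threshold * summary_target_ratio * multiplier
    return max(0, round(transcript - soft_ceiling))


print(compressible_tokens(96_000, 0.45))  # 0: the walk reaches the head
print(compressible_tokens(96_000, 0.20))  # 23440
```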

Solution 2: Increase Context Lens (context_length)

If you require a larger uncompressed tail history, increase the context_length parameter to 128,000 tokens or higher.

  • How it works:
    • threshold_tokens increases to $128,000 \times 0.65 = 83,200$ tokens (now above the 64,000 floor).
    • soft_ceiling for the tail increases to $83,200 \times 0.45 \times 1.5 = 56,160$ tokens.
    • When compression triggers at $83,200$ tokens, the transcript is $83,200 - 21,360 = 61,840$ tokens, which exceeds the $56,160$ soft ceiling. The walk breaks early and leaves the oldest $61,840 - 56,160 = 5,680$ tokens to compress, breaking the loop.
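Rather than guessing a context length, the break-even point can be worked out under the simplified model used throughout this post: the walk can only break early if the transcript exceeds the soft ceiling, i.e. threshold - overhead > threshold × summary_target_ratio × 1.5. Solving for threshold and dividing by threshold_percent gives the smallest workable context_length. A sketch, with the 21,360-token overhead and the 1.5x multiplier taken from this post; the function name min_context_length is hypothetical:

```python
def min_context_length(overhead_tokens, summary_target_ratio, threshold_percent=0.65,
                       floor=64_000, multiplier=1.5):
    """Smallest context_length for which the tail walk can break early.

    Returns 0 when the 64,000 floor alone is already enough, and None when
    the soft ceiling is at least the whole threshold (no context_length helps).
    """
    keep_fraction = summary_target_ratio * multiplier  # share of threshold kept as tail
    if keep_fraction >= 1:
        return None
    min_threshold = overhead_tokens / (1 - keep_fraction)
    if min_threshold < floor:
        return 0
    return min_threshold / threshold_percent


print(round(min_context_length(21_360, 0.45)))  # 101112
print(min_context_length(21_360, 0.20))         # 0
```

With the default 0.20 ratio the floor already leaves room, which is why Solution 1 works at 96,000. At 0.45 the break-even is about 101,100 tokens, and just above it very little falls outside the ceiling, so generous headroom such as the 128,000 suggested above (about 5,680 compressible tokens) is the safer choice under this model.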

💻 Code Patch (For Developers)

To fix this at the code level, we suggest adding a safety check to _find_tail_cut_by_tokens in context_compressor.py: if the total transcript token count is below soft_ceiling and no compression window would yield a meaningful token reduction, the compressor should raise the threshold or dynamically scale down the tail budget instead of running a no-op pass.
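One possible shape for that guard is sketched below. It takes the "scale down the tail budget" route; the function name effective_tail_budget, the min_compress_fraction knob and its 0.25 default are all hypothetical, and a real patch would have to live inside _find_tail_cut_by_tokens and respect its boundary rules:

```python
def effective_tail_budget(transcript_tokens, tail_token_budget,
                          multiplier=1.5, min_compress_fraction=0.25):
    """Shrink the tail budget when its soft ceiling would swallow the whole transcript.

    Hypothetical guard: keeps at least `min_compress_fraction` of the transcript
    outside the soft ceiling so the compressor always has something to summarize.
    """
    soft_ceiling = tail_token_budget * multiplier
    max_ceiling = transcript_tokens * (1 - min_compress_fraction)
    if soft_ceiling <= max_ceiling:
        return tail_token_budget  # normal case: the backward walk breaks early
    return int(max_ceiling / multiplier)  # scaled-down budget


print(effective_tail_budget(42_640, 28_800))  # 21320: the trap case gets shrunk
print(effective_tail_budget(42_640, 12_800))  # 12800: already fine, left unchanged
```

With the shrunk budget, the soft ceiling becomes 31,980 tokens and about 10,660 tokens of the oldest history fall outside it, so every pass removes something. Raising the threshold instead would be the other option mentioned above.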

Let us know if you run into this issue and if scaling down summary_target_ratio resolves it on your custom clusters! 🚀

Labels: P1 (High: major feature broken, no workaround), comp/agent (Core agent loop, run_agent.py, prompt builder), type/bug (Something isn't working)