[Bug]: Infinite Context Compaction Loop (messages=N->N) on low context_length / limit configurations

# 🌀 Bug Audit & Fix: Solving the Infinite Context Compaction Loop (messages=N->N) on Hermes Agent

Hey Hermes Community! 👋 

If you run your agents with custom context sizes (for example a low `context_length`) and notice that the agent is stuck in an endless compression cycle, logging `messages=16->16` or `messages=20->20` on **every single message turn** with extra API overhead and no token savings, you are likely hitting the **Compaction Loop** bug.

Here is a deep-dive analysis of why this happens, how the code gets stuck in a logic trap, and how you can fix it right now with configuration adjustments.

---

## 🔍 The Symptom

In `gateway.log` or `agent.log`, search for `context compression done`. If you see a sequence like this:

```text
2026-06-05 02:16:46,217 INFO [...] context compression started: messages=16 tokens=~92,997
2026-06-05 02:18:10,357 INFO [...] context compression done: messages=16->16 rough_tokens=~60,092

2026-06-05 02:18:14,905 INFO [...] context compression started: messages=18 tokens=~104,419
2026-06-05 02:19:38,801 INFO [...] context compression done: messages=18->18 rough_tokens=~61,329

2026-06-05 02:19:47,748 INFO [...] context compression started: messages=20 tokens=~106,727
2026-06-05 02:21:11,648 INFO [...] context compression done: messages=20->20 rough_tokens=~62,024
```

Note that no messages are ever removed (`16->16`, `18->18`, `20->20`) and that every following turn starts above the trigger threshold again. The compressor therefore fires on every message exchange, running in circles, wasting API spend, and adding latency: each compression in the excerpt above takes about 84 seconds.
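
To check your own logs for this pattern, here is a small Python sketch that picks out `context compression done` lines whose message count did not change. It assumes the log line format matches the excerpt above; the function name is illustrative and not part of Hermes Agent.

```python
import re

# Assumes the "context compression done" line format shown in the excerpt above.
DONE_RE = re.compile(r"context compression done: messages=(\d+)->(\d+)")


def find_noop_compactions(lines):
    """Return (line_no, before, after) for each compression that removed no messages."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        match = DONE_RE.search(line)
        if match:
            before, after = int(match.group(1)), int(match.group(2))
            if before == after:
                hits.append((line_no, before, after))
    return hits


# Lines from the excerpt above, plus one hypothetical healthy compression.
sample = [
    "2026-06-05 02:18:10,357 INFO [...] context compression done: messages=16->16 rough_tokens=~60,092",
    "2026-06-05 02:18:14,905 INFO [...] context compression started: messages=18 tokens=~104,419",
    "2026-06-05 02:19:38,801 INFO [...] context compression done: messages=18->18 rough_tokens=~61,329",
    "context compression done: messages=40->12 rough_tokens=~20,000",
]
print(find_noop_compactions(sample))  # [(1, 16, 16), (3, 18, 18)]
```

To scan a real log, pass an open file object (for example your `agent.log` or `gateway.log`) instead of the sample list; a run of consecutive hits is the loop signature described above.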

---

## 🧮 The Mathematical Trap

Let's look at the math when system parameters are configured to:
- `context_length = 96,000` tokens.
- `threshold_percent = 0.65` (Global threshold to trigger compression).
- `summary_target_ratio = 0.45` (Keep ~45% of context as uncompressed tail).
- `MINIMUM_CONTEXT_LENGTH = 64,000` (Hardcoded floor in `agent/model_metadata.py`).

### 1. The Trigger
The trigger threshold is computed as:
$$threshold\_tokens = \max(96{,}000 \times 0.65,\ 64{,}000) = \max(62{,}400,\ 64{,}000) = 64{,}000 \text{ tokens}$$

So, whenever the active session prompt size exceeds **64,000 tokens**, compression triggers.
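
The trigger rule can be sketched in Python as follows. This mirrors the formula above rather than the actual Hermes Agent source; the function name is hypothetical, and the 64,000 floor is the `MINIMUM_CONTEXT_LENGTH` value quoted in this report.

```python
# Floor quoted from agent/model_metadata.py in this report.
MINIMUM_CONTEXT_LENGTH = 64_000


def threshold_tokens(context_length: int, threshold_percent: float) -> int:
    """Prompt size (in tokens) above which compression fires, per the formula above."""
    return max(int(context_length * threshold_percent), MINIMUM_CONTEXT_LENGTH)


# 96,000 * 0.65 = 62,400 falls below the floor, so the floor wins.
print(threshold_tokens(96_000, 0.65))   # 64000
print(threshold_tokens(128_000, 0.65))  # 83200
```

One consequence of the `max`: with `threshold_percent = 0.65`, any `context_length` below about 98,462 tokens (64,000 / 0.65) produces the same 64,000-token trigger, so tuning `threshold_percent` alone cannot move it.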

### 2. The Tail Allocation
Once compression is triggered, the system tries to determine how much of the "tail" (recent conversation history) to preserve:
$$tail\_token\_budget = 64{,}000 \times 0.45 = 28{,}800 \text{ tokens}$$

To avoid slicing halfway through a message or formatting block, the compressor applies a hardcoded `1.5x` safety multiplier to this budget, yielding a `soft_ceiling`:
$$soft\_ceiling = 28{,}800 \times 1.5 = 43{,}200 \text{ tokens}$$
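
A sketch of the tail allocation, again following the formulas above. The 1.5 multiplier is the hardcoded factor this report describes; the function name is hypothetical.

```python
SOFT_CEILING_MULTIPLIER = 1.5  # hardcoded safety factor described in this report


def tail_allocation(threshold_tokens: int, summary_target_ratio: float) -> tuple[int, int]:
    """Return (tail_token_budget, soft_ceiling) for a given trigger threshold."""
    tail_token_budget = round(threshold_tokens * summary_target_ratio)
    soft_ceiling = round(tail_token_budget * SOFT_CEILING_MULTIPLIER)
    return tail_token_budget, soft_ceiling


print(tail_allocation(64_000, 0.45))  # (28800, 43200)
print(tail_allocation(64_000, 0.20))  # (12800, 19200)
```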

### 3. The Collapse
When compression is first triggered, the base overhead (system prompt + 42 tool definitions) consumes approximately **21,360 tokens**.
This means the transcript history has:
$$transcript\_tokens = 64{,}000 - 21{,}360 = 42{,}640 \text{ tokens}$$

The compressor walks backward from the newest messages, accumulating token counts to find the cut boundary. Because the whole transcript ($42{,}640$ tokens) is **strictly less than the soft ceiling ($43{,}200$)**, the backward walk never breaks early: it runs all the way back to the head of the transcript.
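
The backward walk can be illustrated with a simplified Python stand-in. This is not the real `_find_tail_cut_by_tokens`: it ignores message-boundary and tool-call pairing rules and only shows why the walk never breaks when the transcript fits under the soft ceiling. The 16 equal-sized messages are hypothetical, chosen to total 42,640 tokens.

```python
def find_tail_cut(message_tokens: list[int], soft_ceiling: int) -> int:
    """Return the index where the preserved tail starts (0 means "keep everything").

    Simplified stand-in for the backward walk: accumulate token counts from the
    newest message toward the oldest and stop once the soft ceiling is exceeded.
    """
    accumulated = 0
    for index in range(len(message_tokens) - 1, -1, -1):
        accumulated += message_tokens[index]
        if accumulated > soft_ceiling:
            # This message no longer fits: keep everything after it as the tail.
            return index + 1
    # The ceiling was never exceeded, so the tail swallows the whole transcript.
    return 0


transcript = [2_665] * 16  # hypothetical: 16 messages, 42,640 tokens total
print(find_tail_cut(transcript, 43_200))  # 0 -> nothing left in front of the tail
print(find_tail_cut(transcript, 19_200))  # 9 -> messages 0..8 can be compressed
```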

To avoid discarding the entire conversation, the safety limits in `_find_tail_cut_by_tokens` (and `_ensure_last_user_message_in_tail`) pull the cut back to a user-message boundary near the head of the transcript (e.g. index 3 or 4).

As a result, the compression window $[compress\_start : compress\_end]$ shrinks to a single message. The compressor summarizes that one message, saves **0 tokens** because of the summary template overhead, and exits. Since the context is still above the 64,000-token threshold, the same thing happens again on the next turn, indefinitely.
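
Putting the three steps together, the loop condition under this report's model reduces to `transcript_tokens <= soft_ceiling` at the moment compression triggers. The Python sketch below checks that condition for a given configuration; the function and its `base_overhead` parameter (the ~21,360 tokens of system prompt and tool definitions measured above) are illustrative assumptions, not Hermes Agent APIs.

```python
MINIMUM_CONTEXT_LENGTH = 64_000  # floor quoted in this report
SOFT_CEILING_MULTIPLIER = 1.5    # hardcoded tail multiplier quoted in this report


def predicts_compaction_loop(
    context_length: int,
    threshold_percent: float,
    summary_target_ratio: float,
    base_overhead: int,
) -> bool:
    """True when, at trigger time, the whole transcript fits under the soft ceiling."""
    threshold = max(int(context_length * threshold_percent), MINIMUM_CONTEXT_LENGTH)
    soft_ceiling = threshold * summary_target_ratio * SOFT_CEILING_MULTIPLIER
    transcript = threshold - base_overhead
    return transcript <= soft_ceiling


print(predicts_compaction_loop(96_000, 0.65, 0.45, 21_360))   # True  (42,640 <= 43,200)
print(predicts_compaction_loop(96_000, 0.65, 0.20, 21_360))   # False (42,640 >  19,200)
print(predicts_compaction_loop(128_000, 0.65, 0.45, 21_360))  # False (61,840 >  56,160)
```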

---

## 🛠️ The Fix

You can easily break out of this loop without code modifications by adjusting your configuration in one of two ways:

### Solution 1: Decrease `summary_target_ratio` (Highly Recommended)
Lower `summary_target_ratio` to its default of `0.20`.
- **How it works**:
  - `tail_token_budget` drops to $64{,}000 \times 0.20 = 12{,}800$ tokens.
  - `soft_ceiling` becomes $12{,}800 \times 1.5 = 19{,}200$ tokens.
  - When compression triggers, the transcript ($42{,}640$ tokens) is larger than the soft ceiling ($19{,}200$). The backward walk breaks at $19{,}200$ tokens, leaving $23{,}440$ tokens to compress.
  - This successfully shrinks the context back down, breaking the loop!
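
The same arithmetic can be turned around to ask how far `summary_target_ratio` can go before the loop returns. Under the report's model (fixed 64,000-token threshold, ~21,360 tokens of overhead), the walk breaks only if `transcript > threshold * ratio * 1.5`, i.e. `ratio < transcript / (1.5 * threshold)`. Both helpers below are hypothetical, not Hermes Agent functions.

```python
SOFT_CEILING_MULTIPLIER = 1.5  # hardcoded tail multiplier quoted in this report


def compressible_tokens(threshold: int, summary_target_ratio: float, base_overhead: int) -> int:
    """Tokens in front of the soft ceiling that the compressor can summarize."""
    transcript = threshold - base_overhead
    soft_ceiling = threshold * summary_target_ratio * SOFT_CEILING_MULTIPLIER
    return max(0, round(transcript - soft_ceiling))


def loop_ratio_boundary(threshold: int, base_overhead: int) -> float:
    """Ratio at which the soft ceiling reaches the transcript size.

    Only ratios strictly below this value leave anything to compress.
    """
    return (threshold - base_overhead) / (SOFT_CEILING_MULTIPLIER * threshold)


print(compressible_tokens(64_000, 0.20, 21_360))     # 23440
print(compressible_tokens(64_000, 0.45, 21_360))     # 0
print(f"{loop_ratio_boundary(64_000, 21_360):.3f}")  # 0.444
```

Ratios just under 0.444 would leave only a sliver to compress, which is why the default `0.20` (23,440 compressible tokens) is the safer choice.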

### Solution 2: Increase Context Length (`context_length`)
If you need a larger uncompressed tail (e.g. keeping `summary_target_ratio = 0.45`), increase `context_length` to `128,000` tokens or higher.
- **How it works**:
  - `threshold_tokens` increases to $128{,}000 \times 0.65 = 83{,}200$ tokens.
  - `soft_ceiling` for the tail increases to $83{,}200 \times 0.45 \times 1.5 = 56{,}160$ tokens.
  - When compression triggers at $83{,}200$ tokens, the transcript is $83{,}200 - 21{,}360 = 61{,}840$ tokens, which exceeds the $56{,}160$-token soft ceiling. The walk breaks early and leaves the oldest $\approx 5{,}680$ tokens to compress, breaking the loop.
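
For Solution 2, the break-even point can be solved directly: the walk breaks only when `threshold - overhead > 1.5 * ratio * threshold`, i.e. `threshold > overhead / (1 - 1.5 * ratio)`. The hypothetical helper below converts that into an approximate break-even `context_length`, assuming the threshold equals `context_length * threshold_percent` once it is above the 64,000 floor, and the ~21,360-token overhead measured above.

```python
MINIMUM_CONTEXT_LENGTH = 64_000  # floor quoted in this report
SOFT_CEILING_MULTIPLIER = 1.5    # hardcoded tail multiplier quoted in this report


def break_even_context_length(
    threshold_percent: float, summary_target_ratio: float, base_overhead: int
) -> float:
    """Approximate context_length above which the backward walk breaks early.

    Returns 0.0 when the 64,000-token floor alone already avoids the loop.
    """
    keep_fraction = SOFT_CEILING_MULTIPLIER * summary_target_ratio
    if keep_fraction >= 1:
        raise ValueError("soft ceiling covers the whole threshold; no context_length helps")
    # Solve threshold - overhead > keep_fraction * threshold for threshold.
    min_threshold = base_overhead / (1 - keep_fraction)
    if min_threshold < MINIMUM_CONTEXT_LENGTH:
        return 0.0
    # Above the floor, threshold = context_length * threshold_percent.
    return min_threshold / threshold_percent


print(f"{break_even_context_length(0.65, 0.45, 21_360):,.0f}")  # 101,112
print(break_even_context_length(0.65, 0.20, 21_360))            # 0.0
```

Under these assumptions, 128,000 clears the bar with room to spare, while a `context_length` close to ~101,000 would leave only a sliver to compress.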

---

### 💻 Code Patch (For Developers)

To prevent this at the code level, we suggest adding a safety check in `_find_tail_cut_by_tokens` inside `context_compressor.py`: if the total transcript is smaller than `soft_ceiling` and no compression window yields a meaningful token reduction, the compressor should raise the threshold or scale down the tail budget dynamically instead of entering a no-op loop.
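
One way to express the suggested guard is to cap the soft ceiling at a fraction of the transcript, so the backward walk always leaves something to compress. The Python sketch below is hypothetical: `clamp_soft_ceiling` and `min_compress_fraction` are illustrative names not taken from `context_compressor.py`, and the 25% default is an arbitrary example value.

```python
def clamp_soft_ceiling(
    transcript_tokens: int, soft_ceiling: int, min_compress_fraction: float = 0.25
) -> int:
    """Scale the tail ceiling down so at least `min_compress_fraction` of the
    transcript falls in front of it and is eligible for compression."""
    if not 0 < min_compress_fraction < 1:
        raise ValueError("min_compress_fraction must be between 0 and 1")
    max_tail_tokens = int(transcript_tokens * (1 - min_compress_fraction))
    return min(soft_ceiling, max_tail_tokens)


# The looping configuration from this report: 42,640-token transcript, 43,200 ceiling.
ceiling = clamp_soft_ceiling(42_640, 43_200)
print(ceiling)           # 31980
print(42_640 - ceiling)  # 10660 tokens now available to compress
# A healthy configuration is left untouched.
print(clamp_soft_ceiling(42_640, 19_200))  # 19200
```

Capping the ceiling rather than raising the threshold keeps the trigger point unchanged; the trade-off is a shorter verbatim tail on configurations like the one above.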

Let us know if you run into this issue and if scaling down `summary_target_ratio` resolves it on your custom clusters! 🚀
