fix(compression): three bugs causing auto-compression to never trigger by devilardis · Pull Request #14696 · NousResearch/hermes-agent

devilardis · 2026-04-23T19:07:34Z

Summary

Fixes three bugs in the context auto-compression system that collectively cause compression to never trigger for models with context_length at or near MINIMUM_CONTEXT_LENGTH (64000 tokens).

Bug 1: MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000

Closes #14690

When context_length == MINIMUM_CONTEXT_LENGTH == 64000, the floor value in threshold_tokens calculation dominates:

# Before: max(44800, 64000) = 64000 = 100% of context → compression never triggers
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)

Fix: Fall back to percentage-based value when floor >= context_length:

if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

Applied in both __init__ and update_model.

Bug 2: Anti-thrashing protection permanently disables compression with no recovery

Closes #14694

After 2 consecutive ineffective compressions (<10% savings each), should_compress() returns False forever. No timeout, decay, or auto-recovery mechanism exists.

Fix: Add time-based auto-recovery (300 seconds). If enough time has passed since the last compression attempt, reset the counter:

if self._ineffective_compression_count >= 2:
    _elapsed = time.monotonic() - self._last_compression_time
    if _elapsed > self._ANTI_THRASH_RECOVERY_SECONDS:
        self._ineffective_compression_count = 0
    else:
        return False

Bug 3: Post-compression token estimate excludes tools schema

Closes #14695

After compression, last_prompt_tokens is set using estimate_messages_tokens_rough() which omits tools schema tokens (20-30K with 50+ tools). This causes the next compression cycle to trigger much later than the configured threshold.

Fix: Use estimate_request_tokens_rough() which includes tools schema, consistent with the preflight compression check pattern:

# Before:
_compressed_est = estimate_tokens_rough(new_system_prompt) + estimate_messages_tokens_rough(compressed)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed, system_prompt=new_system_prompt or "", tools=self.tools or None,
)

Testing

Verified with unit-level tests:

Bug 1: context_length=64000, threshold=0.7 → threshold_tokens=44800 (70%), should_compress(44800)=True
Bug 2: Anti-thrashing blocks within 300s window, auto-recovers after 300s elapsed
Bug 3: estimate_request_tokens_rough includes tools schema in token count

Files Changed

agent/context_compressor.py: Bug 1 fix (L320-321, L363-368) + Bug 2 fix (L299, L398-401, L418-436, L1283)
run_agent.py: Bug 3 fix (L7596-7607)

1. MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000 - When context_length equals MINIMUM_CONTEXT_LENGTH (64000), the floor value in threshold_tokens calculation dominates, making the threshold equal to 100% of the context window. The API errors out before prompt_tokens can reach that value, so compression never fires. - Fix: fall back to percentage-based value when floor >= context_length. - Closes NousResearch#14690 2. Anti-thrashing protection permanently disables compression with no recovery - After 2 consecutive ineffective compressions (<10% savings each), should_compress() returns False forever. No timeout, decay, or auto-recovery mechanism exists — only /new resets the counter. - Fix: add time-based auto-recovery (300s). If enough time has passed since the last compression attempt, reset the counter. - Closes NousResearch#14694 3. Post-compression token estimate excludes tools schema - After compression, last_prompt_tokens is set using estimate_messages_tokens_rough() which omits tools schema tokens (20-30K with 50+ tools). This causes the next compression cycle to trigger much later than the configured threshold. - Fix: use estimate_request_tokens_rough() which includes tools schema, consistent with the preflight compression check pattern. - Closes NousResearch#14695

devilardis · 2026-04-25T09:01:13Z

Note on Related PRs

This PR provides a comprehensive fix for all three bugs (#14690, #14694, #14695) in a single change. Other contributors have submitted individual PRs (#15431, #15433, #15496) for these issues. This PR offers the advantage of a single atomic fix with complete analysis. Open to feedback if maintainers prefer smaller PRs.

devilardis · 2026-04-29T10:03:40Z

👋 Hey @NousResearch maintainers!

This is a P1 bug fix PR (#14696) that's been sitting for 6 days (since Apr 23). It fixes three critical bugs in the context auto-compression system that cause compression to never trigger for models with context_length ≥ 64000 tokens.

The bugs are:

MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000
Anti-thrashing protection permanently disables compression with no recovery
Post-compression token estimate excludes tools schema

This is blocking users with large context windows from getting proper compression. Could someone please take a look? 🙏

Thanks!

devilardis · 2026-04-29T10:43:21Z

@teknium1 @austinpickett @shannonsands 🚨 P1 bug fix - auto-compression never triggers for 64K models. This has been open for 6+ days and affects all users with context_length >= 64000. Please review and merge when possible. Thanks!

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 23, 2026

devilardis closed this Apr 30, 2026

devilardis deleted the fix/context-compression-three-bugs branch April 30, 2026 15:55

devilardis mentioned this pull request May 1, 2026

fix(compression): three bugs causing auto-compression to never trigger #18514

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(compression): three bugs causing auto-compression to never trigger#14696

fix(compression): three bugs causing auto-compression to never trigger#14696
devilardis wants to merge 1 commit into
NousResearch:mainfrom
devilardis:fix/context-compression-three-bugs

devilardis commented Apr 23, 2026

Uh oh!

devilardis commented Apr 25, 2026

Uh oh!

devilardis commented Apr 29, 2026

Uh oh!

devilardis commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

devilardis commented Apr 23, 2026

Summary

Bug 1: MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000

Bug 2: Anti-thrashing protection permanently disables compression with no recovery

Bug 3: Post-compression token estimate excludes tools schema

Testing

Files Changed

Uh oh!

devilardis commented Apr 25, 2026

Note on Related PRs

Uh oh!

devilardis commented Apr 29, 2026

Uh oh!

devilardis commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants