Skip to content

BUG: Context auto-compression never triggers when context_length == MINIMUM_CONTEXT_LENGTH (64000) #14690

@devilardis

Description

@devilardis

Bug Description

Context auto-compression never triggers when the model's context_length equals MINIMUM_CONTEXT_LENGTH (64000 tokens). This causes conversations to grow until they hit the model's context limit and get forcefully degraded, instead of being automatically compressed at the configured threshold.

Root Cause

In agent/context_compressor.py, the threshold_tokens calculation uses MINIMUM_CONTEXT_LENGTH (64000) as an absolute floor:

# Lines 356-359 (__init__) and 316-319 (update_model)
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),  # e.g., int(64000 * 0.7) = 44800
    MINIMUM_CONTEXT_LENGTH,                        # 64000
)

When context_length == 64000 (e.g., a local model with 192K context split across 3 parallel slots = 64K per slot), the floor value dominates:

  • max(44800, 64000) = 64000 → threshold = 100% of context window
  • should_compress() checks prompt_tokens >= threshold_tokens, but the API errors out before prompt_tokens can reach 64000
  • Compression never fires, regardless of the configured threshold percentage

This affects any configuration where context_length <= MINIMUM_CONTEXT_LENGTH / threshold_percent:

  • context_length=64000, threshold=0.7 → threshold_tokens = 64000 (100%) ❌
  • context_length=64000, threshold=0.5 → threshold_tokens = 64000 (100%) ❌
  • context_length=64000, threshold=0.85 → threshold_tokens = 64000 (100%) ❌
  • context_length=80000, threshold=0.7 → threshold_tokens = 64000 (80%) — works but threshold is higher than configured
  • context_length=128000, threshold=0.7 → threshold_tokens = 89600 (70%) ✅ correct

Reproduction

  1. Configure a local model with context_length: 64000 in config.yaml
  2. Set compression.threshold: 0.7
  3. Start a long conversation and observe that context grows past 70% without triggering compression
  4. Conversation eventually hits the context limit and gets forcefully degraded

Config:

model:
  context_length: 64000
compression:
  enabled: true
  threshold: 0.7

Fix

Add a safety check after the max() calculation — if the floor value pushes the threshold to 100% or beyond, fall back to the percentage-based value:

self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

This preserves the original intent of the floor (preventing premature compression on large-context models) while ensuring compression can actually trigger when context_length is at or near the minimum.

Related Design Issues (not bugs, but worth noting)

  1. Anti-thrashing has no auto-recovery: _ineffective_compression_count >= 2 causes should_compress() to permanently return False until /new resets the session. No decay or timeout mechanism exists.

  2. Post-compression token estimate excludes tools schema: After compression, last_prompt_tokens is set to a rough estimate (len(str)//4) that does not include tools schema tokens (potentially 20-30K), causing the next compression cycle to trigger later than configured.

Environment

  • Hermes Agent version: latest main (ce08916)
  • Model: Qwen3.6-35B-A3B (local llama.cpp, 192K context / 3 parallel slots = 64K per slot)
  • OS: Linux (ROCm)

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High — major feature broken, no workaroundcomp/agentCore agent loop, run_agent.py, prompt buildertype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions