BUG: Anti-thrashing protection permanently disables auto-compression with no recovery #14694

@devilardis

Bug Description

Once the anti-thrashing protection triggers (two consecutive compressions that each save less than 10%), should_compress() returns False for the rest of the session. There is no timeout, decay, or auto-recovery mechanism; the only way to restore auto-compression is to run /new or /reset manually.

This means a session that had two ineffective compressions early on will never auto-compress again, even as the context grows far beyond the configured threshold, until it hits the model's context limit and is forcibly degraded.

Root Cause

In agent/context_compressor.py, the anti-thrashing counter only resets on effective compression or session reset:

# should_compress() — lines 417-427
if self._ineffective_compression_count >= 2:
    # ... warning log ...
    return False  # Permanently blocked, no recovery path

# compress() — lines 1267-1273
if savings_pct < 10:
    self._ineffective_compression_count += 1  # Only increases
else:
    self._ineffective_compression_count = 0   # Only resets on effective compression

# on_session_reset() — line 298
self._ineffective_compression_count = 0  # Only resets on /new

The gap: If two consecutive compressions are ineffective (e.g., the middle region has few messages because most are in the protected head/tail), the counter hits 2 and never decreases. Subsequent context growth is completely ignored.
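
The latch can be reproduced in isolation with a toy model of the counter logic quoted above. This is a minimal sketch, not the real class in agent/context_compressor.py: the class name `LatchedGuard`, the `record_compression` helper, and the `threshold_tokens` parameter are hypothetical stand-ins.

```python
class LatchedGuard:
    """Toy model of the quoted counter logic (not the real compressor)."""

    def __init__(self, threshold_tokens: int) -> None:
        self.threshold_tokens = threshold_tokens  # hypothetical parameter
        self._ineffective_compression_count = 0

    def should_compress(self, prompt_tokens: int) -> bool:
        if self._ineffective_compression_count >= 2:
            # Latched: the context size is never consulted again.
            return False
        return prompt_tokens >= self.threshold_tokens

    def record_compression(self, savings_pct: float) -> None:
        # Mirrors the branch quoted from compress().
        if savings_pct < 10:
            self._ineffective_compression_count += 1
        else:
            self._ineffective_compression_count = 0


guard = LatchedGuard(threshold_tokens=100_000)
guard.record_compression(8.0)  # counter = 1
guard.record_compression(5.0)  # counter = 2, guard is now latched

# No context size, however large, re-enables compression.
latched = [guard.should_compress(n) for n in (120_000, 180_000, 10_000_000)]
print(latched)  # [False, False, False]
```

Because `record_compression` is only reached when a compression actually runs, and the guard blocks every future run, nothing in this model can bring the counter back to zero.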

Trigger Scenario

  1. User has a conversation where the middle region is small (most messages are in the protected head=3 + tail=20)
  2. First compression saves only 8% → counter = 1
  3. Second compression saves only 5% → counter = 2
  4. Anti-thrashing kicks in, should_compress() returns False forever
  5. User continues the conversation, context grows to 90%+ of limit
  6. No auto-compression fires → context hits limit → forced degradation
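
To see why a small middle region yields savings under 10%, here is a rough arithmetic sketch. The per-message token counts, the summary size, and the helper name `estimate_savings_pct` are hypothetical; only the head=3 / tail=20 protection comes from the scenario above, and the real compressor may measure savings differently.

```python
PROTECTED_HEAD = 3   # from the scenario above
PROTECTED_TAIL = 20  # from the scenario above


def estimate_savings_pct(message_tokens: list[int], summary_tokens: int) -> float:
    """Estimate savings if the unprotected middle is replaced by one summary."""
    end = max(PROTECTED_HEAD, len(message_tokens) - PROTECTED_TAIL)
    middle = message_tokens[PROTECTED_HEAD:end]
    if not middle:
        return 0.0  # nothing to compress
    before = sum(message_tokens)
    after = before - sum(middle) + summary_tokens
    return 100.0 * (before - after) / before


# 26 messages of 4,000 tokens each: only 3 of them fall in the middle region.
small_middle = estimate_savings_pct([4_000] * 26, summary_tokens=4_000)
print(round(small_middle, 1))  # 7.7

# 100 messages of the same size: 77 fall in the middle region.
large_middle = estimate_savings_pct([4_000] * 100, summary_tokens=4_000)
print(round(large_middle, 1))  # 76.0
```

Under these assumed numbers the first case stays below the 10% bar and would increment the counter, while the same session a few dozen messages later would compress well, which is exactly the growth the latched guard never gets to see.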

Fix

Add a time-based auto-recovery: if enough time has passed since the last compression attempt, reset the counter. This preserves the anti-thrashing protection (preventing rapid-fire ineffective compressions) while giving compression another chance after a pause, by which point the conversation has likely grown.

# At module top (needed for time.monotonic()):
import time

# In __init__:
self._last_compression_time: float = 0.0
self._ANTI_THRASH_RECOVERY_SECONDS: float = 300.0  # 5 minutes

# In should_compress():
if self._ineffective_compression_count >= 2:
    _elapsed = time.monotonic() - self._last_compression_time
    if _elapsed > self._ANTI_THRASH_RECOVERY_SECONDS:
        # Recovery window has passed: clear the latch and fall through
        # to the normal threshold check.
        self._ineffective_compression_count = 0
        logger.info("Anti-thrashing reset: %.0fs since last compression attempt", _elapsed)
    else:
        return False

# In compress(), after updating savings:
self._last_compression_time = time.monotonic()

The 300-second (5-minute) recovery window is conservative enough to prevent thrashing while ensuring that a session isn't permanently locked out of compression.
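
The recovery behavior can be checked without waiting five minutes by injecting a clock. The following is a self-contained sketch of the proposed logic, not a patch against the real file: `RecoveringGuard`, `FakeClock`, `record_compression`, and the `clock` and `threshold_tokens` parameters are hypothetical names introduced for illustration.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


class FakeClock:
    """Manually advanced clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class RecoveringGuard:
    ANTI_THRASH_RECOVERY_SECONDS = 300.0  # 5 minutes, as proposed above

    def __init__(self, threshold_tokens: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_tokens = threshold_tokens  # hypothetical parameter
        self._clock = clock
        self._ineffective_compression_count = 0
        self._last_compression_time = clock()

    def should_compress(self, prompt_tokens: int) -> bool:
        if self._ineffective_compression_count >= 2:
            elapsed = self._clock() - self._last_compression_time
            if elapsed <= self.ANTI_THRASH_RECOVERY_SECONDS:
                return False  # still inside the anti-thrashing window
            self._ineffective_compression_count = 0
            logger.info("Anti-thrashing reset: %.0fs since last attempt", elapsed)
        return prompt_tokens >= self.threshold_tokens

    def record_compression(self, savings_pct: float) -> None:
        self._last_compression_time = self._clock()
        if savings_pct < 10:
            self._ineffective_compression_count += 1
        else:
            self._ineffective_compression_count = 0


clock = FakeClock()
guard = RecoveringGuard(threshold_tokens=100_000, clock=clock)
guard.record_compression(8.0)
guard.record_compression(5.0)

blocked = guard.should_compress(180_000)    # inside the window
clock.advance(301.0)
recovered = guard.should_compress(180_000)  # window elapsed, counter reset
print(blocked, recovered)  # False True
```

Note that in this sketch the reset is keyed purely on elapsed time, so a session that sits idle for five minutes also recovers; after recovery, two further ineffective compressions are needed before the guard blocks again.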

Environment

  • Hermes Agent version: latest main (ce08916)
  • OS: Linux (ROCm)

Metadata

    Labels

    P2 (Medium: degraded but workaround exists)
    comp/agent (Core agent loop, run_agent.py, prompt builder)
    type/bug (Something isn't working)
