Skip to content

fix(compression): three bugs causing auto-compression to never trigger#14696

Closed
devilardis wants to merge 1 commit into
NousResearch:mainfrom
devilardis:fix/context-compression-three-bugs
Closed

fix(compression): three bugs causing auto-compression to never trigger#14696
devilardis wants to merge 1 commit into
NousResearch:mainfrom
devilardis:fix/context-compression-three-bugs

Conversation

@devilardis

Copy link
Copy Markdown

Summary

Fixes three bugs in the context auto-compression system that collectively cause compression to never trigger for models with context_length at or near MINIMUM_CONTEXT_LENGTH (64000 tokens).

Bug 1: MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000

Closes #14690

When context_length == MINIMUM_CONTEXT_LENGTH == 64000, the floor value in threshold_tokens calculation dominates:

# Before: max(44800, 64000) = 64000 = 100% of context → compression never triggers
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)

Fix: Fall back to percentage-based value when floor >= context_length:

if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

Applied in both __init__ and update_model.

Bug 2: Anti-thrashing protection permanently disables compression with no recovery

Closes #14694

After 2 consecutive ineffective compressions (<10% savings each), should_compress() returns False forever. No timeout, decay, or auto-recovery mechanism exists.

Fix: Add time-based auto-recovery (300 seconds). If enough time has passed since the last compression attempt, reset the counter:

if self._ineffective_compression_count >= 2:
    _elapsed = time.monotonic() - self._last_compression_time
    if _elapsed > self._ANTI_THRASH_RECOVERY_SECONDS:
        self._ineffective_compression_count = 0
    else:
        return False

Bug 3: Post-compression token estimate excludes tools schema

Closes #14695

After compression, last_prompt_tokens is set using estimate_messages_tokens_rough() which omits tools schema tokens (20-30K with 50+ tools). This causes the next compression cycle to trigger much later than the configured threshold.

Fix: Use estimate_request_tokens_rough() which includes tools schema, consistent with the preflight compression check pattern:

# Before:
_compressed_est = estimate_tokens_rough(new_system_prompt) + estimate_messages_tokens_rough(compressed)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed, system_prompt=new_system_prompt or "", tools=self.tools or None,
)

Testing

Verified with unit-level tests:

  • Bug 1: context_length=64000, threshold=0.7threshold_tokens=44800 (70%), should_compress(44800)=True
  • Bug 2: Anti-thrashing blocks within 300s window, auto-recovers after 300s elapsed
  • Bug 3: estimate_request_tokens_rough includes tools schema in token count

Files Changed

  • agent/context_compressor.py: Bug 1 fix (L320-321, L363-368) + Bug 2 fix (L299, L398-401, L418-436, L1283)
  • run_agent.py: Bug 3 fix (L7596-7607)

1. MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000
   - When context_length equals MINIMUM_CONTEXT_LENGTH (64000), the floor
     value in threshold_tokens calculation dominates, making the threshold
     equal to 100% of the context window. The API errors out before
     prompt_tokens can reach that value, so compression never fires.
   - Fix: fall back to percentage-based value when floor >= context_length.
   - Closes NousResearch#14690

2. Anti-thrashing protection permanently disables compression with no recovery
   - After 2 consecutive ineffective compressions (<10% savings each),
     should_compress() returns False forever. No timeout, decay, or
     auto-recovery mechanism exists — only /new resets the counter.
   - Fix: add time-based auto-recovery (300s). If enough time has passed
     since the last compression attempt, reset the counter.
   - Closes NousResearch#14694

3. Post-compression token estimate excludes tools schema
   - After compression, last_prompt_tokens is set using
     estimate_messages_tokens_rough() which omits tools schema tokens
     (20-30K with 50+ tools). This causes the next compression cycle
     to trigger much later than the configured threshold.
   - Fix: use estimate_request_tokens_rough() which includes tools schema,
     consistent with the preflight compression check pattern.
   - Closes NousResearch#14695
@devilardis

Copy link
Copy Markdown
Author

Note on Related PRs

This PR provides a comprehensive fix for all three bugs (#14690, #14694, #14695) in a single change. Other contributors have submitted individual PRs (#15431, #15433, #15496) for these issues. This PR offers the advantage of a single atomic fix with complete analysis. Open to feedback if maintainers prefer smaller PRs.

@devilardis

Copy link
Copy Markdown
Author

👋 Hey @NousResearch maintainers!

This is a P1 bug fix PR (#14696) that's been sitting for 6 days (since Apr 23). It fixes three critical bugs in the context auto-compression system that cause compression to never trigger for models with context_length ≥ 64000 tokens.

The bugs are:

  1. MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000
  2. Anti-thrashing protection permanently disables compression with no recovery
  3. Post-compression token estimate excludes tools schema

This is blocking users with large context windows from getting proper compression. Could someone please take a look? 🙏

Thanks!

@devilardis

Copy link
Copy Markdown
Author

@teknium1 @austinpickett @shannonsands 🚨 P1 bug fix - auto-compression never triggers for 64K models. This has been open for 6+ days and affects all users with context_length >= 64000. Please review and merge when possible. Thanks!

@devilardis devilardis closed this Apr 30, 2026
@devilardis devilardis deleted the fix/context-compression-three-bugs branch April 30, 2026 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

2 participants