
fix(agent): honor model.context_length override below 64K floor #8962

Open
ismell0992-afk wants to merge 1 commit into NousResearch:main from ismell0992-afk:pr/agent-honor-ctx-override

@ismell0992-afk

Contributor

What does this PR do?

Two call sites hard-coded MINIMUM_CONTEXT_LENGTH (64K) as an immovable floor, silently defeating the model.context_length config override that their own error message tells users to reach for:

"Choose a model with at least 64K context, or set model.context_length in config.yaml to override."

  1. AIAgent.__init__ rejected any compressor whose context was below the floor, regardless of whether the user had set an override. Since the override is the only way to run a sub-64K model, the escape hatch the error message advertises could never take effect.
  2. ContextCompressor.__init__ floored threshold_tokens at 64K even when total context was below 64K. On an opt-in 32K model this pushed the compression threshold to 32K — equal to total context — so compression would literally never fire.
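To make the arithmetic in (2) concrete, here is a minimal sketch contrasting the two threshold rules. It assumes MINIMUM_CONTEXT_LENGTH is 64,000 (matching the "minimum 64,000 required" error text quoted under How to Test). The helper names threshold_before and threshold_after are hypothetical, and the "before" rule's cap at total context is an assumption chosen so that it reproduces the 32K figure described above; it is not a copy of the real ContextCompressor code.

```python
# Hypothetical sketch; not the real ContextCompressor implementation.
MINIMUM_CONTEXT_LENGTH = 64_000  # assumed value ("minimum 64,000 required")


def threshold_before(context_length: int, threshold_percent: float) -> int:
    """Assumed pre-fix rule: clamp up to the 64K floor, capped at total context."""
    raw = int(context_length * threshold_percent)
    return min(max(raw, MINIMUM_CONTEXT_LENGTH), context_length)


def threshold_after(context_length: int, threshold_percent: float) -> int:
    """Fixed rule as described: no floor when total context is below 64K."""
    raw = int(context_length * threshold_percent)
    if context_length < MINIMUM_CONTEXT_LENGTH:
        return raw  # opt-in small model: use the raw percentage
    return max(raw, MINIMUM_CONTEXT_LENGTH)  # unchanged above the floor


print(threshold_before(32_768, 0.50))  # 32768: equal to total context, never fires
print(threshold_after(32_768, 0.50))   # 16384: fires at ~50% of context
print(threshold_after(200_000, 0.50))  # 100000: same as before the fix
```

Above the floor both rules return the same value, which is the "behavior is unchanged" property claimed for context_compressor.py.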

Motivating case: hermes-brain:qwen3-14b-ctx32k, a Modelfile wrapping qwen3:14b with num_ctx 32768 for a ~15K-token baseline system prompt plus real conversation history. Without both fixes, startup fails at (1) and — once (1) is bypassed — compression never fires at (2).

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Changes Made

  • run_agent.py: extract the inline 64K reject block from AIAgent.__init__ into _check_minimum_context_length(), which skips the check when self._config_context_length is not None (the user opt-in that the error message promises).
  • agent/context_compressor.py: when self.context_length < MINIMUM_CONTEXT_LENGTH, use the raw percentage of total context as threshold_tokens instead of clamping to the 64K floor. Above the floor, behavior is unchanged.
  • tests/run_agent/test_switch_model_context.py: 4 new _check_minimum_context_length_* cases (rejects without override, accepts with override, accepts above floor, no-ops on 0).
  • tests/agent/test_context_compressor.py: test_threshold_floor_skipped_for_opt_in_tiny_models proves that an opt-in 32K model with threshold_percent=0.50 gets threshold_tokens=16384 (not 32768).
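A minimal sketch of the extracted check, written as a free function so it runs standalone. The name check_minimum_context_length, its parameters, the ValueError exception type, the 64,000 constant, and the exact message wording are all assumptions; per the bullet above, the real helper is a method on AIAgent that reads self._config_context_length. The final lines replay the four cases the new tests cover.

```python
from typing import Optional

MINIMUM_CONTEXT_LENGTH = 64_000  # assumed value


def check_minimum_context_length(
    context_length: int, config_context_length: Optional[int]
) -> None:
    """Hypothetical stand-in for AIAgent._check_minimum_context_length.

    Raises ValueError (assumed type) when the context is below the floor
    and the user has not set model.context_length.
    """
    if config_context_length is not None:
        return  # explicit model.context_length override: honor it
    if not context_length:
        return  # 0 (assumed to mean "unknown"): nothing to check
    if context_length < MINIMUM_CONTEXT_LENGTH:
        raise ValueError(
            f"Model context of {context_length:,} tokens is below the minimum "
            f"{MINIMUM_CONTEXT_LENGTH:,} required. Choose a model with at least "
            "64K context, or set model.context_length in config.yaml to override."
        )


# The four cases from the PR's tests, replayed against the sketch:
try:
    check_minimum_context_length(32_768, None)  # no override: rejected
except ValueError as exc:
    print("rejected:", exc)
check_minimum_context_length(32_768, 32_768)  # override set: accepted
check_minimum_context_length(128_000, None)   # above floor: accepted
check_minimum_context_length(0, None)         # unknown context: no-op
```

Checking the override first is what makes the error message's advice actionable: once the user sets model.context_length, the floor no longer applies.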

How to Test

  1. pytest tests/agent/test_context_compressor.py tests/run_agent/test_switch_model_context.py -q — passes (47 cases total).
  2. With a sub-64K local model declared in config.yaml via model.context_length: 32768, hermes chat -q "hi" no longer raises "is below the minimum 64,000 required".
  3. Watch the same session: compression fires when the conversation crosses ~50% of total context, not at 100%.

Checklist

  • Conventional Commits
  • Tests added
  • PR contains only changes related to this fix
  • Tested on Linux (Ubuntu)

Two call sites hard-coded MINIMUM_CONTEXT_LENGTH (64K) as an immovable
floor, silently defeating the ``model.context_length`` config override
that their own error message tells users to reach for:

  "Choose a model with at least 64K context, or set model.context_length
   in config.yaml to override."

1. ``AIAgent.__init__`` rejected any compressor whose context was below
   the floor, regardless of whether the user had set an override. Since
   the override is the only way to run a sub-64K model, the escape hatch
   the error message advertises could never take effect. Extracted the
   check into a new ``_check_minimum_context_length`` helper that skips
   when ``self._config_context_length`` is not None.

2. ``ContextCompressor.__init__`` floored ``threshold_tokens`` at 64K
   even when total context was below 64K. On an opt-in 32K model this
   pushed the compression threshold to 32K — equal to total context —
   so compression would literally never fire. For models whose total
   context is below the floor, use the raw percentage instead.

Motivating case: ``hermes-brain:qwen3-14b-ctx32k``, a Modelfile wrapping
qwen3:14b with ``num_ctx 32768`` for a ~15K-token baseline system prompt
plus real conversation history. Without both fixes, startup fails at
(1) and — once (1) is fixed — compression never fires at (2).

Adds parametrized tests for both paths:
- _check_minimum_context_length: rejects low context, accepts override,
  accepts above-floor, no-ops on 0.
- ContextCompressor: opt-in 32K model gets 16384-token threshold, not
  32768.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 27, 2026
@alt-glitch

Collaborator

Likely duplicate of #9142, with the same root cause: the model.context_length config override doesn't bypass the 64K MINIMUM_CONTEXT_LENGTH floor in AIAgent.__init__ and ContextCompressor. Also duplicated by #11097.
