fix(agent): honor model.context_length override below 64K floor #8962
Open
ismell0992-afk wants to merge 1 commit into
Two call sites hard-coded ``MINIMUM_CONTEXT_LENGTH`` (64K) as an immovable floor. This silently defeated the ``model.context_length`` config override that their own error message tells users to reach for: "Choose a model with at least 64K context, or set model.context_length in config.yaml to override."

1. ``AIAgent.__init__`` rejected any compressor whose context was below the floor, regardless of whether the user had set an override. The override is the only way to run a sub-64K model, so the escape hatch the error message advertises could never take effect. Extracted the check into a new ``_check_minimum_context_length`` helper that skips when ``self._config_context_length`` is not None.

2. ``ContextCompressor.__init__`` floored ``threshold_tokens`` at 64K even when total context was below 64K. On an opt-in 32K model this pushed the compression threshold to 32K, equal to the total context, so compression could never fire. For models whose total context is below the floor, use the raw percentage instead.

Motivating case: ``hermes-brain:qwen3-14b-ctx32k``, a Modelfile wrapping qwen3:14b with ``num_ctx 32768``. It is sized for a ~15K-token baseline system prompt plus real conversation history. Without both fixes, startup fails at (1). Once (1) is fixed, compression never fires at (2).

Adds parametrized tests for both paths:
- _check_minimum_context_length: rejects low context, accepts override, accepts above-floor, no-ops on 0.
- ContextCompressor: opt-in 32K model gets 16384-token threshold, not 32768.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
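For illustration, the opt-in check from fix (1) can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the code in `run_agent.py`. The following are all assumptions:

- the function name `check_minimum_context_length` (the real helper is a method that reads `self._config_context_length`);
- the `ValueError` exception type;
- the exact message wording;
- the floor value 64_000, read off the "64,000 required" error text.

Only the skip-on-override behavior and the four test cases come from the PR.

```python
# Assumption: 64_000 is inferred from the "64,000 required" error text;
# the real constant may be defined differently.
MINIMUM_CONTEXT_LENGTH = 64_000


def check_minimum_context_length(context_length, config_context_length=None):
    """Hypothetical standalone version of AIAgent._check_minimum_context_length.

    Both values are passed in here instead of being read from the agent.
    ValueError and the message wording are assumptions.
    """
    # The user set model.context_length explicitly: honor the opt-in.
    if config_context_length is not None:
        return
    # The PR's tests expect a context length of 0 to be a no-op.
    if not context_length:
        return
    if context_length < MINIMUM_CONTEXT_LENGTH:
        raise ValueError(
            f"Model context of {context_length:,} tokens is below the minimum "
            f"{MINIMUM_CONTEXT_LENGTH:,} required. Choose a model with at least "
            "64K context, or set model.context_length in config.yaml to override."
        )


# The four cases listed in the PR's test plan:
check_minimum_context_length(32_768, config_context_length=32_768)  # override: accepted
check_minimum_context_length(128_000)                               # above floor: accepted
check_minimum_context_length(0)                                     # 0: no-op
try:
    check_minimum_context_length(32_768)                            # no override: rejected
except ValueError as exc:
    print(exc)
```

The override is checked first. Once a user has pinned `model.context_length`, the floor no longer applies at all, which is the behavior the error message promises.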
This was referenced Apr 25, 2026
What does this PR do?
Two call sites hard-coded `MINIMUM_CONTEXT_LENGTH` (64K) as an immovable floor. This silently defeated the `model.context_length` config override that their own error message tells users to reach for: "Choose a model with at least 64K context, or set model.context_length in config.yaml to override."

1. `AIAgent.__init__` rejected any compressor whose context was below the floor, regardless of whether the user had set an override. The override is the only way to run a sub-64K model, so the check blocked exactly the case the override exists for.
2. `ContextCompressor.__init__` floored `threshold_tokens` at 64K even when total context was below 64K. On an opt-in 32K model this pushed the compression threshold to 32K, equal to the total context, so compression could never fire.

Motivating case: `hermes-brain:qwen3-14b-ctx32k`, a Modelfile wrapping `qwen3:14b` with `num_ctx 32768`. It is sized for a ~15K-token baseline system prompt plus real conversation history. Without both fixes, startup fails at (1). Once (1) is bypassed, compression never fires at (2).

Type of Change
Changes Made
- `run_agent.py`: extract the inline 64K reject block from `AIAgent.__init__` into `_check_minimum_context_length()`. The helper skips when `self._config_context_length is not None` (the user opt-in the error message promises).
- `agent/context_compressor.py`: when `self.context_length < MINIMUM_CONTEXT_LENGTH`, use the raw percentage as `threshold_tokens` instead of clamping to the 64K floor. Above the floor, behavior is unchanged.
- `tests/run_agent/test_switch_model_context.py`: 4 new `_check_minimum_context_length_*` cases (rejects without override, accepts with override, accepts above floor, no-ops on 0).
- `tests/agent/test_context_compressor.py`: `test_threshold_floor_skipped_for_opt_in_tiny_models` checks that an opt-in 32K model with `threshold_percent=0.50` gets `threshold_tokens=16384` (not 32768).

How to Test
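The threshold rule exercised by `test_threshold_floor_skipped_for_opt_in_tiny_models` can be sketched as a standalone function. The function `compute_threshold_tokens` is hypothetical. The sketch assumes the floor is 64_000 and that the unchanged above-floor behavior is `max(raw, MINIMUM_CONTEXT_LENGTH)`. The real logic lives in `ContextCompressor.__init__` and sets `threshold_tokens`.

```python
# Assumption: floor value inferred from the "64,000 required" error text.
MINIMUM_CONTEXT_LENGTH = 64_000


def compute_threshold_tokens(context_length: int, threshold_percent: float) -> int:
    """Hypothetical standalone sketch of the patched threshold rule."""
    raw = int(context_length * threshold_percent)
    if context_length < MINIMUM_CONTEXT_LENGTH:
        # Opt-in small model: a 64K floor would reach or exceed the whole
        # window, so compression could never trigger. Use the raw percentage.
        return raw
    # At or above the floor: assumed unchanged behavior, never below 64K.
    return max(raw, MINIMUM_CONTEXT_LENGTH)


print(compute_threshold_tokens(32_768, 0.50))   # 16384, the value the new test expects
print(compute_threshold_tokens(100_000, 0.50))  # 64000: the floor still applies
print(compute_threshold_tokens(200_000, 0.50))  # 100000
```

The function branches on `context_length` rather than on whether an override was set. This keeps the floor in force for every model at or above 64K, matching "Above the floor, behavior is unchanged."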
1. `pytest tests/agent/test_context_compressor.py tests/run_agent/test_switch_model_context.py -q` passes (47 cases total).
2. With a sub-64K model configured in `config.yaml` via `model.context_length: 32768`, `hermes chat -q "hi"` no longer raises "is below the minimum 64,000 required".

Checklist