fix(agent): preserve MiniMax context length on delta-only overflow errors #9170
maelrx wants to merge 1 commit into
[gus-first-pass] This PR tackles a specific context overflow issue for the MiniMax provider, maintaining potential model performance levels. Here are points for your consideration:
1. Provider-Specific Handling: Ensure the safeguards introduced for MiniMax do not inadvertently affect other providers during similar overflow scenarios. Conduct thorough regression testing across various contexts.
2. Documentation: Clear documentation around the parsing logic variations between providers will enhance maintainability and future updates. Providing context here is essential for understanding different operational environments.
Great job addressing a critical area. Overall, I endorse these changes, but clarity and safeguarding are key.
Thanks — agreed on both points. This change is intentionally scoped to MiniMax only:
I also added regression coverage for both sides:
On documentation/maintainability: agreed as well. I opened a follow-up architecture issue to track the broader cleanup around provider-specific overflow semantics and base-vs-effective context handling: #9181.
For release-notes attribution of PR #9170 (MiniMax context preservation).
Merged via #14743. Your commit was cherry-picked onto current main with authorship preserved via rebase-merge (branch was 1,508 commits behind). Thanks for the fix; it resolves the MiniMax M2.7 context-halving bug reported in Discord. Also added you to `AUTHOR_MAP` in `scripts/release.py` for release-notes attribution.
Superseded by #14743, which salvaged this fix onto current main.
Summary
This fixes a MiniMax-specific context overflow recovery bug in `AIAgent`.
When MiniMax's Anthropic-compatible endpoint returns an error like:
`context window exceeds limit (2013)`
the number in parentheses is only the overflow delta, not the actual context window. Hermes already knows MiniMax's real context length, but the recovery path treated this error as "no limit available" and incorrectly probed down to the next generic tier.
This PR keeps the known MiniMax context length intact for that provider-specific error format and compresses the conversation without demoting the model to a smaller inferred window.
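As a rough illustration of the error format described above, the delta-only message could be recognized like this. This is a minimal sketch, not the check used in `run_agent.py`: the regular expression and the helper name `overflow_delta` are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical pattern for the delta-only message quoted in the summary.
# The actual detection in run_agent.py may be written differently.
_DELTA_ONLY_OVERFLOW = re.compile(
    r"context window exceeds limit\s*\((\d+)\)", re.IGNORECASE
)


def overflow_delta(error_message: str) -> Optional[int]:
    """Return the overflow delta if the message is delta-only, else None."""
    match = _DELTA_ONLY_OVERFLOW.search(error_message)
    return int(match.group(1)) if match else None


# The parenthesized number is how far the request overshot the window,
# not the size of the window itself.
print(overflow_delta("context window exceeds limit (2013)"))
print(overflow_delta("rate limit exceeded"))
```

The point of the sketch is only that the number carries no information about the window size, so it should not be used as a new context length.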
Problem
Hermes handles context overflow in `run_agent.py` by:
- falling back to `get_next_probe_tier(old_ctx)`
That fallback is correct for genuinely unknown providers or endpoints, but it is wrong for MiniMax's delta-only overflow message.
In practice, this caused the context length to drop from `204800` to `128000`.
Because Hermes's in-loop `ContextCompressor` defaults to a 50% threshold, this effectively moved the compression trigger from `102400` tokens to `64000` tokens for the affected path.
Root Cause
The root cause was not the generic parser.
`parse_context_limit_from_error()` correctly returns `None` for:
`context window exceeds limit (2013)`
because the message does not contain the actual context window.
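The parser behavior described above can be sketched as follows. This is not the body of `parse_context_limit_from_error()`; the function name `parse_limit_sketch` and both patterns are illustrative assumptions, chosen only to show a parser that extracts a limit when the message states one and returns `None` otherwise.

```python
import re
from typing import Optional

# Illustrative patterns only: phrasings that state an actual window size.
# The real parser's patterns are not shown in this PR description.
_LIMIT_PATTERNS = [
    re.compile(r"maximum context length is\s+(\d+)", re.IGNORECASE),
    re.compile(r"context (?:window|length) of\s+(\d+)", re.IGNORECASE),
]


def parse_limit_sketch(error_message: str) -> Optional[int]:
    """Return a context limit only when the message states one explicitly."""
    for pattern in _LIMIT_PATTERNS:
        match = pattern.search(error_message)
        if match:
            return int(match.group(1))
    # A delta-only message such as "context window exceeds limit (2013)"
    # matches none of the patterns, so no limit is reported.
    return None
```

Under these assumptions, a message that names a small but real window (for example 4096) still parses, while the delta-only message yields `None`, which is why the fix belongs in the recovery logic and not in the parser.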
The real bug was in the recovery logic:
`AIAgent` treated "no parsed limit" as a reason to probe down even when the provider was MiniMax and the model context was already known.
What Changed
1. MiniMax-specific recovery guard in `run_agent.py`
For MiniMax and MiniMax China provider resolution, and for MiniMax Anthropic-compatible base URLs, Hermes now detects the delta-only overflow format and:
- keeps `context_length` unchanged
- does not fall back to `get_next_probe_tier()`
2. Parser behavior remains generic
I did not change the generic parser semantics for small valid context windows.
This is important because Hermes already supports valid provider error formats that may report context limits such as:
- `32768`
- `4096`
Those should continue to work as before.
3. Regression coverage added
Added tests to cover:
- the delta-only MiniMax message returning `None` from the parser
- the recovery path keeping `204800` and compressing instead of probing down
Why This Approach
This repo's architecture separates:
Given that provider runtime selection is shared across CLI, gateway, cron, ACP, and auxiliary tasks, scoping this behavior to MiniMax is safer than weakening the generic overflow logic globally.
This keeps the PR focused and avoids regressions for other providers that may still need probe-down behavior when the real limit is unknown.
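The scoping argument above can be sketched as a small decision function. Everything here is a hypothetical stand-in, not the Hermes code: the provider ids, the host check, the tier ladder in `PROBE_TIERS`, and the function names are assumptions. Only the 204800 to 128000 step and the 50% threshold come from this PR description.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical provider ids and tier ladder, for illustration only.
MINIMAX_PROVIDERS = {"minimax", "minimax-cn"}
PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]


def is_minimax(provider: str, base_url: str) -> bool:
    """Assumed scoping check: provider id or a MiniMax-looking host."""
    host = urlparse(base_url).hostname or ""
    return provider in MINIMAX_PROVIDERS or "minimax" in host


def next_probe_tier(current: int) -> int:
    """Largest tier strictly below current; unchanged if none is lower."""
    for tier in PROBE_TIERS:
        if tier < current:
            return tier
    return current


def context_after_overflow(
    provider: str,
    base_url: str,
    known_ctx: int,
    parsed_limit: Optional[int],
    delta_only: bool,
) -> int:
    if parsed_limit is not None:
        return parsed_limit  # the error stated a real limit
    if delta_only and is_minimax(provider, base_url):
        return known_ctx  # keep the known window; only compress
    return next_probe_tier(known_ctx)  # limit unknown: probe down


def compression_trigger(context_length: int, threshold: float = 0.5) -> int:
    """Token count at which compression starts, at the given threshold."""
    return int(context_length * threshold)
```

With these assumptions, a MiniMax delta-only error keeps 204800 and a trigger of 102400 tokens, while any other provider with an unparsed limit still steps down to 128000 and a trigger of 64000 tokens.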
Files Changed
- `run_agent.py`
- `tests/agent/test_model_metadata.py`
- `tests/run_agent/test_run_agent.py`
Testing
Targeted tests run locally on Windows with Python 3.12:
Results:
Scope / Non-Goals
This PR does not:
Rationale
This follows Hermes's existing pattern of making provider- and error-class-specific recovery decisions inside `AIAgent` rather than broadening a generic parser in ways that could regress other providers.