Summary
The current MiniMax delta-only overflow bug exposed a broader design issue in Hermes's context-overflow recovery path.
Today, Hermes often infers overflow semantics from free-form provider error text, then mutates context_length directly as part of recovery. That works for many providers, but it conflates:
- the model's declared/base context window
- the session's temporary effective context window during recovery
- provider-specific overflow semantics (input overflow vs output-cap overflow vs subscription-tier limits vs delta-only overflow messages)
This issue tracks the architectural cleanup needed beyond the tactical MiniMax hotfix in #9170.
What Happened
MiniMax returned:
context window exceeds limit (2013)
In that format, 2013 is the overflow delta, not the actual context limit.
Hermes correctly failed to parse a real context window from the message, but the generic fallback path treated parsed_limit is None as a reason to probe down to the next generic tier.
That caused an incorrect downgrade from 204800 to 128000, which then affected compression thresholds and subsequent session behavior.
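The failure can be shown in a few lines. Below is a minimal Python sketch that contrasts the generic fallback with a delta-aware check. Everything in it is hypothetical and is not Hermes's actual code: the tier ladder (only 204800 and 128000 come from the incident), the regex, and the function names buggy_recover, delta_aware_recover and next_lower_tier.

```python
import re
from typing import Optional

# Hypothetical tier ladder for a generic "probe down" fallback.
# Only 204_800 and 128_000 come from the incident; the rest are illustrative.
GENERIC_TIERS = [204_800, 128_000, 64_000, 32_000]

# Hypothetical pattern for the MiniMax delta-only message, where the
# parenthesized number is the overflow amount, not the context window.
MINIMAX_DELTA_RE = re.compile(r"context window exceeds limit \((\d+)\)")


def next_lower_tier(current: int) -> int:
    """Return the largest generic tier strictly below `current`."""
    for tier in GENERIC_TIERS:  # tiers are sorted from largest to smallest
        if tier < current:
            return tier
    return current  # already at or below the smallest tier


def buggy_recover(message: str, context_length: int, parsed_limit: Optional[int]) -> int:
    # Generic fallback: "no real limit parsed" is treated as "probe down",
    # regardless of what the message actually says.
    if parsed_limit is None:
        return next_lower_tier(context_length)
    return parsed_limit


def delta_aware_recover(message: str, context_length: int, parsed_limit: Optional[int]) -> int:
    # A delta-only message says nothing about the real window, so keep the
    # known context and let compression shed the overflow instead.
    if MINIMAX_DELTA_RE.search(message):
        return context_length
    if parsed_limit is None:
        return next_lower_tier(context_length)
    return parsed_limit
```

With the MiniMax message, buggy_recover(msg, 204_800, None) steps down to 128000. delta_aware_recover keeps 204800 and leaves recovery to compression.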
Root Problem
The deeper issue is that Hermes currently relies on text-pattern heuristics plus a mutable context_length field to represent multiple distinct concepts:
- Base model capability: what the model/provider can actually support.
- Effective runtime budget: a temporary reduction applied for this session or this recovery path.
- Provider-specific semantics: whether an error means one of:
  - input too large
  - output cap too large
  - subscription tier gate
  - provider returned only an overflow delta
  - provider returned the real limit
Those concepts should not all be encoded into a single mutable number with recovery logic spread across run_agent.py.
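One way to stop encoding these concepts in a single number is to give the provider's meaning its own type. The following is a minimal sketch under that assumption. OverflowKind, OverflowSignal and confirms_base_limit are hypothetical names, and the enum members mirror the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OverflowKind(Enum):
    """Hypothetical classification of what a provider error actually means."""
    INPUT_TOO_LARGE = "input_too_large"
    OUTPUT_CAP_TOO_LARGE = "output_cap_too_large"
    SUBSCRIPTION_TIER_GATE = "subscription_tier_gate"
    DELTA_ONLY = "delta_only"  # number in the message is the overflow amount
    REAL_LIMIT = "real_limit"  # number in the message is the context window


@dataclass(frozen=True)
class OverflowSignal:
    kind: OverflowKind
    reported_limit: Optional[int] = None  # only meaningful for REAL_LIMIT
    overflow_delta: Optional[int] = None  # only meaningful for DELTA_ONLY

    def confirms_base_limit(self) -> bool:
        # Only a provider-reported real limit should ever change the model's
        # base context window; every other kind stays session-local.
        return self.kind is OverflowKind.REAL_LIMIT and self.reported_limit is not None
```

In this design, the MiniMax message would become OverflowSignal(OverflowKind.DELTA_ONLY, overflow_delta=2013). That signal cannot be mistaken for a limit because confirms_base_limit() returns False for it.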
Why This Matters
When context_length is mutated directly, Hermes also mutates:
- compression threshold
- tail budget
- summary budget
- downstream recovery heuristics
- any status/usage reporting derived from the compressor
That means one misclassified overflow can distort the behavior of the rest of the session.
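The cascade above can be illustrated with a toy budget calculation. The percentages and the names derive_budgets and DerivedBudgets are assumptions made only for illustration; Hermes's compressor may derive these values differently. The point is only that every budget scales with the same context_length.

```python
from dataclasses import dataclass

# Hypothetical percentages; the real compressor may use other values or formulas.
COMPRESSION_THRESHOLD_PCT = 50
TAIL_BUDGET_PCT = 20
SUMMARY_BUDGET_PCT = 5


@dataclass(frozen=True)
class DerivedBudgets:
    compression_threshold: int
    tail_budget: int
    summary_budget: int


def derive_budgets(context_length: int) -> DerivedBudgets:
    """Every budget scales with context_length, so one wrong value skews all of them."""
    return DerivedBudgets(
        compression_threshold=context_length * COMPRESSION_THRESHOLD_PCT // 100,
        tail_budget=context_length * TAIL_BUDGET_PCT // 100,
        summary_budget=context_length * SUMMARY_BUDGET_PCT // 100,
    )
```

Under these assumed ratios, the 204800 to 128000 downgrade lowers the compression threshold from 102400 to 64000 tokens. It also shrinks the tail and summary budgets in the same proportion.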
Current Tactical Mitigations
The codebase already contains multiple tactical recoveries for related but different overflow classes, including:
- output-cap overflow (max_tokens too large) handled without shrinking context
- Anthropic long-context tier gating handled as a temporary runtime reduction
- MiniMax delta-only overflow handled as compress-only while preserving known context
These are correct tactical fixes, but they also show that the current recovery architecture is carrying multiple provider-specific semantics inline.
Proposed Roadmap
Phase 1: State hygiene for context windows
Goal: separate base context from temporary reductions.
Deliverables:
- introduce explicit base/effective context semantics in the context engine/compressor
- ensure /new and /reset restore the base context window
- temporary recovery reductions must not overwrite the base context unless the provider confirms a real limit
Status: maelrx:fix/context-state-hygiene
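The Phase 1 deliverables amount to keeping two numbers instead of one. Below is a minimal sketch, assuming a hypothetical ContextWindow object owned by the context engine. The method names reduce_temporarily, confirm_provider_limit and reset are illustrative and are not an existing Hermes API.

```python
from dataclasses import dataclass, field


@dataclass
class ContextWindow:
    """Hypothetical split of base vs effective context (names are illustrative)."""
    base: int
    effective: int = field(init=False)

    def __post_init__(self) -> None:
        # A fresh session starts with the full base window.
        self.effective = self.base

    def reduce_temporarily(self, new_effective: int) -> None:
        # Session-local reduction (e.g. a tier gate); never touches base.
        self.effective = min(self.effective, new_effective)

    def confirm_provider_limit(self, limit: int) -> None:
        # A limit the provider actually reported may update base safely.
        self.base = limit
        self.effective = min(self.effective, limit)

    def reset(self) -> None:
        # What /new and /reset would call: drop temporary reductions.
        self.effective = self.base
```

A temporary reduction to 128000 then survives only until reset(). Only confirm_provider_limit can move the base value.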
Phase 2: Structured overflow recovery policy
Goal: move overflow semantics out of the main agent loop.
Deliverables:
- extract a dedicated recovery/policy module for context overflow decisions
- normalize recovery actions into structured outcomes, e.g.:
  - compress_only
  - shrink_context
  - clamp_output_only
  - tier_downgrade
  - unknown_overflow
- keep run_agent.py as the orchestrator, not the semantic parser
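Structured outcomes could look like the sketch below. The action names come from the list above. RecoveryDecision, decide and the category strings it accepts are hypothetical. The sketch assumes a separate classifier has already labeled the error, so the main loop only executes the returned decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RecoveryAction(Enum):
    COMPRESS_ONLY = "compress_only"
    SHRINK_CONTEXT = "shrink_context"
    CLAMP_OUTPUT_ONLY = "clamp_output_only"
    TIER_DOWNGRADE = "tier_downgrade"
    UNKNOWN_OVERFLOW = "unknown_overflow"


@dataclass(frozen=True)
class RecoveryDecision:
    action: RecoveryAction
    new_context_limit: Optional[int] = None  # set only for SHRINK_CONTEXT / TIER_DOWNGRADE
    new_max_tokens: Optional[int] = None     # set only for CLAMP_OUTPUT_ONLY


def decide(category: str, number: Optional[int]) -> RecoveryDecision:
    """Hypothetical policy: map an already-classified overflow to one action."""
    if category == "delta_only":
        # The number is an overflow amount, so compress but keep the window.
        return RecoveryDecision(RecoveryAction.COMPRESS_ONLY)
    if category == "output_cap" and number is not None:
        return RecoveryDecision(RecoveryAction.CLAMP_OUTPUT_ONLY, new_max_tokens=number)
    if category == "tier_gate" and number is not None:
        return RecoveryDecision(RecoveryAction.TIER_DOWNGRADE, new_context_limit=number)
    if category == "real_limit" and number is not None:
        return RecoveryDecision(RecoveryAction.SHRINK_CONTEXT, new_context_limit=number)
    # Anything unrecognized is surfaced explicitly instead of guessed at.
    return RecoveryDecision(RecoveryAction.UNKNOWN_OVERFLOW)
```

Returning UNKNOWN_OVERFLOW explicitly, rather than probing down a tier, keeps a misclassified message from silently shrinking the session's context.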
Phase 3: Provider-specific overflow semantics registry
Goal: formalize known provider quirks in one place.
Deliverables:
- register known provider-specific overflow behaviors:
  - MiniMax delta-only overflow
  - Anthropic long-context tier gating
  - output-cap overflow semantics
  - local/custom endpoint unknown-limit behavior
- avoid scattering provider-specific string handling across the main loop
Design Constraints
Any cleanup should preserve Hermes's existing architecture and contribution priorities:
- keep bugfix PRs focused and reviewable
- preserve plugin compatibility for custom context engines
- avoid weakening generic parsers just to patch a provider-specific behavior
- keep provider hardening narrowly scoped unless the semantics are truly generic
Suggested Acceptance Criteria
- temporary context reductions do not survive /new or /reset
- provider-confirmed real limits can still update the base context safely
- overflow classes are represented as structured recovery actions rather than inline string heuristics in the main loop
- adding a new provider-specific overflow quirk no longer requires touching the core loop directly
Related