Architecture: separate base vs effective context in overflow recovery

## Summary

The current MiniMax delta-only overflow bug exposed a broader design issue in Hermes's context-overflow recovery path.

Today, Hermes often infers overflow semantics from free-form provider error text, then mutates `context_length` directly as part of recovery. That works for many providers, but it conflates:

- the model's declared/base context window
- the session's temporary effective context window during recovery
- provider-specific overflow semantics (input overflow vs output-cap overflow vs subscription-tier limits vs delta-only overflow messages)

This issue tracks the architectural cleanup needed beyond the tactical MiniMax hotfix in #9170.

## What Happened

MiniMax returned:

`context window exceeds limit (2013)`

In that format, `2013` is the overflow delta, not the actual context limit.

Hermes correctly found no real context window in the message, so `parsed_limit` stayed `None`. The generic fallback path then treated `parsed_limit is None` as a reason to probe down to the next generic tier.

That caused an incorrect downgrade from `204800` to `128000`, which then affected compression thresholds and subsequent session behavior.
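
The failure mode above can be illustrated with a minimal sketch. This is not the actual Hermes code: `PROBE_TIERS`, `DELTA_ONLY`, `next_probe_tier`, `naive_recover`, and `delta_aware_recover` are hypothetical names, and the tier list is an assumption chosen only so that the `204800` to `128000` downgrade reproduces.

```python
import re
from typing import Optional

# Hypothetical generic probe tiers; the real tier list in Hermes may differ.
PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]

# Hypothetical pattern for the MiniMax delta-only message quoted above.
DELTA_ONLY = re.compile(r"context window exceeds limit \((\d+)\)")


def next_probe_tier(current: int) -> int:
    """Return the largest generic tier strictly below the current window."""
    for tier in PROBE_TIERS:
        if tier < current:
            return tier
    return current


def naive_recover(message: str, context_length: int, parsed_limit: Optional[int]) -> int:
    """Buggy shape: a missing limit is treated as a reason to probe down."""
    if parsed_limit is None:
        return next_probe_tier(context_length)
    return parsed_limit


def delta_aware_recover(message: str, context_length: int, parsed_limit: Optional[int]) -> int:
    """Tactical shape: a delta-only message keeps the known window (compress only)."""
    if parsed_limit is not None:
        return parsed_limit
    if DELTA_ONLY.search(message):
        return context_length  # the number in the message is a delta, not a limit
    return next_probe_tier(context_length)


msg = "context window exceeds limit (2013)"
downgraded = naive_recover(msg, 204_800, None)
preserved = delta_aware_recover(msg, 204_800, None)
```

Under these assumptions, `downgraded` is `128000` while `preserved` stays at `204800`.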

## Root Problem

The deeper issue is that Hermes currently relies on text-pattern heuristics plus a mutable `context_length` field to represent multiple distinct concepts:

1. **Base model capability**
   - What the model/provider can actually support.
2. **Effective runtime budget**
   - A temporary reduction applied for this session or this recovery path.
3. **Provider-specific semantics**
   - Whether an error means:
     - input too large
     - output cap too large
     - subscription tier gate
     - provider returned only an overflow delta
     - provider returned the real limit

Those concepts should not all be encoded into a single mutable number with recovery logic spread across `run_agent.py`.

## Why This Matters

When `context_length` is mutated directly, every value derived from it shifts as well:

- compression threshold
- tail budget
- summary budget
- downstream recovery heuristics
- any status/usage reporting derived from the compressor

That means one misclassified overflow can distort the behavior of the rest of the session.
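
To make the knock-on effect concrete, here is a small sketch of budgets derived from a single context number. The `Budgets` type, `derive_budgets`, and all three ratios are hypothetical placeholders, not Hermes's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budgets:
    compression_threshold: int
    tail_budget: int
    summary_budget: int


def derive_budgets(context_length: int) -> Budgets:
    """Derive downstream budgets from one context number.

    The 80% / 20% / 5% ratios are illustrative assumptions only.
    """
    return Budgets(
        compression_threshold=context_length * 80 // 100,
        tail_budget=context_length * 20 // 100,
        summary_budget=context_length * 5 // 100,
    )


before = derive_budgets(204_800)
after = derive_budgets(128_000)  # after the incorrect downgrade
```

With these placeholder ratios, the compression threshold alone would drop from `163840` to `102400` tokens, so a single misclassified overflow would make compression trigger much earlier for the rest of the session.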

## Current Tactical Mitigations

The codebase already contains multiple tactical recoveries for related but different overflow classes, including:

- output-cap overflow (`max_tokens` too large) handled without shrinking context
- Anthropic long-context tier gating handled as a temporary runtime reduction
- MiniMax delta-only overflow handled as compress-only while preserving known context

These are correct tactical fixes, but they also show that the current recovery architecture is carrying multiple provider-specific semantics inline.

## Proposed Roadmap

### Phase 1: State hygiene for context windows

Goal: separate base context from temporary reductions.

Deliverables:

- introduce explicit base/effective context semantics in the context engine/compressor
- ensure `/new` and `/reset` restore the base context window
- temporary recovery reductions must not overwrite the base context unless the provider confirms a real limit

Status:
- a local validated branch exists for this phase:
  - `maelrx:fix/context-state-hygiene`
- this branch is currently stacked on top of #9170 and should be split/rebased appropriately after the hotfix lands
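
One possible shape for the base/effective split is sketched below. It is an assumption for discussion, not the contents of the branch mentioned above; `ContextWindow` and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextWindow:
    """Keeps the declared window separate from temporary recovery reductions."""

    base: int
    temporary: Optional[int] = field(default=None, init=False)

    @property
    def effective(self) -> int:
        """Window to budget against right now; never larger than the base."""
        if self.temporary is None:
            return self.base
        return min(self.base, self.temporary)

    def reduce_temporarily(self, limit: int) -> None:
        """Apply a session-scoped reduction without touching the base."""
        self.temporary = limit

    def confirm_limit(self, limit: int) -> None:
        """Record a provider-confirmed real limit as the new base."""
        self.base = limit
        self.temporary = None

    def reset(self) -> None:
        """What `/new` and `/reset` would call: drop temporary reductions."""
        self.temporary = None


window = ContextWindow(base=204_800)
window.reduce_temporarily(128_000)
effective_during_recovery = window.effective
window.reset()
effective_after_reset = window.effective
```

In this sketch only `confirm_limit` can change `base`, which mirrors the deliverable that temporary reductions must not overwrite the base context unless the provider confirms a real limit.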

### Phase 2: Structured overflow recovery policy

Goal: move overflow semantics out of the main agent loop.

Deliverables:

- extract a dedicated recovery/policy module for context overflow decisions
- normalize recovery actions into structured outcomes, e.g.:
  - `compress_only`
  - `shrink_context`
  - `clamp_output_only`
  - `tier_downgrade`
  - `unknown_overflow`
- keep `run_agent.py` as the orchestrator, not the semantic parser
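
As a rough sketch of what "structured outcomes" could look like, the action names listed above map naturally onto an enum plus a small decision record. `RecoveryAction`, `RecoveryDecision`, and `changes_base_context` are hypothetical names, not an existing Hermes API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RecoveryAction(Enum):
    COMPRESS_ONLY = "compress_only"
    SHRINK_CONTEXT = "shrink_context"
    CLAMP_OUTPUT_ONLY = "clamp_output_only"
    TIER_DOWNGRADE = "tier_downgrade"
    UNKNOWN_OVERFLOW = "unknown_overflow"


@dataclass(frozen=True)
class RecoveryDecision:
    """Result handed from the policy module back to the orchestrator."""

    action: RecoveryAction
    confirmed_limit: Optional[int] = None  # set only when the provider reported a real limit
    reason: str = ""


def changes_base_context(decision: RecoveryDecision) -> bool:
    """Only a shrink backed by a provider-confirmed limit may touch the base window."""
    return (
        decision.action is RecoveryAction.SHRINK_CONTEXT
        and decision.confirmed_limit is not None
    )


minimax = RecoveryDecision(RecoveryAction.COMPRESS_ONLY, reason="delta-only overflow")
confirmed = RecoveryDecision(RecoveryAction.SHRINK_CONTEXT, confirmed_limit=128_000)
```

With a record like this, the main loop would branch on `decision.action` rather than re-parsing provider error text.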

### Phase 3: Provider-specific overflow semantics registry

Goal: formalize known provider quirks in one place.

Deliverables:

- register known provider-specific overflow behaviors:
  - MiniMax delta-only overflow
  - Anthropic long-context tier gating
  - output-cap overflow semantics
  - local/custom endpoint unknown-limit behavior
- avoid scattering provider-specific string handling across the main loop

## Design Constraints

Any cleanup should preserve Hermes's existing architecture and contribution priorities:

- keep bugfix PRs focused and reviewable
- preserve plugin compatibility for custom context engines
- avoid weakening generic parsers just to patch a provider-specific behavior
- keep provider hardening narrowly scoped unless the semantics are truly generic

## Suggested Acceptance Criteria

- temporary context reductions do not survive `/new` or `/reset`
- provider-confirmed real limits can still update the base context safely
- overflow classes are represented as structured recovery actions rather than inline string heuristics in the main loop
- adding a new provider-specific overflow quirk no longer requires touching the core loop directly

## Related

- Tactical MiniMax hotfix PR: #9170

