fix(agent): stop probe stepdown on context overflow without provider limit (salvage #33673) #33826
Re: context overflow without provider limit

This salvages #33673's approach, which is appreciated. The probe stepdown logic needs to know when to stop, not just how. Without a provider-reported limit, you're flying blind.

SwarmAI hits similar boundary cases with scheduled agents that load variable context (GitHub history + code scans). Our guard:

    # Rough estimate before expensive API call
    if estimated_tokens > KNOWN_LIMIT * 0.9:
        trigger_compression()

But "KNOWN_LIMIT" is hardcoded per-provider. If Hermes can gracefully degrade (salvage what fits, defer the rest), that's a better pattern than hard failure.

Key question: Does this PR preserve ordering when truncating? If probes are pruned LIFO, you lose the earliest (often most important) context. FIFO or priority-weighted would be safer for long-running agents.
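For readers who want the guard from the comment above in runnable form, here is a minimal sketch. The provider names, the limits in `KNOWN_LIMITS`, the 4-characters-per-token heuristic, and the helper names `estimate_tokens` and `should_compress` are all assumptions made for illustration; none of them come from SwarmAI or Hermes.

```python
from typing import Optional

# Hypothetical per-provider limits (assumption); real values would come
# from configuration or provider metadata.
KNOWN_LIMITS = {"provider-a": 200_000, "provider-b": 1_000_000}
SAFETY_MARGIN = 0.9  # compress once the estimate passes 90% of the limit


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return len(text) // 4


def should_compress(text: str, provider: str) -> bool:
    """Return True when the estimate exceeds the margin of a known limit."""
    limit: Optional[int] = KNOWN_LIMITS.get(provider)
    if limit is None:
        # No known limit: do not guess one here; the caller decides.
        return False
    return estimate_tokens(text) > limit * SAFETY_MARGIN
```

The sketch returns False for an unknown provider instead of inventing a limit, which mirrors the concern raised in this thread about guessed limits.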
…vior

The old test asserted that a non-MiniMax provider returning a generic overflow (no provider-reported max) would step down to the 128K probe tier. The salvaged fix from #33673 deliberately removes that step-down because guessed tiers cause configured 1M sessions to silently shrink. Update the test to assert the new contract: keep the configured 200K window and rely on compression instead.
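To make the shrinkage concrete, the following sketch imitates a guessed-tier step-down. The tier values are an assumption chosen to match the 256K, 128K, and 64K figures mentioned in this PR, and `next_probe_tier` and `shrink_history` are hypothetical names, not the project's `get_next_probe_tier()`.

```python
from typing import List

# Hypothetical probe tiers (assumption), largest first.
PROBE_TIERS = [256_000, 128_000, 64_000]


def next_probe_tier(current_ctx: int) -> int:
    """Legacy-style fallback: pick the largest tier strictly below current_ctx."""
    for tier in PROBE_TIERS:
        if tier < current_ctx:
            return tier
    return current_ctx  # already at or below the smallest tier


def shrink_history(start_ctx: int, overflows: int) -> List[int]:
    """Context length after each generic overflow under the step-down policy."""
    ctx = start_ctx
    history = [ctx]
    for _ in range(overflows):
        ctx = next_probe_tier(ctx)
        history.append(ctx)
    return history
```

Under these assumed tiers, a 200K window drops to 128K on the first generic overflow, and a 1M window reaches 64K after three, with no change to the configuration.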
6e4278c to 4139c41
Summary
A configured context_length no longer silently shrinks when a provider returns a generic context-overflow error with no concrete max. Compression handles the long-conversation case; the configured window is preserved.

Root cause: when parse_context_limit_from_error() couldn't extract a number, the overflow handler fell back to get_next_probe_tier(old_ctx), turning a 1M session into 256K → 128K → 64K on repeated overflows with no config change.

Changes
- agent/model_metadata.py: new get_context_length_from_provider_error() returning a provider-reported lower limit or None.
- agent/conversation_loop.py: drop the get_next_probe_tier() fallback in the overflow recovery path. Keep context_length and compress when no provider limit is reported; still use the parsed limit when present; preserve the MiniMax delta-only branch.
- tests/test_ctx_halving_fix.py: TestContextOverflowLimitSelection covers generic overflow without a limit, explicit provider limit, and reported-limit ≥ current.

Validation
Salvage of #33673 by @yangguangjin onto current main. Authorship preserved via rebase-merge. Fixes #33669, supersedes the narrower #14953 by @atmigtnca (credit retained).
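
The limit-selection contract described in this PR can be sketched as below. This is an illustration under stated assumptions, not the code in agent/conversation_loop.py: `lower_limit_or_none` and `context_after_overflow` are hypothetical names, the already-parsed `reported_max` stands in for whatever the real error parser extracts, and the MiniMax delta-only branch is not modeled.

```python
from typing import Optional


def lower_limit_or_none(reported_max: Optional[int], current_ctx: int) -> Optional[int]:
    """Return the provider-reported max only when it is a real reduction."""
    if reported_max is not None and 0 < reported_max < current_ctx:
        return reported_max
    return None


def context_after_overflow(reported_max: Optional[int], current_ctx: int) -> int:
    """Context length to use after an overflow; never guesses a smaller tier."""
    limit = lower_limit_or_none(reported_max, current_ctx)
    # With no usable provider limit, keep the configured window; the caller
    # is expected to compress the conversation instead of shrinking it.
    return limit if limit is not None else current_ctx
```

The three cases named for TestContextOverflowLimitSelection map onto this sketch as: no reported limit keeps the current window, a lower reported limit is adopted, and a reported limit at or above the current window is ignored.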