[codex] Prevent long-context probe collapse#14499
Conversation
Providers can report an oversized context request without including a parseable limit. Hermes then fell back to the first probe tier, which was also the unknown-model default, causing long-window models such as GPT-5.4 to drop directly from 1.05M to 128K. This separates the conservative unknown-model fallback from the probe-down ladder: unknown models still start at 128K, while known long-context sessions step down through 1.05M, 400K, 272K, and 200K before reaching 128K. Constraint: Unknown model detection should remain conservative at 128K Rejected: Raising DEFAULT_FALLBACK_CONTEXT above 128K | would make truly unknown endpoints over-aggressive Confidence: high Scope-risk: narrow Directive: Keep DEFAULT_FALLBACK_CONTEXT independent from CONTEXT_PROBE_TIERS when adding future probe tiers Tested: ./venv/bin/python -m pytest tests/agent/test_model_metadata.py tests/agent/test_model_metadata_local_ctx.py tests/test_ctx_halving_fix.py -q Not-tested: live provider overflow against OpenAI/GPT-5.4
|
Follow-up after I checked the latest main locally at CONTEXT_PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]So when the provider reports a context overflow without a parseable limit, Hermes still computes the next tier from I also checked nearby recent issues/PRs:
I marked this PR ready for review now that the issue has reproduced again on the latest update path. |
|
Opened a narrower follow-up PR here: #14858. This one keeps the #14499 evidence, but avoids changing the global probe tiers. It follows the shape of the already-merged #14743 fix instead: when the provider reports context overflow without a parseable limit, Hermes should not mutate a known 1,050,000-token GPT-5.4 window down to an untrusted guessed tier that is already below the prompt being recovered. For my local Codex/Hermes usage this still reproduces after |
|
Adding another repro/data point for this PR, from a real Discord gateway session. This looks like the same long-context probe collapse, but with Environment
EvidenceGateway logs showed: Then it repeated in the same failure pattern: The important part is the same direct collapse: Even though the live session estimate was around QuestionDoes this PR also cover the If yes, this repro supports the same fix. If not, I can open a separate issue focused on the gateway-level user-facing failure mode: repeated context-compression retry after long-context probe collapse. |

Summary
Fixes long-context probe-down behavior so known large-window models do not collapse directly to the conservative unknown-model fallback.
DEFAULT_FALLBACK_CONTEXTfixed at 128K for truly unknown models/endpoints.Root cause
CONTEXT_PROBE_TIERS[0]served two different meanings: the default when context detection fails, and the next probe size when a provider reports that a known request is too large without a parseable limit.That made Hermes fall straight from a known long-context model window to 128K whenever the provider error did not include a usable context limit.
Evidence
Observed failure from the report screenshot:
The problematic jump is the direct
1,050,000 -> 128,000step even though the request was only about 285K tokens.Validation
Also checked before commit:
Notes
I did not run a live OpenAI/GPT-5.4 overflow call; this PR covers the fallback/probe selection logic with focused regression tests.