[codex] Prevent long-context probe collapse by ztcshen · Pull Request #14499 · NousResearch/hermes-agent

ztcshen · 2026-04-23T09:50:39Z

Summary

Fixes long-context probe-down behavior so known large-window models do not collapse directly to the conservative unknown-model fallback.

Adds intermediate probe tiers above 128K: 1,050,000, 400,000, 272,000, and 200,000 tokens.
Keeps DEFAULT_FALLBACK_CONTEXT fixed at 128K for truly unknown models/endpoints.
Updates model metadata tests to distinguish unknown-model fallback from provider error probe-down tiers.

Root cause

CONTEXT_PROBE_TIERS[0] served two different meanings: the default when context detection fails, and the next probe size when a provider reports that a known request is too large without a parseable limit.

That made Hermes fall straight from a known long-context model window to 128K whenever the provider error did not include a usable context limit.
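
A minimal sketch of the separation described above, assuming the probe ladder is a descending list and the fallback is a standalone constant. The helper name `next_probe_tier` is hypothetical, and the tiers below 128K are assumed to be unchanged from the existing ladder; the actual diff may differ.

```python
from typing import Optional

# Conservative default for truly unknown models/endpoints (kept at 128K).
DEFAULT_FALLBACK_CONTEXT = 128_000

# Probe-down ladder, no longer tied to the fallback constant.
# The four tiers above 128K are the ones listed in the summary;
# the tiers below 128K are assumed to stay as they were.
CONTEXT_PROBE_TIERS = [
    1_050_000, 400_000, 272_000, 200_000,
    128_000, 64_000, 32_000, 16_000, 8_000,
]


def next_probe_tier(current: int) -> Optional[int]:
    """Hypothetical helper: largest tier strictly below `current`, else None."""
    for tier in CONTEXT_PROBE_TIERS:
        if tier < current:
            return tier
    return None
```

With this shape, a session at 1,050,000 tokens would step to 400,000 rather than 128,000, which still leaves room for a prompt of about 285K tokens.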

Evidence

Observed failure from the report screenshot:

Provider: openai-codex Model: gpt-5.4
Endpoint: https://chatgpt.com/backend-api/codex
Error: Your input exceeds the context window of this model.
Context: 492 msgs, ~285,378 tokens
Context length exceeded - stepping down: 1,050,000 -> 128,000 tokens
Context too large (~285,378 tokens) - compressing (1/3)...
gpt-5.4 270K/128K ... 100%

The problematic jump is the direct 1,050,000 -> 128,000 step even though the request was only about 285K tokens.

Validation

./venv/bin/python -m pytest tests/agent/test_model_metadata.py tests/agent/test_model_metadata_local_ctx.py tests/test_ctx_halving_fix.py -q
128 passed in 3.79s

Also checked before commit:

git diff --check
# no output

Notes

I did not run a live OpenAI/GPT-5.4 overflow call; this PR covers the fallback/probe selection logic with focused regression tests.

Providers can report an oversized context request without including a parseable limit. Hermes then fell back to the first probe tier, which was also the unknown-model default, causing long-window models such as GPT-5.4 to drop directly from 1.05M to 128K.

This separates the conservative unknown-model fallback from the probe-down ladder: unknown models still start at 128K, while known long-context sessions step down through 1.05M, 400K, 272K, and 200K before reaching 128K.

Constraint: Unknown model detection should remain conservative at 128K
Rejected: Raising DEFAULT_FALLBACK_CONTEXT above 128K | would make truly unknown endpoints over-aggressive
Confidence: high
Scope-risk: narrow
Directive: Keep DEFAULT_FALLBACK_CONTEXT independent from CONTEXT_PROBE_TIERS when adding future probe tiers
Tested: ./venv/bin/python -m pytest tests/agent/test_model_metadata.py tests/agent/test_model_metadata_local_ctx.py tests/test_ctx_halving_fix.py -q
Not-tested: live provider overflow against OpenAI/GPT-5.4

ztcshen · 2026-04-23T10:02:05Z

Added the original screenshot evidence for the context-window collapse:

Key line shown in the screenshot: Hermes steps gpt-5.4 down from 1,050,000 to 128,000 while the conversation is only about 285,378 tokens.

ztcshen · 2026-04-24T02:08:32Z

Follow-up after hermes update on 2026-04-24: this still reproduces because current origin/main does not include this PR's change.

I checked the latest main locally at 6fdbf2f2 (Merge pull request #14820 from NousResearch/bb/tui-at-fuzzy-match). The relevant code is still the old behavior:

CONTEXT_PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]

So when the provider reports a context overflow without a parseable limit, Hermes still computes the next tier from 1,050,000 as 128,000, matching the screenshot above.
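
The collapse can be reproduced in isolation with the two constants quoted above. This is only an illustration: `old_next_tier` is a hypothetical stand-in for whatever Hermes uses to pick the next tier, assumed here to return the largest tier strictly below the current window.

```python
# Constants as quoted from origin/main at 6fdbf2f2.
CONTEXT_PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]


def old_next_tier(current, tiers=CONTEXT_PROBE_TIERS):
    """Hypothetical stand-in: first tier strictly below `current`, else None."""
    for tier in tiers:
        if tier < current:
            return tier
    return None


known_window = 1_050_000   # gpt-5.4 window from the report
prompt_tokens = 285_378    # estimate shown in the original screenshot

stepped = old_next_tier(known_window)
print(f"{known_window:,} -> {stepped:,}")                     # 1,050,000 -> 128,000
print("tier below prompt size:", stepped < prompt_tokens)    # True
```

Because the highest tier is also the unknown-model default, the first step down from any larger window lands on 128,000, well below the prompt that triggered the overflow.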

I also checked nearby recent issues/PRs:

#5173 (gpt-5.4 shows 32k context in Hermes instead of 1,050,000), #5179 (fix(model_metadata): add gpt-5.x context lengths + guard against poisoned cache), and #5174 (Fix gpt-5.4 context length resolution for Codex): related GPT-5.4 context resolution / poisoned cache path, but focused on resolving 1,050,000 vs small cached values.
#14743 (fix(agent): preserve MiniMax context length on delta-only overflow (salvage #9170)): merged MiniMax delta-only overflow fix; it preserves the old context length for provider messages that report only the overflow amount, but does not change the generic probe ladder.
#14727 (fix(agent): pass config_context_length in fallback activation path) and #12059 (fix(model_metadata): fall through to defaults when endpoint has model but no context_length): related fallback/context-resolution paths, but not the direct 1,050,000 -> 128,000 probe collapse fixed here.

I marked this PR ready for review now that the issue has reproduced again on the latest update path.

ztcshen · 2026-04-24T02:19:03Z

Opened a narrower follow-up PR here: #14858.

This one keeps the #14499 evidence, but avoids changing the global probe tiers. It follows the shape of the already-merged #14743 fix instead: when the provider reports context overflow without a parseable limit, Hermes should not mutate a known 1,050,000-token GPT-5.4 window down to an untrusted guessed tier that is already below the prompt being recovered.
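
The guard described above might look roughly like the following sketch. It is not the code from #14858: the function name, parameters, and the choice to accept a guessed tier that still fits the prompt are all assumptions made for illustration.

```python
from typing import Optional


def context_after_overflow(
    known_context: int,
    guessed_tier: int,
    parsed_limit: Optional[int],
    estimated_prompt_tokens: int,
) -> int:
    """Hypothetical fail-closed guard for an overflow error.

    - If the provider error carried a parseable limit, trust it.
    - Otherwise, refuse to shrink to a guessed tier that is already
      smaller than the prompt being recovered; keep the known window
      so the caller can compress instead.
    """
    if parsed_limit is not None:
        return parsed_limit
    if guessed_tier < estimated_prompt_tokens:
        return known_context  # untrusted guess is below the prompt: do not mutate
    return guessed_tier       # assumption: a guess that still fits is acceptable


# The reported case: no parseable limit, guess of 128K, prompt of ~285K.
print(context_after_overflow(1_050_000, 128_000, None, 285_378))  # 1050000
```

The point of the design is that a guessed tier is only a heuristic; when it contradicts the size of the request already in flight, keeping the known window is the safer default.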

For my local Codex/Hermes usage this still reproduces after hermes update and happens at least five times per day, so #14858 focuses on the smallest fail-closed guard for that path.

LindalyX-Lee · 2026-04-25T06:20:56Z

Adding another repro/data point for this PR, from a real Discord gateway session.

This looks like the same long-context probe collapse, but with openai-codex / gpt-5.5 instead of gpt-5.4.

Environment

Hermes Agent v0.10.0 (2026.4.16)
Provider: openai-codex
Model: gpt-5.5
Gateway: Discord DM
Platform: macOS
Session shape: long-lived multi-day gateway session

Evidence

Gateway logs showed:

Provider: openai-codex  Model: gpt-5.5
Endpoint: https://chatgpt.com/backend-api/codex
Error: Your input exceeds the context window of this model. Please adjust your input and try again.
Context length exceeded — stepping down: 1,050,000 → 128,000 tokens
Context too large (~433,727 tokens) — compressing (1/3)...
Compressed 328 → 12 messages, retrying...
Session compressed 2 times — accuracy may degrade. Consider /new to start fresh.

Then it repeated in the same failure pattern:

Context too large (~435,199 tokens) — compressing (1/3)...
Compressed 332 → 9 messages, retrying...
API call failed (attempt 1/3): APIError
Provider: openai-codex  Model: gpt-5.5
Endpoint: https://chatgpt.com/backend-api/codex
Error: Your input exceeds the context window of this model. Please adjust your input and try again.
Context length exceeded — stepping down: 1,050,000 → 128,000 tokens
Context too large (~436,047 tokens) — compressing (1/3)...
Compressed 334 → 8 messages, retrying...

The important part is the same direct collapse:

1,050,000 → 128,000

The live session estimate was around 433K–436K tokens, so 128K was too small and the gateway kept re-entering the compression/retry loop.

Question

Does this PR also cover the gpt-5.5 / openai-codex / Discord gateway case where the provider error does not expose a parseable limit and Hermes collapses straight from 1,050,000 to 128,000?

If yes, this repro supports the same fix. If not, I can open a separate issue focused on the gateway-level user-facing failure mode: repeated context-compression retry after long-context probe collapse.

alt-glitch added labels on Apr 23, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder)

ztcshen marked this pull request as ready for review April 24, 2026 02:08

ztcshen mentioned this pull request Apr 24, 2026: [codex] Guard untrusted context probe shrink #14858 (Closed)

This was referenced Apr 27, 2026:

Generic 400/disconnect errors misclassified as context_overflow in 1M-context sessions #16351 (Closed)
fix(error_classifier): avoid large-context false overflow heuristics #16352 (Closed)

ztcshen closed this by deleting the head repository May 23, 2026

ztcshen wants to merge 1 commit into NousResearch:main from ztcshen:codex/context-probe-intermediate-tiers
