[codex] Prevent long-context probe collapse #14499

Closed
ztcshen wants to merge 1 commit into NousResearch:main from ztcshen:codex/context-probe-intermediate-tiers

ztcshen commented Apr 23, 2026


Summary

Fixes long-context probe-down behavior so known large-window models do not collapse directly to the conservative unknown-model fallback.

  • Adds intermediate probe tiers above 128K: 1,050,000, 400,000, 272,000, and 200,000 tokens.
  • Keeps DEFAULT_FALLBACK_CONTEXT fixed at 128K for truly unknown models/endpoints.
  • Updates model metadata tests to distinguish unknown-model fallback from provider error probe-down tiers.
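The separation described in the summary can be sketched as follows, assuming the four new tiers are simply prepended to the existing ladder. CONTEXT_PROBE_TIERS and DEFAULT_FALLBACK_CONTEXT are the names used in this PR; resolve_context_length and the known_windows mapping are hypothetical stand-ins, not Hermes's actual metadata lookup.

```python
# Hypothetical sketch: tier values come from this PR's summary and the
# pre-PR ladder quoted later in the thread; the ordering is an assumption.
CONTEXT_PROBE_TIERS = [
    1_050_000, 400_000, 272_000, 200_000,    # new intermediate tiers above 128K
    128_000, 64_000, 32_000, 16_000, 8_000,  # pre-existing ladder
]

# Fixed independently of CONTEXT_PROBE_TIERS[0], so adding larger probe
# tiers never raises the default for truly unknown models.
DEFAULT_FALLBACK_CONTEXT = 128_000


def resolve_context_length(model: str, known_windows: dict[str, int]) -> int:
    """Hypothetical lookup: known models keep their window, unknown ones get the fallback."""
    return known_windows.get(model, DEFAULT_FALLBACK_CONTEXT)


# Illustrative entry only, not Hermes's real metadata table.
known = {"gpt-5.4": 1_050_000}
print(resolve_context_length("gpt-5.4", known))        # 1050000
print(resolve_context_length("mystery-model", known))  # 128000
```

The key design point is that DEFAULT_FALLBACK_CONTEXT is a literal rather than CONTEXT_PROBE_TIERS[0], matching the directive in the commit message.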

Root cause

CONTEXT_PROBE_TIERS[0] served two different purposes: the default context length when detection fails, and the next probe size when a provider reports that a request for a known model is too large but gives no parseable limit.

That made Hermes fall straight from a known long-context model window to 128K whenever the provider error did not include a usable context limit.

Evidence

Observed failure from the report screenshot:

Provider: openai-codex Model: gpt-5.4
Endpoint: https://chatgpt.com/backend-api/codex
Error: Your input exceeds the context window of this model.
Context: 492 msgs, ~285,378 tokens
Context length exceeded - stepping down: 1,050,000 -> 128,000 tokens
Context too large (~285,378 tokens) - compressing (1/3)...
gpt-5.4 270K/128K ... 100%

The problematic jump is the direct 1,050,000 -> 128,000 step even though the request was only about 285K tokens.

Validation

./venv/bin/python -m pytest tests/agent/test_model_metadata.py tests/agent/test_model_metadata_local_ctx.py tests/test_ctx_halving_fix.py -q
128 passed in 3.79s

Also checked before commit:

git diff --check
# no output

Notes

I did not run a live OpenAI/GPT-5.4 overflow call; this PR covers the fallback/probe selection logic with focused regression tests.

Providers can report an oversized context request without including a parseable limit. Hermes then fell back to the first probe tier, which was also the unknown-model default, causing long-window models such as GPT-5.4 to drop directly from 1.05M to 128K.

This separates the conservative unknown-model fallback from the probe-down ladder: unknown models still start at 128K, while known long-context sessions step down through 1.05M, 400K, 272K, and 200K before reaching 128K.
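The probe-down ladder described above can be sketched as a helper that, on each overflow without a parseable limit, picks the largest tier strictly below the current window. next_probe_tier is a hypothetical name, and the tier list assumes the new tiers are prepended to the old ladder; this is an illustration, not the PR's code.

```python
from typing import Optional

# Assumed ladder: new intermediate tiers followed by the pre-existing ones.
CONTEXT_PROBE_TIERS = [1_050_000, 400_000, 272_000, 200_000,
                       128_000, 64_000, 32_000, 16_000, 8_000]


def next_probe_tier(current: int) -> Optional[int]:
    """Hypothetical: return the largest tier strictly below `current`, or None at the floor."""
    for tier in CONTEXT_PROBE_TIERS:  # sorted from largest to smallest
        if tier < current:
            return tier
    return None


# Simulate repeated overflow errors for a known 1.05M-token window,
# stopping once the walk reaches the 128K unknown-model default.
window = 1_050_000
steps = []
while window > 128_000:
    window = next_probe_tier(window)
    steps.append(window)
print(steps)  # [400000, 272000, 200000, 128000]
```

In this sketch the smaller tiers below 128K remain available if overflows continue past that point.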

Constraint: Unknown model detection should remain conservative at 128K

Rejected: Raising DEFAULT_FALLBACK_CONTEXT above 128K | would make truly unknown endpoints over-aggressive

Confidence: high

Scope-risk: narrow

Directive: Keep DEFAULT_FALLBACK_CONTEXT independent from CONTEXT_PROBE_TIERS when adding future probe tiers

Tested: ./venv/bin/python -m pytest tests/agent/test_model_metadata.py tests/agent/test_model_metadata_local_ctx.py tests/test_ctx_halving_fix.py -q

Not-tested: live provider overflow against OpenAI/GPT-5.4
alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels on Apr 23, 2026
ztcshen commented Apr 23, 2026

Author

Added the original screenshot evidence for the context-window collapse:

[Screenshot: gpt-5.4 context probe collapse]

Key line shown in the screenshot: Hermes steps gpt-5.4 down from 1,050,000 to 128,000 while the conversation is only about 285,378 tokens.

ztcshen marked this pull request as ready for review on April 24, 2026 02:08
ztcshen commented Apr 24, 2026

Author

Follow-up after running hermes update on 2026-04-24: the bug still reproduces because current origin/main does not include this PR's change.

I checked the latest main locally at 6fdbf2f2 (Merge pull request #14820 from NousResearch/bb/tui-at-fuzzy-match). The relevant code still has the old behavior:

CONTEXT_PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]

So when the provider reports a context overflow without a parseable limit, Hermes still computes the next tier from 1,050,000 as 128,000, matching the screenshot above.
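A small reproduction of the pre-PR arithmetic, using the two lines quoted above verbatim. next_probe_tier is a hypothetical helper that picks the largest tier strictly below the current window; that selection rule is an assumption about how Hermes chooses the next tier.

```python
from typing import Optional

# Pre-PR values as quoted from origin/main at 6fdbf2f2.
CONTEXT_PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]


def next_probe_tier(current: int) -> Optional[int]:
    """Hypothetical: largest tier strictly below `current`, or None at the floor."""
    return next((t for t in CONTEXT_PROBE_TIERS if t < current), None)


# With no tier between 128K and 1.05M, the first step down lands on the
# unknown-model default, which is below the ~285K-token prompt being retried.
step = next_probe_tier(1_050_000)
print(step, step == DEFAULT_FALLBACK_CONTEXT, step < 285_378)  # 128000 True True
```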

I also checked nearby recent issues/PRs:

I marked this PR ready for review now that the issue has reproduced again on the latest update path.

ztcshen commented Apr 24, 2026

Author

Opened a narrower follow-up PR here: #14858.

#14858 keeps the #14499 evidence but avoids changing the global probe tiers. Instead, it follows the shape of the already-merged #14743 fix: when the provider reports context overflow without a parseable limit, Hermes should not lower a known 1,050,000-token GPT-5.4 window to an untrusted guessed tier that is already below the size of the prompt being recovered.
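One way to express the guard described for #14858, as a sketch only: window_after_unparsed_overflow and its parameters are hypothetical names, not the code in #14743 or #14858. If the guessed tier is already below the prompt being recovered, the known window is left unchanged rather than lowered.

```python
def window_after_unparsed_overflow(known_window: int,
                                   guessed_tier: int,
                                   prompt_tokens: int) -> int:
    """Hypothetical guard for an overflow error that carries no parseable limit.

    Lowering the window to a guess smaller than the prompt would force
    compression based on an untrusted number, so the known window is kept.
    """
    if guessed_tier < prompt_tokens:
        return known_window  # refuse the untrusted guess; keep the known window
    return guessed_tier


# Figures from the gpt-5.4 report above: 1,050,000-token window, ~285,378-token prompt.
print(window_after_unparsed_overflow(1_050_000, 128_000, 285_378))  # 1050000
print(window_after_unparsed_overflow(1_050_000, 400_000, 285_378))  # 400000
```

What Hermes does after declining to lower the window is not specified in this thread, so the sketch stops at the window decision.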

In my local Codex/Hermes usage, this still reproduces after hermes update, at least five times per day, so #14858 focuses on the smallest fail-closed guard for that path.

@LindalyX-Lee


Adding another repro/data point for this PR, from a real Discord gateway session.

This looks like the same long-context probe collapse, but with openai-codex / gpt-5.5 instead of gpt-5.4.

Environment

  • Hermes Agent v0.10.0 (2026.4.16)
  • Provider: openai-codex
  • Model: gpt-5.5
  • Gateway: Discord DM
  • Platform: macOS
  • Session shape: long-lived multi-day gateway session

Evidence

Gateway logs showed:

Provider: openai-codex  Model: gpt-5.5
Endpoint: https://chatgpt.com/backend-api/codex
Error: Your input exceeds the context window of this model. Please adjust your input and try again.
Context length exceeded — stepping down: 1,050,000 → 128,000 tokens
Context too large (~433,727 tokens) — compressing (1/3)...
Compressed 328 → 12 messages, retrying...
Session compressed 2 times — accuracy may degrade. Consider /new to start fresh.

Then the same failure pattern repeated:

Context too large (~435,199 tokens) — compressing (1/3)...
Compressed 332 → 9 messages, retrying...
API call failed (attempt 1/3): APIError
Provider: openai-codex  Model: gpt-5.5
Endpoint: https://chatgpt.com/backend-api/codex
Error: Your input exceeds the context window of this model. Please adjust your input and try again.
Context length exceeded — stepping down: 1,050,000 → 128,000 tokens
Context too large (~436,047 tokens) — compressing (1/3)...
Compressed 334 → 8 messages, retrying...

The important part is the same direct collapse:

1,050,000 → 128,000

The live session estimate was around 433K–436K tokens, so 128K was far too small and the gateway kept cycling through compression and retries.
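For scale, the logged prompt estimates can be compared against the 128K step and against the first intermediate tier listed in this PR's summary (400,000). This is plain arithmetic on the numbers in the logs, not a statement about how the PR handles gpt-5.5.

```python
# Prompt estimates copied from the gateway log lines above.
prompt_estimates = [433_727, 435_199, 436_047]
# 128K is the logged step-down target; 400K is the first new tier in the PR summary.
candidate_windows = {"128K step": 128_000, "first new tier": 400_000}

rows = []
for tokens in prompt_estimates:
    for label, window in candidate_windows.items():
        rows.append((tokens, label, window, tokens / window, tokens <= window))

for tokens, label, window, ratio, fits in rows:
    print(f"{tokens:,} tokens vs {label} ({window:,}): {ratio:.2f}x, fits={fits}")
```

All three estimates are roughly 3.4 times the 128K window and also exceed 400,000.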

Question

Does this PR also cover the gpt-5.5 / openai-codex / Discord gateway case where the provider error does not expose a parseable limit and Hermes collapses straight from 1,050,000 to 128,000?

If yes, this repro supports the same fix. If not, I can open a separate issue focused on the gateway-level user-facing failure mode: repeated context-compression retry after long-context probe collapse.


3 participants