[codex] Guard untrusted context probe shrink by ztcshen · Pull Request #14858 · NousResearch/hermes-agent

ztcshen · 2026-04-24T02:18:46Z

Summary

Keep a known model context length when Hermes has only guessed the next context probe tier and that guessed tier is already below the size of the prompt being recovered.

This is a narrower alternative to #14499. It follows the same recovery shape as #14743: if the provider error does not contain a trustworthy parseable context limit, do not mutate context_length to an untrusted lower value. Compress with the known window instead, or fail closed if compression cannot reduce the session.
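The "compress with the known window, or fail closed" shape can be sketched as below. This is an illustration under assumptions, not the Hermes code: `recover_with_known_window`, `ContextRecoveryError`, and the toy `word_count` / `drop_oldest_half` helpers are hypothetical stand-ins for Hermes' token estimator and compressor.

```python
from typing import Callable, List

Messages = List[str]


class ContextRecoveryError(RuntimeError):
    """Raised when compression cannot shrink the session any further (hypothetical)."""


def recover_with_known_window(
    messages: Messages,
    context_length: int,
    estimate_tokens: Callable[[Messages], int],
    compress: Callable[[Messages, int], Messages],
) -> Messages:
    """Compress against the known window; fail closed if nothing shrinks.

    context_length is never modified here: the caller keeps the known value
    instead of adopting an unverified, smaller probe tier.
    """
    before = estimate_tokens(messages)
    compressed = compress(messages, context_length)
    after = estimate_tokens(compressed)
    if after >= before:
        # No progress: surface an error rather than inventing a smaller window.
        raise ContextRecoveryError(
            f"compression did not reduce the session ({before} -> {after} tokens)"
        )
    return compressed


# Toy stand-ins for Hermes internals (hypothetical, for illustration only).
def word_count(msgs: Messages) -> int:
    return sum(len(m.split()) for m in msgs)


def drop_oldest_half(msgs: Messages, window: int) -> Messages:
    return msgs[len(msgs) // 2:]


print(recover_with_known_window(["a b", "c d", "e f", "g h"], 1_050_000,
                                word_count, drop_oldest_half))  # → ['e f', 'g h']
```

The design point is that the failure path raises instead of lowering `context_length`, so a bad guess can never persist into later turns.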

User impact

This is a frequent Codex/GPT-5.4 usability issue in my setup: I hit it at least five times per day, and it significantly disrupts normal Hermes usage. Local session history shows the issue was reported repeatedly on 2026-04-23 and 2026-04-24, including after running hermes update.

The observed failure mode, from the evidence attached to #14499:

Provider: openai-codex Model: gpt-5.4
Endpoint: https://chatgpt.com/backend-api/codex
Context: 492 msgs, ~285,378 tokens
Context length exceeded - stepping down: 1,050,000 -> 128,000 tokens
gpt-5.4 270K/128K

Once Hermes believes the context is 128K, an otherwise recoverable long-session overflow turns into repeated compression/failure against an artificially tiny window.
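A quick check of the numbers in that log, using a small hypothetical helper (`shrink_shortfall` is not part of Hermes): a 128,000-token window is already about 157K tokens smaller than the ~285K prompt, so compression would have to discard more than half the session just to fit, before leaving any room for a response.

```python
def shrink_shortfall(prompt_tokens: int, window: int) -> tuple[int, float]:
    """Return (tokens over the window, fraction of the prompt that must be cut)."""
    over = max(0, prompt_tokens - window)
    return over, over / prompt_tokens


# Values taken from the log above.
over, cut = shrink_shortfall(285_378, 128_000)
print(f"over by {over:,} tokens; must cut {cut:.1%} of the prompt")
# → over by 157,378 tokens; must cut 55.1% of the prompt
```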

What changed

- Added _should_keep_context_length_on_untrusted_probe(...).
- When parse_context_limit_from_error(...) returns no concrete limit and get_next_probe_tier(...) would shrink below the current prompt estimate, keep old_ctx.
- Preserve existing behavior when:
  - provider returned a concrete parseable limit;
  - guessed probe tier is still above current prompt estimate;
  - normal non-Codex unknown-provider probe-down applies.
- Added pure helper tests and a run_conversation regression test modeled after fix(agent): preserve MiniMax context length on delta-only overflow (salvage #9170) #14743.
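A minimal sketch of the decision described in the list above, assuming the inputs are the parsed provider limit (or None), the next guessed probe tier, and the current prompt estimate. Both function names and their parameter lists are hypothetical: the PR elides the real signature of _should_keep_context_length_on_untrusted_probe(...), and returning the parsed limit when one exists is an assumption about existing behavior. The sketch does not separately model the non-Codex unknown-provider case.

```python
from typing import Optional


def keep_known_context(
    parsed_limit: Optional[int],
    next_probe_tier: int,
    prompt_estimate: int,
) -> bool:
    """Hypothetical stand-in for _should_keep_context_length_on_untrusted_probe(...)."""
    if parsed_limit is not None:
        # Provider stated a concrete limit: nothing to guard.
        return False
    # Untrusted guess: keep the old value only if the guess cannot hold the prompt.
    return next_probe_tier < prompt_estimate


def resolve_context_length(
    old_ctx: int,
    parsed_limit: Optional[int],
    next_probe_tier: int,
    prompt_estimate: int,
) -> int:
    """Pick the context length to use after an overflow (sketch)."""
    if parsed_limit is not None:
        return parsed_limit  # assumed existing behavior: trust the parsed limit
    if keep_known_context(parsed_limit, next_probe_tier, prompt_estimate):
        return old_ctx  # guard: do not shrink below the live prompt
    return next_probe_tier  # unchanged probe-down when the guess still fits


print(resolve_context_length(1_050_000, None, 128_000, 285_378))     # → 1050000
print(resolve_context_length(1_050_000, None, 128_000, 100_000))     # → 128000
# 272_000 is a made-up parsed limit for illustration.
print(resolve_context_length(1_050_000, 272_000, 128_000, 285_378))  # → 272000
```

Keeping the predicate pure (no agent state, no model name) is what makes it easy to cover with the helper tests the PR mentions.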

Related work

- fix(agent): preserve MiniMax context length on delta-only overflow (salvage #9170) #14743: merged MiniMax fix that preserves the known context length when provider overflow text is not a real context limit.
- [codex] Prevent long-context probe collapse #14499: an earlier, broader attempt that changed the global probe tiers. This PR is intentionally narrower.
- gpt-5.4 shows 32k context in Hermes instead of 1,050,000 #5173 / Fix gpt-5.4 context length resolution for Codex #5174 / fix(model_metadata): add gpt-5.x context lengths + guard against poisoned cache #5179: related GPT-5.4 context resolution and cache issues, but these do not cover this untrusted probe-shrink path.

Validation

./venv/bin/python -m pytest tests/test_ctx_halving_fix.py tests/run_agent/test_run_agent.py::TestRunConversation::test_untrusted_probe_below_prompt_keeps_known_context_length tests/run_agent/test_run_agent.py::TestRunConversation::test_minimax_delta_overflow_keeps_known_context_length tests/run_agent/test_run_agent.py::TestRunConversation::test_non_minimax_delta_overflow_still_probes_down -q
30 passed in 7.37s

./venv/bin/python -m py_compile run_agent.py
git diff --check

Providers can report context overflow without a parseable limit. In that case Hermes currently guesses the next probe tier and may shrink a known large model window below the prompt already being recovered, such as gpt-5.4 moving from 1,050,000 to 128,000 for a ~285K prompt. This keeps the known context length when the guessed tier is below the current prompt estimate, then lets compression either recover or fail without inventing a smaller model window.

Tested: ./venv/bin/python -m pytest tests/test_ctx_halving_fix.py tests/run_agent/test_run_agent.py::TestRunConversation::test_untrusted_probe_below_prompt_keeps_known_context_length tests/run_agent/test_run_agent.py::TestRunConversation::test_minimax_delta_overflow_keeps_known_context_length tests/run_agent/test_run_agent.py::TestRunConversation::test_non_minimax_delta_overflow_still_probes_down -q
Tested: ./venv/bin/python -m py_compile run_agent.py && git diff --check

ztcshen · 2026-04-24T02:19:35Z

Screenshot evidence from the original reproduction in #14499:

The important part is the context estimate and mutation shown there:

Context: 492 msgs, ~285,378 tokens
Context length exceeded - stepping down: 1,050,000 -> 128,000 tokens
gpt-5.4 270K/128K

This is the specific fail-open path this PR guards: when the provider did not give Hermes a parseable context limit, the guessed tier should not overwrite a known 1,050,000-token window with 128,000 while the active prompt estimate is already ~285K. For my local Codex/Hermes workflow this happens at least five times per day and makes long sessions effectively unusable after the first bad probe.

alt-glitch · 2026-04-24T02:26:37Z

Related to #14499 (broader probe tier fix) and #9181 (architecture: separate base vs effective context). This is a narrower, safer alternative to #14499.

ztcshen · 2026-04-24T02:43:41Z

Additional note after switching models today: this is not intended to be GPT-5.4-specific. I have also reproduced the same failure mode after switching to the latest GPT-5.5 path.

The guard in this PR is intentionally model-agnostic. It applies when all of these are true:

- the provider reports a context overflow;
- Hermes cannot parse a trustworthy concrete context limit from that provider error;
- the guessed next probe tier would be below the prompt currently being recovered.

So although the original screenshot evidence shows gpt-5.4 stepping from 1,050,000 to 128,000, the same protection should apply to GPT-5.5 or any future large-context model that hits the same untrusted probe-shrink path.
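The second condition hinges on telling an explicit, trustworthy limit apart from a bare "context exceeded" message. A rough sketch, assuming regex matching on the error text; `parse_trusted_limit` is hypothetical, and its patterns and example messages are illustrative only, not taken from Hermes' parse_context_limit_from_error(...) or from any provider's documented error format. Nothing in it depends on the model name, which is what keeps the guard model-agnostic.

```python
import re
from typing import Optional

# Illustrative patterns only; real provider wording varies.
_LIMIT_PATTERNS = [
    re.compile(r"maximum context length is\s+([\d,]+)\s+tokens", re.IGNORECASE),
    re.compile(r"context window of\s+([\d,]+)\s+tokens", re.IGNORECASE),
]


def parse_trusted_limit(error_text: str) -> Optional[int]:
    """Return a concrete token limit only if the error states one explicitly."""
    for pattern in _LIMIT_PATTERNS:
        match = pattern.search(error_text)
        if match:
            return int(match.group(1).replace(",", ""))
    # No explicit number: the overflow is real, but the limit is unknown.
    return None


print(parse_trusted_limit("The maximum context length is 272,000 tokens."))  # → 272000
print(parse_trusted_limit("Input exceeds the context window of this model."))  # → None
```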

ztcshen mentioned this pull request Apr 24, 2026 in [codex] Prevent long-context probe collapse #14499 (Closed)

alt-glitch added labels Apr 24, 2026:
- type/bug (Something isn't working)
- P1 (High: major feature broken, no workaround)
- comp/agent (Core agent loop, run_agent.py, prompt builder)
- provider/openai (OpenAI / Codex Responses API)

This was referenced Apr 27, 2026:
- Generic 400/disconnect errors misclassified as context_overflow in 1M-context sessions #16351 (Closed)
- fix(error_classifier): avoid large-context false overflow heuristics #16352 (Closed)

ztcshen closed this by deleting the head repository May 23, 2026

ztcshen wants to merge 1 commit into NousResearch:main from ztcshen:codex/guard-untrusted-context-probe


2 participants