[codex] Guard untrusted context probe shrink #14858
Providers can report context overflow without a parseable limit. In that case Hermes currently guesses the next probe tier and may shrink a known large model window below the prompt already being recovered, such as gpt-5.4 moving from 1,050,000 to 128,000 for a ~285K prompt.

This keeps the known context length when the guessed tier is below the current prompt estimate, then lets compression either recover or fail without inventing a smaller model window.

Tested: ./venv/bin/python -m pytest tests/test_ctx_halving_fix.py tests/run_agent/test_run_agent.py::TestRunConversation::test_untrusted_probe_below_prompt_keeps_known_context_length tests/run_agent/test_run_agent.py::TestRunConversation::test_minimax_delta_overflow_keeps_known_context_length tests/run_agent/test_run_agent.py::TestRunConversation::test_non_minimax_delta_overflow_still_probes_down -q

Tested: ./venv/bin/python -m py_compile run_agent.py && git diff --check
Screenshot evidence from the original reproduction in #14499:

The important part is the context estimate and mutation shown there:

This is the specific fail-open path this PR guards: when the provider did not give Hermes a parseable context limit, the guessed tier should not overwrite a known 1,050,000-token window with 128,000 while the active prompt estimate is already ~285K.

For my local Codex/Hermes workflow this happens at least five times per day and makes long sessions effectively unusable after the first bad probe.
Additional note after switching models today: this is not intended to be GPT-5.4-specific. I have also reproduced the same failure mode after switching to the latest GPT-5.5 path. The guard in this PR is intentionally model-agnostic. It applies when all of these are true:
So although the original screenshot evidence shows

Summary
Keep a known model context length when Hermes only guessed the next context probe tier and that guessed tier is already below the prompt being recovered.
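A minimal sketch of that rule, assuming the caller already has the parsed limit (or None when the provider error contained none), the guessed next tier, and a token estimate for the current prompt. The function and parameter names below are illustrative and are not the signatures used in run_agent.py; the 512,000 tier in the usage lines is also made up for the example.

```python
from typing import Optional


def should_keep_known_context(
    parsed_limit: Optional[int],
    guessed_tier: int,
    prompt_estimate: int,
) -> bool:
    """Return True when the known context length should be left untouched.

    Hypothetical stand-in for the guard described in this PR; the argument
    names are assumptions, not the real helper's signature.
    """
    if parsed_limit is not None:
        # The provider reported a concrete limit, so the shrink is trusted.
        return False
    # The tier is only a guess: reject it if the prompt would not even fit.
    return guessed_tier < prompt_estimate


def next_context_length(
    old_ctx: int,
    parsed_limit: Optional[int],
    guessed_tier: int,
    prompt_estimate: int,
) -> int:
    """Pick the context length to use for the recovery attempt."""
    if parsed_limit is not None:
        return parsed_limit
    if should_keep_known_context(parsed_limit, guessed_tier, prompt_estimate):
        return old_ctx
    return guessed_tier


# The case from this PR: no parseable limit, guessed tier below the prompt.
print(next_context_length(1_050_000, None, 128_000, 285_000))  # 1050000
# A guessed tier that still fits the prompt is allowed to probe down.
print(next_context_length(1_050_000, None, 512_000, 285_000))  # 512000
```

With the window left at 1,050,000, the ~285K prompt goes to compression against a window it can actually fit in, rather than against a guessed 128,000 limit it already exceeds.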
This is a narrower alternative to #14499. It follows the same recovery shape as #14743: if the provider error does not contain a trustworthy parseable context limit, do not mutate context_length to an untrusted lower value. Compress with the known window instead, or fail closed if compression cannot reduce the session.

User impact
This is a high-frequency Codex/GPT-5.4 usability issue for my setup. I hit this at least five times per day, and it significantly disrupts normal Hermes usage. Local session history shows the issue was reported repeatedly on 2026-04-23 and 2026-04-24, including after hermes update.

The observed failure mode from the attached evidence in #14499:
Once Hermes believes the context is 128K, an otherwise recoverable long-session overflow turns into repeated compression/failure against an artificially tiny window.
What changed
- _should_keep_context_length_on_untrusted_probe(...).
- When parse_context_limit_from_error(...) returns no concrete limit and get_next_probe_tier(...) would shrink below the current prompt estimate, keep old_ctx.
- run_conversation regression test modeled after fix(agent): preserve MiniMax context length on delta-only overflow (salvage #9170) #14743.

Related work
Validation