fix(agent): tolerate large Codex stream prefill #33383
Closed
sanghyuk-seo-nexcube wants to merge 1 commit into
a2d2ceb to
5c58fb1
teknium1 added a commit that referenced this pull request May 27, 2026
Salvaged onto current main via #33390 (merged as 3476509). Your authorship is preserved in |
mathias3 pushed a commit to mathias3/hermes-agent that referenced this pull request May 28, 2026
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
What does this PR do?
Large `openai-codex` subscription requests can spend longer than the fixed TTFB cutoff in backend admission or prompt prefill before the first SSE event is emitted. Hermes treated that as a no-first-byte stream failure and reconnected, which caused long-context Codex turns to retry even though the backend could still complete successfully.

This updates the Codex Responses watchdog policy to distinguish three cases:
- `openai-codex` requests wait for backend prefill instead of being killed by the first-byte watchdog

How this differs from recent Codex stream fixes
This complements the existing Codex stream fixes rather than replacing them:
- `response.completed.output = null` crashes after stream events had already been collected.
- `responses.stream()` helper and consumes raw `responses.create(stream=True)` events directly, making that null-output parser failure structurally impossible.

Those fixes address stream consumption once events arrive, or improve the message shown when the TTFB watchdog kills a request. This PR addresses the separate pre-first-event case where hosted Codex accepts a large request but spends a long time in admission/prompt prefill before emitting the first SSE event.
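The watchdog policy described above can be illustrated with a minimal sketch. This is not the code from `agent/chat_completion_helpers.py`: the `StreamState` type, the `should_reconnect` function, and every threshold are hypothetical names and values chosen for illustration. Only the 12-second first-byte cutoff is taken from the reconnect log message quoted in this PR; the token threshold, the prefill ceiling, and the idle cutoff are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative values only. 12 s mirrors the cutoff in the reconnect log
# message; the other three numbers are assumptions, not values from the PR.
TTFB_CUTOFF_S = 12.0            # first-byte cutoff for small requests
LARGE_INPUT_TOKENS = 100_000    # hypothetical "large context" threshold
PREFILL_CUTOFF_S = 300.0        # hypothetical ceiling for prefill waits
IDLE_CUTOFF_S = 60.0            # hypothetical idle cutoff after first event


@dataclass
class StreamState:
    """Hypothetical snapshot of one streaming request."""
    hosted_codex: bool                         # hosted openai-codex backend?
    estimated_input_tokens: int                # estimated input context size
    seconds_since_start: float                 # time since request was sent
    seconds_since_last_event: Optional[float]  # None until first SSE event


def should_reconnect(state: StreamState, strict: bool = False) -> bool:
    """Return True if the watchdog should kill the stream and reconnect."""
    # Case 1: at least one event arrived, so only idle time matters.
    if state.seconds_since_last_event is not None:
        return state.seconds_since_last_event > IDLE_CUTOFF_S

    # Case 2: no event yet on a large hosted Codex request. Unless strict
    # mode is on, tolerate backend admission and prompt prefill.
    large = (
        state.hosted_codex
        and state.estimated_input_tokens >= LARGE_INPUT_TOKENS
    )
    if large and not strict:
        return state.seconds_since_start > PREFILL_CUTOFF_S

    # Case 3: small (or strict-mode) request with no first byte.
    return state.seconds_since_start > TTFB_CUTOFF_S
```

Under these assumed thresholds, a small request with no event after 13 seconds is retried, while a large hosted Codex request in the same situation keeps waiting unless `strict=True` is passed.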
Related Issue
Addresses the no-first-byte / large-context TTFB portion of #33075.
Related: #22986, #7069, #32963, #33042, #33133
Type of Change
Changes Made
- `agent/chat_completion_helpers.py` to detect the hosted `openai-codex` Responses backend in the streaming watchdog path.
- `HERMES_CODEX_TTFB_STRICT=1` for strict behavior.
- `tests/agent/test_codex_ttfb_watchdog.py` with regression coverage for small no-byte stalls, first-event-then-idle stalls, large-context prefill delay, and strict-mode behavior.

How to Test
- `scripts/run_tests.sh tests/agent/test_codex_ttfb_watchdog.py`
- `scripts/run_tests.sh tests/agent/test_codex_ttfb_watchdog.py tests/agent/test_auxiliary_client.py tests/run_agent/test_run_agent_codex_responses.py`
- `openai-codex` turns with large estimated input context no longer reconnect solely because the hosted backend takes longer than the small-request TTFB cutoff before the first SSE event.

Checklist
Code
- `fix(scope):`, `feat(scope):`, etc.)

Documentation & Housekeeping
- `cli-config.yaml.example` update N/A
- `CONTRIBUTING.md` or `AGENTS.md` update N/A

Screenshots / Logs
Before this fix, large hosted Codex requests could show repeated reconnects like:
No first byte from provider in 12s (codex stream, model: gpt-5.5). Reconnecting.

After this fix, large hosted Codex requests wait for backend prefill; small no-byte stalls and post-first-event idle stalls are still retried.
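For readers who want the old reconnect behavior back, the `HERMES_CODEX_TTFB_STRICT=1` switch listed under Changes Made could be read along the following lines. This is a sketch, not the PR's implementation: the helper name `codex_ttfb_strict` is hypothetical, and treating only the exact value `1` as enabling strict mode is an assumption, since the PR text documents no other accepted values.

```python
import os
from typing import Mapping, Optional


def codex_ttfb_strict(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when strict first-byte behavior is requested.

    Hypothetical helper: only the literal value "1" enables strict mode,
    because that is the only setting the PR description mentions.
    """
    source = os.environ if env is None else env
    return source.get("HERMES_CODEX_TTFB_STRICT", "").strip() == "1"
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to exercise in tests, for example `codex_ttfb_strict({"HERMES_CODEX_TTFB_STRICT": "1"})`.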