What happened
After the recent Codex timeout fixes, openai-codex / gpt-5.5 still frequently stalls before emitting the first stream event.
This is not the old context=~0 tokens / 300s stale-timeout behavior. My local checkout already includes the merged fixes from #31967 and #32016. Hermes now detects the stall earlier via the Codex TTFB watchdog, but the underlying request still often accepts the connection and emits no stream events.
Typical user-facing output:
No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.9s (attempt 1/3)...
⏳ Still working... (2 min elapsed — iteration 8/9999, receiving stream response)
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.3s (attempt 1/3)...
⏳ Retrying in 2.1s (attempt 1/3)...
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
In one scheduled gateway run, the same failure family progressed like this:
Codex stream produced no bytes within TTFB cutoff (45s > 45s, model=gpt-5.5)
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
API call failed after 3 retries. [Errno 32] Broken pipe
Scheduled job failed: RuntimeError: [Errno 32] Broken pipe
Expected behavior
openai-codex / gpt-5.5 should either:
- emit a first stream event normally,
- return a structured error if the backend rejects the request, or
- fail in a way that allows Hermes to cleanly trigger fallback/retry without repeated first-byte stalls.
Actual behavior
The backend frequently accepts the request but emits no stream events for 45s. Hermes kills the request and retries, but the same first-byte stall can repeat multiple times in a single turn.
When the retry eventually reaches the stale-call detector, the request can end with ReadError: [Errno 32] Broken pipe after Hermes closes the stalled connection.
Environment
- Hermes checkout: local
main
- Local checkout includes:
- Platform: macOS, launchd gateway
- Provider:
openai-codex
- Model:
gpt-5.5
- Base URL:
https://chatgpt.com/backend-api/codex
- Affected surfaces: gateway sessions and scheduled agent runs
fallback_providers: none on the affected profile(s)
Why this seems distinct from the fixed issue
#31967 fixed the timeout accounting bug and lowered the default stale timeout from 300s to 90s. That fix appears to be active: logs now show realistic context estimates such as context=~45,779 tokens, not context=~0 tokens.
#32016 also appears active: the checkout contains the gpt-5.5 Codex silent-hang hint path.
The remaining issue is earlier in the request lifecycle: the Codex Responses stream often produces no first byte at all, so the 45s TTFB watchdog fires repeatedly.
Related issues / PRs
Notes
This may still be backend-side, but it is severe enough in gateway/scheduled usage that the current retry behavior produces repeated stalls and occasional job failure. A clean structured backend rejection, a payload sanitizer, or a more reliable fallback trigger would make this much easier to operate.
What happened
After the recent Codex timeout fixes,
openai-codex/gpt-5.5still frequently stalls before emitting the first stream event.This is not the old
context=~0 tokens/ 300s stale-timeout behavior. My local checkout already includes the merged fixes from #31967 and #32016. Hermes now detects the stall earlier via the Codex TTFB watchdog, but the underlying request still often accepts the connection and emits no stream events.Typical user-facing output:
In one scheduled gateway run, the same failure family progressed like this:
Expected behavior
openai-codex/gpt-5.5should either:Actual behavior
The backend frequently accepts the request but emits no stream events for 45s. Hermes kills the request and retries, but the same first-byte stall can repeat multiple times in a single turn.
When the retry eventually reaches the stale-call detector, the request can end with
ReadError: [Errno 32] Broken pipeafter Hermes closes the stalled connection.Environment
mainfix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults)fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern)openai-codexgpt-5.5https://chatgpt.com/backend-api/codexfallback_providers: none on the affected profile(s)Why this seems distinct from the fixed issue
#31967 fixed the timeout accounting bug and lowered the default stale timeout from 300s to 90s. That fix appears to be active: logs now show realistic context estimates such as
context=~45,779 tokens, notcontext=~0 tokens.#32016 also appears active: the checkout contains the gpt-5.5 Codex silent-hang hint path.
The remaining issue is earlier in the request lifecycle: the Codex Responses stream often produces no first byte at all, so the 45s TTFB watchdog fires repeatedly.
Related issues / PRs
reasoning,include, andstoreforgpt-5.5on the Codex backendNotes
This may still be backend-side, but it is severe enough in gateway/scheduled usage that the current retry behavior produces repeated stalls and occasional job failure. A clean structured backend rejection, a payload sanitizer, or a more reliable fallback trigger would make this much easier to operate.