Skip to content

openai-codex / gpt-5.5 repeatedly produces no first byte after #31967/#32016 #32373

@JiehoonKwak

Description

@JiehoonKwak

What happened

After the recent Codex timeout fixes, openai-codex / gpt-5.5 still frequently stalls before emitting the first stream event.

This is not the old context=~0 tokens / 300s stale-timeout behavior. My local checkout already includes the merged fixes from #31967 and #32016. Hermes now detects the stall earlier via the Codex TTFB watchdog, but the underlying request still often accepts the connection and emits no stream events.

Typical user-facing output:

No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.9s (attempt 1/3)...
⏳ Still working... (2 min elapsed — iteration 8/9999, receiving stream response)
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.3s (attempt 1/3)...
⏳ Retrying in 2.1s (attempt 1/3)...
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.

In one scheduled gateway run, the same failure family progressed like this:

Codex stream produced no bytes within TTFB cutoff (45s > 45s, model=gpt-5.5)
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
API call failed after 3 retries. [Errno 32] Broken pipe
Scheduled job failed: RuntimeError: [Errno 32] Broken pipe

Expected behavior

openai-codex / gpt-5.5 should either:

  • emit a first stream event normally,
  • return a structured error if the backend rejects the request, or
  • fail in a way that allows Hermes to cleanly trigger fallback/retry without repeated first-byte stalls.

Actual behavior

The backend frequently accepts the request but emits no stream events for 45s. Hermes kills the request and retries, but the same first-byte stall can repeat multiple times in a single turn.

When the retry eventually reaches the stale-call detector, the request can end with ReadError: [Errno 32] Broken pipe after Hermes closes the stalled connection.

Environment

Why this seems distinct from the fixed issue

#31967 fixed the timeout accounting bug and lowered the default stale timeout from 300s to 90s. That fix appears to be active: logs now show realistic context estimates such as context=~45,779 tokens, not context=~0 tokens.

#32016 also appears active: the checkout contains the gpt-5.5 Codex silent-hang hint path.

The remaining issue is earlier in the request lifecycle: the Codex Responses stream often produces no first byte at all, so the 45s TTFB watchdog fires repeatedly.

Related issues / PRs

Notes

This may still be backend-side, but it is severe enough in gateway/scheduled usage that the current retry behavior produces repeated stalls and occasional job failure. A clean structured backend rejection, a payload sanitizer, or a more reliable fallback trigger would make this much easier to operate.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/agentCore agent loop, run_agent.py, prompt builderprovider/openaiOpenAI / Codex Responses APItype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions