Skip to content

codex: binary stalls after rawResponseItem/completed — turn/completed never arrives #87071

@Marvinthebored

Description

@Marvinthebored

Bug type

Hang (process stays alive but stops making progress)

Beta release blocker

No (but causes turn drops that are hard to distinguish from #86948)

Summary

The codex app-server binary (0.130.0) occasionally stops sending notifications after emitting rawResponseItem/completed. The expected turn/completed notification never arrives. The binary process remains alive (no crash, no stderr), the gateway's stall detector flags it, and eventually the 30-minute terminal idle timeout fires — or the user manually kills it.

This is distinct from the false idle timeout race in #86948. That bug causes false 60s timeouts when turn/completed IS in the notification queue but hasn't been processed yet. This bug is a genuine stall where turn/completed never arrives at all.

Steps to reproduce

  1. Gateway running 5.26 or current main, codex binary 0.130.0
  2. openai/gpt-5.5 via codex app-server runtime
  3. Multi-turn tool-call-heavy session (file reads, analysis tasks)
  4. After several successful turns, one turn completes its tool calls and emits rawResponseItem/completed
  5. No further notifications arrive — no turn/completed, no thread/tokenUsage/updated
  6. Binary process stays alive, gateway stall detector reports every 30s

Evidence

Gateway stall detector output (2026-05-27, post-#86948 fix, TUI channel):

00:03:17 [WARN] stalled session: sessionId=4fcadb49-...
  sessionKey=agent:main:tui-113f92de-...
  state=processing age=137s queueDepth=1
  reason=queued_behind_terminal_active_work
  classification=stalled_agent_run
  activeWorkKind=embedded_run
  lastProgress=codex_app_server:notification:rawResponseItem/completed
  lastProgressAge=137s terminalProgressStale=true recovery=none

00:04:17 [WARN] stalled session: ... lastProgressAge=197s ...
00:05:17 [WARN] stalled session: ... lastProgressAge=257s ...
00:06:17 [WARN] stalled session: ... lastProgressAge=317s ...
00:07:17 [WARN] stalled session: ... lastProgressAge=377s recovery=checking

The last notification from the binary was rawResponseItem/completed. The gateway correctly detected a stall — no false timeout (the #86948 root-cause fix was applied), no terminalTurnNotificationQueued was set (because turn/completed never arrived).

Hypothesis

The codex binary relays rawResponseItem/completed (an intermediate event from the OpenAI Responses API) but the API never sends response.completed, so the binary never emits turn/completed. Possible causes:

  • OpenAI Responses API drops the final event under certain conditions (connection reset, streaming interruption)
  • The binary's response stream handler doesn't detect the missing terminal event
  • The binary has no timeout of its own for "all items completed but no response.completed"

Expected behavior

The codex binary should:

  1. Detect when all response items have completed but response.completed hasn't arrived within a reasonable window
  2. Emit turn/completed (or turn/failed) so the gateway can proceed
  3. Not hang indefinitely waiting for an event that may never come

Environment

Refs #86948, #87022, PR #87070.
cc @Peetiegonzalez

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions