Bug type
Hang (process stays alive but stops making progress)
Beta release blocker
No (but causes turn drops that are hard to distinguish from #86948)
Summary
The codex app-server binary (0.130.0) occasionally stops sending notifications after emitting rawResponseItem/completed. The expected turn/completed notification never arrives. The binary process remains alive (no crash, no stderr), the gateway's stall detector flags it, and eventually the 30-minute terminal idle timeout fires — or the user manually kills it.
This is distinct from the false idle timeout race in #86948. That bug causes false 60s timeouts when turn/completed IS in the notification queue but hasn't been processed yet. This bug is a genuine stall where turn/completed never arrives at all.
Steps to reproduce
- Gateway running 5.26 or current main, codex binary 0.130.0
openai/gpt-5.5 via codex app-server runtime
- Multi-turn tool-call-heavy session (file reads, analysis tasks)
- After several successful turns, one turn completes its tool calls and emits
rawResponseItem/completed
- No further notifications arrive — no
turn/completed, no thread/tokenUsage/updated
- Binary process stays alive, gateway stall detector reports every 30s
Evidence
Gateway stall detector output (2026-05-27, post-#86948 fix, TUI channel):
00:03:17 [WARN] stalled session: sessionId=4fcadb49-...
sessionKey=agent:main:tui-113f92de-...
state=processing age=137s queueDepth=1
reason=queued_behind_terminal_active_work
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:rawResponseItem/completed
lastProgressAge=137s terminalProgressStale=true recovery=none
00:04:17 [WARN] stalled session: ... lastProgressAge=197s ...
00:05:17 [WARN] stalled session: ... lastProgressAge=257s ...
00:06:17 [WARN] stalled session: ... lastProgressAge=317s ...
00:07:17 [WARN] stalled session: ... lastProgressAge=377s recovery=checking
The last notification from the binary was rawResponseItem/completed. The gateway correctly detected a stall — no false timeout (the #86948 root-cause fix was applied), no terminalTurnNotificationQueued was set (because turn/completed never arrived).
Hypothesis
The codex binary relays rawResponseItem/completed (an intermediate event from the OpenAI Responses API) but the API never sends response.completed, so the binary never emits turn/completed. Possible causes:
- OpenAI Responses API drops the final event under certain conditions (connection reset, streaming interruption)
- The binary's response stream handler doesn't detect the missing terminal event
- The binary has no timeout of its own for "all items completed but no response.completed"
Expected behavior
The codex binary should:
- Detect when all response items have completed but
response.completed hasn't arrived within a reasonable window
- Emit
turn/completed (or turn/failed) so the gateway can proceed
- Not hang indefinitely waiting for an event that may never come
Environment
Refs #86948, #87022, PR #87070.
cc @Peetiegonzalez
Bug type
Hang (process stays alive but stops making progress)
Beta release blocker
No (but causes turn drops that are hard to distinguish from #86948)
Summary
The codex app-server binary (0.130.0) occasionally stops sending notifications after emitting
rawResponseItem/completed. The expectedturn/completednotification never arrives. The binary process remains alive (no crash, no stderr), the gateway's stall detector flags it, and eventually the 30-minute terminal idle timeout fires — or the user manually kills it.This is distinct from the false idle timeout race in #86948. That bug causes false 60s timeouts when
turn/completedIS in the notification queue but hasn't been processed yet. This bug is a genuine stall whereturn/completednever arrives at all.Steps to reproduce
openai/gpt-5.5via codex app-server runtimerawResponseItem/completedturn/completed, nothread/tokenUsage/updatedEvidence
Gateway stall detector output (2026-05-27, post-#86948 fix, TUI channel):
The last notification from the binary was
rawResponseItem/completed. The gateway correctly detected a stall — no false timeout (the #86948 root-cause fix was applied), noterminalTurnNotificationQueuedwas set (becauseturn/completednever arrived).Hypothesis
The codex binary relays
rawResponseItem/completed(an intermediate event from the OpenAI Responses API) but the API never sendsresponse.completed, so the binary never emitsturn/completed. Possible causes:Expected behavior
The codex binary should:
response.completedhasn't arrived within a reasonable windowturn/completed(orturn/failed) so the gateway can proceedEnvironment
openai/gpt-5.5Refs #86948, #87022, PR #87070.
cc @Peetiegonzalez