Beta blocker: codex — app-server turns silently drop with event loop saturation #86948

@Marvinthebored

Bug type

Crash (process/app exits or hangs)

Beta release blocker

Yes

Summary

The in-process codex app-server plugin silently drops turns after 1–4 successful interactions. An item/completed notification arrives (logged as notification:item/completed), but the turn never resolves. During codex turns the gateway's Node.js event loop reaches 100% utilization with P99 delays exceeding 5 seconds (worst case 95 seconds). This is consistent with accumulated unclosed I/O (likely SSE response streams) choking the event loop until the 60-second idle timeout fires. Reproduced 9 times across 7 sessions on 2026.5.26 and 2026.5.25-beta.1. Non-codex models (PI runtime) through the same gateway never exhibit this. cc @Peetiegonzalez

Steps to reproduce

  1. Start gateway normally (openclaw gateway), 5 plugins loaded (brave, codex, discord, lossless-claw, memory-core)
  2. Select openai/gpt-5.5 (codex runtime, agentRuntime.id: "codex")
  3. /reset, /new to start a fresh session (any channel: Discord, GUI, TUI, or cron)
  4. Send a warm-up message — succeeds, model responds in 2–5s server-side
  5. Send a task requiring tool calls (e.g. multi-step file reads, or any multi-tool task)
  6. Model begins working (tool calls execute, item/started and item/completed notifications arrive)
  7. Model goes silent mid-turn — no further SSE notifications arrive
  8. After exactly 60 seconds, idle timeout fires: codex app-server turn idle timed out waiting for completion
  9. Auth profile is incorrectly marked as failed (auth_profile_failure_state_updated, reason: timeout)

Reliability: 9/9 unique codex threads timed out in a single day. Some completed 1–3 turns before failing; none survived more than ~4 turns of tool-call-heavy work.

Expected behavior

Codex turns should complete reliably. The standalone Codex CLI using the same binary (0.130.0), same OAuth credentials, and same model (GPT-5.5) responds in 2.5–3s with zero drops. The in-process codex plugin should match this reliability.

Actual behavior

Every codex session eventually times out with the same signature:

{
  "idleMs": 60001,
  "timeoutMs": 60000,
  "lastActivityReason": "notification:item/completed",
  "lastNotificationMethod": "item/completed"
}

The plugin receives an item-level item/completed notification (for example, for a tool call item) and then goes completely idle. No response.completed or turn-level completion event arrives. After 60s of silence, the idle timeout fires and the turn is abandoned.
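
For readers unfamiliar with how such a signature is produced, here is a minimal sketch of an idle watchdog that resets on every notification and reports the last one seen when the timeout elapses. The class and field names are hypothetical and only mirror the shape of the log entry above; this is not the plugin's actual implementation.

```typescript
// Hypothetical sketch: names are illustrative, not taken from the codex plugin.
interface IdleTimeoutReport {
  idleMs: number;
  timeoutMs: number;
  lastActivityReason: string;
  lastNotificationMethod: string;
}

class TurnIdleWatchdog {
  private timeoutMs: number;
  private now: () => number;
  private lastActivityAt: number;
  private lastMethod: string;

  // `now` is injectable so the watchdog can be driven by a fake clock.
  constructor(timeoutMs: number, now: () => number = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastActivityAt = now();
    this.lastMethod = "(none)";
  }

  // Call for every notification received during the turn.
  recordNotification(method: string): void {
    this.lastActivityAt = this.now();
    this.lastMethod = method;
  }

  // Returns a report once the turn has been silent for timeoutMs or longer, else null.
  check(): IdleTimeoutReport | null {
    const idleMs = this.now() - this.lastActivityAt;
    if (idleMs < this.timeoutMs) return null;
    return {
      idleMs,
      timeoutMs: this.timeoutMs,
      lastActivityReason: `notification:${this.lastMethod}`,
      lastNotificationMethod: this.lastMethod,
    };
  }
}
```

With this design, a turn whose last notification was item/completed and which then stays silent yields exactly the kind of report shown above. A saturated event loop could also delay the check itself, which may explain idleMs values slightly above timeoutMs.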

OpenClaw version

2026.5.26 (6f57286) — also reproduced on 2026.5.25-beta.1 (bbf4117)

Operating system

macOS Darwin 24.6.0, x86_64, Intel, 16GB RAM

Install method

openclaw update --channel dev (git clone + build), then sudo npm install -g /Users/admin/openclaw

Model

openai/gpt-5.5

Provider / routing chain

openclaw → codex plugin (in-process app-server) → OpenAI Responses API (Codex subscription OAuth)

Additional provider/model setup details

  • agentRuntime.id: "codex" on the gpt-5.5 model entry (stock default config)
  • Provider shows as openai-codex in session logs, API as openai-codex-responses
  • Context engine: lossless-claw (LCM)
  • Tested with both @openai/codex binary 0.133.0 (bundled) and 0.130.0 (from standalone CLI) — same failure on both, ruling out binary version regression
  • Non-OpenAI models use PI runtime and never exhibit this issue through the same gateway

Logs, screenshots, and evidence

1. Event loop saturation — gateway liveness diagnostics during codex turns:

These are the gateway's own liveness warning entries. Every entry below was logged during an active codex turn. PI runtime turns on the same gateway never trigger liveness warnings.

Timestamp (GMT+8)     P99 Delay    Max Delay    Utilization  Active Work
06:08:05              7,432ms      7,432ms      99.7%        codex embedded_run (age=432s)
06:10:28              10,981ms     10,981ms     100%         codex embedded_run (age=574s)
06:12:38              12,147ms     12,147ms     100%         codex tool_call
06:14:50              17,767ms     17,767ms     99.9%        codex model_call
09:00:13              95,429ms     95,429ms     100%         codex embedded_run (age=120s)
09:02:37              16,945ms     16,945ms     100%         codex embedded_run (age=264s)
09:11:40              34,393ms     34,393ms     100%         codex embedded_run (age=186s)
09:30:39              16,853ms     75,967ms     98.2%        codex embedded_run
13:45:23              2,027ms      4,182ms      77.2%        codex turn:start
15:54:44              5,516ms      5,516ms      100%         codex item/started

Worst case: event loop blocked for 95 seconds straight (09:00:13). At that point no I/O can be processed, no SSE events can be read, and the idle timeout is the only escape.
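
To make the delay figures above easier to interpret, here is a minimal sketch of one way event loop delay can be sampled and summarized as a P99. It is an illustration under assumptions, not the gateway's liveness code: the function names are hypothetical, and the timer-drift approach stands in for whatever the gateway actually uses (Node.js also ships `monitorEventLoopDelay` in `node:perf_hooks` for this purpose).

```typescript
// Hypothetical sketch: measures how late a repeating timer fires.
// Any lateness beyond the requested interval is time the event loop was busy.

// Nearest-rank percentile over delay samples in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Starts sampling; the returned function stops the timer and returns the samples.
function startLagSampler(
  intervalMs: number,
  now: () => number = Date.now,
): () => number[] {
  const samples: number[] = [];
  let expected = now() + intervalMs;
  const timer = setInterval(() => {
    const current = now();
    samples.push(Math.max(0, current - expected));
    expected = current + intervalMs;
  }, intervalMs);
  return () => {
    clearInterval(timer);
    return samples;
  };
}
```

A sampler like this can only record a delay after the loop becomes responsive again, so a single 95-second stall shows up as one very large sample. That would explain why P99 and max delay are identical in most rows of the table.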

2. TCP/fd accumulation — external process monitoring (5-second sample interval):

Gateway process (PID 13064) monitored during a GPT-5.5 tool-bench-2 run:

Time      RSS(MB)  CPU%   FDs  TCP  Established  Unix   Phase
23:03:35  656      1      85   5    3            1      idle baseline
23:04:07  656      104    85   5    3            1      turn starts
23:04:14  711      54     91   7    4            4      +6 fds, +3 unix
23:05:07  648      101    92   9    6            4      tcp climbing
23:05:13  705      8      93   10   7            4      PEAK: 10 TCP, 7 established
23:05:38  714      7      93   10   7            4      peak again
23:06:23  665      187    91   8    5            4      CPU 187%
23:07:40  661      103    86   6    4            1      timeout fires, cleanup
23:07:53  680      0.6    84   4    2            1      back to baseline

TCP established connections climb from baseline 3 → peak 7 during turns. Unix fds jump 1 → 4. After the timeout fires and cleanup runs, everything drops back to baseline. Connections accumulate during active turns and don't fully drain between tool call rounds.
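
If connections really do accumulate because streams stay open between tool call rounds, one possible mitigation is to track every stream opened during a turn and abort whatever remains when the turn ends. The sketch below shows that pattern with the standard `AbortController`; the class and function names are hypothetical, and where the plugin actually opens its streams is an assumption this report does not confirm.

```typescript
// Hypothetical sketch: per-turn tracking of open streams, with guaranteed cleanup.
class TurnStreamRegistry {
  private open = new Set<AbortController>();

  // Register a new stream; pass controller.signal to fetch() or the SSE client.
  track(): AbortController {
    const controller = new AbortController();
    this.open.add(controller);
    return controller;
  }

  // Call when a stream finishes normally.
  release(controller: AbortController): void {
    this.open.delete(controller);
  }

  get openCount(): number {
    return this.open.size;
  }

  // Abort every stream still open and return how many were aborted.
  closeAll(): number {
    const count = this.open.size;
    this.open.forEach((controller) => controller.abort());
    this.open.clear();
    return count;
  }
}

// Runs one turn and cleans up its streams whether it resolves, throws, or times out.
async function runTurn<T>(
  registry: TurnStreamRegistry,
  work: () => Promise<T>,
): Promise<T> {
  try {
    return await work();
  } finally {
    registry.closeAll();
  }
}
```

Logging the value returned by closeAll() at the end of each turn would also give a direct measurement of leaked streams, which could confirm or rule out the hypothesis without external process monitoring.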

3. All 9 timeouts on 2026-05-26:

#  Time(GMT+8)  Thread(prefix)  Session(prefix)  Channel          Turns before timeout
1  12:05:29     019e626b        36e08c7c         Discord          3 successful
2  13:46:53     019e62d0        3c7d931e         GUI dashboard    ~5 successful
3  14:10:00     019e62df        3c7d931e         GUI dashboard    ~7 successful
4  16:01:47     019e634c        98276c33         Cron job         0 (first turn)
5  16:09:57     019e6346        a793f936         Discord          multiple
6  19:21:40     019e6400        0fba2bb4         Discord          3 successful
7  21:58:39     019e648f        57306d96         Discord          ~8 successful
8  22:46:34     019e64be        571f2479         Discord          1 successful
9  23:07:37     019e64d1        0e065b71         Discord          2 successful

All 9 share the same signature: lastActivityReason: "notification:item/completed".

4. Codex app-server runs in-process (no child process):

$ pgrep -P <gateway_pid>    # no codex child
$ ps -p <gateway_pid>       # only the node gateway process

Resource leaks in the codex plugin directly affect the gateway's event loop.

5. Binary version ruled out:

Swapped bundled @openai/codex 0.133.0 → 0.130.0 (from standalone CLI, known working). Gateway restarted. Same timeouts on 0.130.0 (timeouts #8 and #9 above occurred after the swap). Standalone Codex CLI at 0.130.0 works perfectly: 2.5–3s responses, zero drops.

Impact and severity

  • Affected: All users relying on OpenAI models via the codex runtime (the default for openai/* models). All channels affected: Discord, GUI webchat/dashboard, TUI, cron jobs.
  • Severity: High — blocks any sustained work with GPT-5.5. Sessions fail within 1–4 tool-call turns. The agent goes silent with no user-visible error (silent failure).
  • Frequency: 9/9 codex threads timed out in one day of testing. Of 52 thread bindings total, 9 timed out (~17% per thread binding), but because failures cluster after a few successful turns, the per-session failure rate is effectively 100%.
  • Consequence: GPT-5.5 is unusable for any multi-turn or tool-call-heavy work. Users must switch to non-codex models as a workaround. Auth profile is falsely marked as failed after each timeout, potentially cascading into failover routing issues.

Additional information

Last known good version: 2026.5.12 (bundled @openai/codex 0.130.0) — had codex overhead but no turn drops.
First known bad version: 2026.5.25-beta.1 — also present on 2026.5.26 (main).

Hypothesis — SSE stream leak:
The codex plugin opens SSE connections to the OpenAI Responses API for each turn. The evidence suggests these streams are not closed properly after turn completion: established TCP connections climb during turns (3→7), event loop utilization hits 100%, and accumulated pending I/O handlers prevent new SSE events from being processed. The item/completed notification gets through (apparently the last thing processed before saturation), but response.completed never arrives. After the 60s idle timeout fires and cleanup runs, all metrics return to baseline, which indicates that the cleanup itself works but is triggered too late.

Secondary bug — auth profile poisoning:
Each timeout marks the auth profile as failed (auth_profile_failure_state_updated, reason: timeout). This is a false positive — auth succeeded, the thread was bound, tool calls executed. A turn timeout is not an auth failure.
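
The distinction argued for above can be sketched as a small classification function: only explicit credential rejections count against the auth profile, while turn timeouts do not. The type and function names are hypothetical, and treating HTTP 401/403 as the credential-rejection signal is an assumption about how the provider reports auth errors.

```typescript
// Hypothetical sketch: decide whether a turn failure should count against the auth profile.
type TurnFailureReason = "timeout" | "auth" | "rate_limit" | "unknown";

interface TurnFailure {
  reason: TurnFailureReason;
  // HTTP status of the failing request, if there was one (assumed to be available).
  httpStatus?: number;
}

function shouldMarkAuthProfileFailed(failure: TurnFailure): boolean {
  // Explicit credential rejections always count.
  if (failure.reason === "auth") return true;
  if (failure.httpStatus === 401 || failure.httpStatus === 403) return true;
  // Timeouts, rate limits, and unknown errors say nothing about the credentials.
  return false;
}
```

Under this rule, the timeouts in this report would leave the auth profile untouched, so failover routing would not be triggered by a turn that authenticated successfully.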

Workaround:
Switch to any non-codex model. All non-OpenAI models use the PI runtime, bypass the codex plugin entirely, and work reliably through the same gateway pipeline:

{ "model": { "primary": "opencode-go/deepseek-v4-pro", "fallbacks": ["fireworks-ai/.../kimi-k2p6-turbo", "xai/grok-4.3"] } }

Note: Model-scoped agentRuntime.id: "pi" does NOT work for openai/* models — the codex plugin auto-claims the turn regardless of the policy. This was tested and reverted.

Data files (available on request):

  • Gateway log: /tmp/openclaw/openclaw-2026-05-26.log
  • Process monitor CSV (5s sample): codex-monitor-20260526-230334.log
  • Response time data (32+ message pairs): response-time-data-20260526.md
  • 7 session JSONL files with full message/tool/usage data

Metadata

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • beta-blocker: Plugin beta-release blocker pending stable cutoff triage
  • clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:needs-live-repro: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
  • clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
  • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • impact:auth-provider: Auth, provider routing, model choice, or SecretRef resolution may break.
  • impact:crash-loop: Crash, hang, restart loop, or process-level availability failure.
  • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
  • issue-rating: 🐚 platinum hermit: Good issue quality with a plausible reproduction path needing some confirmation.
