Beta blocker: codex — app-server turns silently drop with event loop saturation

### Bug type

Crash (process/app exits or hangs)

### Beta release blocker

Yes

### Summary

The in-process codex app-server plugin silently drops turns after 1–4 successful interactions. The OpenAI Responses API sends a `notification:item/completed` event but the turn never resolves. The gateway's Node.js event loop reaches 100% utilization with P99 delays exceeding 5 seconds (worst case 95 seconds) during codex turns, consistent with accumulated unclosed I/O (likely SSE response streams) choking the event loop until the 60-second idle timeout fires. Reproduced 9 times across 7 sessions on 2026.5.26 and 2026.5.25-beta.1. Non-codex models (PI runtime) through the same gateway never exhibit this. cc @Peetiegonzalez

### Steps to reproduce

1. Start gateway normally (`openclaw gateway`), 5 plugins loaded (brave, codex, discord, lossless-claw, memory-core)
2. Select `openai/gpt-5.5` (codex runtime, `agentRuntime.id: "codex"`)
3. `/reset`, `/new` to start a fresh session (any channel: Discord, GUI, TUI, or cron)
4. Send a warm-up message — succeeds, model responds in 2–5s server-side
5. Send a task requiring tool calls (e.g. multi-step file reads, or any multi-tool task)
6. Model begins working (tool calls execute, `item/started` and `item/completed` notifications arrive)
7. Model goes silent mid-turn — no further SSE notifications arrive
8. After exactly 60 seconds, idle timeout fires: `codex app-server turn idle timed out waiting for completion`
9. Auth profile is incorrectly marked as failed (`auth_profile_failure_state_updated`, reason: `timeout`)

Reliability: 9/9 unique codex threads timed out in a single day. Some completed 1–3 turns before failing; none survived more than ~4 turns of tool-call-heavy work.

### Expected behavior

Codex turns should complete reliably. The standalone Codex CLI using the same binary (0.130.0), same OAuth credentials, and same model (GPT-5.5) responds in 2.5–3s with zero drops. The in-process codex plugin should match this reliability.

### Actual behavior

Every codex session eventually times out with the same signature:

```json
{
  "idleMs": 60001,
  "timeoutMs": 60000,
  "lastActivityReason": "notification:item/completed",
  "lastNotificationMethod": "item/completed"
}
```

The plugin receives `item/completed` (partial — e.g. for a tool call item) then goes completely idle. No `response.completed` or turn-level completion event arrives. After 60s silence, the idle timeout fires and the turn is abandoned.
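
To make the signature above easier to read, here is a minimal sketch of how an idle watchdog of this kind could produce it. All names (`IdleState`, `noteActivity`, `checkIdle`) are hypothetical and are not taken from the plugin's source; the only assumption is that any notification resets the idle clock, while only a turn-level completion ends the turn.

```typescript
// Hypothetical sketch, not the plugin's actual code.
interface IdleState {
  lastActivityAt: number;      // ms timestamp of the most recent notification
  lastActivityReason: string;  // e.g. "notification:item/completed"
}

interface IdleTimeoutReport {
  idleMs: number;
  timeoutMs: number;
  lastActivityReason: string;
}

// Any notification counts as activity and resets the idle clock.
function noteActivity(state: IdleState, method: string, now: number): void {
  state.lastActivityAt = now;
  state.lastActivityReason = "notification:" + method;
}

// Returns a report once the turn has been silent for at least timeoutMs, else null.
function checkIdle(state: IdleState, timeoutMs: number, now: number): IdleTimeoutReport | null {
  const idleMs = now - state.lastActivityAt;
  if (idleMs < timeoutMs) {
    return null;
  }
  return { idleMs: idleMs, timeoutMs: timeoutMs, lastActivityReason: state.lastActivityReason };
}

// Usage: the last event is an item-level completion, followed by 60,001 ms of silence.
const state: IdleState = { lastActivityAt: 0, lastActivityReason: "turn:start" };
noteActivity(state, "item/completed", 1000);
const early = checkIdle(state, 60000, 31000);  // null: still inside the window
const report = checkIdle(state, 60000, 61001); // fires: idle for 60,001 ms
```

Under that assumption, `lastActivityReason: "notification:item/completed"` only says that an item-level event was the last thing the plugin processed. It does not say that the turn finished.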

### OpenClaw version

2026.5.26 (6f57286) — also reproduced on 2026.5.25-beta.1 (bbf4117)

### Operating system

macOS Darwin 24.6.0, x86_64, Intel, 16GB RAM

### Install method

`openclaw update --channel dev` (git clone + build), then `sudo npm install -g /Users/admin/openclaw`

### Model

openai/gpt-5.5

### Provider / routing chain

openclaw → codex plugin (in-process app-server) → OpenAI Responses API (Codex subscription OAuth)

### Additional provider/model setup details

- `agentRuntime.id: "codex"` on the gpt-5.5 model entry (stock default config)
- Provider shows as `openai-codex` in session logs, API as `openai-codex-responses`
- Context engine: `lossless-claw` (LCM)
- Tested with both the bundled `@openai/codex` binary 0.133.0 and 0.130.0 (from the standalone CLI); the failure is the same on both, which rules out a binary version regression
- Non-OpenAI models use PI runtime and never exhibit this issue through the same gateway

### Logs, screenshots, and evidence

**1. Event loop saturation — gateway liveness diagnostics during codex turns:**

These are the gateway's own `liveness warning` entries. Every entry below was logged during an active codex turn. PI runtime turns on the same gateway never trigger liveness warnings.

```
Timestamp (GMT+8)     P99 Delay    Max Delay    Utilization  Active Work
06:08:05              7,432ms      7,432ms      99.7%        codex embedded_run (age=432s)
06:10:28              10,981ms     10,981ms     100%         codex embedded_run (age=574s)
06:12:38              12,147ms     12,147ms     100%         codex tool_call
06:14:50              17,767ms     17,767ms     99.9%        codex model_call
09:00:13              95,429ms     95,429ms     100%         codex embedded_run (age=120s)
09:02:37              16,945ms     16,945ms     100%         codex embedded_run (age=264s)
09:11:40              34,393ms     34,393ms     100%         codex embedded_run (age=186s)
09:30:39              16,853ms     75,967ms     98.2%        codex embedded_run
13:45:23              2,027ms      4,182ms      77.2%        codex turn:start
15:54:44              5,516ms      5,516ms      100%         codex item/started
```

Worst case: event loop blocked for **95 seconds straight** (09:00:13). At that point no I/O can be processed, no SSE events can be read, and the idle timeout is the only escape.
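
For readers unfamiliar with these numbers, the following sketch shows one common way an event loop delay figure can be derived: schedule a timer at a fixed interval and measure how late each tick runs. This is an illustration only. How the gateway actually computes its liveness values is not shown in this report, and `loopDelays` and `percentile` are hypothetical helpers (Node.js also ships `perf_hooks.monitorEventLoopDelay` for this purpose).

```typescript
// Hypothetical helpers; the gateway's real measurement code is not shown in this report.

// tickTimes: timestamps (ms) at which a timer scheduled every intervalMs actually ran.
// Returns how late each tick was relative to the expected interval.
function loopDelays(tickTimes: number[], intervalMs: number): number[] {
  const delays: number[] = [];
  for (let i = 1; i < tickTimes.length; i++) {
    const elapsed = tickTimes[i] - tickTimes[i - 1];
    delays.push(Math.max(0, elapsed - intervalMs));
  }
  return delays;
}

// Nearest-rank percentile over a list of delay samples.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    return 0;
  }
  const sorted = values.slice().sort(function (a, b) { return a - b; });
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Usage: a 100 ms timer where one tick is held up by a 7,432 ms stall.
const ticks = [0, 100, 200, 7732, 7832];
const delays = loopDelays(ticks, 100);   // [0, 0, 7432, 0]
const p99 = percentile(delays, 99);      // 7432
const maxDelay = Math.max.apply(null, delays); // 7432
```

With so few samples in a window, a single long stall sets both the P99 and the maximum. That is one possible reason the P99 and Max columns above are identical in most rows.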

**2. TCP/fd accumulation — external process monitoring (5-second sample interval):**

Gateway process (PID 13064) monitored during a GPT-5.5 tool-bench-2 run:

```
Time      RSS(MB)  CPU%   FDs  TCP  Established  Unix   Phase
23:03:35  656      1      85   5    3            1      idle baseline
23:04:07  656      104    85   5    3            1      turn starts
23:04:14  711      54     91   7    4            4      +6 fds, +3 unix
23:05:07  648      101    92   9    6            4      tcp climbing
23:05:13  705      8      93   10   7            4      PEAK: 10 TCP, 7 established
23:05:38  714      7      93   10   7            4      peak again
23:06:23  665      187    91   8    5            4      CPU 187%
23:07:40  661      103    86   6    4            1      timeout fires, cleanup
23:07:53  680      0.6    84   4    2            1      back to baseline
```

TCP established connections climb from baseline 3 → peak 7 during turns. Unix fds jump 1 → 4. After the timeout fires and cleanup runs, everything drops back to baseline. Connections accumulate during active turns and don't fully drain between tool call rounds.
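
The baseline, peak, and recovery figures quoted above can be recomputed from the monitor samples with a short script. This is a sketch: the `Sample` shape and `summarize` helper are hypothetical, and the values are copied from the table in this section.

```typescript
// Hypothetical shape for one row of the process-monitor table above.
interface Sample {
  time: string;
  fds: number;
  established: number;
  unix: number;
}

// Values copied from the table in this section.
const samples: Sample[] = [
  { time: "23:03:35", fds: 85, established: 3, unix: 1 },
  { time: "23:04:07", fds: 85, established: 3, unix: 1 },
  { time: "23:04:14", fds: 91, established: 4, unix: 4 },
  { time: "23:05:07", fds: 92, established: 6, unix: 4 },
  { time: "23:05:13", fds: 93, established: 7, unix: 4 },
  { time: "23:05:38", fds: 93, established: 7, unix: 4 },
  { time: "23:06:23", fds: 91, established: 5, unix: 4 },
  { time: "23:07:40", fds: 86, established: 4, unix: 1 },
  { time: "23:07:53", fds: 84, established: 2, unix: 1 },
];

interface Summary {
  baseline: number;            // first sample, taken while idle
  peak: number;                // highest value seen during the run
  final: number;               // last sample, after cleanup
  returnedToBaseline: boolean; // true if the final value is at or below baseline
}

function summarize(rows: Sample[], field: "fds" | "established" | "unix"): Summary {
  const baseline = rows[0][field];
  let peak = baseline;
  for (const row of rows) {
    if (row[field] > peak) {
      peak = row[field];
    }
  }
  const final = rows[rows.length - 1][field];
  return { baseline: baseline, peak: peak, final: final, returnedToBaseline: final <= baseline };
}

const tcp = summarize(samples, "established"); // baseline 3, peak 7, final 2
const unixFds = summarize(samples, "unix");    // baseline 1, peak 4, final 1
const allFds = summarize(samples, "fds");      // baseline 85, peak 93, final 84
```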

**3. All 9 timeouts on 2026-05-26:**

```
#  Time(GMT+8)  Thread(prefix)  Session(prefix)  Channel          Turns before timeout
1  12:05:29     019e626b        36e08c7c         Discord          3 successful
2  13:46:53     019e62d0        3c7d931e         GUI dashboard    ~5 successful
3  14:10:00     019e62df        3c7d931e         GUI dashboard    ~7 successful
4  16:01:47     019e634c        98276c33         Cron job         0 (first turn)
5  16:09:57     019e6346        a793f936         Discord          multiple
6  19:21:40     019e6400        0fba2bb4         Discord          3 successful
7  21:58:39     019e648f        57306d96         Discord          ~8 successful
8  22:46:34     019e64be        571f2479         Discord          1 successful
9  23:07:37     019e64d1        0e065b71         Discord          2 successful
```

All 9 share the identical `lastActivityReason: "notification:item/completed"` signature.

**4. Codex app-server runs in-process (no child process):**

```
$ pgrep -P <gateway_pid>    # no codex child
$ ps -p <gateway_pid>       # only the node gateway process
```

Because the app-server runs inside the gateway process, any resource leak in the codex plugin directly affects the gateway's event loop.

**5. Binary version ruled out:**

Swapped bundled `@openai/codex` 0.133.0 → 0.130.0 (from standalone CLI, known working). Gateway restarted. Same timeouts on 0.130.0 (timeout #8 and #9 above occurred after the swap). Standalone Codex CLI at 0.130.0 works perfectly — 2.5–3s responses, zero drops.

### Impact and severity

- **Affected:** All users relying on OpenAI models via the codex runtime (the default for `openai/*` models). All channels affected: Discord, GUI webchat/dashboard, TUI, cron jobs.
- **Severity:** High — blocks any sustained work with GPT-5.5. Sessions fail within 1–4 tool-call turns. The agent goes silent with no user-visible error (silent failure).
- **Frequency:** 9/9 codex threads timed out in one day of testing. Counted per thread binding, 9 of 52 timed out (~17%), but because failures cluster after a few successful turns within a session, the per-session failure rate is effectively 100%.
- **Consequence:** GPT-5.5 is unusable for any multi-turn or tool-call-heavy work. Users must switch to non-codex models as a workaround. Auth profile is falsely marked as failed after each timeout, potentially cascading into failover routing issues.

### Additional information

**Last known good version:** 2026.5.12 (bundled `@openai/codex` 0.130.0) — had codex overhead but no turn drops.
**First known bad version:** 2026.5.25-beta.1 — also present on 2026.5.26 (main).

**Hypothesis — SSE stream leak:**
The codex plugin opens SSE connections to the OpenAI Responses API for each turn. Evidence suggests these streams are not properly closed after turn completion: TCP established connections climb during turns (3→7), event loop utilization hits 100%, and accumulated pending I/O handlers prevent new SSE events from being processed. The `item/completed` notification gets through (last thing processed before saturation) but `response.completed` never arrives. After the 60s idle timeout fires and cleanup runs, all metrics return to baseline — confirming the cleanup works but is triggered too late.
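
If the hypothesis holds, one possible shape for a fix is to tie every stream to its turn and release it in a `finally` block, so cleanup runs when the turn ends rather than when the idle timeout fires. The sketch below is illustrative only: `TurnStream`, `StreamRegistry`, and `runTurn` are hypothetical names, not the plugin's API, and the example is synchronous for brevity (a real turn would be `async`, with the same `try`/`finally` structure).

```typescript
// Hypothetical stand-in for an SSE response stream; only close() matters here.
interface TurnStream {
  id: string;
  close(): void;
}

// Tracks which streams belong to which turn so they can be released together.
class StreamRegistry {
  private open: { [turnId: string]: TurnStream[] } = {};

  track(turnId: string, stream: TurnStream): void {
    const list = this.open[turnId] || [];
    list.push(stream);
    this.open[turnId] = list;
  }

  // Closes every stream opened for the turn and returns how many were closed.
  closeAll(turnId: string): number {
    const streams = this.open[turnId] || [];
    delete this.open[turnId];
    for (const stream of streams) {
      stream.close();
    }
    return streams.length;
  }

  openCount(): number {
    let count = 0;
    for (const turnId in this.open) {
      count += this.open[turnId].length;
    }
    return count;
  }
}

// Runs one turn; its streams are released on success and on error alike.
function runTurn<T>(registry: StreamRegistry, turnId: string, body: () => T): T {
  try {
    return body();
  } finally {
    registry.closeAll(turnId);
  }
}

// Usage: a turn that opens two streams and then fails mid-way.
const registry = new StreamRegistry();
const closed: string[] = [];
function fakeStream(id: string): TurnStream {
  return { id: id, close: function () { closed.push(id); } };
}

let propagated = false;
try {
  runTurn(registry, "turn-1", function () {
    registry.track("turn-1", fakeStream("sse-a"));
    registry.track("turn-1", fakeStream("sse-b"));
    throw new Error("simulated mid-turn failure");
  });
} catch (err) {
  propagated = true; // the error still reaches the caller, after cleanup has run
}
```

The point of this design is that no stream outlives its turn, whether the turn ends normally or with an error.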

**Secondary bug — auth profile poisoning:**
Each timeout marks the auth profile as failed (`auth_profile_failure_state_updated`, reason: `timeout`). This is a false positive — auth succeeded, the thread was bound, tool calls executed. A turn timeout is not an auth failure.
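
A minimal sketch of the distinction being asked for, assuming the failure handler can see a reason code. Only `timeout` appears in the logs above; the other reason values and the function name are hypothetical.

```typescript
// "timeout" is the reason seen in the logs; the other values are hypothetical examples.
type FailureReason = "timeout" | "network" | "unauthorized" | "forbidden";

// Decide whether a failed turn should count against the auth profile.
function shouldMarkAuthProfileFailed(reason: FailureReason): boolean {
  switch (reason) {
    case "unauthorized":
    case "forbidden":
      // The credential itself was rejected.
      return true;
    case "timeout":
    case "network":
      // The turn failed for reasons unrelated to the credential.
      return false;
  }
}

const marksOnTimeout = shouldMarkAuthProfileFailed("timeout");           // false
const marksOnUnauthorized = shouldMarkAuthProfileFailed("unauthorized"); // true
```

With a rule like this, a turn idle timeout would still be logged, but it would not feed into failover decisions for a credential that was working.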

**Workaround:**
Switch to any non-codex model. All non-OpenAI models use the PI runtime, bypass the codex plugin entirely, and work reliably through the same gateway pipeline:
```json
{ "model": { "primary": "opencode-go/deepseek-v4-pro", "fallbacks": ["fireworks-ai/.../kimi-k2p6-turbo", "xai/grok-4.3"] } }
```

**Note:** Model-scoped `agentRuntime.id: "pi"` does NOT work for `openai/*` models — the codex plugin auto-claims the turn regardless of the policy. This was tested and reverted.

**Data files (available on request):**
- Gateway log: `/tmp/openclaw/openclaw-2026-05-26.log`
- Process monitor CSV (5s sample): `codex-monitor-20260526-230334.log`
- Response time data (32+ message pairs): `response-time-data-20260526.md`
- 7 session JSONL files with full message/tool/usage data
