Summary

In OpenClaw 2026.4.10 (PR #64298, released 2026-04-10), using a `codex/gpt-*` model (bundled Codex plugin) causes inference to hang indefinitely with no error, no log output from the `codex` subsystem, and no timeout. The child `codex app-server` subprocess is spawned successfully and sits idle. The bug occurs with both stdio and websocket transports and with both codex-cli 0.118.0 and 0.120.0, ruling out a transport-layer or codex-cli version issue.

The existing `openai-codex/gpt-5.4` OAuth path continues to work fine, so this is specific to the new bundled Codex harness.

Environment

- openclaw 2026.4.10 (44e5b62)
- codex-cli 0.118.0 and 0.120.0 (both reproduce)
- macOS 26.4 (arm64), Node v25.9.0
- Codex CLI auth mode: `chatgpt` (ChatGPT Pro OAuth), verified with `codex login status`
- `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` is not set (checked `env` and `launchctl getenv`), so the parser bug from PR #64666 ("fix: [codex-harness] parse Desktop app-server user agents") is ruled out

Reproduction

Wicki agent config:

    { "id": "main", "model": { "primary": "codex/gpt-5.4", "fallbacks": ["openai-codex/gpt-5.4", "ollama/minimax-m2.7:cloud"] } }

Plugin config:

    { "plugins": { "entries": { "codex": { "enabled": true, "config": { "discovery": { "enabled": true } } } } } }

Run:

    openclaw capability model run --local --model codex/gpt-5.4 --prompt "say hi"

Expected: A reply within a few seconds.

Actual: The command hangs for 4+ minutes until killed. No output. No error. No log entry from the `codex` subsystem in `/tmp/openclaw/openclaw-YYYY-MM-DD.log`.

`ps` during the hang shows the child process spawned but idle.

What works (to narrow the scope)

- Direct Codex CLI inference works (so auth + ChatGPT Pro OAuth quota are fine).
- Manual JSON-RPC handshake to `codex app-server --listen stdio://` works and returns a clean response:

      {"id":1,"result":{"userAgent":"openclaw/0.118.0 (Mac OS 26.4.0; arm64) kitty (openclaw; 2026.4.10)","codexHome":"/Users/jankutschera/.codex","platformFamily":"unix","platformOs":"macos"}}

  The returned `userAgent` parses fine through the current regex `/^[^/\s]+\/(\d+\.\d+\.\d+(?:[-+][^\s()]*)?)/` at `dist/harness-uOoDA-Wv.js:588`, so the parser path does not appear to be the failure point in this case.
- `openai-codex/gpt-5.4` (legacy OAuth path) works fine; the issue is isolated to the new bundled Codex harness.
- WebSocket transport hangs identically. I started an external `codex app-server --listen ws://127.0.0.1:39175` (confirmed reachable via `nc -zv`), switched the plugin to `appServer.transport: "websocket"` with that URL, and reran. OpenClaw connected (I can see the websocket accepted and `codex_core::codex` logs a skill load on the server side), but no reply ever flows back to OpenClaw. This rules out transport-specific stdio issues.

Source analysis

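As a quick sanity check on the parser path at line 588, the regex can be run in isolation against the `userAgent` from the manual handshake. The regex and the string are copied from this report; the helper name `parseCodexVersion` is hypothetical and not taken from the bundle.

```javascript
// Regex copied from this report (dist/harness-uOoDA-Wv.js:588).
const CODEX_VERSION_RE = /^[^/\s]+\/(\d+\.\d+\.\d+(?:[-+][^\s()]*)?)/;

// Hypothetical helper name: returns the captured version string, or null
// when the user agent does not start with "<name>/<semver>".
function parseCodexVersion(userAgent) {
  const match = CODEX_VERSION_RE.exec(userAgent);
  return match ? match[1] : null;
}

// userAgent exactly as returned by the manual handshake.
const userAgent =
  "openclaw/0.118.0 (Mac OS 26.4.0; arm64) kitty (openclaw; 2026.4.10)";

console.log(parseCodexVersion(userAgent)); // 0.118.0
```
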
File: `dist/harness-uOoDA-Wv.js` (bundled)

No timeout on initial handshake

`CodexAppServerClient.initialize()` is at lines 380-392 and `CodexAppServerClient.request()` at lines 393-449. `initialize()` calls `this.request("initialize", ...)` without any `options.timeoutMs`, so the `if` branch at line 419 is false and no timeout is ever set for the initialize request. If the response doesn't arrive (or isn't parsed/matched back to the pending promise), OpenClaw waits forever.

Shared-client bootstrap timeout also 0ms

At line 651 (shared client initializer):
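The pattern in question can be sketched roughly as follows. This is a hypothetical reconstruction for illustration, not the bundled code: the name `withTimeout` and its signature are assumptions. The point is that a `timeoutMs` of 0 skips the timer entirely, so the timeout error can never fire.

```javascript
// Hypothetical helper, NOT the bundled code: races a promise against a timer,
// but only arms the timer when timeoutMs is a positive number.
function withTimeout(promise, timeoutMs = 0, message = "timed out") {
  if (!(timeoutMs > 0)) return promise; // 0 (or missing) means "wait forever"
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), timeoutMs);
  });
  // Clear the timer once either side settles so it cannot fire late.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for an initialize() call whose response never arrives.
const neverSettles = () => new Promise(() => {});

// With a positive timeout the hang turns into a visible error.
withTimeout(neverSettles(), 50, "codex app-server initialize timed out")
  .catch((err) => console.error(err.message));
```
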
The caller at line 643 (`getSharedCodexAppServerClient()`) does not pass `timeoutMs`, so the wrapper effectively applies a 0ms timeout (no timeout). The "codex app-server initialize timed out" message is dead code for the current call path.

Transport is correct

`createStdioTransport` is at lines 236-246 and `writeMessage` at lines 476-477. The write is `` this.child.stdin.write(`${JSON.stringify(message)}\n`) ``: newline-terminated, with no `jsonrpc: "2.0"` field. The manual pipe test above confirms `codex app-server` accepts this format.

Ruled out

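- PR #64666 parser bug: `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` is not set in my environment; userAgent parses cleanly. Also, that PR's symptom is a thrown error, not a silent hang.
- codex-cli version mismatch: reproduced on both 0.118.0 (min required per `assertSupportedCodexAppServerVersion`) and 0.120.0.
- auth / quota: direct `codex exec` works, ChatGPT Pro OAuth is verified via `codex login status`.
- Transport layer: both stdio and websocket transports hang with identical symptoms.
- Malformed SKILL.md: I had one skill missing frontmatter (`/Users/.../github/.agents/claw-score/SKILL.md`); I fixed it, no change.

Root cause hypothesis

The hang is on the OpenClaw client side, not the transport and not `codex app-server`. Either:

- The request/response correlation (mapping the `id` on the incoming response back to the `pending` Map) fails silently, or
- The stdout read loop never sees the bytes (possibly a line-splitting / buffering issue in the stdio reader, not visible in the bundled code excerpts above), or
- There's an unhandled-promise-rejection path that swallows an exception during `assertSupportedCodexAppServerVersion` without logging.

Either way, the immediate visible bug is that `CodexAppServerClient.initialize()` has no timeout protection, which makes this latent failure mode manifest as an infinite hang instead of a clear error.

Suggested minimal fix (to surface the underlying issue)
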
At line 382, pass a reasonable timeout:
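A minimal sketch of the idea, using a stripped-down stand-in rather than the real client. `MiniRpcClient`, its `send` callback, `handleMessage`, and the 60-second default are assumptions for illustration; only the `pending` Map, the `id` correlation, and `options.timeoutMs` mirror what the analysis above describes.

```javascript
// Simplified, hypothetical stand-in for CodexAppServerClient.
class MiniRpcClient {
  constructor(send) {
    this.send = send;         // writes one message object to the server
    this.nextId = 1;
    this.pending = new Map(); // id -> { resolve, reject, timer }
  }

  request(method, params, options = {}) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      let timer;
      if (options.timeoutMs > 0) {
        // Without this branch, a lost response leaves the promise pending forever.
        timer = setTimeout(() => {
          this.pending.delete(id);
          reject(new Error(`codex app-server ${method} timed out`));
        }, options.timeoutMs);
      }
      // Register before sending so a synchronous reply is still matched.
      this.pending.set(id, { resolve, reject, timer });
      this.send({ id, method, params });
    });
  }

  // Called by the transport for every parsed incoming message.
  handleMessage(message) {
    const entry = this.pending.get(message.id);
    if (!entry) return false; // unmatched id: worth logging in a real client
    clearTimeout(entry.timer);
    this.pending.delete(message.id);
    if (message.error) entry.reject(new Error(message.error.message));
    else entry.resolve(message.result);
    return true;
  }

  // The suggested change: initialize always carries a timeout.
  initialize(params, timeoutMs = 60_000) {
    return this.request("initialize", params, { timeoutMs });
  }
}
```

With `timeoutMs` set, a response that never arrives rejects with `codex app-server initialize timed out` instead of hanging, and the `false` return from `handleMessage` gives a place to log responses whose `id` matches nothing.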
And/or at line 651, pass the `requestTimeoutMs` from plugin config (60_000 default) instead of `0`. That will at least convert the silent hang into an actionable `initialize timed out` error that can be diagnosed properly.

Related

- "fix: [codex-harness] parse Desktop app-server user agents" (#64666): different symptom (throw, not hang), but also in the codex harness init path.
- "fix: [codex-harness] avoid treating cumulative app-server usage as current context": adjacent area.

Happy to gather more diagnostics (e.g., `dtruss` on the child stdio, `NODE_DEBUG=child_process` log, or a minimal standalone Node client that reproduces the spawn) if helpful. Just let me know what would move this forward.
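A rough sketch of what such a standalone client could look like, in case it is useful as a starting point. It assumes the framing described above (one JSON object per line, no `jsonrpc` field). The `initialize` params shape (`clientInfo`), the 10-second timeout, and the names `createLineReader` and `probeAppServer` are assumptions of mine, not taken from the bundle.

```javascript
const { spawn } = require("node:child_process");
const { StringDecoder } = require("node:string_decoder");

// Turns raw stdout chunks into newline-delimited JSON messages. Partial lines
// are buffered, so a message split across two chunks is delivered exactly once.
function createLineReader(onMessage, onUnparsed = () => {}) {
  const decoder = new StringDecoder("utf8");
  let buffer = "";
  return (chunk) => {
    buffer += decoder.write(chunk);
    let newline;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (!line) continue;
      let message;
      try {
        message = JSON.parse(line);
      } catch {
        onUnparsed(line); // surface non-JSON output instead of dropping it
        continue;
      }
      onMessage(message);
    }
  };
}

// Spawns `command`, sends a single initialize request, and settles on the
// first response with a matching id, on early exit, or on timeout.
function probeAppServer(command, args, { timeoutMs = 10_000 } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["pipe", "pipe", "inherit"] });
    let settled = false;
    let timer;
    const finish = (settle, value) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      child.kill();
      settle(value);
    };
    timer = setTimeout(
      () => finish(reject, new Error(`no initialize response after ${timeoutMs}ms`)),
      timeoutMs,
    );
    child.on("error", (err) => finish(reject, err));
    child.on("exit", (code) => finish(reject, new Error(`child exited early (code ${code})`)));
    child.stdin.on("error", (err) => finish(reject, err));
    child.stdout.on("data", createLineReader(
      (message) => { if (message && message.id === 1) finish(resolve, message); },
      (line) => console.error("unparsed stdout line:", line),
    ));
    // ASSUMPTION: this params shape is a guess; mirror what the real client sends.
    const request = {
      id: 1,
      method: "initialize",
      params: { clientInfo: { name: "hang-probe", version: "0.0.0" } },
    };
    child.stdin.write(`${JSON.stringify(request)}\n`);
  });
}
```

Running `probeAppServer("codex", ["app-server", "--listen", "stdio://"]).then(console.log, console.error)` from the same shell as OpenClaw would show whether a plain Node child process sees the response at all, which would help separate the buffering hypothesis from the correlation one.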