TUI shows "Timeout waiting for child process to exit" when WebSocket transport fails; 75s startup delay before HTTP fallback #22634

@echograil

What version of Codex CLI is running?

codex-cli 0.130.0-alpha.5

What subscription do you have?

ChatGPT Plus

Which model were you using?

gpt-5.5 medium

What platform is your computer?

Microsoft Windows NT 10.0.26200.0 x64

What terminal emulator and version are you using (if applicable)?

PowerShell on Windows Terminal

Codex doctor report

not available

What issue are you seeing?

Summary

On Windows with ChatGPT auth, the first message of every interactive session (for example, hello) takes about 75 seconds to get a response. The TUI shows misleading "Timeout waiting for child process to exit" messages alongside "Reconnecting... 1/5" through "5/5" before eventually succeeding.

The actual root cause is that Codex tries responses_websocket transport first, each attempt times out at 15 seconds, and after 5 retries (5 × 15s = 75s) it falls back to HTTP/SSE and succeeds.
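The arithmetic behind the delay can be sketched as follows. This is a minimal model of the behavior described above, not Codex's actual implementation: the function name `time_until_fallback` and its parameters are hypothetical, and it assumes every WebSocket attempt runs to its full timeout with no gap between attempts.

```python
def time_until_fallback(attempts: int, timeout_s: float) -> float:
    """Worst-case seconds spent on failing WebSocket attempts before fallback.

    Hypothetical model: each attempt is assumed to block for the full
    timeout, and attempts are assumed to run back to back.
    """
    if attempts < 0 or timeout_s < 0:
        raise ValueError("attempts and timeout_s must be non-negative")
    return attempts * timeout_s


# Values reported in this issue: 5 attempts at 15 seconds each.
observed = time_until_fallback(attempts=5, timeout_s=15)
# Alternative budget for comparison: 2 attempts at 5 seconds each.
alternative = time_until_fallback(attempts=2, timeout_s=5)

print(f"observed budget:    {observed:.0f}s")
print(f"alternative budget: {alternative:.0f}s")
```

Under these assumptions the reported settings cost 75 seconds before HTTP is tried, while a 2 × 5s budget would cost 10 seconds.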

Symptoms

  1. TUI shows misleading error: Timeout waiting for child process to exit — this points users toward debugging local subprocess issues, but sandbox.log shows all child processes exit with SUCCESS. The error is unrelated to subprocess management.

  2. Fixed 75-second delay on every first message of every session.

  3. Eventually succeeds after the 5th retry, when the client falls back to HTTP.

Logs

From ~/.codex/log/codex-tui.log, the actual failure pattern:

stream disconnected - retrying sampling request
model_client.stream_responses_websocket
(repeated 5 times, ~15s apart)
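A quick way to confirm this pattern in a log is to count the retry lines. The sketch below is illustrative only: `count_retries` is a hypothetical helper, the marker string is copied from the excerpt above, and the sample text stands in for a real log, whose timestamps and other fields are not shown in this report.

```python
RETRY_MARKER = "stream disconnected - retrying sampling request"


def count_retries(log_text: str) -> int:
    """Count log lines that contain the retry marker quoted in this issue."""
    return sum(1 for line in log_text.splitlines() if RETRY_MARKER in line)


# Illustrative sample only: real log lines carry extra fields whose exact
# format is not shown in this report.
sample_log = "\n".join(
    [
        "stream disconnected - retrying sampling request",
        "model_client.stream_responses_websocket",
    ]
    * 5
)

retries = count_retries(sample_log)
print(retries)
```

To run it against the real file, read ~/.codex/log/codex-tui.log into a string (for example with `pathlib.Path.read_text(errors="replace")`) and pass that to `count_retries`.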

Also seen during startup:
schannel: AcquireCredentialsHandle failed: SEC_E_NO_CREDENTIALS
failed to get curated plugins repository
failed to send remote plugin sync request

The plugin sync failures are a separate, possibly related issue: disabling plugins via features disable plugins does NOT remove the 75s delay. Only switching transport away from WebSocket fixes it.

Workaround

Adding a custom HTTP-only provider to ~/.codex/config.toml:

model_provider = "chatgpt-http"

[model_providers.chatgpt-http]
name = "ChatGPT HTTP"
base_url = "https://chatgpt.com/backend-api/codex"
wire_api = "responses"
requires_openai_auth = true
supports_websockets = false

After this, codex exec "Reply with exactly: hello" returns in ~6 seconds instead of ~75s.

Environment notes

  • Network: US residential connection (not behind corporate proxy, no VPN)
  • Antivirus: tested both with and without Huorong Security (no difference)
  • Sandbox mode: tested both unelevated and elevated (sandbox mode affects PATH injection warnings but does NOT affect the 75s delay)
  • Plugins/apps/tool_search/browser_use all disabled via features disable (does NOT remove the 75s delay)

The 75s delay is fully attributable to WebSocket transport retries.
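Comparisons like the ones above are easier to repeat with a small timing wrapper. The sketch below is a generic helper, not part of Codex: `time_command` is a hypothetical name, and the self-check times the Python interpreter itself so the snippet runs anywhere.

```python
import subprocess
import sys
import time


def time_command(argv: list[str]) -> tuple[int, float]:
    """Run a command and return (exit code, wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, capture_output=True, check=False)
    return completed.returncode, time.perf_counter() - start


# Self-check with a command that exists wherever this script can run.
code, seconds = time_command([sys.executable, "-c", "pass"])
print(f"exit={code} elapsed={seconds:.2f}s")
```

To measure the delay itself, pass the command used in this report instead, for example `["codex", "exec", "--json", "--skip-git-repo-check", "Reply with exactly: hello"]`, assuming `codex` is on PATH, and compare the elapsed time with and without the HTTP-only provider.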

Suggestions

  1. Fix the misleading error message: "Timeout waiting for child process to exit" should not be shown when the actual failure is WebSocket connection timeout. This sent debugging in completely the wrong direction for hours.

  2. Expose a CLI/config flag to disable WebSocket transport without requiring users to define a custom provider. Something like [transport] prefer_http = true would be ideal.

  3. Reduce WebSocket retry count or timeout from 5 × 15s to something faster (e.g., 2 × 5s). 75 seconds is far too long for any user to wait through without thinking the tool is broken.

  4. Improve error logging: surface "WebSocket transport failed, falling back to HTTP" prominently in the TUI, so users know what's happening.

Related

Possibly related: #19821

What steps can reproduce the bug?

Steps to reproduce

  1. Install Codex CLI 0.130.0-alpha.5 on Windows 11 (x64).
  2. Log in via ChatGPT OAuth (codex login).
  3. Use the default configuration in ~/.codex/config.toml (no custom provider defined).
  4. Launch interactive Codex.
  5. In the TUI, send any first message, e.g. `hello`.

## What happens

- The TUI displays:

      Reconnecting... 1/5 (15s • esc to interrupt)
      └ Timeout waiting for child process to exit

  This progresses through 2/5, 3/5, 4/5, 5/5, each ~15 seconds apart.
- After about 75 seconds, the message is processed and the model replies normally.
- This happens **every time** on the first message of every new session — fully reproducible.

## Confirming root cause (non-interactive)

Same delay reproduces with `codex exec`:

```powershell
C:\Users\<user>\AppData\Local\OpenAI\Codex\bin\codex.exe exec --json --skip-git-repo-check -C D:\some\dir "Reply with exactly: hello"
```

Logs at `~/.codex/log/codex-tui.log` show the actual cause:

    stream disconnected - retrying sampling request
    model_client.stream_responses_websocket

repeated 5 times before falling back to HTTP and succeeding.

## Confirming the fix

Adding a custom HTTP-only provider to `~/.codex/config.toml`:

```toml
model_provider = "chatgpt-http"

[model_providers.chatgpt-http]
name = "ChatGPT HTTP"
base_url = "https://chatgpt.com/backend-api/codex"
wire_api = "responses"
requires_openai_auth = true
supports_websockets = false
```

After this change, the same `codex exec "Reply with exactly: hello"` returns in ~6 seconds. No retry messages, no 75s delay.

What is the expected behavior?

## Expected behavior

1. **First message responds within a few seconds**, regardless of whether WebSocket transport works in the user's network environment. If WebSocket fails, the client should fall back to HTTP quickly (e.g., after 1–2 short retries), not after 5 × 15s = 75 seconds.

2. **Error messages should accurately describe the failure.** "Timeout waiting for child process to exit" is misleading — there is no failing child process. The actual failure is at the WebSocket transport layer. This message sent debugging in completely the wrong direction (suspected local antivirus interference, sandbox subprocess bugs, PATH issues) before logs revealed the real cause.

3. **A simple way to disable WebSocket transport should be available** without requiring users to define a custom provider. For example, a top-level config option like:
   ```toml
   [transport]
   prefer_http = true
   ```

   or a CLI flag like `--no-websocket`.

4. The TUI should surface "WebSocket transport unavailable, using HTTP" as an informational message rather than as a sequence of "Reconnecting / child process timeout" errors.

Additional information

No response

Labels

bug: Something isn't working
connectivity: Issues involving networking or endpoint connectivity problems (disconnections)
