# `openai-codex` can hang on `Working...` with zero-usage aborted turns (#4945)

## Summary

`openai-codex` / `gpt-5.5` sometimes leaves the interactive TUI stuck on `Working...` with no streamed text, no tool call, and no visible error. The only way to recover is to press Escape, which records an aborted assistant turn.

This has happened repeatedly over the last couple of days in normal interactive use.

## Environment

- pi: `0.75.5`
- Node: `v22.22.1`
- Provider/model: `openai-codex` / `gpt-5.5`
- Thinking level: `xhigh`
- No explicit `transport` setting in user settings
- Default tools/extensions enabled

## Observed behavior

When the issue occurs:

- TUI keeps showing `Working...` for minutes.
- No assistant text is streamed.
- No tool call is emitted.
- Pressing Escape aborts the turn.
- The saved session entry for that assistant turn has:

```json
{
  "role": "assistant",
  "stopReason": "aborted",
  "content": [],
  "usage": {
    "input": 0,
    "output": 0,
    "cacheRead": 0,
    "cacheWrite": 0,
    "totalTokens": 0
  }
}
```

I saw this pattern multiple times, including cases where the previous turn had completed normally and the next user message then hung before any provider usage was recorded.

This looks different from a long reasoning turn: there is no usage, no partial reasoning/text, and no tool call.

## Expected behavior

If the provider/transport stalls before the first event, pi should eventually surface a timeout/transport error or retry in a bounded way, instead of keeping the TUI in `Working...` indefinitely until the user manually aborts.
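A minimal TypeScript sketch of the "bounded" part follows. It assumes a provider turn can be modeled as a function that receives an `AbortSignal` and returns a promise. `retryWithTimeout` and `AttemptTimeoutError` are hypothetical names, not pi's API; the timeout value could plausibly be fed from an existing setting such as `retry.provider.timeoutMs`, but that wiring is an assumption.

```typescript
// Hypothetical helper for illustration only; not part of pi's API.
class AttemptTimeoutError extends Error {
  readonly attempt: number;
  readonly timeoutMs: number;

  constructor(attempt: number, timeoutMs: number) {
    super(`attempt ${attempt} produced no result within ${timeoutMs} ms`);
    this.name = "AttemptTimeoutError";
    this.attempt = attempt;
    this.timeoutMs = timeoutMs;
    // Keep `instanceof` working even when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Runs `run` up to `attempts` times. Each attempt gets its own AbortSignal and
// is abandoned after `timeoutMs`. Only timeouts are retried; any other error
// (auth failure, bad request, ...) is surfaced immediately.
async function retryWithTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  opts: { attempts: number; timeoutMs: number },
): Promise<T> {
  let lastError: AttemptTimeoutError | undefined;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    const controller = new AbortController();
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        // Reject first so the race settles with the timeout error,
        // then abort the stalled attempt (close socket / cancel fetch).
        reject(new AttemptTimeoutError(attempt, opts.timeoutMs));
        controller.abort();
      }, opts.timeoutMs);
    });
    try {
      return await Promise.race([run(controller.signal), timeout]);
    } catch (err) {
      if (!(err instanceof AttemptTimeoutError)) throw err;
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError ?? new Error("retryWithTimeout: attempts must be >= 1");
}
```

With this shape, a hung transport costs at most `attempts` times `timeoutMs` of waiting before a visible error, instead of an indefinite `Working...`.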

## Suspected area

From reading the installed `0.75.5` package:

- `SettingsManager.getTransport()` returns `"auto"` when no setting is present.
- The docs/settings table says the default is `"sse"`, so there may also be a docs/runtime default mismatch.
- For `openai-codex-responses`, `transport=auto` attempts WebSocket first.
- `retry.provider.timeoutMs` appears to be passed into `streamSimple()`, but the `openai-codex-responses` implementation does not seem to apply it to the Codex fetch/WebSocket wait path in the same way as SDK-based providers.
- The WebSocket event loop can wait for the first message/completion without an obvious idle timer.

So the likely failure mode is: the Codex WebSocket (or other transport) waits for a first event that never arrives; no assistant event is produced; the interactive UI keeps showing `Working...`; and Escape finally records an aborted, zero-usage turn.
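The missing piece in this failure mode is a deadline on the first event. The TypeScript sketch below shows one way to add it, assuming the transport's events can be consumed as an `AsyncIterable` (not verified against the actual `openai-codex-responses` code). `withFirstEventTimeout`, `FirstEventTimeoutError`, and the `onTimeout` hook are hypothetical names.

```typescript
// Hypothetical wrapper for illustration only; not part of pi's API.
class FirstEventTimeoutError extends Error {
  constructor(timeoutMs: number) {
    super(`no stream event received within ${timeoutMs} ms`);
    this.name = "FirstEventTimeoutError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Yields events from `source`, but fails if the *first* event does not arrive
// within `timeoutMs`. Later events are passed through with no deadline, so a
// long turn that has already started streaming is unaffected.
async function* withFirstEventTimeout<T>(
  source: AsyncIterable<T>,
  timeoutMs: number,
  onTimeout?: () => void, // e.g. close the WebSocket or abort the fetch
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new FirstEventTimeoutError(timeoutMs));
      onTimeout?.();
    }, timeoutMs);
  });

  // Race the first event against the deadline, then always clear the timer.
  const first = await Promise.race([iterator.next(), timeout]).finally(() => clearTimeout(timer));
  if (first.done) return;
  yield first.value;

  // Pass the rest of the stream through unchanged.
  for (let next = await iterator.next(); !next.done; next = await iterator.next()) {
    yield next.value;
  }
}
```

Because only the first event has a deadline, this would not cut off a long reasoning turn that is already streaming; `onTimeout` is where a real implementation would close the stalled connection so it does not linger.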

## Suggested fix / mitigation

Possible fixes:

1. Add a hard idle timeout for `openai-codex-responses` WebSocket and SSE stream waits, especially before the first event.
2. Ensure `retry.provider.timeoutMs` or `httpIdleTimeoutMs` applies consistently to this provider path.
3. If `auto` is intended as the runtime default, update docs; if `sse` is intended, adjust `SettingsManager.getTransport()`.
4. Optionally show a clearer status/error when a provider turn has produced zero events for a long time.
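For item 4, the status indicator does not need to abort anything: a countdown that restarts on every provider event is enough to switch `Working...` to something like "no events from provider yet". A TypeScript sketch under that assumption; `createStallWatchdog`, `noteEvent`, and `stop` are hypothetical names, and the wiring into pi's TUI is not shown.

```typescript
// Hypothetical helper for illustration only; not part of pi's API or TUI.
interface StallWatchdog {
  /** Call on every provider event (text, reasoning, tool call, usage). */
  noteEvent(): void;
  /** Call when the turn completes, errors, or is aborted. */
  stop(): void;
}

// Calls `onStall` once `thresholdMs` passes with no events. It only reports;
// deciding whether to abort or retry is left to the caller.
function createStallWatchdog(thresholdMs: number, onStall: () => void): StallWatchdog {
  let stopped = false;
  let timer = setTimeout(onStall, thresholdMs);
  return {
    noteEvent() {
      if (stopped) return;
      // Any event restarts the countdown (and re-arms it after a report).
      clearTimeout(timer);
      timer = setTimeout(onStall, thresholdMs);
    },
    stop() {
      stopped = true;
      clearTimeout(timer);
    },
  };
}
```

A caller would start the watchdog when the turn begins, call `noteEvent()` from the stream handler, and call `stop()` in a `finally` block so the timer never outlives the turn.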

A local workaround I am considering but have not applied yet:

```json
{
  "transport": "sse",
  "httpIdleTimeoutMs": 120000
}
```

I can provide more sanitized session metadata if helpful, but I avoided attaching raw session logs because they contain private conversation/tool context.

