TUI exits to shell on subagent/thread switch when startup skills/list fails; parent rollout stays empty #16914

@iqdoctor

Summary

On codex-cli 0.119.0-alpha.11, switching into a freshly spawned subagent thread from the TUI can terminate the parent TUI back to the shell with:

Error: skills/list failed in TUI

This is distinct from the stale-busy-spinner bug in #16904. In this case the TUI actually exits, the parent rollout file stays empty, and the spawned child thread continues to exist independently.

Environment

  • @openai/codex 0.119.0-alpha.11
  • local source audited at e169c915824307eef6c175b7f28fb381da853ef0
  • Linux in tmux
  • happened on April 6, 2026 at about 15:26:29Z (18:26:29 Europe/Moscow)

Repro shape

I hit this while working in a tmux window named copilot fix.

High-level sequence:

  1. Run a normal Codex TUI session in tmux.
  2. Spawn a clean subagent / switch into that agent thread from the parent TUI.
  3. The UI briefly shows the spawned agent.
  4. The parent TUI exits to the shell with Error: skills/list failed in TUI.

The specific user action was a request to run a review in a clean subagent without forked history, but the important part seems to be subagent spawn -> thread/session switch -> startup skills refresh.

Actual behavior

  • The pane returned to a shell prompt.
  • The last visible error in the pane was:
Error: skills/list failed in TUI
  • The parent thread's rollout file was created but remained empty:
    • ~/.codex/sessions/2026/04/06/rollout-2026-04-06T15-42-28-019d62d0-d99e-7132-81d5-cf06a8fad414.jsonl (size=0)
  • The parent shell snapshot file also remained empty:
    • ~/.codex/shell_snapshots/019d62d0-d99e-7132-81d5-cf06a8fad414.tmp-1775479348072805930 (size=0)
  • The spawned child thread did exist and got its own rollout:
    • child thread id 019d6367-056d-7ef2-a8f5-9117c84e6c38

Runtime evidence

From ~/.codex/log/codex-tui.log around the crash window for parent thread 019d62d0-d99e-7132-81d5-cf06a8fad414:

  • repeated rollout recorder failures before and during spawn:
failed to record rollout items: failed to queue rollout items: channel closed
  • then successful subagent/session init for child thread 019d6367-056d-7ef2-a8f5-9117c84e6c38

So by the time the TUI switched/attached, the parent already had a broken rollout writer, and then the TUI surfaced skills/list failed in TUI before exiting.

Code-path audit

I did a local source review of 0.119.0-alpha.11. The most important finding is that skills/list is treated as fatal in a path where it should almost certainly be degradable.

1. Thread/session switch triggers a skills refresh automatically

When a session is configured, the chat widget immediately submits list_skills(force_reload = true):

  • codex-rs/tui/src/chatwidget.rs:2009

That SessionConfigured path is reached during thread replacement / attach flows such as:

  • codex-rs/tui/src/app.rs:3319
  • codex-rs/tui/src/app.rs:3328
  • codex-rs/tui/src/app.rs:3056

So switching to a spawned subagent thread can trigger a fresh skills/list during the attach/configure path.

2. skills/list RPC failures bubble as hard errors out of the event loop

The RPC wrapper itself uses:

  • codex-rs/tui/src/app_server_session.rs:618
.wrap_err("skills/list failed in TUI")

Then AppCommandView::ListSkills uses .await?, which propagates the error to its caller:

  • codex-rs/tui/src/app.rs:2322

And both AppEvent::CodexOp and AppEvent::SubmitThreadOp also use .await?, so the error keeps propagating upward:

  • codex-rs/tui/src/app.rs:4268
  • codex-rs/tui/src/app.rs:4271

The main loop breaks on any error returned by handle_event:

  • codex-rs/tui/src/app.rs:3863
  • codex-rs/tui/src/app.rs:3925

That makes a skills/list failure terminate the entire TUI instead of surfacing a non-fatal UI error and continuing.
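
To make the propagation chain above concrete, here is a minimal standalone model of it. It is only a sketch: RpcError, Event, list_skills, handle_event and run_loop are hypothetical stand-ins invented for illustration and do not mirror the actual codex-rs types or signatures. It shows how a single `?` on a failing RPC, combined with a loop that stops on any handler error, ends the whole loop.

```rust
use std::fmt;

// Hypothetical stand-ins for illustration; none of these names come from codex-rs.
#[derive(Debug)]
struct RpcError(String);

impl fmt::Display for RpcError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

#[derive(Debug, Clone, Copy)]
enum Event {
    Key,
    ListSkills,
}

/// Simulated RPC that always fails, carrying the same message the TUI printed.
fn list_skills() -> Result<Vec<String>, RpcError> {
    Err(RpcError("skills/list failed in TUI".to_string()))
}

/// Event handler that uses `?`, so an RPC failure becomes the handler's error.
fn handle_event(event: Event) -> Result<(), RpcError> {
    match event {
        Event::Key => Ok(()),
        Event::ListSkills => {
            let _skills = list_skills()?; // `?` returns the error to the caller
            Ok(())
        }
    }
}

/// Main loop that stops on the first handler error.
/// Returns how many events were fully handled and the error that ended the loop.
fn run_loop(events: &[Event]) -> (usize, Option<RpcError>) {
    let mut processed = 0;
    for &event in events {
        if let Err(err) = handle_event(event) {
            return (processed, Some(err));
        }
        processed += 1;
    }
    (processed, None)
}
```

Calling run_loop(&[Event::Key, Event::ListSkills, Event::Key]) in this model handles only the first event: the failing refresh ends the loop before the trailing key event is ever seen, which is the shape of the exit described in this report.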

3. There is a separate fatal-disconnect path as well

If the app-server event stream disconnects, the adapter explicitly requests fatal exit:

  • codex-rs/tui/src/app/app_server_adapter.rs:149
  • codex-rs/tui/src/app.rs:4264

I did not capture a direct app-server event stream disconnected log line for this exact repro, so I cannot prove that path fired here. Either way, the code shows that the TUI currently has no graceful recovery for an app-server-side failure while switching threads.

4. Rollout recorder breakage is visible before the exit

The rollout recorder error comes from:

  • codex-rs/rollout/src/recorder.rs:504
  • codex-rs/core/src/codex.rs:3744

This matches the runtime log evidence that the parent session's rollout writer was already broken before the visible skills/list failed in TUI exit.
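
As a rough illustration of what a "channel closed" failure means, the sketch below uses std::sync::mpsc. This is an assumption made for illustration only: I have not confirmed which channel type the real recorder uses, and Recorder, record and recorder_with_dead_writer are hypothetical names. The point is that once the receiving (writer) side is gone, every later attempt to queue an item fails, so nothing further can reach the rollout file.

```rust
use std::sync::mpsc;

/// Hypothetical recorder handle that forwards items to a writer over a channel.
struct Recorder {
    tx: mpsc::Sender<String>,
}

impl Recorder {
    /// Queues one item; fails if the receiving (writer) side has been dropped.
    fn record(&self, item: &str) -> Result<(), String> {
        self.tx
            .send(item.to_string())
            .map_err(|_| "failed to queue rollout items: channel closed".to_string())
    }
}

/// Builds a recorder whose writer side is already gone, mimicking the
/// state suggested by the log lines quoted earlier in this report.
fn recorder_with_dead_writer() -> Recorder {
    let (tx, rx) = mpsc::channel::<String>();
    drop(rx); // the writer exits, closing the channel
    Recorder { tx }
}
```

In this model every call to record on such a handle returns the same error, which would be consistent with the repeated log lines and a rollout file that stays at size 0.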

Why this seems buggy

skills/list is not critical to a user turn; it is a startup/refresh convenience RPC. Failing to refresh skills should not crash the whole TUI, especially during thread switches or subagent attach.

At minimum, this path should degrade to an inline error or warning and leave the session alive.

Expected behavior

  • Switching to a spawned subagent thread should not exit the parent TUI.
  • If skills/list fails, the UI should continue running and display a recoverable error.
  • If the rollout recorder channel is already dead, that state should be surfaced clearly and should not cascade into a blank parent rollout plus fatal TUI exit.

Test gap

I found tests around SessionConfigured handling and thread attach behavior, but I did not find a regression test for:

  • SessionConfigured -> list_skills(force_reload=true) failing during thread switch / attach
  • TUI surviving that failure without exiting
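
A regression test for the two cases above could take roughly the following shape. This is a sketch under stated assumptions, not the real test harness: App, on_session_configured and simulate_thread_switch_with_failing_skills are hypothetical names, and the real handler would issue the RPC itself rather than receive its result as an argument. It models the desired behavior, where a failed refresh becomes a warning and the app keeps running.

```rust
/// Hypothetical, heavily simplified app state; not the codex-rs App type.
#[derive(Debug, Default)]
struct App {
    warnings: Vec<String>,
    skills: Vec<String>,
    running: bool,
}

impl App {
    fn new() -> Self {
        App {
            running: true,
            ..Default::default()
        }
    }

    /// Hypothetical SessionConfigured handler. It takes the skills/list result
    /// as input and degrades a failure to a warning instead of returning it.
    fn on_session_configured(&mut self, skills: Result<Vec<String>, String>) {
        match skills {
            Ok(list) => self.skills = list,
            Err(err) => self
                .warnings
                .push(format!("could not refresh skills: {}", err)),
        }
    }
}

/// Simulates a thread switch during which the skills refresh fails.
fn simulate_thread_switch_with_failing_skills() -> App {
    let mut app = App::new();
    app.on_session_configured(Err("skills/list failed in TUI".to_string()));
    app
}
```

A test built on this shape would assert that, after the simulated switch, the app is still running, exactly one warning was recorded, and the skills list is simply left empty.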

Suggested fix direction

Two likely fixes, both useful:

  1. Make skills/list failures non-fatal in the SessionConfigured / refresh path.
  2. Investigate why the parent rollout recorder channel can already be closed during the same turn, since that appears to leave the parent thread with an empty rollout file and may be a precursor to the crash.

Labels: TUI, bug, skills