CLI/TUI session can restart during long tool-heavy turns after repeated 'Function call output is missing for call id' #14824

## Summary

Codex repeatedly restarts or loses turn continuity during long-running sessions with many tool calls,
especially while polling active `exec_command` sessions via repeated `write_stdin` calls.

The visible symptom is that the agent "crashes after a while": the conversation resumes with lost
short-term continuity, and in some cases retrieval of output from a running command is aborted mid-flow.

## Environment

- Codex client version: `0.114.0`
- Observed on: `2026-03-16`
- Host path with logs: `/home/drindt/.codex`
- Active thread during the most recent reproductions:
  `019cf764-6fc4-7cd3-81d3-3872977ee047`

## User-visible behavior

- Long-running task proceeds normally for several minutes.
- Agent polls running commands repeatedly with `write_stdin`.
- After a while the session appears to restart or lose continuity.
- The user sees repeated interruptions and has to type `Fortsetzen.` ("Continue.") multiple times.

## Reproduction pattern

1. Start a long-running terminal task with `exec_command`, for example:
   `make gcp-development-vm-tunnel-up`
2. Poll the running session repeatedly with `write_stdin`.
3. Interleave further tool calls and file edits during the same long turn.
4. After enough iterations, Codex loses continuity and the session is effectively restarted.

## Strong indicators from local logs

### Repeated internal normalization error

`/home/drindt/.codex/log/codex-tui.log` shows repeated entries like:

- `2026-03-16T17:45:09Z ... Function call output is missing for call id: call_8RF9wsbJahTQ8eOYUnuYLKYa`
- The same message repeats many times for the same thread.
- A second call id also appears later:
  `call_G7bB0oUZRjUNo8KOmxuv8oAU`

This is visible both in `codex-tui.log` and in `logs_1.sqlite`.
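To quantify how often each call id is reported, a small script along these lines could be run against `logs_1.sqlite`. It is a sketch that assumes the `logs` table has the columns used in the evidence query in this issue (`thread_id`, `message`, and so on); the function name `count_missing_outputs` is illustrative and not part of Codex.

```python
import re
import sqlite3
from collections import Counter

# Assumption: the message text matches the form seen in codex-tui.log,
# e.g. "Function call output is missing for call id: call_8RF9..."
MISSING_RE = re.compile(r"Function call output is missing for call id: ([\w-]+)")


def count_missing_outputs(conn: sqlite3.Connection, thread_id: str) -> Counter:
    """Count 'missing output' log entries per call id for one thread."""
    rows = conn.execute(
        "SELECT message FROM logs WHERE thread_id = ? AND message LIKE ?",
        (thread_id, "%Function call output is missing for call id%"),
    )
    counts: Counter = Counter()
    for (message,) in rows:
        match = MISSING_RE.search(message)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Opening the database read-only (`sqlite3.connect("file:/home/drindt/.codex/logs_1.sqlite?mode=ro", uri=True)`) avoids interfering with a running Codex instance. For this report, `.most_common()` on the result for thread `019cf764-6fc4-7cd3-81d3-3872977ee047` would be expected to include `call_8RF9wsbJahTQ8eOYUnuYLKYa` and `call_G7bB0oUZRjUNo8KOmxuv8oAU`, if the schema assumption holds.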

### Session restart / shutdown around the same time

The same log file shows session shutdown/re-init around the affected period:

- `2026-03-16T17:52:41Z ... codex_core::codex::handlers: Shutting down Codex instance`
- Shortly after:
  `2026-03-16T17:52:54Z ... Resumed rollout successfully from "/home/drindt/.codex/sessions/2026/03/16/rollout-2026-03-16T17-04-40-019cf764-6fc4-7cd3-81d3-3872977ee047.jsonl"`

### Related warnings seen in the same timeframe

- Shell snapshot deletion warnings:
  `Failed to delete shell snapshot ... No such file or directory`
- File watcher warning:
  `failed to unwatch /home/drindt/.codex/skills/.system: No watch was found`
- MCP process group cleanup warnings in earlier restarts:
  `Failed to kill MCP process group ... No such process`

These warnings may be secondary, but they cluster around the restart events.
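One way to check whether these warnings really cluster around restarts is to collect every warning logged within a fixed window of each `Shutting down Codex instance` entry. The sketch below assumes the same `logs` table columns as the evidence query in this issue (`ts` in Unix seconds, `ts_nanos`, `message`), matches on the message fragments quoted above, and uses a hypothetical 60-second default window. It does not filter by thread, because these warnings may not carry a `thread_id`.

```python
import sqlite3

SHUTDOWN_MARKER = "Shutting down Codex instance"
# Fragments copied from the warnings quoted in this issue (case-sensitive).
WARNING_MARKERS = (
    "Failed to delete shell snapshot",
    "failed to unwatch",
    "Failed to kill MCP process group",
)


def warnings_near_shutdowns(conn: sqlite3.Connection, window_secs: int = 60) -> dict:
    """Map each shutdown timestamp to the matching warnings logged within
    +/- window_secs of it, in log order."""
    shutdowns = [
        ts
        for (ts,) in conn.execute(
            "SELECT ts FROM logs WHERE message LIKE ? ORDER BY ts",
            (f"%{SHUTDOWN_MARKER}%",),
        )
    ]
    result = {}
    for ts in shutdowns:
        rows = conn.execute(
            "SELECT message FROM logs WHERE ts BETWEEN ? AND ? ORDER BY ts, ts_nanos",
            (ts - window_secs, ts + window_secs),
        )
        result[ts] = [
            message
            for (message,) in rows
            if any(marker in message for marker in WARNING_MARKERS)
        ]
    return result
```

Comparing the number of matches near shutdowns with the number in equally sized windows elsewhere would give a rough sense of whether the clustering is meaningful or incidental.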

## Concrete evidence query

The following query, run against `/home/drindt/.codex/logs_1.sqlite`, returns the relevant events for the affected thread:

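```sql
-- The lower bound is parsed as UTC; the first column is shown in local time.
select datetime(ts, 'unixepoch', 'localtime') as local_time,
       level,
       target,
       substr(message, 1, 220) as message_head
from logs
where thread_id = '019cf764-6fc4-7cd3-81d3-3872977ee047'
  and ts >= cast(strftime('%s', '2026-03-16 17:40:00') as integer)
order by ts desc, ts_nanos desc
limit 120;
```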

## Likely failure mode

Codex appears to enter an inconsistent internal state where tool-call bookkeeping loses the output for
one or more call ids (`Function call output is missing for call id ...`). After that, the session is
eventually shut down and resumed, which looks like a crash from the user perspective.

This does **not** currently look like a user-shell process crash in the target repo. It looks more
like an internal Codex session/state management bug during long tool-heavy turns.
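The bookkeeping invariant in question is that every tool call in the history is eventually paired with an output carrying the same call id. A minimal sketch of that check over a rollout JSONL file (such as the one resumed above) could help show whether the orphaned call ids are already missing their outputs on disk. It assumes each line is a JSON object whose item is either bare or wrapped as `{"type": "response_item", "payload": {...}}`, with `type` values `function_call` / `function_call_output` and a `call_id` field. These field names are an assumption about the rollout format; other item types are ignored, and `unmatched_call_ids` is a hypothetical helper, not Codex's `context_manager::normalize`.

```python
import json
from typing import Iterable, List, Set


def unmatched_call_ids(lines: Iterable[str]) -> List[str]:
    """Return call ids that have a function_call but no function_call_output,
    in the order the calls appear."""
    calls: List[str] = []
    outputs: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            continue
        # Assumption: items may be wrapped as {"type": "response_item", "payload": {...}}.
        item = record.get("payload", {}) if record.get("type") == "response_item" else record
        kind = item.get("type")
        if kind == "function_call":
            calls.append(item["call_id"])
        elif kind == "function_call_output":
            outputs.add(item["call_id"])
    return [call_id for call_id in calls if call_id not in outputs]
```

Passing an open rollout file (for example `unmatched_call_ids(open(path, encoding="utf-8"))`) and checking whether `call_8RF9wsbJahTQ8eOYUnuYLKYa` appears in the result would indicate whether the output was never persisted or was lost only in memory.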

## Impact

- Long debugging sessions become unreliable.
- Operator trust drops because the agent appears to "randomly crash".
- The user must manually continue the session multiple times.
- Mid-flight reasoning context is partially lost even though rollout resume exists.

## Requested investigation

1. Investigate why `context_manager::normalize` repeatedly logs
   `Function call output is missing for call id ...`.
2. Check whether repeated `write_stdin` polling of long-running `exec_command` sessions can orphan or
   desynchronize tool-call bookkeeping.
3. Check whether session shutdown/resume is being triggered as a recovery path for this state.
4. Review the surrounding shell snapshot and file watcher warnings for causal relevance.

## Relevant local files

- `/home/drindt/.codex/log/codex-tui.log`
- `/home/drindt/.codex/logs_1.sqlite`
- `/home/drindt/.codex/history.jsonl`


