Summary
Codex repeatedly restarts or loses turn continuity during long-running sessions with many tool calls,
especially while polling active exec_command sessions via repeated write_stdin calls.
The visible user symptom is that the agent "crashes after a while", the conversation resumes with
lost short-term continuity, and in some cases running command output retrieval gets aborted mid-flow.
Environment
- Codex client version: 0.114.0
- Observed on: 2026-03-16
- Host path with logs: /home/drindt/.codex
- Active thread during the most recent reproductions: 019cf764-6fc4-7cd3-81d3-3872977ee047
User-visible behavior
- Long-running task proceeds normally for several minutes.
- Agent polls running commands repeatedly with write_stdin.
- After a while the session appears to restart or lose continuity.
- The user sees repeated interruptions and has to type "Fortsetzen." (German for "Continue.") multiple times.
Reproduction pattern
- Start a long-running terminal task with exec_command, for example: make gcp-development-vm-tunnel-up
- Poll the running session repeatedly with write_stdin.
- Interleave further tool calls and file edits during the same long turn.
- After enough iterations, Codex loses continuity and the session is effectively restarted.
Strong indicators from local logs
Repeated internal normalization error
/home/drindt/.codex/log/codex-tui.log shows repeated entries like:
2026-03-16T17:45:09Z ... Function call output is missing for call id: call_8RF9wsbJahTQ8eOYUnuYLKYa
- The same message repeats many times for the same thread.
- A second call id also appears later: call_G7bB0oUZRjUNo8KOmxuv8oAU
This is visible both in codex-tui.log and in logs_1.sqlite.
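To quantify how often each call id is affected, the log can be tallied with a short script. This is a minimal sketch, assuming the message text appears in codex-tui.log exactly as quoted above; the function name count_missing_outputs is hypothetical and not part of Codex.

```python
import re
from collections import Counter

# Matches the message quoted in this report and captures the call id.
MISSING_OUTPUT = re.compile(
    r"Function call output is missing for call id: (call_\w+)"
)


def count_missing_outputs(lines):
    """Count 'missing output' log entries per call id.

    `lines` is any iterable of log lines, for example an open file handle.
    """
    counts = Counter()
    for line in lines:
        match = MISSING_OUTPUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be applied to the real log by passing an open handle on /home/drindt/.codex/log/codex-tui.log and printing counts.most_common().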
Session restart / shutdown around the same time
The same log file shows session shutdown/re-init around the affected period:
2026-03-16T17:52:41Z ... codex_core::codex::handlers: Shutting down Codex instance
- Shortly after:
2026-03-16T17:52:54Z ... Resumed rollout successfully from "/home/drindt/.codex/sessions/2026/03/16/rollout-2026-03-16T17-04-40-019cf764-6fc4-7cd3-81d3-3872977ee047.jsonl"
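To list every shutdown/resume pair in the log together with the gap between them, something like the following could be used. It is a sketch under two assumptions: each log line starts with a UTC timestamp in the form shown above (optionally with fractional seconds), and the two message fragments match the log text verbatim. The names parse_ts and restart_gaps are hypothetical.

```python
import re
from datetime import datetime

# Assumed line prefix, e.g. "2026-03-16T17:52:41Z" or "2026-03-16T17:52:41.123Z".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z")
SHUTDOWN = "Shutting down Codex instance"
RESUME = "Resumed rollout successfully"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if absent."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")


def restart_gaps(lines):
    """Pair each shutdown with the next resume.

    Returns a list of (shutdown_time, resume_time, gap_in_seconds) tuples.
    """
    gaps = []
    pending_shutdown = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if SHUTDOWN in line:
            pending_shutdown = ts
        elif RESUME in line and pending_shutdown is not None:
            gaps.append((pending_shutdown, ts, (ts - pending_shutdown).total_seconds()))
            pending_shutdown = None
    return gaps
```

Comparing these pairs against the timestamps of the "missing output" entries would show whether each restart is preceded by a burst of normalization errors.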
Related warnings seen in the same timeframe
- Shell snapshot deletion warnings:
Failed to delete shell snapshot ... No such file or directory
- File watcher warning:
failed to unwatch /home/drindt/.codex/skills/.system: No watch was found
- MCP process group cleanup warnings in earlier restarts:
Failed to kill MCP process group ... No such process
These warnings may be secondary, but they cluster around the restart events.
Concrete evidence query
The following query returns the relevant thread-local events:
select datetime(ts,'unixepoch','localtime'), level, target, substr(message,1,220)
from logs
where thread_id='019cf764-6fc4-7cd3-81d3-3872977ee047'
and ts >= strftime('%s','2026-03-16 17:40:00')
order by ts desc, ts_nanos desc
limit 120;
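For repeated checks, the same query can be run from Python's sqlite3 module with the thread id and start time as parameters. This is a sketch that assumes only the table and column names used in the query above (logs, ts, ts_nanos, level, target, message, thread_id); the helper name thread_events is hypothetical.

```python
import sqlite3

# Same query as above, with bound parameters instead of literals.
# The start time is interpreted as UTC, matching the unix-epoch ts column.
QUERY = """
select datetime(ts, 'unixepoch', 'localtime'), level, target, substr(message, 1, 220)
from logs
where thread_id = ?
  and ts >= cast(strftime('%s', ?) as integer)
order by ts desc, ts_nanos desc
limit ?
"""


def thread_events(conn, thread_id, since_utc, limit=120):
    """Return the newest log rows for one thread since `since_utc`.

    `since_utc` is a string such as '2026-03-16 17:40:00'.
    """
    return conn.execute(QUERY, (thread_id, since_utc, limit)).fetchall()
```

Opening the database read-only, for example with sqlite3.connect("file:/home/drindt/.codex/logs_1.sqlite?mode=ro", uri=True), avoids interfering with a running Codex instance.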
Likely failure mode
Codex appears to enter an inconsistent internal state where tool-call bookkeeping loses the output for
one or more call ids (Function call output is missing for call id ...). After that, the session is
eventually shut down and resumed, which looks like a crash from the user's perspective.
This does not currently look like a user-shell process crash in the target repo. It looks more
like an internal Codex session/state management bug during long tool-heavy turns.
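The invariant that the log message implies (every function call in the history has a matching output) can be illustrated with a small check. This is only a sketch of the suspected condition, not Codex's actual implementation: it assumes history items are dictionaries with a "type" of "function_call" or "function_call_output" and a shared "call_id", which may not match the real rollout format. The name orphaned_call_ids is hypothetical.

```python
def orphaned_call_ids(items):
    """Return call ids that have a function call but no recorded output.

    `items` is an ordered list of dicts. The assumed shape is
    {"type": "function_call", "call_id": ...} for a call and
    {"type": "function_call_output", "call_id": ...} for its result.
    """
    pending = {}  # call_id -> index of the call, kept in insertion order
    for index, item in enumerate(items):
        call_id = item.get("call_id")
        if call_id is None:
            continue
        if item.get("type") == "function_call":
            pending[call_id] = index
        elif item.get("type") == "function_call_output":
            pending.pop(call_id, None)
    return list(pending)
```

If the items in the resumed rollout file can be mapped to this shape, a non-empty result for the affected thread would support the bookkeeping hypothesis.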
Impact
- Long debugging sessions become unreliable.
- Operator trust drops because the agent appears to "randomly crash".
- The user must manually continue the session multiple times.
- Mid-flight reasoning context is partially lost even though rollout resume exists.
Requested investigation
- Investigate why context_manager::normalize repeatedly logs "Function call output is missing for call id ...".
- Check whether repeated write_stdin polling of long-running exec_command sessions can orphan or desynchronize tool-call bookkeeping.
- Check whether session shutdown/resume is being triggered as a recovery path for this state.
- Review the surrounding shell snapshot and file watcher warnings for causal relevance.
Relevant local files
/home/drindt/.codex/log/codex-tui.log
/home/drindt/.codex/logs_1.sqlite
/home/drindt/.codex/history.jsonl