CLI/TUI session can restart during long tool-heavy turns after repeated 'Function call output is missing for call id' #14824

@drindt

Summary

Codex repeatedly restarts or loses turn continuity during long-running sessions with many tool calls,
especially while polling active exec_command sessions via repeated write_stdin calls.

The visible symptom is that the agent "crashes after a while": the conversation resumes with lost
short-term continuity, and in some cases retrieval of a running command's output is aborted mid-flow.

Environment

  • Codex client version: 0.114.0
  • Observed on: 2026-03-16
  • Host path with logs: /home/drindt/.codex
  • Active thread during the most recent reproductions:
    019cf764-6fc4-7cd3-81d3-3872977ee047

User-visible behavior

  • A long-running task proceeds normally for several minutes.
  • The agent polls running commands repeatedly with write_stdin.
  • After a while the session appears to restart or lose continuity.
  • The user sees repeated interruptions and has to type "Fortsetzen." ("Continue.") multiple times.

Reproduction pattern

  1. Start a long-running terminal task with exec_command, for example:
    make gcp-development-vm-tunnel-up
  2. Poll the running session repeatedly with write_stdin.
  3. Interleave further tool calls and file edits during the same long turn.
  4. After enough iterations, Codex loses continuity and the session is effectively restarted.

Strong indicators from local logs

Repeated internal normalization error

/home/drindt/.codex/log/codex-tui.log shows repeated entries like:

  • 2026-03-16T17:45:09Z ... Function call output is missing for call id: call_8RF9wsbJahTQ8eOYUnuYLKYa
  • The same message repeats many times for the same thread.
  • A second call id also appears later:
    call_G7bB0oUZRjUNo8KOmxuv8oAU

These entries appear both in codex-tui.log and in logs_1.sqlite.
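To gauge how often each call id recurs, a minimal sketch can scan codex-tui.log for the message text quoted above. It assumes only that the call id directly follows that text and starts with `call_`, as both ids in this report do; the rest of the log line format is not shown here and is not relied on. `count_missing_outputs` and `scan_log_file` are hypothetical helper names.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterable

# Assumption: the call id follows the quoted message text and starts with "call_",
# as in the two ids in this report. The rest of the log line format is ignored.
MISSING_OUTPUT = re.compile(r"Function call output is missing for call id: (call_\w+)")


def count_missing_outputs(lines: Iterable[str]) -> Counter:
    """Count how often each call id is reported as missing its output."""
    counts: Counter = Counter()
    for line in lines:
        match = MISSING_OUTPUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def scan_log_file(path: Path) -> Counter:
    """Apply count_missing_outputs to a log file, tolerating undecodable bytes."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        return count_missing_outputs(fh)
```

For example, `scan_log_file(Path.home() / ".codex/log/codex-tui.log").most_common()` would list the affected call ids with the most frequently repeated first.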

Session restart / shutdown around the same time

The same log file shows the session being shut down and re-initialized during the affected period:

  • 2026-03-16T17:52:41Z ... codex_core::codex::handlers: Shutting down Codex instance
  • Shortly after:
    2026-03-16T17:52:54Z ... Resumed rollout successfully from "/home/drindt/.codex/sessions/2026/03/16/rollout-2026-03-16T17-04-40-019cf764-6fc4-7cd3-81d3-3872977ee047.jsonl"
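The gap between a shutdown and the following resume can be measured with a small parser. This is a sketch under assumptions: each line is taken to begin with a UTC timestamp like the ones quoted above, and the two events are recognized only by the message substrings shown in this report. `restart_gaps` is a hypothetical helper, not part of Codex.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Assumption: each log line starts with a UTC timestamp such as 2026-03-16T17:52:41Z;
# an optional fractional-seconds part is tolerated and ignored.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z")
SHUTDOWN = "Shutting down Codex instance"
RESUMED = "Resumed rollout successfully"


def parse_ts(line: str) -> Optional[datetime]:
    """Return the leading UTC timestamp of a line, or None if there is none."""
    match = TIMESTAMP.match(line)
    if not match:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def restart_gaps(lines: Iterable[str]) -> List[Tuple[datetime, datetime, timedelta]]:
    """Pair each shutdown with the next resume and report the gap between them.

    A second shutdown before any resume replaces the first; a shutdown with no
    later resume is left unpaired.
    """
    pairs = []
    pending_shutdown: Optional[datetime] = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if SHUTDOWN in line:
            pending_shutdown = ts
        elif RESUMED in line and pending_shutdown is not None:
            pairs.append((pending_shutdown, ts, ts - pending_shutdown))
            pending_shutdown = None
    return pairs
```

With the two lines quoted above, this would report one restart with a 13-second gap (17:52:41Z to 17:52:54Z).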

Related warnings seen in the same timeframe

  • Shell snapshot deletion warnings:
    Failed to delete shell snapshot ... No such file or directory
  • File watcher warning:
    failed to unwatch /home/drindt/.codex/skills/.system: No watch was found
  • MCP process group cleanup warnings in earlier restarts:
    Failed to kill MCP process group ... No such process

These warnings may be secondary, but they cluster around the restart events.
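To check whether the warnings really cluster around restarts, a sketch like the following could count, for each warning kind quoted above, how many occurrences fall within a time window of a restart. The two-minute window and the substring matching are arbitrary assumptions, and the restart times and (timestamp, message) pairs would first have to be extracted from the logs. `warnings_near_restarts` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Sequence, Tuple

# Substrings of the warnings quoted in this report; matching on them is an assumption.
WARNING_PATTERNS = {
    "shell_snapshot": "Failed to delete shell snapshot",
    "file_watcher": "failed to unwatch",
    "mcp_cleanup": "Failed to kill MCP process group",
}


def warnings_near_restarts(
    restarts: Sequence[datetime],
    events: Sequence[Tuple[datetime, str]],
    window: timedelta = timedelta(minutes=2),
) -> Dict[str, Tuple[int, int]]:
    """For each warning kind, return (count within `window` of any restart, total count)."""
    result: Dict[str, Tuple[int, int]] = {}
    for kind, needle in WARNING_PATTERNS.items():
        times = [ts for ts, message in events if needle in message]
        near = sum(1 for ts in times if any(abs(ts - r) <= window for r in restarts))
        result[kind] = (near, len(times))
    return result
```

A high near/total ratio would support the clustering observation but would not by itself show that the warnings cause the restarts; they could equally be side effects of the shutdown.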

Concrete evidence query

The following query returns the relevant thread-local events:

select datetime(ts,'unixepoch','localtime'), level, target, substr(message,1,220)
from logs
where thread_id='019cf764-6fc4-7cd3-81d3-3872977ee047'
  and ts >= strftime('%s','2026-03-16 17:40:00')
order by ts desc, ts_nanos desc
limit 120;
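The same query can also be run from Python with the thread id, start time, and row limit bound as parameters instead of edited into the SQL. This sketch assumes the logs table has the columns used in the query above (ts in Unix seconds, ts_nanos, level, target, message, thread_id). It adds an explicit CAST so the comparison does not depend on the declared type of ts. As in the original query, strftime('%s', ...) interprets the start time as UTC, while datetime(..., 'localtime') renders the results in local time. `open_readonly` and `thread_events` are hypothetical helper names.

```python
import sqlite3
from typing import List, Tuple

# The query above, with bound parameters and a CAST on the start-time bound.
QUERY = """
select datetime(ts, 'unixepoch', 'localtime'), level, target, substr(message, 1, 220)
from logs
where thread_id = ?
  and ts >= cast(strftime('%s', ?) as integer)
order by ts desc, ts_nanos desc
limit ?
"""


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open the log database read-only so a diagnostic query cannot modify it."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def thread_events(
    conn: sqlite3.Connection, thread_id: str, since_utc: str, limit: int = 120
) -> List[Tuple]:
    """Return the newest log rows for one thread at or after `since_utc` (UTC)."""
    return conn.execute(QUERY, (thread_id, since_utc, limit)).fetchall()
```

For example, `thread_events(open_readonly("/home/drindt/.codex/logs_1.sqlite"), "019cf764-6fc4-7cd3-81d3-3872977ee047", "2026-03-16 17:40:00")` would reproduce the query for the thread above.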

Likely failure mode

Codex appears to enter an inconsistent internal state in which tool-call bookkeeping loses the output for
one or more call ids (Function call output is missing for call id ...). The session is then eventually
shut down and resumed, which looks like a crash from the user's perspective.

This does not currently look like a user-shell process crash in the target repo. It looks more
like an internal Codex session/state management bug during long tool-heavy turns.
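This report does not show what context_manager::normalize does internally. As a hypothetical model only, the warning text suggests an invariant: every function call item in a turn's history should have an output item with the same call id. The sketch below checks that invariant over a list of history items; the dict shape with `type` and `call_id` keys is an assumption, loosely modeled on `function_call` / `function_call_output` items, and `pairing_problems` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Sequence


def pairing_problems(history: Sequence[Dict[str, str]]) -> Dict[str, List[str]]:
    """Report call ids whose call/output pairing is broken in a hypothetical history list."""
    calls = [item["call_id"] for item in history if item.get("type") == "function_call"]
    outputs = [item["call_id"] for item in history if item.get("type") == "function_call_output"]
    call_set, output_set = set(calls), set(outputs)
    return {
        # A call with no output: the situation the warning message describes.
        "missing_output": [cid for cid in calls if cid not in output_set],
        # An output whose call is no longer present (for example, trimmed away).
        "orphan_output": [cid for cid in outputs if cid not in call_set],
        # The same call id answered more than once.
        "duplicate_output": sorted(cid for cid, n in Counter(outputs).items() if n > 1),
    }
```

If a check of this kind ran on every normalization pass and the gap was never repaired, it would log the same call id repeatedly, which would be consistent with the repeated call_8RF9wsbJahTQ8eOYUnuYLKYa entries. That is a hypothesis the logs alone do not confirm.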

Impact

  • Long debugging sessions become unreliable.
  • Operator trust drops because the agent appears to "randomly crash".
  • The user must manually continue the session multiple times.
  • Mid-flight reasoning context is partially lost even though rollout resume exists.

Requested investigation

  1. Investigate why context_manager::normalize repeatedly logs
    Function call output is missing for call id ....
  2. Check whether repeated write_stdin polling of long-running exec_command sessions can orphan or
    desynchronize tool-call bookkeeping.
  3. Check whether session shutdown/resume is being triggered as a recovery path for this state.
  4. Review the surrounding shell snapshot and file watcher warnings for causal relevance.

Relevant local files

  • /home/drindt/.codex/log/codex-tui.log
  • /home/drindt/.codex/logs_1.sqlite
  • /home/drindt/.codex/history.jsonl

Metadata

  • Assignees: no one assigned
  • Labels: bug (Something isn't working), tool-calls (Issues related to tool calling)
  • Type: none
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests