Skip to content

codex_exec: apply_patch deletes lose file content in rollout (exec starts threads in Limited persistence mode) #18533

@moqimoqidea

Description

@moqimoqidea

Summary

When codex_exec (the non-interactive CLI path) executes an apply_patch that deletes a file, the rollout file written to ~/.codex/sessions/<year>/<month>/<day>/rollout-<id>.jsonl retains only the model's raw DSL (*** Delete File: <path>), which by design carries no file body. The TUI originator does not have this problem because TUI threads are started with persist_extended_history: true, which keeps EventMsg::PatchApplyEnd — the event that carries structured FileChange::Delete { content }.

As a result, downstream consumers that read rollout files (audit tooling, code-change exporters, compliance logs) can see that a file was deleted under exec, but not what was deleted.

Environment

  • Repo: openai/codex, main at e3f44ca3b (Fix plugin cache panic when cwd is unavailable, Fix plugin cache panic when cwd is unavailable #18499)
  • Platform: observed on macOS (Darwin 25.4.0), reproduces anywhere rollout files are written
  • Codex originators compared: codex-tui (OK) vs codex_exec (broken)

Reproduction

Ask the same Codex model, under each originator, to create and delete a small file, then compare rollout JSONL.

originator: codex-tui — complete

{"type":"event_msg","payload":{
  "type":"patch_apply_end",
  "success":true,
  "changes":{
    "foo.txt":{"type":"delete","content":"hello\nworld\n"}
  }
}}

FileChange::Delete { content: String } carries the full file body — downstream audit readers can consume it directly.

originator: codex_exec — incomplete

{"type":"response_item","payload":{
  "type":"custom_tool_call",
  "name":"apply_patch",
  "call_id":"call_abc",
  "input":"*** Begin Patch\n*** Delete File: foo.txt\n*** End Patch\n"
}}

*** Delete File: in the *** Begin Patch DSL is a path-only hunk — it was never required to ship content. The paired custom_tool_call_output.exit_code === 0 confirms the delete succeeded, but the removed bytes are gone from the on-disk log.

Scenario Expected Actual
tui + delete rollout records deleted content works
tui + add/update rollout records full change works
exec + delete rollout records deleted content empty
exec + add *** Add File carries + lines works
exec + update *** Update File carries hunks works

Root Cause

The divergence is not in the apply_patch handler. The handler reads and attaches the deleted file's content on both paths — see codex-rs/apply-patch/src/invocation.rs where the Hunk::DeleteFile branch calls fs.read_file_text(&path, sandbox).await and stores the bytes on ApplyPatchFileChange::Delete { content }. convert_apply_patch_to_protocol (in codex-rs/core/src/apply_patch.rs) then preserves that content into FileChange::Delete { content }. The emitter issues a EventMsg::PatchApplyEnd carrying the structured changes map — in memory the exec path already has the content.

The divergence is in the rollout persistence policy:

  • codex-rs/rollout/src/policy.rs classifies EventMsg::PatchApplyEnd under EventPersistenceMode::Extended. In Limited mode it is filtered out before being written to rollout.
  • codex-rs/tui/src/app_server_session.rs passes persist_extended_history: true on thread/start, thread/resume, and thread/fork requests, so TUI rollouts run in Extended mode and PatchApplyEnd is preserved.
  • codex-rs/exec/src/lib.rs in thread_start_params_from_config builds ThreadStartParams with ..ThreadStartParams::default(), which leaves persist_extended_history at its bool default of false (the field was added as an experimental opt-in in feat(app-server): experimental flag to persist extended history #11227). Exec rollouts therefore run in Limited mode and drop PatchApplyEnd unconditionally.

The user-visible effect: exec rollouts retain only the raw response_item/custom_tool_call (which is always persisted), and that raw record cannot carry delete content because the *** Begin Patch DSL doesn't require it.

Proposed fix (minimal)

Opt codex_exec into Extended rollout persistence at thread start, matching TUI. This is a one-line change in codex-rs/exec/src/lib.rs:

 fn thread_start_params_from_config(config: &Config) -> ThreadStartParams {
     ThreadStartParams {
         model: config.model.clone(),
         model_provider: Some(config.model_provider_id.clone()),
         cwd: Some(config.cwd.to_string_lossy().to_string()),
         approval_policy: Some(config.permissions.approval_policy.value().into()),
         approvals_reviewer: approvals_reviewer_override_from_config(config),
         sandbox: sandbox_mode_from_policy(config.permissions.sandbox_policy.get()),
         config: config_request_overrides_from_config(config),
         ephemeral: Some(config.ephemeral),
+        // Match TUI so non-interactive runs keep structured apply_patch/exec
+        // events in rollout (e.g. `FileChange::Delete { content }`), which
+        // audit tooling reads via `EventMsg::PatchApplyEnd`.
+        persist_extended_history: true,
         ..ThreadStartParams::default()
     }
 }

Plus a regression unit test in codex-rs/exec/src/lib_tests.rs that pins thread_start_params_from_config(&config).persist_extended_history == true, following the same ConfigBuilder pattern used by the existing thread_start_params_include_review_policy_* tests.

Side note: #17568's implementation proposal for codex exec fork already specifies persist_extended_history: true for its fork params — this PR would put start/resume on the same footing as that planned fork path, rather than leaving the three branches inconsistent.

Why this is safe

  • persist_extended_history only controls the on-disk rollout filter (EventPersistenceMode, see codex-rs/rollout/src/policy.rs and codex-rs/core/src/session/session.rs). It does not alter what events are emitted on exec's event stream or what the model sees.
  • The additional variants preserved under Extended (ExecCommandEnd, McpToolCallEnd, PatchApplyEnd, GuardianAssessment, Error, WebSearchEnd, ViewImageToolCall, Collab*End, DynamicToolCallRequest/Response) are already being written by TUI sessions today; rollout readers on the resume/fork/thread-export paths already handle them.
  • Rollout files grow only with tool-call volume, not per-token overhead. Exec runs are typically short and bounded.

Alternatives considered

  1. Reclassify EventMsg::PatchApplyEnd as Limited in rollout/src/policy.rs. Smaller persistence footprint, but it picks favorites among Extended-only events and leaves sibling "exec rollout is lossy" reports for other events (ExecCommandEnd etc.) unresolved. If the long-term desire is to tighten Limited for specific events, that's a separate architectural decision.
  2. Enrich ResponseItem::CustomToolCall at rollout write time with a resolved changes structure. Fixes Limited-mode consumers directly but enlarges the rollout schema, introduces a filesystem read on the rollout write path, creates two sources of truth for the same data, and risks divergence if the file has already been removed by the time the write happens. Disproportionate for this bug.
  3. Change the *** Begin Patch DSL to require full content on *** Delete File:. Biggest prompt-contract and back-compat change; rejected.

Option 1 via the exec-side flag (as proposed) is the smallest change that matches what exec consumers typically want: lossless rollouts for resume/fork/export, same as TUI.

Verification

Reference branch (on my fork, ready for review if a PR invitation is extended): https://github.com/moqimoqidea/codex/tree/fix/codex-exec-persist-extended-history

  • cargo test -p codex-exec: 40 lib + 1 doc + 61 integration tests all pass, including the new regression test and the existing apply_patch / resume / ephemeral suites.
  • cargo clippy -p codex-exec --all-targets --no-deps: clean.
  • Manual repro on the branch: the same prompt that previously yielded only *** Delete File: ... in rollout now also emits event_msg/patch_apply_end with changes[<path>].content populated.

Impact

This directly affects anyone running audit / observability / compliance tooling on top of codex rollout files under exec — per-run statistics like "lines deleted" or "content removed by model" are currently underreported for exec but not for TUI, producing inconsistent numbers for the same underlying operation.

Related

Happy to share additional repro details, a diff, or more data from our downstream audit tooling if useful. If the team is open to the direction, I'd be glad to prepare an invited PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingexecIssues related to the `codex exec` subcommand

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions